Event-Driven Network Automation with NetBox Branching

Context

This lab presents an implementation of event-driven network automation, using NetBox as the source of truth and the NETCONF protocol to apply configurations to devices.

The operator declares the desired network state in NetBox (interfaces, IPs, descriptions), and the system is responsible for converging devices to that state, either in response to a branch merge (event-triggered) or through periodic reconciliation (closed-loop). The lab covers two platforms - Cisco IOS-XR and Huawei VRP - both using OpenConfig models.

The contribution of this work is modest in scope, but it addresses a specific aspect with little detailed documentation, especially in Portuguese: using the NetBox branching plugin to optimize reconciliation scope.

The complete lab walkthrough, including installation instructions and exercises, is available in the project repository.

What this lab demonstrates

The implemented flow demonstrates an optimization applicable to closed-loop architectures: using the branching plugin to identify which devices need to be checked, avoiding full reconciliation of the entire infrastructure on every event. In addition, an optional periodic full-reconciliation loop detects and corrects configuration drift, operating in two modes: alert_only (report only) or auto_fix (automatic correction).

flowchart TB
    subgraph branch["Branch: update-dc1-network"]
        direction LR
        c1["Change interface R1"]
        c2["Change interface R2"]
    end

    branch -->|"merge"| main[(Main)]
    main -->|"webhook"| wh["Webhook Listener"]

    wh -->|"GET /changes/"| api["Branching API"]
    api -->|"modified objects"| wh

    wh --> dispatch["Per-device dispatch"]

    dispatch --> d1["R1: reconcile"]
    dispatch --> d2["R2: reconcile"]
    dispatch --> d3["R3: skip"]
    dispatch --> d4["R4: skip"]


    style d1 fill:#4a4,stroke:#6c6
    style d2 fill:#4a4,stroke:#6c6
    style d3 fill:#444,stroke:#667
    style d4 fill:#444,stroke:#667

The flow can be summarized in three steps:

Detection: Waits for branch merge events from the branching plugin
Scoping: Queries the plugin API to identify which devices were affected by the change
Reconciliation: For each affected device, executes level-based reconciliation:
Reads the complete desired state from NetBox
Reads the device's current state
Compares both states and applies the difference

Further Resources

This lab is not intended to be a complete reference on network automation. For a broader view of the topic, we recommend the following resources:

GTER54 - Do GIT ao Router (NIC.br) - Presentation covering the automation flow from version control to deployment on network devices.
Event-Driven Network Automation na Pratica (NIC.br) - Practical demonstration of event-driven automation in network environments.
Event-Driven Network Automation with NetBox and Ansible (NetBox Labs) - Article presenting an implementation that integrates NetBox, webhooks, and Ansible Automation Platform for event-driven automation.
Closed-Loop Network Automation - Zero to Hero workshop (NetBox Labs) - Lab where a fully functional closed-loop network automation stack is implemented, including observability and network-discovery feedback loops.

Configuration Architectures: Loops vs. Events

The closed-loop model

Closed-loop configuration architectures operate through periodic reconciliation: a process continuously checks whether the current state of devices matches the desired state (defined in the source of truth) and applies corrections when divergences are detected.

flowchart LR
    SoT[(Source of Truth)]

    subgraph Orch["Orchestrator"]
        direction TB
        A["Read desired"] --> B["Read current"]
        B --> C{"Diff?"}
        C -->|Yes| D["Apply"]
        C -->|No| E["OK"]
        E -.->|"Loop"| A
    end

    Dev[("Devices")]

    SoT -.->|desired state| A
    B <-.->|get-config| Dev
    D -.->|edit-config| Dev

This model has clear advantages: - Resilience: detects and corrects configuration drift regardless of cause - Eventual consistency: guarantees convergence even after temporary failures - Conceptual simplicity: the loop is self-contained and does not depend on external events

However, there are associated costs: - Resource consumption: each verification cycle consumes CPU, memory, and network bandwidth on the orchestrator - Convergence latency: changes are only applied in the next verification cycle - Frequency vs. cost trade-off: more frequent cycles reduce latency but increase overhead; less frequent cycles reduce overhead but increase time to convergence

Combining events and reconciliation: an established practice

Combining event-driven automation with periodic reconciliation is a well-established architectural pattern in large-scale distributed systems.

The Kubernetes model

The most prominent example is Kubernetes itself. Kubernetes controllers are designed to be level-based, not edge-based - an important distinction documented in the controller-runtime source code:

The Kubebuilder documentation explains that the level-based architecture was chosen to enable self-healing and periodic reconciliation. Unlike an edge-based system (which would react to each individual event), the level-based model allows events to be aggregated and intermediate or stale values to be ignored, working directly from the current desired state.

Why combine both approaches

The combination is not arbitrary. Each approach covers weaknesses of the other:

Aspect	Events only	Reconciliation only	Combined
Latency for intentional changes	Low	Depends on interval	Low
External drift detection	Does not detect	Detects	Detects
Resilience to event failures	Low	N/A	High
Resource consumption	Low	Proportional to frequency	Optimized

Periodic reconciliation guarantees convergence even when events are lost, duplicated, or arrive out of order. Events allow immediate response without requiring frequent verification cycles.

Complexities of the hybrid approach

This architecture introduces its own challenges that must be considered:

Mandatory idempotency: the apply logic must produce the same result regardless of whether it is triggered by an event or by reconciliation
Consistency: during windows between event and reconciliation, there may be temporary divergence between source of truth and real state

Applying this to the network context

Why optimize reconciliation scope?

Performing full reconciliation of all devices on every change can be prohibitive. At the same time, adopting a purely edge-based approach (applying only the event diff) would sacrifice robustness.

The solution adopted in this lab is to use the branching diff to reduce scope without abandoning the level-based model:

The event identifies which devices to verify (optimization)
Verification of each device is level-based (robustness)

The role of the branching plugin

The NetBox branching plugin provides features that make this optimization easier:

Change grouping: multiple changes in a branch result in a single merge event
Affected-object identification: the API allows querying which objects were modified in the branch
Traceability: each change is associated with an identifiable branch

In this lab, we use the branching API to:

Capture the merge event via webhook
Query which objects (and therefore which devices) were affected
Trigger level-based reconciliation only for relevant devices

Managed fields

The automation supports the following interface fields on both platforms:

NetBox field	Device effect
`enabled`	Administrative state (up/shutdown)
`description`	Interface description
IP Address + prefix	Interface IPv4 address

Each vendor has an adapter that translates desired state into NETCONF payloads using OpenConfig models (openconfig-interfaces, openconfig-if-ip).

Scope and limitations

This is an educational lab, not a production-ready solution. The code prioritizes clarity over robustness, deliberately omitting aspects that would be required in a real environment:

Resilience: no retry with backoff and no handling of out-of-order events
High availability: the webhook listener is a single point of failure
Periodic reconciliation: available as an option, but still without a distributed queue, retry with backoff, and HA coordination
Field coverage: only a subset of interface fields is mapped to NETCONF operations