Use "use as" to avoid BgpPeer name collision with maghemite #5894

Closed
wants to merge 11 commits into from

Nieuwejaar
Contributor

No description provided.
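
For context on the title: Rust's `use ... as ...` gives an imported item a local alias, so two types that share a name can be used side by side in one module. A minimal sketch of the idea follows. The module names, the struct fields, and the `to_mg_peer` helper are hypothetical stand-ins chosen for illustration; they are not the actual omicron or maghemite definitions.

```rust
use std::net::Ipv4Addr;

// Hypothetical stand-in for a type coming from the maghemite client.
mod maghemite_types {
    #[derive(Debug, PartialEq)]
    pub struct BgpPeer {
        pub asn: u32,
        pub host: String,
        pub name: String,
    }
}

// Hypothetical stand-in for the type of the same name on the omicron side.
mod omicron_types {
    use std::net::Ipv4Addr;

    #[derive(Debug, PartialEq)]
    pub struct BgpPeer {
        pub asn: u32,
        pub addr: Ipv4Addr,
    }
}

// Without the alias, importing both `BgpPeer` types would be a name collision.
use maghemite_types::BgpPeer as MgBgpPeer;
use omicron_types::BgpPeer;

// Convert the local peer description into the aliased type.
// Port 179 is the standard BGP port; the field mapping is an assumption.
fn to_mg_peer(peer: &BgpPeer) -> MgBgpPeer {
    MgBgpPeer {
        asn: peer.asn,
        host: format!("{}:179", peer.addr),
        name: peer.addr.to_string(),
    }
}
```

Calling `to_mg_peer` on a peer with address 172.20.15.35 yields a host of `172.20.15.35:179`, and both types stay unambiguous at every use site.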

@Nieuwejaar Nieuwejaar marked this pull request as draft June 13, 2024 17:02
@Nieuwejaar
Contributor Author

I tested this on madrid by:

  • installing an omicron version from several days ago
  • configuring it with BGP peers on both switches
  • upgrading to this release
  • verifying that we successfully fell back from a V2 bootstore to a V1 bootstore and correctly configured uplinkd in the switch zone
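
The fallback checked in the last step can be pictured as: try the newest schema first and, if that fails, retry with the older schema and convert the result. This is a minimal sketch under stated assumptions. `load_with_fallback` and its parser arguments are hypothetical names, and the sketch does not reproduce the real deserialization code in sled-agent.

```rust
use std::fmt::Display;

/// Try the newer schema first; if that fails, parse the older schema
/// and upgrade it via `From`. (Hypothetical helper, for illustration only.)
fn load_with_fallback<V1, V2, E>(
    raw: &str,
    parse_v2: impl Fn(&str) -> Result<V2, E>,
    parse_v1: impl Fn(&str) -> Result<V1, E>,
) -> Result<V2, E>
where
    V2: From<V1>,
    E: Display,
{
    match parse_v2(raw) {
        Ok(v2) => Ok(v2),
        Err(err) => {
            // Record why the newer schema was rejected before falling back.
            eprintln!("failed to parse as v2, trying next as v1: {}", err);
            parse_v1(raw).map(V2::from)
        }
    }
}
```

The design choice here is that the caller always ends up with the newest in-memory type, so only the loading path needs to know that an older on-disk format exists.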

I added extra logging in sled-agent to verify that the upgrade happened as expected:

20:58:43.766Z WARN SledAgent: Failed to deserialize EarlyNetworkConfig as v2, trying next as v1: invalid type: string "172.20.15.38/29", expected struct UplinkAddressConfig at line 1 column 329                                         
    file = sled-agent/src/bootstrap/early_networking.rs:814                                                          
    sled_id = 2cb922cf-0ddd-4451-a8b8-f23680667ce6                                                                   
20:58:43.768Z INFO SledAgent: upgrading from v1: EarlyNetworkConfigV1 {                                              
        generation: 1,                                                                                               
        schema_version: 1,                                                                                           
        body: EarlyNetworkConfigBodyV1 {                                                                             
            ntp_servers: [                                                                                           
                "ntp.eng.oxide.computer",                                                                            
            ],                                                                                                       
            rack_network_config: Some(                                                                               
                RackNetworkConfigV1 {                                                                                
                    rack_subnet: Ipv6Net {                                                                           
                        addr: fd00:1122:3344:100::,                                                                  
                        width: 56,                                                                                   
                    },
                    infra_ip_first: 172.20.15.37,
                    infra_ip_last: 172.20.15.38,
                    ports: [
                        PortConfigV1 {
                            routes: [
                                RouteConfig {
                                    destination: V4(
                                        Ipv4Net {
                                            addr: 0.0.0.0, 
                                            width: 0,
                                        },
                                    ),
                                    nexthop: 172.20.15.33, 
                                    vlan_id: None,
                                },
                            ],
                            addresses: [
                                V4(
                                    Ipv4Net {
                                        addr: 172.20.15.38,
                                        width: 29,
                                    },
                                ),
                            ],
                            switch: Switch0,
                            port: "qsfp0",
                            uplink_port_speed: Speed40G,
                            uplink_port_fec: None,
                            bgp_peers: [
                                BgpPeerConfig {
                                    asn: 65002,
                                    port: "qsfp18",
                                    addr: 172.20.15.51,
                                    hold_time: None,
                                    idle_hold_time: None,
                                    delay_open: None,
                                    connect_retry: None,
                                    keepalive: None,
                                    remote_asn: None,
                                    min_ttl: None,
                                    md5_auth_key: None,
                                    multi_exit_discriminator: None,
                                    communities: [],
                                    local_pref: None,
                                    enforce_first_as: false,
                                    allowed_import: NoFiltering,
                                    allowed_export: NoFiltering,
                                    vlan_id: None,
                                },
                            ],
                            autoneg: false,
                        },
[...]
20:58:43.768Z INFO SledAgent: to v2: EarlyNetworkConfig {                                                            
        generation: 1,                                    
        schema_version: 2,                                
        body: EarlyNetworkConfigBody {
            ntp_servers: [                                
                "ntp.eng.oxide.computer",
            ],                                            
            rack_network_config: Some(
                RackNetworkConfigV2 {        
                    rack_subnet: Ipv6Net {          
                        addr: fd00:1122:3344:100::,
                        width: 56,                                                                                   
                    },                                    
                    infra_ip_first: 172.20.15.37,
                    infra_ip_last: 172.20.15.38,
                    ports: [                                                                                         
                        PortConfigV2 {            
                            routes: [
                                RouteConfig {
                                    destination: V4(
                                        Ipv4Net {
                                            addr: 0.0.0.0, 
                                            width: 0,                                                                
                                        },        
                                    ),
                                    nexthop: 172.20.15.33, 
                                    vlan_id: None,
                                },          
                            ],            
                            addresses: [                
                                UplinkAddressConfig {
                                    address: V4(
                                        Ipv4Net {
                                            addr: 172.20.15.38,
                                            width: 29,
                                        },             
                                    ),              
                                    vlan_id: None,       
                                },                   
                            ],                          
                            switch: Switch0,        
                            port: "qsfp0",           
                            uplink_port_speed: Speed40G,
                            uplink_port_fec: None,     
                            bgp_peers: [                                                                             
                                BgpPeerConfig {     
                                    asn: 65002,      
                                    port: "qsfp18",                                                                  
                                    addr: 172.20.15.51,                                                              
                                    hold_time: None,                                                                 
                                    idle_hold_time: None,
                                    delay_open: None,
                                    connect_retry: None,
                                    keepalive: None,
                                    remote_asn: None,
                                    min_ttl: None,
                                    md5_auth_key: None,
                                    multi_exit_discriminator: None,
                                    communities: [],
                                    local_pref: None,
                                    enforce_first_as: false,
                                    allowed_import: NoFiltering,
                                    allowed_export: NoFiltering,
                                    vlan_id: None,
                                },
                            ],
                            autoneg: false,
                        }
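
The difference visible between the two dumps above is that each V1 port address becomes a V2 `UplinkAddressConfig` with `vlan_id: None`. A minimal sketch of such a conversion follows. The struct names mirror the log output, but the definitions are simplified assumptions (IPv4 only, most fields omitted) and are not the real omicron types.

```rust
use std::net::Ipv4Addr;

// Simplified stand-in: the real log shows an enum wrapping V4/V6 networks.
#[derive(Clone, Debug, PartialEq)]
struct Ipv4Net {
    addr: Ipv4Addr,
    width: u8,
}

#[derive(Clone, Debug, PartialEq)]
struct UplinkAddressConfig {
    address: Ipv4Net,
    vlan_id: Option<u16>,
}

// V1 stores bare networks; V2 wraps each one with an optional VLAN id.
#[derive(Debug, PartialEq)]
struct PortConfigV1 {
    addresses: Vec<Ipv4Net>,
    port: String,
}

#[derive(Debug, PartialEq)]
struct PortConfigV2 {
    addresses: Vec<UplinkAddressConfig>,
    port: String,
}

impl From<PortConfigV1> for PortConfigV2 {
    fn from(v1: PortConfigV1) -> Self {
        PortConfigV2 {
            // V1 has no VLAN information, so every address gets `None`.
            addresses: v1
                .addresses
                .into_iter()
                .map(|address| UplinkAddressConfig { address, vlan_id: None })
                .collect(),
            port: v1.port,
        }
    }
}
```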

I also verified the BGP state in the switch zone.

With the old, v1 configuration:

root@oxz_switch1:~# mgadm bgp config router list          
[                                                         
    Router {                                              
        asn: 65103,                                                                                                  
        graceful_shutdown: false,                                                                                    
        id: 65103,                                        
        listen: "[::]:179",                               
    },                                                    
]   
root@oxz_switch1:~# mgadm bgp config ne list 65103                                                                   
[                                                                                                                    
    Neighbor {                                            
        allow_export: NoFiltering,
        allow_import: NoFiltering,
        asn: 65103,
        communities: [],
        connect_retry: 3,
        delay_open: 0,
        enforce_first_as: false,
        group: "qsfp0",
        hold_time: 6,
        host: "172.20.15.35:179",
        idle_hold_time: 3,
        keepalive: 2,
        local_pref: None,
        md5_auth_key: None,
        min_ttl: None,
        multi_exit_discriminator: None,
        name: "172.20.15.35", 
        passive: false,
        remote_asn: None,
        resolution: 100,
        vlan_id: None,
    },
]
root@oxz_switch1:~# mgadm bgp status ne 65103 
Peer Address  Peer ASN     State        State Duration  Hold   Keepalive
172.20.15.35  Some(64601)  Established  9m 21s 368ms    6s/6s  2s/2s
root@oxz_switch1:~# mgadm bgp status imported 65103
Static Routes
=============
Prefix     Nexthop       Local Pref
0.0.0.0/0  172.20.15.33  None
BGP Routes
=============
Prefix     Nexthop       Local Pref  Origin AS  Peer ID     MED   AS Path  Stale
0.0.0.0/0  172.20.15.35  None        64601      172.20.0.3  None  [64601]  None
root@oxz_switch1:~# 

With the new code, after the upgrade from V1 to V2:

root@oxz_switch1:~# mgadm bgp config router list
[
    Router {
        asn: 65103,
        graceful_shutdown: false,
        id: 65103,
        listen: "[::]:179",
    },
]
root@oxz_switch1:~# mgadm bgp config ne list 65103
[
    Neighbor {
        allow_export: NoFiltering,
        allow_import: NoFiltering,
        asn: 65103,
        communities: [],
        connect_retry: 3,
        delay_open: 0,
        enforce_first_as: false,
        group: "qsfp0",
        hold_time: 6,
        host: "172.20.15.35:179",
        idle_hold_time: 3,
        keepalive: 2,
        local_pref: None,
        md5_auth_key: None,
        min_ttl: None,
        multi_exit_discriminator: None,
        name: "172.20.15.35",
        passive: false,
        remote_asn: None,
        resolution: 100,
        vlan_id: None,
    },
]
root@oxz_switch1:~# mgadm bgp status ne 65103 
Peer Address  Peer ASN     State        State Duration  Hold   Keepalive
172.20.15.35  Some(64601)  Established  6m 33s 615ms    6s/6s  2s/2s
root@oxz_switch1:~# swadm build-info
Version: 0.2.0
Commit SHA: 861c00bacbdf7a6e22471f0dabd8f926409b5292
Commit timestamp: 2024-06-11T18:04:22.000000000Z
Git branch: main
SDE commit SHA: d9b58d421acb4b50fbee8c9c3b0e34098d7a33ef
Rustc version: 1.78.0
Rustc channel: stable
Rustc triple: x86_64-unknown-illumos
Rustc commit SHA: 9b00956e56009bab2aa15d7bff10916599e3d6d6
Cargo triple: x86_64-unknown-illumos
Debug: false
Opt level: 3

@Nieuwejaar Nieuwejaar marked this pull request as ready for review June 14, 2024 17:27
jgallagher and others added 8 commits June 15, 2024 15:58
This builds on #5788 to add support to the planner to add new CRDB zones
to blueprints (if the current count is below the policy's target count).
No changes were needed on the execution side, and the CRDB zones already
bring themselves up as part of the existing cluster by looking up other
node names in DNS.
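
As a rough illustration of the "below the policy's target count" check, the number of zones to add is the shortfall between target and current count, and zero when the target is already met. `zones_to_add` is a hypothetical helper, not a function from the planner.

```rust
/// Number of zones still needed to reach the policy target.
/// Returns 0 when the current count already meets or exceeds the target.
fn zones_to_add(current: usize, target: usize) -> usize {
    target.saturating_sub(current)
}
```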

A big chunk of the diff comes from expectorate output - our simulated
test system was producing sleds that all had the same physical disk and
zpool UUIDs, which messed with the test I wrote for the builder's zpool
allocation. I tweaked the `TypedUuidRng` that the sled uses to include
the `sled_id` (which itself comes from a "parent" `TypedUuidRng`) as the
second seed argument. If that seems unreasonable, I am very open to
other fixes!
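
To illustrate why a per-sled seed component helps, here is a minimal sketch that derives deterministic ids by hashing a parent seed together with an extra seed. It uses the standard library's `DefaultHasher` as a stand-in and does not use or reproduce the `TypedUuidRng` API; `derive_id` and its parameters are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Derive a deterministic id from a parent seed, an extra seed
/// (for example a sled id), and an index.
fn derive_id(parent_seed: &str, extra_seed: &str, index: u64) -> u64 {
    // `DefaultHasher::new()` uses fixed keys, so results repeat within a build.
    let mut hasher = DefaultHasher::new();
    parent_seed.hash(&mut hasher);
    extra_seed.hash(&mut hasher);
    index.hash(&mut hasher);
    hasher.finish()
}
```

If every simulated sled passes the same extra seed, all sleds derive the same sequence of ids, which is the duplication described above. Passing a distinct value per sled separates the sequences while keeping them reproducible.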
- Fixes #5892
- Modifies the application of the external services IP allowlist so that
it's only relevant for Nexus API servers, rather than all
external-facing services (DNS being the other example today). It is not
always possible to know the peer addresses for DNS servers in the case
of recursive DNS, and so the allowlist cannot directly apply to external
DNS. This works by inserting the allowlist entries as a host-filter,
which we were doing before, but only on the named VPC Firewall rule for
the Nexus VPC Subnet.
- Fixes #5832
- Uses timeseries fields in error message rather than filter identifiers
@Nieuwejaar Nieuwejaar closed this Jun 15, 2024
@Nieuwejaar Nieuwejaar deleted the bgp_upgrade branch June 15, 2024 20:26
5 participants