Do you really need any special configuration for DHCP Relay when you implement VxLAN EVPN? All of Cisco's guides say you do, but as I recently found out, that's not exactly the case. I'm going to walk through several scenarios to find out where DHCP Relay needs special tweaks to work and where it doesn't. Everything was tested on real N9Ks and on NX-OSv.
When Default Config Is Fine
Every Cisco guide on this topic starts by saying that the challenge is that every Leaf has the same anycast gateway address configured, so we can't use this address as the source address for relayed packets and need a unique address (such as a loopback) instead. In fact, nothing stops you from using a non-unique address as the source. Personally, I still think it's more convenient to use a unique address wherever possible, and that's certainly how I recommend doing it, for reasons like troubleshooting and security. Also keep in mind that using a unique address is the only design validated and recommended by Cisco. Still, the other configuration works as well. So if you can't use unique addresses for some reason, this might be the way to go. For example, you might be in the middle of a migration to your new fancy VxLAN EVPN fabric and suddenly realize that all your DHCP servers run Windows Server 2012, which would need several kilometers of duct tape (depending on your fabric size) to work with the new DHCP options for EVPN, and those options are a must if you use unique loopback addresses.
So in this part we're going to consider the scenario where you're fine with using the non-unique anycast gateway address as the source address.
How will the response packet be routed back to the correct Leaf? Let's consider these possible scenarios:
1. The DHCP server is in an external network somewhere behind the BGW. The BGW is L3-only, so there are no L2 VNIs configured on it, only L3 VNIs for routing. We assume the client and the server are in the same VRF.
How will the BGW route DHCP replies (sent to destination 100.0.0.254) back to Leaf 1? It doesn't have a /32 route for that address (SVIs with an anycast gateway don't produce /32 routes for themselves). It only has a /24 Type 5 route, which points to one or more Leafs depending on your design.
So the traffic is handled by the well-known mechanism the BGW uses to forward unknown unicast traffic (traffic for hosts it has no specific /32 route for): it forwards the packet to one of the Leafs using the routing VNI. That Leaf receives the packet, decapsulates it, sees that the destination IP is its own anycast gateway address, and sends it to all other Leafs, this time using the L2 VNI in the same fashion it sends BUM traffic (Multicast/Ingress Replication). It also changes the source address to the anycast gateway address (100.0.0.254) and the destination to 255.255.255.255.
This way packet will eventually get to the right Leaf.
2. The DHCP server is in an external network somewhere behind a BorderLeaf. A BorderLeaf functions as both a border gateway and a Leaf, so it has the client's VLAN and VNI configured on it. We again assume the client and the server are in the same VRF.
This scenario is pretty much like the second part of the previous one, but this time the BorderLeaf, acting as a Leaf, receives traffic destined to its own anycast gateway address not from another VTEP but directly from a host. That doesn't seem to be a well-known scenario (at least I failed to find any mention of it or of what a Leaf is supposed to do with such a packet). However, both real hardware and NX-OSv behave the same way they did in the second part of the first scenario: the BorderLeaf changes the source and destination IP addresses and forwards the traffic to all the other Leafs via Multicast/IR.
In both of these scenarios nothing was needed apart from the usual DHCP Relay configuration:
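The reply path in both scenarios can be sketched as a toy Python model. It assumes the addresses used in this post and hypothetical Leaf VTEP names (leaf1, leaf2, leaf3), and it only reproduces the behaviour observed in the lab: a longest-prefix match on a BGW that holds just the Type 5 /24, then a Leaf (or the BorderLeaf, in scenario 2) rewriting a packet addressed to its own anycast gateway and replicating it to the other Leafs. It is an illustration, not how NX-OS implements forwarding internally.

```python
import ipaddress

# Addresses from this post; VTEP names are hypothetical.
ANYCAST_GW = ipaddress.ip_address("100.0.0.254")
BROADCAST = ipaddress.ip_address("255.255.255.255")
CLIENT_SUBNET = ipaddress.ip_network("100.0.0.0/24")

# BGW routing table in the client VRF: only the Type 5 /24 for the client
# subnet (no /32 for the anycast gateway) plus a default route outwards.
BGW_ROUTES = {
    CLIENT_SUBNET: ["leaf1", "leaf2"],
    ipaddress.ip_network("0.0.0.0/0"): ["external"],
}


def lookup(routes, dst):
    """Longest-prefix match: return (prefix, next_hops) for dst."""
    matches = [net for net in routes if dst in net]
    best = max(matches, key=lambda net: net.prefixlen)
    return best, routes[best]


def leaf_receive(leaf, pkt, all_leafs):
    """Observed Leaf/BorderLeaf behaviour for a routed packet addressed to
    its own anycast gateway: rewrite src/dst and replicate the packet on
    the L2 VNI (Multicast/IR) to every other Leaf."""
    if pkt["dst"] != ANYCAST_GW:
        return []  # ordinary traffic: not modelled here
    flooded = {"src": ANYCAST_GW, "dst": BROADCAST, "payload": pkt["payload"]}
    return [(peer, flooded) for peer in all_leafs if peer != leaf]


if __name__ == "__main__":
    reply = {"src": ipaddress.ip_address("192.168.42.253"),
             "dst": ANYCAST_GW, "payload": "DHCPOFFER"}
    prefix, next_hops = lookup(BGW_ROUTES, reply["dst"])
    print(prefix, next_hops)  # 100.0.0.0/24 ['leaf1', 'leaf2']
    copies = leaf_receive(next_hops[0], reply, ["leaf1", "leaf2", "leaf3"])
    print([peer for peer, _ in copies])  # ['leaf2', 'leaf3']
```

Whichever Leaf the BGW picks, the broadcast copy reaches every other Leaf, so the one with the client sees the reply.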
interface Vlan100
  no shutdown
  vrf member CLIENTS
  ip address 100.0.0.254/24
  fabric forwarding mode anycast-gateway
  ip dhcp relay address 192.168.42.253
No "source-interface", no additional DHCP options, nothing. It works just fine with any DHCP server. Let's move on to the scenarios where the basic config is just not enough.
When Special Tweaks Are Required
There are several scenarios where you'll need to tweak either the DHCP settings on your Nexus or your server:
1. You need to use individual loopbacks for each Leaf (to identify which Leaf a request comes from, or for some other reason).
2. Your client is in one VRF and your DHCP server is in another one (and you want to set things up without configuring any kind of route leaking).
OK, so what's the problem here? Let's take a look at a DHCP Relay packet.
Note the "Relay agent IP Address" field, also known as giaddr. By default, the IP address of our SVI is inserted here. And guess what: this is also the field the DHCP server uses to decide which pool to allocate the address from.
Now you see the problem. If we replace this address with the address of a loopback interface, how is the DHCP server supposed to know which pool to allocate the address from? This is where the cool new DHCP options come in.
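The pool-selection problem can be illustrated with a minimal Python sketch of the traditional giaddr-based behaviour (a server picks the scope whose subnet contains giaddr, as in RFC 2131). The scope name is hypothetical; 222.222.222.222 is the loopback address used later in this post.

```python
import ipaddress

# Hypothetical DHCP server scopes, keyed by name.
SCOPES = {"CLIENTS-VLAN100": ipaddress.ip_network("100.0.0.0/24")}


def select_scope(giaddr):
    """giaddr-based selection: use the scope whose subnet contains giaddr."""
    addr = ipaddress.ip_address(giaddr)
    for name, subnet in SCOPES.items():
        if addr in subnet:
            return name
    return None  # no scope matches, so no address can be offered


if __name__ == "__main__":
    print(select_scope("100.0.0.254"))      # CLIENTS-VLAN100 (SVI as giaddr)
    print(select_scope("222.222.222.222"))  # None (loopback as giaddr)
```

With the SVI as giaddr the right scope is found; with a loopback as giaddr nothing matches, which is exactly the gap the new options fill.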
These cool new options are not supported in Windows Server 2012.
So what are these options? There are three of them, all sub-options of Option 82, and the first one we're interested in right now is Sub-option 5. I'll just quote Cisco here:
Sub-option 5 (0x5) - Link Selection (defined in RFC 3527):
The link selection sub-option provides a mechanism to separate the subnet/link on which the DHCP client resides from the gateway address (giaddr), which can be used to communicate with the relay agent by the DHCP server. The relay agent will set the sub-option to the correct subscriber subnet and the DHCP server will use that value to assign an IP address rather than the giaddr value. The relay agent will set the giaddr to its own IP address so that DHCP messages are able to be forwarded over the network. For this function, Cisco’s proprietary implementation is sub-option 150(0x96). You can use the ip dhcp relay sub-option type cisco command to manage the function.
Basically, it's the replacement for giaddr. Here's what it looks like for the scheme I used above (with a source-interface of 222.222.222.222 configured and the client's network 100.0.0.0/24):
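On the wire, Link Selection is a simple code/length/value entry inside Option 82. A minimal Python sketch of the encoding, assuming the relay carries the client subnet address (100.0.0.0) as the 4-byte value; the `cisco` flag switches to the proprietary code 150 mentioned in the quote above. The function name is illustrative only.

```python
import ipaddress
import struct

LINK_SELECTION = 5          # RFC 3527 sub-option code
LINK_SELECTION_CISCO = 150  # Cisco proprietary code (sub-option type cisco)


def link_selection(subnet, cisco=False):
    """Encode the Link Selection sub-option of Option 82 as code/length/value."""
    code = LINK_SELECTION_CISCO if cisco else LINK_SELECTION
    value = ipaddress.ip_network(subnet).network_address.packed  # 4 bytes
    return struct.pack("!BB", code, len(value)) + value


if __name__ == "__main__":
    print(link_selection("100.0.0.0/24").hex())              # 050464000000
    print(link_selection("100.0.0.0/24", cisco=True).hex())  # 960464000000
```

The server then picks the pool from this value, while giaddr is free to carry the Leaf's unique loopback.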
Another new sub-option, which is turned on automatically together with the previous one, is Sub-option 11 (0xb) - Server ID Override. What does it do?
Originally, the Server Identifier (DHCP Option 54) is carried in messages sent by the server and contains the server's address, so the client can communicate with the server directly the next time it needs to renew or release its address. Here's this ID in a DHCP Offer sent by DHCP server 192.168.42.254:
What does the new Sub-option 11 do? It asks the DHCP server to change the Server Identifier to our Leaf's SVI IP address, so the client never communicates with the DHCP server directly, only via the DHCP Relay. Here's this sub-option in a Discover message relayed by the Leaf:
And here's the server's Offer with the Server Identifier changed to the Leaf's SVI address:
Why do we need the client to communicate with the server only via the Relay? Probably because the Relay now has to insert these new options for all of this to work, so this option is required for the other options to work. I guess.
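Both sides of Server ID Override can be sketched in a few lines of Python: the relay encodes sub-option 11 with the Leaf's SVI address (4-byte format per RFC 5107), and a server that honours it puts that address into Option 54 of its reply. The server-side function is a simplified, hypothetical illustration, not the logic of any particular DHCP server.

```python
import ipaddress
import struct

SERVER_ID_OVERRIDE = 11      # Option 82 sub-option code (RFC 5107)
OPT_SERVER_IDENTIFIER = 54   # DHCP Server Identifier option


def server_id_override(svi_ip):
    """Relay side: encode sub-option 11 carrying the Leaf's SVI address."""
    value = ipaddress.IPv4Address(svi_ip).packed
    return struct.pack("!BB", SERVER_ID_OVERRIDE, len(value)) + value


def server_identifier(server_ip, suboptions):
    """Server side: build option 54 for the reply. If the relay sent
    sub-option 11, its address replaces the server's own address."""
    override = suboptions.get(SERVER_ID_OVERRIDE)
    ip = ipaddress.IPv4Address(override if override else server_ip)
    return struct.pack("!BB", OPT_SERVER_IDENTIFIER, 4) + ip.packed


if __name__ == "__main__":
    print(server_id_override("100.0.0.254").hex())  # 0b04640000fe
    sub = {SERVER_ID_OVERRIDE: ipaddress.IPv4Address("100.0.0.254").packed}
    print(ipaddress.IPv4Address(server_identifier("192.168.42.254", sub)[2:]))  # 100.0.0.254
    print(ipaddress.IPv4Address(server_identifier("192.168.42.254", {})[2:]))   # 192.168.42.254
```

Because the client later unicasts renewals to the address in Option 54, it ends up talking to the Leaf's SVI, and the Leaf can add its sub-options again.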
Sub-option 151 (0x97) - Virtual Subnet Selection carries information about the VRF (well, there's actually a 26-page RFC devoted to this sub-option, so there may be a few more possibilities than just that, but in our case that's it). It's useful when your server is in one VRF and your client is in another. If you use the "source-interface" command, you'll have to enable this option even if your client and source-interface are in the same VRF, or the relay won't work at all.
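A minimal Python sketch of the Virtual Subnet Selection sub-option, assuming the relay carries the VRF name as an NVT ASCII identifier (type 0 in RFC 6607). Which VRF name and which type NX-OS actually puts there is an assumption here; check it against your own captures. The function names are hypothetical.

```python
import struct

VSS = 151               # Virtual Subnet Selection sub-option of Option 82 (RFC 6607)
VSS_TYPE_NVT_ASCII = 0  # VPN identifier as an NVT ASCII string, e.g. a VRF name


def vss(vrf_name):
    """Encode sub-option 151 carrying a VRF name (type 0)."""
    value = bytes([VSS_TYPE_NVT_ASCII]) + vrf_name.encode("ascii")
    return struct.pack("!BB", VSS, len(value)) + value


def parse_vss(data):
    """Decode what vss() produced back into (type, vrf_name)."""
    code, length = data[0], data[1]
    if code != VSS or length != len(data) - 2:
        raise ValueError("not a well-formed VSS sub-option")
    return data[2], data[3:].decode("ascii")


if __name__ == "__main__":
    encoded = vss("CLIENTS")
    print(encoded.hex())       # 970800434c49454e5453
    print(parse_vss(encoded))  # (0, 'CLIENTS')
```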
These options are enabled with two commands:
ip dhcp relay information option
ip dhcp relay information option vpn
And a single additional command on our SVI specifies the source interface:
interface Vlan100
  no shutdown
  vrf member CLIENTS
  ip address 100.0.0.254/24
  fabric forwarding mode anycast-gateway
  ip dhcp relay address 192.168.42.254
  ip dhcp relay source-interface loopback1
And if your server is indeed in the other VRF, specify it:
ip dhcp relay address 192.168.42.254 use-vrf SERVERS
When it comes to configuring the server, it's pretty simple (unless you have Windows Server 2012, see above). On Windows Server 2016 you'll have to create an additional dummy pool (scope) covering the loopback interface addresses, with all of its addresses excluded. It's a security tweak, and without it addresses won't be allocated.
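The server-side logic can be sketched as a toy Python model that combines two rules: the giaddr check from Microsoft's documentation (quoted in the comments below), under which a relay whose giaddr falls outside every scope range is treated as rogue, and pool choice by Link Selection instead of giaddr. Scope names, the 222.222.222.0/24 loopback range and the return strings are hypothetical, and a real Windows DHCP server does far more (activation, leases, per-address exclusions).

```python
import ipaddress

# Hypothetical scopes. The loopback range is an assumption for illustration.
SCOPES = {
    "CLIENTS-VLAN100": {"range": ipaddress.ip_network("100.0.0.0/24"),
                        "all_excluded": False},
    # Dummy scope: covers the Leaf loopbacks, every address excluded.
    "LEAF-LOOPBACKS": {"range": ipaddress.ip_network("222.222.222.0/24"),
                       "all_excluded": True},
}


def choose_scope(giaddr, link_selection=None):
    """Return the scope to allocate from, or a short failure reason."""
    relay = ipaddress.ip_address(giaddr)
    # Windows Server 2016 check: giaddr must fall inside some scope's range,
    # otherwise the relay is treated as rogue and the request is not answered.
    if not any(relay in s["range"] for s in SCOPES.values()):
        return "rejected: rogue relay"
    # Sub-option 5 (Link Selection), when present, selects the pool instead of giaddr.
    selector = ipaddress.ip_address(link_selection) if link_selection else relay
    for name, s in SCOPES.items():
        if selector in s["range"] and not s["all_excluded"]:
            return name
    return "no usable scope"


if __name__ == "__main__":
    print(choose_scope("222.222.222.222", "100.0.0.0"))  # CLIENTS-VLAN100
    print(choose_scope("10.1.1.1", "100.0.0.0"))         # rejected: rogue relay
```

Removing LEAF-LOOPBACKS from the table makes the first call return "rejected: rogue relay", which mirrors why the dummy scope is needed.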
Hi Daria,
Great document. Could you reference the Server 2016 requirement for a dummy pool?
Hey! Sure: https://docs.microsoft.com/ru-ru/windows-server/networking/technologies/dhcp/dhcp-subnet-options
"All relay agent IP addresses (GIADDR) must be part of an active DHCP scope IP address range. Any GIADDR outside of the DHCP scope IP address ranges is considered a rogue relay and Windows DHCP Server will not acknowledge DHCP client requests from those relay agents.
A special scope can be created to "authorize" relay agents. Create a scope with the GIADDR (or multiple if the GIADDR's are sequential IP addresses), exclude the GIADDR address(es) from distribution, and then activate the scope. This will authorize the relay agents while preventing the GIADDR addresses from being assigned."
Great. Is there any document on making it work when the DHCP server and the client don't belong to the same VRF and no changes are possible on the DHCP server?
If you already have Windows Server 2016 configured, you won't need any additional reconfiguration on the server side; just use "ip dhcp relay address xxx use-vrf (server VRF)" in the SVI configuration.
If you have Windows Server 2012, I guess the only solution is to set up some kind of route leaking.
It's not working for me.
I have a VxLAN fabric (L2 switches with L2 and L3 VNIs) with a border gateway device (it has only the L3 VNI configured in the inside VRF for VxLAN routing and advertises the VxLAN subnets to the external world, i.e., the outside VRF).
I configured the L2 VNIs as below on my Layer-2 switches (two L2 VNIs go via the same L3 VNI):
interface Vlan1001 (L2VLAN)
  no shutdown
  vrf member INSIDE
  ip address 10.0.0.1/23
  fabric forwarding mode anycast-gateway
  ip dhcp relay address <DHCP IP routable in the outside VRF that doesn't know anything about VxLAN>
!
interface Vlan1002 (L2VLAN)
  no shutdown
  vrf member INSIDE
  ip address 10.0.2.1/23
  fabric forwarding mode anycast-gateway
  ip dhcp relay address <DHCP IP routable in the outside VRF that doesn't know anything about VxLAN>
!
interface vlan 172 (L3 VLAN)
  no shutdown
  vrf member INSIDE
  ip forwarding
!
The Border Gateway has only the L3 VNI configured towards the VxLAN fabric (inside VRF) and reaches the external world via the outside VRF. This device advertises all inside routes into the outside VRF and vice versa. The DHCP server is located in the outside VRF.
Do you have any recommendations for this setup?
This setup doesn't seem unusual.
Do you have IP reachability between the relay and the leafs' SVIs? Can they ping each other?
Have you tried to use loopbacks as sources?
What does show ip dhcp relay statistics [interface] show?
Thanks Daria for your response. The setup works after adding a new loopback that has reachability to the DHCP server and making that loopback the source. Thanks.
Would there be any impact on PXE boot when we configure the DHCP relay source as a loopback IP? The PXE servers are also configured as one of the IP helper addresses. Please clarify. Thanks in advance.
I had setups with PXE boot and a loopback as the source and they were fine. I don't think anything can go wrong here.
Great, thanks. I'll reach out if I need any further details/help.
Thank you! Adding the "dummy" DHCP scope solved our issue. Without it, the DHCP process would fail with a NAK from the server. Adding that scope, which covered the loopback range in the same VRF as the originating subnet, worked straight away. We didn't even need to activate the scope.