Distributed Virtual Routing – SNAT

Author: assafmuller
Source: Planet OpenStack

Where Am I?
Overview and East/West Traffic
* SNAT
Floating IPs

SNAT vs Floating IPs

A quick reminder about two NAT types used in Neutron.

  1. SNAT refers to source NAT, or, changing the source address of packets as they leave the external device of a router. This is used for traffic originating from VMs that have no floating IP attached. A router is allocated a single IP address from the external network which is shared across all VMs connected to all subnets the router is connected to. Sessions are differentiated according to the full tuple of (source IP, destination IP, source port, destination port). This is typically known as ‘PAT’, or port address translation in the networking world.
  2. Floating IPs, sometimes called DNAT (Destination NAT) in Neutronland, implement a much simpler form of NAT, a 1:1 private to public address translation. You can assign a VM a floating IP and access it from the outside world.

Why Keep SNAT Centralized?

DVR distributes floating IPs north/south traffic to the compute node, just as it does for east/west traffic. This will be explained in the next blog post. SNAT north/south traffic, however, is not distributed to the compute nodes, but remains centralized on your typical network nodes. Why is this? Intuitively, you’re going to need an address from the external network on every node providing the SNAT service. This quickly becomes a matter of balance – How far would you like to distribute SNAT vs consumption of addresses on your external network(s)? The approach that was chosen is to not distribute the SNAT service at all, but keep it centralized like legacy routers. The next step would be to make the SNAT portion of distributed routers highly available by integrating DVR with L3 HA, and this work is planned for the Liberty cycle.

Logical Topology

snat_logical

Note that the router has two ports in each internal network. This is an implementation detail that you can safely ignore for now and will be explained later.

Physical Topology

snat_physical

SNAT Router Lifecycle

After attaching the router to an external network, the SNAT portion of the router is scheduled amongst L3 agents in dvr_snat mode. Observing the dvr_snat machine:

[stack@vpn-6-22 devstack (master=)]$ ip netns
snat-ef25020f-012c-41d6-a36e-f2f09cb8ea62
qrouter-ef25020f-012c-41d6-a36e-f2f09cb8ea62

We can see that two namespaces were created for the same router. The ‘regular’ qrouter namespace, which is identical to the namespace created on compute nodes and is used to service VM, DHCP or LB ports on that machine, and the ‘snat’ namespace, which is used for the centralized SNAT service. Let’s dive deeper in to this new SNAT namespace:

[stack@vpn-6-22 devstack (master=)]$ sudo ip netns exec snat-ef25020f-012c-41d6-a36e-f2f09cb8ea62 ip address
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default 
    ...
101: sg-1b9c9c26-38: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default 
    link/ether fa:16:3e:a3:ef:a9 brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.3/24 brd 10.0.0.255 scope global sg-1b9c9c26-38
    ...
102: qg-8be609d9-e3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default 
    link/ether fa:16:3e:93:cb:37 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.21/24 brd 192.168.1.255 scope global qg-8be609d9-e3
    ...
104: sg-fef045fb-10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default 
    link/ether fa:16:3e:de:85:63 brd ff:ff:ff:ff:ff:ff
    inet 20.0.0.3/24 brd 20.0.0.255 scope global sg-fef045fb-10
    ...

We can see two new ‘sg’ devices in the SNAT namespace, and the familiar ‘qg’ / external device (Which is not present in the qrouter namespces). Where did these ‘sg’ devices come from? These are additional ports, one for each internal network the router is connected to. This is why the router now has two ports in every internal network, the ‘qr’ device on compute nodes, and the ‘sg’ device in the SNAT namespace. These ‘sg’ ports are used as an extra hop during VM SNAT traffic.

Tracking a Packet

When a VM without a floating IP sends traffic destined to the outside world, it hits the qrouter namespace on its node, which redirects the message to the SNAT namespace. To achieve this, some source routing trickery is used. Here’s a concise source routing tutorial. Now that you are familiar with ‘ip rule’, the idea of multiple routing tables and source routing, let’s move on!

Let’s observe the ‘ip rule’ output executed from within the qrouter namespace on the compute node:

[stack@vpn-6-21 devstack (master=)]$ sudo ip netns exec qrouter-ef25020f-012c-41d6-a36e-f2f09cb8ea62 ip rule
0:	from all lookup local 
32766:	from all lookup main 
32767:	from all lookup default 
167772161:	from 10.0.0.1/24 lookup 167772161 
335544321:	from 20.0.0.1/24 lookup 335544321

It looks like there’s source routing rules setup for every subnet the router is attached to. Let’s look at the main routing table, as well as the new routing tables:

[stack@vpn-6-21 devstack (master=)]$ sudo ip netns exec qrouter-ef25020f-012c-41d6-a36e-f2f09cb8ea62 ip route
10.0.0.0/24 dev qr-c2e43983-5c  proto kernel  scope link  src 10.0.0.1 
20.0.0.0/24 dev qr-369f59a5-2c  proto kernel  scope link  src 20.0.0.1
[stack@vpn-6-21 devstack (master=)]$ sudo ip netns exec qrouter-ef25020f-012c-41d6-a36e-f2f09cb8ea62 ip route show table 167772161
default via 10.0.0.3 dev qr-c2e43983-5c
[stack@vpn-6-21 devstack (master=)]$ sudo ip netns exec qrouter-ef25020f-012c-41d6-a36e-f2f09cb8ea62 ip route show table 335544321
default via 20.0.0.3 dev qr-369f59a5-2c

We can observe that 10.0.0.3 and 20.0.0.3 are the ‘sg’ devices for the same router in the SNAT namespace on the dvr_snat node.

How then is east/west traffic and SNAT traffic classified and routed? If a VM in the 10.0.0.0/24 subnet on the local compute node pings a remote VM in the 20.0.0.0/24, we’d expect that to get classified as east/west traffic and go through the process explained in the previous blog post. The source guest OS puts 20.0.0.x in the destination IP and the MAC address of its default gateway in the packet and frame respectively. br-int forwards the message to the qrouter namespace on the local node, and the namespace’s ip rules are consulted. ip rules are processed according to their priority (Lowest to highest), which is listed in the first column in the ‘ip rule’ output above. The main routing table has an entry for 20.0.0.0/24 thus the message is forwarded out the appropriate ‘qr’ device.

If the same VM ping’d 8.8.8.8, however, it’d be a different story. The main routing table would be consulted first, however, it cannot match 8.8.8.8, and the main routing table doesn’t have a default route. Let’s take another look at the routing rules in place: The main routing table was consulted but did not hit a match. The ‘default’ table is empty. Can we match any of the remaining rules? Of course, the source IP address is in the 10.0.0.0/24 range, thus the fourth rule matches and the 167772161 table is consulted. We can see that it contains a single entry, a default route. The message is then routed to 10.0.0.3 (The ‘sg’ device for the subnet) via that subnet’s local ‘qr’ device. Interestingly, this is the same device the message came in on. At this point, standard DVR east/west routing takes place and the message eventually finds itself in the SNAT namespace on the dvr_snat node, where it is routed out via the ‘qg’ / external device, right after SNAT iptables rules change the source IP from the VM to the ‘qg’ device IP.

<iframe frameborder=”0″ height=”569″ marginheight=”0″ marginwidth=”0″ src=”https://docs.google.com/presentation/d/1gYptAQCULUFxmBM1yw6BncMfSd7g8MmTpK0Lp5Toz6s/embed?start=false&amp;loop=false&amp;delayms=60000″ width=”696″></iframe>

Powered by WPeMatico