Monday, October 18, 2021

HAProxy - Aggravating Problem I Have Not Solved

I have never really blogged about proxies. I don't have a lot of proxy experience and don't consider myself a guru with proxies, load balancers, etc.

But more and more often, solutions have come in that require load distribution to an N+1 (Active-Active) cluster. And HAProxy is supposed to be a rather lightweight and simple approach, especially in situations where the service is not fully mission-critical or the load is not especially high.

I originally set HAProxy up to distribute load to a Cloudify cluster, and Cloudify provided an HAProxy configuration that they had tested in their lab and knew worked well. Later, I set HAProxy up to load balance our Morpheus cluster. Initially it was working fine.

Or so it seemed. Later, I noticed errors. The first thing you generally do when you see errors is to tell HAProxy to use one node (and not 2 or 3), so that you can reduce troubleshooting complexity and examine the logs on just one back-end node. In doing this, I rather quickly figured out that if I told HAProxy to use one back-end node, things worked fine. When I told HAProxy to use two or more back-end nodes, things didn't work.
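For what it's worth, you don't necessarily have to edit haproxy.cfg and reload to do that kind of isolation. If the stats socket is enabled at admin level, something along these lines should work (the backend and server names here are made up for illustration, not our real ones):

# echo "disable server morpheus_back/morpheus02" | socat stdio /var/run/haproxy.sock

# echo "show servers state morpheus_back" | socat stdio /var/run/haproxy.sock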

So that's where it all started.

The Problem
Below is a picture of what we are doing with HAProxy. Web access comes in on the northbound side of the picture, and web access is not the problem we are having. The problem is that VMs deployed onto various internal networks by Morpheus "phone home", and they do so on a different network interface.

This works fine with a single back-end enabled. But if you enable more than one back-end in HAProxy, Morpheus fails to fully transition the state of the VM to "running".


HAProxy Flow

In testing this out a bit and dumping traffic, we initially noticed something interesting: the Source IP coming into each Morpheus node was not the HAProxy VIP - it was the interface IP address. We wound up addressing this by telling KeepAliveD to delete and re-create the routes with the VIP as the Source IP - but only when it had control of the VIP. In the end, while this made traffic analysis (tcpdump on the Morpheus nodes) clearer about the traffic flow, it did not solve the actual issue.
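I won't paste our exact keepalived config, but the general idea was a notify_master script along these lines (the VIP, gateway, and interface below are placeholders, not our real values):

#!/bin/bash
# Hypothetical keepalived notify_master script - re-adds the default route
# with the VIP as the preferred source address once this node owns the VIP.
VIP="192.168.10.250"   # placeholder VIP
GW="192.168.10.1"      # placeholder gateway
DEV="ens192"           # placeholder interface
ip route del default via "$GW" dev "$DEV" 2>/dev/null
ip route add default via "$GW" dev "$DEV" src "$VIP"

The script gets wired up via the notify_master option in the vrrp_instance, so it only runs on the node that currently holds the VIP.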

I STILL don't know why it works with one back-end, and not two or more. I have had proxy experts in our organization come in and look, and they seem to think HAProxy is doing its job properly, and that the issue is in the back-end clustering. The vendor, however, is telling us the issue is with HAProxy.

Our next step may be to configure a different load balancer. That should help rule things out. I know Squid Proxy is quite popular, but these tools do have a learning curve, and I have zero experience with Squid Proxy. I think we may use a NetScaler load balancer if we wind up going with another one.

I should mention that the HAProxy configuration is not the simplest, and as a result of configuring it, I have increased my general knowledge of load balancing.


Monday, October 4, 2021

The first Accelerated VNF on our NFV platform

I haven't posted anything since April, but that isn't because I haven't been busy.

We have our new NFV Platform up and running, and it is NOT on OpenStack. It is NOT on VMWare VIO. It also, is NOT on VMWare Telco Cloud!

We are using ESXi, vCenter, NSX-T for the SD-WAN, and Morpheus as a Cloud Management solution. Morpheus has a lot of different integrations, and a great user interface that gives tenants a place to log in and call home and self-manage their resources.

The diagram below depicts what this looks like from a Reference Architecture perspective.

The OSS, which is not covered in the diagram, is a combination of Zabbix and VROPS, both working in tandem to ensure that the clustered hosts and management functions are behaving properly.

The platform is optimized with E-NVDS, commonly referred to as Enhanced Datapath, which requires special DPDK drivers to be loaded on the ESXi hosts, as well as some additional configuration in the hypervisors to ensure the E-NVDS is set up properly (separate upcoming post).

Now that the platform is up and running, it is time to start discussing workload types. There are a number of Workload Categories that I tend to use:

  1. Enterprise Workloads - Enterprise Applications, 3-Tier Architectures, etc.
  2. Telecommunications Workloads
    • Control Plane Workloads
    • Data Plane Workloads

Control Plane workloads have more tolerance for latency, and lighter system resource requirements, than Data Plane Workloads do.

Why? Because Control Plane workloads are typically TCP-based, frequently use (RESTful) APIs, and tend to be periodic in their behavior (periodic updates). Most of the time, when you see issues related to the Control Plane, they are related to back-hauling a lot of measurements and statistics (Telemetry Data). But generally speaking, this data in and of itself does not have stringent requirements.

From a VM perspective, there are a few key things you need to do to ensure your VNF behaves as a true VNF and not as a standard workload VM. These include:

  • Setting Latency Sensitivity to High, which turns off interrupts and ensures that poll mode drivers are used.
  • Enabling Huge Pages on the VM by going into VM Advanced Settings and adding the parameter: sched.mem.lpage.enable1GHugePage = TRUE

Note: Another setting worth checking, although we did not actually set this parameter ourselves, is: sched.mem.pin = TRUE

Note: Another setting, sched.mem.maxmemctl, ensures that ballooning is turned off. We do NOT have this setting, but it was mentioned to us, and we are researching it.
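For what it's worth, you don't have to click through VM Advanced Settings for each of these; a CLI tool like govc can push the same parameters while the VM is powered off. This is just a sketch - the VM name is a placeholder, govc has to be installed and pointed at your vCenter, and the -latency flag for Latency Sensitivity only exists in more recent govc versions:

govc vm.power -off my-vnf-vm
govc vm.change -vm my-vnf-vm -latency high
govc vm.change -vm my-vnf-vm -e sched.mem.lpage.enable1GHugePage=TRUE
govc vm.change -vm my-vnf-vm -e sched.mem.pin=TRUE    # optional - we did not set this ourselves
govc vm.power -on my-vnf-vm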

One issue we seemed to continually run into was a vCenter alert called Virtual Machine Memory Usage, displaying in vCenter as a red banner with "Acknowledge and Reset to Green" links. The VM was in fact running, but vCenter seemed to have issues with it. The latest change we made that seems to have fixed this error was to check the "Reserve all guest memory (All locked)" option checkbox.

This checkbox to Reserve all guest memory seemed intimidating at first, because the concern was that the VM could reserve all memory on the host. That is NOT what this setting does!!! What it does is allow the VM to reserve all of its memory up-front - but just the VM memory that is specified (i.e. 24G). If the VM has HugePages enabled, it makes sense that one would want the entire allotment of VM memory to be reserved up front and be contiguous. When we enabled this, our vCenter alerts disappeared.
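The same reservation can apparently be driven from the CLI as well; something like the line below should reserve the VM's full configured memory (24G in this example, with the same placeholder VM name as above), assuming your govc build exposes the -mem.reservation flag:

govc vm.change -vm my-vnf-vm -mem.reservation 24576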

Lastly, we decided to change DRS to Manual in VM Overrides. To find this setting amongst the huge number of settings hidden in vCenter, you go to the Cluster (not the Host, not the VM, not the Datacenter) and the option for VM Overrides is there, and you have four options:

  • None
  • Manual
  • Partial
  • Full

The thinking here, is that VMs with complex settings may not play well with vMotion. I will be doing more research on DRS for VNFs before considering setting this (back) to Partial or Full.

Monday, April 26, 2021

Tenancy is Critical on a Cloud Platform

With this new VMWare platform, it was ultimately decided to go with ESXi hypervisors, managed by vCenter, and NSX-T.  

During the POC, it was pointed out that this combination of solutions had some improvements and enhancements over OpenStack (DRS, vMotion, et al). But one thing seemed to be overlooked, and we pointed it out: Tenancy

VMWare attempts to address Tenancy with vertical-stack point solutions, like vCloud Director (positioned at Service Providers) or vRealize Automation. The latter is going through a complete transformation in its latest version. These solutions are also expensive. And, if you don't have the budget, what are your options?

One option is to set up Resource Pools and Folders in vCenter. Not the cleanest solution because you cannot set policies, workflows, etc.

What else can you do? Well, you can use a Cloud Management solution.

We had Cloudify as an Orchestrator, and we evaluated it as a Cloud Management solution. What we found in the end was that Cloudify excelled at complex orchestration, but it was not designed and built, from the ground up, to be a Cloud Management Platform.

This (lack of) Tenancy seemed to become apparent to everyone all at once - once the platform came up on VMWare. And with Cloudify, we lacked the Blueprint development to cover the scores to hundreds of tasks that we needed; it would have needed integrations with NSX-T, vCenter, and a host of other solutions.

We looked at a couple of other solutions, and settled on a solution called Morpheus.

I will blog a bit more about Morpheus in upcoming posts. I have been very hands-on with it lately. 

Migrating from OpenStack to VMWare

Been a while since I have posted anything. The latest news is that my employer has decided to replace OpenStack with VMWare. We had to do a side-by-side comparison POC (Proof of Concept) between VMWare with Contrail as the network SDN, and VMWare with NSX-T as the network SDN.

In the end, they chose NSX-T, and the architecture looked like:

  • Cloudify - Cloud Management and Orchestration
  • VMWare vCenter Cloud (Private Cloud)
  • NSX-T - (I will probably blog more on NSX-T later)

So, I have had to go through a series of Bootcamp trainings on the following.

  • NSX-T
  • VeloCloud

I will make some brief posts about these, as well as some other things, going forward.

 

Wednesday, February 3, 2021

DPDK Hands-On Part X - Launching a DPDK virtual machine with Virsh Libvirt

In DPDK Hands-On Part IX, we launched a virtual machine with a bash script, calling qemu with sufficient command line options that it would launch a virtual machine with DPDK ports.

To launch a virtual machine that has DPDK ports using virsh (libvirt), the first thing you need to know is that you cannot use the virt-manager GUI. Why? The GUI does not understand vhostuser (DPDK) ports - at least not in the version I happen to be running. But aside from this, the process remains the same. Your virtual machine needs to be defined in xml, which is parsed by virsh when the VM is launched. In order to launch a DPDK virtual machine, that xml needs to be crafted properly, and this post will discuss its important sections.

I made several attempts at finding an xml file that would work properly. It took quite a while. There are a few examples on OpenVSwitch websites, as well as Intel-sponsored DPDK websites.  In the end, I found a link from an R&D engineer named Tomek Osinski that helped me more than any other, and I will share that here:

Configuring OVS-DPDK with VM for performance testing 

That link includes a full-fledged example xml file.

Let me pick through the relevant sections of this file and comment on each.

Memory

<currentMemory unit='KiB'>8399608</currentMemory>
<memoryBacking>
  <hugepages>
    <page size='1' unit='G' nodeset='0'/>
  </hugepages>
</memoryBacking>

This section specifies that the VM will use in excess of 8G of RAM, and tells virsh (libvirt/KVM) to back the VM's memory with HugePages.

If you try to launch this VM and the VM Host (the server running KVM) does not have Hugepages enabled, or does not have enough Hugepages available, the launch of the VM will fail.
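A quick way to sanity-check the host before trying to boot, assuming 1G hugepages are what you configured:

# grep -i huge /proc/meminfo

# cat /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages

# cat /sys/kernel/mm/hugepages/hugepages-1048576kB/free_hugepages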

CPU

<vcpu placement='static'>8</vcpu> 
<cpu mode='host-model'>
  <model fallback='allow'/>
  <topology sockets='2' cores='4' threads='1'/>
  <numa>
    <cell id='0' cpus='0-1' memory='4194304' unit='KiB' memAccess='shared'/>
  </numa>
</cpu> 
<cputune>
  <shares>4096</shares>
  <vcpupin vcpu='0' cpuset='14'/>
  <vcpupin vcpu='1' cpuset='15'/>
  <emulatorpin cpuset='11,13'/>
</cputune>

In this section, 8 virtual CPUs (2 sockets x 4 cores) are specified for the virtual machine. A NUMA cell is also defined, containing virtual CPUs 0 and 1.

Of these 8, two virtual CPUs are pinned to the last two cores of the host (assuming a 16 core host that has cores 0-15).

The CPU pinning is optional; I elected not to use it on my own VM. One reason is that I am already pinning my OpenVSwitch PMD threads to specific cores using the pmd-cpu-mask. But I am still covering it here because it is worth pointing out.
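If you do use pinning, you can confirm what the running domain actually received with virsh (the domain name here is just a placeholder):

# virsh vcpupin my-dpdk-vm

# virsh emulatorpin my-dpdk-vm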

Network Interface (vhostuser)

    <interface type='vhostuser'>
      <mac address='00:00:00:00:00:01'/>
      <source type='unix' path='/usr/local/var/run/openvswitch/dpdkvhostuser0' mode='client'/>
       <model type='virtio'/>
      <driver queues='2'>
        <host mrg_rxbuf='off'/>
      </driver>
    </interface>
 
    <interface type='vhostuser'>
      <mac address='00:00:00:00:00:02'/>
      <source type='unix' path='/usr/local/var/run/openvswitch/dpdkvhostuser1' mode='client'/>
      <model type='virtio'/>
      <driver queues='2'>
        <host mrg_rxbuf='off'/>
      </driver>
    </interface>

In this section, the user has defined a network interface of type vhostuser.

As you may recall from my earlier blogs on the topic, a vhostuser port means that the VM acts as a client and OpenVSwitch acts as a server (which is not recommended by OpenVSwitch these days - they prefer vhostuserclient ports be used instead, so that a reboot of the switch does not strand multiple VMs connected to it).

Tip: The Virt-Manager KVM GUI does not allow you to pick this kind of network interface from the menu when you set up an adaptor. So you need to put this in your xml, and if you do so, it indeed will show up in the GUI as a "vhost" interface.

Each mac address is set to a distinct value (loading two interfaces with the same mac address is obviously going to cause trouble). Each NIC is specified to use multi-queuing (2 Rx/Tx queues per NIC). And the specific OpenVSwitch socket to connect to is specified.

The super important thing to know about these interface directives is that, prior to launching the VM, the following pre-requisites need to be in place (a minimal command sketch follows the list).

  • OpenVSwitch needs to be running
  • The DPDK vhostuser ports need to be created on the appropriate bridges on OpenVSwitch.
  • The datapath on the bridges of the OpenVSwitch that host the vhostuser ports need to be set to netdev (DPDK) rather than the default of system.
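A minimal sketch of those pre-requisites, using the port names from the example xml above (the bridge name br0 is just a placeholder, and this assumes OVS was built with DPDK support and dpdk-init is already enabled, with the socket directory matching the paths in the xml):

# ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev

# ovs-vsctl add-port br0 dpdkvhostuser0 -- set Interface dpdkvhostuser0 type=dpdkvhostuser

# ovs-vsctl add-port br0 dpdkvhostuser1 -- set Interface dpdkvhostuser1 type=dpdkvhostuser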

Once your xml is crafted properly, boot the VM with "virsh start [ vm name ]", and see if the VM boots with network connectivity. 

Tip: When your VM boots, KVM should number the interfaces as eth0, eth1, eth2 according to the order they are in the xml file. But I have found that it doesn't always do this, so it is a good idea to map the xml file mac addresses to those which show up in the VM so that you can ensure you are putting the right IP addresses on the right interfaces!!! Doing this will save you a TON OF DEBUGGING!
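Inside the guest, a quick way to see which mac address landed on which interface (the -br flag assumes a reasonably recent iproute2):

# ip -br link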

Tuesday, October 13, 2020

DPDK Hands-On - Part IX - Launching a VM with DPDK Ports from a bash script

When you launch a virtual machine on Linux, you can run it under plain qemu (pure software emulation) or under qemu-kvm (hardware-accelerated via the KVM kernel module). This post won't go into the details of the differences between those.

The options one can pass into qemu or qemu-kvm can be daunting.

But, if you want to launch a virtual machine just for the purpose of testing a DPDK adaptor, this can be done in a lightweight manner.

Remember that with a vhostuser port, the VM connects to a socket owned by OpenVSwitch. This is the original, legacy manner in which DPDK was implemented between VMs and OpenVSwitch.

The drawback to this, of course, is that if the switch were to die, the VMs would be stranded with no connectivity (a bad thing). 

This is why they came out with vhostuserclient ports, where OpenVSwitch connects to a socket that is managed by qemu when it starts the VM. This way, if the switch dies or is restarted, ports on the switch will reconnect to the VM-managed sockets. If a VM dies, it only removes its own sockets.

So we will cover the scenarios of launching a VM in both modes; vhostuser, and vhostuserclient.

VhostUser

With a vhostuser port, OpenVSwitch creates the socket that qemu connects to. So this port needs to be created ahead of time on OpenVSwitch (see DPDK Hands-On Part VIII - creating DPDK ports).

Below is a snippet of a bash script that cranks a VM after some parameters are defined.
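For reference, the variables it references are exported earlier in the script; a hypothetical set of values (these are examples, not my real ones) might look like:

export VM_NAME=dpdk-test-vm
export GUEST_MEM=4096M
export QCOW2_IMAGE=/var/lib/libvirt/images/dpdk-test.qcow2
export MAC=1                  # last digit of the mac address and chardev id
export PORT_TYPE=vhost-user   # or vhost-user-client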

if  [ $PORT_TYPE == "vhost-user" ]; then
   # vhost-user - openvswitch binds to socket and acts as server
   export VHOST_SOCK_DIR=/var/run/openvswitch

   export VHOST_PORT="${VHOST_SOCK_DIR}/vhostport1"

   # the vhostuser does not have the server parameter.
   /usr/libexec/qemu-kvm -name $VM_NAME -cpu host -enable-kvm \
   -m $GUEST_MEM -drive file=$QCOW2_IMAGE --nographic -snapshot \
   -numa node,memdev=mem -mem-prealloc -smp sockets=1,cores=2 \
   -object memory-backend-file,id=mem,size=$GUEST_MEM,mem-path=/dev/hugepages,share=on \
   -chardev socket,id=char${MAC},path=${VHOST_PORT} \
   -netdev type=vhost-user,id=default,chardev=char${MAC},vhostforce,queues=2 \
   -device virtio-net-pci,mac=00:00:00:00:00:0${MAC},netdev=default,mrg_rxbuf=off,mq=on,vectors=6
 #1>qemu.kvm.vhostuser.out 2>&1

fi

So let's discuss these parameters:

- Numa Node

The -numa node parameter controls how the guest's vCPUs and memory are laid out across NUMA nodes. A good link on this can be found at https://futurewei-cloud.github.io/ARM-Datacenter/qemu/how-to-configure-qemu-numa-nodes/ 

Now, my host only has one NUMA socket, which equates to one NUMA node (Node 0) with 4 cores on it. So with the -smp sockets=1,cores=2 directive, I can give 2 of those cores to my Virtual Machine.
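If you are not sure what your host's NUMA layout is, you can check before picking -smp and -numa values (numactl may need to be installed separately):

# lscpu | grep -i numa

# numactl --hardware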

- Mem Path

This parameter is specified to put the VM on hugepages. Any VM using DPDK ports should be backed by HugePages.

- Socket Parameters

The socket parameters are important. OpenVSwitch, if it creates the socket, will place the socket in /var/run/openvswitch by default. You have to create the port there manually, then tell the VM to connect to it specifically by the path and socket name. 
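Before launching, it doesn't hurt to confirm the socket actually exists where the script expects it (vhostport1 being the port name used in the snippet above):

# ls -l /var/run/openvswitch/vhostport1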

- Multi queuing

You will notice that mq=on and 2 queues are specified. Having more than one queue for sending and receiving data relieves a potential bottleneck.

NOTE: I am not clear, actually, if by specifying 2 on "queues=2", that this means 2 x Tx queues AND 2 x Rx queues. I need to look into that and perhaps update this post (or if someone can comment on this, great).

The proof is in the pudding on this. When you launch the VM, you will need to check TWO places to ensure your networking initialized properly:

First, check /var/log/openvswitch/ovs-vswitchd.log
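A quick grep is usually enough to see whether the vhost socket connection came up (the exact message wording varies by OVS version):

# grep -i vhost /var/log/openvswitch/ovs-vswitchd.log | tail -20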

Second, check the output of the VM itself, which means you would need to redirect stdout and stderr to a file and analyze them accordingly. The commented-out redirection at the end of the snippet above shows this. But this does affect how you start and stop the VM, so I comment it out when running in normal mode.

VhostUserClient

The vhostuserclient mode works very similarly to vhostuser, except that the socket is created and managed by qemu rather than OpenVSwitch!


# vhost-user-client - qemu binds to socket and acts as server
export VHOST_SOCK_DIR="/var/lib/libvirt/qemu/vhost_sockets"

export VHOST_PORT="${VHOST_SOCK_DIR}/dpdkvhostclt1"

echo "Starting VM..."
/usr/libexec/qemu-kvm -name $VM_NAME -cpu host -enable-kvm \
  -m $GUEST_MEM -drive file=$QCOW2_IMAGE --nographic -snapshot \
  -numa node,memdev=mem -mem-prealloc -smp sockets=1,cores=2 \
  -object memory-backend-file,id=mem,size=$GUEST_MEM,mem-path=/dev/hugepages,share=on \
  -chardev socket,id=char1,path=${VHOST_PORT},server \
  -netdev type=vhost-user,id=default,chardev=char1,vhostforce,queues=2 \
  -device virtio-net-pci,mac=00:00:00:00:00:0${MAC},netdev=default,mrg_rxbuf=off,mq=on,vectors=6

Note the difference between the "chardev" directive above and the vhostuser "chardev" directive. The vhostuserclient version has the additional "server" parameter included!!! This is the KEY to specifying that QEMU owns the socket, as opposed to OpenVSwitch.

One might wonder how this works in practice. If you create a VM and it binds to a socket, how does OpenVSwitch know to connect to it? The answer is that OpenVSwitch won't - until you create the port on OpenVSwitch in vhostuserclient mode! At that point, OVS will know it is the client, and create the connection to the qemu process!

So a script that launches a VM with vhostuserclient interfaces, should probably create the OVS port first, then launch the VM. Otherwise, the VM won't have the connectivity it needs in a timely manner when it boots up (i.e. for DHCP to determine its IP Address and settings).
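So the order of operations ends up looking something like this (the ovs-vsctl line is the vhostuserclient port-creation command covered in Part VIII, and the script name is just a stand-in for the snippet above):

# ovs-vsctl add-port br-tun dpdkvhostclt1 -- set Interface dpdkvhostclt1 type=dpdkvhostuserclient options:vhost-server-path=/var/lib/libvirt/qemu/vhost_sockets/dpdkvhostclt1

# ./launch-vhostuserclient-vm.sh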

Tuesday, September 29, 2020

DPDK Hands-On - Part VIII - Creating Virtual DPDK Ports on OpenVSwitch

Before we get into the procedure of adding virtual ports to the switch, it is important to understand the two types of DPDK virtual ports, and their differences.

In the earlier versions of DPDK+OVS, virtual interfaces were defined with a type called vhostuser. These interfaces would connect to OpenVSwitch. Meaning, from a socket perspective, that OpenVSwitch managed the socket. More technically, OVS binds to a socket in /var/run/openvswitch/<portname>, behaving as a server, while the VMs connect to this socket as clients.

There is a fundamental flaw in this design. A rather major one! Picture a situation where a dozen virtual machines launch with ports that are connected to the OVS, and the OVS is rebooted! All of those sockets are destroyed on the OVS, leaving all of the VMs "stranded".

To address this flaw, the socket model needed to be reversed. The virtual machine (i.e. qemu) needed to act as the server, and the switch needed to be the client! 

Hence, a new port type was created: vhostuserclient.

A more graphical and elaborative explanation of this can be found on this link:

https://software.intel.com/content/www/us/en/develop/articles/data-plane-development-kit-vhost-user-client-mode-with-open-vswitch.html 

Now because there are two sides to any socket connection, it makes sense that BOTH sides need to be configured properly with the proper port type for this communication to work.

This post deals with simply adding the right DPDK virtual port type (vhostuser or vhostuserclient) to the switch. But configuring the VM properly is also necessary, and will be covered in a follow-up post after this one is published.

I think the easiest way to explain this is to show how these two port types are added, with some discussion.

VhostUser

To add a vhostuser port, the following command can be run:

# ovs-vsctl add-port br-tun dpdkvhost1 -- set Interface dpdkvhost1 type=dpdkvhostuser ofport_request=2

It is as simple as adding a port to a bridge, giving it a name, and using the appropriate type for a legacy virtual DPDK port (dpdkvhostuser). We also give it port number 2 (in our earlier post, we added a physical DPDK PCI NIC as port 1, so we will assume port 1 is used by that).

Notice, that there is no socket information included in this. OpenVSwitch will create a socket, by default, in /var/run/openvswitch/<portname> once the vhostuser port is added to the switch. 

NOTE: The OVS socket location can be overridden, but for simplicity we will assume the default location. Another issue is socket permissions. When the VM launches under a different userid such as qemu, the socket will need to be writable by qemu!
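One simple way to deal with the permissions issue, assuming the default socket location and a VM running as the qemu user (adjust the port name to match yours):

# chown qemu:qemu /var/run/openvswitch/dpdkvhost1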

The virtual machine, with a vhostuser interface defined on it, needs to be told which socket to connect to. Because it is the VM that has to know where to connect, the OVS side of this model is actually somewhat simpler: OVS just creates the socket wherever it is configured to do so, the default being /var/run/openvswitch.

So after adding a port to the bridge, we can do a quick show on our bridge to ensure it created properly.

# ovs-vsctl show 

 Bridge br-tun
        fail_mode: standalone
        Port "dpdkvhost1"
            Interface "dpdkvhost1"
                type: dpdkvhostuser

        Port "dpdk0"
            Interface "dpdk0"
                type: dpdk
                options: {dpdk-devargs="0000:01:00.0"}
        Port br-tun
            Interface br-tun
                type: internal

With this configuration, we can do a test between a physical interface and a virtual interface, or the virtual interface can attempt to reach something outside of the host (i.e. a ping test to a default gateway and/or an internet address). A virtual machine could also attempt a DHCP request to obtain its IP address for the segment it is on, if a DHCP server indeed exists.

If we wanted to test between two virtual machines, another such interface would need to be added:

# ovs-vsctl add-port br-tun dpdkvhost2 -- set Interface dpdkvhost2 type=dpdkvhostuser ofport_request=3

And, this would result in the following configuration:

  Bridge br-tun
        fail_mode: standalone
        Port "dpdkvhost2"
            Interface "dpdkvhost2"
                type: dpdkvhostuser
        Port "dpdkvhost1"
            Interface "dpdkvhost1"
                type: dpdkvhostuser

        Port "dpdk0"
            Interface "dpdk0"
                type: dpdk
                options: {dpdk-devargs="0000:01:00.0"}
        Port br-tun
            Interface br-tun
                type: internal

With this configuration, TWO virtual machines would connect to their respective OVS switch sockets:

VM1 connects to the OVS socket for dpdkvhost1 --> /var/run/openvswitch/dpdkvhost1

VM2 connects to the OVS socket for dpdkvhost2 --> /var/run/openvswitch/dpdkvhost2

Thanks to the PCI port we added earlier, these two VMs can "reach outside" to request an IP address, and they can ping each other on the same segment once they both have one.

dpdkvhost1    dpdkvhost2
     |             |
  ===================
  OVS Bridge (br-tun)
  ===================
          |
        dpdk0
          |
   Upstream Router

VhostUserClient

This configuration looks similar to the vhostuser configuration, but with a subtle difference. In this case, the VM is the server in the client-server socket model, so the OVS port, as a client, needs to know where the socket is in order to connect to it!

# ovs-vsctl add-port br-tun dpdkvhostclt1 -- set Interface dpdkvhostclt1 type=dpdkvhostuserclient "options:vhost-server-path=/var/lib/libvirt/qemu/vhost_sockets/dpdkvhostclt1" ofport_request=4

In this directive, the only things that change are the addition of the parameter telling OVS which socket to connect to, and of course the port type, which needs to be set to dpdkvhostuserclient (instead of dpdkvhostuser).

And, if we run our ovs-vsctl show command, we will see that the port looks similar to the vhostuser ports, except for two differences:

  • the type is now vhostuserclient, rather than vhostuser
  • the options parameter, which instructs OVS (the socket client) where to connect.

Bridge br-tun
        fail_mode: standalone
        Port "dpdkvhostclt1"
            Interface "dpdkvhostclt1"
                type: dpdkvhostuserclient
                options: {vhost-server-path="/var/lib/libvirt/qemu/vhost_sockets/dpdkvhostclt1"}

        Port "dpdkvhost2"
            Interface "dpdkvhost2"
                type: dpdkvhostuser
        Port "dpdkvhost1"
            Interface "dpdkvhost1"
                type: dpdkvhostuser
        Port "dpdk0"
            Interface "dpdk0"
                type: dpdk
                options: {dpdk-devargs="0000:01:00.0"}
        Port br-tun
            Interface br-tun
                type: internal

Setting up Flows

Just because we have added these ports does not necessarily mean they'll work after creation. The next step is to enable flows (rules) for traffic forwarding between them.

Setting up switch flows is an in-depth topic in and of itself, and one we won't cover in this post. There are advanced OpenVSwitch tutorials on Flow Programming (OpenFlow).

The first thing you can generally do, if you don't have special flow requirements that you're aware of, is to set the traffic processing to "normal", as seen below for the br-tun bridge/switch.

# ovs-ofctl add-flow br-tun actions=normal 

This should give normal L2/L3 packet processing. But, if you can't ping or your network forwarding behavior isn't as desired, you may need to program more detailed or sophisticated flows.

For simplicity, I can show you a couple of examples of how one could attempt to enable some traffic to flow between ports:

The following flows allow traffic from the bridges' local (host) ports to go out through their physical PCI interfaces:

# ovs-ofctl add-flow br-tun in_port=LOCAL,actions=output:dpdk0

# ovs-ofctl add-flow br-prv in_port=LOCAL,actions=output:dpdk1

The following flows forward packets to the proper VM when they come into the host:

# ovs-ofctl add-flow br-tun ip,nw_dst=192.168.30.202,actions=output:dpdkvhost1

# ovs-ofctl add-flow br-prv ip,nw_dst=192.168.20.202,actions=output:dpdkvhost0

To debug the packet flows, you can dump them with the "dump-flows" command. Debugging is similar to working with iptables rules (iptables -nvL), in that you can dump the flows and look at the packet counts.

# ovs-ofctl dump-flows br-prv
 cookie=0xd2e1f3bff05fa3bf, duration=153844.320s, table=0, n_packets=0, n_bytes=0, priority=2,in_port="phy-br-prv" actions=drop
 cookie=0xd2e1f3bff05fa3bf, duration=153844.322s, table=0, n_packets=10224168, n_bytes=9510063469, priority=0 actions=NORMAL

In the example above, we have two flows on the bridge br-prv. And we do not see any packets being dropped. So, presumably, anything connected to this bridge should be able to communicate from a flow perspective.

After setting these kinds of flows, ping tests and traffic verification tests will need to be done. 

I refer to this as "port plumbing", and these rules can potentially get very advanced, sophisticated, and complex.

If you are launching a VM on Linux via KVM (usually a script), or using virsh (which drives off of an xml file that describes the VM), you will need to set these "port plumbing" rules up manually, and you would probably start with the basic normal processing unless you want to do something sophisticated.

If you are using OpenStack, however, OpenStack does a lot of this automatically, and the things it does are influenced by your underlying OpenStack configuration (files). For example, if you are launching a DPDK VM on an OpenStack that is using OpenVSwitch, each compute node will be running a neutron-openvswitch-agent service. This service is actually a Ryu OpenFlow controller, and when you start it, it plumbs ports on behalf of OpenStack Neutron on the basis of your Neutron configuration. So you may look at your flows with just OpenVSwitch running and see a smaller subset of flows than you would if the neutron-openvswitch-agent were running! I may get into some of this in a subsequent post, if time allows.
