Tuesday, December 19, 2017

Port Binding Failures on OpenStack - How I fixed this


In trying to set up OpenStack for a colleague, I had an issue where I could not get the ports to come up. The port status for the networks would be in a state of "down", and I could not ping these ports from within the network namespaces, or from the router ports.

Unfortunately, I did not have DEBUG turned on in the logs. So I saw no output from nova or neutron about any kind of issues with networking or port bindings.

I did enable DEBUG, at which point I started to see "DEBUG" messages (not "ERROR" messages mind you, but "DEBUG" messages) about port binding failures. Finding these port binding debug messages was like looking for a needle in a haystack as there is a ton of debug output when you enable debug in Nova and Neutron.

I had a very very difficult time figuring out what was causing this.  But here is how I fixed it:

1. I watched a segment on YouTube about ml2 and Neutron. This was an Auction OpenStack Summit, and the url is here:

https://www.youtube.com/watch?v=e38XM-QaA5Q

2. I quickly realized that host names were such an integral part of port binding, that it was necessary to check the agents, the host names of those agents in Neutron, and the host names stored in mysql.

In MySQL, the neutron database has a table called agents, and all agents are mapped to a host.  That host needs to be correct and resolvable.

In the end, I wound up deleting some old agents that were no longer being used (old hosts, and some openvswitch agents that were lingering from a previous switch from linuxbridge to openvswitch).  I then had to correct some hostnames because I had my OpenStack Controller and Network node in a VM and I had recycled that VM for my colleague - who had reassigned a new hostname for the VM on his platform.


Then, just to be thorough, I deleted all agents (i.e. DHCP agents), then deleted all subnets, then deleted all networks. I then re-created these - WITH NEW NAMES (so as to ensure that OpenStack wasn't re-using old ones); in order. First I created the networks, then I created the subnets, then I created the agents (which generally create the ports themselves). Lastly, I mapped these new subnets to the router as interfaces (which creates ports).

One thing that is EXTREMELY important, is that ports bind to the PHYSICAL network...not the Virtual network. 

If you create a external provider network called "provider", and the physical network is called "physical", and then you go into ml2_conf.ini and linuxbridge.ini and use "provider" instead of "physical" in your bindings, you will most assuredly result in a port binding failure.

So these are the tips and tricks to solving the port binding issue, or configuring properly ahead of time so that you don't run into the port binding issue.

No comments:

Fixing Clustering and Disk Issues on an N+1 Morpheus CMP Cluster

I had performed an upgrade on Morpheus which I thought was fairly successful. I had some issues doing this upgrade on CentOS 7 because it wa...