
Monday, October 4, 2021

The first Accelerated VNF on our NFV platform

 I haven't posted anything since April but that isn't because I haven't been busy.

We have our new NFV Platform up and running, and it is NOT on OpenStack. It is NOT on VMware VIO. It is also NOT on VMware Telco Cloud!

We are using ESXi, vCenter, NSX-T (for the SD-WAN), and Morpheus as a Cloud Management solution. Morpheus has a lot of different integrations and a great user interface that gives tenants a place to log in, call home, and self-manage their resources.

The diagram below depicts what this looks like from a Reference Architecture perspective.

The OSS, which is not covered in the diagram, is a combination of Zabbix and VROPS, both working in tandem to ensure that the clustered hosts and management functions are behaving properly.

The platform is optimized with E-NVDS, commonly referred to as Enhanced Datapath. This requires special DPDK drivers to be loaded on the ESXi hosts, for starters, as well as some settings in the hypervisors to ensure that the E-NVDS is configured properly (separate upcoming post).

Now that the platform is up and running, it is time to start discussing workload types. There are a number of Workload Categories that I tend to use:

  1. Enterprise Workloads - Enterprise Applications, 3-Tier Architectures, etc.
  2. Telecommunications Workloads
    • Control Plane Workloads
    • Data Plane Workloads

Control Plane workloads have more tolerance for latency and lighter system resource requirements than Data Plane workloads do.

Why? Because Control Plane workloads are typically TCP-based, frequently use (RESTful) APIs, and tend to be more periodic in their behavior (periodic updates). Most of the time, when you see issues related to the Control Plane, they are related to back-hauling a lot of measurements and statistics (Telemetry Data). But generally speaking, this data in and of itself does not have stringent requirements.

From a VM perspective, there are a few key things you need to do to ensure your VNF behaves as a true VNF and not as a standard workload VM. These include:

  • Set Latency Sensitivity to High, which turns off interrupts and ensures that poll mode drivers are used.
  • Enable Huge Pages on the VM by going into VM Advanced Settings and adding the parameter: sched.mem.lpage.enable1GHugePage = TRUE

Note: Another setting worth checking, although we did not actually set this parameter ourselves, is: sched.mem.pin = TRUE

Note: Another setting, sched.mem.maxmemctl, ensures that ballooning is turned off. We do NOT have this setting, but it was mentioned to us, and we are researching it.
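
If you want to script these settings rather than click through the vSphere UI, here is a minimal sketch using pyVmomi (the vSphere API Python bindings). The vCenter hostname, credentials, and the VM name 'my-vnf' are placeholders, and the VM generally needs to be powered off for these changes to take effect. Treat it as a sketch of the approach, not a drop-in script.

    from pyVim.connect import SmartConnect, Disconnect
    from pyVim.task import WaitForTask
    from pyVmomi import vim
    import ssl

    # Connect to vCenter (hostname/credentials are placeholders)
    ctx = ssl._create_unverified_context()
    si = SmartConnect(host='vcenter.example.com', user='administrator@vsphere.local',
                      pwd='********', sslContext=ctx)
    content = si.RetrieveContent()

    # Find the VNF VM by name ('my-vnf' is a placeholder)
    view = content.viewManager.CreateContainerView(content.rootFolder, [vim.VirtualMachine], True)
    vm = next(v for v in view.view if v.name == 'my-vnf')
    view.DestroyView()

    # Build a reconfigure spec: Latency Sensitivity = High, plus the advanced
    # parameters discussed above (the commented-out ones are the settings we
    # did not apply ourselves and are still researching)
    spec = vim.vm.ConfigSpec()
    spec.latencySensitivity = vim.LatencySensitivity(level='high')
    spec.extraConfig = [
        vim.option.OptionValue(key='sched.mem.lpage.enable1GHugePage', value='TRUE'),
        # vim.option.OptionValue(key='sched.mem.pin', value='TRUE'),
        # vim.option.OptionValue(key='sched.mem.maxmemctl', value='0'),  # reportedly disables ballooning
    ]
    WaitForTask(vm.ReconfigVM_Task(spec=spec))
    Disconnect(si)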

One issue we seemed to continually run into was a vCenter alert called Virtual Machine Memory Usage, displayed in vCenter as a red banner with "Acknowledge and Reset to Green" links. The VM was in fact running, but vCenter seemed to have issues with it. The latest change we made, which seems to have fixed this error, was to check the "Reserve all guest memory (All locked)" checkbox.

This checkbox to Reserve all guest memory seemed intimidating at first, because the concern was that the VM could reserve all memory on the host. That is NOT what this setting does!!! What it does is allow the VM to reserve all of its memory up front - but only the VM memory that is specified (i.e. 24G). If the VM has HugePages enabled, it makes sense that one would want the entire allotment of VM memory to be reserved up front and be contiguous. When we enabled this, our vCenter alerts disappeared.
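
For completeness, this checkbox also maps to a single field in the vSphere API, so it can be set the same scripted way. A minimal sketch, assuming the same pyVmomi session and vm object as in the snippet above:

    from pyVmomi import vim

    def reserve_all_guest_memory(vm):
        # Equivalent of the "Reserve all guest memory (All locked)" checkbox:
        # reserves only the VM's own configured memory (e.g. 24G), not host memory
        spec = vim.vm.ConfigSpec()
        spec.memoryReservationLockedToMax = True
        return vm.ReconfigVM_Task(spec=spec)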

Lastly, we decided to change DRS to Manual in VM Overrides. To find this setting amongst the huge number of settings hidden in vCenter, you go to the Cluster (not the Host, not the VM, not the Datacenter); the option for VM Overrides is there, and you have four options:

  • None
  • Manual
  • Partial
  • Full

The thinking here is that VMs with complex settings may not play well with vMotion. I will be doing more research on DRS for VNFs before considering setting this (back) to Partial or Full.
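
The per-VM override can also be scripted. A minimal sketch, again assuming pyVmomi with cluster and vm objects already looked up, and assuming no override exists yet for this VM (if one does, the operation would be 'edit' rather than 'add'):

    from pyVmomi import vim

    def set_drs_override_manual(cluster, vm):
        # Per-VM DRS automation override -- what the vCenter UI calls "VM Overrides"
        override = vim.cluster.DrsVmConfigInfo(key=vm, enabled=True, behavior='manual')
        change = vim.cluster.DrsVmConfigSpec(operation='add', info=override)
        spec = vim.cluster.ConfigSpecEx(drsVmConfigSpec=[change])
        # True = modify (merge into) the existing cluster configuration
        return cluster.ReconfigureComputeResource_Task(spec, True)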

Wednesday, January 1, 2020

SDN- NFV Certified from Metro Ethernet Forum

It has been a while since I have blogged any updates, so I'll knock out a few!

First, I just completed the course and certification from Metro Ethernet Forum for SDN-NFV.

This was a 3-day course, and it was surprisingly hands-on, as it focused heavily on OpenFlow and OpenDaylight. I had always wanted to learn more about these, so I found it quite rewarding.

One interesting stumbling block in the labs was the fact that there is a -O option that needs to be used to specify the proper version of OpenFlow. 
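
For example (the bridge name br0 and the version are just illustrative), ovs-ofctl defaults to OpenFlow 1.0, so against a bridge configured for OpenFlow 1.3 you need something like:

    # Tell the bridge which OpenFlow versions to speak
    ovs-vsctl set bridge br0 protocols=OpenFlow13

    # Without -O, ovs-ofctl speaks OpenFlow 1.0 and version negotiation fails
    ovs-ofctl -O OpenFlow13 dump-flows br0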

The course seemed to focus on the use case and context of using OpenFlow (and OpenDaylight) to configure switches - but not "everything else" out there in the field of networking that could be configured with something like OpenFlow.

For example, it was my understanding that the primary beneficiary of something like OpenFlow (and OpenDaylight) was the Wireless (802.11x) domain, where people had scores, hundreds, or even thousands of access points that had to be configured or upgraded, and it was extremely difficult to do this by hand.

But the course focused on switches - OpenVSwitch switches, to be precise. That was probably because OpenVSwitch keeps things simple enough for the course and instructor.

The problem is, in my shop everyone is using Juniper switches, and Juniper does not play ball with OpenFlow and OpenVSwitch. So I'm not sure how much this can or will be put to use in our specific environment. I do, however, use OpenVSwitch in my own personal OpenStack environment, and since OpenVSwitch works well with DPDK and VPP, this knowledge can come in handy as I need to start doing more sophisticated things with packet flows.

Nonetheless, I found the course interesting and valuable. The exam also centered around the ETSI-MANO Reference Architecture. I was already familiar with this architecture, but as with all exams like this, I missed questions because of time, or overthinking things, or picking the wrong of two correct answers (not the best answer). But I passed the exam, and I guess that's what matters most.

Tuesday, December 3, 2019

Virtualized Networking Acceleration Technologies - Part II


In Part I of this series of posts, I recapped my research on these virtualized networking technologies, with the aim of building an understanding of:

  • what they are
  • the history and evolution between them

What I did not cover was a couple of further questions:
  1. When to Use Them
  2. Can you Combine Them?

The link below is a fantastic discussion of item number one. Now, I can't tell how "right" or "accurate" the author is, and I typically look down in the comments for rebuttals and refutations (I didn't see any, and most commenters seemed relatively uninformed on this topic).

He concludes that for East-West (intra-data-center) traffic, DPDK wins, and for North-South traffic, SR-IOV wins.
https://www.telcocloudbridge.com/blog/dpdk-vs-sr-iov-for-nfv-why-a-wrong-decision-can-impact-performance/
