This could be a long post, because things like NUMA can get complicated.
For background, we are running servers - hypervisors - that have two CPU packages (chips - or "wafers" as I like to refer to them), each with 12 cores, giving a total of 24 physical cores.
When you enable hyperthreading, you get 48 cores, and this is what is presented to the operating system and CPU scheduler (somewhat - more on this later). But you don't get an effective doubling of capacity when you enable hyperthreading. What is really happening is that each physical core exposes two hardware threads that share that core's execution resources - the 24 cores are effectively "cut in half" so that another 24 can be "fit in", giving you 48 logical cores.
Worth mentioning also is that each logical core has a "sibling" - the other hyperthread on the same physical core - and this matters from a scheduling perspective when you see things like CPU pinning used, because if you pin something to a specific core, that "sibling" cannot be used for something else. For example, with hyperthreading enabled, the sibling pairs would look like:
0 | 1
2 | 3
4 | 5
... and so on. So if someone pinned to core 4, core 5 is also "off the table" now from a scheduling perspective because pinning is a physical core concept, not a logical core concept.
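If you want to confirm the actual sibling pairings rather than assuming the tidy 0|1, 2|3 layout above, a Linux system exposes the mapping in sysfs. A minimal sketch (Linux shell; the numbering you get back depends on how the kernel enumerated the CPUs):

# Show, for each logical CPU, which sibling thread(s) share its physical core
for cpu in /sys/devices/system/cpu/cpu[0-9]*; do
    echo "$(basename "$cpu"): siblings $(cat "$cpu"/topology/thread_siblings_list)"
done

# Or let lscpu print the logical-CPU-to-core-to-socket mapping in one table
lscpu --extended=CPU,CORE,SOCKET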
So with this background, we had a tenant who wanted to enable a "preferHT" setting. This setting can be applied to an entire hypervisor by setting numa.PreferHT=1, affecting all VMs deployed on it.
Or, one can selectively add this setting to a specific virtual machine by going into its Advanced Settings and configuring numa.vcpu.preferHT=TRUE.
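For reference, here is roughly what the two forms of the setting look like once applied - the per-VM entry ends up as a line in the VM's .vmx file (which is what the Advanced Settings dialog writes out), while the other is the hypervisor-wide option mentioned above. A sketch, not a copy of our exact configuration:

# Per-VM: added via the VM's Advanced Settings; lands in its .vmx file
numa.vcpu.preferHT = "TRUE"

# Host-wide equivalent, affecting every VM placed on the hypervisor
# (see the KB article in the sources for where this is set on the host)
numa.PreferHT = 1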
In our case, it was the VM setting being requested - not the hypervisor setting. Now, this tenant is the "anchor tenant" on the platform, and their workloads are very latency-sensitive, so it was important to jump through this hoop when it was requested. First, we tested the setting by powering a VM off, adding the setting, and powering the VM back on. No problems with this. We then migrated the VM to another hypervisor, and had no issues with that either. Aside from that, though, how do you know that the VM setting "took" - meaning that it was picked up and recognized?
It turns out that there are a couple of ways to do this:
1. esxtop
When you load esxtop, it shows the CPU view by default. Hitting the "m" key switches to the memory view, and hitting "f" from there brings up a list of fields - one of them is NUMA Statistics. Selecting it gives you a ton of interesting information about NUMA (there is also a batch-mode capture example after the field list below). The fields you are most interested in are going to be:
NHN (NUMA Home Node) - Current home node for the virtual machine or resource pool. In our case this was 0 or 1, as we had two NUMA nodes (there is usually one per physical CPU socket).
NMIG (NUMA Migrations) - Number of NUMA migrations between two snapshot samples.
NRMEM (NUMA Remote Memory) - Amount of remote memory allocated to the virtual machine, in MB.
NLMEM (NUMA Local Memory) - Amount of local memory allocated to the virtual machine, in MB.
N%L (Percentage of Local Memory) - The share of the VM's memory that is local. You want this to be 100%, but numbers in the 90s are probably okay too - it means most memory access is staying local rather than traversing the NUMA interconnect, which adds latency.
GST_NDx (Guest Node x) - Guest memory allocated to the VM on NUMA node x, where x is the node number.
MEMSZ (Memory Size) - Total amount of physical memory allocated to the virtual machine.
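As promised above, if you would rather not sit in the interactive screen, esxtop's batch mode can dump its counters to CSV, which is handy for watching N%L over time or attaching evidence to a change record. A rough sketch - the sample count and file name are just examples, and exactly how the NUMA columns are named in the header can vary by build:

# Capture 5 samples of all esxtop statistics (-a) in batch mode (-b) to CSV
esxtop -b -a -n 5 > /tmp/esxtop-numa.csv

# The header row is very wide; list just the columns whose names mention NUMA
head -1 /tmp/esxtop-numa.csv | tr ',' '\n' | grep -i numa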
2. vmdumper command
I found this command on a blog post, which I will list in my sources at the end of this post. This useful command can show you a lot of interesting information about how NUMA is working "under the hood" (in practice). It can show you a Logical Processor to NUMA Node Map, it can show how many home nodes are utilized for a given VM, and it can show the assignment of NUMA clients to their respective NUMA nodes.
One of the examples covered in that blog post walks through a VM with 12 vCPUs on a 10-core system, and then shows what it would look like if the VM had 10 vCPUs instead.
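The invocation itself is essentially the one-liner from the sources below, which walks the running VMs and greps the NUMA-related lines out of each one's vmware.log. I have reproduced it here roughly as I used it, so double-check it against the linked posts before relying on it:

# For each running VM, derive the path to its vmware.log and pull out the
# DICT (vmx) entries and numa/numaHost lines the VMkernel logged at power-on
vmdumper -l | cut -d \/ -f 2-5 | while read path; do
    egrep -oi "DICT.*(displayname.*|numa.*|cores.*|vcpu.*|memsize.*|affinity.*)= .*|numa:.*|numaHost:.*" "/$path/vmware.log"
    echo -e
done

Because the DICT entries are included in the output, this is also a quick way to confirm that numa.vcpu.preferHT actually shows up for the VM in question.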
Sources:
http://www.staroceans.org/ESXi_VMkernel_NUMA_Constructs.htm
https://frankdenneman.nl/2010/02/03/sizing-vms-and-numa-nodes/
https://frankdenneman.nl/2010/10/07/numa-hyperthreading-and-numa-preferht/
https://docs.pexip.com/server_design/vmware_numa_affinity.htm
https://docs.pexip.com/server_design/numa_best_practices.htm#hyperthreading
https://knowledge.broadcom.com/external/article?legacyId=2003582