Wednesday, June 26, 2024

Rocky Generic Cloud Image - Image Prep, Cloud-Init and VMware Tools

 

The process I have been using up to now has been to download the generic cloud images from the various Linux distro sites (CentOS, and now Rocky). These images are pre-baked for clouds, meaning that they are smaller, more efficient, and generally have cloud packages (i.e. cloud-init) installed on them.

In my thinking, it is easier (and more efficient) to use one of these images than to try to build an image "from scratch" from an ISO.

The problem, though, is that "cloud images" are generally public cloud images: AWS, Azure, GCP, et al. If you are running your own private cloud on VMware, you will run into problems using these cloud images.

Today, I am having issues with the Rocky 9.5 generic cloud image.

I download the qcow2, use qemu-img convert to convert the qcow2 to a vmdk, then run ovftool against a templatized template.vmx file. Everything works fine until I load the image into our CMP, which initializes VMs with cloud-init: the VM boots up fine, but cloud-init never runs, so you cannot log into the VM.

Here is the template.vmx.parameterized file I am using. I use sed to replace the parameters, then rename the file to template.vmx before running ovftool on it.

.encoding = "UTF-8"
config.version = "8"
virtualHW.version = "11"
vmci0.present = "TRUE"
floppy0.present = "FALSE"
svga.vramSize = "16777216"
tools.upgrade.policy = "manual"
sched.cpu.units = "mhz"
sched.cpu.affinity = "all"
scsi0.virtualDev = "lsilogic"
scsi0.present = "TRUE"
scsi0:0.deviceType = "scsi-hardDisk"
scsi0:0.fileName = "PARM_VMDK"
sched.scsi0:0.shares = "normal"
sched.scsi0:0.throughputCap = "off"
scsi0:0.present = "TRUE"
ide0:0.present = "TRUE"
ide0:0.startConnected = "TRUE"
ide0:0.fileName = "/opt/images/nfvcloud/imagegen/rocky9/cloudinit.iso"
ide0:0.deviceType = "cdrom-image"
displayName = "PARM_DISPLAYNAME"
guestOS = "PARM_GUESTOS"
vcpu.hotadd = "TRUE"
mem.hotadd = "TRUE"
bios.hddOrder = "scsi0:0"
bios.bootOrder = "cdrom,hdd"
sched.cpu.latencySensitivity = "normal"
svga.present = "TRUE"
RemoteDisplay.vnc.enabled = "FALSE"
RemoteDisplay.vnc.keymap = "us"
monitor.phys_bits_used = "42"
softPowerOff = "TRUE"
sched.cpu.min = "0"
sched.cpu.shares = "normal"
sched.mem.shares = "normal"
sched.mem.minsize = "1024"
memsize = "PARM_MEMSIZE"
migrate.encryptionMode = "opportunistic"
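
The overall pipeline described above can be sketched roughly as follows. File names and parameter values here are placeholder assumptions (not the exact ones from my build), and the qemu-img/ovftool steps are guarded so the sketch degrades gracefully if those tools are not on the path:

```shell
#!/bin/sh
# Sketch of the image-prep pipeline: qcow2 -> vmdk -> parameterized vmx -> OVF.
# All file names and parameter values below are assumptions for illustration.
set -e

QCOW2=Rocky-9-GenericCloud-LVM.latest.x86_64.qcow2
VMDK=Rocky-9-5-GenericCloud-LVM-disk1.vmdk

# 1. Convert the generic cloud qcow2 to a stream-optimized VMDK.
if command -v qemu-img >/dev/null 2>&1 && [ -f "$QCOW2" ]; then
    qemu-img convert -f qcow2 -O vmdk -o subformat=streamOptimized "$QCOW2" "$VMDK"
fi

# 2. Substitute the PARM_* placeholders and rename to template.vmx.
#    (A stub parameterized file is created here so the sketch is runnable.)
[ -f template.vmx.parameterized ] || cat > template.vmx.parameterized <<'EOF'
scsi0:0.fileName = "PARM_VMDK"
displayName = "PARM_DISPLAYNAME"
guestOS = "PARM_GUESTOS"
memsize = "PARM_MEMSIZE"
EOF

sed -e "s|PARM_VMDK|$VMDK|" \
    -e "s|PARM_DISPLAYNAME|Rocky-9-5-GenericCloud-LVM|" \
    -e "s|PARM_GUESTOS|rhel9-64|" \
    -e "s|PARM_MEMSIZE|4096|" \
    template.vmx.parameterized > template.vmx

# 3. Package the VMX (and the disk/ISO it references) into an OVF.
if command -v ovftool >/dev/null 2>&1; then
    ovftool template.vmx Rocky-9-5-GenericCloud-LVM.ovf
fi
```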

I have tried using "cdrom,hdd" and just "hdd" as the boot order. Neither makes a difference.

When I run the ovftool program, it generates the following files, which look correct.

Rocky-9-5-GenericCloud-LVM-disk1.vmdk
Rocky-9-5-GenericCloud-LVM-file1.iso
Rocky-9-5-GenericCloud-LVM.mf
Rocky-9-5-GenericCloud-LVM.ovf

I have inspected the ovf file, and it does have references to both the vmdk and the iso file in it, as it should.

I ran a utility against the iso file, and it seems to look okay also: the two directories user_data and meta_data appear to be on there.

$ isoinfo  -i Rocky-9-5-GenericCloud-LVM-file1.iso -l

Directory listing of /
d---------   0    0    0            2048 Dec 18 2024 [     28 02]  .
d---------   0    0    0            2048 Dec 18 2024 [     28 02]  ..
d---------   0    0    0            2048 Dec 18 2024 [     30 02]  META_DAT
d---------   0    0    0            2048 Dec 18 2024 [     29 02]  USER_DAT

Directory listing of /META_DAT/
d---------   0    0    0            2048 Dec 18 2024 [     30 02]  .
d---------   0    0    0            2048 Dec 18 2024 [     28 02]  ..

Directory listing of /USER_DAT/
d---------   0    0    0            2048 Dec 18 2024 [     29 02]  .
d---------   0    0    0            2048 Dec 18 2024 [     28 02]  ..
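
For reference, cloud-init's NoCloud datasource (the one that reads an attached seed ISO) expects user-data and meta-data to be plain files at the root of the ISO, and the volume to be labeled "cidata". A minimal sketch of building such a seed ISO - the contents below are placeholder assumptions, and the genisoimage step is guarded in case the tool is not installed:

```shell
#!/bin/sh
# Sketch of building a NoCloud seed ISO. Hostname, instance-id, and the
# cloud-config contents are placeholder assumptions for illustration.
set -e
mkdir -p seed

# meta-data and user-data must be FILES at the root of the ISO.
cat > seed/meta-data <<'EOF'
instance-id: rocky9-test-01
local-hostname: rocky9-test
EOF

cat > seed/user-data <<'EOF'
#cloud-config
password: changeme
chpasswd: { expire: false }
ssh_pwauth: true
EOF

# The volume label must be "cidata" for NoCloud to find the seed.
if command -v genisoimage >/dev/null 2>&1; then
    genisoimage -output cloudinit.iso -volid cidata -joliet -rock \
        seed/user-data seed/meta-data
fi
```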

This Rocky generic cloud image does NOT have VMware Tools (the open-vm-tools package) installed on it - I checked into that. But you shouldn't need VMware Tools for cloud-init to initialize properly.

I am perplexed as to why cloud-init won't run, and I am about to drop-kick this image and consider alternative ways of generating an image for this platform. I don't understand why these images work fine on public clouds, but not on VMware.

I may need to abandon this generic cloud image altogether and use another process. I am going to examine this Packer process:

https://docs.rockylinux.org/guides/automation/templates-automation-packer-vsphere/

 

Thursday, June 20, 2024

New AI Book Arrived - Machine Learning for Algorithmic Trading

This thing is like 900 pages long.

You want to take a deep breath and make sure you're committed before you even open it.

I did check the Table of Contents and scrolled quickly through, and I see it's definitely a hands-on applied technology book using the Python programming language.

I will be blogging more about it when I get going.

 

Tuesday, June 4, 2024

What Makes an AI Chip?

I haven't been able to understand why the original chip pioneers, like Intel and AMD, have not been able to pivot to compete with NVidia (stock symbol: NVDA).

I know a few things, like the fact that when gaming became popular, NVidia made the graphics chips that had graphics acceleration and such. Graphics tend to involve drawing polygons, and drawing polygons is geometric and trigonometric - which requires floating-point arithmetic (non-integer mathematics). Floating point is difficult for a CPU, so much so that classical CPUs either offloaded it to a separate math coprocessor or employed other tricks to do these kinds of computations.

Now, these graphics chips are all the "rage" for AI. And Nvidia stock has gone through the roof while Intel and AMD have been left behind.

So what does an AI chip have, that is different from an older CPU?

  • Graphics processing units (GPUs) - used mainly for training AI models
  • Field-programmable gate arrays (FPGAs) - used mainly for inference
  • Application-specific integrated circuits (ASICs) - used in various capacities of AI

CPUs employ techniques like these in some form or another, but an AI chip combines them in a highly optimized and accelerated design - things like branch prediction, massive parallelism, etc. They're simply better at running "algorithms".

This link, by the way, from NVidia, discusses the distinction between Training and Inference:
https://blogs.nvidia.com/blog/difference-deep-learning-training-inference-ai/

The CPU makers were so bent on running Microsoft for so long, supporting continuous revisions of their instruction sets to run Windows (286 --> 386 --> 486 --> Pentium --> and on and on), that they just never went back and "rearchitected" or came up with new chip architectures. They sat back and collected money, along with Microsoft, giving you incremental versions of the same thing - for YEARS.

When you are doing training for an AI model, and you are running algorithmic loops millions upon millions of times, the efficiency and time start to add up - and make a huge difference in $$$ (MONEY). 

So the CPU companies, in order to "catch up" with NVidia, would, I think, need to come up with a whole bunch of chip design software. Then there are the software kits necessary to develop for the chips. You also have the foundry (which uses manufacturing equipment, much of it custom per design), etc. Meanwhile, NVidia has its rocket off the ground, with decreasing G-forces (so to speak), accelerating toward orbit. It is easy to see why an increasing gap would occur.

But - when you have everyone (China, Russia, Intel, AMD, ARM, et al) all racing to catch up, they will, at some point, catch up. I think. When NVidia slows down. We shall see.
