Friday, March 1, 2024

I thought MacOS was based on Linux - and apparently I was wrong!

I came across this link, which discusses some things I found interesting to learn:

  • Linux is a Monolithic Kernel - I thought that because you can load and unload kernel modules, the Linux kernel had morphed into more of a microkernel architecture. But apparently not - loadable modules still run in kernel space alongside everything else, so the design remains monolithic.
  • The macOS kernel is officially known as XNU, which stands for "X is Not Unix." According to Apple's GitHub page:

"XNU is a hybrid kernel combining the Mach kernel developed at Carnegie Mellon University with components from FreeBSD and a C++ API for writing drivers."

Very interesting. I stand corrected on macOS being based on Linux.

Neural Network Architecture - Sizing and Dimensioning the Network

In my last blog post, I posed the question of how many hidden layers should be in a neural network, and how many hidden neurons should be in each hidden layer. This is a question of neural network design, or neural network architecture.

Well, I found the answer, I think, in the book entitled An Introduction to Neural Networks for Java, authored by Jeff Heaton. I noticed, incidentally, that Jeff was doing AI and writing about it as early as 2008 - more than fifteen years before the current AI firestorm we see today - and possibly earlier, using languages like Java and C# (C Sharp) and frameworks like Encog (Heaton's own machine learning framework, which I was unfamiliar with).

In this book, in Table 5.1 (Chapter 5), Jeff states (quoted):

"Problems that require two hidden layers are rarely encountered. However, neural networks with two hidden layers can represent functions with any kind of shape. There is currently no theoretical reason to use neural networks with any more than two hidden layers. In fact, for many practical problems, there is no reason to use any more than one hidden layer. Table 5.1 summarizes the capabilities of neural network architectures with various hidden layers." 

Jeff then provides a table (Table 5.1) summarizing the capabilities - roughly: zero hidden layers can represent only linearly separable functions, one hidden layer can approximate any continuous function, and two hidden layers can represent an arbitrary decision boundary.

"There are many rule-of-thumb methods for determining the correct number of neurons to use in the hidden layers, such as the following:

  • The number of hidden neurons should be between the size of the input layer and the size of the output layer.
  • The number of hidden neurons should be 2/3 the size of the input layer, plus the size of the output layer.
  • The number of hidden neurons should be less than twice the size of the input layer."

Simple - and useful! These are obviously general rules of thumb - starting points, not laws.
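Those three rules are easy to turn into a tiny calculator. This is my own hypothetical helper (the function name, and using the midpoint for the "between input and output size" rule, are my choices, not Heaton's):

```python
# Sketch of Heaton's three rules of thumb for sizing a single hidden layer.
# All names here are mine, invented for illustration.

def hidden_size_suggestions(n_in: int, n_out: int) -> dict:
    """Return the three rule-of-thumb hidden-layer sizes."""
    return {
        # Rule 1: between the input and output sizes (midpoint chosen here)
        "midpoint": (n_in + n_out) // 2,
        # Rule 2: 2/3 of the input size, plus the output size
        "two_thirds_rule": (2 * n_in) // 3 + n_out,
        # Rule 3: less than twice the input size (an upper bound)
        "upper_bound": 2 * n_in - 1,
    }

# Example: an MNIST-like network with 784 inputs and 10 outputs
print(hidden_size_suggestions(784, 10))
# → {'midpoint': 397, 'two_thirds_rule': 532, 'upper_bound': 1567}
```

For MNIST-sized inputs the rules disagree by quite a bit, which is exactly why they are starting points for experimentation rather than answers.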

Sizing a neural network is a Goldilocks problem. If the number of neurons is too small, you get higher bias and underfitting. If you choose too many, you get the opposite problem of overfitting - not to mention wasting precious and expensive computational cycles on floating-point processors (GPUs).

In fact, the process of calibrating a neural network leads to the concept of pruning: you examine which neurons actually affect the output, and prune out those whose contribution doesn't make a significant difference to the end result.
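One common concrete form of this idea is magnitude-based pruning: zero out the weights with the smallest absolute values, on the assumption that they contribute least. Here is a minimal sketch of that idea (my own illustration, not any particular library's API):

```python
import numpy as np

def prune_by_magnitude(weights: np.ndarray, fraction: float) -> np.ndarray:
    """Zero the smallest `fraction` of weights by absolute value."""
    # Threshold below which weights are considered insignificant
    threshold = np.quantile(np.abs(weights), fraction)
    # Keep only the weights at or above the threshold
    mask = np.abs(weights) >= threshold
    return weights * mask

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4))
pruned = prune_by_magnitude(w, 0.5)   # drop the weakest half
print(np.count_nonzero(pruned))       # half the weights survive
```

Real pruning pipelines then fine-tune the remaining weights, since zeroing connections shifts the network's output.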

AI - Neural Networks and Deep Learning - Nielsen - Chap 5 - Vanishing and Exploding Gradient

When training a Neural Net, it is important to have what is referred to as a Key Performance Indicator - a KPI. This is an objective, usually numerical, way of "scoring" the aggregate output (in neural-network terms, typically a loss function or an evaluation metric such as accuracy) so that you can actually tell that the model is learning - that it is trained - and that the act of training the model is improving the output. This seems almost innate, but it is important to always step back and keep it in mind.

Chapter 5 discusses the effort that goes into training a Neural Net, but from the perspective of efficiency. How well is the Neural Net actually learning as you run through a specified number of epochs, with whatever batch sizes you choose, etc.?

In this chapter, Michael Nielsen discusses the Vanishing Gradient. He graphs the "speed of learning" on each Hidden Layer, and it is super interesting to notice that these Hidden Layers do not learn at the same rate! 

In fact, the Hidden Layer closest to the output consistently outperforms the layers before it in terms of speed of learning, with each earlier layer learning more slowly than the one after it.
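The intuition behind this can be shown with a toy calculation (my own sketch, not Nielsen's actual experiment): with sigmoid activations, the gradient that reaches a layer is a product of sigmoid derivatives, each at most 0.25, so each step back toward the input multiplies the gradient by another small factor.

```python
import numpy as np

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + np.exp(-z))

def gradient_per_layer(n_layers: int, z: float = 0.0) -> list:
    """Toy gradient magnitude reaching each layer, counting back
    from the output, assuming every unit sits at pre-activation z."""
    d = sigmoid(z) * (1.0 - sigmoid(z))   # sigmoid'(z); max 0.25 at z = 0
    # The layer i steps back from the output sees a factor of d**i
    return [d ** i for i in range(1, n_layers + 1)]

print(gradient_per_layer(4))
# → [0.25, 0.0625, 0.015625, 0.00390625]
```

Even in this best case (z = 0, weights ignored), the gradient four layers back is 64 times smaller than at the last layer - which matches the qualitative picture in Nielsen's learning-speed plots.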

So after reading this, the next questions in my mind - ones that I don't believe Michael Nielsen addresses head-on in his book - are:

  • how many Hidden Layers does one need?
  • how many Neurons are needed in a Hidden Layer?

I will go back and re-scan, but I don't think there are any rules of thumb, or general guidance tossed out in this regard, in either book I have covered thus far. I believe that in the examples chosen in the books, the decisions about how to size (dimension) the neural network are more or less arbitrary.

So my next line of inquiry and research will be on the topic of how to "design" a Neural Network, at least from the outset, with respect to the sizing and dimensions.  That might well be my next post on this topic.

SLAs using Zabbix in a VMware Environment

Zabbix 7 introduced some better support for SLAs. It also had better support for VMware. VMware, of course now owned by Broadcom, has prio...