Grasping Technology

Thursday, June 20, 2024

New AI Book Arrived - Machine Learning for Algorithmic Trading

This thing is like 900 pages long.

You want to take a deep breath and make sure you're committed before you even open it.

I did check the Table of Contents and scrolled quickly through, and I see it's definitely a hands-on applied technology book using the Python programming language.

I will be blogging more about it when I get going.

Tuesday, June 4, 2024

What Makes an AI Chip?

I haven't been able to understand why the original chip pioneers, like Intel and AMD, have not been able to pivot in order to compete with NVidia (Stock Symbol: NVDA).

I know a few things, like the fact that when gaming became popular, NVidia made the graphics chips that had graphics acceleration and such. Graphics tend to draw polygons, and drawing polygons is geometric and trigonometric - which require floating point arithmetic (non-integer based mathematics). Floating point is difficult for a CPU to do, so much so that classical CPUs either offloaded or employed other tricks to do these kinds of computations.

Now, these graphics chips are the "rave" for AI. And Nvidia stock has gone through the roof while Intel and AMD have been left behind.

So what does an AI chip have, that is different from an older CPU?

Graphics processing units (GPUs) - used mainly for training AI models
Field-programmable gate arrays (FPGAs) - used mainly for inference
Application-specific integrated circuits (ASICs) - used in various capacities of AI

CPUs use all three of these in some form or another, but an AI chip has all three of these in a highly optimized and accelerated design. Things like prediction (such as branching prediction), parallelism, etc. They're simply better at running "algorithms".

This link, by the way, from NVidia, discusses the distinction between Training and Inference:
https://blogs.nvidia.com/blog/difference-deep-learning-training-inference-ai/

CPUs, they were so bent on running Microsoft for so long, and emulating continuous revisions of instructions to run Windows (286-->386-->486-->Pentium--> and on and on), that they just never went back and "rearchitected" or came up with new chip architectures. They sat back and collected money, along with Microsoft, to give you incremental versions of the same thing - for YEARS.

When you are doing training for an AI model, and you are running algorithmic loops millions upon millions of times, the efficiency and time start to add up - and make a huge difference in $$$ (MONEY).

So the CPU companies, in order to "catch up", I think, with NVidia, would need to come up with a whole bunch of chip design software. Then there is the software kits necessary to develop to the chips. You also have the foundry (which uses manufacturing equipment, much of it custom per the design), etc. Meanwhile, NVidia has its rocket off the ground, with decreasing G forces (so to speak), which accelerates its orbit. It is easy to see why an increasing gap would occur.

But - when you have everyone (China, Russia, Intel, AMD, ARM, et al) all racing to catch up, they will at some point, catch up. I think. When NVidia slows down. We shall see.

Tuesday, April 16, 2024

What is an Application Binary Interface (ABI)?

After someone mentioned Alma Linux to me, it seemed similar to Rocky Linux, and I wondered why there would be two Linux distros doing the same thing (picking up from CentOS and remaining RHEL compatible).

I read that "Rocky Linux is a 1-to-1 binary to RHEL while AlmaLinux is Application Binary Interface-compatible with RHEL".

Wow. Now, not only did I learn about a new Linux distro, but I also have to run down what an Application Binary Interface, or ABI is.

Referring to this, Stack Exchange post: https://stackoverflow.com/questions/2171177/what-is-an-application-binary-interface-abi, I liked this "oversimplified summary":

API: "Here are all the functions you may call."

ABI: "This is how to call a function."

Friday, March 1, 2024

I thought MacOS was based on Linux - and apparently I was wrong!

I came across this link, which discusses some things I found interesting to learn:

Linux is a Monolithic Kernel - I thought because you could load and unload kernel modules, that the Linux kernel had morphed into more of a Microkernel architecture because of this. But apparently not?
The macOS kernel is officially known as XNU, which stands for “XNU is Not Unix.”

According to Apple's GitHub page:

"XNU is a hybrid kernel combining the Mach kernel developed at Carnegie Mellon University with components from FreeBSD and C++ API for writing drivers”.

Very interesting. I stand corrected now on MacOS being based on Linux.

Neural Network Architecture - Sizing and Dimensioning the Network

In my last blog post, I posed the question of how many hidden layers should be in a neural network, and how many hidden neurons should be in each hidden layer. This is related to the Neural Network Design, or Neural Network Architecture.

Well, I found the answer, I think, in the book entitled An Introduction to Neural Networks for Java authored by Jeff Heaton. I noticed, incidentally, that Jeff was doing AI and writing about it as early as 2008 - fifteen years ago prior to the current AI firestorm we see today - and possibly before that, using languages like Java, C# (C Sharp), and Encog (which I am unfamiliar with).

In this book, in Table 5.1 (Chapter 5), Jeff states (quoted):

"Problems that require two hidden layers are rarely encountered. However, neural networks with two hidden layers can represent functions with any kind of shape. There is currently no theoretical reason to use neural networks with any more than two hidden layers. In fact, for many practical problems, there is no reason to use any more than one hidden layer. Table 5.1 summarizes the capabilities of neural network architectures with various hidden layers."

Jeff then has the following table...

"There are many rule-of-thumb methods for determining the correct number of neurons to use in the hidden layers, such as the following:

The number of hidden neurons should be between the size of the input layer and the size of the output layer.
The number of hidden neurons should be 2/3 the size of the input layer, plus the size of the output layer.
The number of hidden neurons should be less than twice the size of the input layer."

Simple - and useful! Now, this is obviously a general rule of thumb, a starting point.

There is a Goldilocks method for choosing the right sizes of a Neural Network. If the number of neurons is too small, you get higher bias and underfitting. If you choose too many, you get the opposite problem of overfitting - not to mention the issue of wasting precious and expensive computational cycles on floating point processors (GPUs).

In fact, the process of calibrating a Neural Network leads to a concept of Pruning, where you examine which Neurons affect the total output, and prune out those that don't have the measure of contribution that makes a significant difference to the end result.

AI - Neural Networks and Deep Learning - Nielsen - Chap 5 - Vanishing and Exploding Gradient

When training a Neural Net, it is important to have what is referred to as a Key Performance Indicator - a KPI. This is an objective, often numerical, way of "scoring" the aggregate output so that you can actually tell that the model is learning - that it is trained - and that the act of training the model is improving the output. This seems innate almost, but it is important to always step back and keep this in mind.

Chapter 5 discusses the effort that goes into training a Neural Net, but from the perspective of Efficiency. How well, is the Neural Net actually learning as you run through a specified number of Epochs, with whatever batch sizes you choose, etc.?

In this chapter, Michael Nielsen discusses the Vanishing Gradient. He graphs the "speed of learning" on each Hidden Layer, and it is super interesting to notice that these Hidden Layers do not learn at the same rate!

In fact, the Hidden Layer closest to the Output always outperforms the preceding Hidden Layer in terms of speed of learning.

So after reading this, the next questions in my mind - and ones that I don't believe Michael Nielsen addresses head-on in his book, is

how many Hidden Layers does one need?
how many Neurons are needed in a Hidden Layer?

I will go back and re-scan, but I don't think there are any Rules of Thumb, or general guidance tossed out in this regard - in either book I have covered thus far. I believe that in the examples chosen in the books, the decisions about how to size (dimension) the Neural Network is more or less arbitrary.

So my next line of inquiry and research will be on the topic of how to "design" a Neural Network, at least from the outset, with respect to the sizing and dimensions. That might well be my next post on this topic.

Friday, February 23, 2024

AI - Neural Networks and Deep Learning - Nielsen - Chap 3 - Learning Improvement

As if Chapter 2 wasn't heavy enough, I moved onto Chapter 3, which introduced some great concepts, which I will mention here in this blog post. But, I couldn't really follow the detail very well, largely again due to the heavy mathematical expressions and notations used.

But, I will iterate what he covers in this chapter, and I think each one of these will require its own "separate study", preferably in a simpler way and manner.

Chapter 3 discusses more efficient ways to learn.

It starts out discussing the Cost Function.

In previous chapters, Nielsen uses the Quadratic Cost Function. But in Chapter 3, he introduces the Cross-Entropy Cost Function, and discusses how by using this, it avoids learning slowdown. Unfortunately, I can't comment much further on this because frankly, I got completely lost in this discussion.

He spends a GREAT DEAL of text discussing Cross-Entropy, including a discussion about the fact that he uses different Learning Rates on the Quadratic Cost Function (.15), versus the Cross-Entropy Cost Function (.005) - and discusses that the rates being different doesn't matter because it is more about how the speed of learning changes, than the actual speed of learning.

After a while, he mentioned an alternative to Cross-Entropy called Softmax. Now, this term seemed familiar. In doing a backcheck, I found that Softmax was used in the first book I read, called AI Crash Course by Haydelin de Ponteves. I remembered both Softmax and Argmax being mentioned.

Softmax introduces a layer of neurons parallel to the output neurons. So if you had 4 output neurons, you would have 4 Softmax neurons preceding the 4 output neurons. What Softmax does, is return a Probability Distribution. All of the neurons add up to 1, and if one neuron decreases, there must be an alternative increase amongst the other Softmax neurons. This could be useful, for example, in cases where the AI is guessing which animal type it is: Dog, Cat, Parrot, Snake. You might see a higher correlation between Dog and Cat, and a lower one with the Parrot and Snake.

Nielsen then goes on to discuss Overfitting (OverTraining) and Regularization, which is designed to combat Overfitting. He discusses four approaches to Regularization, which I won't echo here, as I clearly will need to consult elsewhere for simpler discussions, definitions and examples of these.