
Friday, March 1, 2024

Neural Network Architecture - Sizing and Dimensioning the Network

In my last blog post, I posed the question of how many hidden layers should be in a neural network, and how many hidden neurons should be in each hidden layer. This is related to the Neural Network Design, or Neural Network Architecture.

Well, I found the answer, I think, in the book entitled An Introduction to Neural Networks for Java, authored by Jeff Heaton. I noticed, incidentally, that Jeff was doing AI and writing about it as early as 2008 - fifteen years before the current AI firestorm we see today - and possibly earlier, using languages like Java and C# (C Sharp), along with the Encog framework (which I am unfamiliar with).

In this book, in Table 5.1 (Chapter 5), Jeff states (quoted):

"Problems that require two hidden layers are rarely encountered. However, neural networks with two hidden layers can represent functions with any kind of shape. There is currently no theoretical reason to use neural networks with any more than two hidden layers. In fact, for many practical problems, there is no reason to use any more than one hidden layer. Table 5.1 summarizes the capabilities of neural network architectures with various hidden layers." 

Jeff then has the following table...

"There are many rule-of-thumb methods for determining the correct number of neurons to use in the hidden layers, such as the following:

  • The number of hidden neurons should be between the size of the input layer and the size of the output layer.
  • The number of hidden neurons should be 2/3 the size of the input layer, plus the size of the output layer.
  • The number of hidden neurons should be less than twice the size of the input layer."

Simple - and useful! Now, this is obviously a general rule of thumb, a starting point.
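To make the rules of thumb concrete, here is a small sketch that computes all three estimates for a given layer geometry. The function name and the example sizes are my own illustration, not from Heaton's book:

```python
# Rough hidden-layer sizing from the three rules of thumb above.

def hidden_size_candidates(n_inputs, n_outputs):
    """Return the three rule-of-thumb estimates for hidden neurons."""
    return {
        "between in/out (midpoint)": (n_inputs + n_outputs) // 2,
        "2/3 inputs + outputs": (2 * n_inputs) // 3 + n_outputs,
        "upper bound (< 2x inputs)": 2 * n_inputs - 1,
    }

# Example: a network with 10 inputs and 2 outputs
print(hidden_size_candidates(10, 2))
# → {'between in/out (midpoint)': 6, '2/3 inputs + outputs': 8, 'upper bound (< 2x inputs)': 19}
```

In practice you would treat these numbers as starting points for experimentation, not final answers.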

There is a Goldilocks aspect to choosing the right size for a Neural Network. If the number of neurons is too small, you get higher bias and underfitting. If you choose too many, you get the opposite problem of overfitting - not to mention wasting precious and expensive computational cycles on floating-point processors (GPUs).

In fact, the process of calibrating a Neural Network leads to a concept of Pruning, where you examine which Neurons affect the total output, and prune out those whose contribution doesn't make a significant difference to the end result.
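A minimal sketch of the simplest form of this idea, magnitude-based pruning: zero out weights whose absolute value falls below a threshold. Real frameworks do this far more carefully (and usually retrain afterwards); the threshold and weights below are made up for illustration:

```python
# Magnitude-based pruning: weights whose contribution is negligible
# are zeroed out, shrinking the effective size of the network.

def prune_weights(weights, threshold=0.1):
    """Zero out weights with magnitude below the threshold."""
    return [w if abs(w) >= threshold else 0.0 for w in weights]

weights = [0.8, -0.05, 0.3, 0.02, -0.6]
print(prune_weights(weights))
# → [0.8, 0.0, 0.3, 0.0, -0.6]
```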

Wednesday, February 21, 2024

AI - Neural Networks and Deep Learning - Nielsen - Chap 2 - Backpropagation

Backpropagation is the "secret sauce" of Neural Networks. And therefore, very important to understand.

Why? 

Because it is how Neural Networks adapt and, well, learn.  

Backpropagation is responsible, essentially, for updating the weights (and biases too, I suppose) after calculating the differences between predicted and actual results, so that the cost is minimized over iterations of training and the weights (and biases) end up optimized.

Doing so is rather difficult and tedious, and requires an understanding of Mathematics in several principal areas (i.e. Linear Algebra, Calculus, and even Trigonometry if you truly want to understand Sigmoid functions).

In Chapter 2 of this book, I was initially tracking along, and was following the discussion on Notation (for weights and biases in Neural Network Nodes). But that was as far as I got before I got confused and stuck in the envelope of intimidating mathematical equations.

I was able to push through and read it, but found that I didn't understand it, and after re-reading several times to try to reinforce it, I had to hit the eject button and look elsewhere for a simpler discussion of Backpropagation.

This decision to eject and look elsewhere for answers paid huge dividends, and allowed me to come back and comment on why I found this chapter so difficult.

  1. His Cost function was unnecessarily complex
  2. He did not need to consider biases, necessarily, in teaching the fundamentals of backpropagation
  3. The notational symbols introduced are head-spinning

In the end, I stopped reading this chapter, because I don't think that understanding all of his notation is necessary to get the gist and essence of Backpropagation, even if you want some knowledge, at a mathematical level, of how it's calculated.

To give credit where credit is due, this video from Mikael Lane helped me get a very good initial understanding of Backpropagation: Simplest Neural Network Backpropagation Example

Now, I did have a problem trying to understand where, at about 3:30 in the video, he comes up with ∂C/∂w = 1.5 * 2(a-y) = 4.5w - 1.5

But, aside from that, his example helped me understand, because he removed the bias from the equation! You don't really need a bias! Nobody else I saw had dumbed things down this way, and it was extremely helpful. His explanation of how the Chain Rule of Differentiation is applied was also timely and helpful.
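For what it's worth, that 4.5w - 1.5 result falls out of the Chain Rule if (as I understand the video's setup) the input is 1.5, so the output is a = 1.5w, and the target is y = 0.5:

```latex
C = (a - y)^2, \quad a = 1.5w, \quad y = 0.5

\frac{\partial C}{\partial w}
  = 2(a - y) \cdot \frac{\partial a}{\partial w}
  = 1.5 \cdot 2(1.5w - 0.5)
  = 4.5w - 1.5
```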

NOTE: Mikael also has a 2nd follow-up video on the same topic: 

Another Simple Backpropagation Example

From there, I went and watched another video, which does use the bias, but walks you through backpropagation in a way that makes it easier to grasp and understand, even with the Calculus used.

Credit for this video goes to Bevan Smith, and the video link can be found here:

Back Propagation in training neural networks step by step 

Bevan gives a more thorough walk-through of calculations than the initial Mikael Lane video does. Both videos use the Least Squares method of Cost. 

The cost function at the final output is: Cost = (Ypredicted - Yactual)²

The derivative of this with respect to Ypredicted is quite simple, which helps in understanding the examples: 2(Yp - Ya)
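To see that cost and its derivative in action, here is a tiny gradient-descent loop on a single weight with no bias, in the spirit of the videos above (the learning rate, starting weight, and training example are my own choices, not taken from either video):

```python
# One-neuron, no-bias "network": prediction = w * x.
# Cost = (y_pred - y_actual)^2, dCost/dw = 2*(y_pred - y_actual)*x.

x, y_actual = 1.5, 0.5   # single training example
w = 0.8                  # arbitrary starting weight
learning_rate = 0.1

for step in range(50):
    y_pred = w * x
    grad = 2 * (y_pred - y_actual) * x   # chain rule
    w -= learning_rate * grad            # backpropagation update

print(round(w, 4))   # converges toward y_actual / x = 1/3
# → 0.3333
```

After 50 steps the weight has settled at the value that makes the prediction match the target exactly, which is the whole point of the backpropagation update.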

Nielsen, unfortunately, goes into none of this simple explanation, and chooses a Quadratic Cost Function that, for any newbie with rusty math skills, is downright scary to comprehend:

C = ½ ‖y - a^L‖² = ½ Σ_j (y_j - a^L_j)²

Nielsen then goes on to cover 4 Equations of Backpropagation which, frankly, look PhD level to me, or at least as far as I am concerned. There is some initial fun in reading this, as though you are a codebreaker trying to figure out how to reverse engineer the Enigma (the German cipher machine used in WWII). But, after a while, you throw your hands up in the air. He even goes into some Mathematical Proofs of the equations (yikes). So this stuff is for very, very heavy "Math People".
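For the record, here is my shorthand summary of the four equations (not Nielsen's exact typesetting), where δ is the error at a layer, σ' is the derivative of the activation function, and ⊙ is element-wise multiplication:

```latex
\delta^L = \nabla_a C \odot \sigma'(z^L)                  % (BP1) error at the output layer
\delta^l = ((w^{l+1})^T \delta^{l+1}) \odot \sigma'(z^l)  % (BP2) error propagated backward
\partial C / \partial b^l_j = \delta^l_j                  % (BP3) gradient w.r.t. biases
\partial C / \partial w^l_{jk} = a^{l-1}_k \delta^l_j     % (BP4) gradient w.r.t. weights
```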

At the end, he does include some Python source code that you can run, which is cool, and it all looked to me like it worked fine when I ran it.
 
 


Tuesday, January 9, 2024

Artificial Intelligence - LSTM for Financial Modeling - Part I

I decided that it might be fun and interesting to see how AI might be used to optimize a stock portfolio.

Surely there is a ton of work and effort in the context of financial markets, right?

My search took me to an IEEE published research paper from three researchers who worked together at the Department of Data Science at Praxis Business School, based in Kolkata, India.

Below is the link to this paper, which is a PDF.

Stock Portfolio Optimization Using a Deep Learning LSTM Model

In reading the Abstract of this paper, I can see that a Long Short-Term Memory (LSTM) model is used. I was interested in the source code for this project, but couldn't figure out where it was located, so I decided instead to read up on LSTM.

Reading up on LSTM, I started to realize that LSTM was a preferred model for most Financial AI. 

Learning about LSTM requires a foundation and subsequent building blocks of knowledge, such as Recurrent Neural Networks (for starters). You have to wade into the pool gradually, building comfort and confidence as the sub-topics materialize.

A link - the first link I read - on LSTM, is found here.

Understanding-LSTMs

I like this blog article, because it addresses the "Core Idea on LSTMs", and gives a "Step by Step LSTM Walkthrough". I couldn't help but notice that these models look, essentially, like State Transition Diagrams to me. State Transition, I am realizing, is a key part of Artificial Intelligence. And the diagrams start to look very electronic. Check out an electrical circuit diagram full of Transistors and Logic Gates, and you will see the resemblance.
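As best I can reproduce the article's notation, the step-by-step walkthrough boils down to a handful of equations per time step - forget gate, input gate, cell-state update, and output gate:

```latex
f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)          % forget gate
i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)          % input gate
\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)   % candidate cell state
C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t       % new cell state
o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)          % output gate
h_t = o_t \odot \tanh(C_t)                            % new hidden state
```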

While this article was very helpful from a big-picture conceptual perspective, I got very confused by the fact that the diagrams showed both a "tanh" function and a "sigmoid" function. The symbol for sigmoid, I was familiar with. But the tanh left me scrambling to figure out what that was all about (it helps to be a math geek when you are delving into Artificial Intelligence). Here is a snippet of the diagram that sent me down this road of investigation:

tanh vs sigmoid used in activation functions

Here is an article I found that allowed me to understand what "tanh" is: sigmoid-vs-tanh-functions
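The short version of what I learned: tanh is just a scaled and shifted sigmoid, tanh(x) = 2·sigmoid(2x) - 1, squashing to (-1, 1) instead of (0, 1). You can verify the relationship numerically:

```python
import math

def sigmoid(x):
    """Standard logistic sigmoid, squashing to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# tanh(x) = 2 * sigmoid(2x) - 1, so the two curves are the same
# shape, just rescaled to different output ranges.
for x in [-2.0, 0.0, 2.0]:
    assert abs(math.tanh(x) - (2 * sigmoid(2 * x) - 1)) < 1e-12

print(math.tanh(0.0), sigmoid(0.0))
# → 0.0 0.5
```

The zero-centered output of tanh is one reason it shows up inside LSTM cells, while sigmoid is used for the gates (which need values between 0 and 1 to act as "how much to let through" dials).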

From this blog article, I went and read a second paper on LSTM Models written by Akhter Rather, found at the following url:

LSTM-based Deep Learning Model for Stock Prediction and Predictive Optimization Model

So from here, I decided I wanted to see this LSTM Model in action. I couldn't find the source code from the Kolkata publication, but I felt there were probably more of these models around, and I decided to look for one. My next blog post will cover that.
