Wednesday, February 21, 2024

AI - Neural Networks and Deep Learning - Nielsen - Chap 2 - Backpropagation

Backpropagation is the "secret sauce" of Neural Networks. And therefore, very important to understand.

Why? 

Because it is how Neural Networks adapt and, well, learn.  

Backpropagation is responsible, essentially, for updating the weights (and, potentially, the biases too) after calculating the difference between predicted and actual results, so that over many iterations of training the weights (and biases) become optimized and the cost is minimized.
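
In its simplest form (and leaving out the details of how the gradient is actually computed), the weight update looks something like this, where η is a small "learning rate" that I am using here purely for illustration:

    w_new = w_old - η * ∂C/∂w

In other words, each weight gets nudged in the direction that reduces the cost C, a little at a time, over many training iterations.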

Doing so is rather difficult and tedious, and requires an understanding of Mathematics at several levels (i.e. Linear Algebra, Calculus, and even Trigonometry if you truly want to understand Sigmoid functions).

In Chapter 2 of this book, I was initially tracking along and following the discussion on Notation (for weights and biases in Neural Network Nodes). But that was as far as I got before I became confused and stuck in a wall of intimidating mathematical equations.

I was able to push through and read it, but found that I didn't understand it, and after several attempts to reinforce it by re-reading, I had to hit the eject button and look elsewhere for a simpler discussion of Backpropagation.

This decision to eject and look elsewhere for answers paid huge dividends, and allowed me to come back and comment on why I found this chapter so difficult:

  1. His Cost function was unnecessarily complex
  2. He did not need to consider biases, necessarily, in teaching the fundamentals of backpropagation
  3. The notational symbols introduced are head-spinning

In the end, I stopped reading this chapter, because I don't know that trying to understand all of his jargon is necessary to get the gist and essence of Backpropagation, even from the standpoint of getting some mathematical understanding of how it's calculated.

To give some credit where credit is due, this video from Mikael Lane helped me get a very good initial understanding of Backpropagation: Simplest Neural Network Backpropagation Example

Now, I did have a problem trying to understand how, at 3:30 or so in the video, he comes up with ∂C/∂w = 1.5 * 2(a-y) = 4.5 * w - 1.5.
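
Working backwards from the formula itself, the missing steps appear to be something like this (assuming, based on the numbers, that the input is x = 1.5, the target is y = 0.5, and the prediction is simply a = w * x with no bias or activation function):

    C = (a - y)²
    ∂C/∂w = ∂C/∂a * ∂a/∂w          (Chain Rule)
          = 2(a - y) * x
          = 1.5 * 2(a - y)
    and, since a = 1.5 * w and y = 0.5:
          = 3 * (1.5 * w - 0.5)
          = 4.5 * w - 1.5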

But, aside from that, his example helped me understand because he removed the bias from the equation! You don't really need a bias! Nobody else that I saw had dumbed things down by doing this, and it was extremely helpful. His explanation of how the Chain Rule of Differentiation is applied was also timely and helpful.
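
A minimal sketch of that kind of single-weight, no-bias example in Python might look like the following (the starting weight and learning rate are arbitrary choices for illustration, not values taken from the video):

    # A single "neuron" with one weight and no bias: prediction a = w * x.
    # Cost is least squares: C = (a - y)^2, so by the Chain Rule dC/dw = 2 * (a - y) * x.
    x, y = 1.5, 0.5      # input and target (assumed from the formula in the video)
    w = 0.8              # arbitrary starting weight
    eta = 0.1            # learning rate (arbitrary)

    for step in range(20):
        a = w * x                   # forward pass
        cost = (a - y) ** 2         # least-squares cost
        grad = 2 * (a - y) * x      # backpropagation (Chain Rule)
        w -= eta * grad             # gradient-descent weight update
        print(f"step {step:2d}: w = {w:.4f}, cost = {cost:.6f}")

Run it and you can watch the cost shrink toward zero as w converges on the value that makes w * x equal to y.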

NOTE: Mikael also has a 2nd follow-up video on the same topic: 

Another Simple Backpropagation Example

From there, I went and watched another video, which does use the bias, but walks you through backpropagation in a way that makes it easier to grasp and understand, even with the Calculus used.

Credit for this video goes to Bevan Smith, and the video link can be found here:

Back Propagation in training neural networks step by step 

Bevan gives a more thorough walk-through of calculations than the initial Mikael Lane video does. Both videos use the Least Squares method of Cost. 

The cost function at the final output is: Cost = (Ypredicted - Yactual)²

The derivative of this (with respect to Ypredicted) is quite simple, which helps in understanding the examples: 2(Yp - Ya)
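
For example, with made-up numbers Yp = 0.8 and Ya = 0.5: the cost is (0.8 - 0.5)² = 0.09, and the derivative is 2 * (0.8 - 0.5) = 0.6, which is the error signal that gets pushed back through the network.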

Nielsen, unfortunately, goes into none of this simple explanation, and chooses a Quadratic Cost Function that, for any newbie with rusty math skills, is downright scary to comprehend:

C = ½ ||y - a^L||² = ½ Σ_j (y_j - a^L_j)²

Nielsen then goes on to cover the 4 Equations of Backpropagation, which, frankly, look PhD level to me. There is some initial fun in reading this, as though you are perhaps an NSA codebreaker trying to figure out how to reverse engineer the Enigma (the German cipher machine used in WWII). But after a while, you throw your hands up in the air. He even goes into some Mathematical Proofs of the equations (yikes). So this stuff is for very, very heavy "Math People".
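
For reference, the four equations (in Nielsen's notation, reproduced here from the chapter, so double-check them against the book) are roughly:

    δ^L = ∇a C ⊙ σ'(z^L)                         (error at the output layer)
    δ^l = ((w^(l+1))^T δ^(l+1)) ⊙ σ'(z^l)        (error at layer l, in terms of the next layer's error)
    ∂C/∂b^l_j = δ^l_j                            (gradient of the cost with respect to a bias)
    ∂C/∂w^l_jk = a^(l-1)_k * δ^l_j               (gradient of the cost with respect to a weight)

where ⊙ is the elementwise (Hadamard) product and σ' is the derivative of the activation (Sigmoid) function.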

At the end, he does dump some source code in Python that you can run, which is cool, and it all worked fine when I ran it.
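
For anyone who wants to try it, running that code looks roughly like the following (roughly, because the book's original code is written for Python 2, and the exact module and loader names may differ depending on which copy or port of the code you grab):

    # Assumes Nielsen's network.py and mnist_loader.py (from the code accompanying the
    # book, https://github.com/mnielsen/neural-networks-and-deep-learning) are importable.
    import mnist_loader
    import network

    # Load the MNIST data in the format network.py expects.
    training_data, validation_data, test_data = mnist_loader.load_data_wrapper()

    # 784 input neurons (28 x 28 pixels), 30 hidden neurons, 10 output neurons.
    net = network.Network([784, 30, 10])

    # SGD(training_data, epochs, mini_batch_size, learning_rate, test_data=...)
    net.SGD(training_data, 30, 10, 3.0, test_data=test_data)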
 
 
