Backpropagation is the "secret sauce" of Neural Networks, and is therefore very important to understand.
Why?
Because it is how Neural Networks adapt and, well, learn.
Backpropagation is responsible, essentially, for updating the weights (and, potentially, the biases too) after calculating the difference between the actual results and the predicted results, so that the cost shrinks over successive iterations of training and the weights (and biases) end up optimized.
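To make that concrete, here is a minimal sketch in Python of the core idea, using a single weight, no bias, a squared-error cost, and made-up numbers of my own (none of this comes from the book or the videos discussed below):

```python
# A toy "network": prediction = w * x, squared-error cost, and a weight update
# that nudges w in the direction that reduces the cost. All values are made up.
x, y_actual = 1.5, 0.5      # one training example
w = 0.8                     # initial weight
learning_rate = 0.1

for step in range(20):
    y_predicted = w * x                        # forward pass
    cost = (y_predicted - y_actual) ** 2       # how wrong were we?
    grad = 2 * (y_predicted - y_actual) * x    # dCost/dw, via the chain rule
    w -= learning_rate * grad                  # backpropagation's job: update the weight
    # cost shrinks on every pass; w heads toward y_actual / x (= 1/3 here)
```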
Doing this properly, though, is rather difficult and tedious, and requires an understanding of mathematics at several levels (i.e. Linear Algebra, Calculus, and even Trigonometry if you truly want to understand Sigmoid functions).
In Chapter 2 of this book, I was initially tracking along and following the discussion on Notation (for weights and biases in Neural Network nodes). But that was as far as I got before I became confused and stuck amid the intimidating mathematical equations.
I was able to push through and read it, but found that I didn't understand it, and after several attempts at re-reading it, I had to hit the eject button and look elsewhere for a simpler discussion of Backpropagation.
That decision to eject and look elsewhere for answers paid huge dividends, and it allowed me to come back and comment on why I found this chapter so difficult.
- His cost function was unnecessarily complex
- He did not need to include biases in order to teach the fundamentals of backpropagation
- The notational symbols he introduces are head-spinning
In the end, I stopped reading this chapter, because I don't think working through all of his jargon is necessary to get the gist and essence of Backpropagation, even if you want some mathematical understanding of how it's calculated.
To give credit where credit is due, this video from Mikael Lane helped me get a very good initial understanding of Backpropagation: Simplest Neural Network Backpropagation Example
Now, I did have a problem trying to understand how, at around 3:30 in the video, he comes up with ∂C/∂w = 1.5 * 2(a − y) = 4.5 * w − 1.5 (I take a stab at reconstructing this below).
But, aside from that, his example helped me understand because he removed the bias from the equation! You don't really need a bias! Nobody else that I saw had dumbed things down by doing this, and it was extremely helpful. His explanation of how the Chain Rule of Differentiation is applied was also timely and helpful.
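Having come back to it, here is my reconstruction of that step (assuming, as the numbers imply, an input x = 1.5, a target y = 0.5, and no bias, so that a = w * x = 1.5w and C = (a − y)²):

∂C/∂w = ∂C/∂a * ∂a/∂w = 2(a − y) * x = 1.5 * 2(a − y) = 3(1.5w − 0.5) = 4.5 * w − 1.5

which is exactly the expression in the video.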
NOTE: Mikael also has a 2nd follow-up video on the same topic:
Another Simple Backpropagation Example
From there, I went and watched another video, which does use the bias, but walks you through backpropagation in a way that makes it easier to grasp and understand, even with the Calculus used.
Credit for this video goes to Bevan Smith, and the video link can be found here:
Back Propagation in training neural networks step by step
Bevan gives a more thorough walk-through of the calculations than the initial Mikael Lane video does. Both videos use the Least Squares method for the cost.
The cost function at the final output is: Cost = (Ypredicted − Yactual)²
The derivative of this (with respect to Ypredicted) is quite simple, which helps in understanding the examples: 2(Ypredicted − Yactual)
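If you want to convince yourself of that derivative, here is a quick Python sanity check against a finite-difference approximation (the numbers are made up; this isn't from either video):

```python
# Check that d/dYpredicted of (y_pred - y_actual)**2 is 2 * (y_pred - y_actual)
# by comparing the formula against a finite-difference approximation.
def cost(y_pred, y_actual):
    return (y_pred - y_actual) ** 2

y_pred, y_actual = 0.8, 0.5          # made-up values
analytic = 2 * (y_pred - y_actual)   # the derivative from the formula above

eps = 1e-6
numeric = (cost(y_pred + eps, y_actual) - cost(y_pred - eps, y_actual)) / (2 * eps)

print(analytic, numeric)             # both come out to roughly 0.6
```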
Nielsen, unfortunately, offers none of this simple explanation, and chooses a Quadratic Cost Function that, for any newbie with rusty math skills, is downright scary to comprehend:
C = ½‖y − aᴸ‖² = ½ ∑ⱼ (yⱼ − aⱼᴸ)²
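Looking at the formula again now, though, it does appear to be the same idea: the squared error, summed over all the output neurons and halved (so the 2 cancels when you differentiate). A small Python sketch of how I read it, with made-up vectors of my own:

```python
# Nielsen's quadratic cost: half the sum of squared errors over the output layer.
# The y and a_L vectors below are made-up values for illustration only.
y   = [0.0, 1.0, 0.0]    # desired outputs
a_L = [0.2, 0.7, 0.1]    # actual outputs of the final layer

C = 0.5 * sum((yj - aj) ** 2 for yj, aj in zip(y, a_L))
print(C)  # 0.07

# With a single output neuron this reduces to 0.5 * (y - a)**2, i.e. half of the
# simple cost used in the videos above, and its derivative is just (a - y).
```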