Friday, March 1, 2024

AI - Neural Networks and Deep Learning - Nielsen - Chap 5 - Vanishing and Exploding Gradient

When training a Neural Net, it is important to have what is referred to as a Key Performance Indicator - a KPI. This is an objective, often numerical, way of "scoring" the aggregate output so that you can actually tell that the model is learning - that it is trained - and that the act of training the model is improving the output. This seems innate almost, but it is important to always step back and keep this in mind.

Chapter 5 discusses the effort that goes into training a Neural Net, but from the perspective of Efficiency. How well, is the Neural Net actually learning as you run through a specified number of Epochs, with whatever batch sizes you choose, etc.?

In this chapter, Michael Nielsen discusses the Vanishing Gradient. He graphs the "speed of learning" on each Hidden Layer, and it is super interesting to notice that these Hidden Layers do not learn at the same rate! 

In fact, the Hidden Layer closest to the Output always outperforms the preceding Hidden Layer in terms of speed of learning.

So after reading this, the next questions in my mind - and ones that I don't believe Michael Nielsen addresses head-on in his book, is 

  • how many Hidden Layers does one need?
  • how many Neurons are needed in a Hidden Layer?

I will go back and re-scan, but I don't think there are any Rules of Thumb, or general guidance tossed out in this regard - in either book I have covered thus far.  I believe that in the examples chosen in the books, the decisions about how to size (dimension) the Neural Network is more or less arbitrary.

So my next line of inquiry and research will be on the topic of how to "design" a Neural Network, at least from the outset, with respect to the sizing and dimensions.  That might well be my next post on this topic.

No comments:

Rocky Generic Cloud Image 9.4 - Image Prep, Cloud-Init and VMware Tools

  I just fixed an issue on these Rocky 9.x generic cloud images not booting properly on a VMWare platform. It turns o...