Tuesday, January 30, 2024

AI - Neural Networks and Deep Learning - Nielsen - Chap 1 Introduction

I came across the book Neural Networks and Deep Learning, by Michael Nielsen, and decided to invest some time in reading it. I am hoping it will reinforce what I have read thus far (for instance, in the book AI Crash Course by Hadelin de Ponteves).

The first chapter in this book covers recognition of hand-written digits. This is a pretty cool use case for AI technology, given that everyone's individual handwriting constitutes a font in and of itself, and even then, there is a certain degree of variation every time an individual hand-writes characters. Who uses technology like this? Well, banks for example might use something like this as part of check processing!

I like the fact that this example is optical in nature - and, it could even be inverted in a Generative AI manner to generate new fonts, or new characters in some context or use case!

One of the first things this book covers is Perceptrons. It explains that Perceptrons are essentially binary: the output of the activation function is either a 1 or a 0. Nielsen then moves on to explain that binary results are not always desirable, because a small change in a weight or bias can flip the output completely, and that graded values (shades of gray) between 0 and 1 are often needed. And this is why we have Sigmoid Neurons.
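To make the difference concrete, here is a minimal sketch of the two neuron types side by side. The weights, bias, and inputs are made-up values for illustration, not numbers from the book.

```python
import math

def perceptron(inputs, weights, bias):
    """Step activation: the output is exactly 0 or 1."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if z > 0 else 0

def sigmoid_neuron(inputs, weights, bias):
    """Sigmoid activation: the output is a value strictly between 0 and 1."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative values only (assumptions, not taken from the book).
weights, bias = [2.0, -1.0], 0.5
for inputs in ([0.0, 0.0], [0.1, 0.9], [1.0, 0.0]):
    print(inputs,
          perceptron(inputs, weights, bias),
          round(sigmoid_neuron(inputs, weights, bias), 3))
```

The perceptron column only ever shows 0 or 1, while the sigmoid column moves smoothly as the inputs change.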

This was insightful, because the first book I read (unless I overlooked it) never even mentioned the concept of Perceptrons, and jumped right into Sigmoid Neurons.

Nielsen then goes on to explain, briefly, the architecture of Neural Networks and embarks on a practical example using the Handwritten Digits Recognition use case.

One thing that was helpful in this chapter was that he explains how the hidden layers of a neural network "fit together" by way of a practical example. This helps as far as the design and modeling of a Neural Network go.

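He goes on to discuss the purpose of using a bias along with weights, and then goes into a deeper discussion of gradient descent and stochastic gradient descent. Stochastic Gradient Descent is critical for training the network because it is what minimizes the cost function: it estimates the gradient from a small random mini-batch of training examples and nudges the weights and biases in the direction that lowers the cost. Minimizing a cost function does no good, though, unless the result is stored back in the model, so the weights and biases are updated at every step. Backpropagation, which the book covers in detail in the next chapter, is the algorithm that computes the gradients those updates rely on.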
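The mini-batch idea can be sketched on a much smaller problem than digit recognition. The example below is my own toy illustration, not Nielsen's code: it fits a straight line y = w*x + b with mini-batch updates, and the gradients are written out by hand because the model is so small (in a real network, backpropagation would compute them). The data and parameter values are assumptions.

```python
import random

def sgd(data, epochs, mini_batch_size, eta, seed=0):
    """Fit y = w*x + b by mini-batch SGD on a squared-error cost."""
    rng = random.Random(seed)
    data = list(data)  # copy so the caller's list is not shuffled
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), mini_batch_size):
            batch = data[start:start + mini_batch_size]
            # Gradient of 1/(2m) * sum((w*x + b - y)**2) over the mini-batch.
            grad_w = sum((w * x + b - y) * x for x, y in batch) / len(batch)
            grad_b = sum((w * x + b - y) for x, y in batch) / len(batch)
            # Store the improvement back into the model parameters.
            w -= eta * grad_w
            b -= eta * grad_b
    return w, b

# Synthetic, noise-free points on the line y = 3x + 1 (made up for illustration).
points = [(x / 10.0, 3.0 * (x / 10.0) + 1.0) for x in range(-10, 11)]
w, b = sgd(points, epochs=200, mini_batch_size=5, eta=0.1)
print(round(w, 2), round(b, 2))
```

The arguments mirror the knobs discussed in this chapter: number of epochs, mini batch size, and learning rate (eta).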

I downloaded the code for his Handwriting Recognition example. By reading the README.txt file (most people ignore this and waste time learning the hard way), I immediately figured out that the code was not written for Python3. Fortunately, someone else had ported the code to Python3, and this was in another GitHub repository.

In running the code from the Python3 repository, I immediately ran into an error with a dependent library called Theano. This error was listed here: Fix for No Section Blas error 

So I was fortunate. This fix worked for me. And we got a successful run! In this example, the accuracy in recognizing handwritten digits was an astounding 99.1%!

Handwriting Recognition Example

This code uses the 60K-sample training set from MNIST. By default, the code splits it into 50K training samples and 10K validation samples.

So now that we know that we can run the code successfully, what might we want to do next on this code, in this chapter, before moving on to the next chapter?

  • Experiment with the number of Epochs
  • Experiment with the Learning Rate
  • Experiment with the Mini Batch Size
  • Experiment with the number of hidden neurons

Here is a table that shows some results from playing with the epochs and mini batch sizes:

test.py

Run 1: Epochs=30, Mini Batch Size=10, Results: 8700/10,000 at its peak in epoch 26

Run 2: Epochs=40, Mini Batch Size=10, Results: 9497/10,000 at its peak in epoch 36

Run 3: Epochs=40, Mini Batch Size=20, Results: 9496/10,000 at its peak in epoch 36

So interestingly, the numbers go up each epoch until a peak is reached, and then settle back down for the last 2-3 epochs in these trial runs. From these three runs, it appears that the number of epochs is a more important hyperparameter than the mini batch size, though three runs is a small sample to draw conclusions from.

Another test with adjusted learning rates could be interesting to run, as this impacts Gradient Descent quite a bit. Gradient Descent is a Calculus-based technique for finding a minimum. When the learning rate is too low, it takes a long time for the "ball to drop to the bottom" of said low point; when it is too high, the ball can overshoot the bottom and never settle there.
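The effect of the learning rate is easy to see on a toy cost function. This is a minimal sketch, assuming the cost is simply f(x) = x**2 (so the gradient is 2x); the starting point, step count, and the three learning rates are arbitrary choices for illustration.

```python
def gradient_descent(eta, steps=50, x0=5.0):
    """Minimize f(x) = x**2 (gradient 2x) starting from x0 with learning rate eta."""
    x = x0
    for _ in range(steps):
        x -= eta * 2.0 * x
    return x

# Too low, reasonable, and too high learning rates (illustrative values).
for eta in (0.001, 0.1, 1.1):
    print(eta, gradient_descent(eta))
```

With the lowest rate the ball has barely moved from its starting point after 50 steps, with the middle rate it ends up very close to the minimum at 0, and with the highest rate each step overshoots and the value grows instead of shrinking.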

He also, at the very end, mentions the use of more/different algorithms, referencing the Support Vector Machine (SVM).


Artificial Intelligence - LSTM for Financial Modeling - Part II

As Part II of my foray into Stock Portfolio Optimization, I decided to implement Python code that I found in a blog: https://www.datacamp.com/tutorial/lstm-python-stock-market

This blog was written by a Data Scientist, Thushan Ganegedara, and he clearly spent some time on it. I was super excited after I read the blog to go in, compile the code, and run it myself. The first thing I did was get myself an API Key so that I could fetch data for a stock symbol.

But - I had issue after issue in running this code. 

First, the code was sufficiently outdated that many libraries, especially Keras, had "moved on" and completely deprecated certain libraries, calls, etc. I spent considerable time picking through the code, doing web searches, and trying to "just get the code to run". 

Second, the code was using hard-coded array values for the number of data points for the stock symbol. Because of this, I ran into null exceptions: my stock symbol fetch returned more than 2K fewer data points than the example had. To get around this, I wound up going with 4000 training data points and 600 testing data points. Later, I enhanced the code to set the training and testing sizes dynamically, on a percentage basis, based on how many data points came down in the csv.
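The percentage-based split can be sketched in a few lines. This is an illustration of the idea rather than the code I actually ended up with; the function name, the 0.75 fraction, and the stand-in price list are all assumptions.

```python
def split_by_fraction(prices, train_fraction=0.75):
    """Split a chronological series into training and testing parts by fraction."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    split_index = int(len(prices) * train_fraction)
    # Keep time order: the earliest data trains, the most recent data tests.
    return prices[:split_index], prices[split_index:]

# Stand-in for however many rows the csv happens to contain.
prices = [100.0 + 0.5 * i for i in range(4600)]
train, test = split_by_fraction(prices, train_fraction=0.75)
print(len(train), len(test))
```

Because the sizes are derived from len(prices), the same code works no matter how many data points the fetch returns.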

I also decreased the window size and the batch size.

In the end, I was able to produce nice graphs that matched the ones in his blog for the Predictions based on Simple Average and Exponential Moving Average. But these graphs are based on plain algorithms, not a neural network - in other words, they aren't "true AI".
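For reference, here is a minimal sketch of what these two non-neural predictors do, assuming one-step-ahead prediction. The function names, the window size, and the decay value are my own illustrative choices and are not copied from the tutorial's code.

```python
def simple_average_predictions(prices, window):
    """Predict each price as the mean of the `window` prices just before it."""
    return [sum(prices[i - window:i]) / window
            for i in range(window, len(prices))]

def ema_predictions(prices, decay):
    """One-step-ahead predictions from an exponential moving average.

    running = decay * running + (1 - decay) * price, and the running value
    after seeing prices[i] is used as the prediction for prices[i + 1].
    """
    predictions = []
    running = prices[0]
    for price in prices[:-1]:
        running = decay * running + (1.0 - decay) * price
        predictions.append(running)
    return predictions

prices = [10.0, 11.0, 12.0, 11.5, 13.0, 12.5]  # made-up closing prices
print(simple_average_predictions(prices, window=3))
print(ema_predictions(prices, decay=0.5))
```

Neither function learns anything; each just smooths recent prices, which is why the resulting graphs track the actuals closely for a single step ahead.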

The AI portions of the code did run to completion, but the graphs that came out of the AI implementation did not look correct. The price predictions were FAR FAR lower than the actuals, as well as the ones from the Simple and Exponential Moving Average graphs.

I may come back and take another look at this, but it was a good exercise, and this exercise made me realize that I need to read and learn more about AI - and what is going on "under the hood".

Tuesday, January 9, 2024

Artificial Intelligence - LSTM for Financial Modeling - Part I

I decided that it might be fun and interesting to see how AI might be used to optimize a stock portfolio.

Surely there is a ton of work and effort in the context of financial markets, right?

My search took me to an IEEE published research paper from 3 fellows who worked together at the Department of Data Science in Praxis Business School, based in Kolkata, India. 

Below is the link to this paper, which is a PDF.

Stock Portfolio Optimization Using a Deep Learning LSTM Model

In reading the Abstract of this paper, I can see that the Long Short-Term Memory Model is used. I was interested in the source code for this project, but couldn't figure out where it was located, so I decided instead to read up on LSTM.

Reading up on LSTM, I started to realize that LSTM was a preferred model for most Financial AI. 

Learning about LSTM requires a foundation and subsequent building blocks of knowledge, such as the topic of Recurrent Neural Networks (for starters). You have to start wading into the pool, with a level of comfort and confidence as the sub-topics materialize.

The first article I read on LSTM is found here.

Understanding-LSTMs

I like this blog article, because it addresses the "Core Idea on LSTMs", and a "Step by Step LSTM Walkthrough". I couldn't help but notice that these models look, essentially, like State Transition Diagrams to me. State Transition is a key part of Artificial Intelligence, I am realizing. And the diagrams start to look very electronic. Check out an electrical circuit diagram full of Transistors and Logic Gates, and you will see the resemblance.
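The state-transition flavor shows up clearly when one time step is written out as code. Below is a minimal sketch of a single LSTM step using the standard gate equations, simplified to scalar values so it runs with nothing but the standard library. The parameter layout, gate names, and weight values are hypothetical choices for illustration, not taken from the article or from any library.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step with scalar input, hidden state, and cell state.

    params maps each gate name to a tuple (w_x, w_h, b); hypothetical layout.
    """
    def pre(gate):
        w_x, w_h, b = params[gate]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("forget"))             # how much of the old cell state to keep
    i = sigmoid(pre("input"))              # how much of the candidate to let in
    c_tilde = math.tanh(pre("candidate"))  # candidate value, between -1 and 1
    o = sigmoid(pre("output"))             # how much of the cell state to expose
    c = f * c_prev + i * c_tilde           # new cell state
    h = o * math.tanh(c)                   # new hidden state
    return h, c

# Arbitrary illustrative weights; a trained model would learn these.
gates = ("forget", "input", "candidate", "output")
params = {gate: (0.5, 0.5, 0.0) for gate in gates}
h, c = 0.0, 0.0
for x in (1.0, 0.5, -1.0):
    h, c = lstm_step(x, h, c, params)
    print(round(h, 4), round(c, 4))
```

Each call takes the previous state (h, c) plus a new input and returns the next state, which is exactly the state-transition pattern the diagrams suggest.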

While this article was very helpful from a big-picture conceptual perspective, I got very confused by the fact that the diagrams showed both a "tanh" function and a "sigmoid" function. The symbol for sigmoid, I was familiar with. But the tanh left me scrambling to figure out what that was all about (it helps to be a math geek when you are delving into Artificial Intelligence). Here is a snippet of the diagram that sent me down this road of investigation:

tanh vs sigmoid used in activation functions

Here is an article I found that allowed me to understand what "tanh" is: sigmoid-vs-tanh-functions
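A quick numeric comparison helped me here. The sketch below prints both functions for a few inputs and checks the identity tanh(z) = 2*sigmoid(2z) - 1, which shows that tanh is a rescaled sigmoid whose output runs from -1 to 1 instead of 0 to 1. The sample inputs are arbitrary.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

for z in (-2.0, 0.0, 2.0):
    print(z, round(sigmoid(z), 4), round(math.tanh(z), 4))
    # tanh is a shifted and stretched sigmoid.
    assert abs(math.tanh(z) - (2.0 * sigmoid(2.0 * z) - 1.0)) < 1e-12
```

In the LSTM diagrams, the sigmoid layers act as gates (a value between 0 and 1 decides how much gets through), while the tanh layers produce the values themselves, which can be negative or positive.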

From this blog article, I went and read a second paper on LSTM Models written by Akhter Rather, found at the following url:

LSTM-based Deep Learning Model for Stock Prediction and Predictive Optimization Model

So from here, I decided I wanted to see this LSTM Model in action. I couldn't find the source code from the Kolkata publication, but I felt there were probably more of these models around, and I decided to look for one. My next blog post will cover that.
