Tuesday, January 30, 2024

AI - Neural Networks and Deep Learning - Nielsen - Chap 1 Introduction

I came across this book entitled Neural Networks and Deep Learning, by Michael Nielsen, and decided to invest some time in reading it. I am hoping to reinforce what I have read thus far (for instance in the book AI Crash Course by Hadelin de Ponteves).

The first chapter in this book covers recognition of hand-written digits. This is a pretty cool use case for AI technology, given that everyone's individual handwriting constitutes a font in and of itself, and even then, there is a certain degree of variation every time an individual hand-writes characters. Who uses technology like this? Well, banks for example might use something like this as part of check processing!

I like the fact that this example is optical in nature - and, it could even be inverted in a Generative AI manner to generate new fonts, or new characters in some context or use case!

One of the first things this book covered was Perceptrons - and it explained how Perceptrons produce essentially binary output (1s and 0s) from their activation functions. He then moves on to explain that binary results are not always in demand or optimal, and that continuous values (shades of gray) between 0 and 1 are often desired or necessary. And this is why we have Sigmoid Neurons.
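
To make the distinction concrete, here is a minimal sketch of my own (not from the book) of a perceptron versus a sigmoid neuron acting on the same inputs; the weights, bias, and inputs are made up purely for illustration.

import numpy as np

def perceptron(x, w, b):
    # Perceptron: output is a hard 0 or 1, depending on whether w.x + b crosses zero
    return 1 if np.dot(w, x) + b > 0 else 0

def sigmoid_neuron(x, w, b):
    # Sigmoid neuron: output is a smooth value between 0 and 1 ("shades of gray")
    return 1.0 / (1.0 + np.exp(-(np.dot(w, x) + b)))

x = np.array([0.7, 0.2])        # made-up inputs
w = np.array([0.5, -1.0])       # made-up weights
b = 0.1                         # made-up bias

print(perceptron(x, w, b))      # 1
print(sigmoid_neuron(x, w, b))  # roughly 0.56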

This was insightful, because the first book I read (unless I overlooked it) never even mentioned the concept of Perceptrons, and jumped right into Sigmoid Neurons.

Nielsen then goes on to explain, briefly, the architecture of Neural Networks and embarks on a practical example using the Handwritten Digits Recognition use case.

One thing that was helpful in this chapter was the way he explains how the hidden layers of a neural network "fit together" by use of a practical example. This helps when it comes to designing and modeling a Neural Network.

He goes on to discuss the purpose of using biases along with weights, and then goes into a deeper discussion on gradient descent - and stochastic gradient descent. Stochastic Gradient Descent is critical for learning because it is what minimizes the cost function. Minimizing the cost does no good, though, unless that information flows back into the model, so backpropagation is used to work out how each weight and bias should change to reduce the cost.
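
As a rough sketch of my own (a simplification, not Nielsen's actual code), one stochastic gradient descent step takes the gradients that backpropagation computes for a mini-batch and nudges every weight and bias a small step downhill on the cost, scaled by the learning rate eta:

# One stochastic gradient descent update for a single mini-batch (simplified sketch).
# grad_w and grad_b stand in for the gradients backpropagation would return.
def sgd_update(weights, biases, grad_w, grad_b, eta, batch_size):
    weights = [w - (eta / batch_size) * gw for w, gw in zip(weights, grad_w)]
    biases = [b - (eta / batch_size) * gb for b, gb in zip(biases, grad_b)]
    return weights, biases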

I downloaded the code for his Handwriting Recognition example. I immediately figured out by reading the README.txt file (most people ignore this and waste time learning the harder way) that the code was not written for Python 3. Fortunately, someone else had ported the code to Python 3 - and that port lives in another GitHub repository.

In running the code from the Python 3 repository, I immediately ran into an error with a dependent library called Theano. The fix for this error is described here: Fix for No Section Blas error
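
For reference, the "No Section: blas" class of Theano error is typically addressed by adding a [blas] section to the ~/.theanorc configuration file. The sketch below shows the general shape of that change; the exact ldflags value (if any) is platform-dependent, so follow the linked page for specifics.

# ~/.theanorc (sketch - exact contents are platform-dependent)
[blas]
ldflags =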

So I was fortunate. This fix worked for me. And we got a successful run! In this example, the accuracy in recognizing handwritten digits was an astounding 99.1%!

Handwriting Recognition Example

This code uses the MNIST data set of 60,000 training samples. The code by default splits this into 50,000 training samples and 10,000 validation samples (accuracy is then measured against MNIST's separate 10,000-image test set).
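
For reference, here is roughly how the chapter 1 code gets loaded and run. This is a sketch from memory, assuming the Python 3 port keeps the original module and function names (mnist_loader and network), so adjust to whatever the repository actually provides.

import mnist_loader
import network

# Load MNIST: 50,000 training images, 10,000 validation, 10,000 test
training_data, validation_data, test_data = mnist_loader.load_data_wrapper()

# 784 input neurons (28x28 pixel images), 30 hidden neurons, 10 output digits
net = network.Network([784, 30, 10])

# 30 epochs, mini-batch size of 10, learning rate of 3.0
net.SGD(training_data, 30, 10, 3.0, test_data=test_data)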

So now that we know that we can run the code successfully, what might we want to do next on this code, in this chapter, before moving on to the next chapter?

  • Experiment with the number of Epochs
  • Experiment with the Learning Rate
  • Experiment with the Mini Batch Size
  • Experiment with the number of hidden neurons

Here is a table that shows some results from playing with the epochs and mini batch sizes:

test.py

Run 1: Epochs=30, Mini Batch Size=10, Results: 8700/10,000 at its peak in epoch 26

Run 2: Epochs=40, Mini Batch Size=10, Results: 9497/10,000 at its peak in epoch 36

Run 3: Epochs=40, Mini Batch Size=20, Results: 9496/10,000 at its peak in epoch 36

So interestingly, the accuracy goes up each epoch until a peak is reached, and then it settles back down slightly for the last 2-3 epochs in these trial runs. It appears that the number of epochs is a more important hyperparameter than the mini batch size.

Another test with adjusted learning rates could be interesting to run, as the learning rate impacts Gradient Descent quite a bit. Gradient Descent is a calculus-based method for finding a minimum, and when the learning rate is too low or too high, it affects how long it takes (and whether it is even possible) for the "ball to drop to the bottom" of that low point.
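
To see the "ball dropping to the bottom" intuition in a few lines, here is a toy sketch of my own (not from the book) of plain gradient descent on the one-dimensional function f(x) = x^2, whose minimum sits at x = 0. A tiny learning rate crawls toward the bottom, a moderate one gets there quickly, and an overly large one overshoots and diverges.

def gradient_descent(eta, steps=25, x=5.0):
    # The gradient of x^2 is 2x, so each step moves x against the gradient, scaled by eta
    for _ in range(steps):
        x = x - eta * 2 * x
    return x

print(gradient_descent(0.01))   # too small: still around 3.0 after 25 steps
print(gradient_descent(0.1))    # reasonable: very close to 0
print(gradient_descent(1.1))    # too large: each step overshoots, and x blows up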

He also, at the very end, mentions the use of more/different algorithms, referencing the Support Vector Machine (SVM).


Artificial Intelligence - LSTM for Financial Modeling - Part II

As Part II of my foray into Stock Portfolio Optimization, I decided to implement Python code that I found in a blog: https://www.datacamp.com/tutorial/lstm-python-stock-market

This blog was written by a data scientist, Thushan Ganegedara, and he clearly spent some time on it. I was super excited after reading the blog to go in, compile the code, and run it myself. The first thing I did was get myself an API key so that I could fetch data for a stock symbol.

But - I had issue after issue in running this code. 

First, the code was sufficiently outdated that many libraries, especially Keras, had "moved on" and completely deprecated certain modules, calls, etc. I spent considerable time picking through the code, doing web searches, and trying to "just get the code to run".

Second, the code used hard-coded array sizes for the number of data points for the stock symbol, and because of this I ran into exceptions, because the number of data points in my stock symbol fetch was over 2,000 less than the example had. To get around this, I initially went with 4,000 training data points and 600 testing data points. Later, I enhanced the code to dynamically set the training and testing sizes, on a percentage basis, based on how many data points came down in the CSV.
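
The dynamic sizing amounts to something like the sketch below - a simplified illustration with made-up names (split_ratio, mid_prices), not the tutorial's exact code. It assumes the fetched CSV has been read into a pandas DataFrame with High and Low columns.

import pandas as pd

df = pd.read_csv("stock_data.csv")                     # hypothetical CSV from the API fetch
mid_prices = ((df['High'] + df['Low']) / 2.0).values   # mid price per day (average of High and Low)

split_ratio = 0.85                                     # illustrative: 85% train, 15% test
split_index = int(len(mid_prices) * split_ratio)
train_data = mid_prices[:split_index]
test_data = mid_prices[split_index:]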

I also decreased the window size and the batch size as well.

In the end, I was able to produce nice graphs that matched the ones in his blog for the predictions based on the Simple Average and the Exponential Moving Average. But these graphs are based on simple averaging algorithms, not a neural network - in other words, they aren't "true AI".
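
For context, those two baselines are easy to reproduce with pandas alone. This is my own sketch, not the tutorial's exact method, assuming a DataFrame with a 'Close' column and an illustrative 100-day window.

import pandas as pd

df = pd.read_csv("stock_data.csv")                           # hypothetical CSV of daily prices
df['sma'] = df['Close'].rolling(window=100).mean()           # simple moving average over 100 days
df['ema'] = df['Close'].ewm(span=100, adjust=False).mean()   # exponential moving average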

The AI portions of the code did run to completion, but the graphs that came out of the AI model did not look correct. The price predictions were FAR, FAR lower than the actuals, and also far lower than the Simple and Exponential Moving Average predictions.

I may come back and take another look at this, but it was a good exercise, and this exercise made me realize that I need to read and learn more about AI - and what is going on "under the hood".

Tuesday, January 9, 2024

Artificial Intelligence - LSTM for Financial Modeling - Part I

I decided that it might be fun and interesting to see how AI might be used to optimize a stock portfolio.

Surely there is a ton of work and effort in the context of financial markets, right?

My search took me to an IEEE published research paper from 3 fellows who worked together at the Department of Data Science in Praxis Business School, based in Kolkata, India. 

Below is the link to this paper, which is a PDF.

Stock Portfolio Optimization Using a Deep Learning LSTM Model

In reading the Abstract of this paper, I can see that the Long Short-Term Memory Model is used. I was interested in the source code for this project, but couldn't figure out where it was located, so I decided instead to read up on LSTM.

Reading up on LSTM, I started to realize that LSTM is a preferred model for much of the AI work done on financial time series.

Learning about LSTM requires a foundation and subsequent building blocks of knowledge, such as Recurrent Neural Networks (for starters). You have to wade into the pool gradually, building comfort and confidence as the sub-topics materialize.

The first link I read on LSTM is found here.

Understanding-LSTMs

I like this blog article, because it addresses the "Core Idea on LSTMs" and gives a "Step by Step LSTM Walkthrough". I couldn't help but notice that these models look, essentially, like State Transition Diagrams to me. State transitions, I am realizing, are a key part of Artificial Intelligence. And the diagrams start to look very electronic - check out an electrical circuit diagram full of Transistors and Logic Gates, and you will see the resemblance.

While this article was very helpful from a big-picture conceptual perspective, I got very confused by the fact that the diagrams showed both a "tanh" function and a "sigmoid" function. The symbol for sigmoid, I was familiar with. But the tanh left me scrambling to figure out what that was all about (it helps to be a math geek when you are delving into Artificial Intelligence). Here is a snippet of the diagram that sent me down this road of investigation:

tanh vs sigmoid used in activation functions

Here is an article I found that allowed me to understand what "tanh" is: sigmoid-vs-tanh-functions
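
The short version of what I learned: both are S-shaped "squashing" functions, but sigmoid squashes its input into the range (0, 1) while tanh squashes it into (-1, 1), which is why LSTM diagrams use them in different places. A quick sketch to see it numerically:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # output always between 0 and 1

for v in (-5.0, 0.0, 5.0):
    print(v, sigmoid(v), np.tanh(v))  # tanh output always between -1 and 1

# -5.0 -> sigmoid ~0.007, tanh ~-0.9999
#  0.0 -> sigmoid  0.5,   tanh  0.0
#  5.0 -> sigmoid ~0.993, tanh ~0.9999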

From this blog article, I went and read a second paper on LSTM models, written by Akhter Rather, found at the following URL:

LSTM-based Deep Learning Model for Stock Prediction and Predictive Optimization Model

So from here, I decided I wanted to see this LSTM Model in action. I couldn't find the source code from the Kolkata publication, but I felt there were probably more of these models around, and I decided to look for one. My next blog post will cover that.

Monday, November 27, 2023

Quantum Computing - Is it worth an investment of my time to understand?

I recently watched a presentation on Quantum Computing by a guy who looks to me like he is following it as a fascination or hobby. After watching the presentation, I did some quick searching on Quantum Computing to see if there were things I was specifically looking for that weren't covered in the presentation - especially along practical lines.

I found this site, which essentially corroborated his presentation:

https://www.explainthatstuff.com/quantum-computing.html

Absolute KUDOS to the author of this site, because he explains an advanced topic in a simple, accessible way, and discusses many of the top-of-the-head questions one might have about Quantum Computing. If you can stay patient and get down to the bottom, he shows a patent diagram of a Quantum Computing architecture, which is super interesting.

https://cdn4.explainthatstuff.com/monroe-kim-quantum-computer.png

He also makes this statement, which is very interesting:

"Does that mean quantum computers are better than conventional ones? Not exactly. Apart from Shor's algorithm, and a search method called Grover's algorithm, hardly any other algorithms have been discovered that would be better performed by quantum methods."

I remembered reading a blurb about Einstein some years back, and some of his comments about "Spooky Action At a Distance", where an electron 'way over here' would seem to be inextricably and inexplicably linked to another electron "way over there". And while even today we don't seem to have a full or proven explanation of why that behavior happens, we are apparently finding it reliable enough to exploit for purposes of Quantum Computing (the key concept here is Entanglement). Through careful manipulation of superposed states (see Schrödinger's Cat), Entanglement unlocks massively parallel computation. I won't even attempt to go deeper than this in this post.

Now...why do we want atomic-level computing, with entangled states and the whole bit?

The main Use Case for this level of expense and sophistication is cryptography - the ability to break ciphers. To be precise, it is public-key schemes like RSA that Shor's algorithm threatens; a symmetric cipher like 256-bit AES is weakened (Grover's algorithm roughly halves its effective key strength) but is not rendered moot.

But I wanted to see if there were others, and found this site, which kind of shows "who is doing what" with regards to Quantum Computing.

Quantum Computing Applications

I think between these links here, you can be brought up to speed on what Quantum Computing is, why it is a Thing, and some potential uses for it. Which is what most of us essentially want at this point.

Thursday, November 16, 2023

Artificial Intelligence Book 1 - Crash Course in AI - Chapter 13 - Memory Patch

Okay, last chapter in the book!

In this chapter, you get to "create" (actually, "create" means download and run) some GitHub-hosted code that allows you to train a model to learn how to play the video game "Snake".

Snake is an early video game, probably from the 1970s or 1980s. I don't know the details of it, but I am sure there is plenty of history on it. I think you could run it on those Radio Shack Tandy TRS-80 computers that had a few kilobytes of RAM and saved programs to a magnetic cassette tape (I remember you could play Pong on those, and I think Snake was one of the games also).

The idea was that each time the snake ate an apple (red square) the snake's length would increase (by one square). You could move up, down, left, right constrained by coordinate boundaries, and if the snake overlapped with itself, it died and the game ended.

Snake Video Game

When I first ran the model training for this, it ran for more than a day - perhaps all weekend, and then died. The command prompt, when I returned to check on progress, had a [ Killed ] message.

I had other models in this book die this way, and decided that I was running out of memory. My solution for those other models was to edit the source code, decrease the number of Epochs, and reduce the loop complexity. This made the models a LOT less efficient and reliable, but I still saw beneficial results from running them with this tactic.

In this case, for some reason, I went to GitHub and looked at the Issues, and I saw a guy complaining about a memory leak in the TensorFlow libraries. There was a patch to fix this!

Below is the output of a Unix/Linux "diff" command, which shows this patch:

% diff train.py train.py.memoryleak
5d4
< import tensorflow as tf
12,15d10
< import gc
< import os
< import keras
<
64,67c59
<             #qvalues = model.predict(currentState)[0]
<             qvalues = model.predict(tf.convert_to_tensor(currentState))[0]
<             gc.collect()
<             keras.backend.clear_session()
---
>             qvalues = model.predict(currentState)[0]

So in summary, the patches are:

  • The original statement qvalues = model.predict(currentState)[0] is replaced by: 
    • qvalues = model.predict(tf.convert_to_tensor(currentState))[0]
  • There is also a garbage collection call, gc.collect(), that is added as part of the patch.
  • A Keras backend call, keras.backend.clear_session(), has been added as well.

Of course some imports are necessary to reference and use these new calls. 

This fixes the memory problem. It does not appear that the training will ever end on its own when you run this code. You have to Ctrl-C it to get it to stop, because it just trains and trains, looking for a better score and more apples. I had to learn this the hard way after running train.py for a full weekend.

So this wraps up the book for me. I may do some review on it, and will likely move on to some new code samples and other books.

Friday, October 20, 2023

Artificial Intelligence Book 1 - Using Images in AI - Deep Convolutional Q Learning

I just finished Chapter 12 in my AI Crash Course book, from Hadelin de Ponteves.

Chapter 12 is a short chapter, actually. It explains, in a refreshingly and surprisingly simple way, the concept of Deep Convolutional Q-Learning, which pertains to how image recognition/translation is fed into a Deep Q Neural Network (from prior chapters in the book).

The chapter covers four steps:

  1. Convolution - applying feature detectors to an image
  2. Max Pooling - simplifying the data
  3. Flattening - taking all of the results of #1 and #2 and putting them into a one-dimensional array
  4. Full Connection - feeding the one-dimensional array as Inputs into the Deep Q Learning model
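
To make those four steps concrete, here is a minimal Keras-style sketch of my own (not the book's code) with one convolution layer, one max-pooling layer, a flatten step, and fully connected layers at the end. The input shape and layer sizes are purely illustrative.

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(64, 64, 1)),  # 1. Convolution
    layers.MaxPooling2D(pool_size=(2, 2)),                                  # 2. Max Pooling
    layers.Flatten(),                                                       # 3. Flattening
    layers.Dense(64, activation='relu'),                                    # 4. Full Connection (hidden layer)
    layers.Dense(4, activation='linear')                                    # outputs, e.g. Q-values for 4 actions
])
model.compile(optimizer='adam', loss='mse')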

I probably don't need to go over all of these details in this blog as that would be redundant. 

If you have some exposure to Computing and are familiar with Bitmaps, I think this process shares some conceptual similarity with Bitmaps.

For example, in Step 1 - Convolution - you are essentially sliding a feature detector or "filter" (e.g. 3x3 in the book) over an image - starting on Row 1, and sliding it left to right one column at a time before dropping down to Row 2 and repeating the process. At each position, you multiply each square of the image covered by the filter (again a 3x3 area in the book) by the corresponding value of the filter and sum the results. Each position produces one value, and a full sweep of the filter across the image produces a single Feature Map.

In sliding this 3x3 filter across a 7x7 image, you can only slide 5 positions to the right before you run out of real estate, and likewise you can only drop down 5 times before you run out of room in that direction. So with a 7x7 image and a 3x3 filter you end up with a 5 x 5 = 25-value Feature Map (one Feature Map per filter, not 25 of them).
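
Here is a tiny NumPy sketch of my own that slides a 3x3 filter over a 7x7 image one position at a time; the output is a single 5x5 feature map containing 25 values.

import numpy as np

image = np.random.rand(7, 7)     # toy 7x7 image
filt = np.random.rand(3, 3)      # toy 3x3 feature detector ("filter")

feature_map = np.zeros((5, 5))   # 7 - 3 + 1 = 5 positions in each direction
for row in range(5):
    for col in range(5):
        patch = image[row:row + 3, col:col + 3]
        feature_map[row, col] = np.sum(patch * filt)   # multiply element-wise, then sum

print(feature_map.shape)         # (5, 5) -> one feature map, 25 values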

Pooling is similar in mechanics to producing a feature map with a filter. The aim is to reduce the size and complexity of all of those feature maps. The sliding process is the main difference: instead of moving one column/row on each slide, you move by the full size of the pooling window (2x2 in the book), keeping just the maximum value from each window (for Max Pooling).

Once you have all of the pooled results, they are flattened out into a one-dimensional array and fed into the Inputs of the standard Deep Q Learning model, with the outputs pertaining to image recognition.

This diagram shows how all of this fits together, with a nice comparison between biological image recognition and this AI image recognition process.


Image Recognition - Biological vs AI

Source: frontiersin.org

Now, in Chapter 12 of the book, the process matches what we see above. There is just a single Convolutional Layer and Pooling Layer before the Neural Network's hidden layers are engaged.

Chapter 12 does not cover the fact that, in deeper architectures, the Convolutional stage is repeated - Convolution followed by Sub-Sampling (pooling), stacked iteratively - before the fully connected layers.

This diagram below represents this.


In the next chapter, there is a code sample, so I will be able to see whether it includes this sub-sampling process or not.


Deep Q Learning - Neural Networks - Training the Model Takes Resources

I am now starting to see why those companies with deep pockets have an unfair advantage in the not-so-level playing field of adopting AI. Resources.

It takes a LOT of energy and computing resources to train these Artificial Intelligence models.

In Chapter 11 of AI Crash Course (by Hadelin de Ponteves), I did the work. I downloaded, inspected, and ran the examples, which are based on Google DeepMind's data center cooling work. The idea is to use an AI to control server temperature, and compare this with an "internal" (no AI) temperature manager.

You train the model first, and that training produces a model.h5 file; that saved model is then what gets used when you run the testing phase.
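
In Keras terms, that workflow boils down to a save on the training side and a load on the testing side. The sketch below uses standard Keras calls; the tiny stand-in model is illustrative, not the book's actual architecture.

from tensorflow import keras
from tensorflow.keras import layers

# Training phase (stand-in model): build, train, then persist everything to model.h5
model = keras.Sequential([
    layers.Dense(8, activation='relu', input_shape=(3,)),
    layers.Dense(1)
])
model.compile(optimizer='adam', loss='mse')
model.save("model.h5")

# Testing phase: load the trained model back in and use it for predictions
trained_model = keras.models.load_model("model.h5")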

The problem, though, is that on my rather powerful MacBook Pro laptop, the training would never finish. I would return HOURS later, only to see [ killed ] on the command prompt. The OS was apparently running out of resources (memory, most likely).

So I started tinkering with the code.

First, I reduced the number of epochs (from 25 to 10). 

#number_epochs = 25  

number_epochs = 10

Which looked like it helped, but ultimately didn't work.

Then, I reduced the number of times the training loops would run. When I looked at the original code, the number of iterations was enormous.

# STARTING THE LOOP OVER ALL THE TIMESTEPS (1 Timestep = 1 Minute) IN ONE EPOCH

while ((not game_over) and timestep <= 5 * 30 * 24 * 60):

This is 216,000 loop iterations in the inner loop, and of course this needs to be considered in the context of the outer loop (25 Epochs, or 10 after my adjustment). So 216,000 * 25 = 5,400,000 iterations. If we reduce the number of Epochs to 10, we are still dealing with 2,160,000.

I don't know how much memory (heap) gets used over that many iterations, but on a consumer machine you are probably going to tax it pretty hard (remember the machine also has to run the OS and whatever other tasks happen to be running).

I was FINALLY able to get this to run by reducing the number of Epochs to 10, and reducing the steps to 5 * 30 * 24 (3,600). And even with this drastic reduction, you could see the benefits the AI provided over the non-AI temperature control mechanism.
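
In code, that second change amounts to the same kind of edit as the epoch reduction above, just applied to the inner loop's bound:

#while ((not game_over) and timestep <= 5 * 30 * 24 * 60):

while ((not game_over) and timestep <= 5 * 30 * 24):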
