Friday, February 23, 2024

AI - Neural Networks and Deep Learning - Nielsen - Chap 3 - Learning Improvement

As if Chapter 2 wasn't heavy enough, I moved on to Chapter 3, which introduced some great concepts that I will mention here in this blog post. But I couldn't follow the details very well, again largely due to the heavy mathematical expressions and notation used.

But I will reiterate what he covers in this chapter, and I think each of these topics will require its own "separate study", preferably presented in a simpler way and manner.

Chapter 3 discusses more efficient ways to learn.

It starts out discussing the Cost Function. 

In previous chapters, Nielsen uses the Quadratic Cost Function. But in Chapter 3, he introduces the Cross-Entropy Cost Function, and discusses how using it avoids learning slowdown. Unfortunately, I can't comment much further on this because, frankly, I got completely lost in the discussion.

He spends a GREAT DEAL of text discussing Cross-Entropy, including the fact that he uses different Learning Rates for the Quadratic Cost Function (0.15) versus the Cross-Entropy Cost Function (0.005) - and he explains that the rates being different doesn't matter, because the point is how the speed of learning changes, not the absolute speed of learning.
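To make the "learning slowdown" idea concrete for myself, here is a tiny single-neuron sketch (my own made-up numbers, not Nielsen's exact example) comparing the weight gradient under the two cost functions when the neuron is badly wrong and saturated:

```python
import math

# One sigmoid neuron with a single input x = 1 and target y = 0,
# initialized so the neuron is badly wrong AND saturated (a ≈ 1).
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x, y = 1.0, 0.0
w, b = 6.0, 0.0          # large weight -> z = 6, a = sigmoid(6) ≈ 0.998
z = w * x + b
a = sigmoid(z)

# Quadratic cost C = (a - y)^2 / 2  ->  dC/dw = (a - y) * sigmoid'(z) * x
sigmoid_prime = a * (1 - a)
grad_quadratic = (a - y) * sigmoid_prime * x

# Cross-entropy cost -> dC/dw = (a - y) * x   (the sigmoid'(z) factor cancels)
grad_cross_entropy = (a - y) * x

print(grad_quadratic)      # tiny: the saturated neuron barely learns
print(grad_cross_entropy)  # large: learning speed tracks the size of the error
```

The quadratic gradient carries a sigmoid'(z) factor that is nearly zero when the neuron saturates, while the cross-entropy gradient is just (a - y) * x - which, as I understand it, is why cross-entropy avoids the slowdown.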

After a while, he mentions an alternative to Cross-Entropy called Softmax. Now, this term seemed familiar. Checking back, I found that Softmax was used in the first book I read, AI Crash Course by Hadelin de Ponteves. I remembered both Softmax and Argmax being mentioned.

Softmax replaces the activation on the output layer of neurons. So if you had 4 output neurons, the 4 outputs would all pass through Softmax. What Softmax does is return a Probability Distribution: the outputs all add up to 1, and if one output decreases, there must be a corresponding increase among the others. This could be useful, for example, in cases where the AI is guessing which animal type it is: Dog, Cat, Parrot, Snake. You might see higher probabilities assigned to Dog and Cat, and lower ones to Parrot and Snake.
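As a quick sanity check of that probability-distribution behavior, here is a minimal softmax in plain Python (the four animal scores are hypothetical values I made up):

```python
import math

# A minimal softmax over 4 raw output scores (logits) for the
# Dog / Cat / Parrot / Snake example above.
def softmax(logits):
    exps = [math.exp(z) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.5, 0.1, -0.5]        # Dog, Cat, Parrot, Snake (made up)
probs = softmax(scores)

print([round(p, 3) for p in probs])   # Dog and Cat dominate
print(sum(probs))                     # sums to 1.0 - a probability distribution
```

Raise one score and the others' probabilities must shrink, since the total is pinned at 1.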

Nielsen then goes on to discuss Overfitting (Overtraining) and Regularization, which is designed to combat Overfitting. He discusses four approaches to Regularization, which I won't echo here, as I will clearly need to consult other sources for simpler discussions, definitions, and examples of these.



Wednesday, February 21, 2024

AI - Neural Networks and Deep Learning - Nielsen - Chap 2 - Backpropagation

Backpropagation is the "secret sauce" of Neural Networks. And therefore, very important to understand.

Why? 

Because it is how Neural Networks adapt and, well, learn.  

Backpropagation is responsible, essentially, for updating the weights (and biases, too) after calculating the differences between predicted and actual results, so that over iterations of training the weights and biases are optimized and the cost is minimized.
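In symbols, the update described above is the standard gradient descent rule (standard notation, not necessarily Nielsen's exact symbols):

```latex
% Backpropagation supplies the partial derivatives of the cost C with
% respect to each weight w and bias b; \eta is the learning rate.
w \leftarrow w - \eta \frac{\partial C}{\partial w},
\qquad
b \leftarrow b - \eta \frac{\partial C}{\partial b}
```

Each training iteration nudges every weight and bias a small step in the direction that decreases the cost.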

Doing so is rather difficult and tedious, and requires an understanding of Mathematics at several levels (i.e. Linear Algebra, Calculus, and even exponential and hyperbolic functions if you truly want to understand Sigmoid and tanh).

In Chapter 2 of this book, I was initially tracking along, and was following the discussion on Notation (for weights and biases in Neural Network Nodes). But that was as far as I got before I got confused and stuck in the envelope of intimidating mathematical equations.

I was able to push through and read it, but found that I didn't understand it, and after several attempts to reinforce by re-reading several times, I had to hit the eject button and look elsewhere for a simpler discussion on Backpropagation.

This decision to eject and look elsewhere for answers paid huge dividends, and allowed me to come back and comment on why I found this chapter so difficult:

  1. His Cost function was unnecessarily complex
  2. He did not necessarily need to include biases when teaching the fundamentals of backpropagation
  3. The notational symbols introduced are head-spinning

In the end, I stopped reading this chapter, because I don't think trying to understand all of his notation is necessary to get the gist and essence of Backpropagation - even for getting some knowledge, at a mathematical level, of how it's calculated.

To give some credit where credit is due, this video from Mikael Lane helped me get a very good initial understanding of BackPropagation: Simplest Neural Network Backpropagation Example

Now, I did have a problem trying to understand where, at about 3:30 in the video, he comes up with ∂C/∂w = 1.5 * 2(a-y) = 4.5*w - 1.5. (Working backwards from the formula, it fits if the input is 1.5 and the target is y = 0.5: then a = 1.5w, and the chain rule gives 1.5 * 2(1.5w - 0.5) = 4.5w - 1.5.)

But aside from that, his example helped me understand because he removed the bias from the equation! You don't really need a bias! Nobody else I saw had simplified things this way, and it was extremely helpful. His explanation of how the Chain Rule of Differentiation is applied was also timely and helpful.
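Here is a quick check of that chain-rule gradient, assuming (my guess, working backwards from the final formula) an input of x = 1.5, a target of y = 0.5, and no bias:

```python
# Single neuron with no bias: a = w*x, cost C = (a - y)^2.
# Chain rule: dC/dw = 2(a - y) * x, which with x = 1.5, y = 0.5
# works out to 4.5*w - 1.5 (the formula from the video).
def cost(w, x=1.5, y=0.5):
    a = w * x
    return (a - y) ** 2

def analytic_grad(w):
    return 4.5 * w - 1.5

# Compare against a numerical (finite-difference) derivative at w = 0.8
w, h = 0.8, 1e-6
numeric = (cost(w + h) - cost(w - h)) / (2 * h)
print(analytic_grad(w), numeric)   # the two agree closely
```

Seeing the analytic formula match a brute-force numerical derivative is what finally convinced me the chain-rule step was right.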

NOTE: Mikael also has a 2nd follow-up video on the same topic: 

Another Simple Backpropagation Example

From there, I went and watched another video, which does use the bias, but walks you through backpropagation in a way that makes it easier to grasp and understand, even with the Calculus used.

Credit for this video goes to Bevan Smith, and the video link can be found here:

Back Propagation in training neural networks step by step 

Bevan gives a more thorough walk-through of calculations than the initial Mikael Lane video does. Both videos use the Least Squares method of Cost. 

The cost function at the final output, is: Cost = (Ypredicted - Yactual)²

The derivative of this is quite simple, which helps in understanding the examples: 2(Yp - Ya)

Nielsen, unfortunately, goes into none of this simple explanation, and chooses a Quadratic Cost Function that, for any newbie with rusty math skills, is downright scary to comprehend: 

Nielsen then goes on to cover 4 Equations of Backpropagation which, frankly, look PhD level to me, or at least as far as I am concerned. There is some initial fun in reading this, as though you are an NSA decoder trying to reverse engineer the Enigma (the German cipher machine used in WWII). But after a while, you throw your hands up. He even goes into some mathematical proofs of the equations (yikes). So this stuff is for very, very heavy "Math People".

At the end, he does provide some Python source code that you can run, which is cool - and it all appeared to work fine when I ran it.
 
 

C = \frac{1}{2} \|y - a^L\|^2 = \frac{1}{2} \sum_j \left( y_j - a^L_j \right)^2

Thursday, February 8, 2024

Linux Phones Are Mature - But US Carriers Won't Allow Them

Today I looked into the status of some of the Linux phones, which are mature now.

Librem is the one most people have heard of, but its price point is out of reach for most of the people daring enough to jump in the pool and start swimming with a Linux phone.

Pinephone looks like it has a pretty darn nice Linux phone now, but after watching a few reviews, it is pretty clear that you need to go with the Pinephone Pro, and put a fast(er) Linux OS on it. 

The main performance issue with these phones has to do with graphics rendering. If you are running the GNOME desktop, for example, the GUI is going to take up most of the cycles and resources that you want for your applications. I learned this years ago running Linux on desktop machines, and got into the habit of installing a lighter-weight KDE desktop to try and get some of my resources back under my control.

Today, I found a German phone that apparently is really gaining in popularity in Europe - especially Germany. It is called Volla Phone.  Super nice phone, and they have done some work selecting the hardware components and optimizing the Linux distro for you, so that you don't have to spend hours and hours tweaking, configuring, and putting different OS images on the phone to squeeze performance out of it.

Volla Phone - Linux Privacy Phone

 

Problem is - United States carriers don't allow these phones! They are not on the "Compatibility List". Now, I understand there may be an FCC cost to certifying devices on a cellular network (I have not verified this). The frequencies matter, of course, but the SIM cards also matter. Volla Phone will apparently work on T-Mobile, but only with an older SIM card; with a newer T-Mobile SIM it won't work, because of some fields that aren't exchanged (if I understand correctly).

Carriers that are in bed with Google and Apple, such as AT&T and Verizon, are going to do everything they can to keep a Linux BYOD (Bring Your Own Device) phone off their network. They make too much money off of Apple and Android. T-Mobile is German-owned, of course, so maybe they have a bit more of the European mindset. Those are your three nationwide networks in the United States, and all of the mom-and-pop cellular plays (i.e. Spectrum Mobile, Cricket, et al.) are just MVNOs riding on that infrastructure.

So if you have one of these Linux phones, you can use it in your home. On WiFi. But if you carry it outdoors, it's a brick apparently. Here we are in 2024, and that STILL seems to be the case.

Wednesday, February 7, 2024

ZWave Plus - First Alert Combo - CO and Fire

I thought an ideal use case for ZWave devices would be a Carbon Monoxide detector and/or Fire Detector that would send you an alert under certain conditions.

Well, sure enough, there is one. The First Alert Combo.

 

I bought one of these, and they were priced right - about $40.  I had a First Alert fire detector mounted in the spot for it, and the bracket was exactly the same as the new one (how nice is that?). So mounting it took 2 seconds.

The next step was to pair it to the ZWave Hub, and from experience, that can be a hassle with certain devices.  I was a little bit worried that they were marketing it for a Ring Hub, with a concern it might work for a Ring Hub but not for a more standard hub like a Wink2 or a Hubitat.

But it only took 3 tries to get it to pair. 

First Attempt:

  1. I added the batteries.
  2. I slid the battery tray in, holding the Test button.
  3. I released the Test button.
  4. I added a device on my hub (by vendor; First Alert was listed), and clicked "start inclusion".

This didn't work. So, I re-attempted.

  1. I slid out the battery tray
  2. I went to my hub app, added the device by vendor, and clicked "start inclusion"
  3. I pulled out one of the two batteries
  4. I reinserted the battery I pulled out.
  5. I slid the battery tray back in, holding the Test Button while sliding
  6. I released the Test button

At this point, I watched the app, and it sat there on "Start Inclusion", as though it were processing. I decided to leave it, and then came back, re-launched the app, hit "Connect to Hub" and then "Devices", and there it was! I hit Refresh on the app, and all of the pertinent statistics showed up.

Not too bad at all.

Tuesday, January 30, 2024

AI - Neural Networks and Deep Learning - Nielsen - Chap 1 Introduction

I came across this book entitled Neural Networks and Deep Learning, by Michael Nielsen, and decided to invest some time in reading it.  I am hoping to reinforce what I have read thus far (for instance in the book AI Crash Course by Hadelin de Ponteves).

The first chapter in this book covers recognition of hand-written digits. This is a pretty cool use case for AI technology, given that everyone's individual handwriting constitutes a font in and of itself, and even then, there is a certain degree of variation every time an individual hand-writes characters. Who uses technology like this? Well, banks for example might use something like this as part of check processing!

I like the fact that this example is optical in nature - and, it could even be inverted in a Generative AI manner to generate new fonts, or new characters in some context or use case!

One of the first things this book covers is Perceptrons - explaining that a Perceptron's output from its activation function is essentially binary (1s and 0s). He then explains that binary results are not always in demand or optimal, and that graded values (shades of gray) between 0 and 1 are often desired or necessary. And this is why we have Sigmoid Neurons.
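Here is a small sketch (my own made-up weights and inputs) contrasting the two:

```python
import math

# A perceptron outputs a hard 0 or 1 (a step function), while a sigmoid
# neuron squashes the same weighted input into a smooth value in (0, 1).
def perceptron(w, x, b):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if z > 0 else 0                 # binary output

def sigmoid_neuron(w, x, b):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))        # smooth output in (0, 1)

w, b = [0.6, -0.4], 0.1
x = [1.0, 0.5]
print(perceptron(w, x, b))       # 1  (since z = 0.5 > 0)
print(sigmoid_neuron(w, x, b))   # ≈ 0.62 - a "shade of gray"
```

The smooth output is what lets small weight changes produce small output changes, which is what gradient-based learning depends on.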

This was insightful, because the first book I read (unless I overlooked it) never even mentioned the concept of Perceptrons, and jumped right into Sigmoid Neurons.

Nielsen then goes on to explain, briefly, the architecture of Neural Networks and embarks on a practical example using the Handwritten Digits Recognition use case.

One thing that was helpful in this chapter, was the fact that he explains how these hidden layers of a neural network "fit together" by use of a practical example. This helps as far as design and modeling of a Neural Network go.

He goes on to discuss the purpose of using biases along with weights, and then goes into a deeper discussion on gradient descent - and stochastic gradient descent. Stochastic Gradient Descent is critical because it is what minimizes the cost function during training. Minimizing the cost does no good, though, if the network can't "remember" the improvement, so backpropagation is required to push that cost-minimizing adjustment back into the model's weights.

I downloaded the code for his Handwriting Recognition example. I immediately figured out by reading the README.txt file (most people ignore this and waste time learning the hard way) that the code was not written for Python 3. Fortunately, someone else had ported the code to Python 3 - in another GitHub repository.

In running the code from the Python3 repository, I immediately ran into an error with a dependent library called Theano. This error was listed here: Fix for No Section Blas error 

So I was fortunate. This fix worked for me. And we got a successful run! In this example, the accuracy in recognizing handwritten digits was an astounding 99.1%!

Handwriting Recognition Example

This code uses a data sample from MNIST, comprised of 60K samples. By default, the code breaks this into 50K training and 10K validation samples.

So now that we know that we can run the code successfully, what might we want to do next on this code, in this chapter, before moving on to the next chapter?

  • Experiment with the number of Epochs
  • Experiment with the Learning Rate
  • Experiment with the Mini Batch Size
  • Experiment with the number of hidden neurons
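Before experimenting on the real network, here is how I picture those knobs (epochs, mini-batch size, learning rate) on a toy one-parameter model - my own sketch, not Nielsen's network.py:

```python
import random

# Mini-batch SGD on a trivial model: fit w in y = w*x (true w = 3.0).
# The knobs are the same ones listed above: epochs, mini-batch size, eta.
random.seed(0)
data = [(x / 100.0, 3.0 * x / 100.0) for x in range(100)]

def sgd(epochs, mini_batch_size, eta):
    w = 0.0
    for _ in range(epochs):
        random.shuffle(data)                  # stochastic: new batches each epoch
        for i in range(0, len(data), mini_batch_size):
            batch = data[i:i + mini_batch_size]
            # average gradient of C = (w*x - y)^2 over the mini-batch
            grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
            w -= eta * grad                   # gradient descent step
    return w

print(round(sgd(epochs=30, mini_batch_size=10, eta=0.5), 3))  # ≈ 3.0
```

More epochs mean more passes over the data; a bigger mini-batch means fewer, smoother updates per pass - which matches the table below, where epochs mattered more than batch size.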

Here is a table that shows some results from playing with the epochs and mini batch sizes:

test.py

Run 1: Epochs=30, Mini Batch Size=10, Result: 8700/10,000 at its peak in epoch 26

Run 2: Epochs=40, Mini Batch Size=10, Result: 9497/10,000 at its peak in epoch 36

Run 3: Epochs=40, Mini Batch Size=20, Result: 9496/10,000 at its peak in epoch 36

So, interestingly, the accuracy goes up each epoch until a peak is reached, and then settles back down slightly over the last 2-3 epochs in these trial runs. It appears that the number of epochs is a more important parameter than the mini-batch size.

Another test with adjusted learning rates would be interesting to run, as the learning rate impacts Gradient Descent quite a bit. Gradient Descent is a Calculus-based method for finding a minimum; when the learning rate is too low or too high, it affects the time it takes - and even the ability - for the "ball to drop to the bottom" of said low point.
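That "ball dropping to the bottom" behavior is easy to see on a one-dimensional cost function (my own illustration, not from the book):

```python
# Gradient descent on C(w) = (w - 2)^2, whose minimum is at w = 2.
# The only thing that changes between runs is the learning rate eta.
def descend(eta, steps=50, w=0.0):
    for _ in range(steps):
        w -= eta * 2 * (w - 2)     # gradient of (w - 2)^2 is 2(w - 2)
    return w

print(descend(0.01))   # too low: creeps toward 2, still short after 50 steps
print(descend(0.4))    # reasonable: lands essentially on 2
print(descend(1.1))    # too high: the "ball" overshoots back and forth and diverges
```

Too small and you run out of epochs before reaching the bottom; too large and each step leaps over the minimum entirely.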

He also, at the very end, mentions the use of more/different algorithms, referencing the Support Vector Machine (SVM).


Artificial Intelligence - LSTM for Financial Modeling - Part II

As Part II of my foray into Stock Portfolio Optimization, I decided to implement Python code that I found in a blog: https://www.datacamp.com/tutorial/lstm-python-stock-market

This blog was written by a Data Scientist, Thushan Ganegedara, and he clearly spent some time on it. After reading the blog, I was super excited to go in, compile the code, and run it myself. The first thing I did was get myself an API key so that I could fetch data for a stock symbol.

But - I had issue after issue in running this code. 

First, the code was sufficiently outdated that many libraries, especially Keras, had "moved on" and completely deprecated certain libraries, calls, etc. I spent considerable time picking through the code, doing web searches, and trying to "just get the code to run". 

Second, the code used hard-coded array sizes for the number of data points for the stock symbol, and because my fetch returned over 2,000 fewer data points than the example had, I ran into null exceptions. To get around this, I wound up going with 4000 training data points and 600 testing data points. Later, I enhanced the code to set the training and testing sizes dynamically, on a percentage basis, from however many data points came down in the CSV.

I also decreased the window size and the batch size as well.
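The dynamic split described above can be sketched like this (my own sketch, not the tutorial's code; the 87% figure is just what reproduces my 4000/600 split):

```python
# Size the training and testing sets as percentages of whatever the CSV
# actually returned, instead of hard-coding array lengths.
def split_sizes(num_points, train_pct=0.87):
    train_size = int(num_points * train_pct)
    test_size = num_points - train_size      # remainder goes to testing
    return train_size, test_size

# e.g. a fetch that returns 4600 rows -> roughly the 4000/600 split I used
print(split_sizes(4600))   # (4002, 598)
```

Because the two sizes always sum to the actual row count, a shorter price history can no longer index past the end of the arrays.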

In the end, I was able to produce nice graphs that matched the ones in his blog for the Predictions based on Simple Average and Exponential Moving Average. But these graphs are based on algorithms, not a neural AI network - in other words, they aren't "true AI".

The AI portions of the code did run to completion, but the graphs that came out from the AI did not look correct. The price predictions were FAR, FAR lower than the actuals, and lower than the Simple and Exponential Moving Average predictions as well.

I may come back and take another look at this, but it was a good exercise, and this exercise made me realize that I need to read and learn more about AI - and what is going on "under the hood".

Tuesday, January 9, 2024

Artificial Intelligence - LSTM for Financial Modeling - Part I

I decided that it might be fun and interesting to see how AI might be used to optimize a stock portfolio.

Surely there is a ton of work and effort in the context of financial markets, right?

My search took me to an IEEE published research paper from 3 fellows who worked together at the Department of Data Science in Praxis Business School, based in Kolkata, India. 

Below is the link to this paper, which is a PDF.

Stock Portfolio Optimization Using a Deep Learning LSTM Model

In reading the Abstract of this paper, I can see that the Long Short-Term Memory Model is used. I was interested in the source code for this project, but couldn't figure out where it was located, so I decided instead to read up on LSTM.

Reading up on LSTM, I started to realize that LSTM was a preferred model for most Financial AI. 

Learning about LSTM requires a foundation and subsequent building blocks of knowledge, such as the topic of Recurrent Neural Networks (for starters). You have to start wading into the pool, with a level of comfort and confidence as the sub-topics materialize.

A link - the first link I read - on LSTM, is found here.

Understanding-LSTMs

I like this blog article because it addresses the "Core Idea on LSTMs" and gives a "Step by Step LSTM Walkthrough". I couldn't help but notice that these models look, essentially, like State Transition Diagrams to me. State transitions, I am realizing, are a key part of Artificial Intelligence. And the diagrams start to look very electronic - check out an electrical circuit diagram full of transistors and logic gates, and you will see the resemblance.

While this article was very helpful from a big-picture conceptual perspective, I got very confused by the fact that the diagrams showed both a "tanh" function and a "sigmoid" function. The symbol for sigmoid, I was familiar with. But the tanh left me scrambling to figure out what that was all about (it helps to be a math geek when you are delving into Artificial Intelligence). Here is a snippet of the diagram that sent me down this road of investigation:

tanh vs sigmoid used in activation functions

Here is an article I found that allowed me to understand what "tanh" is: sigmoid-vs-tanh-functions
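Having finally sorted out what tanh was, a few sample values make the difference between the two functions obvious (my own comparison, computed with Python's standard library):

```python
import math

# Compare the two squashing functions from the LSTM diagrams:
# sigmoid maps any input into (0, 1), tanh maps it into (-1, 1).
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

for z in (-3.0, 0.0, 3.0):
    print(z, round(sigmoid(z), 3), round(math.tanh(z), 3))

# sigmoid(0) = 0.5 but tanh(0) = 0: tanh is zero-centered, which is one
# reason LSTMs use sigmoid for gates (a 0..1 "how much") and tanh for values.
```

The two are closely related, too: tanh(z) = 2*sigmoid(2z) - 1, so tanh is really a rescaled, recentered sigmoid.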

From this blog article, I went on to read a second paper on LSTM models, written by Akhter Rather, found at the following URL:

LSTM-based Deep Learning Model for Stock Prediction and Predictive Optimization Model

So from here, I decided I wanted to see this LSTM Model in action. I couldn't find the source code from the Kolkata publication, but I felt there were probably more of these models around, and I decided to look for one. My next blog post will cover that.
