
Thursday, November 16, 2023

Artificial Intelligence Book 1 - Crash Course in AI - Chapter 13 - Memory Patch

Okay, last chapter in the book!

In this chapter, you get to "create" (actually, "create" means download and run) some Github hosted code that allows you to train a model to learn how to play the video game "Snake".

Snake is an early video game, probably from the 1970s or 1980s. I don't know the details of its history, but I am sure there is plenty written about it. I think you could run it on those Radio Shack Tandy TRS-80 computers that had a few dozen kilobytes of RAM and saved to a magnetic cassette tape (I remember you could play Pong, and I think Snake was one of the games also).

The idea was that each time the snake ate an apple (red square) the snake's length would increase (by one square). You could move up, down, left, right constrained by coordinate boundaries, and if the snake overlapped with itself, it died and the game ended.

Snake Video Game

When I first ran the model training for this, it ran for more than a day - perhaps all weekend - and then died. When I returned to check on progress, the command prompt showed a [ Killed ] message.

Other models in this book had died this way, and I decided that I was running out of memory. My solution for those models was to edit the source code to decrease the number of epochs and reduce the loop complexity. This made the models a LOT less efficient and reliable, but I still saw beneficial results from running them with this tactic.

In this case, for some reason, I went to GitHub and looked at the Issues, and saw someone complaining about a memory leak in the TensorFlow libraries. There was a patch to fix this!

Below is the output of a Unix/Linux "diff" command, which shows this patch:

% diff train.py train.py.memoryleak
5d4
< import tensorflow as tf
12,15d10
< import gc
< import os
< import keras
<
64,67c59
<             #qvalues = model.predict(currentState)[0]
<             qvalues = model.predict(tf.convert_to_tensor(currentState))[0]
<             gc.collect()
<             keras.backend.clear_session()
---
>             qvalues = model.predict(currentState)[0]

So in summary, the patch consists of:

  • The original statement qvalues = model.predict(currentState)[0] is replaced by: 
    • qvalues = model.predict(tf.convert_to_tensor(currentState))[0]
  • A garbage collection call, gc.collect(), is added after the prediction. 
  • A Keras backend call, keras.backend.clear_session(), is added as well.

Of course some imports are necessary to reference and use these new calls. 
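Putting the diff together, the patched prediction step looks roughly like the sketch below. The model and currentState stand-ins are mine (the real train.py builds a much larger network and pulls the state from the Snake game), but the patched lines are the ones shown in the diff above.

import gc
import numpy as np
import tensorflow as tf
import keras

# Stand-in model and state so the snippet runs on its own; the real train.py
# builds a much larger network and gets currentState from the Snake game.
model = keras.Sequential([keras.layers.Dense(4, input_shape=(8,))])
model.compile(optimizer="adam", loss="mse")
currentState = np.zeros((1, 8), dtype=np.float32)

# Original (leaky) call:  qvalues = model.predict(currentState)[0]
# Patched call: convert the input to a tensor, then free memory explicitly
qvalues = model.predict(tf.convert_to_tensor(currentState))[0]
gc.collect()                      # force Python garbage collection
keras.backend.clear_session()     # release memory held by the Keras/TF backend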

This fixes the memory problem. It does not appear that the training will ever end on its own when you run this code. You have to Ctrl-C it to get it to stop, because it just trains and trains, looking for a better score and more apples. I had to learn this the hard way after running train.py for a full weekend.

So this wraps up the book for me. I may do some review on it, and will likely move on to some new code samples and other books.

Thursday, October 5, 2023

Artificial Intelligence Book 1 - Crash Course in AI - Deep Q Learning

 I read about Q Learning, and was feeling somewhat proud of myself for sticking my toe into the water.

Then I read about Deep Q Learning - in this same book - and it was as if someone dumped a bucket of ice water over my head. I went into the tunnel - advancing through Chapters 9, 10, and 11 - only to come out the other end confused ("what did I just read?").

The coding examples were interesting enough that I kept pushing forward, but a lot of what happens in the code is masked by the fact that the math and formulas are hidden away in libraries like Keras. So while I thought the examples were cool and I had a grasp of the problems they were attempting to solve, I still came out at the end with confusion and question marks in my head.

Q Learning vs Deep Q Learning

In Chapter 9, which covers Deep Q Learning, things start to get very complex very fast. So what is the difference between Q Learning (introduced in Chapters 7-8) and Deep Q Learning?

  • More complex problems in Deep Q Learning - with more variables
  • The approach to solving a more complex problem

With regards to the approach to solving problems, the book gets into a good discussion - worth mentioning here - about the difference between ArgMax and SoftMax.

Argmax vs Softmax

In Q Learning, the 'name of the game' was to find (and use) the action with the highest Q value. This is referred to as "exploitative" behavior and is known as the ArgMax method of Reinforcement Learning.

In Deep Q Learning, probability distributions over the possible actions are continually updated during the training of the model. You have a set of (input) variables with specific weights, but as you take random samples and compare the predicted value to the actual value, the weights are updated to reflect the new results. This process is explorative in nature and is named the SoftMax method.
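Here is a tiny sketch, using made-up Q values (this is not code from the book), of how the two selection methods differ: ArgMax always takes the best-known action, while SoftMax turns the Q values into a probability distribution and samples from it, so weaker actions still get tried occasionally.

import numpy as np

qvalues = np.array([1.0, 2.0, 0.5])            # made-up Q values for 3 actions

# ArgMax (exploitative): always pick the action with the highest Q value
best_action = int(np.argmax(qvalues))           # -> 1

# SoftMax (explorative): convert Q values to probabilities and sample from them
probs = np.exp(qvalues) / np.sum(np.exp(qvalues))
sampled_action = int(np.random.choice(len(qvalues), p=probs))

print(best_action, probs.round(2), sampled_action)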

Chapter 9 starts you off simpler, with a real estate example of predicting home prices. That seems sensible enough, since we can all think of input variables that help drive the price of a home. The focus here is on showing the process, which is broken down into the following steps (a minimal Keras sketch follows the list):

  1. Uploading the Data Set (actual home prices)
  2. Building the Neural Network
  3. Training the Neural Network
  4. Displaying the Results
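Here is a minimal sketch of those four steps using Keras, with randomly generated data standing in for the book's real estate dataset (the feature count, layer sizes, and made-up price formula are my assumptions, not the book's code):

import numpy as np
from tensorflow import keras

# 1. "Upload" the data set - here, 500 fake homes with 4 numeric features
X = np.random.rand(500, 4)                            # e.g. size, bedrooms, baths, age
y = X @ np.array([300.0, 50.0, 30.0, -2.0]) + 25.0    # made-up price relationship

# 2. Build the neural network
model = keras.Sequential([
    keras.layers.Dense(64, activation="relu", input_shape=(4,)),   # hidden layer
    keras.layers.Dense(64, activation="relu"),                     # hidden layer
    keras.layers.Dense(1)                                          # predicted price
])
model.compile(optimizer="adam", loss="mse")

# 3. Train the neural network
model.fit(X, y, epochs=10, batch_size=32, verbose=0)

# 4. Display the results - predictions for the first few homes
print(model.predict(X[:3]))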

From here, the book advances into deep learning theory. The idea borrows from the human brain, in which neurons are connected by synapses that send signals. This is the fundamental concept behind Deep Q Learning, because the network is built from a number of "layers". There is a minimum of three layers, as follows:

  1. Input Layer - holds the input values; the weights on the connections leaving this layer are continually adjusted
  2. Hidden Layer(s) - the weights feeding these "neurons" are also continually adjusted
  3. Output Layer - the predicted values from this layer are compared to the actual values to compute the loss error.

The loss error gets back-propagated through the layers, re-adjusting the weights continually, using a concept called Gradient Descent (which requires at least a basic understanding of Calculus). The book covers three types of Gradient Descent (Batch, Stochastic and Mini-Batch).
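As a toy illustration (again, not from the book), here is one-dimensional gradient descent on a least-squares fit, where the loss is mean((w*x - y)^2). The only difference between the three flavors is how many samples feed each weight update: all of them (Batch), one random sample (Stochastic), or a small random subset (Mini-Batch), as in the sketch below.

import numpy as np

x = np.linspace(0.0, 1.0, 100)
y = 3.0 * x                          # data generated with a true weight of 3.0
w, lr = 0.0, 0.5

def grad(w, xb, yb):
    # derivative of mean((w*x - y)^2) with respect to w
    return np.mean(2.0 * xb * (w * xb - yb))

for step in range(200):
    idx = np.random.choice(len(x), size=10, replace=False)   # mini-batch of 10
    # Batch GD would pass (x, y) here; Stochastic GD would pass a single sample
    w -= lr * grad(w, x[idx], y[idx])

print(round(w, 3))                   # converges toward 3.0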

The book then discusses activation functions, which take the weighted input values and return an output value. It mentions three of these, whose names sound intimidating, but if you are familiar with electronics and/or trigonometry they actually make some sense (a small numeric sketch follows the list):

  • Sigmoid Activation Function - a smooth S-shaped (logistic) curve whose output moves from (no lower than) 0 to (no higher than) 1.
  • Rectifier Activation Function - outputs 0 for any negative input and then rises linearly for positive inputs, giving an angular, hinged shape.
  • Threshold Activation Function - an abrupt binary state transition from 0 to 1. This is much like flipping a switch into an on/off state.
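Here is a small numpy sketch of the three shapes (illustrative only; the cut-off points are assumptions, not the book's exact definitions):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))        # smooth S-curve, squeezed between 0 and 1

def rectifier(x):
    return np.maximum(0.0, x)              # 0 for negatives, linear for positives

def threshold(x):
    return (x >= 0.0).astype(float)        # hard on/off switch at 0

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(sigmoid(x))
print(rectifier(x))
print(threshold(x))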

Now from here (Chapter 9), the book goes into Chapter 10 - Self Driving Car - which is an implementation of Chapter 9's ideas, and quite fun to do. Then it dives into Chapter 11, which uses an example taken from Google's DeepMind project, optimizing server temperature with a simulation.

Chapter 11 in particular really drives home the process by showing how you can optimize, or minimize, costs. It breaks the work into the following steps (a rough Keras sketch of the Dropout and Early Stopping pieces follows the list):

  1. Building the Environment
  2. Building the Brain - comparing Dropout vs. no-Dropout techniques
  3. Implementation (of the Deep Learning Algorithm)
  4. Training the AI - using either Early Stopping or No Early Stopping
  5. Testing the AI
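Here is a rough sketch, not the book's code, of the kind of "brain" step 2 describes: a small Keras network with a Dropout layer, trained with the EarlyStopping callback that the Early Stopping option in step 4 refers to. The 3 inputs, 5 actions, and layer sizes are my assumptions for illustration.

import numpy as np
from tensorflow import keras

brain = keras.Sequential([
    keras.layers.Dense(64, activation="sigmoid", input_shape=(3,)),   # 3 environment inputs
    keras.layers.Dropout(0.1),         # randomly silence 10% of neurons during training
    keras.layers.Dense(32, activation="sigmoid"),
    keras.layers.Dense(5, activation="softmax")                       # one output per action
])
brain.compile(optimizer="adam", loss="mse")

# Dummy (state, target Q-value) pairs standing in for data from the environment
states = np.random.rand(256, 3)
targets = np.random.rand(256, 5)

early_stop = keras.callbacks.EarlyStopping(monitor="loss", patience=3)
brain.fit(states, targets, epochs=100, batch_size=32,
          callbacks=[early_stop], verbose=0)    # stops early once the loss stops improving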

Seeing is believing, and when you see this code run and start to view the results, I have to admit it is pretty darn cool.

It also takes a LONG time to run. I had to shorten the epochs from 100 to 25 to keep the job from getting killed (I am not sure what exactly was killing it). Running for 100 epochs was taking my laptop HOURS to finish (2-3 hours). But at the end of each epoch, the energy savings from the AI was almost always superior to the energy savings of not using the AI (which in this case is modeled by the server's built-in temperature controller).

There's so much more to discuss. But I think I have hit the highlights here.
