
Thursday, June 20, 2024

New AI Book Arrived - Machine Learning for Algorithmic Trading

This thing is like 900 pages long.

You want to take a deep breath and make sure you're committed before you even open it.

I did check the Table of Contents and scrolled quickly through it, and I can see it's definitely a hands-on, applied technology book using the Python programming language.

I will be blogging more about it when I get going.


Tuesday, January 30, 2024

AI - Neural Networks and Deep Learning - Nielsen - Chap 1 Introduction

I came across this book entitled Neural Networks and Deep Learning, by Michael Nielsen, and decided to invest some time in reading it. I am hoping to reinforce what I have read thus far (for instance in the book AI Crash Course by Hadelin de Ponteves).

The first chapter in this book covers recognition of hand-written digits. This is a pretty cool use case for AI technology, given that everyone's individual handwriting constitutes a font in and of itself, and even then, there is a certain degree of variation every time an individual hand-writes characters. Who uses technology like this? Well, banks for example might use something like this as part of check processing!

I like the fact that this example is optical in nature - and, it could even be inverted in a Generative AI manner to generate new fonts, or new characters in some context or use case!

One of the first things this book covered was Perceptrons - explaining that a Perceptron is essentially binary, producing 1s and 0s as the output of its activation function. He then moves on to explain that binary results are not always in demand or optimal, and that graded values (shades of gray) between 0 and 1 are often desired or necessary. And this is why we have Sigmoid Neurons.
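
To make the distinction concrete, here is a minimal sketch (my own, not from the book) showing a perceptron's hard binary output next to a sigmoid neuron's graded output, for the same weights and bias:

import numpy as np

def perceptron(x, w, b):
    # Perceptron: fires 1 if the weighted sum plus bias is positive, else 0
    return 1 if np.dot(w, x) + b > 0 else 0

def sigmoid_neuron(x, w, b):
    # Sigmoid neuron: squashes the same weighted sum into a value between 0 and 1
    return 1.0 / (1.0 + np.exp(-(np.dot(w, x) + b)))

x = np.array([0.6, 0.9])
w = np.array([0.4, -0.7])
b = 0.2

print(perceptron(x, w, b))      # 0 - strictly binary
print(sigmoid_neuron(x, w, b))  # ~0.45 - a "shade of gray"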

This was insightful, because the first book I read (unless I overlooked it) never even mentioned the concept of Perceptrons, and jumped right into Sigmoid Neurons.

Nielsen then goes on to explain, briefly, the architecture of Neural Networks and embarks on a practical example using the Handwritten Digits Recognition use case.

One thing that was helpful in this chapter was how he explains the way the hidden layers of a neural network "fit together", by use of a practical example. This helps as far as the design and modeling of a Neural Network go.

He goes on to discuss the purpose of using biases along with weights, and then goes into a deeper discussion of gradient descent - and stochastic gradient descent, which makes minimizing the cost function tractable by estimating the gradient from small random mini-batches. Minimizing a cost function does no good, though, if you can't "remember" the correction, so the concept of backpropagation is required to push that cost minimization back into the model's weights and biases.
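
To sketch the mechanics in code form (my own simplification, not Nielsen's actual implementation): one stochastic gradient descent step samples a mini-batch, gets each sample's gradient from backpropagation, and folds the correction back into the weights and biases - that last part being the "remembering":

import random

def sgd_step(weights, biases, training_data, mini_batch_size, eta, backprop):
    # The "stochastic" part: estimate the gradient from a small random sample
    mini_batch = random.sample(training_data, mini_batch_size)
    for x, y in mini_batch:
        # backprop (a placeholder here) returns the cost gradient for one sample
        grad_w, grad_b = backprop(weights, biases, x, y)
        # Apply each sample's share of the correction; eta is the learning rate
        weights = [w - (eta / mini_batch_size) * gw for w, gw in zip(weights, grad_w)]
        biases  = [b - (eta / mini_batch_size) * gb for b, gb in zip(biases, grad_b)]
    return weights, biases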

I downloaded the code for his Handwriting Recognition example. I immediately figured out by reading the README.txt file (most people ignore this and waste time learning the harder way) that the code was not written for Python 3. Fortunately, someone else had ported the code to Python 3 - in another GitHub repository.

In running the code from the Python 3 repository, I immediately ran into an error with a dependent library called Theano. The fix was listed here: Fix for No Section Blas error

So I was fortunate. This fix worked for me. And we got a successful run! In this example, the accuracy in recognizing handwritten digits was an astounding 99.1%!

Handwriting Recognition Example

This code uses a data sample from MNIST, comprised of 60K samples. The code by default breaks this into 50K training and 10K validation samples (with a separate 10K test set used to score each epoch).

So now that we know that we can run the code successfully, what might we want to do next on this code, in this chapter, before moving on to the next chapter?

  • Experiment with the number of Epochs
  • Experiment with the Learning Rate
  • Experiment with the Mini Batch Size
  • Experiment with the number of hidden neurons
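
For reference, the code is driven roughly like this (assuming the Python 3 port keeps Nielsen's original interface), which makes all four of those knobs easy to vary:

import mnist_loader
import network

# 50K training / 10K validation / 10K test samples
training_data, validation_data, test_data = mnist_loader.load_data_wrapper()

# 784 input neurons (28x28 pixels), 30 hidden neurons, 10 output digits
net = network.Network([784, 30, 10])

# epochs=30, mini batch size=10, learning rate eta=3.0
net.SGD(training_data, 30, 10, 3.0, test_data=test_data)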

Here is a table that shows some results from playing with the epochs and mini batch sizes (runs of test.py):

Run 1: Epochs=30, Mini Batch Size=10, Result: 8700/10,000 at its peak in epoch 26

Run 2: Epochs=40, Mini Batch Size=10, Result: 9497/10,000 at its peak in epoch 36

Run 3: Epochs=40, Mini Batch Size=20, Result: 9496/10,000 at its peak in epoch 36

So interestingly, the numbers go up each epoch until a peak is reached, and then settle back down for the last 2-3 epochs in these trial runs. It appears that the number of epochs is a more important hyperparameter than the mini batch size.

Another test with adjusted learning rates could be interesting to run, as the learning rate impacts Gradient Descent quite a bit. Gradient Descent is a Calculus concept for finding a minimum, and when the learning rate is too low or too high, it affects the time it takes - or the ability - for the "ball to drop to the bottom" of said low point.
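
As a toy illustration (mine, not the book's), plain gradient descent on f(x) = x² shows the "ball dropping" behavior - too small a learning rate crawls, a moderate one converges, too large a rate overshoots and diverges:

def gradient_descent(eta, steps=20, x=10.0):
    # f(x) = x**2 has its minimum at x = 0; the gradient is 2*x
    for _ in range(steps):
        x = x - eta * (2 * x)
    return x

print(gradient_descent(0.01))  # too low: still ~6.7 after 20 steps
print(gradient_descent(0.3))   # reasonable: essentially 0
print(gradient_descent(1.1))   # too high: overshoots back and forth, diverging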

He also, at the very end, mentions the use of more/different algorithms, referencing the Support Vector Machine (SVM).


Artificial Intelligence - LSTM for Financial Modeling - Part II

As Part II of my foray into Stock Portfolio Optimization, I decided to implement Python code that I found in a blog: https://www.datacamp.com/tutorial/lstm-python-stock-market

This blog was written by a Data Scientist, Thushan Ganegedara, and he clearly spent some time on it. I was super excited after reading the blog to go in, compile the code, and run it myself. The first thing I did was get myself an API key so that I could fetch data for a stock symbol.

But - I had issue after issue in running this code. 

First, the code was sufficiently outdated that many libraries, especially Keras, had "moved on" and completely deprecated certain libraries, calls, etc. I spent considerable time picking through the code, doing web searches, and trying to "just get the code to run". 

Second, the code used hard-coded array sizes for the number of data points for the stock symbol, and because of this, I ran into null exceptions - the fetch for my stock symbol returned over 2,000 fewer data points than the example had. To get around this, I wound up going with 4000 training data points and 600 testing data points. Later, I enhanced the code to set the training and testing sizes dynamically, on a percentage basis, from however many data points came down in the csv.
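
The dynamic split ended up looking something like this (a simplified sketch; the file and column names are assumptions, since the tutorial's CSV layout may differ):

import pandas as pd

df = pd.read_csv("stock_data.csv")   # whatever the API fetch wrote out
prices = df["Close"].values          # assuming a "Close" price column

# Split on a percentage basis instead of hard-coded array sizes
split = int(len(prices) * 0.85)      # e.g. 85% training, 15% testing
train_data = prices[:split]
test_data = prices[split:]

print(len(train_data), len(test_data))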

I also decreased the window size and the batch size as well.

In the end, I was able to produce nice graphs that matched the ones in his blog for the predictions based on the Simple Average and the Exponential Moving Average. But those graphs are based on classical algorithms, not a neural network - in other words, they aren't "true AI".

The AI portions of the code did run to completion, but the graphs that came out of the LSTM did not look correct. The price predictions were FAR, FAR lower than the actuals, and lower than those from the Simple and Exponential Moving Average graphs as well.

I may come back and take another look at this, but it was a good exercise, and this exercise made me realize that I need to read and learn more about AI - and what is going on "under the hood".

Thursday, October 5, 2023

Artificial Intelligence Book 1 - Crash Course in AI - Deep Q Learning

I read about Q Learning, and was feeling somewhat proud of myself for sticking my toe into the water.

Then I read about Deep Q Learning - in this same book - and it was as if someone dumped an ice bath over my head. I went into the tunnel - advancing through chapters 9, 10 and 11 - only to come out the other end confused ("what did I just read?").

The coding examples were interesting enough that I kept pushing forward, but a lot of what is in the code is masked by the fact that the math and formulas were hidden away in libraries like Keras.  So while I thought the examples were cool and I had a grasp of the problems they were attempting to solve, I still came out at the end with confusion and question marks in my head.

Q Learning vs Deep Q Learning

In Chapter 9, which covers Deep Q-Learning, things start to get very complex very fast. So what is the difference between Q Learning (introduced in Chapters 7 and 8) and Deep Q Learning?

  • More complex problems in Deep Q Learning - with more variables
  • The approach to solving a more complex problem

With regards to the approach to solving problems, the book gets into a good discussion - worth mentioning here - about the difference between ArgMax and SoftMax.

Argmax vs Softmax

In Q Learning, the "name of the game" was to find (and use) the highest Q Value. This approach is referred to as "exploitative" and is known as the ArgMax method of Reinforcement Learning.

In Deep Q Learning, probability distributions across several variables are continually updated during the training of the model. You have a set of (input) variables with specific weights, but as you take random samples and compare the predicted value to the actual value, the weights are updated according to the new realities (results). This process is referred to as explorative (in nature) and is named the SoftMax method.
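
In code terms, the difference between the two selection strategies looks roughly like this (a conceptual sketch, not the book's exact code):

import numpy as np

q_values = np.array([1.0, 2.5, 0.3])  # Q-values for three possible actions

# ArgMax (exploitative): always take the action with the highest Q-value
best_action = np.argmax(q_values)     # always action 1 here

# SoftMax (explorative): convert Q-values to probabilities and sample,
# so better actions are favored but the others still get tried
probs = np.exp(q_values) / np.sum(np.exp(q_values))
sampled_action = np.random.choice(len(q_values), p=probs)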

Chapter 9 starts you off simpler, with a Real Estate example of predicting home prices. Seems sensible enough, since we can all think of input variables that help drive the price of a home. The focus here is on trying to show the process, which is broken down into the following steps:

  1. Uploading the Data Set (actual home prices)
  2. Building the Neural Network
  3. Training the Neural Network
  4. Displaying the Results
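
A minimal sketch of steps 2 and 3 in Keras might look like this (my own simplification - the book's actual features and network differ):

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Toy stand-ins for step 1: four features per home (size, bedrooms, etc.) and prices
X = np.random.rand(100, 4)
y = np.random.rand(100, 1)

# Step 2: build the Neural Network - one hidden layer, one output neuron
model = Sequential([
    Dense(16, activation="relu", input_shape=(4,)),  # hidden layer
    Dense(1)                                         # predicted price
])

# Step 3: train - mean squared error is the "loss error" being minimized
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=10, batch_size=8, verbose=0)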

From here, the book advances into Deep Learning Theory. The idea borrows from the human brain, whose neurons are connected by Synapses that send signals. This is the fundamental concept behind Deep Q Learning, because the network is built from a number of "Layers". There is a minimum of three layers, as follows:

  1. Input Layer - consists of Input Values, and each of these gets weights that are continually adjusted
  2. Hidden Layer(s) - these "neurons" are also continually adjusted
  3. Output Layer - this layer compares predicted values to actual values and computes Loss Error.

The loss error gets back-propagated through the layers, re-adjusting the weights continually, using a concept called Gradient Descent (which requires at least a basic understanding of Calculus). The book covers three types of Gradient Descent (Batch, Stochastic and Mini-Batch).

The book covers Activation Functions, which take weighted input values and return an output value. It mentions three of these, which sound intimidating, but if you are familiar with Electronics and/or Trigonometry, the names actually make some sense:

  • Sigmoid Activation Function - a smooth S-shaped (logistic) curve moving from a value of (no lower than) 0 to (no higher than) 1.
  • Rectifier Activation Function - a linear but angular approach: the output stays at 0 until a threshold, then rises linearly
  • Threshold Activation Function - An abrupt binary state transition from 0 to 1. This is much like flipping a switch into an on/off state.
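
Rendered as plain NumPy (my own rendering, not the book's), the three look like this:

import numpy as np

def sigmoid(x):
    # Smooth S-shaped curve, asymptotically between 0 and 1
    return 1.0 / (1.0 + np.exp(-x))

def rectifier(x):
    # Zero below the threshold, then rises linearly
    return np.maximum(0, x)

def threshold(x):
    # Abrupt on/off switch, like flipping a switch
    return np.where(x >= 0, 1, 0)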

Now from here (Chapter 9), the book goes into Chapter 10 - Self Driving Car - which is an implementation of the Chapter 9 material, and quite fun to do. Then it dives into Chapter 11, which uses an example taken from Google's DeepMind project that optimizes server temperature with a simulation.

Chapter 11 in particular really drives home the process, by showing how you can optimize and minimize costs:

  1. Building the Environment
  2. Building the Brain - using DropOut vs NoDropOut techniques
  3. Implementation (of the Deep Learning Algorithm)
  4. Training the AI - using either Early Stopping or No Early Stopping
  5. Testing the AI

Seeing is believing, and when you see this code run and start to view the results, I have to admit it is pretty darn cool.

It also takes a LONG time to run. I had to shorten the epochs from 100 to 25 to keep the job from getting killed (I am not sure what exactly was killing it). Running for 100 epochs was taking my laptop HOURS to finish (2-3 hours). But at the end of each epoch, the energy savings from the AI was almost always superior to the energy savings without the AI (which in this case is modeled by the server's mainboard temperature controller).

There's so much more to discuss. But I think I have hit the highlights here.

Wednesday, August 9, 2023

Artificial Intelligence Book 1 - Crash Course in AI - Q Learning

The Crash Course in AI presents a fun project for the purpose of developing familiarity with the principles of Q Learning.

It presents the situation of Robots in a Maze (Warehouse), with the idea that the AI will learn the optimal path through a maze.

To do this, the following procedure is followed:

  1. Define a Location-to-State Mapping, where each location (alphabetic: A, B, C, etc.) is correlated to an integer value (A=0, B=1, C=2, D=3, et al).
  2. Define the Actions (each location is an action, represented by its integer value), which are represented as an array of integers.
  3. Define the Rewards - here, each square in the maze has certain adjacent squares that constitute a valid "move".
     The set of reward arrays is an "array of arrays", and we know that an "array of arrays" is
     essentially a matrix! Hence, we can refer to this large "rule set" as a Rewards Matrix.
        
     # Defining the rewards
     import numpy as np

     R = np.array([
         [0,1,0,0,0,0,0,0,0,0,0,0],  # A's only valid move is to B
         [1,0,1,0,0,1,0,0,0,0,0,0],  # B's valid moves are to A, C, F
         [0,1,0,0,0,0,1,0,0,0,0,0],  # C's valid moves are to B, G
         [0,0,0,0,0,0,0,1,0,0,0,0],  # D's only valid move is to H
         [0,0,0,0,0,0,0,0,1,0,0,0],  # E's only valid move is to I
         [0,1,0,0,0,0,0,0,0,1,0,0],  # F's valid moves are to B, J
         [0,0,1,0,0,0,1,1,0,0,0,0],  # G's valid moves are to C, G, H
         [0,0,0,1,0,0,1,0,0,0,0,1],  # H's valid moves are to D, G, L
         [0,0,0,0,1,0,0,0,0,1,0,0],  # I's valid moves are to E, J
         [0,0,0,0,0,1,0,0,1,0,1,0],  # J's valid moves are to F, I, K
         [0,0,0,0,0,0,0,0,0,1,0,1],  # K's valid moves are to J, L
         [0,0,0,0,0,0,0,1,0,0,1,0]   # L's valid moves are to H, K
     ])

So this array - these "ones and zeroes" - governs the "rules of the road" of the maze. In fact, you could draw the maze out graphically based on these rules.

Now - from a simple perspective, armed with this information, you can feed a starting and ending location into the "Engine", and it will compute the optimal path for you. In cases where there are two optimal paths, it may give you one or the other.

But how does it do this? How does it "know"?

This gets into two key concepts that feed an equation known as the Bellman Equation:
  • Temporal Difference - how well (or how fast) the AI (model) is learning
  • Q-Value - this is an indicator of which choices led to greater rewards
If we consider that models like this might have thousands or even millions of X/Y coordinate data points (remember, it is a geographic warehouse model), it is not scalable for the AI to store all of the permutations of paths as it works through the model. What the Bellman Equation does is allow a single discounted value per state/action to be maintained, such that if we hit coordinate X,Y, we know what the optimal steps were to reach X,Y.

Basically, before we start, all Q values are initialized to zero. As we traverse the maze, the model calculates the Temporal Difference; if it is high, the model flags it as a reward, while if it is low, it is flagged as a "frustration". High values early on are "pleasant surprises" to the model. So - in summary, as the maze is traversed, the TD is calculated, followed by a Q value adjustment (the Q Value for the state/action combination, to be precise).
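
In code, the heart of that loop is just a couple of lines (a sketch consistent with the book's approach; alpha and gamma are explained a bit further down):

import numpy as np

def q_update(Q, R, state, action, next_state, alpha, gamma):
    # Temporal Difference: the reward received, plus the discounted best future
    # Q value, minus what we previously expected for this state/action
    TD = R[state, action] + gamma * np.max(Q[next_state, :]) - Q[state, action]
    # Adjust the Q value for the state/action combination by a fraction of the TD
    Q[state, action] += alpha * TD

# Q starts out as all zeros, one cell per (state, action) pair
Q = np.zeros((12, 12))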

Now... before I forget to mention this, the Rewards Matrix needs to be adjusted to reflect the ideal ending location. For example, if the maze were to begin at point E and end at point G, the cell at (G, G) - (ending location, ending location) - would need a huge value to tell the AI to stop there and go no further. You can see this in the coding example of the book:

# Optimize the ending state with the ultimate reward
R_new[ending_state, ending_state] = 1000

I have to admit - I started coding before I fully read and digested what was in the book. I got tripped up by two variables in the code: Alpha and Gamma. Alpha was coded as 0.9, while Gamma was coded as 0.75. I was very confused by these constants - what they were, and why they were used.

I had to go back into the book.
  • Alpha - Learning Rate
  • Gamma - Discount Factor

Hey, this AI stuff - these algorithms - they're all about Mathematics (as well as Statistics and Probability). I am not a Mathematician, and I only took intermediate Calculus, so some of us really need to concentrate and put our thinking caps on if we truly want to follow the math behind these models and equations.


Thursday, February 16, 2023

Morpheus API - pyMorpheus Python API Wrapper

I have been working on some API development in the Morpheus CMP tool.

The first thing I do when I need to use an API is to see if there is a good API wrapper. I found one out on GitHub, called pyMorpheus.

With this wrapper, I was up and running in absolutely no time, making calls to the API, parsing JSON responses, etc.
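
I won't reproduce the wrapper's own calls here, but underneath it is doing REST exchanges like this (a hand-rolled sketch with requests; the appliance URL and token are placeholders):

import requests

APPLIANCE = "https://morpheus.example.com"  # placeholder appliance URL
TOKEN = "your-api-access-token"             # placeholder token

headers = {"Authorization": "Bearer " + TOKEN}

# List instances known to Morpheus and parse the JSON response
resp = requests.get(APPLIANCE + "/api/instances", headers=headers)
for inst in resp.json().get("instances", []):
    print(inst["id"], inst["name"])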

The Use Case I am working on is a "reconciliator" that will do two things:

  • Remove Orphaned VMs - find, and delete (upon user confirmation), those VMs that have had their "rug pulled out" from under Morpheus (deleted in vCenter but still sitting in Morpheus as an Instance)
  • Convert Certain Discovered VMs to Managed Morpheus Instances

This second part sorta kinda worked. The call to https://<applianceurl>/servers/id/make-managed did take a Discovered VM and convert it to an instance, with a "VMware" logo on it.

But I was unable to set the advanced attributes of the VMs - Instance Type, Layout, Plan, etc. - and this made it only a partial success.

Maybe if we can get the API fixed up a bit, we can get this to work.

One issue is the "Cloud Sync". When we call the API, we do a cloud sync to find Discovered VMs. We do the same cloud sync to determine whether any of the VM's fields in Morpheus change their state when someone deletes a VM in vCenter (such a state change gives us the indicator that the VM is, in fact, now an orphan).

The Cloud Sync is an asynchronous call. You have to wait an indefinite amount of time to ensure that the results you are looking for in vCenter are reflected in Morpheus. It's basically polling, which is not an exact art. For this reason, the reconciliator tool needs to be run manually, as an operations tool, as opposed to some kind of scheduled batch job.
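
The polling around that Cloud Sync ends up looking something like this (a simplified sketch; check_synced is a placeholder for whatever call verifies the state you are waiting on):

import time

def wait_for_sync(check_synced, timeout=300, interval=15):
    # Poll until the cloud sync results show up in Morpheus, or give up
    deadline = time.time() + timeout
    while time.time() < deadline:
        if check_synced():
            return True
        time.sleep(interval)
    return False  # indeterminate - exactly why this runs as a manual tool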


Friday, May 17, 2019

Palo Alto Firewall VM Series - Integration and Evaluation - Part II


After a couple of days of playing around with the Palo Alto VM-Series Firewall (running the VM on a KVM / LibvirtD virtualization platform on a CentOS7 host), I felt I was comfortable enough with it to explore the API.

I asked a Palo Alto engineer how they bootstrap these things. He told me they use CloudInit and use a boot.xml file to change the default password. From there, they use their management platform, Panorama, to push configurations to the devices.

I don't happen to have Panorama anywhere. And I presume like everything else, it needs licenses. So, I started looking at the facilities to interface/integrate with the device; meaning APIs.

There are actually several APIs:

  • Command Line Interface (CLI)
  • WildFire API
  • AutoFocus API
  • PAN-OS Licensing API
  • Panorama XML API (requires Panorama of course)
  • PAN-OS XML API

I located, downloaded, and glanced through the XML API Guide, which actually does a nice job of getting you acquainted with the API. There is nothing really unusual. You need to authenticate and get a token (they call it a key), and with that key you can go to work (I won't cover details of the API here).

Next it was time to examine the API firsthand. Is it running? Do I need a license? I used Postman for this. I don't know if there are better tools for picking at APIs, but I think Postman is definitely one of the most popular. Making add/modify changes is always risky when you are learning a new API, so it always makes sense to start with some "get" calls so you can understand the structure of the data. I was able to hit the VM on standard SSL port 443, get back a key, and with the key, run a few get commands based on examples in the API Guide. The API works, it appears!
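
For reference, those Postman calls look roughly like this when expressed in Python (a sketch based on the keygen and op examples in the API Guide; the address and credentials are placeholders):

import requests
import xml.etree.ElementTree as ET

FW = "https://192.168.1.1"  # placeholder firewall address

# Authenticate and extract the key (note: certificate validation disabled)
r = requests.get(FW + "/api/", verify=False,
                 params={"type": "keygen", "user": "admin", "password": "admin"})
key = ET.fromstring(r.text).find(".//key").text

# A safe "get" style operational command: show system info
r = requests.get(FW + "/api/", verify=False,
                 params={"type": "op", "key": key,
                         "cmd": "<show><system><info></info></system></show>"})
print(r.text)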

One noteworthy comment is that the API would not work without turning off certificate validation in Postman's settings!

Next, I considered writing some Python code as a client, but as Palo Alto is a pretty popular firewall from a large company, there had to be some folks who had broken ground on that already, right? A quick Google search for a Python API client turned up a project from Kevin Steves, who has clients for ALL of the APIs in Python. It is on GitHub with a free use license.

https://github.com/PaloAltoNetworks/pandevice/

After cloning this, I noticed you can run setup. I elected not to run setup, and just tried to invoke the API directly. I had to use the panxapi.py Python file. Examining the source code, you can supply an exhaustive list of options to the main() function of the Python file, which will parse those and invoke accordingly.

Immediately, however, I ran into the same certificate validation error I had experienced with Postman. But in Postman I could just go into settings and disable certificate validation. Figuring out how to do this with the API client was more difficult. Eventually, I found an issue recorded on the project that discusses this same problem, which can be found at this link: Certificate Validation Issue

The issue discusses versions of Python on CentOS that do certificate checking. Rather than fool with upgrading Python, one poster pointed out that you can, in fact, disable certificate checking in Python by setting an environment variable: "export PYTHONHTTPSVERIFY=0". Bingo. That's all I need right now to experiment with the API.
