Tuesday, October 3, 2023

Examining My Blog Analytics

This blog - I never really, truly cared about who saw it. It has served more as a personal diary for me than a blog written for the purpose of harvesting subscribers and attention.

I have a friend who is creating a "business" (I will withhold my comments about things like Business Model). Maybe it's a hobby he thinks is a business. At any rate, he was asking me questions as he was creating his website, and this made me realize that I know little to nothing about crawlers, indexing, search engines, and how to maximize search engine results. I surprised myself with this realization.

I mean, I have a working knowledge and understanding of things like cookies and meta tags. But it isn't something I am fluent in.

This blog gets a few peeks here and there, and a few people have commented that I helped them with issues they were looking for answers to. But really, traffic on this blog is a drop in the ocean - and a small drop at that.

Today, I pulled up the analytics and noticed that two-thirds of the pages on this blog were not indexed. Three reasons were listed:

  • Alternate Page with Proper Canonical Tag (WTF is this???)
  • Mobile Usability
  • No Sitemap

I made some changes:

  • Added a sitemap
  • Changed the theme to accommodate mobile pages
  • Resubmitted the blog for indexing - including the individual pages with the canonical error

Now here is something quite interesting:

  • I put a search string into DuckDuckGo, and up came this blog, lickety-split - right at the top.
  • The same search string in Bing.com, and my blog came up lickety-split - right at the top.

Note: The search string I used was "Techgrasper Blogspot"

But - when I put the same search string into Google, there was literally NO MENTION of my blog. At all. Zero. 

And this blog is hosted on a Google-owned platform!!! So what is going on there???

  • Is this blog on a blacklist or being censored?
  • Is this blog suppressed because I am not paying Google, or generating ads on it?
  • Some other reason Google is not showing the blog?

This sure opens a can of worms when it comes to the control of information flow and who sees what. Clearly, when different search engines show THIS LEVEL of inconsistency, we have some major, major issues.

Ironically, from a timing perspective, my whole desire to check my blog in search results comes at the same time that Google is defending itself against a Sherman Act lawsuit.

Look - if this blog is called techgrasper, and there aren't many things out on the world wide web that contain the words "blog" + "techgrasper", shouldn't this blog SHOW UP???? On the prevailing, dominant search engine???? Hosted on a blogging platform that THEY OWN????

Why I Am Not an Expert in User Interfaces and Web Front-Ends

Quite some years ago, I worked at a cool startup company toward the tail end of the Dot Com bubble. I won't name it here. But they were pioneers in mobile apps.

They promised a "write once, run anywhere" concept, where you would write your app in a proprietary markup language, and they would parse this markup and render the content on a host of early devices. The field of competing devices in those days was insane, ranging from Palm Pilots to VERY early mobile phone browsers that ran their own simplistic markup (phones in those days had so few resources that the markup had to be dumbed down and simplified).

Anyway, we split the development up into "tiers": "front end" developers who were super knowledgeable in how to render content, and "back end" developers who wrote the server logic (really it was 3-tier, so business logic and database logic). Most of the stuff was written in Java, and these guys had licensed WebLogic application servers and were using EJBs (a hot technology at the time). They even used Entity Beans, which practically nobody used (most shops had gone with a stateless architecture and used Session Beans). WebLogic was cool, admittedly, and introduced me to database connection pools, threads, and similar concepts that are widely used today.

As cool as WebLogic was, it was not my first exposure to an application server. I was the first one in a large Telco company to bring in an application server (I worked in an Innovation Center back then). I brought in a timekeeping system from a Silicon Valley company (Tock, I think it was called), which we customized, and this system was architected for Netscape Application Server - which I believe was indeed the first "application server". Feel free to comment if anyone has a correction to this belief.

Anyway, I digress. But these technologies, if they didn't outright invent it, lent themselves well to the 3-tier application design we know today, which allowed web front ends to be developed in a compartmentalized fashion.

I never really worked much on GUIs and web front ends. I did a little bit "here and there": WebSphere pages (later on), the Java Spring framework, and a website I created for a small consulting company we owned at one point. But aside from brief stints with user interfaces, my focus was never really web front ends.

I am realizing that I need to "bone up" a bit. My next post will explain why.

Monday, October 2, 2023

Service Now Integration using pysnow API client for Python

My latest technology initiative has been doing some first-hand integration with ServiceNow, using the ServiceNow API. The first thing I did was load the API calls into Postman. Once I tested the OAuth 2.0 authentication and made a couple of test calls, I was ready to proceed with Python.

I searched for a ServiceNow Python client, and sure enough, there is one. It is called "pysnow", and it can be installed with the Python pip utility:

>pip install pysnow

Once installed, you can interact with ServiceNow in a very straightforward manner - although there are some client-specific things one should learn from reading the documentation. Authentication uses OAuth 2.0, and token re-generation is handled as part of the API client, which is convenient.
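Here is a minimal sketch of that OAuth flow, adapted from the pysnow documentation (the instance name, client ID/secret, and credentials are placeholders):

import pysnow

token_store = {'token': None}

# pysnow calls this whenever it fetches or refreshes a token
def updater(new_token):
    token_store['token'] = new_token

client = pysnow.OAuthClient(client_id='<client_id>', client_secret='<client_secret>',
                            token_updater=updater, instance='myinstance')

# Bootstrap: generate an initial token with user credentials, then hand it to the client
if not token_store['token']:
    token_store['token'] = client.generate_token('myusername', 'mypassword')
client.set_token(token_store['token'])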

Once you have authenticated, you generally bind to a resource first (e.g., a table), and once you have bound to it, you can then make calls against that resource (e.g., a query). Data from calls can be accessed using helper functions such as first() or first_or_none().

Here is a snippet (from their documentation) on how the client is used:

import pysnow 
# Create client object
c = pysnow.Client(instance='myinstance', user='myusername', password='mypassword')

# Define a resource, here we'll use the incident table API
incident = c.resource(api_path='/table/incident')

# Query for incidents with state 3
response = incident.get(query={'state': 3})

# Print out the first match, or `None`
print(response.first_or_none())

ServiceNow, in my mind, is just a huge relational database full of tables. The API calls allow you to retrieve from these tables (GET calls), update these tables (PUT calls), or delete from these tables (DELETE calls).
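Following that same table-oriented pattern, here is a sketch of an update and a delete using pysnow (the incident number and field values are made up):

# Bind to the incident table
incident = c.resource(api_path='/table/incident')

# Update: set the state on a specific incident
updated = incident.update(query={'number': 'INC0010001'}, payload={'state': 2})

# Delete: remove that incident entirely
result = incident.delete(query={'number': 'INC0010001'})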

You can pass queries as arguments on the GET calls, and the queries are very similar to those you might use with SQL, supporting things such as wildcards with LIKE clauses, etc.
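pysnow also ships a QueryBuilder for composing these SQL-like queries. A sketch (the field names and values here are just examples):

# Find state-1 incidents whose short description contains "timeout"
qb = (pysnow.QueryBuilder()
      .field('short_description').contains('timeout')
      .AND()
      .field('state').equals('1'))

response = incident.get(query=qb)
for record in response.all():
    print(record['number'])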

There was one case where I had to abandon the pysnow client and use Python Requests: one of the API calls required a PATCH. I had never actually even heard of a PATCH call before encountering this, but it's a valid HTTP method - just one that is rarer to encounter, and up to now I had not seen it. Interestingly, the pysnow client did not support PATCH requests, and after figuring this out, I had to rewrite those API calls using Python Requests.
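For what it's worth, the workaround looked roughly like this (the URL, sys_id, token, and payload are placeholders):

import requests

# ServiceNow Table API endpoint for a single record
url = 'https://myinstance.service-now.com/api/now/table/incident/<sys_id>'
headers = {
    'Authorization': 'Bearer <access_token>',
    'Content-Type': 'application/json',
    'Accept': 'application/json',
}

# PATCH: partially update just the fields supplied in the body
response = requests.patch(url, headers=headers, json={'state': 2})
response.raise_for_status()
print(response.json())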

Aside from this, the only other surprise I had was the number of fields coming back on many of these calls. Some of these records were incredibly large.

Friday, August 18, 2023

Recovering a Corrupted NSX-T Manager

If your NSX-T Manager cluster is running as a cluster of VMs and one is corrupted, there is a good chance they all are - particularly if the issue was related to storage connectivity. Or maybe it is just one. If you are running a cluster and don't have a backup to restore, the steps below can be used to repair the file system. Mileage varies when repairing file systems, so there is no guarantee this will work, but this is the process to attempt nonetheless.

1.  Connect to the console of the appliance.

2.  Reboot the system.

3.  When the GRUB boot menu appears, quickly press the left SHIFT or ESC key. If you wait too long and the boot sequence does not pause, you must reboot the system and try again.

4.  Keep the cursor on the Ubuntu selection.

5.  Press e to edit the selected option.

6.  Enter the user name (root) and the GRUB password for root (not the same as the appliance's root user password). The GRUB root password is "VMware1" before release 3.2 and "NSX@VM!WaR10" for 3.2 and later.

7.  Find the line starting with "linux", which contains the boot command.

8.  Remove all options after root= (starting from UUID) and add "rw single init=/bin/bash".

9.  Press Ctrl-X to boot.

10.  When the log messages stop, press Enter. You will see the prompt root@(none):/#.

11.  Run the following commands to repair the file systems.

  • e2fsck -y /dev/sda1
  • e2fsck -y /dev/sda2
  • e2fsck -y /dev/sda3
  • e2fsck -y /dev/mapper/nsx-config
  • e2fsck -y /dev/mapper/nsx-image
  • e2fsck -y /dev/mapper/nsx-var+log
  • e2fsck -y /dev/mapper/nsx-repository
  • e2fsck -y /dev/mapper/nsx-secondary

The Linux XFS File System - How Resilient Is It?

We are using VMware datastores over NFS version 3. The storage was routed, which is never a good thing, because let's face it: if your VMs all lose their storage simultaneously, that constitutes a disaster. Having a dependency on a router, which can lose its routing prefixes due to a maintenance or configuration problem, is architecturally deficient (a polite way of putting it). To avoid this, make sure there are no routing hops - keep the storage on the same segment as the hypervisor's storage interface.

So, after our storage routers went AWOL due to a maintenance event, I noticed that some VMs came back and appeared to be fine. They had rebooted and were sitting at a login prompt. Other VMs, however, did not come back, and had some nasty things printing on the console (you could not log into these VMs).

What we noticed was that any Linux virtual machine running the XFS file system on boot or root (/boot or /) had this issue of being unrecoverable. VMs that were using ext3 or ext4 seemed to be able to recover and start running their services - although some were still echoing messages to the console.

There is a lesson here: the file system matters when it comes to resiliency in a virtualized environment.

I did some searching around for discussions on file system types, and of course there are many. This one in particular, I found interesting:  ext4-vs-xfs-vs-btrfs-vs-zfs-for-nas


Wednesday, August 9, 2023

Artificial Intelligence Book 1 - Crash Course in AI - Q Learning

The Crash Course in AI presents a fun project for the purpose of developing familiarity with the principles of Q-Learning.

It presents the situation of Robots in a Maze (Warehouse), with the idea that the AI will learn the optimal path through a maze.

To do this, the following procedure is followed:

  1. Define a Location-to-State Mapping, where each alphabetic location (A, B, C, etc.) is correlated to an integer value (A=0, B=1, C=2, D=3, et al.).
  2. Define the Actions (each location is an action, represented by its integer value), expressed as an array of integers. (Steps 1 and 2 are sketched in code just after the matrix below.)
  3. Define the Rewards - here, each square in the maze has certain adjacent squares that constitute a valid "move".
     The set of reward arrays is an "array of arrays", and we know that an "array of arrays" is
     essentially a matrix! Hence, we can refer to this large "rule set" as a Rewards Matrix.
        
import numpy as np

# Defining the rewards
R = np.array([
    [0,1,0,0,0,0,0,0,0,0,0,0],  # A's only valid move is to B
    [1,0,1,0,0,1,0,0,0,0,0,0],  # B's valid moves are to A, C, F
    [0,1,0,0,0,0,1,0,0,0,0,0],  # C's valid moves are to B, G
    [0,0,0,0,0,0,0,1,0,0,0,0],  # D's only valid move is to H
    [0,0,0,0,0,0,0,0,1,0,0,0],  # E's only valid move is to I
    [0,1,0,0,0,0,0,0,0,1,0,0],  # F's valid moves are to B, J
    [0,0,1,0,0,0,1,1,0,0,0,0],  # G's valid moves are to C, G, H
    [0,0,0,1,0,0,1,0,0,0,0,1],  # H's valid moves are to D, G, L
    [0,0,0,0,1,0,0,0,0,1,0,0],  # I's valid moves are to E, J
    [0,0,0,0,0,1,0,0,1,0,1,0],  # J's valid moves are to F, I, K
    [0,0,0,0,0,0,0,0,0,1,0,1],  # K's valid moves are to J, L
    [0,0,0,0,0,0,0,1,0,0,1,0]   # L's valid moves are to H, K
])
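And for completeness, here is a sketch of what steps 1 and 2 look like in code (this mirrors the book's approach; the variable names are mine):

# Step 1: map each location to an integer state
location_to_state = {'A': 0, 'B': 1, 'C': 2,  'D': 3,
                     'E': 4, 'F': 5, 'G': 6,  'H': 7,
                     'I': 8, 'J': 9, 'K': 10, 'L': 11}

# Step 2: the actions are simply the integer states themselves
actions = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]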

So this array - these "ones and zeroes" - governs the "rules of the road" in the maze. In fact, you could draw the maze out graphically based on these rules alone.

Now - from a simple perspective, armed with this information, you can feed a starting and ending location into the "Engine", and it will compute the optimal path for you. In cases where there are two optimal paths, it may give you one or the other.

But how does it do this? How does it "know"?

This gets into two key concepts that feed an equation known as the Bellman Equation:
  • Temporal Difference - how well (or how fast) the AI (model) is learning
  • Q-Value - this is an indicator of which choices led to greater rewards
If we consider that models like this might have thousands or even millions of X/Y coordinate data points (remember, it is a geographic warehouse model), it is not scalable for the AI to store all of the permutations it encounters as it works through the model. What the Bellman Equation does is allow a coefficient to be carried forward, Calculus-like, so that when we reach coordinate (X, Y) we know which steps were optimal in reaching it.
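For reference, here are the two quantities in standard notation (my transcription, consistent with the book's Alpha and Gamma discussed below; s is the current state, a the action, and s' the next state):

$$TD_t(s, a) = R(s, a) + \gamma \max_{a'} Q_{t-1}(s', a') - Q_{t-1}(s, a)$$

$$Q_t(s, a) = Q_{t-1}(s, a) + \alpha \, TD_t(s, a)$$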

Basically, before we start traversing the maze, all Q-values are initialized to zero. As we traverse the maze, the model calculates the Temporal Difference; if it is high, the model flags it as a reward, while if it is low, it is flagged as a "frustration". High values early on are "pleasant surprises" to the model. So, in summary, as the maze is traversed, the TD is calculated, followed by a Q-value adjustment (the Q-value for the state/action combination, to be precise).

Now, before I forget to mention this: the Rewards Matrix needs to be adjusted to reflect the desired ending location. For example, if the maze were to begin at point E and end at point G, the matrix cell where G's row and column intersect (ending state, ending state) would need a huge value telling the AI to stop there and go no further. You can see this in the coding example from the book:

# Optimize the ending state with the ultimate reward
R_new[ending_state, ending_state] = 1000

I have to admit - I started coding before I had fully read and digested what was in the book. I got tripped up by two variables in the code, Alpha and Gamma. Alpha was coded as 0.9, while Gamma was coded as 0.75. I was very confused by these constants - what they were and why they were used.

I had to go back into the book.
  • Alpha - Learning Rate
  • Gamma - Discount Factor
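Putting Alpha and Gamma into context, here is roughly what the training loop looks like (a sketch based on my reading of the book; R_new is the Rewards Matrix above with the 1000 ending-state reward already applied):

import numpy as np

alpha = 0.9   # learning rate: how strongly each TD adjusts the Q-value
gamma = 0.75  # discount factor: how much future rewards are worth today

Q = np.zeros((12, 12))
for _ in range(1000):
    # Start each iteration from a random state
    current_state = np.random.randint(0, 12)
    # Choose a random playable action (any state reachable from current_state)
    playable = [j for j in range(12) if R_new[current_state, j] > 0]
    next_state = int(np.random.choice(playable))
    # Temporal Difference, then the Bellman-style Q-value update
    TD = (R_new[current_state, next_state]
          + gamma * np.max(Q[next_state, :])
          - Q[current_state, next_state])
    Q[current_state, next_state] += alpha * TD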

Hey, this AI stuff - these algorithms - is all about mathematics (as well as statistics and probability). I am not a mathematician and only took intermediate Calculus, so some of us really need to concentrate and put our thinking caps on if we truly want to follow the math behind these models and equations.


Artificial Intelligence Book 1 - Crash Course in AI - Thompson Sampling

I bought this book in October 2020. Maybe due to holidays and other distractions, I never picked the book up until 2021, at which point I decided it took too much mental energy, and set it back down.

Well, now that AI is the rave in 2023, I decided to pick this book up and push the education along.

I love that this book uses Python. It wants you to use a notebook interface called Colab, which I initially looked at but quickly abandoned in favor of the tried-and-true vi editor.

The book starts off with the Multi-Armed Bandit "problem".

What is that? Well, the name stems from the One-Armed Bandit: a slot machine, which "steals" from the players of the machine.

The Multi-Armed Bandit, I presume, turns this on its head: it represents a slot machine player playing a single machine with N arms (or, perhaps, a bank of N one-armed machines). By using a binary system of rewards (0/1), this problem feeds into a Reinforcement Learning example where the optimal sequence of handle pulls results in the maximum reward.

This "use case" of the Multi-Armed-Bandit problem (slot machines or single slot machines with multiple arms), is solved by the use of Thompson Sampling. 

Thompson Sampling (the term "Sampling" should give this away) is a statistics-based approach that solves some interesting problems. Take the case of the Multi-Armed Bandit problem just described: just because a slot machine has paid out the most money historically does not mean that machine will continue to be the best choice for the future gambler (or for future pulls of the arms). Thompson Sampling, through continual redistribution and retraining, accommodates exploiting the results of the past while exploring the possibilities of the future.

The fascinating thing about Thompson Sampling is that it was developed back in 1933 and largely ignored until fairly recently. The algorithm (or renditions of it) has since been applied in a number of areas, by a growing number of large companies, to solve interesting problems.

In this book, the problem that employs Thompson Sampling is one in which a paid subscription is offered, and the company needs to figure out which price point optimizes revenue.
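To make the idea concrete, here is a minimal Thompson Sampling sketch (not the book's exact code; the three price points and their "true" conversion rates are invented for the simulation):

import numpy as np

true_rates = [0.05, 0.13, 0.09]   # hidden conversion rate per price point
wins = np.zeros(3)                # subscriptions observed per price point
losses = np.zeros(3)              # rejections observed per price point

for _ in range(10000):
    # Draw one belief per price point from its Beta posterior...
    samples = [np.random.beta(wins[i] + 1, losses[i] + 1) for i in range(3)]
    # ...and play the price point that looks best on this draw.
    # Random draws keep us exploring; the sharpening posteriors make us exploit.
    arm = int(np.argmax(samples))
    if np.random.rand() < true_rates[arm]:
        wins[arm] += 1
    else:
        losses[arm] += 1

print('Best price point so far:', int(np.argmax(wins / (wins + losses + 1e-9))))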

Sources: 

  • Weber, Richard (1992). "On the Gittins Index for Multiarmed Bandits." Annals of Applied Probability.

  • Russo, Daniel J. (Columbia University); Van Roy, Benjamin (Stanford University); Kazerouni, Abbas (Stanford University); Osband, Ian (Google DeepMind); Wen, Zheng (Adobe Research). "A Tutorial on Thompson Sampling."
