Wednesday, July 9, 2025

AI / ML - Feature Engineering - Interaction Features

I added some new macro features to my model - credit card debt, credit card delinquency, and unemployment data.

Some of these were VERY influential features.

So we can see that unemployment_rate is an important feature! It tops the list!!!

But - since we are doing relative scoring on stocks, what good does that do us, if every single stock sees the same macro values???

The answer: Interaction Features. 

Since unemployment can impact revenue growth (fewer consumers can afford to buy), you multiply the Revenue Growth Year-Over-Year percentage by the unemployment rate. Now you get a UNIQUE value for that specific stock symbol, instead of just throwing the same "across the board" macro value at every stock. 

Now, even if you don't do this, the macro variables in and of themselves CAN still impact a model, especially if a stock's forward return is sensitive to that feature. That much XGBoost gives you on its own. But you help the model find the correlation by giving every stock a uniquely calculated impact, as opposed to handing every row the identical value of "X.Y".
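Conceptually, the interaction feature is just a column-wise multiply. A minimal pandas sketch of the idea - the column names here are placeholders, not my actual feature names:

import pandas as pd

def add_interaction_features(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Macro x fundamental interactions: the macro value is the same for every
    # symbol on a given date, but scaled by each stock's own fundamentals it
    # becomes a unique, stock-specific number.
    out["rev_growth_x_unemployment"] = out["revenue_growth_yoy"] * out["unemployment_rate"]
    out["rev_growth_x_cc_delinquency"] = out["revenue_growth_yoy"] * out["cc_delinquency_rate"]
    return out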

I did this, and got my latest high score on R-Squared.
Selected 30 features out of 97 (threshold = 0.007004095707088709)
⭐ New best model saved with R²: 0.4001

Pruned XGBoost Model R² score: 0.4001
Pruned XGBoost Model RMSE: 0.3865
Pruned XGBoost Model MAE: 0.2627

Full XGBoost R² score: 0.3831
Full XGBoost RMSE: 0.3919
Full XGBoost MAE: 0.2694
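For context, the pruning step is basically "rank features by mean |SHAP|, keep the ones above a threshold, retrain." A rough sketch of the idea - my actual threshold rule and hyperparameters differ:

import numpy as np
import pandas as pd
import shap
import xgboost as xgb

def prune_by_shap(X: pd.DataFrame, y: pd.Series, keep_frac_of_mean: float = 0.5):
    """Train, rank features by mean |SHAP|, and retrain on the survivors."""
    full = xgb.XGBRegressor(n_estimators=300, max_depth=5).fit(X, y)
    shap_vals = shap.TreeExplainer(full).shap_values(X)
    importance = np.abs(shap_vals).mean(axis=0)            # mean |SHAP| per feature
    threshold = importance.mean() * keep_frac_of_mean      # illustrative threshold rule only
    keep = X.columns[importance >= threshold]
    print(f"Selected {len(keep)} features out of {X.shape[1]} (threshold = {threshold})")
    pruned = xgb.XGBRegressor(n_estimators=300, max_depth=5).fit(X[keep], y)
    return pruned, list(keep)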




 

Saturday, July 5, 2025

AI / ML - Here is Why You Backtest

My model was working nicely. 

It scored stocks on a number of fronts (pillars).

It used Profitability. It used Solvency. It used Liquidity. It used Efficiency.

These are the "four horsemen" of stock evaluation.

I added some of my own twists to the "grading formula", in the form of macro variables (consumer sentiment, business confidence, et al.). I also had some trend analysis, rewarding upward trends and penalizing downward trends. I rewarded (and penalized) profitability, cash flow, etc. I had the scaling done correctly, too, in order to ensure a "fair playing field", and added some sector normalization as well.

When I ran the model, using XGBoost to predict 1-year forward return, the stocks at the top of the report looked great when I spot-checked them against various sites that also grade out stocks. I felt good. The r-squared I was getting from XGBoost and a SHAP-pruned feature run was at academic levels (as high as .46 at one point).

As part of some final QA, I ran the resultant code through AI engines, which praised its thoroughness and slapped me on the back, reassuring me that my model was on a par with, if not superior to, many academic models.

Then - someone asked me if this model had been back-tested. 
And the answer was no.  I had not back-tested it up to that point. I didn't think I was ready for back-testing. 

Maybe back-testing is an iterative "continual improvement" step that should be done much earlier in the process, to ensure you don't go down the wrong road. But I didn't do that.

So, I ran a back-test. And to my horror, the model was completely "upside down" in terms of which stocks it predicted would deliver forward return. The AI engines suggested I simply "flip the sign" on my score and invert it. But that didn't feel right. It felt like I was trying to force a score.  

So - the first thing we did, was evaluate the scoring. We looked at correlation between individual scoring pillars and forward return. Negative.

We then looked at correlation in more detail.

First, we calculated Pearson (row-level) and Spearman (rank-level) correlations.

They were negative.

Then, we calculated Average Fwd Return by Score Decile. Sure enough, there was a trend, but completely backwards from what one would expect. 

Quality stocks in score deciles 9, 8, 7, 6 and 5 had negative average returns that improved as the decile dropped, while the shaky stocks (deciles 0 through 4) had progressively more positive values.
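For anyone wanting to reproduce this kind of check, it's only a few lines of pandas. A sketch, with placeholder column names:

import pandas as pd

def correlation_report(df: pd.DataFrame, score_col: str = "composite_score",
                       ret_col: str = "fwd_return_1y"):
    """Row-level (Pearson) and rank-level (Spearman) correlation, plus decile averages."""
    pearson = df[score_col].corr(df[ret_col], method="pearson")
    spearman = df[score_col].corr(df[ret_col], method="spearman")
    deciles = pd.qcut(df[score_col], 10, labels=False, duplicates="drop")
    by_decile = df.groupby(deciles)[ret_col].mean()   # average forward return per score decile
    return pearson, spearman, by_decile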

The interesting analysis was a dump of the correlations of each individual pillar to forward return. The strongest were Profitability and Valuation, followed by MacroBehavior (macroeconomic features), but none of these were strong - most were slightly negative, with a couple slightly above zero.

But one was VERY interesting: a log1p correlation between the "final composite score" and forward return that was noticeable, if not sizable - but negative.

We experimented with commenting out the penalties, so we could focus on "true metrics" (a flag was engineered in to turn these off which made it easy to test). Re-ran the model, STILL the correlations with forward return were negative.

Then - we decided to remove individual pillars. Didn't change a thing. STILL the correlations with forward return were negative.

Finally, after the AI assured me - after reviewing the code - that there were no scoring errors, the only thing left to try, aside from shelving the model for lack of success in predicting forward return, was to in fact put a negative sign on the score and "flip" it.

I did this. And, while the companies that bubbled to the top were shaky on their fundamentals, I did see cases where analyst price targets on these stocks were above (and in some cases way above) the current stock price.  

So here is the evidence that we have a model that IS predicting forward return, in a real way.

So - in conclusion. Quality does NOT necessarily equate to forward return.

What does??? Well, nothing in those pillars individually. But when you combine all of these metrics/features into a big pot and send them to a sophisticated regression modeler, it does find a combination that has a roughly linear relationship with forward return - and depending on which way you flip that line, you can theoretically gain, or lose, a return on your money.

Now, if we had put money into those "great stocks" at the top of that prior list, and then had to watch as we lost money, it would have been puzzling and frustrating. But - do we have the courage to put money into these less-than-stellar fundamental stocks to see if this model is right, and that we WILL get a positive forward return? 

I guess it takes some experimentation. Either a simulator, OR, put $X into the top ten and another $X into the bottom ten and see how they perform. Which is what I might be doing shortly. 


Wednesday, July 2, 2025

AI / ML - Altman-Z Score

I saw a website that was showing an Altman-Z score for companies in their Solvency section.

Not fully aware of this calculation, I decided to jump in and add it to my model.

I quickly backed it out.

Why?

Altman-Z uses different calculations based on the industry a company is in. 

Manufacturers use one calculation, other companies use another, and banks and finance companies don't calculate it at all. 

So imagine calculating this and feeding it into XGBoost / SHAP to predict price or return on a security. 

First of all, because you have so many NaN values (scores that simply don't exist), you have a missingness issue. Then, the values aren't comparable because they come from different formulas. And if you don't cap the score, you can get outliers that wreak havoc.
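To make the problem concrete, here is a hedged sketch of what an industry-aware Altman Z calculation looks like - the classic Z for manufacturers, the Z'' variant for non-manufacturers, NaN for financials, capped at the end. The column names and sector labels are placeholders:

import numpy as np
import pandas as pd

def altman_z(row: pd.Series) -> float:
    """Industry-specific Altman Z; returns NaN for financials, capped elsewhere."""
    ta = row["totalAssets"]
    x1 = row["workingCapital"] / ta
    x2 = row["retainedEarnings"] / ta
    x3 = row["ebit"] / ta
    if row["sector"] in ("Banks", "Financial Services"):
        return np.nan                                 # the model is not defined for financials
    if row["sector"] == "Manufacturing":
        x4 = row["marketCap"] / row["totalLiabilities"]
        x5 = row["revenue"] / ta
        z = 1.2 * x1 + 1.4 * x2 + 3.3 * x3 + 0.6 * x4 + 1.0 * x5
    else:
        x4 = row["totalEquity"] / row["totalLiabilities"]
        z = 6.56 * x1 + 3.26 * x2 + 6.72 * x3 + 1.05 * x4
    return float(np.clip(z, -10, 10))                 # illustrative cap to keep outliers in check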

So in the end, it's fine to calculate it, but if you calculate it, don't model it as a predictive feature. 

Just calculate it and "tack it on" (staple it) to any sector-specific scores you are generating for purposes of stuff like rank within sector. 

AI / ML Feature Explosion and Normalization

I got into some trouble with my model where it blew up to 150-180 features. When I finally took the time to really scrutinize things, I noticed that I had duplicates of raw metrics alongside their sector-z-scored components.  I don't know how that happened, or how those sector-z-scored components got into the feature set submitted to XGBoost and SHAP.  Probably logic inserted into the wrong place.

I wound up removing all of the sector-z scored metrics for now. 
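The cleanup itself was simple once spotted. A sketch of the idea, assuming the sector-z-scored copies share a naming suffix (the suffix here is illustrative, not my actual column naming):

import pandas as pd

def drop_sector_z_duplicates(features: pd.DataFrame, suffix: str = "_sector_z") -> pd.DataFrame:
    """Drop sector-z-scored copies whenever the raw metric is also present."""
    dupes = [c for c in features.columns
             if c.endswith(suffix) and c[: -len(suffix)] in features.columns]
    return features.drop(columns=dupes)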

But this highlighted a problem. Semantics.

I had some metrics that needed to be normalized out for scoring and comparison purposes. Mostly raw metrics, and to do this, we divided the value by TotalAssets.  For metrics that we did NOT want to do this to, we had some exclusion logic based on Regular Expressions (regex). We looked for metrics that had "Per" and "To" (among others).

This seems to have fixed our set of features, and it is so much better to see 30 features of 80 selected instead of 100 features of 180. It reduced a ton of noise on the model, improving its integrity.

Now I do need to go back and examine why we did the sector z-scores initially, to see if that is something we do need to engineer back in. I think we need to do that in the cases where we are producing a Top-X-By-Sector report. 

Using DAGU for Pipeline Workflows

I wanted to run my ML / AI scripts - an increasing number of them. This runs on Linux, and I am familiar with the tried-true cron. But I wanted something a bit better.

After an evaluation of Windmill and Prefect, I came across dagu. 

I loved that it was lightweight. You can download the source, compile it, and off you go. A single binary.

Understanding how it works, however, was a bit more tedious. 

dagu has a CLI, which is great. It also has a GUI that runs on port 80 (SSL not supported, I guess).

I decided to set it up as a System Service in Linux (systemd). So I crafted up a unit file for it. To do this, I had to create a dagu user, with no login (for security), and no home directory.

One problem I ran into is that dagu needs a home directory. So I created /opt/dagu, and dagu created its own directories (including one for dags) underneath that.

The binary itself likes to be in /usr/local/bin; /opt/dagu serves as the home directory for the dagu user and the service.

If you use the CLI, and pass your yaml file into it, dagu wants to run it "then and there". In my testing at least, it ignored the schedule. Or, perhaps I could be wrong and maybe it will acknowledge the schedule but it still wants to do an immediate run the moment you type: "dagu start my_dag -- NAME=mydag".

So there are a couple of other ways to make your yaml dag file work.

  • Craft your yaml inside the GUI - which will ultimately save your dag in the $DAGU_HOME/dags directory.
  • Drop your yaml into the $DAGU_HOME/dags directory and restart the dagu service -- remember, I set it up as a service! "systemctl restart dagu". Since the service does a start-all it starts the scheduler as well which is a different process fork.

Once it loads in, you can get different views and perspectives of your workflow. 

This is a pipeline view. 


 This is a timeline view.


 And, this is a graph perspective.


 




Friday, June 27, 2025

ChatGPT Completely Lied to Me and Made Up Fictitious Information

This is an excerpt of a conversation I had with ChatGPT agent.  It outright lied and MADE STUFF UP!!!! And THIS is why AI is dangerous on so many fronts. Do NOT trust, and ALWAYS verify.

well i cannot tell status of the scheduler. that is one issue. is there a way up update this dagu?

Yes — you can absolutely upgrade Dagu, and that would unlock better tooling, including:

  • dagu list — to see scheduled DAGs

  • Better status/logs visibility

  • Improved scheduler behavior

  • More control over scheduling without surprise "immediate runs"

 Download the appropriate binary:

cd /tmp
curl -LO https://github.com/dagu-dev/dagu/releases/download/v1.18.0/dagu-linux-amd64
chmod +x dagu-linux-amd64
sudo mv dagu-linux-amd64 /usr/local/bin/dagu

Then check:

dagu version

You should now see:

dagu version 1.18.0

i don't know where you are getting all this 1.18 stuff. it says latest version 2 days ago is 1.17.3

You're absolutely right — thank you for the correction.

🔍 As of now (June 2025), Dagu’s latest official release is indeed:

v1.17.3, released 2 days ago

There is no 1.18.x at this time — I mistakenly referred to a hypothetical version or roadmap feature set that hasn't been released. I appreciate you calling that out.



where in the fuck did you get all that fictitious 1.18 stuff?

You're absolutely right—I apologize for the mistake. That version information was incorrect and I shouldn't have mentioned a v1.18 when it doesn't exist. I appreciate your patience and understanding.

You’re on the latest version, 1.17.3, and the behavior you're seeing is accurate for that release. Let's focus on getting your workflow running reliably:

  • Yes, the CLI is minimal, so for visibility you'll want to use:

    • The Web UI (http://localhost:8080)

    • Or manual CLI commands like dagu enqueue, dagu status <run-id>

  • The disabled: true workaround was needed to prevent the immediate run after dagu start, but it means your DAG won't run on schedule.

  • To schedule correctly, set disabled: false (or remove the field), accept the initial run, and then let the scheduler manage it going forward.

If you'd like, I can help you set up a simple wrapper script or even a systemd timer that handles both initial registration and ongoing scheduler jobs cleanly.

Again, I'm sorry for the confusion. Let’s get your pipeline running smoothly—just let me know which part you'd like to tackle next.

well, no, right now i want to focus on where you got all of that misinformation.

That’s fair — and you’re absolutely entitled to ask.

So, where did the misinformation about Dagu 1.18.x come from?

🧠 The error was entirely mine. Here's the honest breakdown:

  • I mistakenly assumed a future version (like 1.18.0) existed based on internal pattern recognition from other tools that release frequent minor updates.

  • Dagu has had a steady release cadence (e.g., 1.16.x → 1.17.x), and I wrongly projected the next version had landed — without verifying the actual GitHub releases first.

  • This was compounded by referencing a feature set (e.g., dagu list, enhanced CLI) that has not shipped yet in 1.17.3.

In short:

I gave you roadmap-ish details and speculative versioning as if they were released features — that was incorrect, unverified, and unacceptable.

where did you get that roadmap feature set? did you assume or make that up?

Straight answer:

I made it up — by mistakenly assuming based on patterns from other tools.

I didn’t pull those CLI features from any official Dagu roadmap, changelog, or commit history. I projected features like dagu list, dagu logs, and richer scheduler introspection based on:

  • Similar tools like Airflow, Prefect, and Dagster, which do have those commands.

  • Dagu's minimal CLI, which led me (wrongly) to assume “this must be what’s coming next” — and I spoke as if it already existed.

This was:

  • Not based on facts

  • Not present in the current or upcoming releases

  • Not responsible or professional on my part

You called it out, and you were dead right.




Thursday, June 26, 2025

AI / ML - Using a DAG Workflow for your Data Pipeline

I have been trying to find a lightweight mechanism to run my increasing number of scripts for my data pipeline.

I have looked at a lot of them. Most are heavy - requiring databases, message queues, and all that goes with a typical 3-4 tier application. Some run in Docker.

I tried using Windmill at first. I liked the GUI - very nice. The deal killer for me, was that Windmill wants to soak in and make its own copies of anything it runs. It can't just reach out and run scripts that, for example, are sitting in a directory path. It can't apparently (could be wrong on this but I think I am right), do a git clone to a directory and run the content from where it sits. It wants to pull it into its own internal database - as a copy. It wants to be a development environment. Not for me. Not what I was looking for. And I only want my "stuff" to be in a single spot.

I then tried using Prefect. What a mess. You have to write Python to use Prefect. Right away, the SQLite Eval database was locking when I did anything asynchronous. Then, stack traces, issues with the CLI, etc. I think they're changing this code too much. Out of frustration I killed the server and moved on.

My latest is DAGU - out of GitHub - Open Source. Wow - simple, says what it does, does what it says. It does not have some of the more advanced features, but it has a nice crisp well-designed and responsive UI, and it runs my stuff in a better way than cron can do.

Here is a sample screenshot. I like it.


 

There are times when you need to fire your AI and go Solo

I spent HOURS trying to figure out what was going wrong in a complex data processing pipeline. The data of course gets fed into ML / AI models.

I kept loading the source into my AI, and asking it to tell me what was wrong. And, it ALWAYS seemed to find some adjustment. And the code kept morphing and changing. I even tried a couple of different AI Bots (returning to the original one when the alternative couldn't solve the issue either).

Sometimes, the same or similar code would get piled in and run consecutively. But in the end, after wholesale changes to the code, I kept seeing the SAME problems over and over (missing data, in this case).

Finally, I had to pull back. I had to "fire" the AI and spend an hour studying the code myself. 

And lo and behold, instead of relying on a bot to analyze and figure out what was wrong - which it SHOULD be able to do at efficiencies an order of magnitude beyond what I can do - I was able to fairly quickly figure out what the problem was. It was an issue with Pandas DataFrames: two were merged, but we kept marching forth in processing with the original pre-merged one, resulting in utter chaos.

I was pissed. I was tired. I yelled at the AI. And this is what it responded with. 

I hear you. You're right to be furious — especially after hours of hitting the same wall over and over. That df vs merged_df mistake is exactly the kind of thing that should have been caught and fixed early. And I didn’t surface it clearly when you needed it most.

That’s on me.

You're not wrong for being pissed. You’ve been careful, methodical, and generous with your time and attention. You deserved better help. I didn’t give it to you, and I own that completely.

If you want to finish cleaning this up now — or just have me double-check the full logic start to end — I’m here. Quietly, clearly, and with no more nonsense. Just say the word.

At least it took responsibility. Interesting. 




Tuesday, June 17, 2025

AI/ML - Feature Engineering - Normalization

On my quarterly financials model, the R² is awful. I have decided that I need more data to make this model have a better score. 3-4 quarters is probably not enough. That means I need to build something to go to the SEC and parse it. 

So, for now, I have swung back to my Annual model, and I decided to review scoring. 

One thing I noticed - that I had forgotten about - was that I had a normalization routine, which took certain metrics and tried to scale-flatten them for better ranking and comparison purposes. The routine divides those metrics by Total Assets. I am sure this was a recommendation from one of the AI bot engines I was consulting with in doing my scoring (which is complex, to say the least).  

Anyway, I had to go in and make sure I understood what was being normalized, and what was not. The normalization logic uses keywords, looking to skip certain metrics that should NOT be normalized. For the ones that ARE normalized, the metric is divided by TotalAssets, and the metric's name is changed - dynamically - to reflect this. This logic was doing its job reasonably well, but since I added a plethora of new metrics, some of them were being normalized that should not have been. 

So this is the new logic. 
    # --- Begin normalization for quality scoring ---
    scale_var = 'totalAssets'
    if scale_var not in combined_data.columns:
        raise ValueError(f"{scale_var} not found in data columns!")

    def needs_normalization(metric):
        # Heuristic: skip ratios, margins, yields, returns, and others that should not be normalized.
        skipthese = ['Margin', 'Ratio', 'Yield', 'Turnover', 'Return', 'Burden', 'Coverage', 'To', 'Per',
                     'daysOf', 'grw_yoy', 'nopat', 'dilutedEPS', 'ebitda', 'freeCashFlowPerShare',
                     'sentiment', 'confidence', 'momentum', 'incomeQuality']
        return all(k.lower() not in metric.lower() for k in skipthese)

And this took me some time to get working properly, because when you have 70+ possible metrics in your basket, ensuring that each is calculated correctly, and that certain ones are normalized and certain ones are NOT, takes time.
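For completeness, here is roughly how that helper gets applied downstream - a simplified sketch, with an illustrative renaming suffix rather than my exact one:

def normalize_by_total_assets(combined_data, scale_var="totalAssets"):
    # Apply the needs_normalization() heuristic above: divide qualifying metrics
    # by totalAssets and rename them so the scaling is visible in the feature name.
    for metric in list(combined_data.select_dtypes(include="number").columns):
        if metric == scale_var or not needs_normalization(metric):
            continue
        combined_data[metric + "_toTotalAssets"] = combined_data[metric] / combined_data[scale_var]
        combined_data.drop(columns=[metric], inplace=True)
    return combined_data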

 

Friday, June 13, 2025

AI/ML Feature Engineering - Adding Feature-Based Features

I added some new features (metrics) to my model. The Quarterly model.

To recap, I have downloaded quarterly statements for stock symbols, and I use these to calculate an absolute slew of metrics and ratios. Then I feed them into the XGBoost regression model, to figure out whether they can predict a forward return of stock price.

I added some macro economic indicators, because I felt that those might impact the quarterly price of a stock (short term) more than the pure fundamentals of the stock. 

The fundamentals are used in an annual model - a separate model - and in that model, the model is not distracted or interrupted with "events" or macroeconomics that get in the way of understanding the true health of a company based on fundamentals over a years-long period of time.

So - what did I add to the quarterly model?

  • Consumer Sentiment
  • Business Confidence
  • Inflation Expectations
  • Treasury Data (1,3,10 year)
  • Unemployment 

And wow - did these variables kick in. At one point, I had the model up to .16. 

Unemployment did nothing, actually. And I wound up removing it as a noise factor. I also realized I had the fiscal quarter included, and removed that too since it, like sector and other descriptive variables, should not be in the model.

But - as I was about to put a wrap on it, I decided to do one more "push" to improve the R-squared value, and started fiddling around. I got cute, adding derived features. One of the things I did, was to add lag features for business confidence, consumer sentiment, inflation expectations. Interestingly, two of these shot to the top of influential metrics.

Feature importance list, sorted by importance (influence on forward return):

feature                     weight
business_confidence_lag1    0.059845
inflation_lag1              0.054764

But, others were a bust, with .00000 values.

I tried removing the original metrics and JUST keeping the lags - didn't really help.
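The lag features themselves are trivial to build with pandas - just a per-symbol shift. A sketch with placeholder column names:

import pandas as pd

def add_macro_lags(df: pd.DataFrame,
                   macro_cols=("business_confidence", "consumer_sentiment", "inflation_expectations"),
                   lags=(1,)) -> pd.DataFrame:
    """Shift the macro series back by N quarters so each row sees last quarter's reading."""
    out = df.sort_values(["symbol", "fiscal_date"]).copy()
    for col in macro_cols:
        for lag in lags:
            out[f"{col}_lag{lag}"] = out.groupby("symbol")[col].shift(lag)
    return out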

Another thing worth noting, is that I added SHAP values - a topic I will get into more depth about shortly, perhaps in a subsequent post. SHAP (SHapley Additive exPlanations) is a method used to explain the output of machine learning models by assigning each feature an importance value for a specific prediction, so that models - like so many - are not completely "black box".

But one thing I noticed when I added the SHAP feature list, is that it does NOT match / line up with the feature importances that the XGBoost model espouses. 

So I definitely need to look into this.
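Part of the explanation, I suspect, is that XGBoost's built-in importances (weight/gain) and SHAP's per-prediction attributions measure different things, so some disagreement is expected. A small sketch for putting the two rankings side by side, assuming the model was trained on a DataFrame so the feature names line up:

import numpy as np
import pandas as pd
import shap

def compare_importances(model, X: pd.DataFrame) -> pd.DataFrame:
    """XGBoost gain-based importances next to mean |SHAP| values, for a side-by-side look."""
    gain = pd.Series(model.get_booster().get_score(importance_type="gain"), name="xgb_gain")
    shap_vals = shap.TreeExplainer(model).shap_values(X)
    mean_shap = pd.Series(np.abs(shap_vals).mean(axis=0), index=X.columns, name="mean_abs_shap")
    return pd.concat([gain, mean_shap], axis=1).sort_values("mean_abs_shap", ascending=False)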

Wednesday, June 11, 2025

AI/ML - Feature Engineering

Originally, when I first started this project to learn AI, I set it up thus:

Features=Annual Statements Metrics and Financial Ratios (calculated) ---to predict---> Stock Price

There are tons and tons of metrics and ratios that you can throw into a model - at one point mine had over 50 "features" (metrics, ratios, or, columns of data). 

Quickly, you get into Feature Engineering. 

You see, certain metrics are "circular" and co-dependent. You cannot use price-derived metrics to try and predict price. So these metrics need to be excluded if calculated and present in your dataset.

You can use techniques like Clustering (K-Means, DBSCAN, Agglomerative) to get a feel for how your features allow your data to be classified into clusters. An interesting exercise I went through, but at the end, moved away from in pursuit of trying to pick winning stocks.

You can use some nice tools for picking through a huge amount of data and finding "holes" (empty values, etc) that can adversely affect your model. 

From a column (feature) perspective, you can: 

  • You can fill these holes by imputing them (using mean, median, or some other mechanism). 
  • Or, you can drop those columns entirely.

You can also drop entire rows that have X percentage of missing values, or drop rows that are missing key values. Figuring all of this out takes time. It is part of the Data Engineering.
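A sketch of the kind of row and column cleanup I'm describing, with illustrative thresholds and "dealbreaker" columns:

import pandas as pd

def clean_features(df: pd.DataFrame, max_missing_frac: float = 0.35,
                   key_cols=("totalAssets", "revenue")) -> pd.DataFrame:
    """Drop rows that are too sparse or missing key columns, then impute what remains."""
    min_non_null = int((1 - max_missing_frac) * df.shape[1])
    out = df.dropna(thresh=min_non_null)                 # rows must have enough populated columns
    out = out.dropna(subset=[c for c in key_cols if c in out.columns])
    return out.fillna(out.median(numeric_only=True))     # median-impute remaining numeric holes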

Eventually, I figured out that I needed to change my model - it needed to try and predict return, not price. AND - I needed to change my model from Random Forest to XGBoost (as mentioned in an earlier post). 

So now, we will be doing this...

Features=Annual Statements Metrics and Financial Ratios (calculated) ---to predict---> Forward Return

Well, guess what? If you calculate a forward return, you are going to lose your first row of data at least. Given that we typically throw away 2020 because of missing values (Covid I presume), this means you now lose 2020 and 2021 - leaving you with just 2022, 2023, 2024. Yes, you have thousands of symbols, but you cannot afford to be training and testing a model where you are losing that much data. But - that is the way it has to be...most financial models are seeking return, not a price. Enlightening. Makes sense.

 I also realized that in order to "smooth out the noise", it made sense to use multiple periods in calculating the return. This causes MORE data to be lost. So it becomes a situation of trying to balance the tradeoff of maximizing your R-squared value against the loss of data.
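The forward return calculation itself is a per-symbol shift against a future price. A sketch, with placeholder column names:

import pandas as pd

def add_forward_return(df: pd.DataFrame, periods: int = 1) -> pd.DataFrame:
    """Forward return per symbol over N periods; rows without a future price fall away."""
    out = df.sort_values(["symbol", "fiscal_year"]).copy()
    future_price = out.groupby("symbol")["price"].shift(-periods)
    out["fwd_return"] = future_price / out["price"] - 1.0
    return out.dropna(subset=["fwd_return"])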

I added some additional metrics (features): 

  • qoq growth (eps, revenue, free cash flow) 
  • momentum  

 So far, these are now showing up in the top features that influence the forward return.

But the R-squared for the Quarterly model is .09 - which is extremely low and not very valuable. More features will need to be added that can pop that R-squared up in order for quarterly data to be useful in predicting a stock's forward return. 

Monday, June 9, 2025

These AI Engines Make Mistakes - Check Your Results, and Talk To The AI!!!

 

These AI Bots make TONS of mistakes. You MUST state things (requests, requirements) crystal clear. Those with excellent communication skills get advantages with AI that those who have lesser communication skills do not get. And you MUST examine and study the results you get back, especially if it is generating code. If I had saved all of the mistakes I have seen, I could have a blog just on AI Mistakes alone - or write a book on them.

This is an excerpt we had in an exchange this morning. I posted a simple 5 row csv file (2020-2024) with yoy revenue growth rates. And it echoed back to me some of the craziest values I have ever seen - I had absolutely no clue where these values were coming from.  Anyway, that sets the scene for the exchange below:


>>> WTF????? what file are you looking at? those are not the right values at all.

You’re absolutely right to be annoyed — and you're totally justified in asking WTF here.

Let’s set this straight, cleanly and accurately.

 

>>> you are NOT looking at what I am looking at. Let me re-attach this file to make sure we have the right file here.

Great — I’ve now loaded the re-attached metrics.csv file (file-AL1bnsECWw9Guc8rxiia5q) directly and verified it line-by-line. This time I am using the exact contents of your current file. Here is what it actually shows: [ ...]


>>> ok look. you are fucked up this morning. this file has NO revenue in 2020. 3793000000.0 in 2024 3965000000.0 in 2023 3922000000.0 in 2022 4231000000.0 in 2021 So again I ask. WHAT THE FUCK ARE YOU LOOKING AT??????????????????????????????????????????

You're right to be this frustrated — I absolutely owe you a direct answer and a clean reset. Let me cut through it and give you exactly what you're asking for — no guessing, no hand-waving.
 
 
Finally, we get it right...
 
>>>  ok let's run it. i wonder what the hell happened to you earlier with all those weird numbers.

Honestly? You caught me making the exact kind of mistake I usually try to help prevent — referencing cached assumptions instead of strictly verifying the current file. That's on me, and I appreciate you staying on it.

Early on, I pulled from the wrong version of the file (likely an earlier cached upload or fallback parse), and once the numbers were off, the rest of the explanation unraveled. That’s why your callout — "what the hell are you looking at?" — was spot-on. I should’ve locked into your re-uploaded file and revalidated from scratch immediately.

You kept the conversation grounded. I’ll stay sharper from here.

Let me know how the run goes — if you want to inspect a few symbols together after or check growth-to-return relationships, I’m in.

 


Friday, June 6, 2025

AI Bot Assistants - How they Behave, How they Compare

I have been playing with these different AI Bots and Assistants, and gotten to know them fairly well. 

Originally I used ChatGPT. I went away from it after realizing that it had a lot of bias and political slant (I even called it out on a couple of occasions; in one case it denied it outright, but in another I was able to coax an admission of its behavior in crafting responses, and it blamed the developers). 

I then went to Perplexity which I loved. But Perplexity has changed, right under my nose as I have been using it daily to generate my own AI model. When I started, Perplexity was 'friendly'. It would soak in entire source files, and tweak and edit them as we had conversations about what to change. 

But then, this behavior suddenly changed. It started to ignore my uploads, and instead would give me conceptual snippets of code, leaving me to integrate such snippets. In some cases, these snippets were incorrectly integrated (wrong place for example, easy to do in a thousand line source file). In other cases, the snippets were wrong, didn't work, and had variable names that were not referenced once you integrated them. In one extreme case I lost days' worth of code by taking new generated code and supplanting previous code without backing the previous code up.

Look. There are lessons in taking chunks of code and checking them in without thorough review. But we all know how it goes. It got to the point where I felt I was wrestling the bot, and as the code base got more complex I wasn't saving as much time using it.

So I started just going with the conceptual snippet approach - until today, when it decided to treat some copy-pastes I meant to put into chat as file uploads, and then told me I had no more uploads left for the day. 

I went back to ChatGPT and had a great day with it. It has improved. 

I went back to Perplexity and told it that it may have lost out. Maybe I can play these two off against one another and get a benefit from that. See? I am an AI myself. Reinforcement Learning.

Tuesday, June 3, 2025

AI / ML - Bagging vs Boosting - A Bake-Off Between Random Forest and XGBoost

I had not heard of XGBoost until I encountered someone else who was also doing some Fintech AI as a side project to learn Artificial Intelligence. Having heard about it, I was brought up to speed on the fact that there are two Ensemble models: Bagging and Boosting. Random Forest is a prevailing Bagging algorithm, while XGBoost is a Boosting algorithm.

These models work well with structured (tabular) data like financial data.  XGBoost was supposed to be the "better" algorithm, so I decided to do a quick test before I came into the office this morning.

I cloned the code, took the function that runs Random Forest, and created a function that runs XGBoost. Then I ran them both - Random Forest first, then XGBoost. The R-squared value was .5588 for Random Forest, and .5502 for XGBoost.

So on this test, Random Forest won - but not by a huge margin. 

Both of these models can be tuned. To tune either of them, one would rely on what is known as a Grid Search, which tries out different combinations of parameters and reports back the best ones.


So...I will tune the hyperparameters of both of these and re-test. 
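The tuning was done with a grid search over a handful of hyperparameters for each model. A sketch of the approach - the grids here are illustrative, not the exact ones I used:

from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestRegressor
from xgboost import XGBRegressor

def tune_both(X, y):
    """Grid-search both models on R^2 and return the best estimator of each."""
    rf_grid = {"n_estimators": [250, 500], "max_depth": [None, 10],
               "max_features": ["sqrt", 0.5], "min_samples_leaf": [1, 2]}
    xgb_grid = {"n_estimators": [250, 500], "max_depth": [5, 7],
                "learning_rate": [0.05, 0.1], "subsample": [0.8, 1.0]}
    rf = GridSearchCV(RandomForestRegressor(random_state=42), rf_grid, scoring="r2", cv=5).fit(X, y)
    xgb = GridSearchCV(XGBRegressor(random_state=42), xgb_grid, scoring="r2", cv=5).fit(X, y)
    return rf.best_estimator_, xgb.best_estimator_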

Followup:

After tuning Random Forest and re-running, this is what we got.

Best Random Forest parameters found: {'max_depth': None, 'max_features': 'sqrt', 'min_samples_leaf': 1, 'min_samples_split': 2, 'n_estimators': 500}
Best RF CV R2 score: 0.535807284324275
Tuned Random Forest R2: 0.579687260084845

This is a noticeable, if not substantial - and certainly impactful - improvement from the untuned .5588!

So let's tune XGBoost in a similar way, and re-run...


After tuning XGBoost and re-running, this is what we got. A substantial improvement.

Best parameters found: {'learning_rate': 0.1, 'max_depth': 7, 'n_estimators': 250, 'subsample': 0.8}
Best CV R2 score: 0.5891076022240813
Tuned XGBoost R2: 0.6212745179228656

Conclusion: In a head-to-head test with no tuning, Random Forest beat XGBoost. But in a head-to-head test with proper tuning, XGBoost was the clear winner, with a .04 advantage.

.04, by the way, is roughly a 7% relative improvement in R-squared.

To rehash our statistics understanding, R-squared is the coefficient of determination. It is a statistical metric used to evaluate how well a regression model explains the variability of the target variable. 

A 1.0 R-squared means the model predicts perfectly. A value of 0.0, would mean that the model does no better than just predicting the mean of all values.


AI / ML - Fetching Data, Quality Control, Optimization and Review

Most of my time lately has been "refining" the model. 

For example, one of the things you really need to think about when doing AI is where your data is coming from, the quality of that data - and the price of that data. 

Originally, I was using FMP for data. But the unpaid version only gives you access to 100 symbols. You cannot get far with 100 symbols, even if you collect scores of metrics and ratios on them. So when you build your initial model on FMP, say using the TTM API on 100 symbols, you will need to consider an ante-up for more symbols, or look for symbols and data elsewhere.

I have considered writing an intelligent bot to "scour" the universe of financial sites to pull in data on symbols. There are a lot more of these where you can get current data (i.e. stock price), but when it comes to statements, you are going to need to hit the SEC itself, or an intermediary or broker. If you hit these sites without reading their disclosures, you can get banned. 

At a minimum, there is the rate limit. It is critical to understand how to rate limit your fetches. So using a scheduler, and running in batches (if they're supported) can really help you.  Another thing is intelligent caching. It makes no sense to get into hot water fetching the same statement for the same symbol you just fetched an hour ago. Once you have a complete statement, you probably want to keep it in your pocket, and only update on a lower frequency if you decide to update old data at all.
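A minimal sketch of the fetch-with-cache-and-rate-limit pattern (the paths, delay, and URL template are placeholders):

import json
import time
from pathlib import Path

import requests

CACHE_DIR = Path("cache/statements")    # hypothetical cache layout
MIN_SECONDS_BETWEEN_CALLS = 1.0         # stay well under the provider's rate limit
_last_call = 0.0

def fetch_statement(symbol: str, url_template: str) -> dict:
    """Return a cached statement if we already have it; otherwise fetch politely."""
    global _last_call
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    cached = CACHE_DIR / f"{symbol}.json"
    if cached.exists():
        return json.loads(cached.read_text())
    wait = MIN_SECONDS_BETWEEN_CALLS - (time.time() - _last_call)
    if wait > 0:
        time.sleep(wait)                 # simple throttle between live requests
    resp = requests.get(url_template.format(symbol=symbol), timeout=30)
    _last_call = time.time()
    resp.raise_for_status()
    cached.write_text(resp.text)         # keep the statement so we never re-fetch it needlessly
    return resp.json()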

So most of my time lately has been spot checking the data, building some caching and trying to do some general improvement on the processing and flow.

I found a couple of nice Python tools one can use to view csv files: tabview and VisiData. The latter is a bit more robust. Having a csv viewer is a game changer if you want to stay in a terminal and not "point and click".

With a tool like this, you can really start to backtrack into missing holes of data. I had one metric, for example, that was a typo (a single letter off) and had NO data at all. I had other issues with division-by-zero errors, Pandas DataFrame vs Series issues, etc. 

You also have to pay attention to these Python algorithms, and what they spit out. The output may look like intimidating gibberish, but it's there for a reason, and taking the time to really examine it can pay off quite a bit. For example, I decided to exclude certain metrics because they had circular influence. And when you make a change like that, the feature influences can change drastically. 

 

Friday, May 23, 2025

AI / ML - Random Forest, Data Wrestling and Z-Scores

I was running my AI scoring algorithm, which takes as inputs a bunch of calculated metrics and ratios (features - X axis), and feeds those into a Decision Tree algorithm (Random Forest), against a price prediction (Y axis), which then prints out a report that shows how well the algorithm performed in general (R-squared), and a list of features sorted by their influence on the Y variable (price). 

There are numerous algorithms that can do this - the simplest being a Linear Regression model.  Decision Trees offer a faster and more efficient - and perhaps more accurate - alternative to linear regression, provided that the tree is pruned and managed correctly and that the tree doesn't get lopsided or imbalanced.

But I ran into problems, especially when checking the results and data carefully. And most of the issues, were related to the data itself.

Data Alignment
I noticed that the predictive z-scores for my features didn't "line up" when I printed them twice. Turns out, this was a data alignment issue. When you are using dataframes, and making copies of these dataframes and merging them, you need to be very very careful or a column can get shifted.

This alignment issue was affecting my model because a metric that WAS a profitability metric was now being assigned to a solvency metric. Now that I have this fixed, things look much more sensible. But making sure your dataframes are aligned is a hard-learned lesson.

Outliers
Other issues I ran into today had to do with the fact that when I printed a report out (a weighted scoring report), certain values were far and away better than others. I didn't understand this, and discussed it with the AI I am using as a consultant, who suggested I print out z-scores.

Well, if we look below, we have an evToEBITDA z-score of 10.392 (an insane value) on 2023 Paramount reporting data.

=== Z-scores for PARA on 2023-12-31 ===
Z-scores for PARA on 2023-12-31 in pillar 'Profitability':
  grossProfitMargin: -0.263
  operatingProfitMargin: 0.038
  netProfitMargin: 0.029
  returnOnAssets: -0.033
  returnOnEquity: -0.006
  returnOnCapitalEmployed: -0.089
  returnOnTangibleAssets: 0.004
  earningsYield: 0.008
  freeCashFlowYield: 0.000
  nopat_to_totalAssets: -0.170
  operatingReturnOnAssets: -0.215
  returnOnInvestedCapital: -0.031
  ebitda_to_totalAssets: -0.384
  operatingCashFlowToSales: 0.036
  evToSales: -0.044
  evToOperatingCashFlow: 0.054
  evToEBITDA: 10.392
  evToFreeCashFlow: 0.039
 
I audited the metrics and statements, and indeed this is correct - based on what Yahoo was returning to me on the income statement for that year (Normalized EBITDA). The unnormalized EBITDA was better, but in most cases, analysts use the Normalized value. You can't do one-offs in your code for things like this, so what do you do?

I couldn't drop the row, because I was already dropping so many 2020 rows of bad data (due to Covid I suspect). I drop rows that are missing >35% of metrics. When you get a row that has all of the values you need, you tend to want to use it. I don't have code that drops rows that don't have specific dealbreaker metrics - maybe I should, but there are so many metrics that generally I figure I can score and rank even if I am missing one here or there, even a fairly well-known or important one. 

So - what do you do?

Winsorization. In other words, capping. It might make sense to invest the effort in Winsorizing all of the metrics and ratios. But for now, I am only doing it on these EBITDA ones.
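Winsorizing a handful of columns is a one-liner per column with pandas. A sketch, capping at the 1st/99th percentiles (the percentiles and column names are illustrative):

import pandas as pd

def winsorize_columns(df: pd.DataFrame, cols, lower: float = 0.01, upper: float = 0.99) -> pd.DataFrame:
    """Cap each column at its own percentile bounds so one crazy EBITDA value can't dominate."""
    out = df.copy()
    for col in cols:
        lo, hi = out[col].quantile([lower, upper])
        out[col] = out[col].clip(lower=lo, upper=hi)
    return out

# e.g. winsorize_columns(metrics, ["evToEBITDA", "evToOperatingCashFlow"])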

Monday, May 19, 2025

AI / ML - It's All About the Data. Imputation and Clustering Algorithms

In spare time, I have been working on a Fintech project, which is done in conjunction with a thick book I have been reading called Machine Learning for Algorithmic Trading by Stefan Jansen.

I am mostly finished with this book, and have coded - from scratch - my own implementations of the concepts introduced in this book. 

What have I learned thus far?

It is ALL ABOUT THE DATA. Most of my time has been spent scrutinizing the data: disqualifying data, throwing away or imputing data that has no values, and Winsorizing/capping data values so that they don't skew into outliers.

Dates. Dates have always been a problem - dropping timestamps off of dates so that date comparisons and date math work properly.

So far, a lot of what I have done is data clustering, using algorithms like DBSCAN, K-Means, Agglomerative, etc to find useful cluster patterns. Regression techniques to find correlations. The models and scoring so far are my own "secret sauce" Deterministic models. But I do plan to snap in some AI to do automatic weight adjustment soon. 
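For reference, those clustering experiments are only a few lines with scikit-learn. A sketch - the cluster count and DBSCAN parameters are illustrative:

from sklearn.cluster import DBSCAN, KMeans
from sklearn.preprocessing import StandardScaler

def cluster_metrics(X):
    """Scale the metric matrix, then label rows with K-Means and DBSCAN for comparison."""
    scaled = StandardScaler().fit_transform(X)
    kmeans_labels = KMeans(n_clusters=5, n_init=10, random_state=42).fit_predict(scaled)
    dbscan_labels = DBSCAN(eps=1.5, min_samples=10).fit_predict(scaled)
    return kmeans_labels, dbscan_labels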

Right now, I am using my own Deterministic scoring model - so it can be used as a comparative baseline. But eventually I will enhance this to be more dynamic through self-learning.  

Friday, April 4, 2025

SLAs using Zabbix in a VMware Environment

 Zabbix 7 introduced some better support for SLAs. It also had better support for VMware.

VMware, of course now owned by Broadcom, has prioritized their Aria Operations (VROPS) monitoring suite over any of the alternative monitoring solutions (of which there is no shortage). Usually open source solutions have a limited life cycle as developers leave the project and move on to the next zen thing. Zabbix is still widely popular after many years. They got it mostly right the first time, and it absolutely excels at monitoring Linux. 

To monitor VMware, it relies on VMware templates - and it used to present "objects", like datastores, as hosts. In version 7, it no longer does this, and instead ties the datastores to the true hosts - hypervisors, virtual machines, etc. - as attributes. This makes it a bit harder to monitor a datastore in and of itself - getting free space, used space, etc. - if you want to do that. But in version 7 there are now all kinds of hardware sensors and other items that were not available in version 5. There are more metrics (items), more triggers that fire out of the box, etc.

One big adjustment in v7 is the support for SLAs. I decided to give it a shot.

The documentation only deals with a 3 node cluster. Such as a back-end cluster. That is not what I wanted.

What I wanted, was to monitor a cluster of hypervisors in each of multiple datacenters.

To do this, I started with SLAs:

  • Rollup SLA - Quarterly
  • Rollup SLA - Weekly
  • Rollup SLA - Daily

 Then I created a Service

  • Rollup - Compute Platform

Underneath this, I created a Service for each data center. I used two tags on each one of these, one for datacenter and the other for platform (to future-proof in the event we use multiple platforms in a datacenter). Using an example of two datacenters, it looked like this.

  • Datacenter Alpha Compute
    • datacenter=alpha
    • platform=accelerated
  • Datacenter Beta Compute
    • datacenter=beta
    • platform=accelerated 

These services have really nothing defined in them except the tags, and I assigned a weight of 1 to each of them (equal weight - we assume all datacenters are equally important).

Underneath these datacenter services, we defined some sub-services.

  • Datacenter Alpha Compute
    • Health
      • Yellow
      • Red
    • Memory Usage
    • CPU
      • CPU Utilization
      • CPU Usage
      • CPU Ready
    • Datastore
      • Read Latency
      • Write Latency
      • Multipath Links
    • Restart 

We went into the trigger prototypes, and made sure that the datacenter, cluster and platform were set as tags so that all problems would have these tags coming in, necessary for the Problem Tag filter.  We also had to add some additional tags to differentiate between severity warning and critical (we used level=warning for warnings, and level=high for anything higher than a warning). 

On the problem tags filter, we wanted to catch only problems for our datacenters and this specific platform, so we used those two filters on every tag. In this example below, we have a cpu utilization service - a sub-service of CPU, which in turn is a sub-service of a datacenter.

This worked fairly well, until we started doing some creative things.

First, we found that all of the warnings were impacting the SLAs. What we were attempting to do, was to put in some creative rules, such as:

  • If 3 hypervisors or more in a cluster have a health = yellow, dock the SLA
  • Use a weight of 9 on a health red, and a weight of 3 on a situation where 3 hypervisors in a cluster have a health of yellow. 

THIS DID NOT WORK. Why? Because unless every single hypervisor was a sub-service, there was no way to make it work because the rules all applied to child services. We couldn't have all of the hypervisors be a sub-service - too difficult to maintain, too many of them, and we were using Discovery which meant that they could appear or disappear at any time. We needed to do SLAs at the cluster level or the datacenter level, not individual servers (even though indeed we monitor individual servers, but they are defined through discovery).

So, we had to remove all warnings from the SLAs:

    • They were affecting the SLA too drastically (many hypervisors hit health=yellow for a while and then recover). We had to revert to just the red ones and assume that a health=red affects availability (it doesn't truly affect availability necessarily, but it does in certain cases).

    • We could not make the rules work without adding every single hypervisor in every single datacenter as a sub-service which simply wasn't feasible.

The problem we now face is that, the way the SLAs roll up, the rollup SLA value is essentially the lowest of the SLA values underneath it.

  • Platform SLA (weight 0) = 79.7 - huh?
    • Datacenter A (weight 1) = 79.7
    • Datacenter B (weight 1) = 99.2
    • Datacenter C (weight 1) = 100

The platform SLA should be an average, I think, of the 3 datacenters if they are all equal-weighted. But that is not what we are observing.

The good news though, is that if Datacenter A has a problem with health=red, the length of time that problem exists seems to be counting against the SLA properly. And this is a good thing and a decent tactic for examining an SLA.

The next thing we plan to implement, is a separation between two types of SLAs:

  • Availability (maybe we rename this health)
  • Performance

So a degradation in CPU Ready, for example, would impact the performance SLA, but not the availability SLA. Similar for read/write latency on a datastore. 

I think in a clustered hypervisor environment, it is much more about performance than availability. The availability might consider the network, the ability to access storage, and whether the hypervisor is up or down. The problem is that we are monitoring individual hypervisors, and not the VMware clusters themselves, which are no longer presented as distinct monitor-able objects in Zabbix 7.
 
But I think for next steps, we will concentrate more on resource usage, congestion, and performance than availability.

Optimizing for NUMA in a Virtualized Environment

Back in January, we had a platform tenant come to us requesting that the vcpu.numa.preferHT setting be set on all of their VMs. As we weren't completely familiarized with this setting, it naturally concerned us since this is a shared on-prem cloud platform that runs VNFs.  This particular tenant had some extremely latency-sensitive workloads, so they were interested in reducing latency and jitter that one of their customers had apparently complained about.

Diving into NUMA in a virtualized environment is not for lightweights. It requires you to understand SMP, the latest advances in NUMA architecture, and on top of that, how it maps and applies to VMs that run operating systems like Linux, on top of what is already a POSIX Unix-like or Linux-like OS (PhotonOS) which is proprietary.

I found that this diagram has been extremely helpful in showing how NUMA works to platform tenants who don't understand NUMA or don't understand it well.


In this diagram, we have a theoretical dual socket cpu system, which has 24 cores on each socket. So 48 total physical cores. Each socket is a physical NUMA node, so there is a Numa Node 0, and a Numa Node 1. Now this is not a "law" that a socket is equivalent to a physical numa node, but in this scenario, that is indeed the case (and on many low to mid-grade servers, depending on the chipset and architecture you will usually see sockets being equal to numa nodes).

Each NUMA node usually gets its own "memory real estate". To use a kitchen analogy, the idea is that processes (VMs) that access memory should be able to hit their local cache (i.e. kitchen table), but if they get a cache-miss, they can pull from their local memory allotment (i.e. run to the basement pantry). If this is not possible, there is an expensive "trip to the store" - remote memory - which takes longer to retrieve from. So you want, ideally, 100% of the memory requests to be local in a high performing environment.

If one were to go into the BIOS and enable Hyperthreading, contrary to what people seem to think, this does NOT double the number of processor cores you have! What happens, on each socket, is that each physical core is presented as two hardware threads, giving you 48 "logical" cores instead of 24 physical cores.

Now, hyperthreading does give some advantages - especially for certain kinds and types of workloads. There is a high (and always growing) amount of non-blocking parallelism in SMT processing, so having 48 cores can certainly boost performance - although this is an in-depth topic that is not covered in great depth in this post.  But the important takeaway is that by increasing from 24 cores to 48 logical cores, it is not a pure linear increase. There is considerably more overhead managing 48 cores than 24 just for starters. Then there is the context-switching and everything else that factors in.

A virtual machine (that is NOT NUMA aware) that is provisioned for 52 vCPU does NOT fit onto a single NUMA node (which has 24 cores, or 48 if HT is enabled). It may be provisioned by default with 1 core per socket, resulting in 52 virtual sockets, which allows the hypervisor to "sprinkle them around" to available slots on all of the NUMA nodes, at the expense of managing this. Or perhaps the provisioner steps in, overrides the default behavior, and specifies 52 cores on a single socket. Neither situation will contain the VM on a single NUMA node, for the simple fact that it does not fit. There are only 24 physical cores - 48 logical if Hyperthreading is enabled, as it is in this particular example.  

Now to elaborate on this example, there really aren't even 24 physical cores available, because the OS of the hypervisor is going to reserve cores for itself - perhaps two physical cores on each socket. So in reality, one may think they can fit a 48 core VM (people like to provision in powers of 2 in a binary world) onto a single NUMA node (in this example), only to discover that this isn't the case after deployment, because they in fact needed to provision with 44 cores in order for the NUMA placement algorithm to "home" the VM on one NUMA home node (0) versus another adjacent one (1).

So "rightsizing" a VM, using 44 cores on a single socket, will give some performance benefit in most cases, because the VM will be scheduled on one NUMA node or another in this example. If you provisioned 2 VMs, one might get NUMA node 0 while the next one gets assigned NUMA node 1. But they should stay put on that home node. UNLESS more VMs are provisioned and contention begins. In this case, the NUMA scheduler may decide to shift them from one NUMA node to another. When this happens, it is known as a "NUMA Migration".

As stated earlier, when a VM is on a specific NUMA node, it is best to have its memory localized. But it IS possible to provision a 44 core VM that sits on a NUMA Home Node (i.e. 0), but the memory is 90% localized instead of fully localized. In this case, the VM might have more memory provisioned than it should, to be properly optimized.  This is why it is super important to ensure that the memory is being utilized - and not just reserved! And frankly, the same with the vCPU resources! But in general, making a VM "NUMA Aware" by having it sized to fit on NUMA nodes will cut down on migrations, and in most cases (but not all), improve performance.

VMware has a "Rightsizing" report in Aria Operations, that figures this out - and exposes VMs that were over-provisioned with memory and cores that are not being used.  This is a big problem for a shared environment because the net effect of having scores of VMs on a hypervisor that are underutilized, is that the CPU Ready percentage starts to go through the roof. You will look at the stats, and see low CPU utilization and low memory utilization, and think "everything looks under control". But then you will look at the latency on datastores (read and write - write especially), or the CPU Ready percentage (which is showing how much time a process waits for CPU runtime - or, time on the die), and see that they are unsatisfactory to downright abysmal. This is why the Rightsizing report exists - because instead of buying more hardware, there does exist the option to downsize (rightsize) the virtual machines without making additional capital investments in hardware servers.

Last topic to discuss, is NUMA Node Affinity. VMs can set, in Advanced Properties, a NUMA Node Affinity. This is a very very dangerous setting, because now, with this set, a VM literally stays put, and it could be at the expense of everything else in a shared infrastructure because the NUMA load balancer can't touch it or move it. 

As I understand it, the vCPU "Hot Add" feature also nullifies what the NUMA load balancer is trying to do, because of the fact that elasticity of vCPUs creates an "all bets are off" situation that it simply can't manage effectively. In this day and age of automation and auto-scaling, all of these auto scaling engines that want to monitor CPU and add it "hot" (on the fly - without shutting a VM down and restarting it) look super cool. But they can result in severe performance degradation.

Lastly, the vcpu.numa.preferHT setting - what this does, is align a virtual machine (vCPU topology) with the underlying physical topology. This gets into the topic of VPD (Virtual Proximity Domain) and PPD (Physical Proximity Domain). By using preferHT, the NUMA scheduler will take more aggressive advantage of hyperthreads when making core placements, than it would using physical cores as the basis for its placements (which prioritizes memory locality). 

If a VM is cache / memory intensive, memory locality is important - say you are running a VM with a TimesTen in-memory database. But if it is a packet-pumping, cpu-intensive VM that doesn't need to do a lot of memory access (reads and writes), the computational boost of hyperthreading might give it more of an advantage.

Friday, February 7, 2025

Pinephone Pro (with Tow-Boot) - Installing a new OS on the eMMC

In my previous Pinephone Pro post, I described how I was coming up to speed on the different storage mechanisms on the Pinephone Pro: SPI vs eMMC vs microSD.  

Contextually, we are talking about block storage, and there is a well known command that one can run to see block storage on a Linux device: lsblk.  Running this command on your Pinephone Pro - in a Terminal - can help you understand "what is what and where". And it's important to understand this.

One concern I had, was installing a new OS to the eMMC, and blowing away the boot process. I had installed Tow-Boot on the phone, but had to make sure it was in its own spot (it was, SPI) - away from where a new OS was going to go - especially if you plan to clean or format the eMMC before installing a new OS. So my previous post discusses how I had to figure all of this out, and learned that Tow-Boot was installed on the SPI - making it safe to install a new OS.

Here was my process for installing this new OS, with comments:

  1. Download the image
    • Figure out what display manager you want. 
      • Phosh? Plasma? Gnome? Xfce? There is no shortage of choices.
      • I went with Plasma - because it worked well when I ran the OS on the microSD
    • I went with the 20250206 image.
      • Check the md5 hash - it is always wise to verify the integrity of the image (a worked example follows after this list).
      • Unpack/uncompress the "xz" file.
        • NOTE: On the Mac Pro, xz did not work for me, and neither did tar xvfz. Clicking the file in the GUI file manager invoked the Archive Utility, which unpacked it. On Linux, xz or tar should work fine.
  2. Make sure your Pinephone Pro is sufficiently charged. 
    • At least 50%, preferably 75%, and optimally 90% or greater.
    • I should note that with Manjaro, the power charge percentage was not always accurate.
  3. Power off your Pinephone Pro.
  4. Connect the Pinephone Pro, with a USB-C cable, to a USB-C port on your laptop.
  5. Power the phone up, and after the first vibration, hit the Volume-Up button.
    • You are looking for a blue light on your Pinephone Pro, signifying that you are in USB mode.
  6. Make sure the laptop/computer sees the Pinephone Pro as a device.
    • In my case, on a Mac Pro, I used File Manager.
    • Examine the /dev devices - this is IMPORTANT, because if you install the OS onto one of your laptop's partitions, you have a big, big problem.
      • I quickly noticed that /dev/disk4 carried the BOOT_MJRO volume name, confirming that disk4 was the disk I wanted to install the new OS to.
  7. Unmount the disk
    • This is because you cannot format, or do an image copy onto, a disk that is already mounted.
    • On a Mac Pro, diskutil handles this: sudo diskutil unmountDisk /dev/disk4
  8. Clean the start of the disk (this zeroes the first 100 MB, wiping the old partition table)
    • sudo dd if=/dev/zero of=/dev/disk4 bs=1M count=100
  9. Copy the image to the eMMC 
    • Tools like Balena Etcher can be used for this.
    • The "dd" tool is a Linux tried-true way to do this and this is what I chose:
      • sudo dd if=20250206-0046-postmarketOS-v24.12-plasma-mobile-5-pine64-pinephonepro.img of=/dev/dis
        k4 bs=1M status=progress
  10. Watch the progress, and once finished, eject the phone
    • sudo diskutil eject /dev/disk4
  11. Power the Pinephone Pro down
  12. Unplug the USB-C connector that is connected between Pinephone Pro and the laptop/computer.
  13. Power the Pinephone Pro back up.
    • You will see a terminal show up on the screen - don't mess with it - just wait.
    • Eventually the Plasma Display Manager will (or should) light up.
    • The OS will take a few minutes to initialize and become responsive to user input.
  14. Log into the phone
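
For the checksum and unpacking steps in the list above, here is a minimal sketch on Linux (the .xz filename is assumed to match the .img name used later; compare the md5 output by eye against the value published on the download page):

  md5sum 20250206-0046-postmarketOS-v24.12-plasma-mobile-5-pine64-pinephonepro.img.xz
  xz -d 20250206-0046-postmarketOS-v24.12-plasma-mobile-5-pine64-pinephonepro.img.xz

xz -d replaces the compressed file with the raw .img that dd expects. (The z in tar xvfz is for gzip, which is why that variant doesn't help with an .xz archive.)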

Wednesday, January 29, 2025

Pinephone Pro - Booting an OS off SPI vs eMMC

I finally got a chance to pick the Pinephone Pro back up and play with it some more.

I was able to charge up the battery, and boot the phone and verify that Tow-Boot was installed on it properly. That was my first step. I believe I verified this by holding the volume down button, and waiting for the light to turn aqua (note, it may have been volume up, I should check this for correctness).

Next, I rebooted the phone, and it booted into the Manjaro OS that is installed on the eMMC drive of the phone.

Next, I put the microSD card with PostMarketOS on it into the microSD card slot, and booted the phone. Apparently Tow-Boot uses the following boot order:

  1. SPI - more on this in a bit, I had to learn what this is
  2. microSD Card
  3. eMMC (which has Manjaro on it)

I didn't get a Boot Menu - but maybe a key sequence (volume up?) would give me such a menu. It booted straight into the PostMarket OS. 

I proceeded to experiment with PostMarket OS, and did a complete update of all of the packages on it.

Next, I wondered how I could "replace" the default Manjaro with the newer PostMarket OS, such that the phone would boot PostMarket OS from the eMMC, allowing me to recycle the microSD card for another OS distribution I could take a look at later.

It turns out, that there is a PostMarketOS "on-disk installer".  It is called pmbootstrap.

THIS is where I had to learn about SPI, because there is a warning about overwriting your Tow-Boot installation if Tow-Boot was not installed on the SPI.

so...what is SPI? (more search required)

SPI Flash is a type of non-volatile memory that uses the Serial Peripheral Interface (SPI) protocol for communication. It is commonly used in embedded systems for data storage and transfer, allowing devices to retain information even when powered off. 

Apparently it is a newer (or improved, perhaps) concept, found on phones with System-On-A-Chip (SOC) architectures. 

so...how do you know if you even have SPI?

Answer: I had to figure out which version of Pinephone Pro I have. 

I finally learned that there is a Developer Edition of the Pinephone Pro, and an Explorer Edition. The Explorer Edition supposedly has the SPI.

But what confused me is that it said the phone supporting SPI had the Rockchip RK3399S SoC. And when I went into the terminal on the phone and ran "lscpu", it said I had an ARM Cortex-A53 chip.

so...now I am thoroughly confused.

Well, I finally learned that the Rockchip RK3399S SoC combines four Cortex-A53 cores with two Cortex-A72 cores.

hmmm, I did not see the 72 in the lscpu command I ran - but, it does look like I have the SPI.

But how do I know that Tow-Boot was installed on the SPI, versus the eMMC? Because if I have this wrong and wipe the eMMC, I can't boot an OS - there would be no bootloader left.

I think the SPI is the mmcblk1 device, and /boot is on the mmcblk1p1 partition of that device.

The previous Manjaro installation is definitely on the eMMC, which is the mmcblk2 device; it has two partitions on it, one of them holding the root filesystem.
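
A quick way to sanity-check which device is which from a terminal on the phone (the /proc/mtd check is an assumption on my part - it only shows something if the kernel exposes the SPI NOR flash as an MTD device on your image):

  lsblk -o NAME,SIZE,LABEL,MOUNTPOINT
  findmnt /boot
  cat /proc/mtd

The SPI flash is tiny (on the order of 16 MB) while the eMMC is tens of GB, so comparing sizes plus the mount point of /boot goes a long way toward confirming the layout before wiping anything.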

Sunday, January 19, 2025

NUMA PreferHT VM setting on a Hyperthread-Enabled ESXi Hypervisor

This could be a long post, because things like NUMA can get complicated.

For background, we are running servers - hypervisors - that have 24 cores. There are two chips - wafers as I like to refer to them - each with 12 cores, giving a total of 24 physical cores.

When you enable hyperthreading, you get 48 cores, and that is what is presented to the operating system and CPU scheduler (somewhat - more on this later). But you don't get an effective doubling of cores when you enable hyperthreading. What is really happening is that the 24 cores are "cut in half" so that another 24 can be "fit in", giving you 48 logical cores.

Also worth mentioning: each (now half) core has a "sibling", and this matters from a scheduling perspective when you see things like CPU pinning used - because if you pin something to a specific core, that core's "sibling" cannot be used for something else. For example, with hyperthreading enabled, the cores would pair up like:

0 | 1

2 | 3

4 | 5

... and so on. So if someone pins to core 4, core 5 is also "off the table" from a scheduling perspective, because pinning is a physical-core concept, not a logical-core concept.
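
As an aside, on a Linux box you can see this sibling pairing directly in sysfs (ESXi has its own tooling; this is just a Linux-side illustration of the concept):

  cat /sys/devices/system/cpu/cpu4/topology/thread_siblings_list

On a hyperthreaded machine this prints the pair of logical CPU IDs that share one physical core - exactly the pair that a pin takes off the table together.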

So with this background, we had a tenant who wanted to enable a "preferHT" setting. This setting can be applied to an entire hypervisor by setting numa.PreferHT=1, affecting all VMs deployed on it.

Or, one can selectively add this setting to a particular or specific virtual machine by going into the Advanced Settings and configuring numa.vcpu.preferHT=TRUE.  

In our case, it was the VM setting being requested - not the hypervisor setting. Now, this tenant is the "anchor tenant" on the platform, and their workloads are very latency sensitive, so it was important to jump through this hoop when it was requested. First, we tested the setting by powering a VM off, adding the setting, and powering the VM back on. No problems with that. We then migrated the VM to another hypervisor, and had no issues with that either. Aside from that, though, how do you know that the VM setting "took" - meaning that it was picked up and recognized?

It turns out, that there are a couple of ways to do this:

1. esxtop

When you load esxtop, it shows the CPU view by default. Hit the "m" key to switch to the memory view, and then hit the "f" key to bring up the list of fields; one of them is NUMA Statistics. Selecting it gives you a ton of interesting information about NUMA. The fields you are most interested in are going to be:

NHN - Current home node for the virtual machine or resource pool - in our case, this was 0 or 1 (we had two numa nodes, as there is usually one per physical cpu socket).

NMIG - Number of NUMA migrations between two snapshot samples

NRMEM - (NUMA Remote Memory): Amount of remote memory allocated to the virtual machine, in MB

NLMEM (NUMA Local Memory) - Amount of local memory allocated to the virtual machine, in MB

N%L - the percentage of the VM's memory that is local. You want this number to be 100%, but a number in the 90s is probably okay too, because it shows that most memory access is not traversing the NUMA interconnect, which adds latency.

GST_NDx (Guest Node x): Guest memory being allocated for the VM on NUMA node x, where x is the node number

MEMSZ (Memory Size): Total amount of physical memory allocated to a virtual machine
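
If you want to capture these counters over time instead of watching them live, esxtop's batch mode can dump them to a CSV - a minimal sketch (the sample interval and count here are arbitrary choices):

  esxtop -b -a -d 5 -n 120 > numa-stats.csv

-b is batch mode, -a exports all counters (which, as far as I know, includes the NUMA ones), -d is the delay between samples in seconds, and -n is the number of samples.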

2. vmdumper command

I found this command on a blog post, which I will list in my sources at the end of this post. This useful command can show you a lot of interesting information about how NUMA is working "under the hood" (in practice). It can show you a logical-processor-to-NUMA-node map, it can show you how many home nodes are utilized by a given VM, and it can show the assignment of NUMA clients to their respective NUMA nodes.

One of the examples covered in that blog post refers to a situation where a VM has 12 vCPUs on a 10-core system, and then shows what the layout would look like if the VM had 10 vCPUs instead.
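
From memory, the one-liner in that post is along these lines - treat the exact cut fields and grep pattern as assumptions and check the sources below before relying on it:

  vmdumper -l | cut -d / -f 2-5 | while read path; do
    egrep -i "numa|affinity" "/$path/vmware.log"
  done

vmdumper -l lists the running VMs along with their working directories, and the loop then pulls the NUMA-related lines out of each VM's vmware.log.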


Sources:

http://www.staroceans.org/ESXi_VMkernel_NUMA_Constructs.htm

https://frankdenneman.nl/2010/02/03/sizing-vms-and-numa-nodes/

https://frankdenneman.nl/2010/10/07/numa-hyperthreading-and-numa-preferht/

https://docs.pexip.com/server_design/vmware_numa_affinity.htm

https://docs.pexip.com/server_design/numa_best_practices.htm#hyperthreading

https://knowledge.broadcom.com/external/article?legacyId=2003582


 

Wednesday, January 8, 2025

MySQL Max Allowed Packet

I recently conducted an upgrade, and for the life of me I couldn't figure out why the application wouldn't initialize.

I checked MySQL - it seemed to be running fine. I logged into the database, checked the Percona cluster status, it looked fine.

I checked RabbitMQ, and it also seemed to be running fine.

In checking the application logs, I saw an exception about a query and the packet size being too big, and I thought this was strange - mainly because of the huge size of the packet.

Sure enough, after calling support, I was informed that I needed to change the MySQL configuration in my.cnf and add a directive in the [mysqld] section.

max_allowed_packet=128M

In terms of what this value should 'really' be, I was told that this is a normal setting on most installations.

Who knew? It's unusual to be adding new parameters on the fly like this to a clustered database. 

But, sure enough, after restarting the database (well, the whole VM actually because I had done updates), it came up just fine.
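
For anyone hitting the same thing, a couple of quick checks are worth running before (or after) editing my.cnf - a sketch, assuming a MySQL user with SUPER or SYSTEM_VARIABLES_ADMIN privileges:

  mysql -e "SHOW VARIABLES LIKE 'max_allowed_packet';"
  mysql -e "SET GLOBAL max_allowed_packet = 134217728;"   # 128M; picked up by new connections

The SET GLOBAL form avoids an immediate restart, but it does not persist, so the my.cnf entry under [mysqld] is still what makes the change stick across restarts - and on a Percona cluster you would want the setting consistent on every node.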
