Wednesday, July 9, 2025

AI / ML - Feature Engineering - Interaction Features

I added some new macro features to my model - credit card debt, credit card delinquency, and unemployment data.

Some of these were VERY influential features.

So we can see that unemployment_rate is an important feature - it tops the list!

But since we are doing relative scoring on stocks, what good does that do us if every single stock sees the same macro values?

The answer: Interaction Features. 

Since unemployment can impact revenue growth (fewer consumers can afford to buy), you multiply the Revenue Growth Year-over-Year percentage by the unemployment rate. Now you get a UNIQUE value for that specific stock symbol instead of just throwing "across the board" metrics at every stock. 

Now, if you don't do this, the macro variables in and of themselves CAN still impact a model, especially if a stock's forward return is sensitive to that feature - that is what XGBoost gives you. But you help the model by giving each stock a uniquely calculated impact, as opposed to giving every stock the same value of "X.Y".
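Here is a minimal sketch of the idea in pandas. The column names revenue_growth_yoy and unemployment_rate are placeholders for whatever your feature set actually calls them:

import pandas as pd

def add_interaction_features(df: pd.DataFrame) -> pd.DataFrame:
    """Cross a per-stock fundamental with a shared macro series."""
    out = df.copy()
    # The macro column repeats the same value for every ticker in a
    # given period; multiplying it by a per-stock metric yields a
    # value that is unique to each ticker.
    out["rev_growth_x_unemployment"] = (
        out["revenue_growth_yoy"] * out["unemployment_rate"]
    )
    return out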

I did this, and got my latest high score on R-Squared.
Selected 30 features out of 97 (threshold = 0.007004095707088709)
⭐ New best model saved with R²: 0.4001

Pruned XGBoost Model R² score: 0.4001
Pruned XGBoost Model RMSE: 0.3865
Pruned XGBoost Model MAE: 0.2627

Full XGBoost R² score: 0.3831
Full XGBoost RMSE: 0.3919
Full XGBoost MAE: 0.2694




 

Saturday, July 5, 2025

AI / ML - Here is Why You Backtest

My model was working nicely. 

It scored stocks on a number of fronts (pillars).

It used Profitability. It used Solvency. It used Liquidity. It used Efficiency.

These are the "four horsemen" of stock evaluation.

I added some of my own twists to the "grading formula", in the form of macro variables (consumer sentiment, business confidence, et al.). I also had some trend analysis, rewarding trends up and penalizing trends down. I rewarded (and penalized) profitability, cash flow, etc. I had scaling done correctly, too, in order to ensure a "fair playing field", and some sector normalization as well.
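In rough sketch form, the grading looked something like the following. The pillar names are real, but the weights and the penalty column here are purely illustrative (a flag to disable penalties comes up again later in this post):

import pandas as pd

# Illustrative weights - not my actual values.
WEIGHTS = {"Profitability": 0.30, "Solvency": 0.25,
           "Liquidity": 0.20, "Efficiency": 0.25}

def composite_score(df: pd.DataFrame, apply_penalties: bool = True) -> pd.Series:
    # Each pillar column is assumed to be scaled/normalized already.
    score = sum(w * df[p] for p, w in WEIGHTS.items())
    if apply_penalties:
        # Hypothetical example: penalize a down-trending profitability flag.
        score = score - 0.1 * df["ProfitabilityTrendDown"]
    return score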

When I ran the model, using XGBoost to predict 1-year forward return, the stocks at the top of the report looked great when I spot-checked them against various sites that also grade stocks. I felt good. The R-squared I was getting from XGBoost with a SHAP-pruned feature run was at academic levels (as high as 0.46 at one point).

As part of some final QA, I ran the resultant code through AI engines, which praised its thoroughness and slapped me on the back, reassuring me that my model was on a par with, if not superior to, many academic models.

Then - someone asked me if this model had been back-tested. 
And the answer was no. I had not back-tested it up to that point. I didn't think I was ready for back-testing. 

Maybe back-testing is an iterative "continual improvement" practice that should be started much earlier in the process, to ensure you don't go down the wrong road. But I didn't do that.

So, I ran a back-test. And to my horror, the model was completely "upside down" in terms of which stocks it predicted would deliver forward return. The AI engines suggested I simply "flip the sign" on my score and invert the rankings. But that didn't feel right. It felt like I was trying to force a score.  

So the first thing we did was evaluate the scoring. We looked at the correlation between individual scoring pillars and forward return. Negative.

We then looked at correlation in more detail.

First, we calculated Pearson (row-level) and Spearman (rank-level) correlations.

They were negative.

Then, we calculated Average Fwd Return by Score Decile. Sure enough, there was a trend, but completely backwards from what one would expect. 

Quality stocks in deciles 9, 8, 7, 6, and 5 had negative average returns that improved as the decile dropped, while the shaky stocks (deciles 0 through 4) had progressively positive values. 
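For reference, here is roughly how those checks can be run with pandas and scipy. The columns score and fwd_return_1y are stand-ins for the actual names:

import pandas as pd
from scipy.stats import pearsonr, spearmanr

def diagnose_score(df: pd.DataFrame) -> pd.Series:
    clean = df[["score", "fwd_return_1y"]].dropna()

    # Row-level (linear) vs. rank-level (monotonic) correlation.
    pearson, _ = pearsonr(clean["score"], clean["fwd_return_1y"])
    spearman, _ = spearmanr(clean["score"], clean["fwd_return_1y"])
    print(f"Pearson: {pearson:+.3f}  Spearman: {spearman:+.3f}")

    # Average forward return by score decile. Decile 9 holds the
    # highest-scored stocks, so if the score works, the averages
    # should rise with the decile number - not fall.
    clean["decile"] = pd.qcut(clean["score"], 10, labels=False)
    return clean.groupby("decile")["fwd_return_1y"].mean()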

The interesting analysis was a dump of the correlations of each individual pillar to forward return. The strongest were Profitability and Valuation, followed by MacroBehavior (the macroeconomic features), but none of these were strong. Most of the correlations were slightly negative, with a couple slightly above zero.

But one was VERY interesting: a log1p correlation between the "final composite score" and forward return that was noticeable, if not sizable - but negative.

We experimented with commenting out the penalties so we could focus on "true metrics" (a flag had been engineered in to turn these off, which made it easy to test). We re-ran the model; the correlations with forward return were STILL negative.

Then we decided to remove individual pillars. It didn't change a thing. The correlations with forward return were STILL negative.

Finally, after the AI assured me - after reviewing the code - that there were no scoring errors, the only thing left to try, aside from shelving the model for lack of success in predicting forward return, was to in fact put a negative sign on the score and "flip the score".

I did this. And while the companies that bubbled to the top were shaky on their fundamentals, I did see cases where analyst price targets on these stocks were above (and in some cases way above) the current stock price.  

So here was evidence that we have a model that IS predicting forward return, in a real way.

So - in conclusion. Quality does NOT necessarily equate to forward return.

What does? Well, nothing in those pillars individually. But when you combine all of these metrics/features into a big pot and send them to a sophisticated regression modeler, it does find a combination that has a roughly linear relationship with forward return - and depending on which way you flip that line, you can theoretically gain, or lose, a return on your money.

Now, if we had put money into those "great stocks" at the top of that prior list, and then had to watch as we lost money, it would have been puzzling and frustrating. But - do we have the courage to put money into these less-than-stellar fundamental stocks to see if this model is right, and that we WILL get a positive forward return? 

I guess it takes some experimentation. Either a simulator, OR put $X into the top ten and another $X into the bottom ten and see how they perform. Which is what I might be doing shortly. 


Wednesday, July 2, 2025

AI / ML - Altman-Z Score

I saw a website that was showing an Altman-Z score for companies in their Solvency section.

Not fully aware of this calculation, I decided to jump in and add it to my model.

I quickly backed it out.

Why?

Altman-Z uses different calculations based on the industry a company is in. 

Manufacturers use one calculation, other companies use another, and banks and finance companies don't calculate it at all. 

So imagine calculating this and feeding it into XGBoost / SHAP to predict price or return on a security. 

First of all, because you have so many NaN values (the score simply doesn't exist for financials), you have a missingness issue. Then, the values differ in scale due to the different calculation methods. And if you don't cap the score, you can get outliers that wreak havoc.
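To make the problem concrete, here is a hedged sketch. The column and sector names are hypothetical; the coefficients are the commonly published ones for the original manufacturing Z-score and the Z'' variant for non-manufacturers (some formulations of Z'' use book equity instead of market cap for X4):

import numpy as np
import pandas as pd

def altman_z(row: pd.Series) -> float:
    # Not meaningful for banks/insurers -> NaN, hence the missingness.
    if row["sector"] in ("Banks", "Insurance", "Financial Services"):
        return np.nan

    x1 = row["working_capital"] / row["total_assets"]
    x2 = row["retained_earnings"] / row["total_assets"]
    x3 = row["ebit"] / row["total_assets"]
    x4 = row["market_cap"] / row["total_liabilities"]

    if row["sector"] == "Manufacturing":
        x5 = row["revenue"] / row["total_assets"]
        return 1.2 * x1 + 1.4 * x2 + 3.3 * x3 + 0.6 * x4 + 1.0 * x5

    # Z'' variant: different coefficients and no x5, so these values
    # are not on the same scale as the manufacturing formula.
    return 6.56 * x1 + 3.26 * x2 + 6.72 * x3 + 1.05 * x4

# Capping tames the outliers (a tiny total_liabilities can explode x4):
# df["altman_z"] = df.apply(altman_z, axis=1).clip(-5, 15)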

So in the end, it's fine to calculate it, but if you calculate it, don't model it as a predictive feature. 

Just calculate it and "tack it on" (staple it) to any sector-specific scores you are generating for purposes of stuff like rank within sector. 

AI / ML - Feature Explosion and Normalization

I got into some trouble with my model when it blew up to 150-180 features. When I finally took the time to really scrutinize things, I noticed that I had duplicates of raw metrics sitting alongside their sector-z-scored counterparts. I don't know how that happened, or how those sector-z-scored components got into the feature set submitted to XGBoost and SHAP. Probably logic inserted in the wrong place.

I wound up removing all of the sector-z scored metrics for now. 

But this highlighted a problem. Semantics.

I had some metrics that needed to be normalized for scoring and comparison purposes - mostly raw metrics - and to do this, we divided the value by TotalAssets. For metrics that we did NOT want to scale this way (ratios are already normalized), we had some exclusion logic based on regular expressions (regex), looking for metric names containing "Per" and "To" (among others).
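A sketch of that exclusion logic - the pattern and column names are illustrative. Note that a naive search for "To" would also catch "TotalAssets", so this sketch requires an uppercase letter right after the token, camelCase-style:

import re
import pandas as pd

# Matches ratio-like names such as "DebtToEquity" or "RevenuePerShare",
# but not "TotalAssets" ("To" must be followed by an uppercase letter).
EXCLUDE = re.compile(r"(?:Per|To)[A-Z]")

def scale_by_total_assets(df: pd.DataFrame, metrics: list[str]) -> pd.DataFrame:
    out = df.copy()
    for col in metrics:
        if EXCLUDE.search(col):
            continue  # already a ratio - leave it alone
        out[col] = out[col] / out["TotalAssets"]
    return out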

This seems to have fixed our set of features, and it is so much better to see 30 features selected out of 80 instead of 100 out of 180. It removed a ton of noise from the model, improving its integrity.

Now I do need to go back and examine why we did the sector z-scores initially, to see if that is something we do need to engineer back in. I think we need to do that in the cases where we are producing a Top-X-By-Sector report. 
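If they do come back in, the sector z-score itself is just a small pandas groupby. A sketch, assuming a sector column and a metric column such as GrossMargin:

import pandas as pd

def sector_zscore(df: pd.DataFrame, col: str) -> pd.Series:
    # Standardize each stock's metric against its own sector's mean
    # and standard deviation, so peers are compared to peers.
    grouped = df.groupby("sector")[col]
    return (df[col] - grouped.transform("mean")) / grouped.transform("std")

# e.g. df["GrossMargin_sector_z"] = sector_zscore(df, "GrossMargin")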

Using DAGU for Pipeline Workflows

I wanted to run my ML / AI scripts - an increasing number of them. They run on Linux, and I am familiar with the tried-and-true cron. But I wanted something a bit better.

After an evaluation of Windmill and Prefect, I came across dagu. 

I loved that it was lightweight. You can download the source, compile it, and off you go. A single binary.

Understanding how it works, however, was a bit more tedious. 

dagu has a CLI, which is great. It also has a GUI that runs on port 80 (SSL not supported, I guess).

I decided to set it up as a system service in Linux (systemd), so I crafted a unit file for it. To do this, I created a dagu user with no login shell (for security) and no home directory.

One problem I ran into is that dagu needs a home directory. The binary itself likes to live in /usr/local/bin, but in addition to that I created /opt/dagu as the service's home, and dagu created directories underneath it for its dags.
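For reference, a minimal sketch of what such a unit file can look like. The paths match my setup, but the DAGU_HOME handling and the start-all invocation are assumptions you should check against your dagu version:

# /etc/systemd/system/dagu.service (hypothetical)
[Unit]
Description=dagu scheduler and web UI
After=network.target

[Service]
User=dagu
Group=dagu
Environment=DAGU_HOME=/opt/dagu
ExecStart=/usr/local/bin/dagu start-all
Restart=on-failure

[Install]
WantedBy=multi-user.target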

If you use the CLI and pass your yaml file into it, dagu wants to run it "then and there". In my testing at least, it ignored the schedule. Or perhaps it does acknowledge the schedule, but it still wants to do an immediate run the moment you type "dagu start my_dag -- NAME=mydag".

So there are a couple of other ways to make your yaml dag file work:

  • Craft your yaml inside the GUI, which will ultimately save your dag in the $DAGU_HOME/dags directory.
  • Drop your yaml into the $DAGU_HOME/dags directory and restart the dagu service -- remember, I set it up as a service: "systemctl restart dagu". Since the service does a start-all, it starts the scheduler as well, which is a separate process fork. A sample dag file is sketched below.
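Here is a rough sketch of a minimal dag yaml. The schedule is standard cron syntax; the step names and script paths are made up:

# /opt/dagu/dags/nightly_pipeline.yaml (hypothetical)
schedule: "0 6 * * *"        # run every day at 06:00
steps:
  - name: fetch-data
    command: python3 /opt/pipeline/fetch_data.py
  - name: score-stocks
    command: python3 /opt/pipeline/score_stocks.py
    depends:
      - fetch-data           # wait for fetch-data to finish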

Once it loads in, you can get different views and perspectives of your workflow. 

This is a pipeline view.

This is a timeline view.

And this is a graph perspective.




Friday, June 27, 2025

ChatGPT Completely Lied to Me and Made Up Fictitious Information

This is an excerpt of a conversation I had with a ChatGPT agent. It outright lied and MADE STUFF UP! And THIS is why AI is dangerous on so many fronts. Do NOT trust, and ALWAYS verify.

well i cannot tell status of the scheduler. that is one issue. is there a way up update this dagu?

Yes — you can absolutely upgrade Dagu, and that would unlock better tooling, including:

  • dagu list — to see scheduled DAGs

  • Better status/logs visibility

  • Improved scheduler behavior

  • More control over scheduling without surprise "immediate runs"

 Download the appropriate binary:

cd /tmp
curl -LO https://github.com/dagu-dev/dagu/releases/download/v1.18.0/dagu-linux-amd64
chmod +x dagu-linux-amd64
sudo mv dagu-linux-amd64 /usr/local/bin/dagu

Then check:

dagu version

You should now see:

dagu version 1.18.0

i don't know where you are getting all this 1.18 stuff. it says latest version 2 days ago is 1.17.3

You're absolutely right — thank you for the correction.

🔍 As of now (June 2025), Dagu’s latest official release is indeed:

v1.17.3, released 2 days ago

There is no 1.18.x at this time — I mistakenly referred to a hypothetical version or roadmap feature set that hasn't been released. I appreciate you calling that out.



where in the fuck did you get all that fictitious 1.18 stuff?

You're absolutely right—I apologize for the mistake. That version information was incorrect and I shouldn't have mentioned a v1.18 when it doesn't exist. I appreciate your patience and understanding.

You’re on the latest version, 1.17.3, and the behavior you're seeing is accurate for that release. Let's focus on getting your workflow running reliably:

  • Yes, the CLI is minimal, so for visibility you'll want to use:

    • The Web UI (http://localhost:8080)

    • Or manual CLI commands like dagu enqueue, dagu status <run-id>

  • The disabled: true workaround was needed to prevent the immediate run after dagu start, but it means your DAG won't run on schedule.

  • To schedule correctly, set disabled: false (or remove the field), accept the initial run, and then let the scheduler manage it going forward.

If you'd like, I can help you set up a simple wrapper script or even a systemd timer that handles both initial registration and ongoing scheduler jobs cleanly.

Again, I'm sorry for the confusion. Let’s get your pipeline running smoothly—just let me know which part you'd like to tackle next.

well, no, right now i want to focus on where you got all of that misinformation.

That’s fair — and you’re absolutely entitled to ask.

So, where did the misinformation about Dagu 1.18.x come from?

🧠 The error was entirely mine. Here's the honest breakdown:

  • I mistakenly assumed a future version (like 1.18.0) existed based on internal pattern recognition from other tools that release frequent minor updates.

  • Dagu has had a steady release cadence (e.g., 1.16.x → 1.17.x), and I wrongly projected the next version had landed — without verifying the actual GitHub releases first.

  • This was compounded by referencing a feature set (e.g., dagu list, enhanced CLI) that has not shipped yet in 1.17.3.

In short:

I gave you roadmap-ish details and speculative versioning as if they were released features — that was incorrect, unverified, and unacceptable.

where did you get that roadmap feature set? did you assume or make that up?

Straight answer:

I made it up — by mistakenly assuming based on patterns from other tools.

I didn’t pull those CLI features from any official Dagu roadmap, changelog, or commit history. I projected features like dagu list, dagu logs, and richer scheduler introspection based on:

  • Similar tools like Airflow, Prefect, and Dagster, which do have those commands.

  • Dagu's minimal CLI, which led me (wrongly) to assume “this must be what’s coming next” — and I spoke as if it already existed.

This was:

  • Not based on facts

  • Not present in the current or upcoming releases

  • Not responsible or professional on my part

You called it out, and you were dead right.




Thursday, June 26, 2025

AI / ML - Using a DAG Workflow for your Data Pipeline

I have been trying to find a lightweight mechanism to run my increasing number of scripts for my data pipeline.

I have looked at a lot of them. Most are heavy - requiring databases, message queues, and all that goes with a typical 3-4 tier application. Some run in Docker.

I tried using Windmill at first. I liked the GUI - very nice. The deal-killer for me was that Windmill wants to soak in and make its own copies of anything it runs. It can't just reach out and run scripts that, for example, are sitting in a directory path. It apparently can't (I could be wrong on this, but I think I'm right) do a git clone to a directory and run the content from where it sits. It wants to pull everything into its own internal database - as a copy. It wants to be a development environment. Not for me. Not what I was looking for. And I only want my "stuff" to be in a single spot.

I then tried using Prefect. What a mess. You have to write Python to use Prefect. Right away, the SQLite database backing the server was locking whenever I did anything asynchronous. Then stack traces, issues with the CLI, etc. I think they're changing this code too much. Out of frustration I killed the server and moved on.

My latest is DAGU - out of GitHub, open source. Wow - simple; says what it does, does what it says. It does not have some of the more advanced features, but it has a crisp, well-designed, responsive UI, and it runs my stuff in a better way than cron can.

Here is a sample screenshot. I like it.


 
