Saturday, July 5, 2025

AI / ML - Here is Why You Backtest

My model was working nicely. 

It scored stocks on a number of fronts (pillars).

It used Profitability. It used Solvency. It used Liquidity. It used Efficiency.

These are the "four horsemen" of stock evaluation.

I added some twists of my own to the "grading formula," in the form of macro variables (consumer sentiment, business confidence, et al.). I also had some trend analysis, rewarding trends up and penalizing trends down, with rewards (and penalties) on profitability, cash flow, etc. I had scaling done correctly, too, to ensure a "fair playing field," along with some sector normalization.
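For the curious, here is a minimal sketch of that general shape - sector-relative z-scores averaged into a composite, with a trend tweak bolted on. The column names, pillar list, and the 0.1 trend weight are illustrative stand-ins, not my actual formula.

```python
import pandas as pd

# Hypothetical pillar columns; the real model uses its own names and weights.
PILLARS = ["profitability", "solvency", "liquidity", "efficiency"]

def composite_score(df: pd.DataFrame) -> pd.Series:
    """Z-score each pillar within its sector, then average into one score."""
    z = df.groupby("sector")[PILLARS].transform(
        lambda col: (col - col.mean()) / col.std(ddof=0)
    )
    score = z.mean(axis=1)
    # Illustrative trend adjustment: reward trends up, penalize trends down.
    score = score + 0.1 * df["trend"].clip(-1, 1)
    return score
```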

When I ran the model, using XGBoost to predict 1-year forward return, the stocks at the top of the report looked great when I spot-checked them against various sites that also grade out stocks. I felt good. The R-squared I was getting from XGBoost on a SHAP-pruned feature run was at academic levels (as high as 0.46 at one point).
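The modeling step looked roughly like the sketch below. It uses synthetic placeholder data rather than my real features, and the hyperparameters and the top-20 SHAP cutoff are assumptions for illustration.

```python
import numpy as np
import pandas as pd
import shap
import xgboost as xgb
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Placeholder data; in the real model X holds the pillar/macro features
# and y holds the 1-year forward returns.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(1000, 40)),
                 columns=[f"f{i}" for i in range(40)])
y = 0.1 * X["f0"] + rng.normal(scale=0.2, size=1000)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
model = xgb.XGBRegressor(n_estimators=400, max_depth=4, learning_rate=0.05)
model.fit(X_tr, y_tr)

# "SHAP pruning": rank features by mean |SHAP| value, refit on the top 20.
shap_values = shap.TreeExplainer(model).shap_values(X_tr)
keep = X.columns[np.abs(shap_values).mean(axis=0).argsort()[::-1][:20]]

pruned = xgb.XGBRegressor(n_estimators=400, max_depth=4, learning_rate=0.05)
pruned.fit(X_tr[keep], y_tr)
print("R^2:", r2_score(y_te, pruned.predict(X_te[keep])))
```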

As part of some final QA, I ran the resulting code through AI engines, which praised its thoroughness and slapped me on the back, reassuring me that my model was on a par with, if not superior to, many academic models.

Then - someone asked me whether this model had been back-tested.
And the answer was no. I had not back-tested it up to that point; I didn't think I was ready for back-testing.

Maybe back-testing is an iterative "continual improvement" step that should happen much earlier in the process, to ensure you don't go down the wrong road. But I didn't do that.

So, I ran a back-test. And to my horror, the model was completely "upside down": the stocks it scored highest predicted the worst forward returns. The AI engines suggested I simply "flip the sign" on my score and invert the rankings. But that didn't feel right. It felt like I was trying to force a score.

So - the first thing we did was evaluate the scoring. We looked at the correlation between individual scoring pillars and forward return. Negative.

We then looked at correlation in more detail.

First, we calculated Pearson (row-level) and Spearman (rank-level) correlations.

They were negative.
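Those two checks are only a few lines with scipy; here, score and fwd_return are hypothetical names for the aligned score and forward-return series.

```python
from scipy.stats import pearsonr, spearmanr

pearson_r, _ = pearsonr(score, fwd_return)    # linear, row-level
spearman_r, _ = spearmanr(score, fwd_return)  # monotonic, rank-level
print(f"Pearson: {pearson_r:.3f}  Spearman: {spearman_r:.3f}")
```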

Then, we calculated average forward return by score decile. Sure enough, there was a trend, but it ran completely backwards from what one would expect.

Quality stocks in the upper deciles (9, 8, 7, 6, 5) had negative average forward returns that improved as the decile dropped, while the shaky stocks in the lower deciles (4, 3, 2, 1, 0) had progressively more positive ones.
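The decile table is a quick pandas exercise; df, score, and fwd_return are hypothetical names for the scored universe.

```python
import pandas as pd

# One row per stock; decile 9 holds the highest scores.
df["decile"] = pd.qcut(df["score"], 10, labels=False)
print(df.groupby("decile")["fwd_return"].mean())  # should rise with decile; mine fell
```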

The interesting analysis was a dump of the correlation of each individual pillar against forward return. The strongest were Profitability and Valuation, followed by MacroBehavior (macroeconomic features), but none of them were strong. Most of the correlations were slightly negative, with a couple slightly above zero.
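That dump boils down to something like the snippet below (column names illustrative).

```python
# Rank correlation of each pillar, and the composite, against forward return.
pillars = ["profitability", "valuation", "macro_behavior", "solvency",
           "liquidity", "efficiency", "composite_score"]
print(df[pillars].corrwith(df["fwd_return"], method="spearman").sort_values())
```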

But - one was VERY interesting: a log1p correlation between the "final composite score" and forward return that was noticeable, if not sizable - but negative.
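For anyone reproducing that check, a sketch - assuming the composite score is non-negative (say, a 0-10 scale) so log1p is defined.

```python
import numpy as np

# log1p compresses the top of the score scale before correlating.
r = np.corrcoef(np.log1p(df["composite_score"]), df["fwd_return"])[0, 1]
print(f"log1p(score) vs fwd return: r = {r:.3f}")
```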

We experimented with turning off the penalties so we could focus on the "true metrics" (a flag had been engineered in to disable them, which made this easy to test). We re-ran the model; STILL the correlations with forward return were negative.
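The flag was wired in along these lines (illustrative names, not my actual code).

```python
APPLY_PENALTIES = False  # False = score on the "true metrics" only

def finalize_score(base_score, penalties):
    # Penalty terms only bite when the flag is on.
    return base_score - sum(penalties) if APPLY_PENALTIES else base_score
```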

Then - we decided to remove individual pillars. Didn't change a thing. STILL the correlations with forward return were negative.

Finally, after the AI assured me - after reviewing the code - that there were no scoring errors, the only thing left to try, aside from shelving the model for failing to predict forward return, was to in fact put a negative sign on the score and "flip" it.

I did this. And while the companies that bubbled to the top were shaky on their fundamentals, I did see cases where analyst price targets on these stocks were above (and in some cases way above) the current stock price.

So here is the evidence that we have a model that IS predicting forward return, in a real way.

So - in conclusion: quality does NOT necessarily equate to forward return.

What does??? Well, nothing in those pillars individually. But - when you combine all of these metrics/features into a big pot and send them to a sophisticated regression modeler, it does find a magical combination whose relationship with forward return is linear, and depending on which way you flip that line, you can theoretically gain - or lose - a return on your money.

Now, if we had put money into those "great stocks" at the top of that prior list, and then had to watch as we lost money, it would have been puzzling and frustrating. But - do we have the courage to put money into these stocks with less-than-stellar fundamentals, to see if this model is right and we WILL get a positive forward return?

I guess it takes some experimentation. Either a simulator, OR put $X into the top ten and another $X into the bottom ten and see how they perform. Which is what I might be doing shortly.
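The second option is simple enough to sketch; fwd_return here stands in for the realized return over the holding period, and the names are hypothetical.

```python
# Equal dollars into the ten best-scored names and the ten worst,
# equal-weighted within each basket.
budget = 10_000
ranked = df.sort_values("composite_score", ascending=False)

for label, basket in [("top 10", ranked.head(10)), ("bottom 10", ranked.tail(10))]:
    ret = basket["fwd_return"].mean()  # equal-weight average return
    print(f"{label}: ${budget:,.0f} -> ${budget * (1 + ret):,.0f}")
```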

