Wednesday, July 9, 2025

AI / ML - Feature Engineering - Interaction Features

I added some new macro features to my model - credit card debt, credit card delinquency, and unemployment data.

Some of these were VERY influential features.

So we can see that unemployment_rate is an important feature! It tops the list!!!

But - since we are doing relative scoring on stocks, what good does that do us, if every single stock sees the same macro values???

The answer: Interaction Features. 

Since unemployment can impact revenue growth (fewer consumers can afford to buy), you multiply the Revenue Growth Year-Over-Year percentage by the unemployment rate. Now you get a UNIQUE value that works for that specific stock symbol, instead of just throwing "across the board" metrics at every stock.
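Here is a minimal sketch of that multiplication in pandas. The symbols, the numbers, the column names (revenue_growth_yoy, unemployment_rate) and the add_interaction helper are all made up for illustration; they are not the actual pipeline.

```python
import pandas as pd

# Hypothetical per-stock rows. The macro column is identical for every
# stock on a given date, which is exactly the problem described above.
df = pd.DataFrame({
    "symbol": ["AAA", "BBB", "CCC"],
    "revenue_growth_yoy": [0.10, -0.05, 0.25],   # stock-specific
    "unemployment_rate": [4.0, 4.0, 4.0],        # same for everyone
})


def add_interaction(frame, left, right, name=None):
    """Return a copy of frame with a new column equal to left * right."""
    name = name or f"{left}_x_{right}"
    out = frame.copy()
    out[name] = out[left] * out[right]
    return out


df = add_interaction(df, "revenue_growth_yoy", "unemployment_rate")
print(df)
```

The new revenue_growth_yoy_x_unemployment_rate column differs per stock even though the unemployment rate does not, so it can contribute to a relative ranking.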

Now, if you don't do this, the macro variables in and of themselves CAN still impact a model, especially if a stock's forward return is sensitive to that feature. XGBoost can pick that up on its own. But you help the correlation by giving each stock its own uniquely calculated impact, as opposed to giving every stock the same value of "X.Y".

I did this, and got my latest high score on R-Squared.
Selected 30 features out of 97 (threshold = 0.007004095707088709)
⭐ New best model saved with R²: 0.4001

Pruned XGBoost Model R² score: 0.4001
Pruned XGBoost Model RMSE: 0.3865
Pruned XGBoost Model MAE: 0.2627

Full XGBoost R² score: 0.3831
Full XGBoost RMSE 0.3919
Full XGBoost MAE: 0.2694
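
The "Selected 30 features out of 97" line comes from pruning by feature importance. A rough sketch of what that kind of threshold pruning could look like is below. The feature names, importance values, the select_features helper and the >= comparison are all assumptions for illustration, not the code that produced the log. With a fitted XGBoost regressor, the importances would typically come from model.feature_importances_.

```python
import numpy as np


def select_features(names, importances, threshold):
    """Keep the features whose importance is at or above the threshold."""
    importances = np.asarray(importances, dtype=float)
    keep = importances >= threshold
    return [name for name, kept in zip(names, keep) if kept]


# Hypothetical importances, not the real model's values.
names = [
    "unemployment_rate",
    "revenue_growth_yoy_x_unemployment_rate",
    "credit_card_delinquency",
    "some_weak_feature",
]
importances = [0.30, 0.12, 0.005, 0.001]

selected = select_features(names, importances, threshold=0.007)
print(f"Selected {len(selected)} features out of {len(names)}")
```

The pruned model is then retrained on only the selected columns and compared against the full model, as in the scores above.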




 
