I added some new features (metrics) to my Quarterly model.
To recap: I have downloaded quarterly statements for stock symbols, and I use these to calculate an absolute slew of metrics and ratios. I then feed them into an XGBoost regression model to figure out whether they can predict the forward return of a stock's price.
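For the curious, the skeleton of that setup is nothing exotic. Here's a minimal sketch - the file and column names (quarterly_metrics.csv, fwd_return) are invented for illustration, not my actual schema:

```python
# Minimal sketch of the quarterly model setup. File and column names
# (quarterly_metrics.csv, fwd_return) are placeholders for illustration.
import pandas as pd
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

df = pd.read_csv("quarterly_metrics.csv")       # hypothetical per-quarter metrics/ratios
X = df.drop(columns=["symbol", "fwd_return"])   # the slew of metrics and ratios
y = df["fwd_return"]                            # forward return over the next quarter

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = xgb.XGBRegressor(n_estimators=500, max_depth=4, learning_rate=0.05)
model.fit(X_train, y_train)

print("R-squared:", r2_score(y_test, model.predict(X_test)))
```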
I added some macroeconomic indicators, because I felt those might impact the quarterly (short-term) price of a stock more than the pure fundamentals do.
The fundamentals are used in a separate annual model, where the model is not distracted or interrupted by "events" or macroeconomics that get in the way of understanding the true health of a company, based on fundamentals, over a years-long period of time.
So - what did I add to the quarterly model?
- Consumer Sentiment
- Business Confidence
- Inflation Expectations
- Treasury Data (1-, 3-, and 10-year)
- Unemployment
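If you're wondering where series like these come from: most are available from FRED. Here's a sketch using pandas_datareader - the series codes are real FRED identifiers, but the quarterly resampling is an assumption about how they get lined up with the statements, and business confidence is less standard on FRED, so I've left it out of the sketch:

```python
# Sketch: pulling macro series from FRED and collapsing them to quarter-end
# values. Series codes are real FRED identifiers; the resampling logic is
# illustrative, not necessarily the exact pipeline.
import pandas_datareader.data as web

series = {
    "UMCSENT": "consumer_sentiment",    # U. of Michigan consumer sentiment
    "MICH": "inflation_expectations",   # U. of Michigan inflation expectations
    "DGS1": "treasury_1y",              # 1-year Treasury constant maturity
    "DGS3": "treasury_3y",              # 3-year Treasury constant maturity
    "DGS10": "treasury_10y",            # 10-year Treasury constant maturity
    "UNRATE": "unemployment",           # unemployment rate
}

macro = web.DataReader(list(series), "fred", "2010-01-01", "2024-12-31")
macro = macro.rename(columns=series)

# Quarter-end values line up with quarterly statement dates.
# ("QE" on pandas >= 2.2; use "Q" on older versions.)
macro_q = macro.resample("QE").last()
```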
And wow - did these variables kick in. At one point, I had the model's R-squared up to 0.16.
Unemployment, actually, did nothing, and I wound up removing it as a noise factor. I also realized I had the fiscal quarter included, and removed that too, since it, like sector and other descriptive variables, should not be in the model.
But - as I was about to put a wrap on it, I decided to do one more "push" to improve the R-squared value and started fiddling around. I got cute, adding derived features. One of the things I did was to add lag features for business confidence, consumer sentiment, and inflation expectations. Interestingly, two of these shot to the top of the influential metrics.
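The lag features themselves are close to a one-liner in pandas. A sketch, assuming a quarterly frame like the one above, with illustrative column names (quarter_end, business_confidence, and so on):

```python
# Sketch: one-quarter lags of the macro features, computed per symbol.
# df is the quarterly metrics frame; column names are illustrative.
df = df.sort_values(["symbol", "quarter_end"])
for col in ["business_confidence", "consumer_sentiment", "inflation_expectations"]:
    df[f"{col}_lag1"] = df.groupby("symbol")[col].shift(1)
```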
Feature importance, sorted by influence on forward return:

feature                     weight
business_confidence_lag1    0.059845
inflation_lag1              0.054764
But others were a bust, with 0.00000 importance values.
I tried removing the original metrics and JUST keeping the lags - didn't really help.
Another thing worth noting is that I added SHAP values - a topic I will get into in more depth shortly, perhaps in a subsequent post. SHAP (SHapley Additive exPlanations) is a method used to explain the output of machine learning models by assigning each feature an importance value for a specific prediction, so that models - like so many - are not completely "black box".
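Here's what that looks like in practice with the shap package - this is the standard TreeExplainer usage, not necessarily my exact code:

```python
# Sketch: SHAP values for an already-fitted XGBoost model (standard usage).
import shap

explainer = shap.TreeExplainer(model)        # model = the fitted XGBRegressor
shap_values = explainer.shap_values(X_test)  # one value per feature, per row

# Global summary: which features push predictions up or down, and by how much.
shap.summary_plot(shap_values, X_test)
```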
But one thing I noticed when I added the SHAP feature list is that it does NOT line up with the feature importances that the XGBoost model itself reports.
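One likely culprit, for anyone who hits the same thing: XGBoost's feature_importances_ is just one of several built-in importance definitions (weight, gain, cover), and none of them measures the same thing as SHAP's mean absolute contribution. A sketch for putting them side by side, reusing model, X_test, and shap_values from above:

```python
# Sketch: XGBoost's three importance definitions next to mean |SHAP|.
# Features a tree never splits on are absent from get_score(), hence fillna(0).
import numpy as np
import pandas as pd

booster = model.get_booster()
comparison = pd.DataFrame({
    "weight": pd.Series(booster.get_score(importance_type="weight")),
    "gain": pd.Series(booster.get_score(importance_type="gain")),
    "cover": pd.Series(booster.get_score(importance_type="cover")),
    "mean_abs_shap": pd.Series(np.abs(shap_values).mean(axis=0),
                               index=X_test.columns),
}).fillna(0)

print(comparison.sort_values("mean_abs_shap", ascending=False).head(10))
```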
So I definitely need to look into this.