I really felt that news would be a great thing to add to the model. But the problem with news is that it is recent, while the data I am using with XGBoost is historical time-series data.
If you added news, what would you do: cram the values into only the most recent year?
I think if you go with data that changes in close to real time, you need to re-think the whole model, including the type of model. Maybe news works better with a Transformer model or an LSTM neural network than with a predictive regression model.
So I am running out of new things to add to my model to try to boost its predictive power (increase the R-squared).
Then I came up with the idea of adding earnings beats, misses and meets. A quick consult with an LLM suggested using an earnings_surprise score, so that we not only get the miss/meet/beat counts but also capture the magnitude. A great idea.
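For illustration, here is a minimal sketch of how such a score could be computed. The post does not give the exact formula, so this assumes one common definition: actual EPS minus estimated EPS, divided by the absolute estimate. The `EarningsReport` class, the `floor` guard, the `tolerance` parameter and the sample numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EarningsReport:
    # Hypothetical record layout; field names are assumptions.
    symbol: str
    actual_eps: float
    estimated_eps: float


def earnings_surprise(report: EarningsReport, floor: float = 0.01) -> float:
    """Signed surprise as a fraction of the estimate.

    The floor on the denominator is an assumption, added so that
    estimates at or near zero do not blow up the score.
    """
    denominator = max(abs(report.estimated_eps), floor)
    return (report.actual_eps - report.estimated_eps) / denominator


def classify(surprise: float, tolerance: float = 0.0) -> str:
    """Map a surprise score back to the beat / meet / miss label."""
    if surprise > tolerance:
        return "beat"
    if surprise < -tolerance:
        return "miss"
    return "meet"


# Made-up sample reports, purely for demonstration.
reports = [
    EarningsReport("AAA", actual_eps=1.25, estimated_eps=1.00),
    EarningsReport("BBB", actual_eps=0.75, estimated_eps=1.00),
    EarningsReport("CCC", actual_eps=0.50, estimated_eps=0.50),
]

for r in reports:
    score = earnings_surprise(r)
    print(r.symbol, score, classify(score))
```

The sign carries the beat/miss direction and the size carries the magnitude, so a single numeric column can stand in for the three separate counts.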
I implemented this, and lo and behold, the earnings_surprise score moves the needle, substantially and consistently.
The best thing about this is that the earnings_surprise score is symbol-specific, so it is not some macro feature that I have to figure out how to interact with the symbol data.
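Because the score is keyed by symbol, attaching it to each row can be a plain lookup. Here is a rough sketch, assuming each row has a symbol and a date and should get the most recent surprise reported on or before that date (so no future report leaks in). The `events` data, the `latest_surprise` helper and the default of 0.0 for "no report yet" are hypothetical choices, not the actual pipeline.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import date

# Hypothetical earnings events: (symbol, report_date, earnings_surprise).
events = [
    ("AAA", date(2023, 2, 1), 0.25),
    ("AAA", date(2023, 5, 1), -0.10),
    ("BBB", date(2023, 3, 15), 0.0),
]

# Group events per symbol, sorted by report date.
by_symbol = defaultdict(list)
for symbol, report_date, surprise in sorted(events, key=lambda e: (e[0], e[1])):
    by_symbol[symbol].append((report_date, surprise))


def latest_surprise(symbol: str, as_of: date, default: float = 0.0) -> float:
    """Return the most recent surprise for `symbol` reported on or before `as_of`.

    Falls back to `default` (an assumption) when the symbol has no
    report yet as of that date.
    """
    history = by_symbol.get(symbol, [])
    dates = [d for d, _ in history]
    idx = bisect_right(dates, as_of)  # number of reports dated <= as_of
    if idx == 0:
        return default
    return history[idx - 1][1]


# Attach the feature to a couple of made-up rows.
rows = [("AAA", date(2023, 3, 1)), ("BBB", date(2023, 1, 10))]
for symbol, row_date in rows:
    print(symbol, row_date, latest_surprise(symbol, row_date))
```

Whether a report dated the same day as the row should count depends on when the numbers are released relative to the row's timestamp; the sketch treats same-day reports as already known.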