Fixing a Train/Serve Skew in Sentiment Residuals
The signal generation process uses a technique called sentiment residualization: we remove the portion of the FinBERT sentiment score that can be explained by price momentum alone, leaving behind only the genuine sentiment surprise. A stock that has been running up for 20 days will naturally attract positive news coverage, so we want to isolate the sentiment signal that exists above and beyond what the price action would predict.
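To make the idea concrete, here is a minimal sketch of residualization as a one-variable ordinary least squares fit. It assumes a single momentum feature and a linear relationship; the function names and the synthetic data are hypothetical and are not taken from the actual pipeline, which may use more features or a different model.

```python
import random
from statistics import fmean


def fit_residual_model(momentum, sentiment):
    """Fit sentiment ~ intercept + slope * momentum by ordinary least squares."""
    mean_x, mean_y = fmean(momentum), fmean(sentiment)
    sxx = sum((x - mean_x) ** 2 for x in momentum)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(momentum, sentiment))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sentiment_residuals(momentum, sentiment, intercept, slope):
    """Sentiment surprise: observed score minus the momentum-implied score."""
    return [y - (intercept + slope * x) for x, y in zip(momentum, sentiment)]


# Synthetic stand-in for the training history (assumed values, for illustration only).
rng = random.Random(0)
momentum = [rng.gauss(0.0, 0.05) for _ in range(5000)]
sentiment = [0.1 + 2.0 * x + rng.gauss(0.0, 0.2) for x in momentum]

intercept, slope = fit_residual_model(momentum, sentiment)
residuals = sentiment_residuals(momentum, sentiment, intercept, slope)
```

By construction, the residuals from the fitting sample average to zero and are uncorrelated with momentum, which is what "above and beyond what the price action would predict" means here.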
The problem was subtle. During training, the residual model was fitted on tens of thousands of rows spanning months of history. But at inference time, the same regression was being refitted from scratch each day on whatever small universe of stocks passed the daily filters, typically around 30 stocks. That's a very different statistical population, so the coefficients estimated at inference differed from the training ones, and the sentiment residuals fed into the composite ranking signal were not the same quantity the model had learned from. Classic train/serve skew.
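One concrete way the two definitions can diverge is sketched below. A least squares fit with an intercept always produces residuals that average to zero over the sample it was fitted on, so a daily refit erases any market-wide sentiment level for that day, while fixed training coefficients preserve it. The coefficients, the 0.3 offset, and the data are made-up values for illustration; this is not a claim about the size or exact form of the skew in the real pipeline.

```python
import random
from statistics import fmean


def fit_ols(x, y):
    """One-variable least squares: returns (intercept, slope)."""
    mean_x, mean_y = fmean(x), fmean(y)
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sum(
        (a - mean_x) ** 2 for a in x
    )
    return mean_y - slope * mean_x, slope


def residualize(x, y, intercept, slope):
    """Observed value minus the value implied by the fitted line."""
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


# Hypothetical coefficients learned on the large training set.
TRAIN_INTERCEPT, TRAIN_SLOPE = 0.1, 2.0

# One hypothetical day: 30 stocks whose sentiment runs 0.3 above
# what the training relationship would predict.
rng = random.Random(1)
day_momentum = [rng.gauss(0.0, 0.05) for _ in range(30)]
day_sentiment = [0.1 + 2.0 * m + 0.3 + rng.gauss(0.0, 0.2) for m in day_momentum]

# Daily refit: residuals are forced to average zero on this small sample.
refit_resid = residualize(day_momentum, day_sentiment, *fit_ols(day_momentum, day_sentiment))

# Fixed training coefficients: the day's extra sentiment survives in the residuals.
fixed_resid = residualize(day_momentum, day_sentiment, TRAIN_INTERCEPT, TRAIN_SLOPE)

print(f"mean residual, daily refit:  {fmean(refit_resid):.4f}")
print(f"mean residual, fixed coeffs: {fmean(fixed_resid):.4f}")
```

The same inputs produce two different "sentiment surprise" values depending on which fit is used, which is exactly the inconsistency between training and serving.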
The fix was straightforward: serialize the residual model coefficients to disk at the end of each training run, and load those fixed coefficients at inference time rather than refitting. Now the definition of sentiment surprise is consistent from training through to live signal generation.
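A minimal sketch of that save-and-load pattern follows. It assumes the residual model reduces to an intercept and a single momentum slope stored as JSON; the file name, function names, and numbers are hypothetical, and the real pipeline may persist its coefficients in a different format.

```python
import json
import tempfile
from pathlib import Path


def save_coefficients(path, intercept, slope):
    """Called at the end of a training run."""
    Path(path).write_text(json.dumps({"intercept": intercept, "slope": slope}))


def load_coefficients(path):
    """Called at inference time instead of refitting."""
    data = json.loads(Path(path).read_text())
    return data["intercept"], data["slope"]


def residualize(momentum, sentiment, intercept, slope):
    """Apply the frozen training-time definition of sentiment surprise."""
    return [s - (intercept + slope * m) for m, s in zip(momentum, sentiment)]


# A temporary directory stands in for wherever model artifacts are stored.
with tempfile.TemporaryDirectory() as tmp:
    coef_path = Path(tmp) / "residual_coefficients.json"  # hypothetical file name

    # Training side: persist the fitted coefficients (example values).
    save_coefficients(coef_path, intercept=0.1, slope=2.0)

    # Inference side: load them back, no refit on today's small universe.
    intercept, slope = load_coefficients(coef_path)

todays_momentum = [0.10, -0.05, 0.02]
todays_sentiment = [0.50, 0.00, 0.14]
surprise = residualize(todays_momentum, todays_sentiment, intercept, slope)
```

Because inference only reads the stored coefficients, the residual for a given stock no longer depends on which other stocks happened to pass the filters that day.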