Why is my Random Forest forecast almost identical to the target volatility?

Hey everyone,

I’m working on a small volatility forecasting project for NVDA, using models like GARCH(1,1), LSTM, and Random Forest. I also combined their outputs into a simple ensemble.

Here’s the issue:
In the plot I made, the Random Forest prediction (orange line) is nearly identical to the actual realized volatility (black line). It’s hugging the true values so closely that it seems suspicious, way tighter than what GARCH or LSTM are doing.

📌 Some quick context:

The target is rolling realized volatility computed from log returns. RF uses features like rolling mean, std, skew, and kurtosis. LSTM uses a sequence of past returns (or vol) as input.

I used ChatGPT and Perplexity to help me build this. I’m still pretty new to ML, so there might be something I’m missing. I tried to avoid data leakage and used proper train/test splits.
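One common way a Random Forest ends up tracing the target almost exactly is when the target and one of its features are built from the same window of returns. If the target at day t is the trailing rolling std ending at t, and one RF feature is that same rolling std, the model only has to copy it. The sketch below is one possible setup, not the post's actual code: it assumes numpy >= 1.20 (for `sliding_window_view`), and the helper names `rolling_std` and `make_dataset`, the 21-day window and the 5-day horizon are all illustrative choices. It builds the feature from returns up to day t and the target from returns strictly after t, so the two never overlap.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def rolling_std(x: np.ndarray, window: int) -> np.ndarray:
    """Trailing sample std over x[t-window+1 .. t]; NaN until the window is full.

    ddof=1 matches the default of pandas' Series.rolling(window).std().
    """
    out = np.full(x.shape, np.nan)
    out[window - 1:] = sliding_window_view(x, window).std(axis=1, ddof=1)
    return out


def make_dataset(log_ret: np.ndarray, window: int = 21, horizon: int = 5):
    """Hypothetical helper: feature uses returns up to day t, target uses t+1 .. t+horizon."""
    # Feature: trailing realized vol, known at the close of day t.
    # (Rolling mean, skew, kurtosis would be built the same trailing way.)
    feat_vol = rolling_std(log_ret, window)

    # fwd[t] covers returns t-horizon+1 .. t ...
    fwd = rolling_std(log_ret, horizon)
    # ... so shifting it back by `horizon` makes target[t] cover t+1 .. t+horizon,
    # i.e. only days the feature at t has not seen.
    target = np.full(log_ret.shape, np.nan)
    target[:-horizon] = fwd[horizon:]

    # Keep only rows where both the feature and the target are defined.
    mask = ~np.isnan(feat_vol) & ~np.isnan(target)
    return feat_vol[mask], target[mask]
```

With a synthetic series such as `r = np.random.default_rng(0).normal(0, 0.02, 500)`, `make_dataset(r)` returns 475 aligned (feature, target) pairs. The first feature is the std of `r[0:21]` and the first target is the std of `r[21:26]`, so the target is made only of days the feature never saw. If instead the target were `rolling_std(r, 21)` itself, it would be numerically identical to the feature, and any model would "predict" it perfectly.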

My question:
Why is the Random Forest doing so well? Could this be data leakage? Overfitting? Or do tree-based models just tend to perform this way on volatility data?
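Two quick checks could help separate leakage from genuine skill: flag any feature that is nearly a copy of the target, and split the data chronologically rather than at random. A random split (scikit-learn's `train_test_split`, for instance, shuffles by default) lets the model train on days that sit between test days, and with overlapping rolling windows neighboring days share most of their returns. The sketch below uses numpy only; the helper names, the 0.99 threshold and the optional `gap` parameter are hypothetical choices for illustration.

```python
import numpy as np


def flag_near_copies(X: np.ndarray, y: np.ndarray, names, threshold: float = 0.99):
    """Return the names of feature columns whose |correlation| with y is >= threshold.

    A feature that is almost a copy of the target usually means the target
    was computed from the same window of returns as that feature.
    """
    flagged = []
    for j, name in enumerate(names):
        r = np.corrcoef(X[:, j], y)[0, 1]
        if abs(r) >= threshold:
            flagged.append(name)
    return flagged


def chronological_split(X: np.ndarray, y: np.ndarray, test_frac: float = 0.2, gap: int = 0):
    """Split without shuffling, so every test row comes after every training row.

    `gap` drops rows between the two sets; with overlapping rolling windows,
    a gap of at least the window/horizon length keeps train and test from
    sharing any returns.
    """
    cut = int(len(y) * (1 - test_frac))
    return X[:cut], X[cut + gap:], y[:cut], y[cut + gap:]
```

If `flag_near_copies` returns a feature name, that is a strong hint the target and that feature overlap. On the chronological test set, it is also worth comparing the RF error against a naive forecast that simply carries the current volatility forward: a model that barely beats that baseline is not adding much beyond persistence.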

Would love any tips or suggestions from more experienced folks 🙏

submitted by /u/ASP_RocksS to r/learnmachinelearning