Hey everyone,
I’m working on a small volatility forecasting project for NVDA, using models like GARCH(1,1), LSTM, and Random Forest. I also combined their outputs into a simple ensemble.
Here’s the issue:
In the plot I made, the Random Forest prediction (orange line) is nearly identical to the actual realized volatility (black line). It hugs the true values so closely that it seems suspicious, much tighter than what GARCH or LSTM are doing.
📌 Some quick context:
- The target is rolling realized volatility computed from log returns.
- The RF uses features like rolling mean, std, skew, kurtosis, etc. (rough sketch below).
- The LSTM takes a sequence of past returns (or vol) as input.
- I used ChatGPT and Perplexity to help me build this; I'm still pretty new to ML, so there might be something I'm missing.
- I tried to avoid data leakage and used proper train/test splits.
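To make that concrete, here's a simplified sketch of roughly how I build the target and the RF features (not my exact code; the 21-day window and the `prices` series are just placeholders):

```python
import numpy as np
import pandas as pd

def build_dataset(prices: pd.Series, window: int = 21) -> pd.DataFrame:
    """Rough sketch of my setup: rolling realized vol target + rolling-stat features."""
    log_ret = np.log(prices).diff().dropna()

    # Target: rolling realized volatility of log returns (annualized).
    realized_vol = log_ret.rolling(window).std() * np.sqrt(252)

    # RF features: rolling stats over the same return series.
    features = pd.DataFrame({
        "roll_mean": log_ret.rolling(window).mean(),
        "roll_std":  log_ret.rolling(window).std(),
        "roll_skew": log_ret.rolling(window).skew(),
        "roll_kurt": log_ret.rolling(window).kurt(),
    })

    return pd.concat([features, realized_vol.rename("realized_vol")], axis=1).dropna()
```

(Writing it out like this, I notice that if the feature and target windows line up exactly, `roll_std` is basically the target divided by √252, which is part of why I'm suspicious about leakage.)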
My question:
Why is the Random Forest doing so well? Could this be data leakage? Overfitting? Or do tree-based models just tend to perform this way on volatility data?
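In case it helps, this is the kind of check I was thinking of running to test the leakage theory (hypothetical helper; assumes the DataFrame from the sketch above, and lags the features by a full window so they can't overlap the target window):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import TimeSeriesSplit

def walk_forward_rmse(df, feature_shift: int = 0, n_splits: int = 5) -> float:
    """Average walk-forward RMSE of the RF, optionally lagging the features."""
    X = df.drop(columns="realized_vol").shift(feature_shift).dropna()
    y = df["realized_vol"].loc[X.index]

    rmses = []
    for train_idx, test_idx in TimeSeriesSplit(n_splits=n_splits).split(X):
        model = RandomForestRegressor(n_estimators=300, random_state=0)
        model.fit(X.iloc[train_idx], y.iloc[train_idx])
        pred = model.predict(X.iloc[test_idx])
        rmses.append(mean_squared_error(y.iloc[test_idx], pred) ** 0.5)
    return float(np.mean(rmses))

# If the error gets much worse once the features are lagged past the target
# window, the near-perfect fit was probably relying on overlapping information.
# print(walk_forward_rmse(df), walk_forward_rmse(df, feature_shift=21))
```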
Would love any tips or suggestions from more experienced folks 🙏
submitted by /u/ASP_RocksS to r/learnmachinelearning