Oxford Engineering, B1 Sustainable Computing Mini Project (3rd Year), Aaron Rose
The brief: forecast the UK grid's hourly carbon intensity (gCO₂e/kWh) for 2025 from historical data (2009 to 2024), using both a non-ML and an ML method, and discuss the trade-offs. The full write-up is in report/Sustainable_Computing_Report.pdf.
For a forecast timestamp, take all historical points sharing the same (month, week-of-month ±1, day-of-week, hour) and compute a weighted mean, with recent years weighted more heavily to track the UK's decarbonisation trend (445 → 124 gCO₂e/kWh from 2009 to 2024).
So the 2024 weight is 10, 2023 is 5, 2022 is 2.5, down to about 0.01 for 2009. If no exact pattern match exists, fall back to month + hour, then hour only, then the global mean.
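As a rough illustration of the matching and fallback logic, here is a minimal standard-library sketch. It is not the code in non_ml_forecast/. The function names, the 0-based week-of-month definition, the halving weight formula (inferred from the 10, 5, 2.5 sequence) and the unweighted global mean are all assumptions made for the example.

```python
from datetime import datetime, timedelta

def year_weight(year, latest_year=2024, top_weight=10.0):
    """Assumed scheme: the weight halves for each year before latest_year."""
    return top_weight * 0.5 ** (latest_year - year)

def week_of_month(ts):
    """0-based week of the month (days 1-7 -> 0, 8-14 -> 1, ...); an assumed definition."""
    return (ts.day - 1) // 7

def weighted_mean(points):
    """Weighted mean of (timestamp, value) pairs, weighted by year_weight."""
    total_w = sum(year_weight(ts.year) for ts, _ in points)
    return sum(year_weight(ts.year) * v for ts, v in points) / total_w

def forecast_one(history, target):
    """Forecast one timestamp from a non-empty history of (timestamp, value) pairs."""
    wom = week_of_month(target)
    levels = (
        # 1. exact pattern: month, week-of-month within +/-1, day-of-week, hour
        lambda ts: (ts.month == target.month
                    and abs(week_of_month(ts) - wom) <= 1
                    and ts.weekday() == target.weekday()
                    and ts.hour == target.hour),
        # 2. month + hour
        lambda ts: ts.month == target.month and ts.hour == target.hour,
        # 3. hour only
        lambda ts: ts.hour == target.hour,
    )
    for matches in levels:
        points = [(ts, v) for ts, v in history if matches(ts)]
        if points:
            return weighted_mean(points)
    # 4. global mean (unweighted here, which is an assumption)
    return sum(v for _, v in history) / len(history)

# Toy history: one constant value per year, hourly from 2022 to 2024.
yearly = {2022: 300.0, 2023: 200.0, 2024: 100.0}
history = []
ts = datetime(2022, 1, 1)
while ts.year <= 2024:
    history.append((ts, yearly[ts.year]))
    ts += timedelta(hours=1)

print(round(forecast_one(history, datetime(2025, 6, 11, 12)), 2))  # 157.14
```

In the toy data each year contributes the same number of matching points, so the result is (2.5 × 300 + 5 × 200 + 10 × 100) / 17.5 ≈ 157.14, noticeably closer to the 2024 value than the plain average of 200.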
A regression tree ensemble trained on 22 engineered features. The continuous calendar variables are encoded cyclically so that, for example, 23:00 and 00:00 sit next to each other in feature space:
applied to hour (
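A common way to get this wrap-around behaviour is a sine/cosine pair per variable. The sketch below assumes that form; the helper names and the period of 24 for the hour are illustrative and are not taken from ml_forecast/.

```python
import math

def cyclical_encode(value, period):
    """Map a periodic value onto the unit circle as (sin, cos)."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)

def feature_distance(a, b, period):
    """Euclidean distance between two values after cyclical encoding."""
    sin_a, cos_a = cyclical_encode(a, period)
    sin_b, cos_b = cyclical_encode(b, period)
    return math.hypot(sin_a - sin_b, cos_a - cos_b)

# 23:00 and 00:00 are as close as any other pair of adjacent hours,
# while 00:00 and 12:00 sit at opposite points of the circle.
print(feature_distance(23, 0, 24))
print(feature_distance(0, 1, 24))
print(feature_distance(0, 12, 24))
```

With a raw integer hour, 23 and 0 would be 23 units apart; on the circle they are one step apart, the same as 0 and 1.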
Boosted trees regress toward the conditional mean and under-predict variance, so predictions are passed through a variance-calibration step:
This rescales residuals about the mean to match the actual standard deviation, accepting a small MAE penalty for a much more realistic distribution.
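One way such a step could look is sketched below, assuming the target standard deviation comes from historical observations. The function name and the toy numbers are hypothetical, and the project's actual calibration may differ in detail.

```python
from statistics import mean, pstdev

def calibrate_variance(preds, target_std):
    """Stretch predictions about their own mean so their std equals target_std."""
    mu = mean(preds)
    sd = pstdev(preds)
    if sd == 0:
        # A constant forecast has no spread to rescale; return it unchanged.
        return list(preds)
    scale = target_std / sd
    return [mu + (p - mu) * scale for p in preds]

# Toy example: a flat-looking forecast stretched to a wider target spread.
raw = [110.0, 120.0, 130.0]
calibrated = calibrate_variance(raw, target_std=20.0)
print(calibrated)
```

The mean of the forecast is left untouched and only the spread changes, which is why individual errors can grow slightly even though the overall distribution looks more realistic.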
Top: daily-averaged hourly forecast across 2025. Bottom: monthly forecast.
- Carbon intensity is strongly non-stationary, which makes forecasting from history hard. The grid has changed a lot as the UK moved off coal and added cleaner generation, so 2009 data says little about 2025.
- Both methods end up over-predicting because of this. The grid has decarbonised faster than the history suggests it should.
- The XGBoost forecast is too flat. Tree models tend to pull toward the average, so the forecast misses the highs and lows.
- The weighted average sometimes can't find an exact match for a given hour, so it has to fall back to a coarser pattern.
- A lot of the hour-to-hour swings come from things like wind ramping up or a gas plant coming online, and you can't really predict that from history alone. You'd need live weather and grid data.
- Interestingly, the simple weighted average actually beats XGBoost on MAE. Most of the signal is just in month, day-of-week, and hour, which the simple method captures directly.
Carbon-Intensity-Forecasting/
├── data/ # 2009 to 2024 train, 2025 test (half-hourly CSV)
├── non_ml_forecast/ # weighted-average scripts
├── ml_forecast/ # XGBoost + variance calibration
├── plots/ # the two figures shown above
└── report/ # full PDF write-up
pip install numpy pandas matplotlib xgboost scikit-learn
python non_ml_forecast/non_ml_forecast.py
python ml_forecast/xgb_forecast.py
