End-to-end multi-horizon time series forecasting model that predicts Bitcoin hourly closing prices 24 hours ahead, built with TensorFlow from low-level primitives: a custom Multi-Head Attention layer, a custom training loop with tf.GradientTape, and a Seq2Seq LSTM architecture with teacher forcing.
Completed as the final project of Dicoding's Advanced Deep Learning Project Development certification (April 2026).
Cryptocurrency price prediction is a challenging multivariate time series problem characterized by high volatility, non-stationary patterns, and complex dependencies between technical indicators. This project tackles multi-horizon forecasting — predicting 24 consecutive future values — which is harder than single-step prediction because errors compound over time.
The goal: build a Seq2Seq model that outperforms a standard LSTM+Attention baseline on this task, while demonstrating deep understanding of neural network internals by implementing key components from scratch.
- Size: 53,150 hourly records
- Features (6):
  - `Close` — target variable (Bitcoin closing price)
  - `Volume USDT` — trading volume in USDT
  - `RSI` — Relative Strength Index
  - `MACD_Hist` — MACD histogram
  - `ATR` — Average True Range (volatility)
  - `KAMAO` — Kaufman's Adaptive Moving Average
- Target: predict the next 24 hours of `Close` values
Input (window_size, 6)
→ LSTM(128, return_sequences=True)
→ CustomDropout(0.2)
→ CustomMultiHeadAttention(d_model=128, heads=4)
→ CustomLayerNorm
→ LSTM(64)
→ CustomDropout(0.2)
→ CustomDense(64, relu)
→ CustomDense(24) [24-hour forecast]
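The `CustomMultiHeadAttention` layer above implements scaled dot-product attention with parallel heads. A minimal sketch of the core computation, using standard `Dense` layers as stand-ins for the project's `CustomDense` (the shapes and head split are assumptions, not the notebook's exact code):

```python
import tensorflow as tf

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (batch, heads, seq_len, depth)
    scores = tf.matmul(q, k, transpose_b=True)                # (batch, heads, len_q, len_k)
    scores /= tf.sqrt(tf.cast(tf.shape(k)[-1], tf.float32))  # scale by sqrt(d_k)
    weights = tf.nn.softmax(scores, axis=-1)                  # attention weights
    return tf.matmul(weights, v)                              # (batch, heads, len_q, depth)

class MultiHeadAttentionSketch(tf.keras.layers.Layer):
    def __init__(self, d_model=128, num_heads=4):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads = num_heads
        self.depth = d_model // num_heads
        # Q/K/V projections plus the final output projection
        self.wq = tf.keras.layers.Dense(d_model)
        self.wk = tf.keras.layers.Dense(d_model)
        self.wv = tf.keras.layers.Dense(d_model)
        self.wo = tf.keras.layers.Dense(d_model)

    def split_heads(self, x):
        # (batch, seq, d_model) -> (batch, heads, seq, depth)
        b, s = tf.shape(x)[0], tf.shape(x)[1]
        x = tf.reshape(x, (b, s, self.num_heads, self.depth))
        return tf.transpose(x, perm=[0, 2, 1, 3])

    def call(self, query, value):
        # Self-attention when query is value; cross-attention in the decoder
        q = self.split_heads(self.wq(query))
        k = self.split_heads(self.wk(value))
        v = self.split_heads(self.wv(value))
        attn = scaled_dot_product_attention(q, k, v)
        attn = tf.transpose(attn, perm=[0, 2, 1, 3])          # (batch, seq, heads, depth)
        b, s = tf.shape(attn)[0], tf.shape(attn)[1]
        return self.wo(tf.reshape(attn, (b, s, self.num_heads * self.depth)))
```

The Seq2Seq model below reuses the same mechanism as cross-attention between decoder steps and encoder outputs.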
ENCODER:
Input → LSTM(128) → CustomMultiHeadAttention → CustomLayerNorm → CustomDropout
└─ output encoder states + context
DECODER (autoregressive):
For each of 24 timesteps:
LSTMCell(input + prev_output)
→ CustomMultiHeadAttention (cross-attention with encoder)
→ CustomLayerNorm
→ CustomDense(1) [predict next hour]
The model is trained with teacher forcing (the ground-truth value is fed as the next decoder input) and switches to autoregressive inference at test time (the model's own output is fed as the next input).
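A minimal sketch of the two decoding modes (the names `decoder_cell` and `output_proj` and the zero start token are illustrative assumptions; cross-attention is omitted for brevity):

```python
import tensorflow as tf

HORIZON = 24

def decode(decoder_cell, output_proj, state, targets=None, teacher_forcing=True):
    """Run the decoder for HORIZON steps.

    decoder_cell: a tf.keras.layers.LSTMCell; output_proj: a Dense(1) layer;
    state: [h, c] initial states taken from the encoder;
    targets: (batch, HORIZON) ground truth, used only when teacher_forcing=True.
    """
    batch = tf.shape(state[0])[0]
    step_input = tf.zeros([batch, 1])                 # "start token" of zeros
    preds = []
    for t in range(HORIZON):
        out, state = decoder_cell(step_input, state)  # one LSTM step
        y_t = output_proj(out)                        # (batch, 1): next-hour Close
        preds.append(y_t)
        if teacher_forcing and targets is not None:
            step_input = targets[:, t:t + 1]          # feed the ground truth
        else:
            step_input = y_t                          # feed the model's own output
    return tf.concat(preds, axis=1)                   # (batch, HORIZON)
```

Training calls this loop with `teacher_forcing=True`; evaluation reuses the exact same loop with `teacher_forcing=False`, so errors can compound just as they would in deployment.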
To deepen architectural understanding beyond high-level Keras abstractions, the following components were implemented from scratch with TensorFlow's low-level API:
| Component | Purpose |
|---|---|
| `CustomDense` | Linear transformation with manual weight/bias initialization |
| `CustomMultiHeadAttention` | Scaled dot-product attention with multi-head parallelism |
| `CustomDropout` | Stochastic regularization with training/inference modes |
| `CustomLayerNorm` | Per-feature normalization with learnable scale & shift |
| `custom_mae_loss` | Horizon-weighted MAE (later timesteps weighted higher) |
| `CustomEarlyStopping` | Training halt when val loss stops improving |
| `CustomReduceLROnPlateau` | Adaptive learning-rate reduction |
| Custom Training Loop | Manual `tf.GradientTape` forward/backward pass instead of `model.fit()` |
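As a rough illustration of the last two rows, here is a minimal sketch combining a horizon-weighted MAE with a `tf.GradientTape` training step (the linearly increasing weighting scheme is an assumption; the notebook's exact weights may differ):

```python
import tensorflow as tf

HORIZON = 24
# Linearly increasing weights: later, harder timesteps count more.
horizon_weights = tf.linspace(1.0, 2.0, HORIZON)
horizon_weights = horizon_weights / tf.reduce_mean(horizon_weights)

def custom_mae_loss(y_true, y_pred):
    # y_true, y_pred: (batch, HORIZON) in scaled [0, 1] space
    per_step_mae = tf.reduce_mean(tf.abs(y_true - y_pred), axis=0)  # (HORIZON,)
    return tf.reduce_mean(horizon_weights * per_step_mae)

optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)

@tf.function
def train_step(model, x_batch, y_batch):
    with tf.GradientTape() as tape:
        y_pred = model(x_batch, training=True)   # forward pass
        loss = custom_mae_loss(y_batch, y_pred)
    # Backward pass: gradients w.r.t. every trainable variable
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss
```

An outer epoch loop then computes the validation loss after each pass and feeds it to `CustomEarlyStopping` and `CustomReduceLROnPlateau`.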
- Split-before-normalize: train/val/test split performed before MinMaxScaler fitting to prevent data leakage — scaler learns only from training distribution.
- ACF/PACF analysis: autocorrelation and partial autocorrelation plots used to determine optimal window size (input sequence length) empirically rather than arbitrarily.
- Time series decomposition: STL decomposition applied to identify trend and seasonality components in the Close price.
- tf.data pipeline: production-style `tf.data.Dataset` with windowing, batching, and prefetching instead of raw NumPy arrays (see the sketch after this list).
- Feature engineering: rolling statistics and selected technical indicators feeding into the model.
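A minimal sketch of the split-then-scale-then-window flow (the 80/20 split, `WINDOW = 72`, and the batch size are placeholders; the notebook derives the window size from the ACF/PACF analysis above):

```python
import numpy as np
import tensorflow as tf
from sklearn.preprocessing import MinMaxScaler

WINDOW, HORIZON, BATCH = 72, 24, 64   # WINDOW is illustrative (chosen via ACF/PACF)

features = np.random.rand(1_000, 6)   # placeholder for the real (53150, 6) matrix, Close in column 0

# 1) Chronological split BEFORE fitting the scaler: no leakage from val/test.
split = int(0.8 * len(features))
scaler = MinMaxScaler().fit(features[:split])
train_s = scaler.transform(features[:split])
val_s = scaler.transform(features[split:])       # test split omitted for brevity

# 2) Window into (WINDOW-step inputs, HORIZON-step Close targets).
def make_dataset(data, shuffle=True):
    ds = tf.data.Dataset.from_tensor_slices(data.astype(np.float32))
    ds = ds.window(WINDOW + HORIZON, shift=1, drop_remainder=True)
    ds = ds.flat_map(lambda w: w.batch(WINDOW + HORIZON))   # window -> single tensor
    ds = ds.map(lambda w: (w[:WINDOW], w[WINDOW:, 0]))      # target: future Close only
    if shuffle:
        ds = ds.shuffle(1_000)
    return ds.batch(BATCH).prefetch(tf.data.AUTOTUNE)

train_ds = make_dataset(train_s)
val_ds = make_dataset(val_s, shuffle=False)
```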
Both models were evaluated on the held-out test set with MAE in scaled space (MinMax [0,1]):
| Model | Test MAE (scaled) | Relative Improvement |
|---|---|---|
| LSTM + Multi-Head Attention (baseline) | 0.0139 | — |
| Seq2Seq LSTM (proposed) | 0.0052 | 62% lower MAE |
Target: Test MAE < 0.015 ✅ Achieved.
The Seq2Seq model produces forecasts much closer to actual prices across the 24-hour horizon, thanks to the encoder-decoder's ability to maintain context across long sequences and the teacher forcing strategy stabilizing training.
See inference plots in the notebook for visual comparison of predicted vs actual.
.
├── Muhammad_Fariz_Abizar_Submission_Akhir_DLTM.ipynb # Full pipeline notebook
├── model_baseline_LSTM.keras # Trained baseline model
├── model_seq2seq_LSTM.keras # Trained Seq2Seq model
├── best_model_seq2seq_LSTM.keras # Best Seq2Seq (by val loss)
├── requirements.txt # Python dependencies
└── README.md
git clone https://github.com/FaizarM/bitcoin-forecasting.git
cd bitcoin-forecasting
python -m venv venv
source venv/bin/activate   # On Windows: venv\Scripts\activate
pip install -r requirements.txt
jupyter notebook Muhammad_Fariz_Abizar_Submission_Akhir_DLTM.ipynb

The dataset is loaded directly from a public Google Drive link inside the notebook — no manual download required.
- Deep Learning: TensorFlow 2.19, Keras
- Data Manipulation: NumPy, pandas
- Visualization: Matplotlib, Seaborn
- Statistics: statsmodels (ACF/PACF, decomposition)
- Preprocessing: scikit-learn (MinMaxScaler)
- Environment: Python 3.10+, Jupyter Notebook
- Implementing Multi-Head Attention from scratch made the paper "Attention is All You Need" tangible — understanding Q/K/V projections and the scaled dot-product beyond library abstractions.
- Custom training loops with `tf.GradientTape` expose what `model.fit()` does internally, giving fine-grained control over gradients, metrics, and callbacks.
- Teacher forcing significantly stabilizes seq2seq training but requires careful inference-time switching to autoregressive mode.
- Horizon-weighted loss helps the model care more about later (harder) timesteps in multi-horizon forecasting.
This project was completed as the final deliverable for Dicoding — Advanced Deep Learning Project Development (Certificate No. 07Z67082JPQR, valid until April 2029).
Muhammad Fariz Abizar
Data Science undergraduate @ BINUS University Online Learning
Associate Data Scientist (BNSP Certified)
- 🔗 LinkedIn: linkedin.com/in/fariz-abizar
- 💼 GitHub: github.com/FaizarM
- 📧 Email: muhammadfarizabizar@gmail.com
If you find this project useful or learned something from it, consider giving it a ⭐ — it helps and motivates future work!