
Bitcoin Hourly Price Forecasting — Multi-Horizon Seq2Seq LSTM

An end-to-end multi-horizon time series forecasting model that predicts Bitcoin hourly closing prices 24 hours ahead, built from TensorFlow low-level primitives: a custom Multi-Head Attention layer, a custom training loop with tf.GradientTape, and a Seq2Seq LSTM architecture with teacher forcing.

Completed as the final project of Dicoding's Advanced Deep Learning Project Development certification (April 2026).


🎯 Problem Statement

Cryptocurrency price prediction is a challenging multivariate time series problem characterized by high volatility, non-stationary patterns, and complex dependencies between technical indicators. This project tackles multi-horizon forecasting — predicting 24 consecutive future values — which is harder than single-step prediction because errors compound over time.

The goal: build a Seq2Seq model that outperforms a standard LSTM+Attention baseline on this task, while demonstrating deep understanding of neural network internals by implementing key components from scratch.


📊 Dataset

  • Size: 53,150 hourly records
  • Features (6):
    • Close — target variable (Bitcoin closing price)
    • Volume USDT — trading volume in USDT
    • RSI — Relative Strength Index
    • MACD_Hist — MACD histogram
    • ATR — Average True Range (volatility)
    • KAMAO — Kaufman's Adaptive Moving Average
  • Target: predict next 24 hours of Close values

🏗️ Architecture

Baseline: LSTM + Multi-Head Attention

Input (window_size, 6) 
  → LSTM(128, return_sequences=True) 
  → CustomDropout(0.2) 
  → CustomMultiHeadAttention(d_model=128, heads=4) 
  → CustomLayerNorm 
  → LSTM(64) 
  → CustomDropout(0.2) 
  → CustomDense(64, relu) 
  → CustomDense(24)  [24-hour forecast]
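
For orientation, a rough equivalent of this stack using built-in Keras layers (a sketch only — the notebook uses the custom implementations listed below, and the window size here is a placeholder since the real one is derived from ACF/PACF analysis):

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

WINDOW_SIZE = 48   # placeholder; the notebook chooses this via ACF/PACF
N_FEATURES = 6
HORIZON = 24

inputs = keras.Input(shape=(WINDOW_SIZE, N_FEATURES))
x = layers.LSTM(128, return_sequences=True)(inputs)
x = layers.Dropout(0.2)(x)
x = layers.MultiHeadAttention(num_heads=4, key_dim=128 // 4)(x, x)  # self-attention
x = layers.LayerNormalization()(x)
x = layers.LSTM(64)(x)
x = layers.Dropout(0.2)(x)
x = layers.Dense(64, activation="relu")(x)
outputs = layers.Dense(HORIZON)(x)  # 24-hour forecast
baseline = keras.Model(inputs, outputs)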

Proposed: Seq2Seq LSTM (Encoder-Decoder with Teacher Forcing)

ENCODER:
  Input → LSTM(128) → CustomMultiHeadAttention → CustomLayerNorm → CustomDropout
         └─ output encoder states + context

DECODER (autoregressive):
  For each of 24 timesteps:
    LSTMCell(input + prev_output) 
      → CustomMultiHeadAttention (cross-attention with encoder)
      → CustomLayerNorm 
      → CustomDense(1)  [predict next hour]

During training the decoder uses teacher forcing (the ground-truth value becomes the next input); at test time it runs autoregressively (its own prediction becomes the next input).
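
A minimal sketch of that decode loop (names are illustrative and cross-attention is omitted for brevity; assume cell is a tf.keras.layers.LSTMCell(128), project a Dense(1) head, and state the encoder's final [h, c] pair):

import tensorflow as tf

def decode(cell, project, state, y_true=None, horizon=24, batch_size=1):
    # Teacher forcing when y_true is given (training); otherwise the
    # model's own prediction is fed back in (autoregressive inference).
    step_in = tf.zeros([batch_size, 1])      # seed input at t = 0
    preds = []
    for t in range(horizon):
        out, state = cell(step_in, state)    # one decoder timestep
        pred = project(out)                  # CustomDense(1) equivalent
        preds.append(pred)
        step_in = y_true[:, t:t + 1] if y_true is not None else pred
    return tf.concat(preds, axis=1)          # (batch, 24) forecast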


🔧 Custom Components (Built From Scratch)

To deepen architectural understanding beyond high-level Keras abstractions, the following were implemented using TensorFlow low-level API:

| Component                | Purpose                                                              |
|--------------------------|----------------------------------------------------------------------|
| CustomDense              | Linear transformation with manual weight/bias initialization         |
| CustomMultiHeadAttention | Scaled dot-product attention with multi-head parallelism             |
| CustomDropout            | Stochastic regularization with training/inference modes              |
| CustomLayerNorm          | Per-feature normalization with learnable scale & shift               |
| custom_mae_loss          | Horizon-weighted MAE (later timesteps weighted higher)               |
| CustomEarlyStopping      | Training halt when val loss stops improving                          |
| CustomReduceLROnPlateau  | Adaptive learning rate reduction                                     |
| Custom Training Loop     | Manual tf.GradientTape forward/backward pass instead of model.fit()  |
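
To give a flavour of the last two rows, a minimal sketch of a horizon-weighted loss and one manual training step (the linear weighting shown is an assumption; the notebook's exact scheme may differ):

import tensorflow as tf

def custom_mae_loss(y_true, y_pred, horizon=24):
    # Later timesteps count more; here weights grow linearly 1.0 -> 2.0.
    w = tf.linspace(1.0, 2.0, horizon)                  # (24,)
    return tf.reduce_mean(w * tf.abs(y_true - y_pred))  # broadcasts over batch

optimizer = tf.keras.optimizers.Adam(1e-3)

@tf.function
def train_step(model, x, y):
    with tf.GradientTape() as tape:
        y_pred = model(x, training=True)                # forward pass
        loss = custom_mae_loss(y, y_pred)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss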

📈 Methodology Highlights

  • Split-before-normalize: train/val/test split performed before MinMaxScaler fitting to prevent data leakage — the scaler learns only from the training distribution (sketched after this list, together with the tf.data windowing).
  • ACF/PACF analysis: autocorrelation and partial autocorrelation plots used to determine optimal window size (input sequence length) empirically rather than arbitrarily.
  • Time series decomposition: STL decomposition applied to identify trend and seasonality components in the Close price.
  • tf.data pipeline: production-style tf.data.Dataset with windowing, batching, and prefetching instead of raw numpy arrays.
  • Feature engineering: rolling statistics and selected technical indicators feeding into the model.
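
A leakage-safe sketch of the split, scaling, and windowing steps above (the split fractions, window size, and the pre-loaded data array are illustrative assumptions):

import tensorflow as tf
from sklearn.preprocessing import MinMaxScaler

WINDOW, HORIZON = 48, 24   # WINDOW is a placeholder, chosen via ACF/PACF

def make_dataset(values, batch_size=64):
    # values: scaled (n_timesteps, 6) matrix with Close in column 0
    ds = tf.data.Dataset.from_tensor_slices(values)
    ds = ds.window(WINDOW + HORIZON, shift=1, drop_remainder=True)
    ds = ds.flat_map(lambda w: w.batch(WINDOW + HORIZON))
    ds = ds.map(lambda w: (w[:WINDOW], w[WINDOW:, 0]))  # inputs, next-24h Close
    return ds.batch(batch_size).prefetch(tf.data.AUTOTUNE)

n = len(data)   # data: raw (53150, 6) feature matrix, assumed already loaded
train, val, test = data[:int(0.7 * n)], data[int(0.7 * n):int(0.85 * n)], data[int(0.85 * n):]
scaler = MinMaxScaler().fit(train)       # fit on the training slice only
train_ds = make_dataset(scaler.transform(train))
val_ds = make_dataset(scaler.transform(val))
test_ds = make_dataset(scaler.transform(test))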

📉 Results

Both models were evaluated on the held-out test set with MAE in scaled space (MinMax [0,1]):

| Model                                   | Test MAE (scaled) | Relative Improvement |
|-----------------------------------------|-------------------|----------------------|
| LSTM + Multi-Head Attention (baseline)  | 0.0139            | (reference)          |
| Seq2Seq LSTM (proposed)                 | 0.0052            | -62%                 |

Target: Test MAE < 0.015 ✅ Achieved.

The Seq2Seq model produces forecasts much closer to actual prices across the 24-hour horizon, thanks to the encoder-decoder's ability to maintain context across long sequences and the teacher forcing strategy stabilizing training.

See inference plots in the notebook for visual comparison of predicted vs actual.


📁 Repository Structure

.
├── Muhammad_Fariz_Abizar_Submission_Akhir_DLTM.ipynb  # Full pipeline notebook
├── model_baseline_LSTM.keras                           # Trained baseline model
├── model_seq2seq_LSTM.keras                            # Trained Seq2Seq model  
├── best_model_seq2seq_LSTM.keras                       # Best Seq2Seq (by val loss)
├── requirements.txt                                    # Python dependencies
└── README.md

🚀 How to Reproduce

1. Clone the repository

git clone https://github.com/FaizarM/bitcoin-forecasting.git
cd bitcoin-forecasting

2. Create a virtual environment (recommended)

python -m venv venv
source venv/bin/activate   # On Windows: venv\Scripts\activate

3. Install dependencies

pip install -r requirements.txt

4. Run the notebook

jupyter notebook Muhammad_Fariz_Abizar_Submission_Akhir_DLTM.ipynb

The dataset is loaded directly from a public Google Drive link inside the notebook — no manual download required.


🛠️ Tech Stack

  • Deep Learning: TensorFlow 2.19, Keras
  • Data Manipulation: NumPy, pandas
  • Visualization: Matplotlib, Seaborn
  • Statistics: statsmodels (ACF/PACF, decomposition)
  • Preprocessing: scikit-learn (MinMaxScaler)
  • Environment: Python 3.10+, Jupyter Notebook

📚 Key Learnings

  • Implementing Multi-Head Attention from scratch made the paper "Attention Is All You Need" tangible — understanding Q/K/V projections and the scaled dot-product beyond library abstractions (its core is sketched after this list).
  • Custom training loops with tf.GradientTape expose what model.fit() does internally, giving fine-grained control over gradients, metrics, and callbacks.
  • Teacher forcing significantly stabilizes seq2seq training but requires careful inference-time switching to autoregressive mode.
  • Horizon-weighted loss helps the model care more about later (harder) timesteps in multi-horizon forecasting.
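
For reference, the core of that from-scratch attention — a single head's scaled dot-product (the full layer additionally splits the learned Q/K/V projections into 4 heads and concatenates the results):

import tensorflow as tf

def scaled_dot_product_attention(q, k, v):
    d_k = tf.cast(tf.shape(k)[-1], tf.float32)
    scores = tf.matmul(q, k, transpose_b=True) / tf.sqrt(d_k)  # similarity scores
    weights = tf.nn.softmax(scores, axis=-1)                   # attention weights
    return tf.matmul(weights, v)                               # weighted sum of values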

📜 Certification

This project was completed as the final deliverable for Dicoding — Advanced Deep Learning Project Development (Certificate No. 07Z67082JPQR, valid until April 2029).


👤 Author

Muhammad Fariz Abizar
Data Science undergraduate @ BINUS University Online Learning
Associate Data Scientist (BNSP Certified)


If you find this project useful or learned something from it, consider giving it a ⭐ — it helps and motivates future work!
