useraya/timeseries-alignment-tool


Universal Time-Series Alignment Tool


Upload any two time-series datasets and automatically find the lead-lag relationship between them.


What it does

Most analysts spend hours manually testing time shifts between datasets. Does copper price lead industrial output by 6 weeks? Does web traffic predict sales 3 days ahead? This tool automates that entirely.

You load two datasets, and the app finds the lag at which they correlate most strongly, then shifts the charts so you can see the relationship visually.


Live demo

https://timeseries-alignment-tool.streamlit.app/

No installation needed: just open the link and use the app.


Features

  • Multiple input methods — CSV upload, Excel upload, manual row-by-row entry, or paste raw text
  • Auto date detection — scans every column and identifies the date column automatically
  • Smart resampling — aligns two datasets with different frequencies (daily vs weekly, monthly vs daily, etc.)
  • ADF stationarity test — runs Augmented Dickey-Fuller on each series and auto-differences if needed
  • Full CCF analysis — tests every lag from -N to +N and measures Pearson correlation at each step
  • Significance filtering — flags results with p > 0.05 to prevent false positives
  • Interactive charts — raw series, CCF bar chart with significance thresholds, and aligned visualization
  • CSV export — download the full cross-correlation function results
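As an illustration of the auto date detection feature, here is a minimal sketch of one way to scan columns for dates with pandas. The function name `detect_date_column` and the 80% parse threshold are assumptions made for this example, not the app's actual code. The sketch skips numeric columns, so Unix timestamps would need separate handling.

```python
import warnings

import pandas as pd


def detect_date_column(df: pd.DataFrame, min_fraction: float = 0.8):
    """Return the name of the column that looks most like dates, or None.

    Hypothetical helper: the threshold and the scoring rule are assumptions.
    """
    best_name, best_score = None, 0.0
    for name in df.columns:
        col = df[name]
        # A column that is already datetime-typed is an immediate match.
        if pd.api.types.is_datetime64_any_dtype(col):
            return name
        # Skip numeric columns: plain numbers would otherwise be read as
        # epoch offsets and look like valid dates.
        if pd.api.types.is_numeric_dtype(col):
            continue
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")  # silence format-inference warnings
            parsed = pd.to_datetime(col, errors="coerce")
        score = parsed.notna().mean()  # fraction of values that parsed
        if score >= min_fraction and score > best_score:
            best_name, best_score = name, score
    return best_name
```

Scoring by the fraction of parseable values lets a column with a few malformed entries still be recognized as the date column.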

How it works

Load two datasets (CSV, Excel, manual, or paste)
            |
Auto-detect date column
            |
Align on common time period (resample if needed)
            |
ADF test — auto-difference if non-stationary
            |
Compute CCF for all lags from -N to +N
            |
Find lag with highest |r| + p-value check
            |
Display aligned series and CCF chart
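The alignment step in the flow above can be sketched as follows. This is a minimal example that assumes both inputs are pandas Series with a DatetimeIndex and that the caller supplies the target frequency; `align_series` and its `rule` parameter are hypothetical names, and how the app itself chooses the frequency or the aggregation is not stated here.

```python
import pandas as pd


def align_series(a: pd.Series, b: pd.Series, rule: str = "W") -> pd.DataFrame:
    """Resample both series to one frequency and keep the shared period.

    Hypothetical helper: mean aggregation is an assumption of this sketch.
    """
    a_resampled = a.resample(rule).mean()
    b_resampled = b.resample(rule).mean()
    # The inner join keeps only timestamps present in both resampled series.
    aligned = pd.concat(
        [a_resampled, b_resampled], axis=1, join="inner", keys=["a", "b"]
    )
    # Drop periods where either series had no observations.
    return aligned.dropna()
```

Resampling to the coarser of the two frequencies (for example weekly when one input is daily and the other weekly) avoids inventing values for the sparser series.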

The ADF test — stationarity check

The Augmented Dickey-Fuller test checks whether a time series is stable around a fixed mean or whether it drifts over time.

A series that keeps trending upward (like a stock price) is non-stationary. If two series both trend upward for completely unrelated reasons, their raw correlation will be high even though neither drives the other, which is misleading. The ADF test guards against this. Its null hypothesis is that the series is non-stationary (has a unit root). If the p-value is above 0.05, that hypothesis cannot be rejected, and the app automatically replaces the raw values with the differences between consecutive values. Differencing removes the trend, so the correlation reflects period-to-period changes rather than shared drift.

Simple rule: a p-value below 0.05 means the series is treated as stationary and used as-is. A p-value above 0.05 means the series is treated as drifting and is differenced first.

The CCF — finding who leads who

The Cross-Correlation Function shifts one series forward and backward in time and measures how similar the two series look at each shift.

For each lag from -N to +N it computes a Pearson correlation coefficient between -1 and +1:

  • +1 means the two series move together in a perfect linear relationship at that shift
  • 0 means no linear relationship
  • -1 means they move in exactly opposite directions

The app picks the lag where the absolute correlation is highest. A positive lag means Dataset A leads Dataset B. A negative lag means Dataset B leads Dataset A.

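Example: lag = +7 with r = 0.85 means Dataset A moves 7 periods ahead of Dataset B, and the shifted series have a correlation of 0.85 (r² ≈ 0.72, so roughly 72% of the variance in one is linearly associated with the other).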
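A minimal sketch of the lag scan described in this section, using `pearsonr` from SciPy. It assumes the two series share the same index and follows the sign convention stated above (positive lag means Dataset A leads Dataset B). The function names `cross_correlation` and `best_lag` are hypothetical, not taken from the app.

```python
import pandas as pd
from scipy.stats import pearsonr


def cross_correlation(a: pd.Series, b: pd.Series, max_lag: int) -> pd.DataFrame:
    """Pearson r and p-value between a[t] and b[t + lag] for each lag."""
    rows = []
    for lag in range(-max_lag, max_lag + 1):
        # shift(-lag) places b[t + lag] at position t, so a positive lag
        # compares earlier values of A with later values of B.
        shifted_b = b.shift(-lag)
        pair = pd.concat([a, shifted_b], axis=1).dropna()
        if len(pair) < 3:
            continue  # too few overlapping points for a meaningful estimate
        r, p_value = pearsonr(pair.iloc[:, 0], pair.iloc[:, 1])
        rows.append({"lag": lag, "r": r, "p_value": p_value})
    return pd.DataFrame(rows)


def best_lag(ccf: pd.DataFrame) -> pd.Series:
    """Row of the CCF table with the largest absolute correlation."""
    return ccf.loc[ccf["r"].abs().idxmax()]
```

Because each shift discards the non-overlapping ends, correlations at large lags are computed from fewer points and are less reliable than those near zero.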


Installation

# 1. Clone the repo
git clone https://github.com/useraya/timeseries-alignment-tool.git
cd timeseries-alignment-tool

# 2. Create a virtual environment
python -m venv venv
source venv/bin/activate        # macOS / Linux
venv\Scripts\activate           # Windows

# 3. Install dependencies
pip install -r requirements.txt

# 4. Run the app
streamlit run app.py

The app opens at http://localhost:8501


Project structure

timeseries-alignment-tool/
├── app.py                  <- Streamlit entry point (all logic in one file)
├── requirements.txt        <- Python dependencies
├── README.md
└── data/                   <- Sample datasets (optional)

Input formats

| Method | Details |
|---|---|
| CSV upload | Any separator, date column auto-detected |
| Excel upload | .xlsx format, first sheet used |
| Manual entry | Row-by-row date and value inputs in the UI |
| Paste text | One row per line in date,value format |

Date formats supported: YYYY-MM-DD, DD/MM/YYYY, MM-DD-YYYY, Unix timestamps, and most standard formats (auto-parsed by pandas).
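For the paste-text input, the date,value format can be read with pandas roughly as follows. This is a sketch that assumes no header row and a comma separator; `parse_pasted_text` is a hypothetical name, and the app's own parsing may differ.

```python
import io

import pandas as pd


def parse_pasted_text(text: str) -> pd.Series:
    """Turn 'date,value' lines into a value series indexed by date.

    Hypothetical helper: assumes one 'date,value' pair per line, no header.
    """
    frame = pd.read_csv(
        io.StringIO(text.strip()),
        header=None,
        names=["date", "value"],
        skipinitialspace=True,  # tolerate a space after the comma
    )
    frame["date"] = pd.to_datetime(frame["date"])
    # Sort by date so later resampling and shifting behave predictably.
    return frame.set_index("date")["value"].sort_index()
```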


Usage example

Use case: Does Google Trends search volume for "inflation" predict CPI changes 4 weeks later?

  1. Upload google_trends_inflation.csv as Dataset A
  2. Upload cpi_monthly.csv as Dataset B
  3. Set max lag to 12 in the sidebar
  4. In this illustration, the app reports lag = +4 with r = 0.72, p < 0.001
  5. Result: search volume leads CPI by 4 periods (4 weeks when both series are aligned to a weekly frequency)

Requirements

streamlit>=1.32.0
pandas>=2.0.0
scipy>=1.11.0
statsmodels>=0.14.0
plotly>=5.18.0
numpy>=1.26.0
openpyxl>=3.1.0

Python 3.10 or higher recommended.


Deployment

The app is deployed on Streamlit Community Cloud (free tier).

To deploy your own fork:

  1. Push the repo to GitHub (must be public for the free tier)
  2. Go to share.streamlit.io
  3. Connect your GitHub account
  4. Select the repo, branch main, file app.py
  5. Click Deploy — live in about 2 minutes

No server configuration needed. Streamlit Cloud detects every push to main and redeploys automatically.


Limitations

  • Results with p > 0.05 are flagged as non-significant and should be interpreted with caution
  • Series shorter than 20 data points may produce unreliable ADF results
  • Spurious correlations are possible when both series share a common external trend — always interpret results in context
  • Max lag is internally capped at N/4 (25% of series length) to avoid unreliable estimates at extreme lags
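The lag cap in the last point amounts to a one-line calculation. The sketch below assumes integer division for N/4; the function name `effective_max_lag` is hypothetical and the app's exact rounding is not stated here.

```python
def effective_max_lag(requested_lag: int, n_points: int) -> int:
    """Cap the requested max lag at a quarter of the series length."""
    cap = n_points // 4  # assumption: N/4 rounded down
    return max(0, min(requested_lag, cap))
```

For example, a requested max lag of 12 on a 40-point series would be reduced to 10, while the same request on a 200-point series would be left at 12.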

License

MIT License. Free to use, fork, and modify.
