Upload any two time-series datasets and automatically discover the lead-lag relationship between them.
Most analysts spend hours manually testing time shifts between datasets. Does copper price lead industrial output by 6 weeks? Does web traffic predict sales 3 days ahead? This tool automates that entirely.
You load two datasets, the app finds the lag at which they correlate most strongly, and then it shifts the charts so you can see the relationship visually.
https://timeseries-alignment-tool.streamlit.app/
No installation needed: just open the link and start.
- Multiple input methods — CSV upload, Excel upload, manual row-by-row entry, or paste raw text
- Auto date detection — scans every column and identifies the date column automatically
- Smart resampling — aligns two datasets with different frequencies (daily vs weekly, monthly vs daily, etc.)
- ADF stationarity test — runs Augmented Dickey-Fuller on each series and auto-differences if needed
- Full CCF analysis — tests every lag from -N to +N and measures Pearson correlation at each step
- Significance filtering: flags results with p > 0.05 as non-significant to reduce the risk of false positives
- Interactive charts — raw series, CCF bar chart with significance thresholds, and aligned visualization
- CSV export — download the full cross-correlation function results
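The auto date detection step could work roughly as shown below. This is a minimal sketch, not the app's actual code. The helper name `detect_date_column` and the 90% parse threshold `min_share` are assumptions. Numeric columns are skipped, so Unix timestamps would need separate handling in this version.

```python
from typing import Optional

import pandas as pd


def detect_date_column(df: pd.DataFrame, min_share: float = 0.9) -> Optional[str]:
    """Return the first column that looks like dates, or None (hypothetical helper)."""
    for col in df.columns:
        series = df[col]
        # Columns already parsed as datetimes are an immediate match.
        if pd.api.types.is_datetime64_any_dtype(series):
            return col
        # Skip numeric columns: pd.to_datetime would read plain numbers as epoch offsets.
        if pd.api.types.is_numeric_dtype(series):
            continue
        # Parse each value individually; unparseable entries become NaT.
        parsed = pd.to_datetime(series, errors="coerce", format="mixed")
        if parsed.notna().mean() >= min_share:
            return col
    return None


df = pd.DataFrame({"value": [1.0, 2.5, 3.2], "when": ["2024-01-01", "2024-01-08", "2024-01-15"]})
print(detect_date_column(df))  # when
```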
1. Load two datasets (CSV, Excel, manual entry, or pasted text)
2. Auto-detect the date column
3. Align on a common time period (resample if needed)
4. Run the ADF test and auto-difference if non-stationary
5. Compute the CCF for all lags from -N to +N
6. Find the lag with the highest |r| and check its p-value
7. Display the aligned series and the CCF chart
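The alignment step (step 3) can be sketched with pandas resampling. The sketch makes three assumptions:

- Both series already have a `DatetimeIndex`.
- The caller chooses the target frequency `freq`, for example the coarser of the two.
- Values are averaged within each period.

The app's actual aggregation rule is not documented here, and `align_series` is a hypothetical name.

```python
import pandas as pd


def align_series(a: pd.Series, b: pd.Series, freq: str) -> pd.DataFrame:
    """Resample both series to `freq` (period mean) and keep only shared periods."""
    a_res = a.sort_index().resample(freq).mean()
    b_res = b.sort_index().resample(freq).mean()
    # Inner join on the resampled index keeps the common time period only.
    aligned = pd.concat({"A": a_res, "B": b_res}, axis=1, join="inner")
    # Drop periods where either side had no observations.
    return aligned.dropna()


# Daily data for four weeks vs. three weekly observations (weeks ending Sunday).
daily = pd.Series(range(1, 29), index=pd.date_range("2024-01-01", periods=28, freq="D"))
weekly = pd.Series([10.0, 11.0, 12.0], index=pd.date_range("2024-01-07", periods=3, freq="W"))
print(align_series(daily, weekly, "W"))
```

Averaging is only one choice. Summing may suit flow quantities such as sales volume, while taking the last value may suit prices.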
The Augmented Dickey-Fuller test checks whether a time series is stable around a fixed mean or whether it drifts over time.
A series that keeps trending upward (like a stock price) is non-stationary. If two series both trend upward for completely unrelated reasons, their correlation will look high, which is misleading. The ADF test catches this. If a series fails the test (p-value > 0.05), the app automatically computes the differences between consecutive values instead of using the raw values. This removes the trend, so the correlation reflects how the changes move together rather than a shared drift.
Simple rule: a p-value below 0.05 means the test rejects non-stationarity, so the series is treated as stable and used as-is. A p-value above 0.05 means drift cannot be ruled out, so the series is differenced.
The Cross-Correlation Function shifts one series forward and backward in time and measures how similar the two series look at each shift.
For each lag from -N to +N, it computes a Pearson correlation coefficient between -1 and +1:
- +1 means the two series move together perfectly (a perfect positive linear relationship) at that shift
- 0 means no linear relationship
- -1 means they move perfectly in opposite directions
The app picks the lag where the absolute correlation is highest. A positive lag means Dataset A leads Dataset B. A negative lag means Dataset B leads Dataset A.
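Example: lag = +7 with r = 0.85 means movements in Dataset A tend to be followed by similar movements in Dataset B 7 periods later. The correlation is strong, but r is not a percentage: r² ≈ 0.72, so about 72% of the variance in B is linearly explained by A at that shift.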
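Here is a minimal sketch of the CCF scan and best-lag pick, using `scipy.stats.pearsonr`. It follows the sign convention above (positive lag means A leads B) and the N/4 cap listed under the limitations. Three details are illustrative assumptions rather than the app's code: the function names, truncating both inputs to a common length, and the `alpha` cutoff.

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr


def cross_correlation(a, b, max_lag: int) -> pd.DataFrame:
    """Pearson r and p-value for every lag in [-max_lag, +max_lag]."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n = min(len(a), len(b))
    a, b = a[:n], b[:n]
    max_lag = min(max_lag, n // 4)  # cap at N/4 to avoid very short overlaps
    rows = []
    for lag in range(-max_lag, max_lag + 1):
        if lag > 0:
            x, y = a[:-lag], b[lag:]  # A at t vs B at t + lag (A leads)
        elif lag < 0:
            x, y = a[-lag:], b[:lag]  # A at t + |lag| vs B at t (B leads)
        else:
            x, y = a, b
        r, p = pearsonr(x, y)
        rows.append({"lag": lag, "r": r, "p_value": p})
    return pd.DataFrame(rows)


def best_lag(ccf: pd.DataFrame, alpha: float = 0.05) -> dict:
    """Pick the lag with the largest |r| and flag whether it is significant."""
    row = ccf.loc[ccf["r"].abs().idxmax()]
    return {
        "lag": int(row["lag"]),
        "r": float(row["r"]),
        "p_value": float(row["p_value"]),
        "significant": bool(row["p_value"] <= alpha),
    }


rng = np.random.default_rng(1)
a = rng.normal(size=200)
b = np.concatenate([rng.normal(size=5), a[:-5]])  # B repeats A five steps later
print(best_lag(cross_correlation(a, b, max_lag=10)))
```

Because B is built as a copy of A delayed by five steps, the scan should report lag = +5 with r close to 1.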
# 1. Clone the repo
git clone https://github.com/useraya/timeseries-alignment-tool.git
cd timeseries-alignment-tool
# 2. Create a virtual environment
python -m venv venv
source venv/bin/activate # macOS / Linux
venv\Scripts\activate # Windows
# 3. Install dependencies
pip install -r requirements.txt
# 4. Run the app
streamlit run app.py
The app opens at http://localhost:8501
timeseries-alignment-tool/
├── app.py <- Streamlit entry point (all logic in one file)
├── requirements.txt <- Python dependencies
├── README.md
└── data/ <- Sample datasets (optional)
| Method | Details |
|---|---|
| CSV upload | Any separator, date column auto-detected |
| Excel upload | .xlsx format, first sheet used |
| Manual entry | Row-by-row date and value inputs in the UI |
| Paste text | One row per line in date,value format |
Date formats supported: YYYY-MM-DD, DD/MM/YYYY, MM-DD-YYYY, Unix timestamps, and most standard formats (auto-parsed by pandas).
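The "Paste text" method can be approximated by reading the pasted lines as a headerless two-column CSV. This sketch assumes plain `date,value` lines with no header row and no quoting. `parse_pasted_text` is a hypothetical helper, not the app's function.

```python
import io

import pandas as pd


def parse_pasted_text(text: str) -> pd.Series:
    """Turn 'date,value' lines into a date-indexed Series (hypothetical helper)."""
    df = pd.read_csv(
        io.StringIO(text.strip()),
        header=None,
        names=["date", "value"],
        skipinitialspace=True,  # tolerate "2024-01-01, 10"
    )
    # Let pandas infer each date's format; an unparseable date raises an error.
    df["date"] = pd.to_datetime(df["date"], format="mixed")
    return df.set_index("date")["value"].sort_index()


pasted = """2024-01-03,12.5
2024-01-01, 10
2024-01-02,11"""
print(parse_pasted_text(pasted))
```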
Use case: Does Google Trends search volume for "inflation" predict CPI changes 4 weeks later?
- Upload `google_trends_inflation.csv` as Dataset A
- Upload `cpi_monthly.csv` as Dataset B
- Set max lag to 12 in the sidebar
- The app finds lag = +4 with r = 0.72, p < 0.001
- Result: search volume leads CPI by 4 weeks
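To reproduce the aligned view yourself, shift Dataset A forward by the detected lag so that A(t - lag) sits next to B(t). This minimal pandas sketch makes two assumptions: both series are already in one DataFrame with columns `A` and `B` (hypothetical names), and a positive lag means A leads B, as described above.

```python
import pandas as pd


def shift_for_overlay(df: pd.DataFrame, lag: int) -> pd.DataFrame:
    """Add column A_shifted = A moved `lag` periods later, for overlaying on B."""
    out = df.copy()
    # shift(lag) places the value from row t - lag at row t; a negative lag moves A earlier.
    out["A_shifted"] = out["A"].shift(lag)
    return out


df = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0, 5.0], "B": [0.0, 0.0, 1.0, 2.0, 3.0]})
print(shift_for_overlay(df, lag=2))
```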
streamlit>=1.32.0
pandas>=2.0.0
scipy>=1.11.0
statsmodels>=0.14.0
plotly>=5.18.0
numpy>=1.26.0
openpyxl>=3.1.0
Python 3.10 or higher recommended.
The app is deployed on Streamlit Community Cloud (free tier).
To deploy your own fork:
- Push the repo to GitHub (must be public for the free tier)
- Go to share.streamlit.io
- Connect your GitHub account
- Select the repo, branch `main`, and file `app.py`
- Click Deploy; the app is live in about 2 minutes
No server configuration needed. Streamlit Cloud detects every push to main and redeploys automatically.
- Results with p > 0.05 are flagged as non-significant and should be interpreted with caution
- Series shorter than 20 data points may produce unreliable ADF results
- Spurious correlations are possible when both series share a common external trend — always interpret results in context
- Max lag is internally capped at N/4 (25% of series length) to avoid unreliable estimates at extreme lags
MIT License. Free to use, fork, and modify.