BellaKeri/TFINTA

TFINTA - Transport for Ireland Data

"Python library and shell scripts for parsing and displaying Transport for Ireland (TFI/NTA) Rail and DART schedule datasets, both GTFS and realtime"

Since version 1.2 it is available as a PyPI package:

https://pypi.org/project/tfinta/

License

Copyright 2025 BellaKeri BellaKeri@github.com & Daniel Balparda balparda@github.com

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License here.

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Overview

TFINTA (Transport for Ireland Data) is a small, batteries-included toolkit for working with publicly-available Irish public-transport datasets—right from your shell or from pure Python.

| What you get | CLI entry-point | What it does |
| --- | --- | --- |
| Static GTFS schedules for bus, rail, ferry, Luas… | gtfs | Download the national GTFS bundle, cache it, and let you inspect any table (agency, stops, routes, shapes, trips, calendars…). |
| Irish Rail / DART schedules (their separate GTFS feed) | dart | Same idea, but focused on heavy rail only, with extra helpers for station boards and service calendars. |
| Live train movements via the Irish Rail XML feed | realtime | Query the current running trains or a live arrivals/departures board for any station. |
| Python API | import tfinta | Load the cached databases as Pandas DataFrames or iterate over strongly-typed dataclasses. |

The authors and the library/tools are NOT affiliated with TFI or Irish Rail. The project simply republishes data that both agencies already expose for free. Always check the license/terms on the upstream feeds before redistributing.

Why another transport library?

  • One-stop shop – static schedules and live positions under a single import.
  • Zero boilerplate – no need to remember URLs; the code bundles them.
  • Typed, 90%+ test-covered, Apache-2.0 licensed – ideal for research, hobby dashboards or production back-ends.
  • Friendly CLI – perfect for quick shell exploration or cron-driven exports.

Happy hacking & fáilte chuig sonraí iompair na hÉireann!

Use

The TFINTA CLI (gtfs, dart and realtime commands) lets you download, cache, inspect, and pretty-print the official Transport for Ireland Rail and DART schedule datasets from your shell. It also gives you access to the realtime data provided by the rail service.

Install

To use in your project/terminal just do:

poetry add tfinta  # (or pip install tfinta)

(In code, import the module you need, for example from tfinta import dart.)

Quick start

A compact set of commands to get you started quickly from installation to inspecting static schedules and live train positions.

  • Install the package (poetry or pip):
poetry add tfinta  # or: pip install tfinta
  • Download and cache the official GTFS bundle (cached by default for 7 days):
poetry run gtfs read
  • Inspect the downloaded GTFS files and some high-level metadata:
poetry run gtfs print basics   # lists files, agencies, routes and brief stats
  • Work with DART (Irish Rail) schedule data:
poetry run dart print stops                # show all DART stops
poetry run dart print trips -d 20250701    # show DART trips for 2025-07-01
  • Query live train positions / running trains from the realtime feed:
poetry run realtime print running          # currently running trains on the network

Notes and tips:

  • Downloads and parsed GTFS data are cached to avoid repeated network requests; the default cache lifetime is 7 days. See the Command Reference for cache control and refresh flags.
  • All CLI commands are also available when invoked via poetry run <command> if you use Poetry-managed virtualenvs.

Quick Python usage:

import tfinta

# Use the library from Python: the package exposes helpers to load
# cached databases as pandas DataFrames and to iterate typed dataclasses.
# See the Command Reference and package docs for specific API calls.

from tfinta import dart
print('Use the dart, gtfs or realtime modules programmatically; see docs.')

For full command and option details, see the Command Reference below.

Command Reference

API

Pre-Requisite

brew install --cask docker gcloud-cli

Run the Docker app and log in. Then run gcloud init and log in.

Build and Run API Image

docker build -t tfinta-api .  # or: make docker
docker run --rm -p 8080:8080 tfinta-api  # or: make docker-run

Test on http://localhost:8080/docs.

Project tfinta-prod

On https://console.cloud.google.com/ the project is tfinta-prod (#157394351650). On a new development machine you have to run the following once:

gcloud config set project tfinta-prod
gcloud config set run/region europe-west1
gcloud services enable run.googleapis.com artifactregistry.googleapis.com cloudbuild.googleapis.com iamcredentials.googleapis.com sts.googleapis.com

The project was created with (no need to do again):

gcloud artifacts repositories create tfinta --repository-format=docker --location=europe-west1 --description="TFINTA container images"
gcloud builds submit --tag "europe-west1-docker.pkg.dev/tfinta-prod/tfinta/tfinta-api:manual-1"

gcloud run deploy tfinta-api --image "europe-west1-docker.pkg.dev/tfinta-prod/tfinta/tfinta-api:manual-1" --region europe-west1 --platform managed --allow-unauthenticated --port 8080

gcloud run services update tfinta-api --region europe-west1 --concurrency 80 --min-instances 0 --max-instances 2 --cpu 1 --memory 512Mi

URL: https://tfinta-api-157394351650.europe-west1.run.app/docs

Get JSON: https://tfinta-api-157394351650.europe-west1.run.app/openapi.json
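The OpenAPI JSON above can be used to see which endpoints a deployment exposes. The following is a minimal sketch using only the standard library; the function names and the sample spec fragment (including its paths) are illustrative assumptions, not part of the project.

```python
import json
from urllib.request import urlopen

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def fetch_spec(url: str, timeout: float = 10.0) -> dict:
    """Download and decode an OpenAPI document (needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def list_operations(spec: dict) -> list[tuple[str, str]]:
    """Return sorted (METHOD, path) pairs declared under the spec's "paths" key."""
    operations = []
    for path, item in spec.get("paths", {}).items():
        for key in item:
            # Path items may also carry non-method keys such as "parameters".
            if key.lower() in HTTP_METHODS:
                operations.append((key.upper(), path))
    return sorted(operations)


# Hypothetical, hand-written spec fragment so the example runs offline.
SAMPLE_SPEC = {
    "openapi": "3.1.0",
    "paths": {
        "/example": {"get": {"summary": "placeholder"}, "parameters": []},
        "/example/{item_id}": {"get": {}, "post": {}},
    },
}

operations = list_operations(SAMPLE_SPEC)
for method, path in operations:
    print(f"{method:6} {path}")
```

Against the deployed service you would call list_operations(fetch_spec("https://tfinta-api-157394351650.europe-west1.run.app/openapi.json")) instead of using the sample.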

To generate a new manual deploy:

gcloud builds submit --tag "europe-west1-docker.pkg.dev/tfinta-prod/tfinta/tfinta-api:manual-<<VERSION>>"
gcloud run deploy tfinta-api --region europe-west1 --platform managed --allow-unauthenticated --port 8080 --image "europe-west1-docker.pkg.dev/tfinta-prod/tfinta/tfinta-api:manual-<<VERSION>>"

Automated CD Pipeline

Pushing a version tag (e.g. git tag 2.3 && git push --tags) triggers .github/workflows/cd.yaml, which:

  1. Runs a quick lint/type/test gate on Python 3.12.
  2. Builds Docker images for both APIs and pushes them to Artifact Registry tagged with the version and latest.
  3. Deploys tfinta-api and tfinta-apidb to Cloud Run in parallel.
  4. Publishes the Python package to PyPI.

All three deployment/publish jobs are gated behind the production environment (see below) so a human admin must approve before any infrastructure is touched.

Access-control setup (do once, in order)

Step 1 — Protect the environment (GitHub → Settings → Environments)
  1. Create an environment named production.
  2. Under Required reviewers, add the admin user(s) who may approve deployments.
  3. Check Prevent self-review if you don’t want the person who pushed the tag to also approve their own deploy.
  4. Optionally enable Wait timer (e.g. 5 minutes) to allow cancellation.

All three sensitive jobs (deploy-api, deploy-apidb, publish-pypi) declare environment: production, so none of them will run until a reviewer approves.

Step 2 — Restrict tag creation (GitHub → Settings → Rules → Rulesets)

This prevents a non-admin from pushing a version tag that would trigger the pipeline:

  1. Create a new ruleset, type Tag.
  2. Target pattern: [0-9]* (matches all version tags matching the CD trigger).
  3. Rules to enable: Restrict creations, Restrict deletions, Require linear history.
  4. Bypass: Repository admins only.
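To get a feel for what the [0-9]* target pattern covers, here is a rough sketch using Python's fnmatch module as a stand-in. GitHub rulesets use fnmatch-style patterns, but their matcher is not identical to Python's, so treat the result as an approximation; the candidate tag names are made up for illustration.

```python
from fnmatch import fnmatchcase

PATTERN = "[0-9]*"  # target pattern from step 2

# Hypothetical tag names, chosen only to illustrate the pattern.
candidates = ["2.3", "10.0.1", "v2.3", "release-2.3", "2024-hotfix"]

# True means the ruleset would (approximately) apply to that tag.
matches = {tag: fnmatchcase(tag, PATTERN) for tag in candidates}

for tag, covered in matches.items():
    print(f"{tag:12} {'restricted' if covered else 'not covered'}")
```

The pattern covers every tag that starts with a digit, not only well-formed version numbers, which errs on the safe side for a rule whose purpose is to block unapproved pipeline triggers.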
Step 3 — Protect workflow files (.github/CODEOWNERS)

The .github/CODEOWNERS file (already committed) requires @BellaKeri or @balparda to approve any PR that touches .github/workflows/ or CODEOWNERS itself. For this to take effect:

  1. Enable Require a pull request before merging in your default-branch protection rule (Settings → Rules → Rulesets → branch target main).
  2. Enable Require review from Code Owners.

Without this a contributor could merge a workflow change that exfiltrates secrets.

GCP authentication setup (Workload Identity Federation, keyless)

Authentication uses Workload Identity Federation — no long-lived service-account key is ever stored in GitHub. The id-token: write permission is scoped to only the two deploy jobs that need it.

Run once per GCP project:

# Create the WIF pool and OIDC provider
gcloud iam workload-identity-pools create github-pool \
  --project=tfinta-prod --location=global \
  --display-name="GitHub Actions Pool"

# Note: attribute-condition restricts token issuance to this repo only
gcloud iam workload-identity-pools providers create-oidc github-provider \
  --project=tfinta-prod --location=global \
  --workload-identity-pool=github-pool \
  --display-name="GitHub provider" \
  --issuer-uri="https://token.actions.githubusercontent.com" \
  --attribute-mapping="google.subject=assertion.sub,attribute.repository=assertion.repository" \
  --attribute-condition="assertion.repository == 'BellaKeri/TFINTA'"

# Create the CD service account and grant required roles
gcloud iam service-accounts create tfinta-cd \
  --project=tfinta-prod --display-name="TFINTA CD"

for ROLE in roles/run.admin roles/artifactregistry.writer roles/iam.serviceAccountUser; do
  gcloud projects add-iam-policy-binding tfinta-prod \
    --member="serviceAccount:tfinta-cd@tfinta-prod.iam.gserviceaccount.com" \
    --role="${ROLE}"
done

# Allow the GitHub repo to impersonate the SA via WIF
PROJECT_NUMBER=$(gcloud projects describe tfinta-prod --format='value(projectNumber)')
gcloud iam service-accounts add-iam-policy-binding \
  tfinta-cd@tfinta-prod.iam.gserviceaccount.com \
  --project=tfinta-prod \
  --role=roles/iam.workloadIdentityUser \
  --member="principalSet://iam.googleapis.com/projects/${PROJECT_NUMBER}/locations/global/workloadIdentityPools/github-pool/attribute.repository/BellaKeri/TFINTA"

# Print the provider resource name → add as GitHub Secret GCP_WIF_PROVIDER
gcloud iam workload-identity-pools providers describe github-provider \
  --project=tfinta-prod --location=global \
  --workload-identity-pool=github-pool \
  --format="get(name)"

Then create the following in GitHub → Settings → Secrets and variables → Actions:

| Name | Kind | Value |
| --- | --- | --- |
| GCP_WIF_PROVIDER | Secret | Output of the last gcloud command above |
| TFINTA_DB_HOST | Secret | GCE VM external IP |
| TFINTA_DB_PASSWORD | Secret | PostgreSQL password for the tfinta user |
| PYPI_API_TOKEN | Secret | PyPI project token |
| GCP_PROJECT | Variable | tfinta-prod |
| GCP_REGION | Variable | europe-west1 |
| GCP_SA_EMAIL | Variable | tfinta-cd@tfinta-prod.iam.gserviceaccount.com |
| GCP_ARTIFACT_REGISTRY | Variable | europe-west1-docker.pkg.dev/tfinta-prod/tfinta |
| TFINTA_DB_NAME | Variable | tfinta |
| TFINTA_DB_USER | Variable | tfinta |
| TFINTA_DB_MIN_CONN | Variable | 2 (optional, uses code default) |
| TFINTA_DB_MAX_CONN | Variable | 10 (optional, uses code default) |
| TFINTA_LOCK_DURATION_SEC | Variable | 15 (optional, uses code default) |
| TFINTA_STALE_STATIONS_SEC | Variable | 86400 (optional, uses code default) |
| TFINTA_STALE_RUNNING_SEC | Variable | 90 (optional, uses code default) |
| TFINTA_STALE_STATION_BOARD_SEC | Variable | 90 (optional, uses code default) |
| TFINTA_STALE_TRAIN_MOVEMENTS_SEC | Variable | 90 (optional, uses code default) |

DB API (PostgreSQL-backed)

The DB API (apidb.py) exposes the exact same REST endpoints as the realtime API (api.py), but reads data from a PostgreSQL database instead of the upstream Irish Rail XML feed. This allows for:

  • Faster, cached responses
  • Historical data queries
  • Independence from upstream API availability
  • Custom data ingestion pipelines

Architecture

┌────────────┐     ┌──────────────┐     ┌──────────────┐
│  Client    │────▶│  Cloud Run   │────▶│  Compute     │
│  (browser/ │     │  (apidb.py)  │     │  Engine      │
│   mobile)  │     │  Port 8081   │     │  e2-micro    │
└────────────┘     └──────────────┘     │  PostgreSQL  │
                                        │  Port 5432   │
                                        └──────────────┘

Local Development (Docker Compose)

Start a local PostgreSQL instance:

# Start Postgres (background)
docker compose up -d

# Check it's healthy
docker compose ps

# Stop
docker compose down

# Stop and remove data volume
docker compose down -v

The local database is accessible at localhost:5432 with user tfinta / password tfinta.

SQL Schema and Migrations

The database schema lives in db/migrations/ as numbered SQL files. Each migration is idempotent (safe to re-run).
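As an illustration of what "idempotent" means here, the sketch below applies a numbered migration twice and shows that the second run changes nothing. It uses an in-memory SQLite database as a stand-in for PostgreSQL so it runs anywhere; the table columns and the layout of schema_version are assumptions, since the real SQL files are not reproduced in this README.

```python
import sqlite3

# Hypothetical migration set; the real files live in db/migrations/.
MIGRATIONS = {
    1: "CREATE TABLE IF NOT EXISTS stations (code TEXT PRIMARY KEY, name TEXT NOT NULL)",
}


def migrate(conn: sqlite3.Connection) -> list[int]:
    """Apply pending migrations in order and return the versions applied now."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_version (version INTEGER PRIMARY KEY)"
    )
    applied = {row[0] for row in conn.execute("SELECT version FROM schema_version")}
    newly_applied = []
    for version in sorted(MIGRATIONS):
        if version in applied:
            continue  # already recorded, so re-running is a no-op
        conn.execute(MIGRATIONS[version])
        conn.execute("INSERT INTO schema_version (version) VALUES (?)", (version,))
        newly_applied.append(version)
    conn.commit()
    return newly_applied


conn = sqlite3.connect(":memory:")
first_run = migrate(conn)   # applies migration 1
second_run = migrate(conn)  # nothing left to apply
print(first_run, second_run)
```

Two safeguards combine in this sketch: the statements themselves tolerate re-execution (IF NOT EXISTS), and the version table records what has already been applied.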

Tables:

| Table | Description |
| --- | --- |
| stations | All Irish Rail stations (code, name, location, alias) |
| running_trains | Currently running trains (code, status, position) |
| station_board_queries | Metadata for station board fetches |
| station_board_lines | Individual lines on a station departure board |
| train_movement_queries | Metadata for train movement fetches |
| train_stops | Individual stops in a train's journey |
| schema_version | Migration tracking |

Bootstrap a new database (run once as superuser):

psql -U postgres -f db/migrations/000_create_database.sql

Apply all migrations (idempotent):

# Local (defaults to localhost/tfinta/tfinta)
./db/migrate.sh

# Remote
PGHOST=<vm-ip> PGPASSWORD=<password> ./db/migrate.sh

Apply a single migration manually:

psql -U tfinta -d tfinta -f db/migrations/001_initial_schema.sql

Running the DB API Locally

# 1. Start Postgres
docker compose up -d

# 2. Run migrations
./db/migrate.sh

# 3. Start the DB API server
poetry run uvicorn tfinta.apidb:app --reload --port 8081

# 4. Open docs
open http://localhost:8081/docs

Or via Docker:

docker build -f Dockerfile.apidb -t tfinta-apidb .
docker run --rm -p 8081:8081 \
  -e TFINTA_DB_HOST=host.docker.internal \
  tfinta-apidb

Environment variables (see .env.example):

| Variable | Default | Description |
| --- | --- | --- |
| TFINTA_DB_HOST | localhost | PostgreSQL host |
| TFINTA_DB_PORT | 5432 | PostgreSQL port |
| TFINTA_DB_NAME | tfinta | Database name |
| TFINTA_DB_USER | tfinta | Database user |
| TFINTA_DB_PASSWORD | tfinta | Database password |
| TFINTA_DB_MIN_CONN | 2 | Minimum pool connections |
| TFINTA_DB_MAX_CONN | 10 | Maximum pool connections |
| TFINTA_LOCK_DURATION_SEC | 15 | Max seconds a cache refresh lock is held (see db.py) |
| TFINTA_STALE_STATIONS_SEC | 86400 | Cache TTL for station list (seconds) |
| TFINTA_STALE_RUNNING_SEC | 90 | Cache TTL for running trains (seconds) |
| TFINTA_STALE_STATION_BOARD_SEC | 90 | Cache TTL for station board (seconds) |
| TFINTA_STALE_TRAIN_MOVEMENTS_SEC | 90 | Cache TTL for train movements (seconds) |
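If you write your own tooling around the same database, the connection variables above can be read with their documented defaults as in the sketch below. The DBSettings class and settings_from_env function are hypothetical helpers for illustration; the project's own configuration code may be organised differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping
from urllib.parse import quote


@dataclass(frozen=True)
class DBSettings:
    """Hypothetical settings holder mirroring the connection variables."""
    host: str
    port: int
    name: str
    user: str
    password: str
    min_conn: int
    max_conn: int

    @property
    def dsn(self) -> str:
        """libpq-style connection URI; user and password are percent-encoded."""
        user = quote(self.user, safe="")
        password = quote(self.password, safe="")
        return f"postgresql://{user}:{password}@{self.host}:{self.port}/{self.name}"


def settings_from_env(env: Mapping[str, str] = os.environ) -> DBSettings:
    """Read TFINTA_DB_* values, falling back to the defaults listed above."""
    return DBSettings(
        host=env.get("TFINTA_DB_HOST", "localhost"),
        port=int(env.get("TFINTA_DB_PORT", "5432")),
        name=env.get("TFINTA_DB_NAME", "tfinta"),
        user=env.get("TFINTA_DB_USER", "tfinta"),
        password=env.get("TFINTA_DB_PASSWORD", "tfinta"),
        min_conn=int(env.get("TFINTA_DB_MIN_CONN", "2")),
        max_conn=int(env.get("TFINTA_DB_MAX_CONN", "10")),
    )


defaults = settings_from_env({})  # empty mapping, so every default applies
print(defaults.dsn)
```

Passing the environment as a mapping keeps the function easy to test: settings_from_env({"TFINTA_DB_HOST": "10.0.0.5"}) overrides only the host.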

Deploy PostgreSQL to GCE e2-micro (Free Tier)

The deploy/gce/deploy_gce_postgres.sh script provisions a Compute Engine e2-micro VM (1 GB RAM, 2 shared vCPUs, 30 GB standard PD) in the GCP free tier.

# Pre-requisites
brew install --cask gcloud-cli
gcloud auth login
gcloud config set project tfinta-prod

# Create the VM (installs PostgreSQL 17, applies tuning)
./deploy/gce/deploy_gce_postgres.sh

# Get the VM's external IP
gcloud compute instances describe tfinta-pg \
  --zone=europe-west1-b \
  --format='get(networkInterfaces[0].accessConfigs[0].natIP)'

# SSH into the VM and change the default password!
gcloud compute ssh tfinta-pg --zone=europe-west1-b
sudo -u postgres psql -c "ALTER ROLE tfinta WITH PASSWORD 'YOUR_SECURE_PASSWORD';"
exit

# Run migrations from your local machine
PGHOST=<VM_IP> PGPASSWORD=<password> ./db/migrate.sh

# Teardown (deletes VM + firewall rule)
./deploy/gce/deploy_gce_postgres.sh teardown

Security note: The default setup opens port 5432 to all IPs (0.0.0.0/0). For production, restrict --source-ranges in the firewall rule to your Cloud Run egress IPs or use a VPC connector.

Deploy DB API to Cloud Run

# Build and push the DB API image
gcloud builds submit \
  --tag "europe-west1-docker.pkg.dev/tfinta-prod/tfinta/tfinta-apidb:manual-1" \
  -f Dockerfile.apidb .

# Deploy with DB connection env vars
gcloud run deploy tfinta-apidb \
  --image "europe-west1-docker.pkg.dev/tfinta-prod/tfinta/tfinta-apidb:manual-1" \
  --region europe-west1 \
  --platform managed \
  --allow-unauthenticated \
  --port 8081 \
  --set-env-vars "TFINTA_DB_HOST=<VM_EXTERNAL_IP>,TFINTA_DB_USER=tfinta,TFINTA_DB_PASSWORD=<password>,TFINTA_DB_NAME=tfinta"

# Optionally override cache TTL / lock duration (all have sensible defaults):
# --set-env-vars "...,TFINTA_STALE_STATIONS_SEC=86400,TFINTA_STALE_RUNNING_SEC=90,TFINTA_STALE_STATION_BOARD_SEC=90,TFINTA_STALE_TRAIN_MOVEMENTS_SEC=90,TFINTA_LOCK_DURATION_SEC=15"

# Tune resources
gcloud run services update tfinta-apidb \
  --region europe-west1 \
  --concurrency 80 \
  --min-instances 0 \
  --max-instances 2 \
  --cpu 1 \
  --memory 512Mi

PostgreSQL Tuning (e2-micro)

The configuration in db/postgresql-tfinta.conf is optimized for a 1 GB RAM VM:

| Parameter | Value | Rationale |
| --- | --- | --- |
| max_connections | 30 | Limit RAM usage per-connection |
| shared_buffers | 128 MB | ~12% of RAM |
| work_mem | 4 MB | Per-sort/hash memory |
| maintenance_work_mem | 64 MB | VACUUM, CREATE INDEX |
| effective_cache_size | 512 MB | Planner hint for OS cache |
| autovacuum | on | Keep tables healthy |
| autovacuum_max_workers | 2 | Limit CPU contention |
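The arithmetic behind these values can be checked quickly. The sketch below is a rough budget, assuming 1 GB means 1024 MB and that each connection runs at most one sort or hash at a time. Because work_mem applies per operation, a complex query can use several multiples of it, so the real worst case may be higher.

```python
# Values taken from the tuning table; units are megabytes.
total_ram_mb = 1024
shared_buffers_mb = 128
work_mem_mb = 4
max_connections = 30

# Share of RAM reserved for PostgreSQL's own buffer cache.
shared_fraction = shared_buffers_mb / total_ram_mb

# Simplified worst case: every connection runs one work_mem-sized operation.
worst_case_work_mem_mb = work_mem_mb * max_connections

# What is left for the OS page cache, the kernel and everything else.
remaining_mb = total_ram_mb - shared_buffers_mb - worst_case_work_mem_mb

print(f"shared_buffers: {shared_fraction:.1%} of RAM")
print(f"work_mem worst case: {worst_case_work_mem_mb} MB")
print(f"remaining: {remaining_mb} MB")
```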

Data Sources

Stations

GPT Search

Official dataset Rail&DART

  1. Get All Stations - usage returns a list of all stations with StationDesc, StationCode, StationId, StationAlias, StationLatitude and StationLongitude ordered by Latitude, Longitude. Example:
<objStation>
    <StationDesc>Howth Junction</StationDesc>
    <StationAlias>Donaghmede ( Howth Junction )</StationAlias>
    <StationLatitude>53.3909</StationLatitude>
    <StationLongitude>-6.15672</StationLongitude>
    <StationCode>HWTHJ</StationCode>
    <StationId>105</StationId>
</objStation>
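A record like the one above can be turned into a typed object with the standard library. This is a minimal sketch: the Station dataclass and parse_station function are hypothetical names, not the library's own types. It also assumes un-namespaced element names as shown in the example; if the live feed wraps them in an XML namespace, the tag lookups would need adjusting.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Station:
    """Hypothetical container for one <objStation> record."""
    code: str
    name: str
    alias: Optional[str]
    latitude: float
    longitude: float
    station_id: int


def parse_station(element: ET.Element) -> Station:
    """Build a Station from one <objStation> element."""
    def text(tag: str) -> str:
        return (element.findtext(tag) or "").strip()

    return Station(
        code=text("StationCode"),
        name=text("StationDesc"),
        alias=text("StationAlias") or None,  # empty alias becomes None
        latitude=float(text("StationLatitude")),
        longitude=float(text("StationLongitude")),
        station_id=int(text("StationId")),
    )


SAMPLE = """
<objStation>
    <StationDesc>Howth Junction</StationDesc>
    <StationAlias>Donaghmede ( Howth Junction )</StationAlias>
    <StationLatitude>53.3909</StationLatitude>
    <StationLongitude>-6.15672</StationLongitude>
    <StationCode>HWTHJ</StationCode>
    <StationId>105</StationId>
</objStation>
"""

station = parse_station(ET.fromstring(SAMPLE.strip()))
print(station)
```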

Trains

Official running Trains

  1. Get All Running Trains - usage returns a listing of 'running trains', i.e. trains that are between origin and destination or are due to start within 10 minutes of the query time. Returns TrainStatus, TrainLatitude, TrainLongitude, TrainCode, TrainDate, PublicMessage and Direction.
     a. TrainStatus = N for not yet running or R for running.
     b. TrainCode is Irish Rail's unique code for an individual train service on a date.
     c. Direction is either Northbound or Southbound for trains between Dundalk and Rosslare and between Sligo and Dublin. For all other trains the direction is to the destination, e.g. To Limerick.
     d. PublicMessage is the latest information on the train; it uses \n for a line break, e.g. AA509\n11:00 - Waterford to Dublin Heuston (0 mins late)\nDeparted Waterford next stop Thomastown.

<objTrainPositions>
    <TrainStatus>N</TrainStatus>
    <TrainLatitude>51.9018</TrainLatitude>
    <TrainLongitude>-8.4582</TrainLongitude>
    <TrainCode>D501</TrainCode>
    <TrainDate>01 Jun 2025</TrainDate>
    <PublicMessage>D501\nCork to Cobh\nExpected Departure 08:00</PublicMessage>
    <Direction>To Cobh</Direction>
</objTrainPositions>
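The fields described above need a little decoding: TrainStatus is a one-letter flag, TrainDate is a day-month-year string, and PublicMessage carries line breaks as \n. The sketch below shows one way to handle them, assuming the \n in the feed is the literal two-character sequence (real newlines are handled too) and using a hypothetical parse_train helper.

```python
import xml.etree.ElementTree as ET
from datetime import date, datetime

# Raw string so the backslash-n pairs stay literal, as in the example above.
SAMPLE = r"""<objTrainPositions>
    <TrainStatus>N</TrainStatus>
    <TrainLatitude>51.9018</TrainLatitude>
    <TrainLongitude>-8.4582</TrainLongitude>
    <TrainCode>D501</TrainCode>
    <TrainDate>01 Jun 2025</TrainDate>
    <PublicMessage>D501\nCork to Cobh\nExpected Departure 08:00</PublicMessage>
    <Direction>To Cobh</Direction>
</objTrainPositions>"""


def parse_train(element: ET.Element) -> dict:
    """Decode one <objTrainPositions> element into plain Python values."""
    def text(tag: str) -> str:
        return (element.findtext(tag) or "").strip()

    # Turn literal "\n" sequences into real newlines, then split.
    message = text("PublicMessage").replace("\\n", "\n")
    return {
        "code": text("TrainCode"),
        "running": text("TrainStatus") == "R",  # "N" means not yet running
        "day": datetime.strptime(text("TrainDate"), "%d %b %Y").date(),
        "position": (float(text("TrainLatitude")), float(text("TrainLongitude"))),
        "direction": text("Direction"),
        "message_lines": [line.strip() for line in message.splitlines() if line.strip()],
    }


train = parse_train(ET.fromstring(SAMPLE))
print(train["code"], train["day"].isoformat(), train["message_lines"])
```

The %b directive depends on the process locale, so month names parse as shown only under an English or default C locale.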

GTFS Schedule Files

The Official GTFS Schedules site publishes a small CSV (about 19 KB), currently here, that lists the locations of all GTFS files. We load this CSV and search it for the Iarnród Éireann / Irish Rail entry.

GTFS is defined here. It has 6 mandatory tables (files) and a number of optional ones. We will start by making a cached loader for this data into memory dicts that will be pickled to disk.
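The cached-loader idea can be sketched as follows: build the in-memory dicts once, pickle them to disk, and reuse the pickle until it is older than a chosen lifetime. The load_cached function, the file name and the sample table contents are all assumptions for illustration, and the 7-day default simply mirrors the cache lifetime mentioned in the Quick start.

```python
import pickle
import tempfile
import time
from pathlib import Path
from typing import Any, Callable

SEVEN_DAYS_SEC = 7 * 24 * 60 * 60


def load_cached(
    cache_file: Path,
    build: Callable[[], dict[str, Any]],
    max_age_sec: float = SEVEN_DAYS_SEC,
) -> dict[str, Any]:
    """Return the pickled data if it is fresh enough, else rebuild and store it."""
    if cache_file.exists():
        age_sec = time.time() - cache_file.stat().st_mtime
        if age_sec < max_age_sec:
            with cache_file.open("rb") as handle:
                return pickle.load(handle)
    data = build()
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    with cache_file.open("wb") as handle:
        pickle.dump(data, handle, protocol=pickle.HIGHEST_PROTOCOL)
    return data


build_calls = []  # records each time the (expensive) builder runs


def build_tables() -> dict[str, Any]:
    """Stand-in for downloading and parsing the GTFS files."""
    build_calls.append(time.time())
    return {"agency": [{"agency_name": "Example Operator"}]}


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "cache" / "gtfs.pickle"
    first = load_cached(cache, build_tables)                   # builds and writes
    second = load_cached(cache, build_tables)                  # served from disk
    third = load_cached(cache, build_tables, max_age_sec=-1)   # forced refresh

print(len(build_calls))
```

Pickle files should only be loaded when you wrote them yourself, since unpickling untrusted data can execute arbitrary code.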

Appendix: Development Instructions

Setup

If you want to develop for this project, first install Python 3.11/3.12/3.13 and Poetry. To get the versions you need, we suggest doing it like this (Linux):

sudo apt-get update
sudo apt-get upgrade
sudo apt-get install git python3 python3-pip pipx python3-dev python3-venv build-essential software-properties-common

sudo add-apt-repository ppa:deadsnakes/ppa  # install arbitrary python version
sudo apt-get update
sudo apt-get install python3.11 python3.13

sudo apt-get remove python3-poetry
python3.13 -m pipx ensurepath
# re-open terminal
pipx install poetry
poetry --version  # should be >=2.1

poetry config virtualenvs.in-project true  # creates .venv inside project directory
poetry config pypi-token.pypi <TOKEN>      # add your personal PyPI project token, if any

or this (Mac):

brew update
brew upgrade
brew cleanup -s

brew install git python@3.11 python@3.13  # install arbitrary python version

brew uninstall poetry
python3.13 -m pip install --user pipx
python3.13 -m pipx ensurepath
# re-open terminal
pipx install poetry
poetry --version  # should be >=2.1

poetry config virtualenvs.in-project true  # creates .venv inside project directory
poetry config pypi-token.pypi <TOKEN>      # add your personal PyPI project token, if any

Now install the project:

git clone https://github.com/BellaKeri/TFINTA.git TFINTA
cd TFINTA

poetry env use python3.11  # creates the venv, 3.11 for development!
poetry sync                # sync env to project's poetry.lock file
poetry env info            # no-op: just to check

poetry run pytest -vvv
# or any command as:
poetry run <any-command>

To activate like a regular environment do:

poetry env activate
# will print activation command which you next execute, or you can do:
source .venv/bin/activate                        # if .venv is local to the project
source "$(poetry env info --path)/bin/activate"  # for other paths

pytest  # or other commands

deactivate

Updating Dependencies

To update the poetry.lock file to more current versions, run poetry update; it ignores the current lock, updates, and rewrites the poetry.lock file. If you have cache problems, poetry cache clear PyPI --all will clean it.

To add a new dependency you should do:

poetry add "pkg>=1.2.3"  # regenerates lock, updates env (adds dep to prod code)
poetry add -G dev "pkg>=1.2.3"  # adds dep to dev code ("group" dev)
# also remember: "pkg@^1.2.3" = latest 1.* ; "pkg@~1.2.3" = latest 1.2.* ; "pkg@1.2.3" exact
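The caret and tilde shorthands in the comment above translate into version ranges with an exclusive upper bound. The sketch below computes that bound for full three-part versions only; the helper names are hypothetical and this is not Poetry's own resolver, which handles many more forms.

```python
def parse(version: str) -> tuple[int, int, int]:
    """Split a MAJOR.MINOR.PATCH string into integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def caret_upper(version: str) -> tuple[int, int, int]:
    """Exclusive upper bound for ^version: bump the left-most non-zero part."""
    major, minor, patch = parse(version)
    if major > 0:
        return (major + 1, 0, 0)
    if minor > 0:
        return (0, minor + 1, 0)
    return (0, 0, patch + 1)


def tilde_upper(version: str) -> tuple[int, int, int]:
    """Exclusive upper bound for ~version when all three parts are given."""
    major, minor, _ = parse(version)
    return (major, minor + 1, 0)


def allowed(candidate: str, lower: str, upper: tuple[int, int, int]) -> bool:
    """True if lower <= candidate < upper."""
    return parse(lower) <= parse(candidate) < upper


print(caret_upper("1.2.3"), tilde_upper("1.2.3"))
print(allowed("1.9.0", "1.2.3", caret_upper("1.2.3")))
print(allowed("1.3.0", "1.2.3", tilde_upper("1.2.3")))
```

So "pkg@^1.2.3" accepts 1.9.0 but not 2.0.0, while "pkg@~1.2.3" accepts 1.2.9 but not 1.3.0.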

If you manually added a dependency to pyproject.toml you should very carefully recreate the environment and files:

rm -rf .venv .poetry poetry.lock
poetry env use python3.13
poetry install

Remember to check your diffs before submitting (especially poetry.lock) to avoid surprises!

When dependencies change, always regenerate requirements.txt by running:

poetry export --format requirements.txt --without-hashes --output requirements.txt

Creating a New Version

# bump the version!
poetry version minor  # updates 1.6 to 1.7, for example
# or:
poetry version patch  # updates 1.6 to 1.6.1
# or:
poetry version <version-number>
# (also updates `pyproject.toml` and `poetry.lock`)

# publish to GIT, including a TAG
git commit -a -m "release version 1.7"
git tag 1.7
git push
git push --tags

# prepare package for PyPI
poetry build
poetry publish

You can find the 10 slowest tests by running:

poetry run pytest -vvv -q --durations=10

You can search for flaky tests by running all tests 100 times:

poetry run pytest --flake-finder --flake-runs=100

TODO

  • Versioning of GTFS data
