Implement support for UK postal codes and the ITL (International Territorial Level) #7

@bk86a

Overview

Implement support for UK postal codes, mapping them to ITL (International Territorial Level) regions. The UK left the EU but ITL is the direct successor to NUTS for UK statistical regions (same boundaries, UK prefix changed to TL), and UK institutions remain significant participants in Erasmus+ and other EU programmes.

The /lookup response for UK would use the same nuts1/2/3 fields but populated with ITL codes (e.g., nuts3 = TLI32 for Westminster, within nuts2 = TLI3, Inner London - West).


Data Source

NSPL (National Statistics Postcode Lookup) from the ONS Open Geography Portal.

  • ~1.79 million live postcodes, each mapped to ITL3 (ITL1 and ITL2 derivable by truncation)
  • Free download, Open Government Licence v3.0, no authentication
  • Updated quarterly (February, May, August, November)
  • ZIP download (~178 MB compressed), contains CSV with ~40 columns
  • Relevant columns: pcds (formatted postcode, e.g., SW1A 2AA), itl (ITL3 code, e.g., TLI32), doterm (termination date -- blank for live postcodes)
  • NSPL does not include ITL region names -- those come from separate ONS "Names and Codes" CSV files (232 rows total across 3 levels)

Why NSPL over ONSPD: NSPL uses best-fit allocation via Census Output Areas (ONS-recommended for statistical purposes). ONSPD uses point-in-polygon from grid references. Both contain the same ITL field and postcode set.

Download location: https://geoportal.statistics.gov.uk -- search "National Statistics Postcode Lookup"; also mirrored on data.gov.uk


ITL Code Structure

| Level | Format | Regions | Example | Name |
|-------|--------|---------|---------|------|
| ITL1 | TLx (3 chars) | 12 | TLI | London |
| ITL2 | TLxN (4 chars) | 41 | TLI3 | Inner London - West |
| ITL3 | TLxNN (5 chars) | 179 | TLI32 | Westminster |

All 12 ITL1 regions: TLC (North East), TLD (North West), TLE (Yorkshire and Humber), TLF (East Midlands), TLG (West Midlands), TLH (East of England), TLI (London), TLJ (South East), TLK (South West), TLL (Wales), TLM (Scotland), TLN (Northern Ireland).
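Because each level is a prefix of the next, ITL1 and ITL2 can be derived from ITL3 by truncation. A minimal sketch, assuming the source column really holds a five-character TL code (the function name `itl_levels` is hypothetical, not part of the existing codebase):

```python
def itl_levels(itl3: str) -> dict:
    """Split a 5-character ITL3 code into the three response fields.

    Hypothetical helper: rejects anything that is not a TLxNN code so that
    pseudo codes or blanks in the source data are not silently truncated.
    """
    code = itl3.strip().upper()
    if len(code) != 5 or not code.startswith("TL"):
        raise ValueError(f"not an ITL3 code: {itl3!r}")
    return {"nuts1": code[:3], "nuts2": code[:4], "nuts3": code}


print(itl_levels("TLI32"))
```

For `TLI32` this yields `TLI`, `TLI3` and `TLI32` for nuts1, nuts2 and nuts3 respectively.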


UK Postcode Format

Six valid outward code patterns, each followed by a space and a 3-character inward code (9AA):

| Pattern | Example |
|---------|---------|
| A9 9AA | M1 1AA |
| A99 9AA | B33 8TH |
| A9A 9AA | W1A 1HQ |
| AA9 9AA | CR2 6XH |
| AA99 9AA | DN55 1PT |
| AA9A 9AA | EC1A 1BB |

Proposed regex (space optional to handle common real-world input):

^([A-Z]{1,2}[0-9][0-9A-Z]?\s?[0-9][A-Z]{2})$

Normalization strips the space, so SW1A 2AA and SW1A2AA both become SW1A2AA for lookup.
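To illustrate how the proposed regex and the normalization step interact, here is a small sketch. `normalize_uk_postcode` is a hypothetical stand-in for the project's own normalize_postal_code(), whose exact behaviour is not shown in this issue:

```python
import re
from typing import Optional

# The regex proposed above, compiled once.
UK_POSTCODE_RE = re.compile(r"^([A-Z]{1,2}[0-9][0-9A-Z]?\s?[0-9][A-Z]{2})$")


def normalize_uk_postcode(raw: str) -> Optional[str]:
    """Uppercase and trim the input, validate it, then drop the optional space.

    Returns None when the input does not match the pattern.
    """
    candidate = raw.strip().upper()
    if not UK_POSTCODE_RE.match(candidate):
        return None
    return re.sub(r"\s", "", candidate)


for value in ["sw1a 2aa", "SW1A2AA", "M1 1AA", "12345"]:
    print(value, "->", normalize_uk_postcode(value))
```

Note that the pattern only checks the shape of the postcode; it does not confirm that the postcode exists in NSPL.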


Implementation Tasks

Task 1: Add UK postcode regex to postal_patterns.json

"UK": {
    "regex": "^([A-Z]{1,2}[0-9][0-9A-Z]?\\s?[0-9][A-Z]{2})$",
    "example": "SW1A 2AA, EC1A 1BB, M1 1AA, B33 8TH"
}

No tercet_map needed -- normalize_postal_code() already strips spaces and uppercases.

Task 2: Add country alias GB to UK

UK postcodes may be submitted with ISO 3166-1 GB or the commonly used UK. Add an alias in lookup() (similar to GR to EL for Greece):

if cc == "GB":
    cc = "UK"

Task 3: Extend _parse_csv_content() column detection

Add NSPL column names to the existing alias lists in data_loader.py:

  • Postal code aliases: add "PCDS" (NSPL's formatted postcode column)
  • NUTS3/ITL aliases: add "ITL", "ITL3", "ITL3CD" to the NUTS3 candidate list

This allows the existing CSV parser to handle NSPL format without a separate parser.
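The alias-based detection could look roughly like the sketch below. The actual alias lists and helper names in data_loader.py are not shown in this issue, so `POSTAL_ALIASES`, `NUTS3_ALIASES` and `find_column` are assumptions for illustration only:

```python
# Hypothetical alias lists; only the NSPL entries (PCDS, ITL, ITL3, ITL3CD)
# come from this issue, the other entries are placeholders.
POSTAL_ALIASES = ["POSTCODE", "PC", "PCDS"]
NUTS3_ALIASES = ["NUTS3", "NUTS", "ITL", "ITL3", "ITL3CD"]


def find_column(header, aliases):
    """Return the index of the first header cell matching any alias, or None."""
    normalized = [name.strip().upper() for name in header]
    for alias in aliases:
        if alias in normalized:
            return normalized.index(alias)
    return None


header = ["pcd", "pcds", "doterm", "itl"]
print(find_column(header, POSTAL_ALIASES), find_column(header, NUTS3_ALIASES))
```

Matching on exact, case-insensitive names keeps `pcd` from being picked up in place of `pcds`.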

Task 4: Add NSPL download mechanism

The NSPL ZIP URL changes with each quarterly release (unlike TERCET's predictable URL pattern). Options:

Option A -- Dedicated configuration setting:
Add nspl_url to settings.json / config.py. The operator sets the URL to the current NSPL ZIP. This is simple but requires manual URL updates each quarter.

Option B -- Use extra_sources mechanism:
Configure the NSPL URL via PC2NUTS_EXTRA_SOURCES. This works today with the column detection fix from Task 3, but extra_sources uses overwrite mode (last-write-wins) which may not be desired for a primary data source.

Option C -- ONS Geoportal API discovery:
Query the ArcGIS Hub API to find the latest NSPL download URL automatically. More complex but eliminates manual URL maintenance.

Recommended: Start with Option A for simplicity. The URL only changes quarterly and can be updated via environment variable.
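A rough sketch of Option A follows. The environment variable name `PC2NUTS_NSPL_URL` is an assumption modelled on PC2NUTS_EXTRA_SOURCES, and picking the largest CSV in the archive is a heuristic, not a documented property of the NSPL ZIP layout:

```python
import io
import os
import zipfile


def nspl_url_from_env(default=None):
    """Read the NSPL ZIP URL from a (hypothetical) environment variable."""
    return os.environ.get("PC2NUTS_NSPL_URL", default)


def largest_csv_member(zip_bytes: bytes) -> str:
    """Return the name of the largest .csv file inside a ZIP archive.

    Assumption: the main postcode file is by far the largest CSV, while any
    bundled lookup or documentation CSVs are much smaller.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        csv_infos = [
            info for info in archive.infolist()
            if info.filename.lower().endswith(".csv")
        ]
        if not csv_infos:
            raise ValueError("no CSV file found in archive")
        return max(csv_infos, key=lambda info: info.file_size).filename
```

The download itself is left out here; the existing TERCET download code path could presumably be reused once the URL is known.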

Task 5: Filter to live postcodes only

The NSPL includes ~900K terminated (historic) postcodes alongside ~1.79M live ones. During CSV parsing, keep only rows where doterm (date of termination) is blank: detect the DOTERM column and skip any row where it has a value.
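The live-postcode filter can be sketched with the standard csv module. This assumes the pcds, itl and doterm columns named earlier in this issue; `iter_live_postcodes` is a hypothetical name and the sample rows are illustrative, not taken from a real NSPL release:

```python
import csv
import io


def iter_live_postcodes(csv_text):
    """Yield (normalized postcode, ITL code) for rows with a blank doterm."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for raw_row in reader:
        # Lowercase the column names and trim values so matching is tolerant.
        row = {k.strip().lower(): (v or "").strip()
               for k, v in raw_row.items() if k}
        if row.get("doterm"):
            continue  # terminated postcode: skip
        yield row["pcds"].replace(" ", ""), row["itl"]


sample = "pcds,doterm,itl\nSW1A 2AA,,TLI32\nAB1 0AA,199606,TLM50\n"
print(list(iter_live_postcodes(sample)))
```

Only the first sample row survives, because the second has a termination date.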

Task 6: Load ITL region names

NSPL doesn't include ITL region names. Download them from the ONS Geoportal "International Territorial Levels Names and Codes" CSV files (3 files, one per level, ~232 rows total). Store in _nuts_names alongside GISCO NUTS names.

Columns: ITLxyzCD (code), ITLxyzNM (name) -- exact column names vary by release.
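Since the exact column names vary by release, the loader can match on the ITL...CD / ITL...NM shape instead of hard-coding them. A minimal sketch; `load_itl_names` and the header used in the example are assumptions:

```python
import csv
import io


def load_itl_names(csv_text: str) -> dict:
    """Map ITL code to region name from one "Names and Codes" CSV.

    The code and name columns are detected by prefix/suffix, so the same
    function should work for each of the three level files.
    """
    reader = csv.DictReader(io.StringIO(csv_text.lstrip("\ufeff")))
    fields = reader.fieldnames or []

    def pick(suffix):
        for field in fields:
            upper = field.strip().upper()
            if upper.startswith("ITL") and upper.endswith(suffix):
                return field
        return None

    code_col, name_col = pick("CD"), pick("NM")
    if code_col is None or name_col is None:
        raise ValueError(f"ITL code/name columns not found in {fields}")
    return {
        row[code_col].strip(): row[name_col].strip()
        for row in reader
        if row.get(code_col) and row.get(name_col)
    }


print(load_itl_names("ITL121CD,ITL121NM\nTLC,North East\nTLI,London\n"))
```

The resulting dict could then be merged into _nuts_names next to the GISCO entries.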

Task 7: Add UK to settings.json countries list

"countries": [
    "AT", "BE", "BG", "CY", "CZ", "DE", "DK", "EE", "EL", "ES",
    "FI", "FR", "HR", "HU", "IE", "IT", "LT", "LU", "LV", "MT",
    "NL", "PL", "PT", "RO", "SE", "SI", "SK",
    "CH", "IS", "LI", "NO",
    "MK", "RS", "TR",
    "UK"
]

Task 8: Document UK/ITL behaviour

Update the API description and field docstrings to note that for country=UK (or GB), the nuts1/2/3 fields contain ITL codes (which are the UK's successor to NUTS, with matching geographic boundaries).


Memory Considerations

Adding ~1.79M UK postcodes to the in-memory _lookup dict roughly triples the current dataset (~900K TERCET entries, ~2.7M in total). Estimated additional memory: ~150-200 MB.

Mitigation options if memory is a concern:

  • Outward code aggregation: Map ~3,100 postcode districts to ITL3 via majority vote. Only ~3K entries, but some districts straddle ITL3 boundaries (reduced precision).
  • SQLite-backed lookup for UK: Keep UK postcodes in SQLite and query on demand instead of loading into memory. Slower per-query but minimal RAM overhead.
  • Postcode sector aggregation: ~12,500 sectors (outward code + first inward digit). Better precision than districts, much smaller than full postcodes.
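The SQLite option could be sketched as follows, using only the standard sqlite3 module. The table name, column names and helper functions are hypothetical:

```python
import sqlite3


def build_uk_db(rows, path=":memory:"):
    """Create (or open) a SQLite database and load (postcode, itl3) pairs."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS uk_postcodes ("
        "postcode TEXT PRIMARY KEY, itl3 TEXT NOT NULL) WITHOUT ROWID"
    )
    conn.executemany("INSERT OR REPLACE INTO uk_postcodes VALUES (?, ?)", rows)
    conn.commit()
    return conn


def lookup_uk(conn, postcode):
    """Return the ITL3 code for a normalized postcode, or None if unknown."""
    row = conn.execute(
        "SELECT itl3 FROM uk_postcodes WHERE postcode = ?", (postcode,)
    ).fetchone()
    return row[0] if row else None


conn = build_uk_db([("SW1A2AA", "TLI32")])
print(lookup_uk(conn, "SW1A2AA"))
```

With the postcode as primary key, each lookup is a single index probe, so the per-query cost should stay small even at full NSPL size.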

Recommendation: Start with full postcode loading. If memory becomes a deployment constraint, switch to sector-level aggregation as a compromise.
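If sector-level aggregation is needed later, the majority vote could be sketched like this. It assumes postcodes are already normalized (no space), so the sector is everything except the last two letters; `aggregate_by_sector` and the sample pairs are illustrative:

```python
from collections import Counter, defaultdict


def aggregate_by_sector(pairs):
    """Map each postcode sector to the ITL3 code most of its postcodes use.

    pairs: iterable of (normalized postcode, ITL3 code).
    Sector = outward code + first inward digit, e.g. SW1A2AA -> SW1A2.
    """
    votes = defaultdict(Counter)
    for postcode, itl3 in pairs:
        votes[postcode[:-2]][itl3] += 1
    return {
        sector: counts.most_common(1)[0][0]
        for sector, counts in votes.items()
    }


pairs = [
    ("SW1A2AA", "TLI32"),
    ("SW1A2AB", "TLI32"),
    ("SW1A2BC", "TLI31"),
    ("M11AA", "TLD33"),
]
table = aggregate_by_sector(pairs)
print(table.get("SW1A2AA"[:-2]))
```

Postcodes in a sector that straddles an ITL3 boundary will all resolve to the majority region, which is the precision trade-off mentioned above.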


API Compatibility

  • Accept both country=UK and country=GB (alias GB to UK, similar to GR to EL)
  • Response uses existing nuts1/2/3 field names -- no breaking API change
  • The nuts_version in /health remains the GISCO version; UK data versioning (NSPL quarter) could be surfaced via a separate field or the existing last_updated timestamp

Files Affected

| File | Change |
|------|--------|
| app/postal_patterns.json | Add UK regex entry |
| app/settings.json | Add UK to countries, add NSPL URL config |
| app/config.py | Add nspl_url setting (if Option A) |
| app/data_loader.py | Column aliases, GB to UK alias, NSPL download, doterm filter, ITL names |
| app/main.py | GB to UK alias in /lookup endpoint |
| app/models.py | Update field descriptions to mention ITL for UK |

Labels: enhancement (New feature or request)
