Overview
Implement support for UK postal codes, mapping them to ITL (International Territorial Level) regions. Although the UK has left the EU, ITL is the direct successor to NUTS for UK statistical regions (same boundaries, with the UK prefix changed to TL), and UK institutions remain significant participants in Erasmus+ and other EU programmes.
The /lookup response for UK would use the same nuts1/2/3 fields but populated with ITL codes (e.g., TLI32 for Westminster).
Data Source
NSPL (National Statistics Postcode Lookup) from the ONS Open Geography Portal.
- ~1.79 million live postcodes, each mapped to ITL3 (ITL1 and ITL2 derivable by truncation)
- Free download, Open Government Licence v3.0, no authentication
- Updated quarterly (February, May, August, November)
- ZIP download (~178 MB compressed), contains CSV with ~40 columns
- Relevant columns: pcds (formatted postcode, e.g., SW1A 2AA), itl (ITL3 code, e.g., TLI32), doterm (termination date -- blank for live postcodes)
- NSPL does not include ITL region names -- those come from separate ONS "Names and Codes" CSV files (232 rows total across 3 levels)
Why NSPL over ONSPD: NSPL uses best-fit allocation via Census Output Areas (ONS-recommended for statistical purposes). ONSPD uses point-in-polygon from grid references. Both contain the same ITL field and postcode set.
Download location: https://geoportal.statistics.gov.uk -- search "National Statistics Postcode Lookup"; also mirrored on data.gov.uk
ITL Code Structure
| Level | Format | Regions | Example | Name |
|---|---|---|---|---|
| ITL1 | TLx (3 chars) | 12 | TLI | London |
| ITL2 | TLxN (4 chars) | 41 | TLI3 | Inner London - West |
| ITL3 | TLxNN (5 chars) | 179 | TLI32 | Westminster |
All 12 ITL1 regions: TLC (North East), TLD (North West), TLE (Yorkshire and Humber), TLF (East Midlands), TLG (West Midlands), TLH (East of England), TLI (London), TLJ (South East), TLK (South West), TLL (Wales), TLM (Scotland), TLN (Northern Ireland).
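The truncation rule mentioned above (ITL1 and ITL2 derived from ITL3) can be sketched as follows. This is a minimal illustration, not code from the project: the function name itl_levels is hypothetical, and it assumes every input is a 5-character TL-prefixed ITL3 code.

```python
from typing import Tuple


def itl_levels(itl3: str) -> Tuple[str, str, str]:
    """Return (ITL1, ITL2, ITL3) derived from an ITL3 code by truncation.

    Hypothetical helper: assumes a 5-character code starting with "TL".
    """
    code = itl3.strip().upper()
    if len(code) != 5 or not code.startswith("TL"):
        raise ValueError(f"not an ITL3 code: {itl3!r}")
    # ITL1 is the first 3 characters, ITL2 the first 4.
    return code[:3], code[:4], code


print(itl_levels("TLI32"))  # ('TLI', 'TLI3', 'TLI32')
```

Rows whose value is not a TL-prefixed code would raise here; whether to skip or log them instead is a design choice left open.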
UK Postcode Format
Six valid outward code patterns, always followed by a space and 3-character inward code (9AA):
| Pattern | Example |
|---|---|
| A9 9AA | M1 1AA |
| A99 9AA | B33 8TH |
| A9A 9AA | W1A 1HQ |
| AA9 9AA | CR2 6XH |
| AA99 9AA | DN55 1PT |
| AA9A 9AA | EC1A 1BB |
Proposed regex (space optional to handle common real-world input):
^([A-Z]{1,2}[0-9][0-9A-Z]?\s?[0-9][A-Z]{2})$
Normalization strips the space, so SW1A 2AA and SW1A2AA both become SW1A2AA for lookup.
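To make the validation and normalization steps concrete, here is a small sketch using the proposed regex. The helper normalize_uk_postcode is a hypothetical stand-in for the project's own normalize_postal_code(), whose exact behaviour is not shown in this issue; uppercasing before matching is an assumption.

```python
import re
from typing import Optional

# The proposed pattern from above; the space between outward and inward code is optional.
UK_POSTCODE_RE = re.compile(r"^([A-Z]{1,2}[0-9][0-9A-Z]?\s?[0-9][A-Z]{2})$")


def normalize_uk_postcode(raw: str) -> Optional[str]:
    """Return the postcode uppercased with whitespace removed, or None if invalid.

    Hypothetical helper for illustration only.
    """
    candidate = raw.strip().upper()
    if not UK_POSTCODE_RE.match(candidate):
        return None
    # Drop the optional separator so both spellings share one lookup key.
    return re.sub(r"\s", "", candidate)


for value in ("SW1A 2AA", "sw1a2aa", "M1 1AA", "12345"):
    print(value, "->", normalize_uk_postcode(value))
```

Both SW1A 2AA and sw1a2aa yield SW1A2AA, while 12345 is rejected and yields None.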
Implementation Tasks
Task 1: Add UK postcode regex to postal_patterns.json
```json
"UK": {
  "regex": "^([A-Z]{1,2}[0-9][0-9A-Z]?\\s?[0-9][A-Z]{2})$",
  "example": "SW1A 2AA, EC1A 1BB, M1 1AA, B33 8TH"
}
```
No tercet_map needed -- normalize_postal_code() already strips spaces and uppercases.
Task 2: Add country alias GB to UK
UK postcodes may be submitted with ISO 3166-1 GB or the commonly used UK. Add an alias in lookup() (similar to GR to EL for Greece):
```python
if cc == "GB":
    cc = "UK"
```
Task 3: Extend _parse_csv_content() column detection
Add NSPL column names to the existing alias lists in data_loader.py:
- Postal code aliases: add "PCDS" (NSPL's formatted postcode column)
- NUTS3/ITL aliases: add "ITL", "ITL3", "ITL3CD" to the NUTS3 candidate list
This allows the existing CSV parser to handle NSPL format without a separate parser.
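A rough sketch of alias-based column detection is shown below. The real alias lists and the internals of _parse_csv_content() are not part of this issue, so the existing entries here ("POSTCODE", "CODE", "NUTS3", "NUTS_3") are placeholders and find_column is a hypothetical helper; only the NSPL additions come from the text above.

```python
from typing import Optional, Sequence

# Placeholder alias lists; the first entries are assumed, the rest are the proposed NSPL additions.
POSTAL_CODE_ALIASES = ("POSTCODE", "CODE", "PCDS")
NUTS3_ALIASES = ("NUTS3", "NUTS_3", "ITL", "ITL3", "ITL3CD")


def find_column(header: Sequence[str], aliases: Sequence[str]) -> Optional[int]:
    """Return the index of the first header cell matching any alias (case-insensitive)."""
    normalized = [name.strip().strip('"').upper() for name in header]
    for alias in aliases:
        if alias in normalized:
            return normalized.index(alias)
    return None


# Shortened, illustrative header; a real NSPL file has about 40 columns.
header = ["pcds", "doterm", "itl"]
print(find_column(header, POSTAL_CODE_ALIASES))  # 2 columns before "itl": prints 0
print(find_column(header, NUTS3_ALIASES))        # prints 2
```

Exact matching (rather than substring matching) keeps "ITL" from accidentally matching unrelated columns that merely contain those letters.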
Task 4: Add NSPL download mechanism
The NSPL ZIP URL changes with each quarterly release (unlike TERCET's predictable URL pattern). Options:
Option A -- Dedicated configuration setting:
Add nspl_url to settings.json / config.py. The operator sets the URL to the current NSPL ZIP. This is simple but requires manual URL updates each quarter.
Option B -- Use extra_sources mechanism:
Configure the NSPL URL via PC2NUTS_EXTRA_SOURCES. This works today with the column detection fix from Task 3, but extra_sources uses overwrite mode (last-write-wins) which may not be desired for a primary data source.
Option C -- ONS Geoportal API discovery:
Query the ArcGIS Hub API to find the latest NSPL download URL automatically. More complex but eliminates manual URL maintenance.
Recommended: Start with Option A for simplicity. The URL only changes quarterly and can be updated via environment variable.
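Option A could look roughly like the sketch below. The environment variable name PC2NUTS_NSPL_URL is an assumption modelled on the existing PC2NUTS_EXTRA_SOURCES, and get_nspl_url is a hypothetical helper; the actual structure of config.py is not shown in this issue.

```python
import os
from typing import Optional

# Assumed variable name, chosen to mirror PC2NUTS_EXTRA_SOURCES.
NSPL_URL_ENV = "PC2NUTS_NSPL_URL"


def get_nspl_url(settings: dict) -> Optional[str]:
    """Return the NSPL ZIP URL, preferring the environment over settings.json.

    Returns None when neither source is set, so UK loading can be skipped.
    """
    for candidate in (os.environ.get(NSPL_URL_ENV), settings.get("nspl_url")):
        if candidate and candidate.strip():
            return candidate.strip()
    return None


# Placeholder URL; the real one changes with each quarterly release.
print(get_nspl_url({"nspl_url": "https://example.invalid/nspl.zip"}))
```

Returning None instead of raising lets a deployment without UK data start normally.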
Task 5: Filter to live postcodes only
The NSPL includes ~900K terminated (historic) postcodes alongside the ~1.79M live ones. During CSV parsing, skip every row whose doterm (date of termination) is not blank, so that only live postcodes are loaded. This requires detecting the DOTERM column.
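The live-postcode filter might be sketched like this. The function iter_live_postcodes is hypothetical, the column names pcds, doterm, and itl follow the description above, and the sample rows are made up for illustration (the second row uses dummy values).

```python
import csv
import io
from typing import Iterator, TextIO, Tuple


def iter_live_postcodes(csv_file: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (normalized postcode, ITL3 code) for rows with a blank doterm."""
    reader = csv.DictReader(csv_file)
    # Lowercase the header so PCDS/pcds and DOTERM/doterm are treated alike.
    reader.fieldnames = [name.strip().lower() for name in (reader.fieldnames or [])]
    for row in reader:
        if (row.get("doterm") or "").strip():
            continue  # terminated postcode: skip
        postcode = (row.get("pcds") or "").replace(" ", "").upper()
        itl3 = (row.get("itl") or "").strip()
        if postcode and itl3:
            yield postcode, itl3


sample = io.StringIO(
    "pcds,doterm,itl\n"
    "SW1A 2AA,,TLI32\n"
    "ZZ9 9ZZ,199606,TLZ99\n"  # dummy terminated row
)
print(list(iter_live_postcodes(sample)))  # [('SW1A2AA', 'TLI32')]
```

Streaming rows through a generator avoids holding the terminated postcodes in memory at any point.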
Task 6: Load ITL region names
NSPL doesn't include ITL region names. Download them from the ONS Geoportal "International Territorial Levels Names and Codes" CSV files (3 files, one per level, ~232 rows total). Store in _nuts_names alongside GISCO NUTS names.
Columns: ITLxyzCD (code), ITLxyzNM (name) -- exact column names vary by release.
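Because the column names vary by release, one option is to detect them by prefix and suffix rather than by exact name. The sketch below assumes the code column starts with ITL and ends with CD, and the name column starts with ITL and ends with NM; load_itl_names is a hypothetical helper and the header in the sample is illustrative.

```python
import csv
import io
from typing import Dict, Optional, Sequence, TextIO


def _find(fields: Sequence[str], suffix: str) -> Optional[str]:
    """Return the first field shaped like ITL...<suffix>, ignoring case and a BOM."""
    for field in fields:
        key = field.lstrip("\ufeff").strip().upper()
        if key.startswith("ITL") and key.endswith(suffix):
            return field
    return None


def load_itl_names(csv_file: TextIO) -> Dict[str, str]:
    """Build a code -> name mapping from one ITL "Names and Codes" CSV."""
    reader = csv.DictReader(csv_file)
    fields = reader.fieldnames or []
    code_col, name_col = _find(fields, "CD"), _find(fields, "NM")
    if code_col is None or name_col is None:
        raise ValueError(f"no ITL code/name columns in header: {fields}")
    return {
        row[code_col].strip(): row[name_col].strip()
        for row in reader
        if row.get(code_col)
    }


sample = io.StringIO("ITL121CD,ITL121NM\nTLI,London\nTLL,Wales\n")
print(load_itl_names(sample))  # {'TLI': 'London', 'TLL': 'Wales'}
```

Calling this once per level file and merging the three dicts into _nuts_names would cover all ~232 rows.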
Task 7: Add UK to settings.json countries list
```json
"countries": [
  "AT", "BE", "BG", "CY", "CZ", "DE", "DK", "EE", "EL", "ES",
  "FI", "FR", "HR", "HU", "IE", "IT", "LT", "LU", "LV", "MT",
  "NL", "PL", "PT", "RO", "SE", "SI", "SK",
  "CH", "IS", "LI", "NO",
  "MK", "RS", "TR",
  "UK"
]
```
Task 8: Document UK/ITL behaviour
Update the API description and field docstrings to note that for country=UK (or GB), the nuts1/2/3 fields contain ITL codes (which are the UK's successor to NUTS, with matching geographic boundaries).
Memory Considerations
Adding ~1.79M UK postcodes to the in-memory _lookup dict roughly triples the current dataset (from ~900K TERCET entries to ~2.7M). Estimated additional memory: ~150-200 MB.
Mitigation options if memory is a concern:
- Outward code aggregation: Map ~3,100 postcode districts to ITL3 via majority vote. Only ~3K entries, but some districts straddle ITL3 boundaries (reduced precision).
- SQLite-backed lookup for UK: Keep UK postcodes in SQLite and query on demand instead of loading into memory. Slower per-query but minimal RAM overhead.
- Postcode sector aggregation: ~12,500 sectors (outward code + first inward digit). Better precision than districts, much smaller than full postcodes.
Recommendation: Start with full postcode loading. If memory becomes a deployment constraint, switch to sector-level aggregation as a compromise.
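The sector-level fallback could be sketched as below. Applying the majority vote to sectors (rather than districts, where the text above mentions it) is an assumption, aggregate_by_sector is a hypothetical helper, and the sample pairs are made up. Input postcodes are assumed to be normalized (uppercase, no space), so the sector is everything except the last two letters.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def aggregate_by_sector(pairs: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map each postcode sector to its most common ITL3 code (majority vote)."""
    votes: Dict[str, Counter] = defaultdict(Counter)
    for postcode, itl3 in pairs:
        sector = postcode[:-2]  # outward code + first inward digit, e.g. SW1A2
        votes[sector][itl3] += 1
    return {sector: counts.most_common(1)[0][0] for sector, counts in votes.items()}


# Made-up sample: two postcodes vote TLI32, one votes TLI31.
sample = [("SW1A2AA", "TLI32"), ("SW1A2AB", "TLI32"), ("SW1A2AD", "TLI31")]
print(aggregate_by_sector(sample))  # {'SW1A2': 'TLI32'}
```

At query time the same slice (normalized postcode minus its last two characters) would give the lookup key, at the cost of misassigning postcodes in sectors that straddle an ITL3 boundary.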
API Compatibility
- Accept both country=UK and country=GB (alias GB to UK, similar to GR to EL)
- Response uses existing nuts1/2/3 field names -- no breaking API change
- The nuts_version in /health remains the GISCO version; UK data versioning (NSPL quarter) could be surfaced via a separate field or the existing last_updated timestamp
Files Affected
| File | Change |
|---|---|
| app/postal_patterns.json | Add UK regex entry |
| app/settings.json | Add UK to countries, add NSPL URL config |
| app/config.py | Add nspl_url setting (if Option A) |
| app/data_loader.py | Column aliases, GB to UK alias, NSPL download, doterm filter, ITL names |
| app/main.py | GB to UK alias in /lookup endpoint |
| app/models.py | Update field descriptions to mention ITL for UK |