8 changes: 8 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,14 @@

Notable changes to `cnmaps-data` are recorded here.

## 1.1.1

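- Switched foreign country-level boundaries to the World Bank official boundary source, and added notes and illustrations on how the boundaries around China are handled.
- Fixed incorrect handling of disputed-area boundaries, for example the organization and database-storage rules for regions such as Kashmir.
- Added an `iso3` field to the administrative index, so country-level records can be queried by `ISO3` or by a composite code.
- Unified the `iso3` rules for China-related records: Hong Kong is `HKG`, Macao is `MAC`, and Taiwan remains `CHN`.
- Updated the globe illustration at the top of the README and optimized its display size and asset size.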

## 1.1.0

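- Added the `cn-neighbors` dataset, which provides country-level boundaries of neighboring countries compiled according to China's boundary definition.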
83 changes: 70 additions & 13 deletions README.md
@@ -1,5 +1,5 @@
<p align="center">
<img src="docs/assets/world-tech-globe-v4.png" alt="cnmaps-data globe" width="160" />
<img src="docs/assets/world-tech-globe-v4.png" alt="cnmaps-data globe" width="240" />
</p>

# cnmaps-data
@@ -16,13 +16,13 @@

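`cnmaps-data` currently ships three kinds of built-in datasets:

- Administrative boundary data
- Administrative boundary data
- Index database: `cnmaps_data/data/index/administrative.db`
- Data root directory: `cnmaps_data/data/datasets/administrative/`
- Currently includes:
- `amap`: province / city / county administrative boundaries of China sourced from **高德** (Amap) (package directory and index field `source = 高德`; for provenance see the "[数据来源](#数据来源)" (Data sources) section)
- `cn-neighbors`: country-level boundaries of neighboring countries, derived from China's official boundary definition and world boundary data
- `world-countries`: country-level boundaries for the rest of the world, excluding China and `cn-neighbors`
- `cn-neighbors`: country-level boundaries of neighboring countries, derived from China's official boundary definition and World Bank boundary data (separate directory, but `source = 世界银行` in SQLite)
- `world-countries`: country-level boundaries for the rest of the world, excluding China and `cn-neighbors` (separate directory, but `source = 世界银行` in SQLite)
- Geographic boundary data
- Data root directory: `cnmaps_data/data/datasets/geography/`
- Sample data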
@@ -33,6 +33,7 @@
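- It provides only country-level boundaries and does not go down to the province or state level of neighboring countries.
- Its geometry is clipped/derived from the China boundary in `cnmaps-data` combined with external world boundary source data.
- It is a derived dataset with an explicitly stated boundary definition and should not be confused with internationally used neutral boundary data.
- In SQLite it is uniformly tagged `source = 世界银行`, just like `world-countries`; the two are distinguished mainly by the directory prefix of `path`.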

About `world-countries`:

@@ -43,16 +44,68 @@
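- Before being written out, every geometry has the area covered by the current `cnmaps-data` China boundary subtracted, to avoid overlapping the China-defined boundary.
- The Chinese name mapping table is only a maintenance aid; the final names are still written directly into the SQLite and GeoJSON outputs.
- Besides sovereign states, it now also includes a number of overseas territories/dependencies that carry an `iso3`, such as Greenland.
- Like `cn-neighbors`, it is uniformly tagged `source = 世界银行` in SQLite.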
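
Because both datasets share the same `source` value, telling them apart has to rely on `path`. The following is a minimal sketch of that idea, assuming `path` is a relative POSIX-style path whose first component is the dataset directory name; the example paths and the helper name `dataset_of` are hypothetical and not taken from the package.

```python
from pathlib import PurePosixPath

# Dataset directories named in the README.
KNOWN_DATASETS = ("amap", "cn-neighbors", "world-countries")


def dataset_of(relative_path: str) -> str:
    """Return the dataset directory that a relative `path` value starts with.

    Assumption: the first path component is the dataset directory.
    Raises ValueError for an unknown or empty prefix.
    """
    parts = PurePosixPath(relative_path).parts
    if not parts or parts[0] not in KNOWN_DATASETS:
        raise ValueError(f"unrecognized dataset prefix in path: {relative_path!r}")
    return parts[0]


# Hypothetical path values, for illustration only.
print(dataset_of("cn-neighbors/MNG.geojson"))
print(dataset_of("world-countries/ABW.geojson"))
```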

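About `iso3`:

- The `ADMINISTRATIVE` table now officially includes an `iso3` column.
- Foreign country / region-level records store their own `ISO3` or a custom composite code, for example `PSE` or `IND-PAK-JK`.
- China administrative-division records default to `CHN`.
- Records for the Hong Kong SAR are written as `HKG`, and records for the Macao SAR as `MAC`.
- Taiwan-related records are still uniformly written as `CHN`.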
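
As an illustration of querying by `iso3`, here is a minimal sketch against an in-memory SQLite table. It assumes only a subset of the `ADMINISTRATIVE` columns (`id`, `country`, `iso3`, `level`, `source`, `path`); the sample rows, the `path` values, and the helper `find_country_by_iso3` are made up for the example and are not the package's API.

```python
import sqlite3

# In-memory stand-in for the index database, with an assumed column subset.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE ADMINISTRATIVE ("
    "id INTEGER PRIMARY KEY, country TEXT, iso3 TEXT, "
    "level TEXT, source TEXT, path TEXT)"
)
con.executemany(
    "INSERT INTO ADMINISTRATIVE VALUES (?, ?, ?, ?, ?, ?)",
    [
        # Illustrative rows; names and paths are placeholders.
        (1, "阿鲁巴", "ABW", "国", "世界银行", "world-countries/ABW.geojson"),
        (2, "格陵兰", "GRL", "国", "世界银行", "world-countries/GRL.geojson"),
        (3, "(placeholder)", "IND-PAK-JK", "国", "世界银行", "cn-neighbors/IND-PAK-JK.geojson"),
    ],
)


def find_country_by_iso3(connection, code):
    """Look up country-level rows by ISO3 or composite code, case-insensitively."""
    cur = connection.execute(
        "SELECT country, iso3, path FROM ADMINISTRATIVE "
        "WHERE level = ? AND UPPER(iso3) = UPPER(?)",
        ("国", code),
    )
    return cur.fetchall()


print(find_country_by_iso3(con, "abw"))
print(find_country_by_iso3(con, "IND-PAK-JK"))
```

Parameter binding keeps the composite codes, which contain hyphens, from needing any special quoting.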

## 数据来源

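The public data sources underlying the administrative boundaries are listed below. Geometries and attributes in the repository may have been clipped, topologically processed, had the China boundary subtracted, or been merged with the Chinese name mapping; the actual files in the package are authoritative.

- **China provinces / cities / counties**: the raw data comes from **高德 (Amap)**. For independent cross-checking and academic citation, use [GaryBikini/ChinaAdminDivisonSHP](https://github.com/GaryBikini/ChinaAdminDivisonSHP) **v2.0** (2021), Zenodo DOI [10.5281/zenodo.4167299](https://doi.org/10.5281/zenodo.4167299).
- **Foreign countries and regions (country level)**: the OpenDataSoft dataset [World Administrative Boundaries - Countries and Territories](https://public.opendatasoft.com/explore/dataset/world-administrative-boundaries/export/?flg=en-us) (portal identifier `world-administrative-boundaries`; global level 0 administrative boundaries, including some non-sovereign territories)
- **Foreign countries and regions (country level)**: **World Bank Official Boundaries - Admin 0** ([World Bank Data Catalog](https://datacatalog.worldbank.org/search/dataset/0038272/world-bank-official-boundaries)), which provides global country-level boundaries, territories, and disputed-area geometries. On top of this source, `cn-neighbors` and `world-countries` in the repository additionally apply neighbor snapping, subtraction of the China-defined boundary, disputed-area classification, and write-back of Chinese names

The China-side geometry of `cn-neighbors` and `world-countries` is consistent with `amap`; the foreign side is derived from the world boundary data above. See the individual sections for details.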

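## Boundary handling results

The figures below show how the current data package handles the boundaries around China, disputed areas, and the maritime direction:

- China is shown in dark red
- Neighboring countries and regions are shown in teal
- Disputed areas kept as separate records are shown in light yellow

### Overview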

<p align="center">
<img src="docs/assets/cn-neighbors-overview.png" alt="cn-neighbors overview" width="92%" />
</p>

### China-India border and the Kashmir disputed area

<p align="center">
<img src="docs/assets/india-border.png" alt="india border handling" width="92%" />
</p>

### South China Sea

<p align="center">
<img src="docs/assets/south-china-sea.png" alt="south china sea handling" width="92%" />
</p>

### China-Tajikistan border

<p align="center">
<img src="docs/assets/china-tajikistan-border.png" alt="china tajikistan border handling" width="92%" />
</p>

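The gap that appears between China and Tajikistan needs an additional note:

- There has long been an undefined-boundary issue between China and Tajikistan.
- The definition used by the World Bank and other international boundary datasets along the China-Tajikistan border does not match the undefined-boundary definition used on officially approved maps in mainland China.
- On this stretch, the definition on current mainstream approved maps in mainland China covers a smaller area than the international versions. Unlike other disputed areas, where a party's own claim is the larger one, here the currently published, map-review-approved mainland China definition is the less favorable one.
- Mainstream map products that carry a map-review number (审图号), such as 天地图 (Tianditu), 高德 (Amap), and 百度 (Baidu), currently all generally use this smaller version here.
- As a result, when the China boundary from `amap` is joined directly with the World Bank foreign boundary along the China-Tajikistan border, a blank area remains, which is visible in the figure.
- Following the principle of minimal change, `cnmaps-data` currently only documents this spot and does not manually fill or reassign it.

For more information, see Wikipedia: [中国—塔吉克斯坦边界](https://zh.wikipedia.org/zh-cn/%E4%B8%AD%E5%9B%BD%E2%80%94%E5%A1%94%E5%90%89%E5%85%8B%E6%96%AF%E5%9D%A6%E8%BE%B9%E7%95%8C)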

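## Relationship to cnmaps

At runtime, `cnmaps` first discovers and uses any installed data provider. For the official data package, `cnmaps-data` exposes its provider through a Python entry point, and installing `cnmaps` pulls it in as a dependency by default.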
@@ -72,13 +125,10 @@ pip install cnmaps

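`cnmaps` currently looks up data sources in the following priority order:

1. The environment variable `CNMAPS_DATA_DIR`
2. The `cnmaps.data_providers` entry point registered by installed packages
3. The official package `cnmaps_data.provider`
4. A local sibling source directory `cnmaps-data`
5. The legacy data directory bundled with `cnmaps` (transitional compatibility)
1. The `cnmaps.data_providers` entry point registered by installed packages
2. The official package `cnmaps_data.provider`

Therefore, third-party data packages that want to be compatible with `cnmaps` are advised to expose their own provider via an entry point.
Therefore, third-party data packages that want to be compatible with `cnmaps` are advised to expose their own provider via an entry point. The current `cnmaps 2.x` no longer depends on the bundled legacy data directory, and no longer treats a sibling source directory as an official runtime discovery path.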
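
To see which providers are registered in an environment, the entry point group can be inspected with the standard library. This is a minimal sketch that only lists the entries under `cnmaps.data_providers`; it does not reproduce how `cnmaps` itself loads or ranks providers, and the helper name `list_provider_entry_points` is hypothetical.

```python
from importlib.metadata import entry_points

GROUP = "cnmaps.data_providers"


def list_provider_entry_points(group=GROUP):
    """Return sorted (name, value) pairs for entry points registered in `group`."""
    try:
        # Python 3.10+ accepts selection keywords.
        eps = entry_points(group=group)
    except TypeError:
        # Python 3.8/3.9: entry_points() returns a dict keyed by group name.
        eps = entry_points().get(group, [])
    return sorted((ep.name, ep.value) for ep in eps)


for name, target in list_provider_entry_points():
    print(f"{name} -> {target}")
```

An empty result means no installed package registers a provider under that group in the current environment.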

## For third-party developers

@@ -113,13 +163,13 @@ python scripts/generate_dataset_index_docs.py
To rebuild the `cn-neighbors` data, use:

```bash
python scripts/generate_cn_neighbors.py --world-shp /path/to/world-administrative-boundaries.shp
python scripts/generate_cn_neighbors.py --world-shp /path/to/WB_GAD_ADM0_complete.shp
```

To generate the other world country-level boundaries, use:

```bash
python scripts/generate_world_countries.py --world-shp /path/to/world-administrative-boundaries.shp
python scripts/generate_world_countries.py --world-shp /path/to/WB_GAD_ADM0_complete.shp
```

Before writing `world-countries`, this script first subtracts the China boundary geometry from every country.
@@ -130,6 +180,12 @@ python scripts/generate_world_countries.py --world-shp /path/to/world-administra
python scripts/update_country_names.py
```

If you need to regenerate the foreign-name index page after changing the database schema or the generation logic, run:

```bash
python scripts/generate_dataset_index_docs.py
```

The build output will include:

- `sdist`
@@ -178,4 +234,5 @@ python -m cnmaps_data.checker /path/to/your-data-package/cnmaps_data

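- [Developer guide](docs/developer-guide.md)
- [Dataset coverage index](docs/dataset-index.md) (province / city / county and foreign name lists, generated from the index database)
- [Country name and ISO3 mapping table](docs/country-name-map.md)
- [Changelog](CHANGELOG.md)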
2 changes: 1 addition & 1 deletion cnmaps_data/__init__.py
@@ -1,3 +1,3 @@
"""Official data package for cnmaps."""

__version__ = "1.1.0"
__version__ = "1.1.1"
30 changes: 28 additions & 2 deletions cnmaps_data/checker.py
@@ -26,6 +26,7 @@
ADMIN_COLUMNS = (
"id",
"country",
"iso3",
"province",
"city",
"district",
@@ -131,22 +132,47 @@ def _check_administrative_index(package_root: Path, manifest: dict, sample_limit
if columns != ADMIN_COLUMNS:
raise ValueError(f"ADMINISTRATIVE 表字段不符合要求: {columns}")

rows = list(cur.execute("SELECT id, path FROM ADMINISTRATIVE;"))
rows = list(
cur.execute(
"SELECT id, country, iso3, province, city, level, source, path FROM ADMINISTRATIVE;"
)
)
finally:
con.close()

if not rows:
raise ValueError("ADMINISTRATIVE 表为空")

seen_ids = set()
for idx, (row_id, relative_path) in enumerate(rows):
for idx, (row_id, country, iso3, province, city, level, source, relative_path) in enumerate(rows):
if row_id in seen_ids:
raise ValueError(f"ADMINISTRATIVE 表存在重复 id: {row_id}")
seen_ids.add(row_id)

if not relative_path:
raise ValueError(f"ADMINISTRATIVE 表存在空 path: id={row_id}")

if source == "高德":
expected_iso3 = "CHN"
if province == "香港特别行政区" or city == "香港特别行政区":
expected_iso3 = "HKG"
elif province == "澳门特别行政区" or city == "澳门特别行政区":
expected_iso3 = "MAC"

if str(iso3 or "").upper() != expected_iso3:
raise ValueError(
f"高德记录的 iso3 不符合要求: id={row_id}, country={country}, province={province}, "
f"city={city}, level={level}, expected={expected_iso3}, actual={iso3}"
)

if level == "国" and source == "世界银行":
if not iso3:
raise ValueError(f"世界银行国家级记录缺少 iso3: id={row_id}, country={country}")
if Path(relative_path).stem.upper() != str(iso3).upper():
raise ValueError(
f"世界银行国家级记录 iso3 与文件名不一致: id={row_id}, iso3={iso3}, path={relative_path}"
)

geojson_path = _resolve_relative_geojson_path(administrative_root, relative_path)
_check_geojson_file(geojson_path)
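
The `iso3` rules added in this hunk can be restated as two small standalone helpers, which makes them easier to unit test in isolation. This is a sketch that mirrors the conditions in the diff; the function names and the example paths are hypothetical and are not part of `cnmaps_data.checker`.

```python
from pathlib import Path


def expected_amap_iso3(province, city):
    """Expected iso3 for a `source == "高德"` row, following the rules in the diff."""
    if province == "香港特别行政区" or city == "香港特别行政区":
        return "HKG"
    if province == "澳门特别行政区" or city == "澳门特别行政区":
        return "MAC"
    return "CHN"


def world_bank_iso3_matches_path(iso3, relative_path):
    """True if a country-level `世界银行` row has an iso3 equal to its file stem."""
    if not iso3:
        return False
    # Case-insensitive comparison, as in the checker.
    return Path(relative_path).stem.upper() == str(iso3).upper()


# Illustrative inputs; the paths are placeholders.
print(expected_amap_iso3("香港特别行政区", None))
print(world_bank_iso3_matches_path("ABW", "world-countries/ABW.geojson"))
```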


Large diffs are not rendered by default.


@@ -1 +1 @@
{"type":"Feature","properties":{"iso3":"ABW","name":"阿鲁巴","name_en":"Aruba","source":"WORLD_COUNTRIES","kind":"陆地","level":"国"},"geometry":{"type":"Polygon","coordinates":[[[-69.88223999999991,12.411110000000065],[-69.9469499999999,12.43667000000005],[-70.05903999999992,12.540210000000059],[-70.05965999999995,12.627780000000087],[-70.03319999999991,12.618330000000071],[-69.93223999999992,12.528060000000039],[-69.89695999999992,12.480830000000083],[-69.89139999999992,12.47222000000005],[-69.88555999999994,12.45778000000007],[-69.8748599999999,12.415280000000052],[-69.88223999999991,12.411110000000065]]]}}
{"type":"Feature","properties":{"iso3":"ABW","name":"阿鲁巴","name_en":"Aruba","source":"世界银行","kind":"陆地","level":"国"},"geometry":{"type":"Polygon","coordinates":[[[-70.05107999964224,12.565399999910142],[-70.04885000022728,12.5669599999041],[-70.04729000023326,12.570309999706751],[-70.04497000026527,12.57933999974125],[-70.04684000016584,12.588090000353304],[-70.04895999990401,12.592469999990556],[-70.05055999994403,12.59942000033277],[-70.05187999966222,12.60121999970329],[-70.05164000028577,12.604049999807671],[-70.05320999984156,12.606619999613315],[-70.05427999972216,12.611000000149943],[-70.05954999993253,12.616130000199405],[-70.05864999979764,12.619740000300794],[-70.05550999978664,12.623230000264357],[-70.04684000016584,12.620050000207414],[-70.04657000030522,12.617989999638837],[-70.04366999967073,12.614650000297331],[-70.03920000037965,12.612090000053513],[-70.0384099999215,12.610799999920118],[-70.03393999973116,12.609019999673308],[-70.03103999999598,12.605030000034503],[-70.02841000012137,12.605299999895124],[-70.02262999977461,12.60184999997756],[-70.01999000033823,12.598509999736734],[-70.01827000016033,12.594130000099483],[-70.0132499997876,12.58565000024737],[-70.00860999985156,12.582350000052486],[-70.00363999998598,12.5753799996873],[-69.99916000023376,12.572560000044007],[-69.99415999988406,12.568200000429727],[-69.98968000013184,12.562540000220906],[-69.98808999965371,12.558939999681286],[-69.98389000022331,12.557279999572359],[-69.9802200000529,12.55741999973327],[-69.97180000026975,12.552039999846158],[-69.9665500000824,12.549229999764748],[-69.96076000017382,12.54409000015346],[-69.95813999986109,12.544349999552878],[-69.9568099996817,12.539980000376772],[-69.95470000040467,12.537420000132954],[-69.94762000036269,12.539240000425707],[-69.94473000018934,12.537970000315283],[-69.93683000010441,12.529350000302315],[-69.93157000035518,12.525390000248365],[-69.92946000017884,12.524880000111978],[-69.92827000016035,12.518950000042537],[-69.926549999
98246,12.517670000370288],[-69.92207999979212,12.511500000025023],[-69.91827000036017,12.508420000082992],[-69.91760000003995,12.505340000140961],[-69.9131400003107,12.504190000168421],[-69.90682999980459,12.499439999656317],[-69.90629999964523,12.49737999998706],[-69.90313999961126,12.496359999714286],[-69.89865999985909,12.487619999563378],[-69.89576000012386,12.484280000221872],[-69.89340000010986,12.483380000086981],[-69.88866999962073,12.483010000111449],[-69.88406000016886,12.476059999769234],[-69.88192999996949,12.467060000218908],[-69.88231000040616,12.46267999968228],[-69.8810000002498,12.46254999998257],[-69.87876000037369,12.457789999908641],[-69.87636999987558,12.449550000332295],[-69.8734599996792,12.443370000425205],[-69.874499999975,12.438729999589839],[-69.87213000039918,12.437180000057026],[-69.87105999961926,12.4302400001759],[-69.86826999956077,12.420969999865633],[-69.86589999998495,12.417109999926595],[-69.86694000028075,12.414529999659806],[-69.86930000029474,12.41491999965831],[-69.87322999986458,12.41322000040276],[-69.87349000016332,12.411930000269365],[-69.87952000034772,12.41191999980822],[-69.87979000020829,12.41345000021738],[-69.88529000023328,12.41370000005503],[-69.88412000023777,12.416020000023025],[-69.88556999965573,12.418850000127463],[-69.88899999955032,12.421159999634312],[-69.8918699997007,12.4216699997707],[-69.89424000017584,12.420889999773692],[-69.89607000003042,12.422429999744736],[-69.89843000004441,12.422289999583882],[-69.90185000037718,12.425760000423736],[-69.90606000026878,12.42895999960433],[-69.90867999968219,12.429079999742271],[-69.90947999970217,12.431399999710266],[-69.92022000035269,12.430329999829667],[-69.92310999962672,12.43289999963531],[-69.92626000009886,12.434559999744238],[-69.9247000001049,12.436250000337282],[-69.92550000012488,12.439090000003546],[-69.92917000029524,12.439840000415757],[-69.9349400001808,12.440079999792204],[-69.94149999962525,12.442370000175401],[-69.94177000038519,12.4436600003087
96],[-69.94571000041617,12.444420000282832],[-69.94939000014836,12.44801000036125],[-69.95569999975515,12.451600000439726],[-69.95622999991451,12.45392000040772],[-69.95938999994848,12.458030000184408],[-69.96780000016986,12.463149999772725],[-69.9701600001838,12.463659999909112],[-69.97279999962024,12.466739999851143],[-69.978579999967,12.469799999770203],[-69.97937000042515,12.4721300001994],[-69.98278999985854,12.475470000440225],[-69.98699999975014,12.476870000250358],[-69.98962999962475,12.47981999959336],[-69.99409000025327,12.4809699995659],[-69.9955300001094,12.479030000034527],[-69.9982800001219,12.479409999571885],[-70.00039000029824,12.48132999997955],[-70.00144999971769,12.484679999782202],[-70.00774999976267,12.486709999866662],[-70.00567000007038,12.490070000130459],[-70.0132899998336,12.493910000046526],[-70.01513000014933,12.49647999985217],[-70.01802000032274,12.498529999959601],[-70.02170000005492,12.49928999993358],[-70.02248999961375,12.500959999604333],[-70.02853000025925,12.502610000151435],[-70.0298499999775,12.505179999957079],[-70.0297400003007,12.50852999975973],[-70.03159000017831,12.512139999861176],[-70.03381999959328,12.514319999668317],[-70.03617999960727,12.514049999807696],[-70.04327999967222,12.51905000015745],[-70.04487000015035,12.521359999664298],[-70.05275999977414,12.527520000447737],[-70.05355999979412,12.530349999652799],[-70.05591999980811,12.530979999927126],[-70.05805000000743,12.535740000001056],[-70.06145000031722,12.53676000027383],[-70.06330000019483,12.538300000244874],[-70.0639600000539,12.54138999974873],[-70.05523000036419,12.553539999771203],[-70.05538999964875,12.560490000113475],[-70.05316999979561,12.563850000377272],[-70.05107999964224,12.565399999910142]]]}}
