This repository was archived by the owner on Jan 5, 2026. It is now read-only.

picks iso codes for admin0_synonyms #236

Open
AleGoiko wants to merge 2 commits into master from pick_iso_codes

@AleGoiko

tries to fix #233

Any feedback is welcome, as I am not sure this is the best approach.

@AleGoiko AleGoiko requested a review from ethervoid February 26, 2018 08:06
@AleGoiko AleGoiko self-assigned this Feb 26, 2018
Contributor

@ethervoid ethervoid left a comment

It looks good to me. Are there any more differences between the adm0_a3 and iso_a3 fields?

@AleGoiko
Author

AleGoiko commented Feb 26, 2018

Do you mean more countries that change, or format-wise? If it's the latter, the format looked the same to me; if it's the former, not that I saw, but I may be missing something.

@ethervoid
Contributor

I was wondering whether moving from adm0_a3 to iso_a3 as the principal data for countries changes anything else apart from South Sudan (that is, whether there are more discrepancies between the ISO codes and the american format).

@AleGoiko
Author

other than running test.sh is there any other way to test this?

@ethervoid
Contributor

I'd compare the two tables to be completely sure

@AleGoiko
Author

So I tweaked build_synonym_table.sql a little more, and the result is below
(admin0_synonyms_orig is the original admin0_synonyms, admin0_synonyms is the new table, and both tables contain only the inserts from ne_admin0_v3):

dataservices_db=> SELECT adm0_a3, name, rank FROM admin0_synonyms_orig EXCEPT SELECT adm0_a3, name, rank from admin0_synonyms;
 adm0_a3 |               name               | rank 
---------+----------------------------------+------
 SDS     | SDS                              |    3
 CCK     | CCK                              |    3
 BVT     | BVT                              |    3
 BES     | BES                              |    3
 PSX     | West Bank and Gaza               |    5
 ALD     | ALD                              |    3
 SDS     | Republic of South Sudan          |    5
 SAH     | Western Sahara                   |    9
 REU     | REU                              |    3
 SAH     | W. Sah.                          |    8
 SDS     | S. Sud.                          |    8
 GUF     | GUF                              |    3
 PSX     | PSX                              |    3
 GLP     | GLP                              |    3
 MTQ     | MTQ                              |    3
 FRH     | FRH                              |    3
 TKL     | TKL                              |    3
 SAH     | ESH                              |    0
 SDS     | SSD                              |    0
 SAH     | SAH                              |    3
 SJM     | SJM                              |    3
 PSX     | PSE                              |    0
 MYT     | MYT                              |    3
 ALD     | Åland Islands                    |    5
 CXR     | CXR                              |    3
 ALD     | ALA                              |    0
 SAH     | Sahrawi Arab Democratic Republic |    5
 SDS     | South Sudan                      |    9

the entries that are missing seem outdated or redundant anyway.

I had a look at the other files and I am not sure we need to change them, since the old codes are still around, just with a lower rank.
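
A minimal sketch of the two-table comparison discussed above, using Python's built-in sqlite3 as a stand-in for the PostgreSQL database. The table and column names follow the comment; the sample rows are illustrative only (the `SSD` row in the new table is an assumption, not taken from the real data), and `table_diff` is a hypothetical helper. Running EXCEPT in both directions shows the rows that appeared as well as the rows that disappeared.

```python
import sqlite3


def table_diff(conn, left, right, columns=("adm0_a3", "name", "rank")):
    """Return the rows present in `left` but absent from `right`."""
    cols = ", ".join(columns)
    # Table names are interpolated directly, so pass trusted identifiers only.
    query = f"SELECT {cols} FROM {left} EXCEPT SELECT {cols} FROM {right}"
    return sorted(conn.execute(query).fetchall())


conn = sqlite3.connect(":memory:")
for table in ("admin0_synonyms_orig", "admin0_synonyms"):
    conn.execute(f"CREATE TABLE {table} (adm0_a3 TEXT, name TEXT, rank INTEGER)")

# Illustrative rows only, not the real contents of either table.
conn.executemany(
    "INSERT INTO admin0_synonyms_orig VALUES (?, ?, ?)",
    [("SDS", "South Sudan", 9), ("SDS", "SSD", 0), ("ESP", "Spain", 9)],
)
conn.executemany(
    "INSERT INTO admin0_synonyms VALUES (?, ?, ?)",
    [("SSD", "South Sudan", 9), ("ESP", "Spain", 9)],
)

removed = table_diff(conn, "admin0_synonyms_orig", "admin0_synonyms")
added = table_diff(conn, "admin0_synonyms", "admin0_synonyms_orig")
print("only in original:", removed)
print("only in new:", added)
```

Since EXCEPT removes duplicates, this comparison would not notice a row that merely appears a different number of times in each table.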

@ethervoid
Contributor

Ok, looks good. Check why the tests are failing and we can move on

@AleGoiko
Author

retest this please

@AleGoiko
Author

@ethervoid looks like it was a glitch 😅

@ethervoid
Contributor

Great, that fixes the underlying problem. We also need to create a patch for the current data, as you can see here, to avoid recreating all the adm0 data. What do you think?
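
One possible shape for such a patch, sketched under assumptions: Python's built-in sqlite3 stands in for PostgreSQL, the table names `current_synonyms` and `target_synonyms` and the helpers `rows_only_in` and `patch_table` are hypothetical, and the rows are illustrative. The idea is to delete only the rows that no longer belong and insert only the rows that are missing, rather than rebuilding the whole table.

```python
import sqlite3

COLS = "adm0_a3, name, rank"


def rows_only_in(conn, left, right):
    """Rows present in `left` but absent from `right`."""
    return conn.execute(
        f"SELECT {COLS} FROM {left} EXCEPT SELECT {COLS} FROM {right}"
    ).fetchall()


def patch_table(conn, current, target):
    """Make `current` match `target`, touching only the rows that differ.

    Assumes no NULLs and no duplicate rows in either table.
    """
    stale = rows_only_in(conn, current, target)
    missing = rows_only_in(conn, target, current)
    conn.executemany(
        f"DELETE FROM {current} WHERE adm0_a3 = ? AND name = ? AND rank = ?",
        stale,
    )
    conn.executemany(f"INSERT INTO {current} ({COLS}) VALUES (?, ?, ?)", missing)
    return len(stale), len(missing)


conn = sqlite3.connect(":memory:")
for table in ("current_synonyms", "target_synonyms"):
    conn.execute(f"CREATE TABLE {table} (adm0_a3 TEXT, name TEXT, rank INTEGER)")

# Illustrative rows only, not the real data.
conn.executemany(
    "INSERT INTO current_synonyms VALUES (?, ?, ?)",
    [("SDS", "South Sudan", 9), ("ESP", "Spain", 9)],
)
conn.executemany(
    "INSERT INTO target_synonyms VALUES (?, ?, ?)",
    [("SSD", "South Sudan", 9), ("ESP", "Spain", 9)],
)

deleted, inserted = patch_table(conn, "current_synonyms", "target_synonyms")
print(f"deleted {deleted} stale rows, inserted {inserted} missing rows")
```

After patching, running the same EXCEPT comparison in both directions should return no rows, which gives a cheap check that the patch is complete.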

@AleGoiko
Author

ok so, here's where I stand right now with this:

I created the following tables:

  • admin0_synonyms_iso: using iso3 codes (admin0_synonyms but with iso codes)
  • admin0_synonyms_local: (made using the regular build_synonym_table.sql)
  • admin0_synonyms: loaded from a regular db_dump

the difference between admin0_synonyms and admin0_synonyms_iso is as follows (remove the .zip extension):
SELECT adm0_a3, name, rank FROM admin0_synonyms EXCEPT SELECT adm0_a3, name, rank from admin0_synonyms_iso;

It is mostly outdated or repetitive data except for lines such as:

ESP,Sp.,8

which seem to be missing from the new admin0_synonyms_iso table.

However, a closer look at build_synonym_table.sql explains why it is missing:

INSERT INTO admin0_synonyms (name, rank,  adm0_a3) 
SELECT 
    abbrev, 8, adm0_a3
FROM
    ne_admin0_v3
WHERE
    char_length(regexp_replace(abbrev, '[^a-zA-Z\u00C0-\u00ff]+', '', 'g')) > 3;
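
To see why an abbreviation such as "Sp." does not pass this WHERE clause, here is a small Python sketch that mirrors the filter. It assumes the `\u` escapes in the SQL pattern are interpreted as the range U+00C0 to U+00FF; `passes_abbrev_filter` is a hypothetical helper name, and Python's `re` is used in place of PostgreSQL's `regexp_replace`.

```python
import re

# Same character class as the SQL pattern: everything except ASCII letters
# and the characters U+00C0..U+00FF is stripped out.
NON_LETTERS = re.compile(r"[^a-zA-Z\u00C0-\u00ff]+")


def passes_abbrev_filter(abbrev: str) -> bool:
    """Mirror char_length(regexp_replace(abbrev, ..., '', 'g')) > 3."""
    return len(NON_LETTERS.sub("", abbrev)) > 3


for abbrev in ["Sp.", "S. Sud.", "W. Sah."]:
    print(abbrev, passes_abbrev_filter(abbrev))
```

"Sp." keeps only two letters once the dot is stripped, so the query skips it, whereas "S. Sud." and "W. Sah." keep four letters each and are inserted with rank 8.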

For some reason the admin0_synonyms table in the db_dump files does not seem to include this query; however, admin0_synonyms_local does include it, as it follows build_synonym_table.sql. In this case, the diff between the regular table created locally and the iso table is as follows:
SELECT adm0_a3, name, rank FROM admin0_synonyms_local EXCEPT SELECT adm0_a3, name, rank from admin0_synonyms_iso;

 adm0_a3 |               name                | rank 
---------+-----------------------------------+------
 CCK     | CCK                               |    3
 CCK     | Cocos (Keeling) Islands           |    0
 BVT     | BVT                               |    3
 REU     | Reunion                           |    0
 MTQ     | Martinique                        |    0
 SAH     | Western Sahara                    |    9
 SDS     | S. Sudan                          |    0
 SDS     | S. Sud.                           |    8
 GUF     | GUF                               |    3
 SJM     | Svalbard and Jan Mayen            |    0
 MTQ     | MTQ                               |    3
 FRH     | FRH                               |    3
 ALD     | Aland                             |    0
 SAH     | ESH                               |    0
 FRH     | Corse                             |    0
 CXR     | CXR                               |    3
 SAH     | Sahrawi Arab Democratic Republic  |    5
 GLP     | Guadeloupe                        |    0
 SDS     | South Sudan                       |    9
 SDS     | SDS                               |    3
 BES     | BES                               |    3
 PSX     | West Bank and Gaza                |    5
 ALD     | ALD                               |    3
 SDS     | Republic of South Sudan           |    5
 REU     | REU                               |    3
 SAH     | W. Sah.                           |    8
 BES     | Bonaire (Sint Eustatius and Saba) |    0
 TKL     | Tokelau                           |    0
 PSX     | PSX                               |    3
 GLP     | GLP                               |    3
 GUF     | French Guiane                     |    0
 MYT     | Mayotte                           |    0
 CXR     | Christmas Island                  |    0
 BVT     | Bouvet Island                     |    0
 TKL     | TKL                               |    3
 SDS     | SSD                               |    0
 SAH     | SAH                               |    3
 SJM     | SJM                               |    3
 PSX     | PSE                               |    0
 MYT     | MYT                               |    3
 ALD     | Åland Islands                     |    5
 ALD     | ALA                               |    0
 PSX     | Palestine                         |    0
 SAH     | W. Sahara                         |    0

and it makes much more sense.

I am very lost at this point, not sure if the sql generated in admin0_synonyms_iso is a suitable candidate to replace the current admin0_synonyms in the dumps :(

@ethervoid
Contributor

What does rank mean? What is the main use of the rank column?

Development

Successfully merging this pull request may close these issues.

South Sudan iso3 code is wrong