Not sure if this is the right place for this, Ben, but over the years I've collected city populations for 1890 to 1940 for a decent set of cities (from PDFs of census city population tables). Of the 24711 city x year observations where we both have data, only 1325 disagree. That is some combination of data entry errors on my part, data entry errors in Wikipedia, and weird or bad merging of city names (I did this quick and dirty, so city names that show up multiple times in a state might be an issue). Also, a bunch of CT cities are listed as cities and towns in the raw PDF, and I think I punched in a different row than the one on Wikipedia.
The list of disagreements is here: https://www.dropbox.com/s/w8wisqt27mir2hh/wiki_edits.csv?dl=0 and links to the raw PDFs are below. I wonder if we could get some interested (and compulsive) Wikipedia editors to correct as many of the Wikipedia city tables as possible (some fraction of the 1325, but I'm not sure what %). My understanding of this project is that any edits on Wikipedia will eventually flow through to this data, right?
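For anyone who wants to redo the comparison more carefully, here is a minimal sketch of the kind of check described above. It is not the script I actually ran. It assumes each dataset is a list of `(state, city, year, population)` rows; the function names and the row layout are hypothetical. It sets aside any state/city/year key that appears more than once in either source, since those duplicate city names are where a quick merge is most likely to go wrong.

```python
from collections import Counter


def normalize(name):
    """Lowercase and collapse whitespace so 'New  Haven ' matches 'new haven'."""
    return " ".join(name.lower().split())


def find_disagreements(mine, theirs):
    """Compare two lists of (state, city, year, population) rows.

    Returns (disagreements, ambiguous), where disagreements is a sorted list
    of (key, my_population, their_population) and ambiguous is a sorted list
    of keys that appear more than once in either source.
    """
    def index(rows):
        counts = Counter()
        table = {}
        for state, city, year, pop in rows:
            key = (state.upper(), normalize(city), int(year))
            counts[key] += 1
            table[key] = int(pop)
        dupes = {k for k, n in counts.items() if n > 1}
        return table, dupes

    a, a_dupes = index(mine)
    b, b_dupes = index(theirs)
    ambiguous = a_dupes | b_dupes
    # Only compare keys present in both sources and unambiguous in each.
    shared = (a.keys() & b.keys()) - ambiguous
    disagreements = sorted(
        (key, a[key], b[key]) for key in shared if a[key] != b[key]
    )
    return disagreements, sorted(ambiguous)


# Made-up placeholder rows, not real census or Wikipedia figures.
mine = [
    ("CT", "Exampleville", 1910, 1000),
    ("CT", "Sampleton", 1910, 500),
    ("CT", "Twin", 1910, 10),
    ("CT", "Twin", 1910, 20),  # same name twice in one state and year
]
theirs = [
    ("ct", "exampleville ", 1910, 1000),
    ("CT", "Sampleton", 1910, 505),
    ("CT", "Twin", 1910, 10),
]
disagreements, ambiguous = find_disagreements(mine, theirs)
```

Keeping the ambiguous keys in a separate list means they can be checked by hand against the PDFs instead of being counted as disagreements.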
Raw PDFs
1910 pdf with city populations: https://www.dropbox.com/s/4vfuwzkh3hmysfp/census_1910.pdf?dl=0
1930 and 1940 pdfs with city populations: https://www.dropbox.com/sh/ia56uz1bs13oaep/AADjHoxKJ1N3WS5vkNEGGoRla?dl=0