| title | GERDA Codebook |
|---|---|
| date | `r Sys.Date()` |
| variables | |
| header-includes | |
- General Notes
- Federal Elections (Municipality Level, Harmonized)
- State Elections (Municipality Level, Harmonized)
- Municipal Elections (Municipality Level, Harmonized)
- Mayoral Elections: Candidate-Level (`mayoral_candidates`)
- Mayor Panel (`mayor_panel` / `mayor_panel_harm`)
- Annual Mayor Panel (`mayor_panel_annual` / `mayor_panel_annual_harm`)
- European Elections (Municipality Level)
- Municipality Covariates (Area, Population, Employment)
- Identifier Structure (AGS): The primary geographic identifier is the `ags` (Amtlicher Gemeindeschlüssel), representing municipalities. It is an 8-digit character string.
  - The first 2 digits represent the state (`state` variable).
  - The first 5 digits represent the county (`county` variable).
  - The `ags` often includes a leading zero (e.g., "01..."), which is crucial for correct interpretation and merging with other administrative data.
- Vote Shares: Variables representing party vote shares (e.g., `cdu`, `spd`, `far_right`) are proportions, typically ranging from 0 to 1, calculated based on the number of valid votes or total voters as specified in the dataset notes.
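The identifier rules above can be illustrated with a minimal Python sketch. It assumes an AGS that was read as a number (and therefore lost its leading zero) and restores the 8-character form before deriving `state` and `county`. The helper names `normalize_ags` and `split_ags` are hypothetical and not part of the GERDA code.

```python
def normalize_ags(raw) -> str:
    """Return an 8-character AGS string, restoring a dropped leading zero."""
    text = str(raw).strip()
    if not text.isdigit() or len(text) > 8:
        raise ValueError(f"not a valid AGS: {raw!r}")
    return text.zfill(8)


def split_ags(raw) -> dict:
    """Derive the state (first 2 digits) and county (first 5 digits) codes."""
    ags = normalize_ags(raw)
    return {"ags": ags, "state": ags[:2], "county": ags[:5]}


# An AGS stored as an integer: the leading zero of state "01" is gone.
example = split_ags(1001000)
print(example)
```

Normalizing both sides of a merge this way is one option for avoiding silent mismatches when the other data source stores the identifier as a number.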
This codebook describes the main datasets provided in the GERDA project.
File: data/federal_elections/municipality_level/final/federal_muni_harm_21.rds or data/federal_elections/municipality_level/final/federal_muni_harm_21.csv
This dataset contains federal election results from 1990 to 2025 at the municipality level, harmonized to 2021 administrative boundaries.
| Variable | Type | Description |
|---|---|---|
| `ags` | character | Municipality identifier (Amtlicher Gemeindeschlüssel). |
| `election_year` | numeric | Year of the federal election. |
| `state` | character | State identifier (numeric code). |
| `county` | character | County identifier (numeric code). |
| `eligible_voters` | numeric | Number of eligible voters (harmonized). Derived from `eligible_voters_orig` after potential adjustments for mail-in voting distribution. |
| `number_voters` | numeric | Number of voters (harmonized). Derived from `number_voters_orig` after potential adjustments for mail-in voting distribution. |
| `valid_votes` | numeric | Number of valid votes cast (harmonized). |
| `turnout` | numeric | Voter turnout. Calculated as `number_voters / eligible_voters_orig`. Recalculated as `number_voters_orig / eligible_voters_orig` if > 1, then capped at 1. See `01_federal_muni_unharm.R`. |
| `turnout_wo_mailin` | numeric | Voter turnout based on original (pre-mail-in distribution) counts. Calculated as `number_voters_orig / eligible_voters_orig` and capped at 1. See `01_federal_muni_unharm.R`. |
| `cdu` | numeric | Vote share for CDU (Christlich Demokratische Union). Calculated as votes / `number_voters`. |
| `csu` | numeric | Vote share for CSU (Christlich-Soziale Union in Bayern). Calculated as votes / `number_voters`. |
| `spd` | numeric | Vote share for SPD (Sozialdemokratische Partei Deutschlands). Calculated as votes / `number_voters`. |
| `gruene` | numeric | Vote share for GRÜNE (Bündnis 90/Die Grünen). Includes votes for B90/GR from 1990. Calculated as votes / `number_voters`. |
| `fdp` | numeric | Vote share for FDP (Freie Demokratische Partei). Calculated as votes / `number_voters`. |
| `linke_pds` | numeric | Vote share for Die Linke or predecessor PDS. Calculated as votes / `number_voters`. |
| `afd` | numeric | Vote share for AfD (Alternative für Deutschland). Calculated as votes / `number_voters`. |
| `npd` | numeric | Vote share for NPD (Nationaldemokratische Partei Deutschlands). Calculated as votes / `number_voters`. |
| `rep` | numeric | Vote share for REP (Die Republikaner). Calculated as votes / `number_voters`. |
| `dvu` | numeric | Vote share for DVU (Deutsche Volksunion). Calculated as votes / `number_voters`. |
| ... (many other parties) | numeric | Vote share for various smaller parties. Calculated as votes / `number_voters`. |
| `cdu_csu` | numeric | Combined vote share for CDU/CSU. Calculated as votes / `number_voters`. |
| `far_right` | numeric | Aggregated vote share for designated far-right parties. Sum of shares for: afd, npd, rep, die rechte, dvu, iii. weg, fap, ddd, dsu. See `00_federal_muni_raw.R`. |
| `far_left` | numeric | Aggregated vote share for designated far-left parties. Sum of shares for: dkp, kpd, mlpd, sgp, psg, kbw, v, spad, bsa. See `00_federal_muni_raw.R`. |
| `far_left_w_linke` | numeric | Aggregated vote share for designated far-left parties including Linke/PDS. Sum of shares for: far_left, die linke, pds. See `00_federal_muni_raw.R`. |
| `area` | character | Area of the municipality (km²). Sourced from official Gemeindeverzeichnis files, originally numeric but type may change during processing. |
| `population` | numeric | Population of the municipality (thousands). Scaled by dividing raw counts by 1000 during harmonization (`02_federal_muni_harm_21.R`). Sourced from official Gemeindeverzeichnis files. |
| `ags_21` | numeric | Municipality identifier harmonized to 2021 boundaries, based on crosswalk merge in `02_federal_muni_harm_21.R`. |
| `flag_naive_turnout_above_1` | numeric | Flag (1/0) indicating if the initial turnout calculation (`number_voters / eligible_voters_orig`) resulted in a value > 1. See `01_federal_muni_unharm.R`. |
| `flag_unsuccessful_naive_merge` | numeric | Flag (1/0) indicating if the initial merge between election data and crosswalk data failed and required alternative years/AGS for matching. See `02_federal_muni_harm_21.R`. |
| ... (other flag variables) | numeric | Various flags indicating data properties or processing steps (e.g., related to mail-in voting). |
Notes:
- Vote shares are proportions of `number_voters` (the potentially adjusted voter count).
- Harmonization refers to adjustments made to account for municipal boundary changes over time, mapping results onto consistent 2021 municipality definitions using population or area weights from `ags_crosswalks.csv`.
- The variables `eligible_voters_orig` and `number_voters_orig` (present in intermediate steps, e.g., `01_federal_muni_unharm.R`) represent counts before mail-in vote distribution adjustments.
- `area` and `population` are sourced from official municipality registers (Gemeindeverzeichnisse); `area` is km², `population` is scaled to thousands.
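The turnout rule described for `turnout`, `turnout_wo_mailin`, and `flag_naive_turnout_above_1` can be sketched as follows. This is an illustration of the rule as worded in the table, not the code in `01_federal_muni_unharm.R`; the function name `federal_turnout` and the example counts are made up.

```python
def federal_turnout(number_voters, number_voters_orig, eligible_voters_orig):
    """Apply the turnout rule as described in the federal variable table."""
    naive = number_voters / eligible_voters_orig
    flag = int(naive > 1)
    # Fall back to the pre-mail-in voter count when the naive ratio exceeds 1.
    turnout = number_voters_orig / eligible_voters_orig if flag else naive
    return {
        "turnout": min(turnout, 1.0),  # capped at 1
        "turnout_wo_mailin": min(number_voters_orig / eligible_voters_orig, 1.0),
        "flag_naive_turnout_above_1": flag,
    }


# Hypothetical municipality where allocated mail-in votes stay below the roll.
ordinary = federal_turnout(820, 700, 1000)
# Hypothetical municipality where mail-in allocation pushes voters above the roll.
overflow = federal_turnout(1050, 900, 1000)
print(ordinary)
print(overflow)
```

In the second case the flag is set and the reported turnout falls back to the original voter count, which is why flagged rows may deserve a closer look before analysis.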
File: data/state_elections/final/state_unharm.rds (or .csv)
This dataset contains state election results from 1946 to 2024 at the municipality level, using each election year's original administrative boundaries. Covers all 16 German states with 126,989 rows and 439 columns (426 individual party columns).
| Variable | Type | Description |
|---|---|---|
| `ags` | character | Municipality identifier (8-digit AGS), using that year's boundaries. |
| `election_year` | numeric | Year of the state election. |
| `state` | character | State identifier (2-digit code, e.g., "01"=SH, "09"=BY). |
| `election_date` | Date | Date of the election. |
| `eligible_voters` | numeric | Number of eligible voters. NA for BY 1994–2013 (not in source) and HE 1958/62 non-kreisfreie municipalities. |
| `number_voters` | numeric | Number of voters (people who cast ballots). NA for HE 1958/62 (source gap). For BY: in-person + Briefwahl voters (each casts 2 ballots). |
| `valid_votes` | numeric | Number of valid votes. For BY 1950+: Gesamtstimmen (Erst+Zweit combined), so `valid_votes ≈ 2 × number_voters`. |
| `invalid_votes` | numeric | Number of invalid votes. For BY 1950+: `number_voters × 2 - valid_votes`. Clamped to ≥ 0. |
| `turnout` | numeric | Voter turnout (`number_voters / eligible_voters`). Capped at 1.5; values > 1.5 set to NA. NA where `eligible_voters` or `number_voters` missing. |
| `cdu`, `csu`, `spd`, ... | numeric | Vote share for each party (proportion of `valid_votes`, range 0–1). 426 individual party columns. NA means the party did not run in that state-year (zero-vote → NA recoding applied). |
| `other` | numeric | Residual vote share: `max(0, 1 - sum(all named parties))`. |
| `cdu_csu` | numeric | Combined CDU/CSU share. Equals `csu` in BY, `cdu` elsewhere. |
| `flag_naive_turnout_above_1` | numeric | Flag (1/0): turnout > 1 before capping. Indicates Briefwahl allocation rounding or data quality issues. |
Notes:
- Bayern (BY) uses Gesamtstimmen (Erst+Zweitstimme combined). Both ballots count equally for proportional seat allocation and the 5% threshold. Party vote shares are proportions of Gesamtstimmen. The identity `valid_votes + invalid_votes = number_voters × 2` holds for BY 1950+. The 1946 election was single-ballot.
- Hamburg (HH) 2011+ and Bremen (HB) 2011+ use a 5-vote personalized-list system (Kumulieren/Panaschieren): each voter casts 5 Landesstimmen, which can be cumulated on one candidate or split across candidates and lists. GERDA reports party shares as proportions of cast Landesstimmen, so `valid_votes ≈ 5 × number_voters`. Shares sum to 1 within a municipality and are comparable across HH (or HB) municipalities, but the per-voter denominator differs from single-ballot states. Earlier HH/HB elections used a single-vote system.
- NRW 1947–1970: County-level only (synthetic AGS `050xx000`). Cannot be harmonized; only in unharm.
- HH 1982: Two elections (June + December); both rows are present, distinguished by `election_date`.
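As a worked illustration of the Bayern identity and the turnout rule in the table above, the following Python sketch derives `invalid_votes` from the two-ballot identity and applies the 1.5 threshold. The function names and numbers are hypothetical, and treating a zero `eligible_voters` as missing is an assumption made here, not something the source states.

```python
def by_invalid_votes(number_voters, valid_votes):
    """Bayern 1950+: each voter casts two ballots, so invalid votes are the gap
    between 2 x number_voters and the Gesamtstimmen, clamped at zero."""
    return max(0, number_voters * 2 - valid_votes)


def state_turnout(number_voters, eligible_voters):
    """Turnout as number_voters / eligible_voters; None stands in for NA."""
    if number_voters is None or eligible_voters is None or eligible_voters == 0:
        return None  # missing inputs (zero treated as missing: an assumption)
    turnout = number_voters / eligible_voters
    return None if turnout > 1.5 else turnout


print(by_invalid_votes(1000, 1960))   # 2000 ballots cast, 1960 valid
print(by_invalid_votes(1000, 2010))   # inconsistent source counts are clamped
print(state_turnout(600, 1000))
print(state_turnout(1600, 1000))      # implausible ratio becomes NA
```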
Files: data/state_elections/final/state_harm_21.rds, state_harm_23.rds, state_harm_25.rds (or .csv)
State election results from 1990 to 2024 harmonized to fixed administrative boundaries (2021, 2023, or 2025) using population-weighted crosswalks. 67,393–67,613 rows, 451 columns.
| Variable | Type | Description |
|---|---|---|
| `ags` | character | Municipality identifier, mapped to the target year's boundaries (`ags_21`, `ags_23`, or `ags_25`). |
| `election_year` | numeric | Year of the state election. |
| `state` | character | State identifier (2-digit code). |
| `state_name` | character | Name of the state holding the election. |
| `election_date` | Date | Date of the election. |
| `eligible_voters` | numeric | Number of eligible voters (harmonized via population-weighted crosswalk). |
| `number_voters` | numeric | Number of voters (harmonized). |
| `valid_votes` | numeric | Number of valid votes cast (harmonized). |
| `invalid_votes` | numeric | Number of invalid votes (harmonized). |
| `turnout` | numeric | Voter turnout (`number_voters / eligible_voters`). |
| `cdu`, `csu`, `spd`, ... | numeric | Vote share for each party (proportion of `valid_votes`). 426 individual party columns. NA = party did not participate in that state-year. |
| `other` | numeric | Residual vote share for unlisted parties. |
| `cdu_csu` | numeric | Combined CDU/CSU share. Equals `csu` in BY, `cdu` elsewhere. |
| `far_right` | numeric | Aggregated far-right vote share: sum of `afd`, `npd`, `rep`, `die_rechte`, `dvu`, `iii_weg`, `fap`, `ddd`, `dsu`, `die_heimat_heimat`, `die_republikaner_rep`. |
| `far_left` | numeric | Aggregated far-left vote share: sum of `dkp`, `kpd`, `mlpd`, `sgp`, `psg`, `kbw`. |
| `far_left_w_linke` | numeric | Far-left including Linke/PDS: `far_left + linke_pds + pds`. |
| `total_vote_share` | numeric | Sum of all individual party vote shares (excluding derived columns). Quality check; ideally = 1.0. |
| `perc_total_votes_incogruence` | numeric | Continuous deviation: `total_vote_share - 1`. Positive = shares sum to > 1. |
| `flag_total_votes_incongruent` | numeric | Flag (1/0): `total_vote_share` outside [0.999, 1.001]. |
| `flag_unsuccessful_naive_merge` | numeric | Flag (1/0): initial crosswalk merge failed (recovered via fuzzy time matching or self-mapping). |
| `ags_name_21` | character | Municipality name (2021 boundaries). Only in `state_harm_21`. |
| `area_ags` | numeric | Municipality area in km². |
| `population_ags` | numeric | Municipality population (thousands). |
| `employees_ags` | numeric | Number of employees subject to social insurance contributions. |
| `pop_density_ags` | numeric | Population density (persons per km²). |
Notes:
- Vote shares are proportions of `valid_votes`.
- Harmonization uses population-weighted crosswalks from `data/crosswalks/`. Municipalities that merged are combined; municipalities that split have votes distributed proportionally by population.
- Zero-vote → NA recoding: parties that received zero votes across all municipalities in a state-year are recoded from 0 to NA (distinguishes "did not participate" from "ran but got 0 votes").
- BY Gesamtstimmen: `valid_votes ≈ 2 × number_voters` for BY. See unharmonized notes above.
- See `docs/state_pipeline_audit.md` for detailed data quality documentation.
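The zero-vote → NA recoding and the incongruence flag can be illustrated with a small Python sketch. It operates on a toy list of rows rather than the real files; `recode_zero_to_na` and the sample values are hypothetical, `None` stands in for NA, and the sketch works on shares (all-zero shares in a state-year are equivalent to all-zero votes).

```python
from collections import defaultdict


def recode_zero_to_na(rows, party_cols):
    """Set a party to None where it is zero in every row of a state-year."""
    totals = defaultdict(float)
    for row in rows:
        for party in party_cols:
            totals[(row["state"], row["election_year"], party)] += row[party] or 0.0
    recoded = []
    for row in rows:
        new_row = dict(row)
        for party in party_cols:
            if totals[(row["state"], row["election_year"], party)] == 0:
                new_row[party] = None  # party did not participate
        recoded.append(new_row)
    return recoded


def flag_total_votes_incongruent(total_vote_share):
    """1 if the summed shares fall outside [0.999, 1.001], else 0."""
    return int(not (0.999 <= total_vote_share <= 1.001))


rows = [
    {"ags": "01001000", "state": "01", "election_year": 2017, "spd": 0.40, "csu": 0.0},
    {"ags": "01002000", "state": "01", "election_year": 2017, "spd": 0.0, "csu": 0.0},
]
recoded = recode_zero_to_na(rows, ["spd", "csu"])
print(recoded)
print(flag_total_votes_incongruent(1.0005), flag_total_votes_incongruent(0.97))
```

In the toy data, `csu` becomes NA in both rows, while the `spd` zero in the second municipality is kept because the party did receive votes elsewhere in the same state-year.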
File: data/municipal_elections/final/municipal_harm.rds or data/municipal_elections/final/municipal_harm.csv
This dataset contains municipal council (Stadtrat/Gemeinderat) election results since 1990 at the municipality level, harmonized to 2021 administrative boundaries.
| Variable | Type | Description |
|---|---|---|
| `ags` | character | Municipality identifier (Amtlicher Gemeindeschlüssel). |
| `ags_name` | character | Name of the municipality. |
| `election_year` | numeric | Year of the municipal election. |
| `state` | character | State identifier (numeric code). |
| `county` | character | County identifier (numeric code). Derived from `substr(ags, 1, 5)` in `02_municipal_harm.R`. |
| `eligible_voters` | numeric | Number of eligible voters (harmonized). Source: `Wahlberechtigteinsgesamt` from various raw files. |
| `number_voters` | numeric | Number of voters (harmonized). Source: `Wähler` from various raw files. |
| `valid_votes` | numeric | Number of valid votes cast (harmonized). Source: `GültigeStimmen` from various raw files. |
| `turnout` | numeric | Voter turnout. Calculated as `number_voters / eligible_voters` (`Wähler / Wahlberechtigteinsgesamt`) in `01_municipal_unharm.R`. |
| `cdu_csu` | numeric | Vote share for CDU/CSU. |
| `spd` | numeric | Vote share for SPD. |
| `linke_pds` | numeric | Vote share for Die Linke (or predecessor PDS). |
| `gruene` | numeric | Vote share for GRÜNE. |
| `afd` | numeric | Vote share for AfD. |
| `piraten` | numeric | Vote share for Piratenpartei Deutschland. |
| `fdp` | numeric | Vote share for FDP. |
| `die_partei` | numeric | Vote share for Die PARTEI. |
| `flag_unsuccessful_naive_merge` | numeric | Flag (1/0) indicating if the initial merge with crosswalk data failed. See `02_municipal_harm.R`. |
| `area` | numeric | Area of the municipality (km²). Sourced/calculated via crosswalk file (`ags_crosswalks.csv`) from official Gemeindeverzeichnis data. |
| `population` | numeric | Population of the municipality (thousands). Sourced/calculated/scaled via crosswalk file (`ags_crosswalks.csv`) from official Gemeindeverzeichnis data. |
Notes:
- Vote shares are proportions of `valid_votes` (calculated as `prop_*` variables in `01_municipal_unharm.R`).
- `area` and `population` are sourced from official municipality registers (Gemeindeverzeichnisse); `area` is km², `population` is scaled to thousands.
- Kumulieren and Panaschieren. Municipal council elections in Baden-Württemberg, Bayern, Hessen, Rheinland-Pfalz, Mecklenburg-Vorpommern, Schleswig-Holstein, Saarland, Sachsen, Sachsen-Anhalt, Thüringen, Brandenburg, Bremen and Niedersachsen allow voters to cumulate votes on a single candidate (Kumulieren) and to split votes across lists (Panaschieren). Each voter casts multiple votes equal to the number of council seats, so `valid_votes` counts cast individual votes (Stimmen), not ballots, and the ratio `valid_votes / number_voters` reflects seat-count × cumulation behavior rather than a ballot count. Party shares (`cdu_csu`, `spd`, …) are proportions of these cast individual votes and sum to 1 within a municipality. They are directly comparable across municipalities within a state but not to single-vote systems (NRW is the main exception, where each voter casts one list vote). State-specific rules (e.g., up-to-3 cumulation in BW vs. up-to-5 in HE/RP) affect the realized `valid_votes / number_voters` ratio but not the share interpretation.
File: data/european_elections/final/european_muni_unharm.rds (or .csv)
This dataset contains European Parliament election results from 2009 to 2024 at the municipality level, using each election year's original administrative boundaries. 44,722 rows × 87 columns (71 party columns).
| Variable | Type | Description |
|---|---|---|
| `ags` | character | Municipality identifier (8-digit AGS), using that year's boundaries. |
| `county` | character | County identifier (5-digit code). |
| `state` | character | State identifier (2-digit code). |
| `state_name` | character | State name in English. |
| `election_year` | integer | Year of the European election (2009, 2014, 2019, 2024). |
| `election_date` | Date | Date of the election. |
| `eligible_voters` | numeric | Number of eligible voters. |
| `number_voters` | numeric | Number of voters (including invalid ballots). Includes allocated mail-in votes. |
| `valid_votes` | numeric | Number of valid votes cast. |
| `invalid_votes` | numeric | Number of invalid votes cast. |
| `voters_wo_sperrvermerk` | numeric | Eligible voters without Sperrvermerk (A1). |
| `voters_w_sperrvermerk` | numeric | Eligible voters with Sperrvermerk (A2, EU citizens). |
| `voters_par24_2` | numeric | Voters registered under § 24(2) EuWO (A3, Germans abroad). |
| `voters_w_wahlschein` | numeric | Voters with Wahlschein (absentee ballot certificate, B1). |
| `turnout` | numeric | Voter turnout (`number_voters / eligible_voters`). Capped at 1. |
| `cdu`, `spd`, `gruene`, ... | numeric | Vote share for each party (proportion of `number_voters`, range 0–1). 71 party columns across all 4 elections. 0 means the party ran but received no votes; parties that did not run in a given year are also 0. |
| `flag_turnout_above_1` | integer | Flag (1/0): turnout exceeded 1 before capping (Briefwahl allocation rounding artifact). |
Notes:
- Vote shares use `number_voters` as denominator, consistent with the federal pipeline. Party shares sum to approximately `valid_votes / number_voters` (< 1.0 due to invalid votes).
- Berlin appears as 14 Bezirke rows per year (not aggregated in the unharm file).
- Mail-in votes from shared Briefwahl districts (Ämter/Verbandsgemeinden) are allocated proportionally by eligible voters within each `(county, BWBez)` group.
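The proportional mail-in allocation mentioned in the last note can be sketched in a few lines of Python. This is a simplified illustration for a single `(county, BWBez)` group; the function name, AGS values, and counts are invented, and the sketch leaves the allocated values unrounded because the source does not say how the pipeline rounds.

```python
def allocate_mailin(municipalities, mailin_votes):
    """Split one Briefwahl district's votes by each member's share of eligible voters."""
    total_eligible = sum(m["eligible_voters"] for m in municipalities)
    return {
        m["ags"]: mailin_votes * m["eligible_voters"] / total_eligible
        for m in municipalities
    }


# Hypothetical group of three municipalities sharing one Briefwahl district.
group = [
    {"ags": "07131001", "eligible_voters": 600},
    {"ags": "07131002", "eligible_voters": 300},
    {"ags": "07131003", "eligible_voters": 100},
]
allocated = allocate_mailin(group, mailin_votes=200)
print(allocated)
```

Because the allocation follows eligible voters rather than actual mail-in usage, a small municipality can end up with more allocated voters than eligible voters, which is the situation `flag_turnout_above_1` records.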
File: data/european_elections/final/european_muni_harm.rds (or .csv)
This dataset contains the same European election results harmonized to 2021 municipality boundaries. 42,986 rows × 90 columns.
All variables from the unharmonized version, plus:
| Variable | Type | Description |
|---|---|---|
| `flag_unsuccessful_naive_merge` | integer | Flag (1/0): initial crosswalk merge failed; resolved via year-1 fallback, identity code, or manual mapping. |
| `flag_aggregated` | integer | Flag (1/0): municipality was aggregated from multiple predecessor municipalities. |
| `n_predecessors` | integer | Number of predecessor municipalities that were merged into this 2021 boundary. |
Notes:
- Berlin is aggregated to a single row (AGS 11000000) per year.
- Harmonization converts shares → counts, applies population-weighted crosswalk aggregation, then recomputes shares.
- Crosswalk year mapping: 2009→2009, 2014→2014, 2019→2019, 2024→2020.
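The three harmonization steps (shares → counts, weighted aggregation, recomputed shares) can be sketched as below. This is a toy Python version under stated assumptions: the crosswalk is a list of `(ags, ags_21, weight)` tuples, shares use `number_voters` as denominator as in the European files, and all identifiers and numbers are invented.

```python
def harmonize(rows, crosswalk, party_cols):
    """Map shares onto target boundaries via weighted counts."""
    by_ags = {row["ags"]: row for row in rows}
    totals = {}
    for ags, ags_21, weight in crosswalk:
        source = by_ags[ags]
        target = totals.setdefault(
            ags_21, {"number_voters": 0.0, **{p: 0.0 for p in party_cols}}
        )
        # Step 1 and 2: convert shares to counts and add the weighted counts.
        target["number_voters"] += weight * source["number_voters"]
        for party in party_cols:
            target[party] += weight * source[party] * source["number_voters"]
    # Step 3: recompute shares from the aggregated counts.
    result = {}
    for ags_21, target in totals.items():
        voters = target["number_voters"]
        result[ags_21] = {
            "number_voters": voters,
            **{p: (target[p] / voters if voters else None) for p in party_cols},
        }
    return result


rows = [
    {"ags": "A", "number_voters": 1000, "spd": 0.30},
    {"ags": "B", "number_voters": 500, "spd": 0.60},
    {"ags": "C", "number_voters": 200, "spd": 0.50},
]
# A and B merge into X; C splits 25/75 into Y and Z (hypothetical weights).
crosswalk = [("A", "X", 1.0), ("B", "X", 1.0), ("C", "Y", 0.25), ("C", "Z", 0.75)]
harmonized = harmonize(rows, crosswalk, ["spd"])
print(harmonized)
```

The merged unit X gets a voter-weighted share (0.4) rather than the simple mean of 0.30 and 0.60, which is the reason for going through counts instead of averaging shares directly.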
File: data/covars_municipality/final/ags_area_pop_emp.rds or data/covars_municipality/final/ags_area_pop_emp.csv
This dataset provides yearly time-series data for basic municipality characteristics, harmonized to 2021 administrative boundaries. It is generated by code/crosswalks/01_ags_crosswalk.R.
| Variable | Type | Description |
|---|---|---|
| `ags_21` | numeric | Municipality identifier, harmonized to 2021 boundaries. |
| `ags_name_21` | character | Name of the municipality (2021 definition). |
| `year` | numeric | Year of observation. |
| `area_ags` | numeric | Area of the municipality (km²). Sourced from official Gemeindeverzeichnis files. |
| `population_ags` | numeric | Population of the municipality (thousands). Scaled during crosswalk processing (`01_ags_crosswalk.R`). Sourced from official Gemeindeverzeichnis files. |
| `employees_ags` | numeric | Number of employees subject to social security contributions (thousands). Scaled during crosswalk processing (`01_ags_crosswalk.R`). Data available from 1997 onwards. |
| `pop_density_ags` | numeric | Population density (per km²). Calculated as `(population_ags * 1000) / area_ags` in `01_ags_crosswalk.R`. |
Notes:
- The dataset is a panel covering years from 1990 onwards.
- Area, population, and employee data originate from official Gemeindeverzeichnis sources provided by BBSR.
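Because `population_ags` is stored in thousands, the density formula in the table rescales it before dividing by area. A minimal Python sketch of that arithmetic, with a hypothetical helper name and made-up values (returning `None` for a missing or zero area is an assumption):

```python
def pop_density(population_ags, area_ags):
    """Persons per km²: population is in thousands, area in km²."""
    if population_ags is None or not area_ags:
        return None  # missing or zero area: density undefined (assumption)
    return (population_ags * 1000) / area_ags


# 12.5 thousand inhabitants on 50 km².
density = pop_density(12.5, 50.0)
print(density)
```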
File: data/mayoral_elections/final/mayoral_candidates.rds or data/mayoral_elections/final/mayoral_candidates.csv
This dataset contains candidate-level results for mayoral elections in 7 German states (Bayern, Niedersachsen, Nordrhein-Westfalen, Rheinland-Pfalz, Saarland, Sachsen, Schleswig-Holstein), 1945–2025. One row per candidate per election cycle (wide format, with both Hauptwahl and Stichwahl results in columns). Includes predicted candidate characteristics (gender, migration background).
| Variable | Type | Description |
|---|---|---|
| `ags` | character | Municipality identifier (8-digit AGS), original boundaries. |
| `ags_name` | character | Municipality name. |
| `state` | character | State identifier (2-digit code). |
| `state_name` | character | State name. |
| `election_year` | numeric | Year of the election cycle. |
| `election_date` | Date | Hauptwahl (first round) date. |
| `election_date_sw` | Date | Stichwahl (runoff) date. NA if no runoff. |
| `election_type` | character | Type of election (Buergermeisterwahl, Oberbuergermeisterwahl, Landratswahl, VG-Buergermeisterwahl, SG-Buergermeisterwahl). |
| `has_stichwahl` | logical | TRUE if this election went to a runoff. |
| `eligible_voters` | numeric | Number of eligible voters (Hauptwahl). NA for RLP (percentage-only data). |
| `number_voters` | numeric | Number of voters (Hauptwahl). NA for RLP. |
| `valid_votes` | numeric | Number of valid votes (Hauptwahl). NA for RLP. |
| `invalid_votes` | numeric | Number of invalid votes (Hauptwahl). NA for RLP. |
| `turnout` | numeric | Hauptwahl turnout as proportion (0–1). |
| `turnout_sw` | numeric | Stichwahl turnout. NA if no runoff. |
| `candidate_name` | character | Full candidate name. NA for Bayern (source data has no names). |
| `candidate_last_name` | character | Last name. |
| `candidate_first_name` | character | First name. |
| `candidate_gender` | character | Gender: "m" (male) or "w" (female). From raw data (RLP, SL) or predicted via `gender-guesser`. |
| `candidate_party` | character | Party affiliation or label (e.g., CSU, SPD, Parteilos, EB). |
| `candidate_votes_hw` | numeric | Hauptwahl vote count. NA for RLP. |
| `candidate_voteshare_hw` | numeric | Hauptwahl vote share (0–1). |
| `candidate_rank_hw` | numeric | Hauptwahl rank by votes (1 = most votes). |
| `n_candidates_hw` | numeric | Number of candidates in the Hauptwahl. |
| `candidate_votes_sw` | numeric | Stichwahl vote count. NA if not in runoff. |
| `candidate_voteshare_sw` | numeric | Stichwahl vote share. NA if not in runoff. |
| `candidate_rank_sw` | numeric | Stichwahl rank. NA if not in runoff. |
| `n_candidates_sw` | numeric | Number of candidates in the Stichwahl. |
| `is_winner` | numeric | 1 if the candidate won the election (HW outright or SW), 0 otherwise. |
| `candidate_birth_year` | numeric | Birth year. NI only. |
| `candidate_profession` | character | Profession. NI only. |
| `office_type` | character | Office type. BY and SL only. |
| `candidate_gender_source` | character | Gender data source: "raw" (from election authority data) or "predicted" (from name classification). |
| `candidate_gender_method` | character | Classification method: `raw`, `full_de` (Germany-specific match), `full_global`, `hyphen_first_de`, `hyphen_first_global`, `accent_norm_global`, `manual`. |
| `candidate_gender_prob` | numeric | Confidence score for gender classification (0–1). 1.0 for raw data; 0.99 for `full_de`/`manual`; 0.95 for `hyphen_first_de`; 0.90 for global matches. |
| `candidate_name_origin` | character | Predicted name origin: "german", "turkish", "arabic", "eastern_european", "southern_european". |
| `candidate_name_origin_conf` | numeric | Confidence in origin classification (0.50–0.95). |
| `candidate_name_origin_method` | character | Detection method: "combined" (first+last match), "surname_match", "firstname_match", "surname_pattern", "default". |
| `candidate_migration_bg` | integer | Binary migration background: 0 = German-origin name, 1 = likely non-German origin name. |
| `candidate_migration_bg_prob` | numeric | Probability of migration background (continuous, 0–1). |
| `candidate_local_surname` | integer | Placeholder (NA). Local surname rootedness; awaiting telephone directory data. |
| `candidate_surname_county_share` | numeric | Placeholder (NA). Share of surname occurrences in the focal county. |
| `candidate_surname_n_counties` | integer | Placeholder (NA). Number of counties where this surname appears. |
| `candidate_surname_overrep_ratio` | numeric | Placeholder (NA). Ratio of observed/expected surname frequency in focal county. |
Notes:
- Gender classification uses the Python `gender-guesser` package (Jörg Michael's `nam_dict.txt`, ~70,000 names with country-specific gender codes). Raw gender data from RLP and SL takes precedence over predictions. The lookup is pre-computed by `code/mayoral_elections/04a_build_gender_lookup.py`. Coverage: 100% of named candidates. Cross-validation accuracy: 99.79% against RLP raw data (F1 = 0.989), 100% against SL raw data.
- Migration background is a probabilistic estimate based on name patterns (Turkish, Arabic, Eastern European, Southern European surname/firstname lists and regex patterns). It should not be interpreted as verified migration status. Coverage: all candidates with last names (14,859). Low-confidence classifications (conf < 0.80) are predominantly Eastern European surname endings.
- Local surname rootedness columns are placeholders populated with NA values. Implementation is blocked on telephone directory data.
- Bayern has no candidate names in the source data, so gender, migration background, and local surname columns are all NA for Bayern rows.
Files: data/mayoral_elections/final/mayor_panel.rds (or .csv), data/mayoral_elections/final/mayor_panel_harm.rds
One row per person per election. Tracks individual mayors across multiple terms using unique person IDs. The _harm version maps AGS codes to 2021 municipal boundaries. Includes the same candidate characteristics columns as mayoral_candidates (gender, migration background), carried forward from the winning candidate.
| Variable | Type | Description |
|---|---|---|
| `person_id` | character | Unique mayor identifier (e.g., `p_09_00001` for Bayern, `p_05_00001` for NRW). |
| `ags` | character | Municipality identifier (8-digit AGS), original boundaries. |
| `ags_21` | character | Municipality identifier mapped to 2021 boundaries (`_harm` only). |
| `state` | character | State identifier (2-digit code). |
| `election_year` | numeric | Year of the election. |
| `election_date` | Date | Date of the decisive round (Stichwahl date if applicable). |
| `term_number` | numeric | Sequential term count within (person, municipality), starting at 1. |
| `consecutive_terms` | numeric | Number of consecutive terms (resets if gap > 1 election cycle). |
| `winner_party` | character | Party of the winning candidate. |
| `winner_voteshare` | numeric | Vote share in the decisive round (0–1). |
| `winning_margin` | numeric | Vote share difference between winner and runner-up (0–1). |
| `margin_change` | numeric | Change in winning margin from previous election. |
| `n_candidates` | numeric | Number of candidates in the election. |
| `is_incumbent` | numeric | 1 if `term_number >= 2`, else 0. |
| `next_runs_again` | numeric | 1 if this person wins the next election, 0 if different person wins, NA if no subsequent election. |
| `party_switch` | numeric | 1 if the winning party changed from the previous election. |
| `is_new_party_mayor` | numeric | 1 if this is the first time this party wins in this municipality. |
| `tenure_start` | numeric | Year of the mayor's first election in this municipality. |
| `years_in_office` | numeric | `election_year - tenure_start`. |
| `term_start_date` | Date | Date of first taking office (Bayern: Amtsantritt; others: first election date). |
| `n_terms` | numeric | Total number of terms observed for this person. |
| `total_tenure_years` | numeric | Year span from first to last election. |
| `has_margin_variation` | logical | Whether winning margin varies across this person's terms (useful for FE feasibility). |
| `candidate_gender` | character | Mayor's gender: "m" / "w". From raw data or predicted. NA for Bayern. |
| `candidate_gender_source` | character | "raw" or "predicted". NA for Bayern. |
| `candidate_gender_prob` | numeric | Confidence score (0–1). See `mayoral_candidates` section for details. |
| `candidate_gender_method` | character | Classification method. See `mayoral_candidates` section for details. |
| `candidate_migration_bg` | integer | Binary migration background (0/1). NA for Bayern. |
| `candidate_migration_bg_prob` | numeric | Probability of migration background (0–1). |
| `candidate_name_origin` | character | Fine-grained name origin category. |
| `candidate_name_origin_conf` | numeric | Confidence in origin classification (0.50–0.95). |
| `candidate_name_origin_method` | character | Detection method for origin classification. |
Coverage: 14,452 unique mayors (unharm) / 13,971 (harm), spanning 34,495 / 33,319 person-elections. Candidate characteristics available for 3,089 person-elections (non-Bayern states).
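To make the term-level definitions concrete, the following Python sketch derives `term_number`, `is_incumbent`, `tenure_start`, and `years_in_office` from a list of election wins, following the definitions in the table above. The function name and the sample records are hypothetical, and the sketch ignores gaps between terms (which `consecutive_terms` tracks in the real panel).

```python
def add_term_variables(elections):
    """Number each win within (person, municipality) and derive tenure fields."""
    ordered = sorted(
        elections, key=lambda e: (e["person_id"], e["ags"], e["election_year"])
    )
    term_counts = {}
    first_year = {}
    panel = []
    for election in ordered:
        key = (election["person_id"], election["ags"])
        term_counts[key] = term_counts.get(key, 0) + 1
        first_year.setdefault(key, election["election_year"])
        panel.append({
            **election,
            "term_number": term_counts[key],
            "is_incumbent": int(term_counts[key] >= 2),
            "tenure_start": first_year[key],
            "years_in_office": election["election_year"] - first_year[key],
        })
    return panel


wins = [
    {"person_id": "p_09_00001", "ags": "09162000", "election_year": 2020},
    {"person_id": "p_09_00001", "ags": "09162000", "election_year": 2008},
    {"person_id": "p_09_00001", "ags": "09162000", "election_year": 2014},
]
panel = add_term_variables(wins)
for row in panel:
    print(row["election_year"], row["term_number"], row["is_incumbent"], row["years_in_office"])
```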
Files: data/mayoral_elections/final/mayor_panel_annual.rds (or .csv), data/mayoral_elections/final/mayor_panel_annual_harm.rds
One row per mayor per year. Forward-fills election-level data across the mayor's term. The _harm version maps AGS codes to 2021 boundaries. Candidate characteristics (gender, migration background) are constant within each mayor-term.
| Variable | Type | Description |
|---|---|---|
| `ags` | character | Municipality identifier (8-digit AGS), original boundaries. |
| `ags_21` | character | Municipality identifier mapped to 2021 boundaries (`_harm` only). |
| `year` | numeric | Calendar year. |
| `person_id` | character | Unique mayor identifier. |
| `state` | character | State identifier (2-digit code). |
| `election_year` | numeric | Year of the election that started this term. |
| `election_date` | Date | Date of the decisive round. |
| `term_number` | numeric | Term count within (person, municipality). |
| `winner_party` | character | Party of the mayor (constant within term). |
| `winner_voteshare` | numeric | Vote share in the decisive round (constant within term). |
| `winning_margin` | numeric | Winner-runner-up margin (constant within term). |
| `n_candidates` | numeric | Number of candidates (constant within term). |
| `is_incumbent` | numeric | 1 if `term_number >= 2`. |
| `next_runs_again` | numeric | Whether this person wins the next election. |
| `years_since_election` | numeric | `year - election_year`. |
| `years_to_next_election` | numeric | Years until the next election in this municipality (NA if unknown). |
| `electoral_cycle_pos` | numeric | Position in the electoral cycle, 0 (election year) to <1 (year before next election). |
| `tenure_start` | numeric | Year of first election. |
| `term_start_date` | Date | Date of first taking office. |
| `candidate_gender` | character | Mayor's gender (constant within term). NA for Bayern. |
| `candidate_gender_source` | character | "raw" or "predicted". |
| `candidate_gender_prob` | numeric | Confidence score (0–1). |
| `candidate_gender_method` | character | Classification method. |
| `candidate_migration_bg` | integer | Binary migration background (0/1, constant within term). |
| `candidate_migration_bg_prob` | numeric | Probability of migration background (0–1). |
| `candidate_name_origin` | character | Fine-grained name origin category. |
| `candidate_name_origin_conf` | numeric | Confidence in origin classification. |
| `candidate_name_origin_method` | character | Detection method for origin classification. |
Coverage: 185,112 person-years (unharm) / 179,011 (harm), years 1945–2025.
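The expansion from one election row to yearly rows can be sketched in Python as follows. Two points are assumptions rather than documented behavior: `electoral_cycle_pos` is computed here as `years_since_election` divided by the cycle length (which matches the stated range of 0 to <1), and the year of the next election is assigned to the following term. The function name and the `last_observed_year` argument are hypothetical.

```python
def expand_term(election_year, next_election_year=None, last_observed_year=None):
    """Forward-fill one term into yearly rows with cycle-position variables."""
    if next_election_year is None and last_observed_year is None:
        raise ValueError("need next_election_year or last_observed_year")
    if next_election_year is not None:
        end = next_election_year          # next election year starts the next term
    else:
        end = last_observed_year + 1      # open-ended term: fill to the last year
    rows = []
    for year in range(election_year, end):
        since = year - election_year
        if next_election_year is None:
            to_next, position = None, None  # NA when the next election is unknown
        else:
            to_next = next_election_year - year
            position = since / (next_election_year - election_year)
        rows.append({
            "year": year,
            "election_year": election_year,
            "years_since_election": since,
            "years_to_next_election": to_next,
            "electoral_cycle_pos": position,
        })
    return rows


term = expand_term(2014, next_election_year=2020)
open_term = expand_term(2020, last_observed_year=2022)
for row in term:
    print(row["year"], row["years_since_election"], row["electoral_cycle_pos"])
```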
The database is a work in progress. If you have any suggestions, comments, or issues, please feel free to email us or file an issue.
Please cite the accompanying paper when using this dataset:
Heddesheimer, Vincent, Hanno Hilbig, Florian Sichart, & Andreas Wiedemann. 2025. GERDA: The German Election Database. Scientific Data, 12: 618.
@article{Heddesheimer2025GERDA,
author = {Vincent Heddesheimer and Hanno Hilbig and Florian Sichart and Andreas Wiedemann},
doi = {10.1038/s41597-025-04811-5},
issn = {2052-4463},
issue = {1},
journal = {Scientific Data},
month = {4},
pages = {618},
title = {GERDA: The German Election Database},
volume = {12},
url = {https://www.nature.com/articles/s41597-025-04811-5},
year = {2025}
}