This repository provides tools for researchers working with population well-being data. It contains standardized data processing scripts for several well-being-related datasets, helping researchers clean and prepare data for analysis consistently.
This project is developed by the Population Well-being Lab at the University of Toronto, under the direction of Professor Felix Cheung.
Gallup World Poll: a tool for generating customized data cleaning scripts for Gallup World Poll data.
- Clone/download this repository to your local machine
- Generate a cleaning script using the script generator 1_GallupWorldPoll_cleaningScript_generation.rmd
- Copy the generated script to your own project repository
- Run the cleaning script in your project to process your Gallup World Poll data
View detailed Gallup World Poll documentation
World Bank: tools for downloading and processing World Bank indicators related to well-being and development.
- Clone/download this repository to your local machine
- Run the download script 1_WorldBank_dataDownload.Rmd, first modifying its indicator list if you need different indicators
- Use the processed data in your research projects
View detailed World Bank documentation
This repository is designed to:
- Standardize data preparation across research projects
- Save time by providing ready-to-use data processing scripts
- Improve reproducibility by using consistent data cleaning approaches
- Enable customization while maintaining core processing standards
Researchers can:
- Use the scripts as-is for standard processing
- Customize parameters to fit specific research needs
- Contribute improvements or extensions to the processing scripts
- Request support for additional data sources
This project is licensed under the GNU General Public License v3.0 (GPL-3.0). This means:
- You can freely use, modify, and distribute this software
- Any derivative work must also be distributed under the same license (GPL-3.0)
- You must include the original copyright notice and license text
- There is no warranty for this software
For more details, see the full license text on the GNU website.
This is an ongoing project, and we plan to add support for more well-being-related datasets.
Want to contribute or request a new dataset? Please open an issue or submit a pull request.
Prerequisites for the Gallup World Poll script generator:
- R and RStudio installed
- Required R packages: dplyr, glue, rio
- The extracted metadata file (already provided; you do not need to run the extraction script yourself)
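Before opening the generator, it can help to check which of these packages are still missing. The snippet below is a minimal base-R sketch; the helper name `find_missing_packages` is hypothetical and not part of this repository.

```r
# Return the subset of `pkgs` that is not installed.
# requireNamespace() returns FALSE (without erroring) for missing packages.
find_missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

gallup_pkgs <- c("dplyr", "glue", "rio")
missing_pkgs <- find_missing_packages(gallup_pkgs)
if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install them from CRAN
}
```

The same helper works for the World Bank prerequisites, e.g. `find_missing_packages(c("WDI", "tidyverse", "countrycode"))`.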
To generate a cleaning script:
- Open gallup_world_poll/scripts/1_GallupWorldPoll_cleaningScript_generation.rmd in RStudio or VS Code.
- Configure the parameters in the "User Parameters" section, for example:
  - Set your project name
  - Specify your data file name
  - Select the variables to extract
  - Choose whether to include affect calculations
- Run the script to generate your custom cleaning code.
- Find the generated script in the gallup_world_poll/scripts/generated_scripts/ folder. It is named according to the project name and data file name you specified.
- Copy the generated script to your own project repository for further analysis.
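As an illustration of the configuration step, the sketch below shows what a filled-in "User Parameters" section might look like, plus a few sanity checks you could run before generating. All object names (`project_name`, `data_file_name`, `selected_vars`, `include_affects`), the helper `validate_params`, and the values are hypothetical; the actual parameter names are documented inside the RMD file.

```r
# Hypothetical "User Parameters" values; the real names live in the RMD file.
project_name    <- "MyProject"
data_file_name  <- "my_gallup_data.rds"   # hypothetical data file name
selected_vars   <- c("var_a", "var_b")    # hypothetical variable codes
include_affects <- TRUE

# Return a character vector describing any problems (empty if none).
validate_params <- function(project_name, data_file_name, selected_vars, include_affects) {
  problems <- character(0)
  if (!is.character(project_name) || length(project_name) != 1 || !nzchar(project_name)) {
    problems <- c(problems, "project_name must be a single non-empty string")
  }
  if (!is.character(data_file_name) || length(data_file_name) != 1 ||
      !grepl("\\.[A-Za-z0-9]+$", data_file_name)) {
    problems <- c(problems, "data_file_name should be a single file name with an extension")
  }
  if (length(selected_vars) == 0) {
    problems <- c(problems, "select at least one variable")
  }
  if (!is.logical(include_affects) || length(include_affects) != 1 || is.na(include_affects)) {
    problems <- c(problems, "include_affects must be TRUE or FALSE")
  }
  problems
}

issues <- validate_params(project_name, data_file_name, selected_vars, include_affects)
if (length(issues) > 0) stop(paste(issues, collapse = "; "))
```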
For detailed instructions, parameter explanations, and troubleshooting information, please refer to the documentation within the RMD file.
Note: For an example of a generated cleaning script, see gallup_world_poll/scripts/generated_scripts/GWP_cleaningCode_Example_250414.Rmd
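Copying the generated script into your project can also be scripted. This is a sketch under assumptions: the helper `copy_latest_script` is hypothetical, it treats the most recently modified `.Rmd`/`.rmd` file in the folder as the one you just generated, and the destination is whatever folder your project uses.

```r
# Copy the newest .Rmd/.rmd file from `generated_dir` into `project_dir`.
# Returns the destination path; does not overwrite an existing file.
copy_latest_script <- function(generated_dir, project_dir) {
  scripts <- list.files(generated_dir, pattern = "\\.[Rr]md$", full.names = TRUE)
  if (length(scripts) == 0) stop("No generated .Rmd scripts found in ", generated_dir)
  # Pick the file with the latest modification time
  latest <- scripts[which.max(file.mtime(scripts))]
  dir.create(project_dir, recursive = TRUE, showWarnings = FALSE)
  dest <- file.path(project_dir, basename(latest))
  copied <- file.copy(latest, dest, overwrite = FALSE)
  if (!copied) warning("Not copied; ", dest, " may already exist")
  dest
}
```

For example, `copy_latest_script("gallup_world_poll/scripts/generated_scripts", "path/to/your_project")`. Because the example script also lives in that folder, check the returned path before relying on it.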
The script expects the following folder structure:
gallup_world_poll/
├── data/
│   ├── metadata/    # Contains variable metadata
│   │   └── Gallup_World_Poll_XXXXXX_Attributes.rds
│   └── raw/         # Place raw Gallup data files here if 0_GallupWorldPoll_attributes_extraction.rmd needs to be run
└── scripts/
    ├── 0_GallupWorldPoll_attributes_extraction.rmd       # Script for extracting metadata (already executed; you don't need to run this)
    ├── 1_GallupWorldPoll_cleaningScript_generation.rmd   # The generator script
    ├── generated_scripts/                                # Will contain generated cleaning scripts
    └── templates/                                        # Contains template files for script generation
        ├── affects_calculation_template.R
        ├── binary_conversion_template.R
        ├── cleaning_script_template.R
        └── na_conversion_template.R
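A quick way to confirm this layout before running the generator is sketched below. The helper `check_gallup_layout` is hypothetical; it matches the metadata file by pattern because the file name shown above contains a placeholder (`XXXXXX`).

```r
# Report missing folders and any metadata files found under `root`.
check_gallup_layout <- function(root = "gallup_world_poll") {
  expected_dirs <- file.path(root, c("data/metadata", "scripts",
                                     "scripts/generated_scripts", "scripts/templates"))
  missing_dirs <- expected_dirs[!dir.exists(expected_dirs)]
  # Match the metadata file by pattern rather than by exact name
  metadata_files <- list.files(file.path(root, "data", "metadata"),
                               pattern = "^Gallup_World_Poll_.*_Attributes\\.rds$",
                               full.names = TRUE)
  list(missing_dirs = missing_dirs, metadata_files = metadata_files)
}
```

If `metadata_files` is non-empty, the file can be loaded with `readRDS()`; its internal structure is not described in this README.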
The generator produces an R Markdown file containing:
- Code to extract the selected variables
- Conversion of non-substantive responses to NA
- Recoding of binary responses (Yes/No to 1/0)
- Optional calculation of affect indices
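To make these transformations concrete, here is a base-R sketch of each. It is not the code in the repository's templates: the response codes (1 = Yes, 2 = No, 3 and 4 = non-substantive) are hypothetical, and the affect index is assumed to be the share of "Yes" answers across items. Check the metadata and the generated script for the actual codes and formulas.

```r
# Hypothetical codes for non-substantive responses (not from Gallup docs)
na_codes <- c(3, 4)

# Non-substantive responses become NA
to_na <- function(x, codes = na_codes) {
  x[x %in% codes] <- NA
  x
}

# Yes -> 1, No -> 0, anything else (including NA) -> NA
yes_no_to_binary <- function(x, yes = 1, no = 2) {
  ifelse(x == yes, 1L, ifelse(x == no, 0L, NA_integer_))
}

# Hypothetical affect index: mean of binary items per respondent,
# ignoring missing items (the repo's template may differ)
affect_index <- function(df, items) {
  rowMeans(df[, items, drop = FALSE], na.rm = TRUE)
}

survey <- data.frame(enjoy = c(1, 2, 3), smile = c(1, 1, 4))
survey[] <- lapply(survey, function(col) yes_no_to_binary(to_na(col)))
survey$pos_affect <- affect_index(survey, c("enjoy", "smile"))
```

A respondent with every item missing gets `NaN` here, which you may prefer to recode to `NA`.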
Prerequisites for the World Bank download script:
- R and RStudio installed
- Required R packages: WDI, tidyverse, countrycode
To download the data:
- Open world_bank/scripts/1_WorldBank_dataDownload.Rmd in RStudio or VS Code.
- If needed, modify the indicators list to include the World Bank indicators relevant to your research.
- Run the script to download and process the data.
- Find the processed data in the world_bank/data/processed/ folder.
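The download step can be sketched with the `WDI` package's `WDI()` function. The indicator codes below are examples (GDP per capita and life expectancy), the helpers `download_wb` and `drop_aggregates` are hypothetical, and the repository's own script may clean the data differently, for instance using `countrycode`.

```r
# Example indicator codes; replace with the ones your research needs
indicators <- c("NY.GDP.PCAP.KD",   # GDP per capita (constant US$)
                "SP.DYN.LE00.IN")   # Life expectancy at birth, total (years)

# Requires the WDI package and internet access when called
download_wb <- function(indicators, start = 2000, end = 2020) {
  # extra = TRUE adds metadata columns such as region and income
  WDI::WDI(country = "all", indicator = indicators,
           start = start, end = end, extra = TRUE)
}

# Assumption: with extra = TRUE, aggregate rows (e.g. "World") have
# region == "Aggregates"; rows with a missing region are dropped too.
drop_aggregates <- function(df) {
  df[!is.na(df$region) & df$region != "Aggregates", , drop = FALSE]
}
```

`wb <- drop_aggregates(download_wb(indicators))` performs the download and removes aggregate rows, leaving country-level observations.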
For detailed information on available indicators and customization options, please refer to the documentation within the RMD file.
For questions or support regarding this repository, please contact:
- Kenith Chan
- GitHub: ken1th