From 427cfe7fadb3c6e5eccae81d0381a7968d0bec51 Mon Sep 17 00:00:00 2001 From: Lars Vilhuber Date: Wed, 17 Dec 2025 08:42:49 -0500 Subject: [PATCH 1/4] First draft to convert it --- INSTALL.md | 37 +++++++++++++++++++++++++++++++++++++ jpal_codebook.ado | 6 ++++++ jpal_codebook.pkg | 12 ++++++++++++ stata.toc | 5 +++++ 4 files changed, 60 insertions(+) create mode 100644 INSTALL.md create mode 100644 jpal_codebook.pkg create mode 100644 stata.toc diff --git a/INSTALL.md b/INSTALL.md new file mode 100644 index 0000000..257060b --- /dev/null +++ b/INSTALL.md @@ -0,0 +1,37 @@ +# Installation Instructions + +## Installing the Data Publication Codebook Package + +You can install this package directly from GitHub using Stata's `net install` command: + +```stata +net install jpal_codebook, from("https://raw.githubusercontent.com/J-PAL/Data_publication_codebook/main/") replace +``` + +Alternatively, if you have downloaded the files locally, navigate to the directory containing the package files and run: + +```stata +net install jpal_codebook, from("`c(pwd)'") replace +``` + +## Usage + +After installation, you can use the command: + +```stata +jp_codebook "path/to/your/data/directory" +``` + +This will create a codebook Excel file in your current working directory with two tabs: + +- "Variables": Information about each variable found in the .dta files +- "Value labels": Value label mappings for encoded variables + +## Files Included + +- `jpal_codebook.ado` - Main program file +- `jpal_codebook.do` - Alternative do-file version +- `jpal_codebook.pkg` - Package description file +- `stata.toc` - Table of contents file +- `README.md` - Documentation +- `LICENSE` - MIT License diff --git a/jpal_codebook.ado b/jpal_codebook.ado index 7890b56..f127a3a 100644 --- a/jpal_codebook.ado +++ b/jpal_codebook.ado @@ -1,5 +1,11 @@ +*! version 1.0.0 17dec2025 +*! Data Publication Codebook +*! Author: Jack Cavanagh +*! 
Create codebooks for directories containing Stata datasets + capture program drop jp_codebook program jp_codebook + version 13 syntax anything(name=search_directory id="path of directory to search") diff --git a/jpal_codebook.pkg b/jpal_codebook.pkg new file mode 100644 index 0000000..f42586a --- /dev/null +++ b/jpal_codebook.pkg @@ -0,0 +1,12 @@ +v 3 +d Data Publication Codebook +d Program to make a codebook for a directory that contains Stata datasets +d Distribution-Date: 20251217 +d +d Author: Jack Cavanagh +d Support: email +d +f jpal_codebook.ado +f jpal_codebook.do +f README.md +f LICENSE \ No newline at end of file diff --git a/stata.toc b/stata.toc new file mode 100644 index 0000000..2358c50 --- /dev/null +++ b/stata.toc @@ -0,0 +1,5 @@ +v 3 +d J-PAL Data Publication Tools +d This directory contains Stata packages developed by J-PAL +d +p jpal_codebook Data Publication Codebook \ No newline at end of file From c7df23e3bb2251c816ed9a5a7aa3be0aee83d1bd Mon Sep 17 00:00:00 2001 From: Lars Vilhuber Date: Wed, 17 Dec 2025 08:47:48 -0500 Subject: [PATCH 2/4] Streamlined --- README.md | 41 +++++++++- jpal_codebook.do | 192 ----------------------------------------------- 2 files changed, 37 insertions(+), 196 deletions(-) delete mode 100644 jpal_codebook.do diff --git a/README.md b/README.md index 1790e3e..e7dda47 100644 --- a/README.md +++ b/README.md @@ -1,16 +1,49 @@ # Data_publication_codebook + Stata program to make a codebook for a directory that contains stata datasets. ## Contents: - This repo contains two main files: "jpal_codebook.do" and "jpal_codebook.ado." The outputs are identical, the difference is just in use. Adding the "ado" file to your personal ado library will make the command "jp_codebook" available to you to run. If you do not want to add the ado file, the do file can be run by itself. + + This repo provides a command `jpal_codebook` via [`jpal_codebook.ado`](jpal_codebook.ado). 
## Use: - - ado: run the command "jp_codebook" followed by a string containing the path of the directory you would like to make a codebook for. - - do: insert a string containing the path of the directory you would like to make a codebook for in the space noted in the first line and then run the entire do-file. + - Install the ado file (you can remove it later if you don't want to keep it). + - Run the command `jp_codebook path_to_dir`, where `path_to_dir` is a string containing the path of the directory you would like to make a codebook for. ## Output: - The program outputs an excel file containing a codebook to the current working directory. The excel file has two tabs: - "Variables" contains each distinct variable found in the .dta datasets in the specified folder, along with information about it, including but not limited to label, value label, # of distinct values, and mean, median, etc. for numeric variables - - "Value labels" contains the value labels used in encoded variables in the dataset, and maps their #s to the underlying values. + - "Value labels" contains the value labels used in encoded variables in the dataset, and maps their numbers to the underlying values. 
+ +## Installation Instructions + +You can install this package directly from GitHub using Stata's `net install` command: + +```stata +global githubbase "https://raw.githubusercontent.com/" +net install jpal_codebook, from("$githubbase/J-PAL/Data_publication_codebook/main/") replace +``` + +Alternatively, if you have downloaded the files locally, navigate to the directory containing the package files and run: + +```stata +net install jpal_codebook, from("`c(pwd)'") replace +``` +## De-installation (optional) + +To uninstall the package, you can use the following command in Stata: + +```stata +ado uninstall jpal_codebook +``` + +## Files Included + +- `jpal_codebook.ado` - Main program file +- `jpal_codebook.do` - Alternative do-file version +- `jpal_codebook.pkg` - Package description file +- `stata.toc` - Table of contents file +- `README.md` - Documentation +- `LICENSE` - MIT License diff --git a/jpal_codebook.do b/jpal_codebook.do deleted file mode 100644 index 4328c1f..0000000 --- a/jpal_codebook.do +++ /dev/null @@ -1,192 +0,0 @@ -local search_directory = "" //INSERT THE PATH TO THE FOLDER YOU WOULD LIKE TO MAKE A CODEBOOK OF - - -************ Getting list of files to loop through *********************** - tempfile file_list - filelist, directory(`search_directory') pattern("*.dta") - gen temp="/" - egen file_path = concat(dirname temp filename) - keep file_path - save `file_list' - - qui count - local total_files = `r(N)' - forvalues i=1/`r(N)' { - local file_`i' = file_path[`i'] - } - - - -********** Opening output file ************* - capture file close cb - file open cb using codebook_temp.csv, write replace text - foreach header in "dataset" "name" "min" "max" "median" "mean" "sd"{ - file write cb _char(34) `"`header'"' _char(34) "," - } - file write cb _n - loc num_lab_tot = 0 - loc boo = "0" - forvalues i = 1/`total_files'{ - di "Using `file_`i''" - use "`file_`i''", clear - preserve - uselabel, clear - count - loc num_lab = `r(N)' - local num_lab_tot = 
`num_lab_tot' + `num_lab' - if `num_lab' == 0{ - if `i' == 1 | "`boo'" == "1"{ - local boo = "1" - } - restore - } - else{ - drop trunc - tempfile lab_temp - save "`lab_temp'", replace - restore - if `i' == 1 | "`boo'" == "1"{ - preserve - local boo = "0" - use "`lab_temp'", clear - tempfile labl - save "`labl'", replace - restore - } - else{ - preserve - use "`labl'", clear - append using "`lab_temp'" - duplicates drop - save "`labl'", replace - restore - } - } - *** Exporting vars that come with built-in describe command - preserve - describe, replace clear - drop if substr(name, 1, 2) == "__" - tempfile cb_temp - save "`cb_temp'", replace - restore - if `i' == 1{ - preserve - use "`cb_temp'", clear - tempfile cb - save "`cb'", replace - restore - } - else{ - preserve - use "`cb'", clear - append using "`cb_temp'" - tempfile cb - duplicates drop name, force - save "`cb'", replace - restore - } - foreach var of varlist * { - tempvar nm2 - qui egen `nm2' = total(!missing(`var')) - if `nm2' == 0{ - drop `var' - continue - } - drop `nm2' - ***First column: Dataset name - file write cb _char(34) `"`file_`i''"' _char(34) "," - ***Second column: Var name - file write cb _char(34) `"`var'"' _char(34) "," - capture decode `var', gen(_`var') - if _rc==0{ - drop `var' - ren _`var' `var' - } - capture confirm string var `var' - if _rc==0 { - forvalues iter = 1/5{ - file write cb _char(34) "N/A" _char(34) "," - } - } - else{ - qui: sum `var', det - local min = "`r(min)'" - file write cb _char(34) "`min'" _char(34) "," - local max = "`r(max)'" - file write cb _char(34) "`max'" _char(34) "," - local median = "`r(p50)'" - file write cb _char(34) "`median'" _char(34) "," - local mean = "`r(mean)'" - file write cb _char(34) "`mean'" _char(34) "," - local stdev = "`r(sd)'" - file write cb _char(34) "`stdev'" _char(34) "," - } - file write cb _n - } - di "File `i' done" - } - - file close cb - import delimited "codebook_temp.csv", varnames(1) clear - duplicates drop name, force - 
merge 1:1 name using "`cb'" - drop if _merge == 2 - drop _merge - drop position v8 - duplicates drop name, force - order varlab vallab isnumeric type format, after(name) - replace dataset = subinstr(dataset,"`search_directory'","",.) - - foreach x of varlist min-sd{ - qui: destring `x', force replace - qui: replace `x' = round(`x', .001) - } - sort dataset - qui: count - local var_n = `r(N)' + 1 - export excel using "codebook.xlsx", firstrow(variables) sheet("Variables") locale(C) replace - rm "codebook_temp.csv" - - if `num_lab_tot' >0 { - use "`labl'",clear - qui: count - local val_n = `r(N)' + 1 - export excel using "codebook.xlsx", firstrow(variables) sheet("Value labels") locale(C) - } - //Formatting - putexcel set codebook, modify sheet("Variables") - putexcel A1:L`var_n', overwritefmt border(all) - putexcel save - if `num_lab_tot' >0 { - putexcel set codebook, modify sheet("Value labels") - putexcel A1:K`val_n', overwritefmt border(right) - putexcel save - } - - mata:b = xl() - mata:b.load_book("codebook.xlsx") - mata:b.set_sheet("Variables") - mata:b.set_column_width(2,2,33) //make title column widest - mata:b.set_column_width(3,3,75) //make each column fit title - mata:b.close_book() - - if `num_lab_tot' >0 { - mata:b = xl() - mata:b.load_book("codebook.xlsx") - mata:b.set_sheet("Value labels") - mata:b.set_column_width(1,1,33) //make title column widest - mata:b.set_column_width(3,3,75) //make each column fit title - mata:b.close_book() - } - - - -di "" -di "---------------------------------------------------------------------" -di "" -di "Finished" -di "" -di "---------------------------------------------------------------------" -di "" - - From 74a6277c1d70463337b3fc45dd11eb2a64c78118 Mon Sep 17 00:00:00 2001 From: Lars Vilhuber Date: Wed, 17 Dec 2025 08:48:10 -0500 Subject: [PATCH 3/4] Removed redundant file --- INSTALL.md | 37 ------------------------------------- 1 file changed, 37 deletions(-) delete mode 100644 INSTALL.md diff --git a/INSTALL.md 
b/INSTALL.md deleted file mode 100644 index 257060b..0000000 --- a/INSTALL.md +++ /dev/null @@ -1,37 +0,0 @@ -# Installation Instructions - -## Installing the Data Publication Codebook Package - -You can install this package directly from GitHub using Stata's `net install` command: - -```stata -net install jpal_codebook, from("https://raw.githubusercontent.com/J-PAL/Data_publication_codebook/main/") replace -``` - -Alternatively, if you have downloaded the files locally, navigate to the directory containing the package files and run: - -```stata -net install jpal_codebook, from("`c(pwd)'") replace -``` - -## Usage - -After installation, you can use the command: - -```stata -jp_codebook "path/to/your/data/directory" -``` - -This will create a codebook Excel file in your current working directory with two tabs: - -- "Variables": Information about each variable found in the .dta files -- "Value labels": Value label mappings for encoded variables - -## Files Included - -- `jpal_codebook.ado` - Main program file -- `jpal_codebook.do` - Alternative do-file version -- `jpal_codebook.pkg` - Package description file -- `stata.toc` - Table of contents file -- `README.md` - Documentation -- `LICENSE` - MIT License From 8dd404b4032987c22855ae1a72a8d494d3cb8c97 Mon Sep 17 00:00:00 2001 From: Lars Vilhuber Date: Wed, 17 Dec 2025 08:48:51 -0500 Subject: [PATCH 4/4] Cleanup --- .DS_Store | Bin 6148 -> 0 bytes 1 file changed, 0 insertions(+), 0 deletions(-) delete mode 100644 .DS_Store diff --git a/.DS_Store b/.DS_Store deleted file mode 100644 index 5008ddfcf53c02e82d7eee2e57c38e5672ef89f6..0000000000000000000000000000000000000000 GIT binary patch literal 0 HcmV?d00001