Authors: Sneha Roy (Matriculation number: 50078578), Biswajit Palit (Matriculation number: 50071214)
This project has been jointly conducted by us. Its main idea is to test the efficacy of different econometric methods for difference-in-differences estimation when the treatment is adopted in a staggered fashion. For a panel dataset, we analyse the ability of each method to deliver a correctly sized test (a Type I error rate of 5%, since the chosen significance level is 5%) and its power to detect real effects. The methods that we test are Ordinary Least Squares (OLS), Cluster-Robust Standard Errors (CRSE), Residual Aggregation, Wild Cluster Bootstrapping and Feasible Generalised Least Squares (FGLS).
To this end, we have conducted extensive Monte Carlo simulations with a variety of data generation processes (homogeneous AR(1), heterogeneous AR(1) and homogeneous MA(1)) across 50 states and 20 time periods. We also cross-validated the efficacy of these methods on a real-world dataset, the Current Population Survey (CPS), using the weekly earnings of women in the 50 US states for the years 1980 to 2000 inclusive.
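The idea behind the size comparison can be illustrated with a minimal sketch. It is not the project's code: it assumes a simplified design with a common treatment date instead of staggered adoption, a placebo treatment with no true effect, and an assumed AR(1) coefficient of 0.8. The function names `ar1_series`, `placebo_rejects` and `rejection_rates` are hypothetical. It contrasts a naive OLS standard error that treats all observations as independent with a test on state-level post-minus-pre differences.

```python
import random
from statistics import NormalDist, fmean, variance


def ar1_series(rng, n_periods, rho):
    """Draw one AR(1) error series e_t = rho * e_{t-1} + u_t with u_t ~ N(0, 1)."""
    # Start from the stationary distribution so early periods are not special.
    e = rng.gauss(0.0, 1.0) / (1.0 - rho**2) ** 0.5
    series = []
    for _ in range(n_periods):
        series.append(e)
        e = rho * e + rng.gauss(0.0, 1.0)
    return series


def placebo_rejects(rng, n_states, n_periods, rho, crit):
    """Simulate one panel with no true effect; return (naive, aggregated) rejections."""
    half_t = n_periods // 2  # assumed common treatment date (not staggered)
    treated = set(rng.sample(range(n_states), n_states // 2))
    panel = [ar1_series(rng, n_periods, rho) for _ in range(n_states)]

    # Naive OLS on the saturated 2x2 model: cell means and an iid standard error.
    cells = {(g, p): [] for g in (0, 1) for p in (0, 1)}
    for i, series in enumerate(panel):
        for t, y in enumerate(series):
            cells[(int(i in treated), int(t >= half_t))].append(y)
    m = {k: fmean(v) for k, v in cells.items()}
    beta = (m[1, 1] - m[1, 0]) - (m[0, 1] - m[0, 0])
    ssr = sum((y - m[k]) ** 2 for k, v in cells.items() for y in v)
    sigma2 = ssr / (n_states * n_periods - 4)
    se_naive = (sigma2 * sum(1.0 / len(v) for v in cells.values())) ** 0.5

    # Aggregation: one post-minus-pre difference per state, then a two-sample test.
    diffs = {0: [], 1: []}
    for i, series in enumerate(panel):
        d = fmean(series[half_t:]) - fmean(series[:half_t])
        diffs[int(i in treated)].append(d)
    beta_agg = fmean(diffs[1]) - fmean(diffs[0])
    se_agg = (
        variance(diffs[1]) / len(diffs[1]) + variance(diffs[0]) / len(diffs[0])
    ) ** 0.5
    return abs(beta / se_naive) > crit, abs(beta_agg / se_agg) > crit


def rejection_rates(num_simulations=200, n_states=50, n_periods=20, rho=0.8, seed=42):
    """Share of simulations in which each test rejects the (true) null at 5%."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(0.975)  # normal approximation to the critical value
    naive = agg = 0
    for _ in range(num_simulations):
        r_naive, r_agg = placebo_rejects(rng, n_states, n_periods, rho, crit)
        naive += r_naive
        agg += r_agg
    return naive / num_simulations, agg / num_simulations


naive_rate, agg_rate = rejection_rates()
print(f"naive OLS rejection rate:  {naive_rate:.3f}")
print(f"aggregated rejection rate: {agg_rate:.3f}")
```

Because the treatment effect is zero by construction, a correctly sized test should reject in roughly 5% of the simulations. With positively autocorrelated errors the naive standard error understates the sampling variability, so its rejection rate should be noticeably higher than that of the aggregated test.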
Finally, we apply the best method that we identify (the one that gives a consistent test size and the highest power) to estimate the treatment effect of WTO accession on the agricultural imports of the Article XII members (the developing countries that acceded to the WTO after 1995 in a staggered fashion).
The detailed methodology is given in the paper "did.pdf", which is generated automatically once the project is built. A presentation called "did_pres.pdf", containing the highlights of our results, is also generated.
Our results are completely reproducible, as we have accounted for all sources of randomness in the project. The seed 42 is used globally across all the functions.
We have kept the number of simulations for all the analyses at 200 to keep the run time below an hour. To run the analysis with more simulations, update the argument 'num_simulations' in the data_info.yaml file accordingly.
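How a fixed seed and a configurable number of simulations interact can be sketched as follows. This is only an illustration under assumptions: `run_simulations` is a hypothetical function, and it passes the seed to a local generator, which may differ from how the project sets its seed globally.

```python
import random


def run_simulations(num_simulations, seed=42):
    """Return one pseudo-random draw per simulation from a seeded generator."""
    rng = random.Random(seed)  # local generator, so no hidden global state
    return [rng.gauss(0.0, 1.0) for _ in range(num_simulations)]


first = run_simulations(num_simulations=200)
second = run_simulations(num_simulations=200)
longer = run_simulations(num_simulations=500)

# Same seed and same number of simulations: identical draws.
print(first == second)
# More simulations with the same seed extend the earlier sequence.
print(longer[:200] == first)
```

Raising 'num_simulations' in this sketch therefore adds draws without changing the ones already made, so results stay comparable across run lengths.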
We have included error handling in the functions, and the corresponding tests are in the respective test folders.
To get started, create and activate the environment with
$ conda/mamba env create -f environment.yml
$ conda activate did
$ pip install kaleido==0.1.0.post1
To build the project, type
$ pytask
To test the functions, type
$ pytest
- SRC : The SRC folder contains all the code for building the project and is structured into four subfolders.
  - data : This folder contains the CPS micro data (cps_00006.csv.gz) and the raw data of the empirical project (Trade_Indices_E_All_Data_NOFLAG.csv.gz).
  - data_management :
    - cps_clean_data.py : This file contains the functions to clean and aggregate the micro-level CPS data.
    - empirical_clean_data.py : This file contains the functions to clean and prepare the empirical dataset for the regression analysis.
    - data_info.yaml : This file defines all the arguments needed to build the project, in the form of a dictionary.
    - task_data_management.py : This file contains all the tasks of this folder for execution.
  - monte_carlo_analysis :
    - synthetic_data_gen.py : This file contains the synthetic data generation functions.
    - subfunctions.py : This file contains all the subfunctions required to run the Monte Carlo simulation functions.
    - homogenous_AR1.py : This file contains the Monte Carlo simulations of all the methods for the homogeneous AR(1) data generation process.
    - heterogenous_AR1.py : This file contains the Monte Carlo simulations of all the methods for the heterogeneous AR(1) data generation process.
    - homogenous_MA1.py : This file contains the Monte Carlo simulations of all the methods for the homogeneous MA(1) data generation process.
    - cps_monte_carlo.py : This file contains the Monte Carlo simulations of all the methods for the CPS dataset.
    - empirical_regression.py : This file contains the regression analysis of the empirical dataset using the OLS and the CRSE methods.
    - task_analysis.py : This file contains all the tasks of this folder for execution.
  - final :
    - plot.py : This file contains the functions that plot the Type I error and the power of all the methods for all the data generation processes.
    - task_final.py : This file contains all the tasks of this folder for execution.
  - config.py : This file contains all the configurations and lists used for executing the tasks.
  - utilities.py : This file contains the function to read the .yaml files.
- tests : The tests folder contains all the tests for the functions.
  - analysis : This folder has 2 files, test_data_gen.py and test_subfunctions.py.
  - data_management : This folder has 2 files, test_cps_clean_data.py and test_empirical_clean_data.py.
- paper : The paper folder contains the .tex files for generating the PDFs (did.tex and did_pres.tex) and task_paper.py, which contains all the tasks for generating the LaTeX outputs.
- BLD : The BLD folder is generated when the project is built. It has 2 subfolders, latex and python.
  - latex : It contains the generated paper and presentation.
  - python : It is structured as follows :
    - cps : This folder contains the processed CPS data and all the results pertaining to the CPS dataset.
    - empirical_study : This folder contains the processed empirical project data and all the results pertaining to the empirical project dataset.
    - tables : It contains 3 subfolders, one for each data generation process, and each of these folders contains the table outputs of the Monte Carlo analysis for each method.
    - figures : It contains 2 folders, type_1 and power, which contain the Type I error and the power convergence plots, respectively, for each data generation process.
This project was created with cookiecutter and the econ-project-templates.