User Collection Pipeline
Robert Netzorg edited this page Apr 29, 2017 · 2 revisions
How to Run User Collection Pipeline:
- Run `python repos_from_github.py`
  - To collect a different range of repos, change the stars range, e.g. from stars:1000..1100 to stars:1100..1200
  - Generates a list of repos in a file called repos.csv
- Run `python contributors.py`
  - Reads repos.csv and queries the GitHub API for each repo's list of contributors
- Run `python getusers.py`
  - Generates a list of users in the file users.txt
- To get unique users, run `cat users.txt | sort -n | uniq > usersuniq.txt`
  - Diff against older files (such as usersuniq_1.txt) to ensure there is no overlap
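
The star-range change in the first step could be scripted rather than edited by hand. The following is only a sketch: it assumes repos_from_github.py queries GitHub's repository search endpoint with a `stars:LOW..HIGH` qualifier, which this page does not show. `star_windows` and `search_url` are hypothetical helpers, not functions from the repo.

```python
from urllib.parse import urlencode

# Assumption: the script uses GitHub's repository search endpoint.
SEARCH_ENDPOINT = "https://api.github.com/search/repositories"


def star_windows(start, stop, step):
    """Yield (low, high) star windows, e.g. (1000, 1100), (1100, 1200)."""
    for low in range(start, stop, step):
        yield low, min(low + step, stop)


def search_url(low, high, per_page=100, page=1):
    """Build a search URL for repos whose star count is in low..high."""
    query = {"q": f"stars:{low}..{high}", "per_page": per_page, "page": page}
    return f"{SEARCH_ENDPOINT}?{urlencode(query)}"


if __name__ == "__main__":
    # Print one URL per window instead of editing the range by hand.
    for low, high in star_windows(1000, 1200, 100):
        print(search_url(low, high))
```

As in the example ranges above, adjacent windows share their boundary value, so a repo sitting exactly on a boundary may be returned twice. The uniqueness step at the end of the pipeline absorbs any duplicate users that result.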
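
The last step (deduplicate, then check against an earlier batch) can also be done in Python, which reports the overlapping names directly instead of leaving them to be spotted in a diff. This is a minimal sketch, assuming users.txt holds one username per line. `read_users`, `write_unique`, and `overlap` are hypothetical names, not part of the repo.

```python
from pathlib import Path


def read_users(path):
    """Return the set of non-empty, whitespace-stripped lines in a file."""
    lines = Path(path).read_text().splitlines()
    return {line.strip() for line in lines if line.strip()}


def write_unique(src, dst):
    """Write the sorted unique users from src to dst and return them."""
    users = sorted(read_users(src))
    Path(dst).write_text("".join(user + "\n" for user in users))
    return users


def overlap(new_path, old_path):
    """Return the sorted users that appear in both files."""
    return sorted(read_users(new_path) & read_users(old_path))


if __name__ == "__main__":
    write_unique("users.txt", "usersuniq.txt")
    shared = overlap("usersuniq.txt", "usersuniq_1.txt")
    print(f"{len(shared)} users already present in usersuniq_1.txt")
```

An empty result from `overlap` means the new batch shares no users with the old one.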