
User Collection Pipeline

Robert Netzorg edited this page Apr 29, 2017 · 2 revisions

How to Run the User Collection Pipeline:

  1. Run python repos_from_github.py

    • To collect a different range, change the stars range in the query, e.g. from stars:1000..1100 to stars:1100..1200
    • Generates a list of repos in a file called repos.csv
  2. Run python contributors.py

    • Reads repos.csv and queries the GitHub API for each repo's list of contributors
  3. Run python getusers.py

    • Generates a list of users in a file called users.txt
  4. To get unique users, run cat users.txt | sort -n | uniq > usersuniq.txt

    • Diff with older output files (such as usersuniq_1.txt) to ensure there is no overlap
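
The deduplication and overlap check in step 4 can also be sketched in Python. This is only an illustration, not part of the pipeline scripts: the helper names unique_users and overlap are hypothetical, and it assumes users.txt and the usersuniq files hold one username per line. Python sorts by code point, so the order may differ from what sort -n produces.

```python
def unique_users(lines):
    """Return the sorted, de-duplicated, non-empty usernames from an iterable of lines."""
    return sorted({line.strip() for line in lines if line.strip()})


def overlap(new_users, old_users):
    """Return the sorted usernames that appear in both collections."""
    return sorted(set(new_users) & set(old_users))


# Small in-memory demo (hypothetical usernames); with real files, pass
# open("users.txt") and open("usersuniq_1.txt") instead of these lists.
new_batch = unique_users(["bob\n", "alice\n", "bob\n", "\n"])
old_batch = unique_users(["carol\n", "bob\n"])

print(new_batch)                      # ['alice', 'bob']
print(overlap(new_batch, old_batch))  # ['bob']
```

An empty result from overlap means the new batch shares no users with the earlier file.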
