We have a small task for you. At Coya, we want to collect public data to assess the plausibility of claims. The data team has found the following dataset:
https://data.sfgov.org/City-Infrastructure/Case-Data-from-San-Francisco-311-SF311-/vw6y-z8j6
Create a document (or edit this one) telling us your ideas on the following:
As a general concept, how would you design a pipeline which extracts this data set? And how would you extend the pipeline to include further data sets from other data sources?
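As a starting point for discussion, one possible shape for such a pipeline is sketched below. It is only an illustration under stated assumptions, not a prescribed design: every source is wrapped in a "page fetcher" function, a shared `extract` loop handles pagination, and further data sources are added by registering another fetcher. The names `extract`, `register_source`, `run_pipeline` and `fetch_sf311_page` are hypothetical. The resource URL and the `$limit`/`$offset`/`$order` query parameters assume the dataset is served through the portal's standard Socrata (SODA) endpoint, and the ordering column `service_request_id` is assumed as well; all of this should be checked against the dataset's API documentation.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, Iterator, List

Record = Dict[str, str]
# A page fetcher takes (offset, limit) and returns one page of records.
PageFetcher = Callable[[int, int], List[Record]]

# Assumption: standard SODA resource endpoint for dataset id vw6y-z8j6.
SF311_URL = "https://data.sfgov.org/resource/vw6y-z8j6.json"


def fetch_sf311_page(offset: int, limit: int) -> List[Record]:
    """Hypothetical fetcher for the SF311 dataset (performs an HTTP request)."""
    query = urllib.parse.urlencode({
        "$limit": limit,
        "$offset": offset,
        "$order": "service_request_id",  # assumed column; gives stable paging
    })
    with urllib.request.urlopen(f"{SF311_URL}?{query}") as response:
        return json.load(response)


def extract(fetch_page: PageFetcher, page_size: int = 1000) -> Iterator[Record]:
    """Yield every record from a paginated source until a short page arrives."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        yield from page
        if len(page) < page_size:
            break
        offset += page_size


# Registry of sources: extending the pipeline means adding an entry here.
SOURCES: Dict[str, PageFetcher] = {}


def register_source(name: str, fetch_page: PageFetcher) -> None:
    SOURCES[name] = fetch_page


def run_pipeline(load: Callable[[str, Record], None]) -> Dict[str, int]:
    """Extract every registered source, pass each record to `load`,
    and return the number of records loaded per source."""
    counts: Dict[str, int] = {}
    for name, fetch_page in SOURCES.items():
        loaded = 0
        for record in extract(fetch_page):
            load(name, record)
            loaded += 1
        counts[name] = loaded
    return counts
```

With this layout, the SF311 source would be wired in with `register_source("sf311", fetch_sf311_page)`, and another data source only needs its own fetcher with the same `(offset, limit)` signature. The `load` callback stands in for whatever raw storage is chosen, for example a staging table or files in object storage.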
Please create a procedure to achieve the following: evaluate the daily trend of the Damaged Property category in particular, and the overall category distribution. What data quality considerations should be taken into account?
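To make the question concrete, here is a minimal sketch of what such a procedure could look like on already extracted records. It is a rough illustration under assumptions, not the expected solution. The field names `service_request_id`, `requested_datetime` and `service_name`, the category label "Damaged Property", and the ISO-style timestamp format are all assumptions about the dataset and need to be verified against the real data. The function `summarize` is hypothetical. It counts requests per day for one category, computes each category's share of all valid records, and tallies three simple data quality issues: duplicate ids, unparseable dates and missing categories.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable, List, Optional, Tuple


def parse_day(value: Optional[str]) -> Optional[str]:
    """Return the calendar day (YYYY-MM-DD) of an ISO timestamp, or None."""
    try:
        return datetime.fromisoformat(value).date().isoformat()
    except (TypeError, ValueError):
        return None


def summarize(
    records: Iterable[Dict[str, str]],
    category: str = "Damaged Property",  # assumed category label
) -> Tuple[List[Tuple[str, int]], Dict[str, float], Dict[str, int]]:
    """Return (daily counts for `category`, category shares, quality issues)."""
    daily: Counter = Counter()
    distribution: Counter = Counter()
    issues: Counter = Counter()
    seen_ids = set()

    for record in records:
        # Quality check 1: skip repeated request ids.
        request_id = record.get("service_request_id")
        if request_id is not None:
            if request_id in seen_ids:
                issues["duplicate_id"] += 1
                continue
            seen_ids.add(request_id)

        # Quality checks 2 and 3: unparseable dates and missing categories.
        day = parse_day(record.get("requested_datetime"))
        name = (record.get("service_name") or "").strip()
        if day is None:
            issues["bad_date"] += 1
        if not name:
            issues["missing_category"] += 1
        if day is None or not name:
            continue

        distribution[name] += 1
        if name.casefold() == category.casefold():
            daily[day] += 1

    total = sum(distribution.values())
    shares = {k: v / total for k, v in distribution.items()} if total else {}
    return sorted(daily.items()), shares, dict(issues)
```

Days with no matching requests do not appear in the daily series at all, so a real trend analysis would need to fill those gaps with zeros before smoothing or plotting. Other points worth checking include time zone handling, late-arriving or updated records, and inconsistent spelling of category names.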
Please include everything as commits in a git repository starting from this one. Don't worry too much about making it all nice or perfect, we'll discuss it later with you. Please send us back this repository as a git archive or a link to a git repository.
Good luck!