- Download data from https://data.consumerfinance.gov/dataset/Consumer-Complaints/s6ew-h6mp
- Discard the rows where consumer complaint narrative is blank. How many rows does this yield?
- Draw a histogram of number of complaints by company name. What can you conclude about which institutions are causing the most complaints?
- Create a model to predict the product based on the consumer complaint narrative. What modeling techniques could you use? How accurate are your predictions for each field, and how did you evaluate this?
- Answer the questions in #4 for sub-product, issue, and sub-issue.
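The filtering and counting steps above could be sketched as follows, using only the Python standard library. This is a minimal sketch, not a reference solution: the column names `Consumer complaint narrative` and `Company` are assumptions about the downloaded CSV header, the helper names are hypothetical, and `SAMPLE_CSV` is a made-up stand-in for the real file.

```python
import csv
import io
from collections import Counter

# Made-up stand-in for the downloaded file; column names are assumed.
SAMPLE_CSV = """Product,Company,Consumer complaint narrative
Mortgage,Bank A,I was charged a late fee in error.
Credit card,Bank B,
Mortgage,Bank A,My escrow payment was misapplied.
Student loan,Lender C,
Credit card,Bank B,Unauthorized charge on my statement.
"""


def load_with_narrative(handle, narrative_col="Consumer complaint narrative"):
    """Return only the rows whose narrative column is non-blank."""
    reader = csv.DictReader(handle)
    return [row for row in reader if (row.get(narrative_col) or "").strip()]


def complaints_by_company(rows, company_col="Company"):
    """Count the remaining complaints per company."""
    return Counter(row[company_col] for row in rows)


rows = load_with_narrative(io.StringIO(SAMPLE_CSV))
counts = complaints_by_company(rows)

print("rows with a narrative:", len(rows))
# Crude text bar chart, most-complained-about company first.
for company, n in counts.most_common():
    print(f"{company:<10} {'#' * n} ({n})")
```

For the real data, replace `io.StringIO(SAMPLE_CSV)` with `open(path, newline="", encoding="utf-8")`, where `path` points at the downloaded file. With many companies, plotting only the top few entries of `counts.most_common()` usually gives a more readable chart.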
• About the dataset: https://www.consumerfinance.gov/data-research/consumer-complaints/
• Extracting named entities from text: http://www.nltk.org/book/ch07.html
• Learning to classify text: http://www.nltk.org/book/ch06.html
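As one possible starting point for the modeling questions, the sketch below trains a small multinomial naive Bayes classifier on bag-of-words counts and scores it on held-out examples. It is only one of several techniques that could be used, and everything in it is an illustrative assumption: the `tokenize` helper, the `NaiveBayes` class, and the toy training and test sentences are invented for the example and are not taken from the dataset.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy data; real narratives and labels would come from the CSV.
train_texts = [
    "my mortgage escrow payment was misapplied",
    "the bank foreclosed on my home loan",
    "unauthorized charge on my credit card statement",
    "credit card interest rate was raised without notice",
]
train_labels = ["Mortgage", "Mortgage", "Credit card", "Credit card"]
test_texts = ["escrow shortage on my mortgage", "disputed charge on my card"]
test_labels = ["Mortgage", "Credit card"]

model = NaiveBayes().fit(train_texts, train_labels)
predictions = [model.predict(t) for t in test_texts]

# Accuracy on examples the model never saw during training.
correct = sum(p == y for p, y in zip(predictions, test_labels))
accuracy = correct / len(test_labels)
print(predictions, accuracy)
```

The important design choice is that accuracy is measured on examples held out from training. On the real data, a random train/test split or cross-validation, together with per-class precision and recall, would give a fuller picture than a single accuracy figure, since the product categories are unlikely to be evenly represented. The same approach applies to the other target fields by swapping the label column.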