RedditWebScrapper/README.md at master · Ashwin-Ravi/RedditWebScrapper

Reddit Web Scrapper

Given a Subreddit URL, The script scrapes the content of the webpage. It picks the first 10 items. Subsequently Identifying the top 10 comments by points (including nested comments) for each. The script then identifies the high frequency (top 10) words for each item. Providing an approximation of the trending topics and words in Reddit.

Example:

Input: https://www.reddit.com/r/all/

Output: [ { "url": "https://i.imgflip.com/1kzbwx.jpg", "commentsUrl": "https://www.reddit.com/r/AdviceAnimals/comments/5y54am/my_uncle_is_an_awesome_boss/", "topComments": [ "The cashier is a young woman from Guatemala (legally immigrated here 4 years ago) (17, I think), she looked genuinely scared lifeless. My uncle drove her home to make sure that jackass didn't try anything and worked her shift.", "I like pancakes.", "Emoposer is a reference to goth episode of South Park.", ... (10 elements) ], "HighFrequencyWords": [ "cashier", "South Park", ... (10 elements) ] }, { ... }, ... (10 elements) ]

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

FilesExpand file tree

README.md

Latest commit

History

README.md

File metadata and controls