Ashwin-Ravi/RedditWebScrapper

Reddit Web Scrapper

Given a subreddit URL, the script scrapes the content of the webpage and picks the first 10 items. For each item, it identifies the top 10 comments by points, including nested comments, and then identifies the 10 highest-frequency words. The result is an approximation of the trending topics and words on Reddit.
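The two ranking steps above can be sketched in Python under a few stated assumptions. Comments are modeled as hypothetical dicts with `body`, `score` and `replies` keys, which is not necessarily how the script stores them. Nested replies are flattened so they compete with top-level comments on points. Word frequency is a plain single-word count with a small illustrative stopword list. Because it counts single words, this sketch would not reproduce a multi-word entry such as "South Park" in the example output. The script's actual tokenization and stopword handling are not documented here. The names `flatten_comments`, `top_comments`, `high_frequency_words` and `STOPWORDS` are all hypothetical.

```python
import re
from collections import Counter
from typing import Any

# Hypothetical comment shape (an assumption, not the script's data model):
# {"body": str, "score": int, "replies": [comment, ...]}
Comment = dict[str, Any]

# Tiny illustrative stopword list (hypothetical); a real run would need a much larger one.
STOPWORDS = {"a", "an", "and", "i", "in", "is", "it", "my", "of", "that", "the", "to", "was"}


def flatten_comments(comments: list[Comment]) -> list[Comment]:
    """Collect every comment in the tree depth-first, so replies compete with top-level comments."""
    flat: list[Comment] = []
    stack = list(reversed(comments))
    while stack:
        comment = stack.pop()
        flat.append(comment)
        # Push replies in reverse so they are visited in their original order.
        stack.extend(reversed(comment.get("replies", [])))
    return flat


def top_comments(comments: list[Comment], k: int = 10) -> list[str]:
    """Return the bodies of the k highest-scoring comments at any nesting depth."""
    ranked = sorted(flatten_comments(comments), key=lambda c: c["score"], reverse=True)
    return [c["body"] for c in ranked[:k]]


def high_frequency_words(texts: list[str], k: int = 10) -> list[str]:
    """Count lowercase single-word tokens, skip stopwords, return the k most common."""
    counts: Counter[str] = Counter()
    for text in texts:
        counts.update(w for w in re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower()) if w not in STOPWORDS)
    # Ties keep the order in which words were first encountered.
    return [word for word, _ in counts.most_common(k)]


if __name__ == "__main__":
    thread = [
        {"body": "Cashier story", "score": 50, "replies": [
            {"body": "South Park did this", "score": 80, "replies": []},
        ]},
        {"body": "I like pancakes.", "score": 30, "replies": []},
    ]
    print(top_comments(thread, k=2))  # ['South Park did this', 'Cashier story']
    print(high_frequency_words(["the cashier was scared", "the cashier went home"], k=2))
    # ['cashier', 'scared']
```

Sorting the flattened list on score is the simplest way to treat nested replies the same as top-level comments. A heap (`heapq.nlargest`) would avoid a full sort on very large threads.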

Example:

Input: https://www.reddit.com/r/all/

Output:

[
  {
    "url": "https://i.imgflip.com/1kzbwx.jpg",
    "commentsUrl": "https://www.reddit.com/r/AdviceAnimals/comments/5y54am/my_uncle_is_an_awesome_boss/",
    "topComments": [
      "The cashier is a young woman from Guatemala (legally immigrated here 4 years ago) (17, I think), she looked genuinely scared lifeless. My uncle drove her home to make sure that jackass didn't try anything and worked her shift.",
      "I like pancakes.",
      "Emoposer is a reference to goth episode of South Park.",
      ... (10 elements)
    ],
    "HighFrequencyWords": [
      "cashier",
      "South Park",
      ... (10 elements)
    ]
  },
  { ... },
  ... (10 elements)
]
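Each item in the output is a record with the four keys shown in the example. The following sketch assembles and serializes such records. `build_record` and `to_json` are hypothetical helper names. The key names are copied from the example output, including the differing capitalization of `HighFrequencyWords`. The script itself may build and print its output differently.

```python
import json
from typing import Any


def build_record(url: str, comments_url: str, top_comments: list[str], words: list[str]) -> dict[str, Any]:
    """Assemble one output item using the key names from the example output."""
    return {
        "url": url,
        "commentsUrl": comments_url,
        "topComments": top_comments[:10],  # at most 10 comments per item
        "HighFrequencyWords": words[:10],  # at most 10 words per item
    }


def to_json(records: list[dict[str, Any]], limit: int = 10) -> str:
    """Serialize at most `limit` records (the script keeps the first 10 items)."""
    return json.dumps(records[:limit], indent=2, ensure_ascii=False)


if __name__ == "__main__":
    record = build_record(
        "https://i.imgflip.com/1kzbwx.jpg",
        "https://www.reddit.com/r/AdviceAnimals/comments/5y54am/my_uncle_is_an_awesome_boss/",
        ["I like pancakes."],
        ["cashier", "South Park"],
    )
    print(to_json([record]))
```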