
anouksha27/TechWatchProject


Tech Watch News Aggregator & Summarizer

This project is an automated Python script that scrapes, summarizes, and archives news articles about a company and its key topics from a selected set of online sources. It generates daily news reports without manual work.

Features ✨

  • Targeted Scraping: Uses SerpAPI to find relevant news articles from a predefined list of sources and topics.
  • AI-Powered Summaries: Integrates the Google Gemini API to generate a concise summary for each news article, highlighting the most important information.
  • Dual Output: With each run, the script creates a new, timestamped Google Sheet and a text file containing all the summarized news.
  • Automated Cloud Storage: The generated Google Sheet and the summary text file are automatically saved to a specified folder in your Google Drive.
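
As a rough illustration of the targeted-scraping step, the sketch below restricts a Google News search through SerpAPI to a list of sources using `site:` operators. The company name, topics, source domains, and the helper names `build_query`, `build_search_url`, and `fetch_news` are hypothetical; `engine`, `tbm`, `q`, `num`, and `api_key` are SerpAPI request parameters, but the script's actual query may be built differently. The project installs `requests`; this sketch uses only the standard library to stay dependency-free.

```python
from __future__ import annotations

import json
from urllib.parse import urlencode
from urllib.request import urlopen

SERPAPI_ENDPOINT = "https://serpapi.com/search.json"


def build_query(company: str, topics: list[str], sources: list[str]) -> str:
    """Combine the company name, topic keywords, and site: filters into one query."""
    # Quote multi-word topics so Google matches them as exact phrases.
    topic_terms = " OR ".join(f'"{t}"' if " " in t else t for t in topics)
    # Restrict results to the predefined news sources.
    site_terms = " OR ".join(f"site:{s}" for s in sources)
    return f'"{company}" ({topic_terms}) ({site_terms})'


def build_search_url(query: str, api_key: str, num: int = 20) -> str:
    """Build a SerpAPI request URL for Google News results (tbm=nws)."""
    params = {
        "engine": "google",
        "tbm": "nws",  # Google's News tab
        "q": query,
        "num": num,
        "api_key": api_key,
    }
    return f"{SERPAPI_ENDPOINT}?{urlencode(params)}"


def fetch_news(query: str, api_key: str, num: int = 20) -> list[dict]:
    """Call SerpAPI (network access required) and keep title, link, and source."""
    with urlopen(build_search_url(query, api_key, num), timeout=30) as resp:
        data = json.load(resp)
    return [
        {"title": r.get("title"), "link": r.get("link"), "source": r.get("source")}
        for r in data.get("news_results", [])
    ]
```

For example, `build_query("Acme Corp", ["cloud", "data center"], ["reuters.com", "techcrunch.com"])` yields `"Acme Corp" (cloud OR "data center") (site:reuters.com OR site:techcrunch.com)`. Each link returned by `fetch_news` would then be passed to the summarization step.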

Prerequisites 🛠️

Before running this script, you will need the following:

  1. Python 3.x installed on your system.
  2. A Google Cloud Project with the following APIs enabled:
    • Google Sheets API
    • Google Drive API
    • Google Gemini API (listed in the Cloud Console as the Generative Language API)
  3. A SerpAPI API key.
  4. A Service Account with access to your Google Cloud Project. You will need to download the JSON key file for this account.
  5. A designated folder on your Google Drive to save the output files.
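
To find the address the Drive folder must be shared with, you can read it from the key file: a service account key is a JSON document whose `client_email` field holds the account's email. The helper below, `service_account_email`, is a hypothetical convenience and not part of the project. An OAuth client secret (the kind used for a browser sign-in) has no `client_email` field, so the helper raises an error if given one.

```python
import json
from pathlib import Path


def service_account_email(key_path: str = "client_secret.json") -> str:
    """Return the client_email from a Google service account JSON key file."""
    data = json.loads(Path(key_path).read_text(encoding="utf-8"))
    # Service account keys carry "type": "service_account"; OAuth client
    # secrets instead have a top-level "installed" or "web" object.
    if data.get("type") != "service_account" or "client_email" not in data:
        raise ValueError(f"{key_path} does not look like a service account key")
    return data["client_email"]
```

Calling `print(service_account_email("client_secret.json"))` prints the address to enter in the Drive folder's sharing dialog.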

Setup and Installation 🚀

  1. Clone the repository to your local machine:

    git clone https://github.com/your-username/your-repository.git
    cd your-repository
  2. Install the required Python libraries:

    pip install pandas requests gspread google-generativeai gspread_dataframe google-auth google-auth-oauthlib google-api-python-client beautifulsoup4
  3. Place your authentication files:

    • Save your Google Service Account JSON key file as client_secret.json in the project directory.
    • The script will automatically generate a token.json file on the first run.
    • Share your Google Drive folder with the service account's email address (the client_email field in the client_secret.json file).
  4. Configure the script: Open the Python script and fill in the configuration variables at the top of the file:

    • SERPAPI_API_KEY: Paste your SerpAPI key here.
    • GEMINI_API_KEY: Paste your Gemini API key here.
    • PARENT_FOLDER_ID: Paste the ID of your Google Drive folder.
  5. Run the script from your terminal. On the first run, a browser window opens so you can authorize access to your Google account.

    python your_script_name.py

    Upon successful completion, a new folder appears inside your designated Google Drive folder, containing a new Google Sheet and a text file with the news summaries.
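
The dated output can be sketched as follows, assuming a single timestamp names the run's folder, Sheet, and text file. The naming pattern (`TechWatch_YYYY-MM-DD_HHMM`), the article fields (`title`, `link`, `summary`), and the helper names are hypothetical and not taken from the script; creating the Sheet and uploading files to Drive are left out of this sketch.

```python
from __future__ import annotations

from datetime import datetime
from pathlib import Path


def run_label(now: datetime | None = None, prefix: str = "TechWatch") -> str:
    """Return a timestamped name shared by the run's folder, sheet, and text file."""
    now = now or datetime.now()
    return f"{prefix}_{now:%Y-%m-%d_%H%M}"


def render_summaries(articles: list[dict]) -> str:
    """Format summarized articles as plain text, one numbered block per article."""
    blocks = [
        f"{i}. {art['title']}\n   {art['link']}\n   {art['summary']}"
        for i, art in enumerate(articles, start=1)
    ]
    return "\n\n".join(blocks) + "\n"


def write_report(articles: list[dict], now: datetime | None = None, out_dir: str = ".") -> Path:
    """Write the report to '<out_dir>/<label>.txt' and return its path."""
    path = Path(out_dir) / f"{run_label(now)}.txt"
    path.write_text(render_summaries(articles), encoding="utf-8")
    return path
```

Reusing `run_label` for every artifact keeps the folder, Sheet, and text file of one run easy to match up in Drive.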

About

An automated Python script that scrapes and summarizes news articles from specific sources about a given company. It uses the SerpAPI for targeted searches and the Google Gemini API to create concise summaries, saving a new, dated Google Sheet and text file to Google Drive with each run.
