This project is a Python script that finds, summarizes, and archives news articles about a company and its key topics from specific online sources. Each run produces a daily report with no manual work.
- Targeted Scraping: Uses SerpAPI to find relevant news articles from a predefined list of sources and topics.
- AI-Powered Summaries: Integrates the Google Gemini API to generate a concise summary for each news article, highlighting the most important information.
- Dual Output: With each run, the script creates a new, timestamped Google Sheet and a text file containing all the summarized news.
- Automated Cloud Storage: The generated Google Sheet and the summary text file are automatically saved to a specified folder in your Google Drive.
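The search and summary steps listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the script's actual code: the helper names are hypothetical, the query uses SerpAPI's Google engine with the News tab (tbm=nws), and the Gemini model name is only an example.

```python
from typing import Dict, List

SERPAPI_ENDPOINT = "https://serpapi.com/search"


def build_search_params(topic: str, source: str, api_key: str) -> Dict[str, str]:
    """Build SerpAPI parameters for one topic restricted to one source domain."""
    return {
        "engine": "google",               # SerpAPI's Google engine
        "tbm": "nws",                     # assumption: search the News tab
        "q": f'"{topic}" site:{source}',  # limit results to a single source
        "api_key": api_key,
    }


def build_summary_prompt(title: str, body: str, max_chars: int = 4000) -> str:
    """Build the text sent to the model; the body is truncated to max_chars."""
    return (
        "Summarize the following news article in 2-3 sentences, "
        "highlighting the most important information.\n\n"
        f"Title: {title}\n\nArticle:\n{body[:max_chars]}"
    )


def search_news(params: Dict[str, str]) -> List[dict]:
    """Call SerpAPI and return the news results (needs network access)."""
    import requests  # imported lazily so the helpers above work without it

    response = requests.get(SERPAPI_ENDPOINT, params=params, timeout=30)
    response.raise_for_status()
    # Assumption: News-tab responses list articles under "news_results".
    return response.json().get("news_results", [])


def summarize(prompt: str, api_key: str, model_name: str = "gemini-1.5-flash") -> str:
    """Ask Gemini for a summary (needs network access and a valid key)."""
    import google.generativeai as genai

    genai.configure(api_key=api_key)
    model = genai.GenerativeModel(model_name)  # model name is an assumption
    return model.generate_content(prompt).text
```

Keeping the parameter and prompt builders separate from the network calls makes them easy to check without spending API quota.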
Before running this script, you will need the following:
- Python 3.x installed on your system.
- A Google Cloud Project with the following APIs enabled:
  - Google Sheets API
  - Google Drive API
  - Google Gemini API
- A SerpAPI API key.
- A Service Account with access to your Google Cloud Project. You will need to download the JSON key file for this account.
- A designated folder on your Google Drive to save the output files.
- Clone the repository to your local machine:
    git clone https://github.com/your-username/your-repository.git
    cd your-repository
- Install the required Python libraries:
    pip install pandas requests gspread google-generativeai gspread_dataframe google-auth google-auth-oauthlib google-api-python-client beautifulsoup4
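After installing, a quick import check can confirm that each package is visible to the interpreter. The sketch below is an optional illustration and not part of the script; missing_modules is a hypothetical helper, and the module list assumes the usual import names for these packages (for example, beautifulsoup4 is imported as bs4).

```python
import importlib.util
from typing import Iterable, List

# Import names assumed to correspond to the packages in the pip command.
REQUIRED_MODULES = [
    "pandas", "requests", "gspread", "gspread_dataframe",
    "google.generativeai", "google.auth", "google_auth_oauthlib",
    "googleapiclient", "bs4",
]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the module names that cannot be found by the import system."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package (e.g. "google") is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_modules(REQUIRED_MODULES)
    print("Missing modules:", ", ".join(absent) if absent else "none")
```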
- Place your authentication files:
  - Save your Google Service Account JSON key file as client_secret.json in the project directory.
  - The script will automatically generate a token.json file on the first run.
  - Ensure your Google Drive folder is shared with the service account's email address (found in the client_secret.json file).
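To find the address to share the folder with, the email can be read straight from the key file. This is a small sketch that assumes the standard service account key layout, where the address is stored under client_email; service_account_email is a hypothetical helper name.

```python
import json
from pathlib import Path


def service_account_email(key_path: str = "client_secret.json") -> str:
    """Return the service account email stored in a JSON key file."""
    data = json.loads(Path(key_path).read_text(encoding="utf-8"))
    if data.get("type") != "service_account":
        # Other credential types (e.g. OAuth client files) have no client_email.
        raise ValueError(f"{key_path} does not look like a service account key")
    return data["client_email"]
```

Calling service_account_email() from the project directory returns the address to enter in the Drive folder's sharing dialog.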
- Configure the script: Open the Python script and fill in the configuration variables at the top of the file:
  - SERPAPI_API_KEY: Paste your SerpAPI key here.
  - GEMINI_API_KEY: Paste your Gemini API key here.
  - PARENT_FOLDER_ID: Paste the ID of your Google Drive folder.
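A short check along these lines can catch an empty setting before any API call is made. It is a sketch only: unset_settings and folder_id_from_url are hypothetical helpers, and the URL handling assumes the common Drive folder link shape ending in /folders/ followed by the ID.

```python
import re
from typing import Dict, List, Optional

REQUIRED_SETTINGS = ("SERPAPI_API_KEY", "GEMINI_API_KEY", "PARENT_FOLDER_ID")


def unset_settings(settings: Dict[str, Optional[str]]) -> List[str]:
    """Return the names of required settings that are missing or blank."""
    return [
        name for name in REQUIRED_SETTINGS
        if not (settings.get(name) or "").strip()
    ]


def folder_id_from_url(value: str) -> str:
    """Accept a bare folder ID or a Drive folder URL and return the ID."""
    match = re.search(r"/folders/([A-Za-z0-9_-]+)", value)
    return match.group(1) if match else value.strip()
```

For example, passing the three values as a dictionary to unset_settings returns an empty list once everything is filled in.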
- Run the script: Execute the script from your terminal. The first time you run it, it will open a browser window for you to authorize access to your Google account.
    python your_script_name.py
Upon successful completion, a new folder will appear in your Google Drive containing a new Google Sheet and a text file with the news summaries.
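The timestamped names for the two outputs could be produced along these lines. The prefix, the timestamp format, and the text layout are assumptions made for illustration; the script's actual naming may differ.

```python
from datetime import datetime
from pathlib import Path
from typing import Dict, List, Optional


def output_names(now: Optional[datetime] = None,
                 prefix: str = "News_Summary") -> Dict[str, str]:
    """Return matching names for the Google Sheet and the text file."""
    stamp = (now or datetime.now()).strftime("%Y-%m-%d_%H-%M-%S")
    return {
        "sheet_title": f"{prefix}_{stamp}",
        "text_file": f"{prefix}_{stamp}.txt",
    }


def write_summary_file(path: str, articles: List[Dict[str, str]]) -> None:
    """Write one title/link/summary block per article, separated by blank lines."""
    blocks = [f"{a['title']}\n{a['link']}\n{a['summary']}" for a in articles]
    Path(path).write_text("\n\n".join(blocks) + "\n", encoding="utf-8")
```

Using the same timestamp for both names keeps the sheet and the text file from one run easy to pair up in Drive.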