🧰 A.I Web Scraper

Welcome to the A.I Web Scraper project! This project is a Streamlit app that scrapes web content, parses it with AI models, and exports the results to an Excel file. By combining web scraping with AI-powered parsing, it makes data extraction efficient and easy.

Features

- Web Scraping: Scrape a website's content from its URL.
- AI Parsing: Use AI models to parse and summarize scraped content.
- Database Integration: Store scraped data in a SQLite database.
- Export to Excel: Export all scraped data to an Excel file for easy access.

Prerequisites

Ensure you have the following installed:

- Python 3.x: The programming language the app is written in.
- Streamlit: The framework for building the interactive web app.
- pandas: For managing and exporting data.
- sqlite3: For the database functionality. It is part of Python's standard library, so no separate installation is needed.
- openpyxl: For writing Excel files.

Required Libraries

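To run this project, install the required libraries with pip. The simplest way is to install everything listed in requirements.txt:

```
pip install -r requirements.txt
```

Alternatively, install the core packages individually. Leave out sqlite3: it is part of Python's standard library and is not installed with pip.

```
pip install streamlit pandas openpyxl
```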

How to Use

Clone this repository to your local machine:
```
git clone https://github.com/yourusername/repository-name.git
```
Replace yourusername and repository-name with the repository owner's GitHub username and the repository name.
Run the Streamlit app: Navigate to the project directory and run:
```
streamlit run main.py
```
Scrape a Website:
- Enter the URL of the website you wish to scrape and press "Scrape Site".
- The tool fetches the page and extracts its content.
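The scraping step above can be sketched with Python's standard library alone. This is a minimal sketch under stated assumptions, not the project's scrape.py: it assumes a static page that `urllib` can fetch directly (the bundled chromedriver.exe suggests the real tool may drive a browser instead), and `fetch_html`, `extract_visible_text`, and `VisibleTextParser` are hypothetical names.

```python
# Minimal stdlib sketch of the scraping step (hypothetical helpers, not scrape.py).
# Assumes a static page; JavaScript-rendered pages need a real browser driver.
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class VisibleTextParser(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped tag
        self.fragments: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.fragments.append(text)


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download the raw HTML of a page."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_visible_text(html: str) -> str:
    """Return the page's visible text, one fragment per line."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.fragments)
```

For example, `extract_visible_text(fetch_html("https://example.com"))` returns the page text, ready to be passed to the parsing step.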
Parse the Content:
- Once the content is scraped, you can describe the type of data you want to extract.
- For example, you can extract all headers, summaries, or specific sections.
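One way the parsing step could work is to split the scraped text into chunks that fit the model's context window and send each chunk, together with your description, to a locally running Ollama server. This sketch is an assumption, not the project's parse.py: it calls Ollama's REST endpoint `/api/generate` on the default port 11434 with the model tag `llama3.1`, and `split_into_chunks`, `build_prompt`, `ask_ollama`, `parse_with_ollama`, and the 6000-character chunk size are hypothetical choices.

```python
# Hypothetical sketch of the parsing step via Ollama's local REST API (not parse.py).
import json
from urllib.request import Request, urlopen

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local Ollama endpoint
MODEL = "llama3.1"  # assumed model tag; use whichever model you have pulled


def split_into_chunks(text: str, max_chars: int = 6000) -> list[str]:
    """Split text into consecutive pieces of at most max_chars characters."""
    return [text[i : i + max_chars] for i in range(0, len(text), max_chars)]


def build_prompt(chunk: str, description: str) -> str:
    """Combine one chunk of page text with the user's extraction request."""
    return (
        "Extract only the information that matches this description: "
        f"{description}\n"
        "If nothing matches, return an empty string.\n\n"
        f"Text:\n{chunk}"
    )


def ask_ollama(prompt: str, timeout: float = 120.0) -> str:
    """Send one non-streaming generate request and return the model's text."""
    payload = json.dumps({"model": MODEL, "prompt": prompt, "stream": False})
    request = Request(
        OLLAMA_URL,
        data=payload.encode("utf-8"),  # supplying data makes this a POST
        headers={"Content-Type": "application/json"},
    )
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]


def parse_with_ollama(text: str, description: str) -> str:
    """Run every chunk through the model and join the non-empty answers."""
    answers = [ask_ollama(build_prompt(c, description)) for c in split_into_chunks(text)]
    return "\n".join(a.strip() for a in answers if a.strip())
```

This requires the Ollama server to be running (`ollama serve`) and the model to be pulled first (`ollama pull llama3.1`).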
View Scraped Records:
- View previously saved records, each showing its URL, parsed content, and timestamp.
Export Data:
- Export all the scraped data to an Excel file using the "Export All Data to Excel" button.
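The export step can be sketched with pandas, which writes .xlsx files through openpyxl (both listed under Prerequisites). This sketch assumes the records live in a table named `scraped_data` inside scraper_data.db with the columns listed under Database Integration; the table name and the `export_to_excel` helper are hypothetical.

```python
# Hypothetical sketch of "Export All Data to Excel"; table name is an assumption.
import sqlite3
from contextlib import closing

import pandas as pd


def export_to_excel(db_path: str = "scraper_data.db",
                    xlsx_path: str = "scraped_data.xlsx") -> int:
    """Write every stored record to an .xlsx file and return the row count."""
    with closing(sqlite3.connect(db_path)) as conn:  # closes the connection afterwards
        df = pd.read_sql_query(
            "SELECT id, url, parsed_content, timestamp FROM scraped_data ORDER BY id",
            conn,
        )
    # openpyxl is the engine pandas uses to write .xlsx files
    df.to_excel(xlsx_path, index=False, engine="openpyxl")
    return len(df)
```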

Database Integration

All scraped data is stored in a SQLite database file named scraper_data.db. The database schema consists of:

- id: Primary key
- url: URL of the scraped website
- parsed_content: The parsed data/content
- timestamp: The time the data was scraped
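The schema above could be created and used with Python's built-in sqlite3 module. This is a sketch under assumptions, not the project's code: the table name `scraped_data`, the helpers `init_db`, `save_record`, and `load_records`, and the use of SQLite's CURRENT_TIMESTAMP default (UTC, stored as 'YYYY-MM-DD HH:MM:SS' text) are hypothetical choices.

```python
# Hypothetical sketch of the scraper_data.db schema; table and helper names are assumptions.
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS scraped_data (
    id             INTEGER PRIMARY KEY AUTOINCREMENT,  -- primary key
    url            TEXT NOT NULL,                      -- URL of the scraped website
    parsed_content TEXT,                               -- parsed data/content
    timestamp      TEXT DEFAULT CURRENT_TIMESTAMP      -- UTC time of the scrape
)
"""


def init_db(path: str = "scraper_data.db") -> sqlite3.Connection:
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_record(conn: sqlite3.Connection, url: str, parsed_content: str) -> int:
    """Insert one record and return its new id."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO scraped_data (url, parsed_content) VALUES (?, ?)",
            (url, parsed_content),
        )
    return cur.lastrowid


def load_records(conn: sqlite3.Connection) -> list[tuple]:
    """Return all records as (id, url, parsed_content, timestamp), newest first."""
    return conn.execute(
        "SELECT id, url, parsed_content, timestamp FROM scraped_data ORDER BY id DESC"
    ).fetchall()
```

Letting the database fill in the timestamp keeps the insert code short and records every scrape in one time zone.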

Contributing

If you would like to contribute to this project, feel free to fork the repository, create a new branch, and submit a pull request. We welcome suggestions and improvements!

Acknowledgments

Streamlit: For the framework.
Ollama and the Llama 3.1 model: For AI-powered parsing.

Contact

For any issues or suggestions, feel free to open an issue on this GitHub repository or contact me via email at parejasarronkian@gmail.com.

Other Features 💡

The repository may also include additional features such as:
- Web scraping for specific data types. 🌐
- Integration with external APIs such as Ollama. 🔌
- Data parsing and processing utilities. 📊

Please ensure you review the relevant files to understand how these features are implemented. 📂

Setup Instructions 🛠️

1. Fork the Repository 🍴

Go to the repository page and click on Fork to create a copy of the repo under your own GitHub account.

Clone your forked repository to your local machine:

```
git clone https://github.com/your-username/repo-name.git
cd repo-name
```

Notes 📝

Important: AI parsing does not work on the live (hosted) demo of the web scraper. To test it, use one of the options below:
- Fork the repository and run it locally on your machine. 🚀
- Alternatively, integrate an Ollama API key into the code to test the functionality. 🔑

Made with 💚 using Streamlit and Python.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Project Files

- .devcontainer/
- __pycache__/
- ai/
- LICENSE
- README.md
- chromedriver.exe
- main.py
- parse.py
- requirements.txt
- scrape.py
