Skip to content

arico97/RAG_summarizer

Repository files navigation

RAG Summarizer

Project Overview

The RAG Summarizer is a Retrieval-Augmented Generation (RAG) model designed to generate concise summaries from input documents. By leveraging both retrieval mechanisms and generative capabilities, it produces accurate and contextually relevant summaries.

Features

  • Document Retrieval: Fetches relevant information from a predefined dataset to enhance summary generation.
  • Text Summarization: Generates concise summaries using advanced natural language processing techniques.
  • Streamlit Interface: Provides an interactive web application for users to input text and receive summaries.

Installation

  1. Clone the Repository:

    git clone https://github.com/arico97/RAG_summarizer.git
    cd RAG_summarizer
  2. Set Up a Virtual Environment (optional but recommended):

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install Dependencies:

    pip install -r requirements.txt
  4. Configure LLM Credentials:

    Create a .env file in the project root directory and add your Large Language Model (LLM) credentials:

    LLM_API_KEY=your_api_key_here
    

Usage

  1. Run the Application:
    .\run.sh
  2. Access the Application:

Open your web browser and navigate to http://localhost:8501 to interact with the RAG Summarizer.

Project Structure

  • src/: Contains the core modules for document retrieval and summarization.
  • streamlit_app.py: Hosts the Streamlit web application interface.
  • requirements.txt: Lists all necessary Python dependencies.
  • Dockerfile: Defines the Docker image setup for containerized deployment.
  • rag-summarizer-deployment.yaml: Kubernetes deployment configuration for the application.

Deployment

Docker

  1. Build the Docker Image:

    docker build -t rag-summarizer:latest .
  2. Run the Docker Container:

    docker run -p 8501:8501 rag-summarizer:latest

Kubernetes

  1. Apply the Deployment Configuration:

    kubectl apply -f rag-summarizer-deployment.yaml
  2. Expose the Service: Ensure the service is accessible by configuring the appropriate Kubernetes service resources.

Testing

Execute the test suite to verify the functionality of the summarizer:

python test.py -o [option]

The -o argument specifies the type of test to run. The available options are:

  • youtube: Tests summarization of YouTube video transcripts. [option]=4
  • pdf: Tests summarization of local PDF documents. [option]=2
  • pdf on web: Tests summarization of PDF documents in a url. [option]=1
  • web: Tests summarization of web page content. [option]=3
  • epub: Tests summarization of epub documents. [option]=5

It's important to set in the doc variable either the document path or the url of the webpage, document or YouTube video. Example usage:

python test.py -o 1

Contributing

Contributions are welcome! Please fork the repository and create a pull request with your enhancements.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Acknowledgments

Special thanks to the open-source community and the developers of the libraries utilized in this project.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors