TripleExtractor

TripleExtractor is an application that extracts subject-predicate-object triples from input text, URLs, or text files. It is powered by TPLinker, a joint relation extraction model, and supports several output formats: JSON-LD, CSV, RDF, and XML. The project combines a FastAPI backend with a React frontend, and it is fully Dockerized for ease of deployment.

Features

Multiple Input Options:

Enter plain text.
Provide a URL to fetch and extract text from web pages.
Upload text files for processing.

Downloadable Output Formats:

JSON-LD
CSV
RDF (Turtle format)
XML

Language Validation: Checks that the input text is in English and returns a descriptive error message for unsupported languages.
Downloadable Results: Lets users download results in their preferred format.
Error Handling: Handles input errors, server errors, and network issues gracefully.
Tabular Output: Displays the extracted triples in a table for easier reading.
LLM Validation (latest update): An additional LLM, Llama 3 8B Instruct from Hugging Face, can validate the generated triples.

Pre-Setup

Go to the Llama 3 8B Instruct page on Hugging Face and request access to the model.
Generate a Hugging Face access token and set it in a .env file in the root of the backend folder, like: HF_TOKEN_LLAMA = hf_***
You need a GPU with at least 2 GB of memory to run this code.
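
The README does not show how the backend reads the token, so the following is only a minimal sketch of one way to check that HF_TOKEN_LLAMA is visible before starting the server. The helper names read_env_value and get_hf_token and the default path backend/.env are assumptions made for illustration, not part of the project.

```python
import os
from pathlib import Path


def read_env_value(env_path, key):
    """Return the value of `key` from a simple KEY = VALUE .env file, or None."""
    path = Path(env_path)
    if not path.is_file():
        return None
    for raw_line in path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == key:
            # Drop surrounding whitespace and optional quotes.
            return value.strip().strip("'\"")
    return None


def get_hf_token(env_path="backend/.env"):
    """Prefer the process environment, then fall back to the .env file.

    The default path is an assumption based on the setup step above.
    """
    return os.environ.get("HF_TOKEN_LLAMA") or read_env_value(env_path, "HF_TOKEN_LLAMA")
```

Calling get_hf_token() from the repository root returns the token string, or None if it is set neither in the environment nor in the file.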

Installation

Follow the steps below to set up the project locally.

Download BERT-BASE-CASED and put it under backend/pretrained_models.
Download Model-State and put it under backend/tplinker/default_log_dir/r584cHKZ.
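
Before building, it can help to confirm that both downloads landed in the expected places. The sketch below only checks that the two directories named above exist and are not empty. It does not know which files each download contains, and the function name missing_model_dirs is hypothetical.

```python
from pathlib import Path

# Directories named in the installation steps, relative to the repository root.
REQUIRED_DIRS = [
    "backend/pretrained_models",
    "backend/tplinker/default_log_dir/r584cHKZ",
]


def missing_model_dirs(repo_root="."):
    """Return the required directories that are absent or empty."""
    missing = []
    for rel in REQUIRED_DIRS:
        path = Path(repo_root) / rel
        # A directory counts as missing if it does not exist or has no entries.
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(rel)
    return missing
```

Running missing_model_dirs() from the repository root returns an empty list when both directories are populated.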

Dockerized Setup

To run the project using Docker:

Clone the Repository:

git clone https://github.com/RezaeiAlireza/TripleExtractor.git
cd TripleExtractor

Build and Run Docker Containers:

docker-compose up --build

This command will:

Build and start the backend container (FastAPI).
Build and start the frontend container (React).

Access the Application:

Frontend: Open your browser and go to http://localhost:3000.
Backend API: Access the backend API at http://localhost:8000.
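
This README does not list the backend routes. FastAPI serves an OpenAPI schema at /openapi.json by default (and interactive docs at /docs), so one way to discover the available endpoints is to read that schema. The sketch below assumes the project has not disabled it. The helper names list_routes and fetch_schema are illustrative.

```python
import json
from urllib.request import urlopen


def list_routes(schema):
    """Return sorted (METHOD, path) pairs from an OpenAPI schema dict."""
    http_methods = {"get", "post", "put", "patch", "delete", "head", "options"}
    routes = []
    for path, operations in schema.get("paths", {}).items():
        for method in operations:
            # Path items can also hold non-method keys such as "parameters".
            if method.lower() in http_methods:
                routes.append((method.upper(), path))
    return sorted(routes)


def fetch_schema(base_url="http://localhost:8000", timeout=5):
    """Download the schema from a running backend (assumes the default URL)."""
    with urlopen(f"{base_url}/openapi.json", timeout=timeout) as response:
        return json.load(response)
```

With the containers running, list_routes(fetch_schema()) returns the method and path of every documented endpoint.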

Manual Setup

If you prefer to set up and run the project manually:

Prerequisites

Python 3.9 or 3.10
Node.js 16
Conda (Recommended) for environment management
Git for version control

Backend Setup

Clone the Repository:

git clone https://github.com/RezaeiAlireza/TripleExtractor.git
cd TripleExtractor/backend

Set Up Conda Environment:

conda create -n tplinker python=3.9 -y
conda activate tplinker

Install Dependencies:

pip install -r requirements.txt

Run Backend:

uvicorn main:app --reload

The backend will be available at http://127.0.0.1:8000.

Frontend Setup

Navigate to Frontend:

cd ../frontend

Install Dependencies:

npm install

Start Frontend:

npm start

The frontend will be available at http://127.0.0.1:3000.

Usage

Launch the application in your browser: http://127.0.0.1:3000.
Choose your input type (Text, URL, or File).
Choose the model: NYT or WebNLG.

Optional: Set the additional LLM verification option to Yes to post-filter the extracted triples, which can improve the results.

Enter your text, paste the URL, or upload a text file.
Press Extract and wait for the model to process the input and display the output.
Use the Download section to choose the desired format to download the generated output.

Project Structure

TripleExtractor/
│
├── backend/
│   ├── data4bert/
│   ├── pretrained_models/ 
│   ├── common/
│   ├── tplinker/    
│   ├── main.py      
│   ├── Dockerfile
│   └── requirements.txt 
│
├── frontend/
│   ├── public/          
│   ├── Dockerfile
│   ├── src/             
│   │   ├── components/     
│   │   ├── pages/          
│   │   └── App.js        
│   └── package.json       
│
├── docker-compose.yaml
└── README.md

Example Input and Output

Input (Text)

Text: "Vienna is the capital of Austria."

Output (JSON-LD)

{
  "@context": {
    "subject": "http://schema.org/subject",
    "predicate": "http://schema.org/predicate",
    "object": "http://schema.org/object"
  },
  "@graph": [
    {
      "@type": "Triple",
      "subject": "Vienna",
      "predicate": "/location/location/contains",
      "object": "Austria"
    }
  ]
}
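
To illustrate how the same triples map onto one of the other download formats, here is a minimal sketch that flattens the "@graph" entries of a JSON-LD document like the one above into CSV. The column order and the function name jsonld_to_csv are assumptions. The CSV produced by the application itself may be laid out differently.

```python
import csv
import io
import json


def jsonld_to_csv(jsonld_text):
    """Convert the "@graph" triples of a JSON-LD document to CSV text."""
    document = json.loads(jsonld_text)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Header row, then one row per triple.
    writer.writerow(["subject", "predicate", "object"])
    for node in document.get("@graph", []):
        writer.writerow([node["subject"], node["predicate"], node["object"]])
    return buffer.getvalue()


# The triple from the example output above.
example = """
{
  "@graph": [
    {
      "@type": "Triple",
      "subject": "Vienna",
      "predicate": "/location/location/contains",
      "object": "Austria"
    }
  ]
}
"""

print(jsonld_to_csv(example))
```

For the example triple this prints a header row followed by the row Vienna,/location/location/contains,Austria.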

Screenshots

License

This project is licensed under the MIT License. See the LICENSE file for details.

Acknowledgments

TPLinker: A joint extraction framework for subject-predicate-object triples.
FastAPI: For providing a robust backend framework.
React: For a user-friendly frontend interface.

For questions or support, feel free to contact rezaei.alireza1290@gmail.com.
