TripleExtractor is an application that extracts subject-predicate-object triples from input text, URLs, or text files. It is powered by TPLinker, a joint entity and relation extraction model, and supports several output formats: JSON-LD, CSV, RDF, and XML. The project combines a FastAPI backend with a React frontend, and it is fully Dockerized for ease of deployment.
- Multiple Input Options:
  - Enter plain text.
  - Provide a URL to fetch and extract text from web pages.
  - Upload text files for processing.
- Downloadable Output Formats:
  - JSON-LD
  - CSV
  - RDF (Turtle format)
  - XML
- Language Validation: Checks that the input text is in English and returns a descriptive error message for unsupported languages.
- Downloadable Results: Lets users download results in their preferred format.
- Advanced Error Handling: Handles input errors, server errors, and network issues gracefully.
- Output: Results are displayed in a table for easier reading.
- Latest Update: An additional LLM, Llama 3 8B Instruct from Hugging Face, can be used to validate the generated triples.
  - Go to the Llama 3 8B Instruct page on Hugging Face and request access to the model.
  - Generate a Hugging Face access token and set it in a .env file in the root of the backend folder, like: HF_TOKEN_LLAMA=hf_***
  - You need at least 2 GB of GPU memory to run this code.
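As a rough illustration of the token setup described above, the sketch below reads HF_TOKEN_LLAMA from the process environment and falls back to a .env file. It uses only the standard library; the helper names `read_env_file` and `get_llama_token` are hypothetical, and the actual backend may load the variable differently (for example with a dotenv package).

```python
import os
from pathlib import Path


def read_env_file(path):
    """Parse simple KEY=VALUE lines from a .env file into a dict."""
    values = {}
    for raw_line in Path(path).read_text().splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything without an '=' sign.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_llama_token(env_path=".env"):
    """Return HF_TOKEN_LLAMA from the environment, else from the .env file."""
    token = os.environ.get("HF_TOKEN_LLAMA")
    if not token and Path(env_path).is_file():
        token = read_env_file(env_path).get("HF_TOKEN_LLAMA")
    if not token:
        # Hypothetical error message; the real backend may report this differently.
        raise RuntimeError("HF_TOKEN_LLAMA is not set; LLM verification cannot run.")
    return token
```

Failing early with a clear message when the token is missing is usually easier to debug than a later authentication error from the model download.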
Follow the steps below to set up the project locally.
- Download BERT-BASE-CASED and put it under backend/pretrained_models.
- Download Model-State and put it under backend/tplinker/default_log_dir/r584cHKZ.
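Before starting the backend, it can help to confirm that both downloads landed in the expected folders. The following is a minimal sketch, assuming it is run from the repository root; the function name `missing_model_dirs` is hypothetical, and the check only verifies that each directory exists and is not empty, not that the files inside are valid.

```python
from pathlib import Path

# Locations named in the setup steps, relative to the repository root.
REQUIRED_DIRS = [
    "backend/pretrained_models",
    "backend/tplinker/default_log_dir/r584cHKZ",
]


def missing_model_dirs(repo_root="."):
    """Return the required directories that are absent or empty."""
    missing = []
    for rel in REQUIRED_DIRS:
        path = Path(repo_root) / rel
        # A directory that exists but holds no files counts as missing.
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(rel)
    return missing
```

An empty list from `missing_model_dirs()` means both locations are populated.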
To run the project using Docker:
- Clone the Repository:
git clone https://github.com/RezaeiAlireza/TripleExtractor.git
cd TripleExtractor
- Build and Run Docker Containers:
docker-compose up --build
This command will:
  - Build and start the backend container (FastAPI).
  - Build and start the frontend container (React).
- Access the Application:
  - Frontend: Open your browser and go to http://localhost:3000.
  - Backend API: Access the backend API at http://localhost:8000.
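Once the containers are up, a quick way to confirm that both services answer is a small reachability check. This is only a sketch: the function name `is_reachable` is hypothetical, and it tests whether an HTTP server responds at the address, not whether the application behind it is working correctly.

```python
import urllib.error
import urllib.request

# Check local services directly rather than through any configured proxy.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def is_reachable(url, timeout=3.0):
    """Return True if an HTTP server answers at `url`, even with an error status."""
    try:
        with _OPENER.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded (for example with 404), so it is running.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or name resolution failure.
        return False
```

For example, `is_reachable("http://localhost:3000")` checks the frontend and `is_reachable("http://localhost:8000")` checks the backend.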
If you prefer to set up and run the project manually, make sure the following are installed:
- Python 3.9 or 3.10
- Node.js 16
- Conda (recommended) for environment management
- Git for version control
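To check the Python requirement from inside the interpreter you plan to use, a sketch like the one below can help. The name `is_supported` is hypothetical, and the set of versions simply mirrors the list above.

```python
import sys

# Versions listed in the prerequisites above.
SUPPORTED = {(3, 9), (3, 10)}


def is_supported(version_info=sys.version_info):
    """Return True if the (major, minor) pair is one of the listed versions."""
    return tuple(version_info[:2]) in SUPPORTED
```

Calling `is_supported()` with no arguments checks the running interpreter.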
- Clone the Repository:
git clone https://github.com/RezaeiAlireza/TripleExtractor.git
cd TripleExtractor/backend
- Set Up Conda Environment:
conda create -n tplinker python=3.9 -y
conda activate tplinker
- Install Dependencies:
pip install -r requirements.txt
- Run Backend:
uvicorn main:app --reload
The backend will be available at http://127.0.0.1:8000.
- Navigate to Frontend:
cd ../frontend
- Install Dependencies:
npm install
- Start Frontend:
npm start
The frontend will be available at http://127.0.0.1:3000.
- Launch the application in your browser: http://127.0.0.1:3000.
- Choose your input type (Text, URL, or File).
- Choose the model: NYT or WebNLG.
- Optional: Set additional LLM verification to yes to post-filter the extracted triples and improve the results.
- Enter your text, paste the URL, or upload a text file.
- Press Extract and wait for the model to process the input and display the output.
- Use the Download section to choose the format in which to download the generated output.
TripleExtractor/
│
├── backend/
│   ├── data4bert/
│   ├── pretrained_models/
│   ├── common/
│   ├── tplinker/
│   ├── main.py
│   ├── Dockerfile
│   └── requirements.txt
│
├── frontend/
│   ├── public/
│   ├── Dockerfile
│   ├── src/
│   │   ├── components/
│   │   ├── pages/
│   │   └── App.js
│   └── package.json
│
├── docker-compose.yaml
└── README.md
Text: "Vienna is the capital of Austria."
{
  "@context": {
    "subject": "http://schema.org/subject",
    "predicate": "http://schema.org/predicate",
    "object": "http://schema.org/object"
  },
  "@graph": [
    {
      "@type": "Triple",
      "subject": "Vienna",
      "predicate": "/location/location/contains",
      "object": "Austria"
    }
  ]
}
This project is licensed under the MIT License. See the LICENSE file for details.
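The JSON-LD example above follows a fixed layout, so a list of extracted triples can be wrapped into it with a few lines of Python. The sketch below is an illustration, not the project's own export code: the function names `triples_to_jsonld` and `triples_to_csv` are hypothetical, and the CSV column names are an assumption based on the JSON-LD keys.

```python
import csv
import io
import json

# Context copied from the example output above.
CONTEXT = {
    "subject": "http://schema.org/subject",
    "predicate": "http://schema.org/predicate",
    "object": "http://schema.org/object",
}


def triples_to_jsonld(triples):
    """Wrap (subject, predicate, object) tuples in the JSON-LD layout shown above."""
    graph = [
        {"@type": "Triple", "subject": s, "predicate": p, "object": o}
        for s, p, o in triples
    ]
    return {"@context": CONTEXT, "@graph": graph}


def triples_to_csv(triples):
    """Serialize the same tuples as CSV text (column names are an assumption)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["subject", "predicate", "object"])
    writer.writerows(triples)
    return buffer.getvalue()


example = [("Vienna", "/location/location/contains", "Austria")]
print(json.dumps(triples_to_jsonld(example), indent=2))
print(triples_to_csv(example))
```

Keeping the triples as plain tuples until the last step means each output format only needs its own small serializer.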
- TPLinker: A joint extraction framework for subject-predicate-object triples.
- FastAPI: For providing a robust backend framework.
- React: For a user-friendly frontend interface.
For questions or support, feel free to contact rezaei.alireza1290@gmail.com.


