Turkish Text Preprocessing Toolkit

A web application for Turkish text preprocessing including tokenization, stemming, normalization, and stopword removal.

Developed by Atahan Uz & Gizem Yılmaz

Paper

View Paper (PDF): Paper.pdf

Prerequisites

Python 3.x
Node.js and npm
Docker (optional)

Installation

Install Python dependencies:

pip install flask flask-cors

Install React dependencies:

cd GUI
npm install
cd ..

Run BOUN TULAP Morphological Parser (Optional)

This step is optional but recommended, as it improves the accuracy of the Normalizer.

Repository: https://github.com/BOUN-TABILab-TULAP/Morphological-Parser

Follow the instructions in the repository to install and run the Docker container.

Test that it's working with this command:
```
curl -X POST http://localhost:4444/evaluate \
-H 'Content-Type: application/json' \
-d '{"textarea":"Genç çellistin büyük heyecan ve duyarlılıkla çalmasına salondaki seyirciler hayran oldu ."}'
```
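The same check can be scripted from Python. The sketch below is a minimal example using only the standard library; it reuses the URL and the `textarea` JSON key from the curl command above. The helper names `build_request` and `query_parser` are hypothetical, and because the response format is not documented here, the body is returned as raw text (assumed to be UTF-8).

```python
import json
import urllib.request

# Endpoint taken from the curl command above.
PARSER_URL = "http://localhost:4444/evaluate"


def build_request(text, url=PARSER_URL):
    """Build the same JSON POST request that the curl command sends."""
    payload = json.dumps({"textarea": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def query_parser(text, url=PARSER_URL, timeout=10):
    """Send the request and return the raw response body as text.

    Assumes the parser container is running and replies in UTF-8.
    """
    request = build_request(text, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the Docker container running, `print(query_parser("Genç çellistin büyük heyecan ve duyarlılıkla çalmasına salondaki seyirciler hayran oldu ."))` should print whatever the parser returns for that sentence.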

Running the App

Simply run the following command:

python START.py

This will start both the Python backend server and the React frontend automatically.

The app will be available at: http://localhost:3000

To stop both processes, press Ctrl+C.
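For readers curious how a single launcher can manage two processes, here is a rough sketch of the general pattern. It is not the actual contents of START.py. The commands are assumptions: `server.py` and the `GUI` folder exist in the repository, but whether the frontend is started with `npm start` is a guess, and on Windows the executable may need to be `npm.cmd`.

```python
import subprocess
import sys

# Assumed commands; the real START.py may use different ones.
COMMANDS = [
    ([sys.executable, "server.py"], "."),  # Python backend in the repository root
    (["npm", "start"], "GUI"),             # React frontend ("npm start" is an assumption)
]


def launch(commands):
    """Start each (command, working_directory) pair as a child process."""
    return [subprocess.Popen(cmd, cwd=cwd) for cmd, cwd in commands]


def shutdown(processes):
    """Ask every still-running child to exit, then wait for all of them."""
    for proc in processes:
        if proc.poll() is None:
            proc.terminate()
    for proc in processes:
        proc.wait()


def main():
    """Run both processes until they exit or the user presses Ctrl+C."""
    processes = launch(COMMANDS)
    try:
        for proc in processes:
            proc.wait()
    except KeyboardInterrupt:
        pass  # Ctrl+C: fall through to the cleanup below
    finally:
        shutdown(processes)
```

Calling `main()` from the repository root would start both children. The `finally` clause is what lets one Ctrl+C stop both of them.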

Training Code

The training_test_codes folder contains the scripts used to train the models and evaluate their performance. No separate training script is provided for Naive Bayes, because its training is fast enough to be performed at inference time.
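To illustrate why Naive Bayes can be trained on the fly, the sketch below shows a generic multinomial Naive Bayes with add-one smoothing, where "training" is a single counting pass over the labeled samples. It is an illustration of the idea only, not the toolkit's implementation; the function names, features, and labels are all made up.

```python
import math
from collections import Counter, defaultdict


def train(samples):
    """Count labels and per-label features; this pass is the whole training step."""
    label_counts = Counter()
    feature_counts = defaultdict(Counter)
    vocabulary = set()
    for features, label in samples:
        label_counts[label] += 1
        feature_counts[label].update(features)
        vocabulary.update(features)
    return label_counts, feature_counts, vocabulary


def predict(model, features):
    """Return the label with the highest log-posterior under add-one smoothing."""
    label_counts, feature_counts, vocabulary = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)  # log prior
        denom = sum(feature_counts[label].values()) + len(vocabulary)
        for feature in features:
            score += math.log((feature_counts[label][feature] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def classify(samples, features):
    """Train on the samples and predict in the same call."""
    return predict(train(samples), features)


# Toy data, purely hypothetical.
TOY_SAMPLES = [
    (["a", "b"], "pos"),
    (["a", "c"], "pos"),
    (["d", "e"], "neg"),
    (["d", "b"], "neg"),
]
```

Because training only accumulates counts, `classify(TOY_SAMPLES, ["a"])` can rebuild the model on every call at negligible cost for small datasets.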

Credits

Prof. Tunga Güngör for his guidance during the project

Boğaziçi University TULAP for the morphological analyser

Follow us at: https://tabilab.cmpe.bogazici.edu.tr
