Autocorrect project

A sample query autocorrection project. It frames autocorrect as a ranking problem: given a set of candidate corrections, the highest-ranked candidate is treated as the autocorrected query.
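As a rough illustration of the ranking framing, the sketch below picks the best-scoring candidate for a query. The names autocorrect and toy_score are hypothetical and not taken from this repository; toy_score is only a stand-in for a trained ranker.

```python
from typing import Callable, Sequence


def toy_score(query: str, candidate: str) -> float:
    """Hypothetical stand-in for a trained ranker (assumption, not the real model).

    Counts characters that match position by position and subtracts
    the difference in length.
    """
    matches = sum(a == b for a, b in zip(query, candidate))
    return matches - abs(len(query) - len(candidate))


def autocorrect(
    query: str,
    candidates: Sequence[str],
    score: Callable[[str, str], float] = toy_score,
) -> str:
    """Return the highest-scoring candidate, or the query itself if there are none."""
    if not candidates:
        return query
    return max(candidates, key=lambda candidate: score(query, candidate))


if __name__ == "__main__":
    print(autocorrect("tshot", ["tshirt", "shoe", "sock"]))
```

In the project itself the score would come from the trained model rather than from a hand-written function like this one.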

We won't bother to split the data into training/validation/test sets in this project.

setup

Use Python 3. You may want to use a virtual environment as well.

pip install -r requirements.txt

prepare query data

python prepare_queries.py

Some prelabelled queries are provided, including both correctly spelled queries and pairs of a typo and its corrected query. In a real scenario, we could generate candidates with Elasticsearch and treat them as incorrect candidates. In this project, we instead generate random alterations of the input query.
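One way such random alterations could be produced is sketched below. This is an assumption about the general approach, not the contents of misspell.py or prepare_queries.py; the function name random_alterations, the set of operations, and the lowercase ASCII alphabet are all illustrative choices.

```python
import random
import string


def random_alterations(query, n=5, seed=None):
    """Return up to n distinct single-edit alterations of query (illustrative only)."""
    rng = random.Random(seed)
    results = set()
    attempts = 0
    # Cap the number of attempts so very short queries cannot loop forever.
    while len(results) < n and attempts < 100 * n:
        attempts += 1
        chars = list(query)
        op = rng.choice(["insert", "delete", "replace", "swap"])
        pos = rng.randrange(len(chars)) if chars else 0
        letter = rng.choice(string.ascii_lowercase)
        if op == "insert":
            chars.insert(pos, letter)
        elif op == "delete" and len(chars) > 1:
            del chars[pos]
        elif op == "replace" and chars:
            chars[pos] = letter
        elif op == "swap" and pos + 1 < len(chars):
            chars[pos], chars[pos + 1] = chars[pos + 1], chars[pos]
        altered = "".join(chars)
        # Skip edits that happen to reproduce the original query.
        if altered != query:
            results.add(altered)
    return sorted(results)


if __name__ == "__main__":
    print(random_alterations("tshirt", n=5, seed=0))
```

Each alteration is a single edit away from the input, so it can serve as an incorrect candidate alongside the labelled correct query.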

train model

python model.py train

The model is an XGBoost ranker. The main features used in this project are counts of edit operations derived from the edit distance. E.g. for the pair tshot -> tshirt, the feature replace_o_to_i is 1 and add_r is 1. The idea is that some edits are more common or natural than others (e.g. between characters that are close together on the keyboard).
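The sketch below shows one way to derive such edit-operation counts from a Levenshtein alignment. It is not the code in edit_distance.py or features.py: the function name edit_features, the delete_x feature name, and the tie-breaking order between equally short alignments are assumptions made for illustration.

```python
from collections import Counter


def edit_features(source, target):
    """Count the edit operations that turn source into target (illustrative sketch)."""
    m, n = len(source), len(target)
    # dp[i][j] is the edit distance between source[:i] and target[:j].
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if source[i - 1] == target[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # delete from source
                dp[i][j - 1] + 1,         # add a character of target
                dp[i - 1][j - 1] + cost,  # match or replace
            )

    # Walk back through the table. Ties between equally short alignments
    # are broken in the order: match, add, delete, replace (an assumption).
    features = Counter()
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and source[i - 1] == target[j - 1] and dp[i][j] == dp[i - 1][j - 1]:
            i, j = i - 1, j - 1
        elif j > 0 and dp[i][j] == dp[i][j - 1] + 1:
            features[f"add_{target[j - 1]}"] += 1
            j -= 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            features[f"delete_{source[i - 1]}"] += 1
            i -= 1
        else:
            features[f"replace_{source[i - 1]}_to_{target[j - 1]}"] += 1
            i, j = i - 1, j - 1
    return dict(features)


if __name__ == "__main__":
    print(edit_features("tshot", "tshirt"))
```

With this tie-breaking order, the pair tshot -> tshirt yields replace_o_to_i and add_r, each with a count of 1. A different order could produce another equally short alignment, so the feature names depend on that choice.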

evaluate

python model.py evaluate

interactive predict

python model.py predict

edit distance unit test

pytest test_edit_distance.py

AdeptMind/adept-interview-autocorrect