This repository implements a question answering (QA) system built on the DistilBERT model from the Hugging Face transformers library. Given a passage of context and a related question, the system produces an answer, following the approach of the paper "A BERT Baseline for the Natural Questions." The goal is to replicate the paper's short-answer and no-answer results and, in doing so, to better understand how BERT-style models apply to real-world NLP tasks.
Paper Link: read paper here
This project applies machine learning to a natural language processing task: understanding a passage of context and giving a concise answer to a question about it. The system tests how robustly DistilBERT identifies the relevant information and recognizes when the provided text contains no adequate answer.
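The answer-or-abstain decision described above can be sketched as follows. This is a minimal illustration of a common convention in extractive QA, not necessarily the code in this repository: it assumes the model emits one start logit and one end logit per token, that index 0 is the `[CLS]` token standing in for "no answer", and that `best_span`, `max_answer_len`, and `null_threshold` are hypothetical names chosen for this example.

```python
from typing import List, Optional, Tuple


def best_span(
    start_logits: List[float],
    end_logits: List[float],
    max_answer_len: int = 30,      # hypothetical cap on span length, in tokens
    null_threshold: float = 0.0,   # hypothetical margin the span must beat
) -> Optional[Tuple[int, int]]:
    """Return (start, end) token indices of the best span, or None for no answer.

    Assumption: index 0 is the [CLS] token, and its start + end score is
    treated as the score of the "no answer" prediction.
    """
    null_score = start_logits[0] + end_logits[0]

    best_score = float("-inf")
    best: Optional[Tuple[int, int]] = None
    # Score every span that starts after [CLS] and is at most max_answer_len long.
    for s in range(1, len(start_logits)):
        last_end = min(s + max_answer_len, len(end_logits))
        for e in range(s, last_end):
            score = start_logits[s] + end_logits[e]
            if score > best_score:
                best_score = score
                best = (s, e)

    # Abstain when no span exists or the best span does not beat the null score.
    if best is None or best_score - null_score <= null_threshold:
        return None
    return best
```

Raising `null_threshold` makes the sketch abstain more often, which trades recall on short answers for precision on no-answer cases.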
The dataset has been reduced to match the scenarios we replicate from the referenced paper. It covers only two of the paper's four annotated answer types:
- No answer: Instances where the questions cannot be answered with the provided context.
- Short answer: Questions that have concise answers directly extractable or inferable from the context.
Data Link: see data here
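As a rough illustration of how a dataset might be reduced to these two answer types, the sketch below labels each example and drops the rest. The field names (`annotation`, `short_answers`, `long_answer`, `candidate_index`, `yes_no_answer`) are assumptions modeled on the Natural Questions annotation layout, and the excluded categories are assumed to be long-only and yes/no answers; the actual preprocessing used for this project may differ.

```python
from typing import Dict, List, Optional


def answer_type(annotation: Dict) -> Optional[str]:
    """Map one annotation to 'short', 'no_answer', or None (excluded).

    Field names are assumptions for this sketch, not a confirmed schema.
    """
    # Yes/no questions are assumed to be one of the excluded answer types.
    if annotation.get("yes_no_answer", "NONE") != "NONE":
        return None
    # Any annotated short-answer span makes this a short-answer example.
    if annotation.get("short_answers"):
        return "short"
    # A long answer without a short answer is assumed to be excluded.
    long_answer = annotation.get("long_answer", {})
    if long_answer.get("candidate_index", -1) >= 0:
        return None
    # Nothing annotated: the question cannot be answered from the context.
    return "no_answer"


def filter_examples(examples: List[Dict]) -> List[Dict]:
    """Keep only short-answer and no-answer examples, adding a 'label' key."""
    kept = []
    for example in examples:
        label = answer_type(example["annotation"])
        if label is not None:
            kept.append({**example, "label": label})
    return kept
```

Calling `filter_examples` on a list of such dictionaries returns copies of the retained examples, each with a `label` of either `"short"` or `"no_answer"`.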
Here is the link for the video:
To set up and run the project locally, follow these steps:
git clone https://github.com/YanmiYu/CSCI_1460_final.git
cd "Computational Linguistic"