Relation Classification Challenge - Solution

Overview

This solution fine-tunes the specified architecture, google/bert_uncased_L-4_H-256_A-4, to classify the relation between two marked entities (e1 and e2) in a sentence.

Dataset Information

Training samples: 1,919
Public test samples: 480
Private test samples: 407
Number of relation labels: 18

Relation Labels

Cause-Effect(e1,e2) / Cause-Effect(e2,e1)
Component-Whole(e1,e2) / Component-Whole(e2,e1)
Content-Container(e1,e2) / Content-Container(e2,e1)
Entity-Destination(e1,e2)
Entity-Origin(e1,e2) / Entity-Origin(e2,e1)
Instrument-Agency(e1,e2) / Instrument-Agency(e2,e1)
Member-Collection(e1,e2) / Member-Collection(e2,e1)
Message-Topic(e1,e2) / Message-Topic(e2,e1)
Product-Producer(e1,e2) / Product-Producer(e2,e1)
Other
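
The label list above can be turned into the integer ids a classifier needs. The following is a minimal sketch, not the repository's actual code: the names BIDIRECTIONAL, build_label_maps, label2id and id2label are hypothetical, and sorting the labels alphabetically is an assumed convention for getting a stable id order.

```python
# Relation types that appear in both directions in the list above.
# Entity-Destination is listed only as (e1,e2), and Other has no direction.
BIDIRECTIONAL = [
    "Cause-Effect", "Component-Whole", "Content-Container", "Entity-Origin",
    "Instrument-Agency", "Member-Collection", "Message-Topic", "Product-Producer",
]


def build_label_maps():
    """Return (label2id, id2label) for the 18 relation labels."""
    labels = []
    for name in BIDIRECTIONAL:
        labels.append(f"{name}(e1,e2)")
        labels.append(f"{name}(e2,e1)")
    labels.append("Entity-Destination(e1,e2)")
    labels.append("Other")
    labels.sort()  # assumed convention: alphabetical order gives stable ids
    label2id = {label: i for i, label in enumerate(labels)}
    id2label = {i: label for label, i in label2id.items()}
    return label2id, id2label


label2id, id2label = build_label_maps()
print(len(label2id))  # 8 bidirectional types * 2 + 2 = 18
```

Keeping both directions of a relation as separate classes means the model has to predict the argument order as well as the relation type.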

Solution Architecture

Model

Base Model: google/bert_uncased_L-4_H-256_A-4 (BERT-Mini: 4 layers, hidden size 256, 4 attention heads)
Classification Head: Linear layer applied to BERT's [CLS] token representation
Special Tokens: [E1], [/E1], [E2], [/E2] are added to mark entity boundaries

Key Features

Entity Marking: Replaces the <e1>, </e1>, <e2>, </e2> tags with the special tokens [E1], [/E1], [E2], [/E2]
Data Split: 85% train, 15% validation (stratified by label)
Evaluation Metric: Macro F1 score across all 18 classes
Optimization: AdamW optimizer with learning rate 2e-5
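
To illustrate what "stratified by label" means for the 85/15 split, here is a small standard-library sketch. It is an assumption about how such a split could be done, not the repository's implementation (which may well use a library routine instead); the function name stratified_split, the seed value and the rounding rule are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, val_fraction=0.15, seed=42):
    """Return (train_indices, val_indices), holding out val_fraction of each label."""
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    rng = random.Random(seed)  # fixed seed (assumed) so the split is reproducible
    train_idx, val_idx = [], []
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        # Hold out at least one example per class, unless the class has only one.
        n_val = max(1, round(len(indices) * val_fraction)) if len(indices) > 1 else 0
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])
    return sorted(train_idx), sorted(val_idx)


# Toy data: 100 examples of class "A" and 20 of class "B".
toy_labels = ["A"] * 100 + ["B"] * 20
train_idx, val_idx = stratified_split(toy_labels)
print(len(train_idx), len(val_idx))
```

Splitting per class keeps rare relation types represented in the validation set, which matters when the selection metric is macro F1.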

Hyperparameters

Max sequence length: 128
Batch size: 16
Epochs: 10
Learning rate: 2e-5
Dropout: 0.3
Validation split: 15%
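
For a rough sense of scale, the hyperparameters above can be collected in one place and used to estimate the number of optimizer steps. This is an illustrative sketch: TrainConfig is a hypothetical container, and the exact train/validation counts depend on how the split is rounded in the actual script.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container for the hyperparameters listed above."""
    model_name: str = "google/bert_uncased_L-4_H-256_A-4"
    max_seq_length: int = 128
    batch_size: int = 16
    epochs: int = 10
    learning_rate: float = 2e-5
    dropout: float = 0.3
    val_split: float = 0.15


config = TrainConfig()
n_total = 1919                                   # training samples from the dataset section
n_val = round(n_total * config.val_split)        # assumed rounding of the 15% hold-out
n_train = n_total - n_val
steps_per_epoch = math.ceil(n_train / config.batch_size)
max_steps = steps_per_epoch * config.epochs      # upper bound if all epochs run

print(n_train, n_val, steps_per_epoch, max_steps)
```

Under these assumptions a full run is on the order of a thousand optimizer steps.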

Usage

1. Install Dependencies

pip install -r requirements.txt

2. Train Model and Generate Predictions

python relation_classifier.py

This will:

Train the model for up to 10 epochs
Save the best model based on validation F1 score
Generate predictions for both public and private test sets
Create submission files:
- public_test_submission.csv
- private_test_submission.csv

Output Files

best_model.pt: Best model checkpoint
public_test_submission.csv: Public test predictions
private_test_submission.csv: Private test predictions

Model Performance

The model is evaluated with the macro-averaged F1 score over all 18 relation classes. Macro averaging computes F1 separately for each class and takes the unweighted mean, so rare relation types count as much as frequent ones.
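
The effect of macro averaging is easiest to see on a tiny example. The sketch below computes macro F1 with the standard library only; the function name macro_f1 is hypothetical, and scoring a class with no true or predicted examples as 0 is an assumed convention.

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 here when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(labels)


# A classifier that always predicts the frequent class "A".
y_true = ["A", "A", "A", "B"]
y_pred = ["A", "A", "A", "A"]
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                              # 0.75
print(macro_f1(y_true, y_pred, ["A", "B"]))  # (6/7 + 0) / 2 = 3/7
```

Accuracy looks acceptable here, while macro F1 drops sharply because class "B" is never predicted.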

Implementation Details

Data Preprocessing

Entity tags (<e1>, </e1>, <e2>, </e2>) are replaced with the special tokens [E1], [/E1], [E2], [/E2] so that entity boundaries are visible to the model
Sentences are tokenized with the BERT tokenizer
Sequences are padded or truncated to a maximum length of 128 tokens
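
The preprocessing steps can be sketched without any external library. This is an illustration under stated assumptions, not the code in relation_classifier.py: mark_entities and pad_or_truncate are hypothetical helpers, pad id 0 is assumed to be the [PAD] id, and in practice the tokenizer itself would normally handle padding and truncation.

```python
# Mapping from the dataset's inline tags to the marker tokens named above.
MARKERS = {
    "<e1>": " [E1] ", "</e1>": " [/E1] ",
    "<e2>": " [E2] ", "</e2>": " [/E2] ",
}


def mark_entities(sentence):
    """Replace <e1>/<e2> tags with marker tokens and normalize whitespace."""
    for tag, token in MARKERS.items():
        sentence = sentence.replace(tag, token)
    return " ".join(sentence.split())


def pad_or_truncate(token_ids, max_len=128, pad_id=0):
    """Return (input_ids, attention_mask), both exactly max_len long."""
    ids = list(token_ids[:max_len])
    n_pad = max_len - len(ids)
    mask = [1] * len(ids) + [0] * n_pad  # 1 = real token, 0 = padding
    return ids + [pad_id] * n_pad, mask


print(mark_entities("The <e1>fire</e1> caused <e2>smoke</e2>."))
ids, mask = pad_or_truncate([101, 7, 102], max_len=5)
print(ids, mask)
```

Note that this naive truncation simply cuts the tail of the sequence; a real tokenizer keeps the final [SEP] token when truncating.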

Training Strategy

Cross-entropy loss for multi-class classification
Early stopping based on validation F1 score
Progress bars for training visibility
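
The interaction between saving the best checkpoint and early stopping can be sketched as a small tracker. Everything here is hypothetical: the class name BestScoreTracker, the patience value, and the validation scores, which are made-up numbers for illustration and not results from this model.

```python
class BestScoreTracker:
    """Tracks the best validation score and signals when to stop early."""

    def __init__(self, patience=3):
        self.patience = patience        # assumed: epochs allowed without improvement
        self.best_score = float("-inf")
        self.best_epoch = None
        self.bad_epochs = 0

    def update(self, epoch, score):
        """Return True if this epoch is a new best, i.e. a checkpoint should be saved."""
        if score > self.best_score:
            self.best_score = score
            self.best_epoch = epoch
            self.bad_epochs = 0
            return True
        self.bad_epochs += 1
        return False

    @property
    def should_stop(self):
        return self.bad_epochs >= self.patience


tracker = BestScoreTracker(patience=2)
history = []
# Dummy validation F1 values, one per epoch, purely illustrative.
for epoch, val_f1 in enumerate([0.40, 0.55, 0.53, 0.54, 0.60], start=1):
    improved = tracker.update(epoch, val_f1)
    history.append((epoch, improved))  # a real loop would save best_model.pt when improved
    if tracker.should_stop:
        break

print(tracker.best_epoch, tracker.best_score, len(history))
```

With a patience of 2, this toy run stops after epoch 4 and never sees the later improvement, which is the usual trade-off when choosing a patience value.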

Prediction

Batch prediction for efficiency
Argmax over logits for final label prediction
Label mapping back to original relation names
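
The prediction steps above can be sketched in plain Python. The helper names batched and predict_labels are hypothetical, the logits are toy numbers, and the three-label id2label map is only a stand-in for the full 18-label mapping.

```python
def batched(items, batch_size=16):
    """Yield consecutive slices of items with at most batch_size elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def predict_labels(logits_batch, id2label):
    """Take the argmax of each logit vector and map it back to a relation name."""
    predictions = []
    for logits in logits_batch:
        best_id = max(range(len(logits)), key=lambda i: logits[i])
        predictions.append(id2label[best_id])
    return predictions


# Toy stand-in for the real label map and model outputs.
toy_id2label = {0: "Cause-Effect(e1,e2)", 1: "Cause-Effect(e2,e1)", 2: "Other"}
toy_logits = [[2.1, -0.3, 0.4], [-1.0, 0.2, 3.5], [0.1, 1.7, 0.9]]

predicted = []
for batch in batched(toy_logits, batch_size=2):
    predicted.extend(predict_labels(batch, toy_id2label))
print(predicted)
```

Because argmax is unaffected by softmax, the logits can be used directly and no probabilities need to be computed for the submission files.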
