The Dynamic Pricing System is a machine learning project designed to generate data-driven nightly price recommendations for short-term rental listings (e.g., Airbnb).
Pricing rental properties effectively is a complex problem: setting prices too high reduces bookings, while pricing too low leads to lost revenue. This system uses historical listing data and feature engineering (log transforms, target encoding, location clustering) to predict a competitive nightly price for each listing.
💡 This project demonstrates an end-to-end ML pipeline — from data preprocessing to model tuning and evaluation — with a focus on real-world applicability.
Live demo (Streamlit): https://dynamic-pricing-system-bbovjcwe6xyf5ujvnmkdao.streamlit.app
Hosts often rely on intuition rather than data, ignoring key factors such as:
- Local market trends
- Property characteristics
- Listing availability
- Review activity
- Competition within neighbourhoods
This results in suboptimal pricing decisions.
👉 This project builds a predictive pricing engine that learns patterns from historical data to recommend competitive prices.
- ✔️ End-to-end ML pipeline (EDA → Feature Engineering → Modeling → Evaluation)
- ✔️ Advanced feature engineering (log transforms, target encoding, clustering)
- ✔️ Handling of high-cardinality features (neighbourhood encoding)
- ✔️ Hyperparameter tuning using RandomizedSearchCV
- ✔️ Comparison of multiple models (Linear, KNN, Random Forest, Boosting)
- ✔️ Real-world dataset with 220K+ listings
- Location → City, Neighbourhood (target encoded)
- Property → Room type
- Availability → Minimum nights, availability_365
- Reviews → Number of reviews
- Host → Listing count
- 📌 Log Transformations → Reduced skewness in numerical features
- 📌 Target Encoding → Neighbourhood pricing signal
- 📌 KMeans Clustering (Location Intelligence) → Grouped latitude/longitude into meaningful regions
- 📌 One-Hot Encoding → Low-cardinality features (city, room type, clusters)
- 📌 Outlier Handling → Removed extreme prices and anomalous minimum-nights values
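A minimal pandas sketch of the outlier, log-transform, target-encoding and one-hot steps above. It is an illustration, not the project's code: the column names follow the features listed earlier, while the 1st/99th-percentile cut-offs, the smoothing constant `m` and the helper names (`remove_outliers`, `log_transform`, `fit_target_encoding`, `apply_target_encoding`) are hypothetical choices made for the example.

```python
import numpy as np
import pandas as pd


def remove_outliers(df: pd.DataFrame) -> pd.DataFrame:
    """Drop extreme prices and minimum-nights values (1st/99th-percentile cut-offs are an assumption)."""
    lo, hi = df["price"].quantile([0.01, 0.99])
    max_nights = df["minimum_nights"].quantile(0.99)
    keep = df["price"].between(lo, hi) & (df["minimum_nights"] <= max_nights)
    return df[keep].copy()


def log_transform(df: pd.DataFrame, cols: list[str]) -> pd.DataFrame:
    """Apply log1p to skewed, non-negative columns (log1p(0) = 0, so zero counts stay finite)."""
    out = df.copy()
    for col in cols:
        out[col] = np.log1p(out[col])
    return out


def fit_target_encoding(train: pd.DataFrame, col: str, target: str, m: float = 10.0) -> tuple[pd.Series, float]:
    """Learn a smoothed mean-target encoding from the training split only.

    Each category's mean is shrunk toward the global mean with weight m,
    so categories with few rows are pulled harder than common ones.
    """
    global_mean = float(train[target].mean())
    stats = train.groupby(col)[target].agg(["mean", "count"])
    encoding = (stats["count"] * stats["mean"] + m * global_mean) / (stats["count"] + m)
    return encoding, global_mean


def apply_target_encoding(df: pd.DataFrame, col: str, encoding: pd.Series, global_mean: float) -> pd.Series:
    """Map each category to its encoded value; unseen categories fall back to the global mean."""
    return df[col].map(encoding).fillna(global_mean)


# Tiny made-up training frame, for illustration only.
train = pd.DataFrame({
    "neighbourhood": ["A", "A", "B", "C"],
    "room_type": ["Entire home/apt", "Private room", "Private room", "Shared room"],
    "price": [200.0, 120.0, 80.0, 50.0],
    "number_of_reviews": [10, 0, 5, 40],
})
train = log_transform(train, ["price", "number_of_reviews"])
enc, gmean = fit_target_encoding(train, "neighbourhood", "price", m=2.0)
train["neighbourhood_te"] = apply_target_encoding(train, "neighbourhood", enc, gmean)
# One-hot encode the low-cardinality room type column.
train = pd.get_dummies(train, columns=["room_type"], prefix="room")
```

Fitting the encoding on the training split and mapping unseen neighbourhoods to the global mean keeps test rows out of the statistics. Encoding training rows with their own targets can still overfit, which is why out-of-fold encoding (for example scikit-learn's `TargetEncoder`, available from version 1.3) is a common refinement.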
🚀 These steps significantly improved model performance and stability.
- Python 3.11
- Pandas / NumPy – Data processing
- Matplotlib / Seaborn – Visualization
- Scikit-learn – ML models & pipelines
- XGBoost – Gradient boosting model
- Jupyter Notebook / Google Colab – Development
After feature engineering and hyperparameter tuning:
| Model | MAE | RMSE | R² Score |
|---|---|---|---|
| Random Forest ✅ | 0.346 | 0.466 | 0.595 |
| XGBoost | 0.360 | 0.478 | 0.574 |
| Gradient Boosting | 0.367 | 0.485 | 0.561 |
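Other regression models were also tested on the scaled dataset. Their RMSE values are not on the same scale as their MAE and R² (or as the table above), so R² is the most comparable metric for these rows:
| Model | MAE | RMSE | R² Score |
|---|---|---|---|
| Linear Regression | 0.73 | 169.3 | 0.22 |
| KNN | 0.670 | 142.67 | 0.344 |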
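The baselines in this table were trained on scaled features. One leak-free way to do that is to put the scaler inside a scikit-learn pipeline, so its statistics are learned from the training split only. The sketch below uses synthetic data; the two features, their coefficients and `n_neighbors=10` are assumptions for illustration, not the project's settings.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
n = 500
# Synthetic features on very different scales (assumed for illustration):
# availability_365 in [0, 365] and log1p(number_of_reviews) roughly in [0, 3.5].
availability = rng.integers(0, 366, n).astype(float)
log_reviews = np.log1p(rng.poisson(15, n)).astype(float)
X = np.column_stack([availability, log_reviews])
y = 4.0 + 0.005 * availability + 0.3 * log_reviews + rng.normal(0, 0.3, n)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# The scaler lives inside each pipeline, so its mean/std are learned from X_train only.
models = {
    "Linear Regression": make_pipeline(StandardScaler(), LinearRegression()),
    "KNN": make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=10)),
}
# Pipeline.score returns the final estimator's R² on the held-out split.
scores = {name: model.fit(X_train, y_train).score(X_test, y_test) for name, model in models.items()}
```

Scaling matters most for KNN, whose Euclidean distances would otherwise be dominated by the 0 to 365 range of `availability_365`; for ordinary least squares it changes the scale of the coefficients but not the predictions.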
📊 The Random Forest model achieved the best performance on all three metrics, with R² = 0.595 (roughly 60% of the variance in the price target explained).
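The three metrics in the tables can be computed directly from predictions, as in the minimal NumPy sketch below. The helper name `regression_metrics` and the example values are hypothetical, and treating the target as `log1p(price)` is an assumption: the sub-1.0 MAE values in the first table suggest a log-scale target, but the results do not state it.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R²) for two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    # R² = 1 - SS_res / SS_tot: the share of the target's variance explained by the model.
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return mae, rmse, r2


# Illustrative values only, assumed to be on the log1p(price) scale.
y_true = np.log1p([100.0, 150.0, 80.0, 220.0])
y_pred = np.log1p([110.0, 140.0, 95.0, 200.0])
mae, rmse, r2 = regression_metrics(y_true, y_pred)

# If the target is log1p(price), an absolute error e on that scale is roughly a
# multiplicative error of exp(e) on price, e.g. exp(0.346) ≈ 1.41.
price_factor = np.exp(mae)
```

Because R² divides the residual error by the target's own variance, it is unit-free, which makes it the safer metric for comparing rows across the two tables (it still differs between a log and a raw price target).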
- Data Cleaning
  - Removed invalid and extreme values
  - Handled missing data
- Feature Engineering
  - Log transformation of skewed features
  - Target encoding for neighbourhood
  - KMeans clustering on geolocation
  - One-hot encoding for categorical variables
- Train/Test Split
  - Strict separation to prevent data leakage
- Scaling
  - Applied where required (Linear Regression and KNN)
- Model Training
  - Linear Regression (baseline)
  - KNN
  - Random Forest (best performer)
  - Gradient Boosting
  - XGBoost
- Hyperparameter Tuning
  - RandomizedSearchCV to tune model hyperparameters
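The split, clustering and tuning steps above can be chained so that every fitted component only sees training rows. This is a sketch on synthetic data, not the project's notebook: the coordinate ranges, `n_clusters=8`, the search space and `n_iter=5` are illustrative assumptions, and `add_cluster_dummies` is a hypothetical helper.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for the listings table: latitude, longitude, log1p(minimum_nights)
# and a log-price target (all assumed for illustration).
n = 600
lat = rng.uniform(40.5, 40.9, n)
lon = rng.uniform(-74.1, -73.7, n)
log_min_nights = np.log1p(rng.integers(1, 30, n))
y = 4.5 + 5.0 * (lat - 40.7) - 5.0 * (lon + 73.9) + 0.1 * log_min_nights + rng.normal(0, 0.2, n)
X = np.column_stack([lat, lon, log_min_nights])

# 1. Split first, so nothing fitted below ever sees test rows.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 2. Fit KMeans on training coordinates only, then assign clusters to both splits.
kmeans = KMeans(n_clusters=8, n_init=10, random_state=42).fit(X_train[:, :2])


def add_cluster_dummies(X_part: np.ndarray) -> np.ndarray:
    """Append one-hot location-cluster columns to a feature matrix."""
    labels = kmeans.predict(X_part[:, :2])
    return np.hstack([X_part, np.eye(kmeans.n_clusters)[labels]])


X_train_fe = add_cluster_dummies(X_train)
X_test_fe = add_cluster_dummies(X_test)

# 3. Tune a random forest with RandomizedSearchCV; cross-validation uses the training split only.
search = RandomizedSearchCV(
    RandomForestRegressor(random_state=42),
    param_distributions={
        "n_estimators": [100, 200, 300],
        "max_depth": [None, 10, 20],
        "min_samples_leaf": [1, 2, 5],
        "max_features": ["sqrt", 1.0],
    },
    n_iter=5,
    cv=3,
    scoring="neg_mean_absolute_error",
    random_state=42,
)
search.fit(X_train_fe, y_train)

# 4. Evaluate the refitted best model once on the held-out test split.
test_r2 = r2_score(y_test, search.predict(X_test_fe))
```

Wrapping the clustering step in a custom transformer inside a `Pipeline` would go one step further and refit the clusters within each cross-validation fold.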
- Distribution analysis of pricing and features
- Detection of extreme outliers
- Correlation analysis with target variable
- Feature behavior across cities and room types
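The distribution, outlier and correlation checks above can be sketched in pandas as follows. The data are synthetic (log-normal prices standing in for real nightly rates), and the 1.5 × IQR outlier rule is a common heuristic assumed here, not necessarily the rule used in the project.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 1_000
# Synthetic, illustrative data: log-normal prices are right-skewed like real nightly rates.
reviews = rng.poisson(20, n)
price = np.exp(4.5 + 0.05 * reviews + rng.normal(0, 0.6, n))
df = pd.DataFrame({
    "price": price,
    "number_of_reviews": reviews,
    "availability_365": rng.integers(0, 366, n),
})

# Skewness of the raw and log-transformed target (0 means symmetric).
skew_raw = df["price"].skew()
skew_log = np.log1p(df["price"]).skew()

# Flag extreme prices with the 1.5 * IQR rule.
q1, q3 = df["price"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["price"] < q1 - 1.5 * iqr) | (df["price"] > q3 + 1.5 * iqr)]

# Correlation of each feature with log price, strongest (in absolute value) first.
corr = (
    df.assign(log_price=np.log1p(df["price"]))
    .drop(columns="price")
    .corr()["log_price"]
    .drop("log_price")
    .sort_values(key=abs, ascending=False)
)
```

On data like this, the raw price skewness is well above 1 while the log-transformed skewness is close to 0, which is the behaviour the log-transform step relies on.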
- 📍 Location is the strongest pricing driver
- 🏠 Room type significantly impacts price
- 📉 Extreme values (outliers) degrade model performance
- 🔄 Log transformation improves model stability
- 🤖 Ensemble models outperform linear approaches
- Source: Kaggle – NYC Airbnb Open Data
- Size: 226,000+ listings
- Features: Location, availability, reviews, host info
- Data cleaning & preprocessing
- Feature engineering
- Model training & evaluation
- Hyperparameter tuning
- Model stacking (planned)
- Deployment (Streamlit)
- Advanced feature engineering (ongoing)
- 🔹 Model stacking / ensembling
- 🔹 Advanced location features (distance-based metrics)
- 🔹 Time-based features (seasonality)
- 🔹 Deployment using Streamlit
- 🔹 Integration with real-time pricing APIs
git clone https://github.com/AryanSharma1017/Dynamic-Pricing-System.git
cd Dynamic-Pricing-System
pip install -r requirements.txt
Run the notebook in Jupyter or Colab.
Aryan Sharma
- GitHub: https://github.com/AryanSharma1017
- LinkedIn: https://www.linkedin.com/in/aryansharma007
This project demonstrates not just model building, but practical machine learning skills including:
- Feature engineering
- Handling real-world messy data
- Model tuning and evaluation
- Avoiding data leakage
🚀 Built with the mindset of solving real-world pricing problems using machine learning.