Skip to content

connectashish028/SolarForecastingWithML

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 

Repository files navigation

Forecasting Solar Power Generation

Introduction

This project focuses on forecasting solar power generation using advanced machine learning models, including XGBoost and Random Forest. Utilizing a dataset obtained from Kaggle, the project encompasses comprehensive steps such as data cleaning, preprocessing, feature engineering, exploratory data analysis (EDA), model building, hyperparameter tuning, and model evaluation.

Data Collection and Preprocessing

Dataset Acquisition

The dataset was sourced from Kaggle, containing historical data on weather features and solar power generation.

Data Quality Assurance

  • Missing Values: Handled missing values through imputation or removal to maintain data integrity.
  • Outlier Detection: Identified and addressed outliers to prevent distortion of model training.
  • Data Splitting: The dataset was split into training and testing sets to facilitate unbiased evaluation of model performance.
  • Feature Scaling: Features were scaled using StandardScaler to enhance model stability and convergence during training.

Feature Engineering

  • Cyclical Feature Extraction: Time-based features such as DaySin, DayCos, YearSin, and YearCos were created to capture seasonal and cyclical patterns inherent in solar power generation data.

Exploratory Data Analysis (EDA)

  • Data Distribution Visualization: Histograms were used to visualize and understand the distribution of each feature.
  • Correlation Analysis: Investigated relationships between features and the target variable (SystemProduction) to identify key predictors.
  • Time Series Analysis: Conducted to observe trends, seasonality, and anomalies in solar power generation.

Model Development and Evaluation

Model Selection

Various regression models were evaluated, including:

  • Linear Regression
  • Support Vector Regression (SVR)
  • Decision Tree Regressor
  • Random Forest Regressor
  • Gradient Boosting Regressor
  • XGBoost

Model Performance Metrics

Models were assessed using R-squared, Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Mean Absolute Error (MAE).

Summary of Model Performance

Model MSE RMSE MAE
Linear Regression 0.637294 52.1485 4.0848 9.39
Decision Tree 0.715784 39.1776 6.2581 1.82
Random Forest 0.902032 20.3694 4.5132 1.98
SVR 0.041909 102.6813 10.1333 7.23
XGBoost 0.882280 28.5447 5.3422 1.62
Gradient Boosting 0.862676 30.3851 5.5123 1.18

Hyperparameter Tuning

  • Optimization: Utilized RandomizedSearchCV to fine-tune hyperparameters for the top-performing models (Random Forest, Gradient Boosting, and XGBoost), resulting in enhanced predictive accuracy.

Feature Importance Analysis

  • Key Features Identified: Analyzed feature importance to gain insights into the model's decision-making process. Radiation, sunshine, and air temperature emerged as significant predictors of solar power generation.

Conclusion

The Random Forest Regressor was identified as the best-performing model, achieving an R-squared score of 0.90. The project highlights the effectiveness of machine learning in forecasting solar power generation, providing valuable insights into the factors influencing solar energy production.

Future Work

Future enhancements could include:

  • Implementing advanced neural network architectures (e.g., LSTM) for better time series forecasting.
  • Exploring additional weather features or external data sources to improve model accuracy.
  • Conducting sensitivity analysis to evaluate the impact of different features on model predictions.

This technical report summarizes the methodologies and findings of the solar power forecasting project, showcasing the potential of machine learning to predict renewable energy generation. The complete code and detailed analysis can be found in the accompanying Jupyter notebook on GitHub.

About

This project focuses on forecasting solar power generation using advanced machine learning models, including XGBoost and Random Forest. The analysis highlights data cleaning, preprocessing, and feature engineering, with Random Forest achieving an R-squared score of 0.91, emphasizing radiation as the most significant predictor.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors