The objectives of this project are as follows:
- Predict student academic performance from behavioral data about the student
- Identify the behaviors that affect student academic performance the most
- Evaluate the accuracy and precision of the machine learning predictions
In this project, the machine learning algorithm used to predict student academic performance from the behavioral dataset is the Random Forest Classifier, which combines ensemble learning with decision trees. A Random Forest is an ensemble of decision trees: each tree is trained on a random sample of the training data, and a random subset of the features is considered when building it. The predictions of the individual trees are then aggregated into a final prediction.
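The idea of training trees on random samples and aggregating their predictions can be sketched by hand. This is only an illustration under stated assumptions: the data is synthetic (generated with `make_classification`, not the project's survey data), and the tree count of 25 is an arbitrary choice.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): 3 classes, 8 numeric features.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=5, n_classes=3, random_state=0
)

rng = np.random.default_rng(0)
trees = []
for i in range(25):
    # Bootstrap sample: draw row indices with replacement.
    idx = rng.integers(0, len(X), size=len(X))
    # max_features="sqrt" makes each split consider a random subset of features.
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=i)
    tree.fit(X[idx], y[idx])
    trees.append(tree)

# Aggregate: every tree votes on every sample and the most common class wins.
votes = np.stack([t.predict(X) for t in trees])  # shape: (n_trees, n_samples)
forest_pred = np.array([np.bincount(col, minlength=3).argmax() for col in votes.T])

print("agreement with labels on the training data:", (forest_pred == y).mean())
```

Note that scikit-learn's `RandomForestClassifier` aggregates by averaging the trees' predicted class probabilities rather than by counting hard votes, so this hand-rolled version is a conceptual sketch, not a reimplementation.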
- Features: The behavioral dataset contains features related to student behavior. These include attributes like attendance scores, study habits, hobbies, family issues, ...
- Target Variable: The target variable is the student's academic performance (student grades).
The figure below shows all the features in our dataset.
- Handling missing values, if any.
- Converting categorical variables into a suitable format (e.g., one-hot encoding).
- Splitting the dataset into training and testing sets.
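The three preprocessing steps above could look roughly like the sketch below. The column names, the sample rows, the imputation strategy (median and most frequent value), and the 75/25 split are all assumptions made for illustration; the source does not state which were used.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical survey rows; names and values are illustrative only.
df = pd.DataFrame({
    "study_habit": [4, 5, None, 3, 2, 5, 4, 1],
    "class_attendance": [5, 5, 4, 3, 2, 4, None, 1],
    "accommodation": ["on_campus", "off_campus", "on_campus", None,
                      "off_campus", "on_campus", "on_campus", "off_campus"],
    "performance": ["Very Good", "Very Good", "Good", "Good",
                    "Fair", "Very Good", "Good", "Fair"],
})

features = df.drop(columns="performance")
target = df["performance"]

# 1. Handle missing values: median for numeric columns,
#    most frequent value for categorical columns.
numeric_cols = features.select_dtypes(include="number").columns
categorical_cols = features.columns.difference(numeric_cols)
features[numeric_cols] = features[numeric_cols].fillna(features[numeric_cols].median())
for col in categorical_cols:
    features[col] = features[col].fillna(features[col].mode()[0])

# 2. One-hot encode the categorical columns.
features = pd.get_dummies(features, columns=list(categorical_cols))

# 3. Split into training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.25, random_state=42
)

print(features.columns.tolist())
print(len(X_train), "training rows,", len(X_test), "testing rows")
```

The sketch follows the order of the list for readability. In a stricter setup, the imputation statistics would be computed on the training split only, so that no information from the test rows leaks into training.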
The figure above depicts the raw dataset as collected from our survey.
The figure above shows that 46.1% of the students in the dataset are Very Good academically, 34% are Good, and 18.4% are Fair; the Weak and Poor students are negligible.
The correlation heat map above shows that the behaviors most positively associated with student performance are Study habit, Class attendance, and Family issues. Financial stability has a moderate effect on performance, while the factors with a negative effect are Accommodation distance from school, Distractions (hobbies or addiction), and Impact of extracurricular activities.
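A ranking like the one read off the heat map can be computed from a correlation matrix. The sketch below assumes Pearson correlation (the default of pandas `DataFrame.corr`) and an ordinal encoding of the grades; the column names and scores are hypothetical, not taken from the survey.

```python
import pandas as pd

# Hypothetical numeric scores; performance is ordinal (e.g. 5 = Very Good, 1 = Poor).
df = pd.DataFrame({
    "study_habit":      [5, 4, 4, 3, 2, 1],
    "class_attendance": [5, 5, 4, 3, 3, 1],
    "distractions":     [1, 2, 2, 4, 4, 5],
    "performance":      [5, 5, 4, 3, 2, 1],
})

# Pairwise Pearson correlations between all columns.
corr = df.corr()

# Correlation of each feature with the target, strongest positive first.
ranking = corr["performance"].drop("performance").sort_values(ascending=False)
print(ranking)
```

A matrix such as `corr` is what a heat map visualizes, for example with `seaborn.heatmap(corr, annot=True)`. Correlation describes association only, so it does not by itself show that a behavior causes better or worse performance.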
The graph above shows that Very Good students study hard: their percentage is the highest, at 80%.
The figure above shows that top students attend class regularly.
Family issues affect all students. A stable home affects students' psychology and concentration in school.
The figure above shows that financial stability moderately affects students' performance.
Accommodation issues have the strongest negative effect: students who stay on campus perform better than those who live off campus.
After accommodation issues, addictions and engagement with hobbies have the next strongest negative effect.
- Import the RandomForestClassifier from scikit-learn.
- Create an instance of the classifier.
- Train the model using the training dataset.
- As the figure below shows, we saved the trained model so that the web app can use it without retraining it every time it is needed.
- For the first candidate
- For the second candidate
- For the third candidate
- We evaluate the model using metrics such as accuracy, precision, recall, F1-score, and a confusion matrix.
The model accuracy is 82.8%.
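The training, saving, and evaluation steps described above can be sketched end to end as follows. This is a minimal illustration, not the project's actual code: the data is synthetic, so the printed scores will not match the 82.8% reported here, and the use of `joblib`, the file name, and the hyperparameter values are assumptions.

```python
import tempfile
from pathlib import Path

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the encoded survey data (assumption).
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=5, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Create an instance of the classifier and train it.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Save the trained model so another program (e.g. a web app) can load it
# without retraining. The file name is hypothetical.
model_path = Path(tempfile.gettempdir()) / "student_performance_model.joblib"
joblib.dump(model, model_path)
loaded = joblib.load(model_path)

# Evaluate the reloaded model on the held-out test set.
y_pred = loaded.predict(X_test)
print("accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))  # precision, recall, F1 per class
print(confusion_matrix(y_test, y_pred))       # rows: true class, columns: predicted
```

Evaluating the reloaded model rather than the in-memory one also confirms that the saved file produces the same predictions the web app would see.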
We can further experiment with hyperparameter tuning to optimize the model's performance. Parameters like n_estimators, max_depth, and others can be fine-tuned.
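One possible way to run such an experiment is a grid search with cross-validation. The sketch below uses scikit-learn's `GridSearchCV` on synthetic data; the candidate values in the grid and the choice of 3 folds are illustrative assumptions, not tuned recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the encoded survey data (assumption).
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=5, n_classes=3, random_state=0
)

# Candidate values for the hyperparameters named in the text.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 5, 10],
}

# Try every combination with 3-fold cross-validation, scored by accuracy.
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("best cross-validated accuracy:", search.best_score_)
```

After fitting, `search.best_estimator_` holds a model refit on all the supplied data with the best combination, which could then be saved in place of the untuned model.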
Random Forest classifiers are robust, handle non-linear relationships well, and are less prone to overfitting than individual decision trees. They are suitable for classification tasks, especially on complex datasets with many features. However, it is essential to interpret the results carefully and to consider the context of the specific educational scenario.
In this project, we conclude that the habits that affect students' academic performance positively are Study habit, Family support, and Class attendance. Financial stability has a moderate impact on performance. The negative factors that affect academic performance the most are Accommodation (nearness to school), Distractions, and Impact of extracurricular activities. Machine learning (AI) is used in this project to predict a student's academic potential from behavior metrics such as study habit, class attendance, and the other metrics used to train our model.