- Aakash Shivanandappa Gowda
A20548984 - Dhyan Vasudeva Gowda
A20592874 - Hongi Jiang
A20506636 - Purnesh Shivarudrappa Vidhyadhara
A20552125
This project implements an Elastic Net regression model (ElasticNetModel) that combines L1 (Lasso) and L2 (Ridge) regularization techniques. It is designed to handle high-dimensional datasets, feature selection, and multicollinearity problems. The model uses gradient descent for optimization, function consists of the MSE loss, L1 loss, L2 loss.
The ElasticNetModel class implements the Elastic Net regression model. Below are the key methods and their descriptions:
Initializes the Elastic Net model with the following parameters:
lambdas(float, default=0.1): Penalty coefficient for regularization.thresh(float, default=0.5): Mixing parameter between L1 and L2 regularization. Value ranges from 0 to 1, where 0 means L1 regularization only and 1 means L2 regularization only.max_iter(int, default=1000): Maximum number of iterations for gradient descent.tol(float, default=1e-4): Tolerance for the stopping condition. If changes in weights are smaller thantol, training stops.learning_rate(float, default=0.01): Step size for gradient descent updates.
Trains the Elastic Net regression model using gradient descent on the provided feature matrix X and target values y.
- Parameters:
X(numpy array): Feature matrix of the training dataset (without intercept).y(numpy array): Target values corresponding to the training dataset.
- Returns:
- An instance of
ElasticNetResults, containing the trained weight coefficients and intercept.
- An instance of
This class stores the results after training the Elastic Net model.
Predicts target values for the provided feature matrix X.
- Parameters:
X(numpy array): Feature matrix of the test dataset.
- Returns:
- Predicted target values.
from elasticnet/models import ElasticNetModel
# Initialize the model
model = ElasticNetModel(lambdas=0.1, thresh=0.5, max_iter=1000, tol=1e-4, learning_rate=0.01)
# Fit the model on training data
model.fit(X_train, y_train)
# Predict on test data
predictions = model.predict(X_test)The generate_negative_data function generates synthetic data with two distinct patterns:
- The first half of the features has a monotonic increasing trend.
- The second half of the features has a linear decreasing trend (i.e., negative slope).
This function can be useful for testing model's ability in negative correlations data.
range_x(tuple): The range of feature values (min, max).noise_scale(float): Standard deviation of the Gaussian noise added to the data.size(int): Number of samples in the dataset.num_features(int): Total number of features (half with increasing trend, half with decreasing trend).seed(int): Random seed for reproducibility.
X(numpy array): Generated feature matrix with both increasing and decreasing trends.y(numpy array): Target values with contributions from the features and added noise.
from generate_negative_regression_data import generate_negative_data
# Generate negative slope data
X, y = generate_negative_data(range_x=(0, 10), noise_scale=1.0, size=100, num_features=6, seed=42)The generate_rotated_positive_data function generates synthetic data with two patterns:
- The first half of the features follows a monotonic increasing trend.
- The second half exhibits a wavy (S-shaped) pattern, adjusted by a rotation matrix to create a slanted shape.
This function generates more complex data to test model's ability.
range_x(tuple): Specifies the range of feature values (min, max).noise_scale(float): Standard deviation of the Gaussian noise added to the data.size(int): Number of samples in the dataset.num_features(int): Total number of features (half with increasing trend, half with a wavy pattern).seed(int): Random seed for reproducibility.rotation_angle(float, default=45): Angle (in degrees) to rotate the wavy pattern, introducing a slanted S-shape.mode(int, default=0): Determines the scaling factors for the feature values.
X(numpy array): Generated feature matrix with increasing trends and rotated S-shaped patterns.y(numpy array): Target values influenced by the features and noise.
from generate_positive_regression_data import generate_rotated_positive_data
# Generate positive data with rotation and S-shaped curves
X, y = generate_rotated_positive_data(range_x=(0, 10), noise_scale=1.0, size=100, num_features=6, seed=42, rotation_angle=45, mode=0)The model we've crafted, called ElasticNetModel, is an implementation of Elastic Net regression. What makes it special is that it not only uses MSE as the loss function, but also combines L1 (Lasso) and L2 (Ridge) regularization techniques. Here's a clearer view on how we implemented this model:
- The loss function consists of the MSE loss, L1 loss, and L2 loss. The loss function is shown as follows :
Then use the gradient descent algorithm to calculate the gradient of the loss function and then update.
- Gradient calculation results of MSE: :
- Gradient calculation results of L1::
- Gradient calculation results of L2: :
- Total Gradient:
- Weight Update:
In summary, our model implements hybrid regularization by combining L1 and L2 penalties. This simplifies feature selection and improves model stability. Additionally, we uses gradient descent to efficiently update parameters and weights.
Hybrid Regularization: Elastic Net is unique because it utilizes both L1 and L2 penalties. The L1 penalty naturally eliminates lower importance variables, which essentially refines the model to focus on more important aspects, thus simplifying feature selection and interpretation. This is especially helpful when you have tons of potential predictors and want to keep a model that is not overfitted or one that cannot be interpreted. The L2 penalty is also useful when variables are correlated with each other, which happens often. This will occur by spreading the influence of those features more broadly so that the model stays stable and reliable.
Gradient Descent Optimization: This is very useful in large datasets to optimize the model, where other methods would take too long. It slowly and systematically changes the model to find the perfect performance, based on how quick it is learning and how accurate we want our model.
Broad Application Spectrum: It works best in any scenario requiring accurate and tough predictions. Great for things like Economics or Healthcare where the data and relationships can be pretty complex.
Handling Complexity and Ensuring Accuracy: Great for understanding messy, complex data. From gene-centric analysis in the field of genomics to making sense of large-scale trends in powerful industries, this model is a way to juggle high-volume information without losing focus.
Real-World Use Cases: In real estate, it can be used for predicting prices based on several factors (e.g., location, age) as well as through engineered features reflecting how the property compares to others. It may predict stock prices in finance by analyzing some economic metrics and company data.
The ElasticNetModel is not only used for predictions; it is also indispensable for understanding the data. It is beneficial not only for numbers but also for anyone who deals with complex data puzzles and draws smart conclusions. Researchers and analysts who often encounter difficult modeling scenarios find the power to simplify this model while dealing with complex relationships invaluable.
We are testing and validating extensively to make our model not only accurate but also robust. So scrolling through, let us now see how it helps in providing a complete review of a model by each action described.
-
Method: We used a method
generate_rotated_s_wave_datafor generating synthetic datasets which makes the relationship of features with target values explicit. -
Why It Matters: This allows us to go deep into how the model is recognizing and learning the patterns we have defined. If we are using synthetic data where we obviously know the outcomes, it is possible to thoroughly test how much of this set relationship between elements that the model has actually learned.
-
What We Did: The model spent a good deal of time learning from these synthetic datasets. During this phase, we kept a close eye on its parameters, checking how well the learned coefficients and intercepts matched the expected patterns.
-
Why It Matters: It is important as it confers that the model captures the underlying phenomenon in data. It helps prove that the groundwork of the model is strong and it responds well to various nuances that come in real scenarios.
-
Trial: We tried the model's predictions on new data that it has never seen in training.
-
Why That Matters: It's critical that the model be able to generalize its learning to new, unobserved data as this is essential in making sure the model is actually useful for the real world, where variability will always exist.
-
Metrics Used: We measured the model’s accuracy and reliability using several standard regression metrics like Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and R-squared (R²).
-
Why They Matter: These measures are objective figures that show us how well the model performs on unseen data. These metrics help us in differentiating so many of the things that we encounter, for example, what is the average size of errors, how much variance in our target variable has been explained by the model.
-
How We Do It: Scatter plots of actual vs. predicted values. The model would be making perfect predictions if every point is close to the diagonal line (x and y values are identical).
-
The Benefits: These visualizations provide an immediate, visceral understanding of the model's accuracy. These are especially useful for seeing whether the model tends to over or underestimate in general (systematic bias) or if it is all over the place (high variance). They also allow us to easily observe any preferable prediction ability of some characteristics over the others.
-
Continuously Iterating Model: We keep refining the model based on insights derived from the visual data and other metrics. It might entail adjusting the settings of the model to more closely reflect reality — particularly in areas where we see gaps.
-
Why Work On It: It is to give the model adaptations so as to fit not very systematic characteristics the new data comes with, this guarantees that the model keeps effective and would still work properly. This endless circle of iteration is what keeps the model out front in confronting new obstacles.
The improved, tight clustering along the diagonal for Feature 2 vs. Target when compared to Feature 6 vs. Target indicates that we are handling Feature 2 better. In this case, we will wonder whether Feature 6 requires additional preprocessing, or whether the model should use more powerful feature engineering resources to learn better and deeper data patterns. By combining these detailed testing methodologies, we are able to not only validate the theoretical basis of the model but also certify that it can take on a wide range of real applications with reliable performance. This thorough validation and refinement front is paramount for developing robust predictive models that can scale to novel queries and provide you with reliable predictions.
Q3.What parameters have you exposed to users of your implementation in order to tune performance? (Also perhaps provide some basic usage examples.)
We gave some parameters to change in order to improve the performance of our model. This section is a breakdown of each parameter and what each does/means, as well as touches on how tuning them could lead to improved model performance:
- Function: Lambdas are the key elements to the regularization strategies of our model, which we utilized in both L1 (Lasso) and L2 (Ridge) techniques. This alleviates a common overfitting problem in which the model learns noise as if it were real data signals.
- Effect of Regularization: When the value of lambdas is increased, it increases the penalty on coefficients so that they are constrained to be near zero. This adjustment is beneficial when dealing with complex datasets that could potentially lead to misleading noise.
- When we use it: When we observe our model is overfitting with training data, we then increase lambdas to rebuild our model in a much generalized way to new, unseen data.
-
Function:
Thresh decides the elimination point for feature selection after filtering in, shaving the model down to select characteristics that are necessary and excluding others that may distort. -
Adjustment Effect:
Raising the thresh value may cause getting rid of secondary variables as their coefficients become zeroes. Such a reduction in the model leads to less accurate results but occasionally, the results are expected to be better for unseen data. -
Where To Use It:
Thresh is used to control the measure of variable importance, allowing identifying important features, and at the same time, excluding an excessive number of predictors that have no beneficial impact or, on the contrary, may negatively affect the model’s ability to make adequate predictions. This is more applicable especially where there are many features more than the dependent variables.
- Function: This parameter tells our algorithm to stop after running for the permitted number of iterations as stated below.
- Impact of Adjustment: Increasing the number of epochs may provide better chances for the model to adjust its weights with respect to data for better performance.
- When We Use It: If the model is not in its best condition yet, then it may be required to increase the max_iter. However, it is possible to have the model take more time for a training session, which calls for compromise between precision and time taken on the training session.
- Function: This is the same as tolerance; it defines a small the error which must fit our training data before further adjustments are made to stop.r adjustments stop.
- Impact of Adjustment: Lowering the tol value ensures the model won’t stop learning until the fit is very tight, improving accuracy but extending training time.
- When We Use It: If rate of accuracy must be in the consideration. The most important one is that if you set the tolerance level too low then really long training can actually make no gain at all.
- Function: This hyperparameter controls by how much the model should change towards the optimization in the updating of the model.
- Impact of Adjustment: It reduces the steps taken towards implementation and is much slower but more precise, because it is a smaller learning rate and its implementations are more careful.
- When We Use It: However, if the model is over the point that represents the minimal loss, it means that the learning rate is oversized, thus should be decreased to improve the stability and accuracy of the convergence. Nonetheless, the rate of learning may be reduced to an extreme low in the presence of a low rate.
These two parameters, therefore, can be tuned to optimize accuracy for given data distribution and complexity of task, computational cost, and model complexity.
We can improve this model by tinkering with these parameters to help the model adapt well to the nuances of our data and the requirements of our task, ensuring that we have a good trade-off between accuracy, efficiency of computation, and model lean size.
Q4.Are there specific inputs that your implementation has trouble with? Given more time, could you work around these or is it fundamental?
- The Challenge: This is because Elastic Net has an Optimistic World View, it believes that the world can be described using straight lines and linearly separable features. But life is rarely that clean or bumpy, and data exhibits the kind of curved relationships that a model like biases against.
- What Happens: Elastic Net might miss with its predictions in these instances since they involve some of the sharp edges/turns and complex patterns that it cannot fully understand.
- Feature Engineering: One way to make the model learn the underlying patterns is by transforming those input features. This could involve squaring or cubing the features, maybe adding in interaction terms, possibly doing mathematical transformations (like logarithms or exponentials) to help straighten out those nonlinear relationships.
- Kernel Methods: Taking a cue from support vector machine, these can be kernelized as well.
- Hybrid Models:Two heads (or models) as well. We can combine the advantages of Elastic Net when teamwork with nonlinear models like decision trees or neural networks, which allows it to predict well in wider set of scenarios.
-
The Challenge:Think of your data as listening to one of those quiet FM radio stations during a thunderstorm, that is what this is like. This then becomes static (noise) that can drown the music (true patterns), and thus it makes it hard for the Elastic Net to hear what is important.
-
What Happens:If the data is too noisy, Elastic Net might fit to the noise instead of the music and learn something that works great on training data but will falter in practice.
-
What We Can Do:
-
Increase Regularization:we have the option here to turn up the lambdas knob and increase penalties for complexity, allowing our model to focus in on every loud symptomatic signal while ignoring that static.
-
Threshold Tuning: Changing the thresh parameter to allow us to be more selective about which features we let into the model, and cut out those that tend to contain more noise than signal.
-
Noise Reduction Techniques: Even before feeding data into our model, we can clean it up to filter out the oddball values or make some signal smoother so that what we pass on to our mod is as clean and clear as possible.
-
By acknowledging these limitations and using smart strategies to overcome them, we can increase the potential of Elastic Net because another graphical preferred in predictive modelling.
This image shows a series of scatter plots, each representing the relationship between a different feature (Feature 1 to Feature 6) and Y_train based on training data. Each plot displays the data points in two dimensions:
- The horizontal axis (X-axis) shows values of a specific feature.
- The vertical axis (Y-axis) represents the values of Y_train.
The purpose of these plots is typically to visually assess the relationship between the features and the target variable. Observations can be made regarding the distribution, trend, or any potential outliers in the data. From the plots:
- Feature 1, Feature 2, Feature 3: These features display a scattered distribution with respect to Y_train as well as relatively clear linear relationships.
- Feature 4, Feature 5, Feature 6: These features show some vertical dispersion and a less clear linear relationship.
Overall, these scatter plots can help in determining which features might be relevant for modeling, although none of the features shown here have a distinct or strong linear relationship with the target variable.
The image shows scatter plots of six different features (labeled "Test Feature 1" through "Test Feature 6") plotted against "Y_Test". Each subplot represents the data of one specific demension of feature.
These plots visualize the dataset to help understand our first test dataset. This test dataset is used to test the model's predictive ability on partially linear positively correlated data and data without obvious correlation.
The figure above shows the effect of our model on the first test dataset based on the first training dataset, where the red line is the straight line fitted using the least squares method. The yellow points are true values and the blue points are predicted values. It can be seen that the fitting effect of our model is very good on the first dataset.
The image shows scatter plots of six features (Feature 1 through Feature 6) plotted against "Y_Train" on the y-axis. The x-axis in each subplot corresponds to the respective feature values.
-
Feature 1 to Feature 3:
- All three features show a strong positive linear correlation.
- The points align closely along a positively sloped line, indicating that as these feature values increase, the target variable increases in a linear fashion.
-
Feature 4 to Feature 6:
- These features show a strong negative linear correlation.
- The points align closely along a negatively sloped line, suggesting that as these feature values increase, the target variable decreases.
The image shows scatter plots of six features (Test Feature 1 through Test Feature 6) plotted against "Y_Test." Each subplot represents the data of one specific demension of feature.
- The data of the first dimension are scattered in all the plots and do not show obvious characteristics.
- The data of the second and third dimensions show insignificant positive correlation.
- The data of the last three dimensions show insignificant negative correlation.
These plots visualize the dataset to help understand our second test dataset. This test dataset is used to test the model's predictive ability on partially linear positively correlated data and partially linear negatively correlated data.
The figure above shows the effect of our model on the second test dataset based on the second training dataset, where the red line is the straight line fitted using the least squares method. The yellow points are true values and the blue points are predicted values. It can be seen that the fitting effect of our model is very good on linear positive correlation and linear negative correlation data sets.





