Domdieun/soil_ML

This project applies machine learning regression techniques to model and predict soil pH based on bacterial microbiome composition. The dataset consists of farmland soil samples collected from multiple geographic locations, each characterized by microbial profiles derived from 16S rRNA amplicon sequencing. The study aims to explore the relationship between the microbial community structure and soil health metrics, with pH serving as the primary target variable. Understanding this relationship can contribute to precision agriculture, sustainable land management, and microbiome-driven soil diagnostics.

Objective

To explore the microbial composition of farmland soils using dimensionality reduction and clustering, and to build regression models that predict soil pH from microbiome data. The project compares different machine learning algorithms and evaluates their performance on high-dimensional biological data.

Dataset

- Samples: 753 farmland soil samples
- Features: 6,798 amplicon sequence variants (ASVs) representing bacterial species/strains
- Additional attributes: 12 soil health metrics (including pH, water capacity, etc.)
- Target variable: Soil pH
- File: soil_health.csv.gz
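As a rough illustration of how the table described above might be split into features and target, here is a minimal sketch. The column names (`pH`, `water_capacity`, `asv_1`, `asv_2`) and the helper `split_features_target` are hypothetical, since the actual header of soil_health.csv.gz is not documented here; the demo frame is synthetic and only stands in for the real data.

```python
import pandas as pd


def split_features_target(df, target_col="pH", metadata_cols=()):
    """Separate ASV abundance columns from the target and other soil metrics.

    `target_col` and `metadata_cols` are assumed names; adjust them to
    match the real column headers.
    """
    y = df[target_col]
    X = df.drop(columns=[target_col, *metadata_cols])
    return X, y


# Tiny synthetic stand-in for the real table (NOT the project's data).
demo = pd.DataFrame(
    {
        "asv_1": [10, 0, 3],
        "asv_2": [0, 5, 7],
        "water_capacity": [0.20, 0.30, 0.25],
        "pH": [6.1, 7.0, 5.8],
    }
)
X_demo, y_demo = split_features_target(demo, metadata_cols=["water_capacity"])

# With the real file, pandas infers gzip compression from the .gz suffix:
# df = pd.read_csv("soil_health.csv.gz")
```

Keeping the other soil health metrics out of the feature matrix means the regression models see microbiome features only, which matches the stated goal of predicting pH from microbial composition.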

Methodology

Data Exploration & Preprocessing

- Analyze microbiome composition and soil pH distribution
- Handle missing data and normalize ASV abundance values
- Apply PCA and unsupervised clustering (e.g., k-means or hierarchical clustering) to visualize community structure

Model Selection & Training

- Implement and compare multiple regression models:
  - Linear Regression / Ridge / Lasso
  - Random Forest Regressor
  - Gradient Boosted Trees (XGBoost / LightGBM)
  - Support Vector Regression (SVR) or Neural Networks (optional)
- Optimize hyperparameters via cross-validation

Evaluation

- Evaluate models using:
  - R² score
  - Mean Absolute Error (MAE)
  - Root Mean Squared Error (RMSE)
- Compare predictive performance and interpret feature importance

Interpretation & Insights

- Identify microbial taxa most correlated with soil pH variation
- Discuss potential biological and ecological implications of results
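The workflow above could be sketched with scikit-learn roughly as follows. This is an illustrative outline on synthetic counts, not the project's actual code or results: the data, the log-scaled relative-abundance normalization, the choice of two models (Ridge and Random Forest), and all hyperparameter values are assumptions made for the example.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the ASV count table (NOT the project's data).
n_samples, n_asvs = 120, 50
counts = rng.poisson(lam=5.0, size=(n_samples, n_asvs)).astype(float)

# Normalize counts to per-sample relative abundances, then log-scale them.
rel_abund = counts / counts.sum(axis=1, keepdims=True)
X = np.log1p(rel_abund * 1e4)

# Synthetic "pH" that depends on the first ASV plus noise (assumption).
y = 6.5 + 0.3 * (X[:, 0] - X[:, 0].mean()) + rng.normal(0.0, 0.1, n_samples)

# Project samples onto two principal components for visualization.
coords = PCA(n_components=2).fit_transform(X)

# Candidate regressors; hyperparameters are placeholders, not tuned values.
models = {
    "ridge": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

# scikit-learn reports error metrics negated so that higher is better.
scoring = {
    "r2": "r2",
    "mae": "neg_mean_absolute_error",
    "rmse": "neg_root_mean_squared_error",
}
cv = KFold(n_splits=5, shuffle=True, random_state=0)

results = {}
for name, model in models.items():
    scores = cross_validate(model, X, y, cv=cv, scoring=scoring)
    results[name] = {
        "r2": scores["test_r2"].mean(),
        "mae": -scores["test_mae"].mean(),    # flip sign back to an error
        "rmse": -scores["test_rmse"].mean(),  # flip sign back to an error
    }

for name, metrics in results.items():
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

Wrapping the scaler and Ridge in one pipeline keeps the scaling inside each cross-validation fold, so held-out samples do not influence the fitted preprocessing. The same loop would accept other estimators from the list above, such as Lasso or SVR, without structural changes.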
