YuminosukeSato/scigo

SciGo 🚀

The blazing-fast scikit-learn compatible ML library for Go

Say "Goodbye" to slow ML, "Sci-Go" to fast learning!


🌟 Why SciGo?

SciGo = Statistical Computing In Go

SciGo brings the power and familiarity of scikit-learn to the Go ecosystem, offering:

  • 🔥 Blazing Fast: Native Go implementation with built-in parallelization
  • 🎯 scikit-learn Compatible: Familiar Fit/Predict API for easy migration
  • 🌲 LightGBM Support: Full compatibility with Python LightGBM models (.txt/JSON/string)
  • 📖 Well Documented: Complete API documentation with examples on pkg.go.dev
  • 🌊 Streaming Support: Online learning algorithms for real-time data
  • 🚀 Zero Heavy Dependencies: Pure Go implementation (only scientific essentials)
  • 📊 Comprehensive: Regression, classification, clustering, tree-based models, and more
  • 🧪 Production Ready: Extensive tests, benchmarks, and error handling
  • ⚡ Superior to leaves: Not just inference, but full training, convenience features, and numerical precision

📦 Installation

Go Module (Recommended)

go get github.com/YuminosukeSato/scigo@latest

Quick Start Options

  • 🐳 Docker: docker run --rm -it ghcr.io/yuminosukesato/scigo:latest
  • ☁️ GitPod: Open in Gitpod
  • πŸ“¦ Go Install: go install github.com/YuminosukeSato/scigo/examples/quick-start@latest

πŸš€ Quick Start

πŸ’‘ Tip: For complete API documentation with examples, visit pkg.go.dev/scigo

Option 1: One-Liner with LightGBM 🌲

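package main

import (
    "fmt"

    "github.com/YuminosukeSato/scigo/sklearn/lightgbm"
    "gonum.org/v1/gonum/mat"
)

func main() {
    // Placeholder inputs: replace these zero-filled slices with your own data.
    data := make([]float64, 100*4) // 100 samples × 4 features
    labels := make([]float64, 100) // one label per sample
    X := mat.NewDense(100, 4, data)
    y := mat.NewDense(100, 1, labels)
    XTest := mat.NewDense(10, 4, nil) // held-out samples to predict on

    // Train and predict in one line!
    result := lightgbm.QuickTrain(X, y)
    quickPredictions := result.Predict(XTest)

    // Or use AutoML for automatic tuning
    best := lightgbm.AutoFit(X, y)

    // Load Python LightGBM models directly!
    model := lightgbm.NewLGBMClassifier()
    model.LoadModel("python_model.txt") // Full compatibility!
    predictions, _ := model.Predict(XTest)

    // Use every result so the program compiles (Go rejects unused variables).
    fmt.Println(quickPredictions, best, predictions)
}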

Option 2: Classic Linear Regression

package main

import (
    "fmt"
    "log"
    
    "github.com/YuminosukeSato/scigo/linear"
    "gonum.org/v1/gonum/mat"
)

func main() {
    // Create and train model - just like scikit-learn!
    model := linear.NewLinearRegression()
    
    // Training data
    X := mat.NewDense(4, 2, []float64{
        1, 1,
        1, 2,
        2, 2,
        2, 3,
    })
    y := mat.NewDense(4, 1, []float64{
        2, 3, 3, 4,
    })
    
    // Fit the model
    if err := model.Fit(X, y); err != nil {
        log.Fatal(err)
    }
    
    // Make predictions
    XTest := mat.NewDense(2, 2, []float64{
        1.5, 1.5,
        2.5, 3.5,
    })
    predictions, err := model.Predict(XTest)
    if err != nil {
        log.Fatal(err)
    }

    fmt.Println("Ready, Set, SciGo! Predictions:", predictions)
}

📚 API Documentation

📖 Package Documentation

| Package | Description | Go Doc |
| --- | --- | --- |
| sklearn/lightgbm | 🌲 LightGBM with Python model compatibility & convenience features | GoDoc |
| sklearn/linear_model | Linear models with full scikit-learn compatibility | GoDoc |
| preprocessing | Data preprocessing utilities (StandardScaler, MinMaxScaler, OneHotEncoder) | GoDoc |
| linear | Linear machine learning algorithms (LinearRegression) | GoDoc |
| metrics | Model evaluation metrics (MSE, RMSE, MAE, R², MAPE) | GoDoc |
| core/model | Base model with weight export/import and scikit-learn compatibility | GoDoc |

📋 Complete API Examples

The documentation includes comprehensive examples for all major APIs. Visit the Go Doc links above or use go doc locally:

# View package documentation
go doc github.com/YuminosukeSato/scigo/preprocessing
go doc github.com/YuminosukeSato/scigo/linear
go doc github.com/YuminosukeSato/scigo/metrics

# View specific function documentation
go doc github.com/YuminosukeSato/scigo/preprocessing.StandardScaler.Fit
go doc github.com/YuminosukeSato/scigo/linear.LinearRegression.Predict
go doc github.com/YuminosukeSato/scigo/metrics.MSE

# Run example tests
go test -v ./preprocessing -run Example
go test -v ./linear -run Example
go test -v ./metrics -run Example

📚 Algorithms

Supervised Learning

Linear Models

  • ✅ Linear Regression - Full scikit-learn compatible implementation with QR decomposition
  • ✅ SGD Regressor - Stochastic Gradient Descent for large-scale learning
  • ✅ SGD Classifier - Linear classifiers with SGD training
  • ✅ Passive-Aggressive - Online learning for classification and regression
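
To make the online-learning idea behind the Passive-Aggressive entry concrete, here is a minimal sketch of the standard PA-I regression update written against the standard library only. The type PARegressor, its fields, and the helper trainDemo are hypothetical names chosen for this illustration; they are not taken from SciGo's source, and SciGo's own implementation may differ.

```go
package main

import (
	"fmt"
	"math"
)

// PARegressor is a hypothetical, minimal Passive-Aggressive regressor (PA-I).
// It illustrates the update rule only and is not SciGo's implementation.
type PARegressor struct {
	Weights []float64
	Epsilon float64 // errors smaller than this are ignored (the "passive" zone)
	C       float64 // upper bound on the step size (the "aggressiveness")
}

// Predict returns the dot product of the weights and one sample.
func (m *PARegressor) Predict(x []float64) float64 {
	var sum float64
	for i, v := range x {
		sum += m.Weights[i] * v
	}
	return sum
}

// PartialFit updates the weights from a single (x, y) pair.
func (m *PARegressor) PartialFit(x []float64, y float64) {
	diff := y - m.Predict(x)
	loss := math.Abs(diff) - m.Epsilon // epsilon-insensitive loss
	if loss <= 0 {
		return // prediction is close enough: stay passive
	}
	var normSq float64
	for _, v := range x {
		normSq += v * v
	}
	if normSq == 0 {
		return // a zero vector gives no direction to move in
	}
	tau := math.Min(m.C, loss/normSq) // PA-I step size
	if diff < 0 {
		tau = -tau
	}
	for i, v := range x {
		m.Weights[i] += tau * v
	}
}

// trainDemo fits y = 2*x0 + 1*x1 by repeatedly replaying four samples.
func trainDemo() *PARegressor {
	m := &PARegressor{Weights: make([]float64, 2), Epsilon: 0.01, C: 1.0}
	xs := [][]float64{{1, 0}, {0, 1}, {1, 1}, {2, 1}}
	ys := []float64{2, 1, 3, 5}
	for epoch := 0; epoch < 50; epoch++ {
		for i, x := range xs {
			m.PartialFit(x, ys[i])
		}
	}
	return m
}

func main() {
	m := trainDemo()
	fmt.Printf("weights: %.3f, prediction for [1 1]: %.3f\n",
		m.Weights, m.Predict([]float64{1, 1}))
}
```

Each sample is seen once per update and then discarded, which is what makes this family of models suitable for data that arrives in batches.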

Data Preprocessing

  • ✅ StandardScaler - Standardizes features by removing mean and scaling to unit variance
  • ✅ MinMaxScaler - Scales features to a given range (e.g., [0,1] or [-1,1])
  • ✅ OneHotEncoder - Encodes categorical features as one-hot numeric arrays
Tree-based Models

  • ✅ LightGBM - Full Python model compatibility (.txt/JSON/string formats)
    • LGBMClassifier - Binary and multiclass classification
    • LGBMRegressor - Regression with multiple objectives
    • QuickTrain - One-liner training with automatic model selection
    • AutoFit - Automatic hyperparameter tuning
    • Superior to leaves - training + convenience features
  • 🚧 Random Forest (Coming Soon)
  • 🚧 XGBoost compatibility (Coming Soon)

Unsupervised Learning

Clustering

  • ✅ MiniBatch K-Means - Scalable K-Means for large datasets
  • 🚧 DBSCAN (Coming Soon)
  • 🚧 Hierarchical Clustering (Coming Soon)

Special Features

Online Learning & Streaming

  • ✅ Incremental Learning - Update models with new data batches
  • ✅ Partial Fit - scikit-learn compatible online learning
  • ✅ Concept Drift Detection - DDM and ADWIN algorithms
  • ✅ Streaming Pipelines - Real-time data processing with channels
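
The combination of partial fitting and channel-based streaming can be sketched with the standard library alone. In the example below, a producer goroutine sends batches over a channel and a simple SGD regressor consumes them until the channel closes or the context is cancelled. The names Batch, SGDRegressor, and runDemo are assumptions made for this illustration; the method names PartialFit and FitStream echo the ones this README mentions, but the code is not SciGo's implementation.

```go
package main

import (
	"context"
	"fmt"
)

// Batch is a hypothetical container for one chunk of streaming data.
type Batch struct {
	X [][]float64
	Y []float64
}

// SGDRegressor is a minimal linear model trained with per-sample SGD
// on squared error. It is an illustration, not SciGo's type.
type SGDRegressor struct {
	Weights      []float64
	Intercept    float64
	LearningRate float64
}

func (m *SGDRegressor) Predict(x []float64) float64 {
	sum := m.Intercept
	for j, v := range x {
		sum += m.Weights[j] * v
	}
	return sum
}

// PartialFit takes one gradient step per sample in the batch.
func (m *SGDRegressor) PartialFit(b Batch) {
	for i, x := range b.X {
		grad := m.Predict(x) - b.Y[i]
		for j, v := range x {
			m.Weights[j] -= m.LearningRate * grad * v
		}
		m.Intercept -= m.LearningRate * grad
	}
}

// FitStream consumes batches until the channel closes or ctx is cancelled.
func (m *SGDRegressor) FitStream(ctx context.Context, batches <-chan Batch) error {
	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case b, ok := <-batches:
			if !ok {
				return nil // producer finished
			}
			m.PartialFit(b)
		}
	}
}

// runDemo streams noise-free samples of y = 3x + 1 and returns the fit.
func runDemo() (weight, intercept float64, err error) {
	ctx := context.Background()
	batches := make(chan Batch)
	go func() {
		defer close(batches)
		for i := 0; i < 500; i++ {
			b := Batch{X: [][]float64{{0}, {1}, {2}, {3}}, Y: []float64{1, 4, 7, 10}}
			select {
			case batches <- b:
			case <-ctx.Done():
				return // stop producing if the consumer gave up
			}
		}
	}()
	m := &SGDRegressor{Weights: make([]float64, 1), LearningRate: 0.05}
	err = m.FitStream(ctx, batches)
	return m.Weights[0], m.Intercept, err
}

func main() {
	w, b, err := runDemo()
	if err != nil {
		fmt.Println("stream error:", err)
		return
	}
	fmt.Printf("weight=%.3f intercept=%.3f\n", w, b)
}
```

Passing a context lets the caller stop training from outside, for example with context.WithTimeout, without the model needing to know where the data comes from.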

🎯 scikit-learn Compatibility

SciGo implements the familiar scikit-learn API with full compatibility:

// Just like scikit-learn!
model.Fit(X, y)              // Train the model
model.Predict(X)              // Make predictions  
model.Score(X, y)             // Evaluate the model
model.PartialFit(X, y)        // Incremental learning

// New in v0.3.0 - Full scikit-learn compatibility
model.GetParams(deep)         // Get model parameters
model.SetParams(params)       // Set model parameters
weights, _ := model.ExportWeights()  // Export model weights
model.ImportWeights(weights)  // Import with guaranteed reproducibility

// Streaming - unique to Go!
model.FitStream(ctx, dataChan) // Streaming training
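
One way to picture the Fit/Predict/Score contract in Go is as an interface that any model can satisfy. The sketch below defines a hypothetical Estimator interface over plain slices and a trivial MeanRegressor that implements it. Both names, and the use of [][]float64 instead of gonum matrices, are assumptions made to keep the example self-contained; SciGo's real interfaces may be shaped differently.

```go
package main

import (
	"errors"
	"fmt"
)

// Estimator is a hypothetical interface capturing the Fit/Predict/Score shape.
type Estimator interface {
	Fit(X [][]float64, y []float64) error
	Predict(X [][]float64) ([]float64, error)
	Score(X [][]float64, y []float64) (float64, error)
}

// MeanRegressor always predicts the mean of the training targets.
type MeanRegressor struct {
	mean   float64
	fitted bool
}

// Compile-time check that MeanRegressor satisfies Estimator.
var _ Estimator = (*MeanRegressor)(nil)

func (m *MeanRegressor) Fit(X [][]float64, y []float64) error {
	if len(y) == 0 || len(X) != len(y) {
		return errors.New("X and y must be non-empty and have the same length")
	}
	var sum float64
	for _, v := range y {
		sum += v
	}
	m.mean = sum / float64(len(y))
	m.fitted = true
	return nil
}

func (m *MeanRegressor) Predict(X [][]float64) ([]float64, error) {
	if !m.fitted {
		return nil, errors.New("model is not fitted")
	}
	out := make([]float64, len(X))
	for i := range out {
		out[i] = m.mean
	}
	return out, nil
}

// Score returns R². For constant targets it returns 0, a simplification.
func (m *MeanRegressor) Score(X [][]float64, y []float64) (float64, error) {
	if len(y) == 0 || len(X) != len(y) {
		return 0, errors.New("X and y must be non-empty and have the same length")
	}
	preds, err := m.Predict(X)
	if err != nil {
		return 0, err
	}
	var yMean float64
	for _, v := range y {
		yMean += v
	}
	yMean /= float64(len(y))
	var ssRes, ssTot float64
	for i, v := range y {
		ssRes += (v - preds[i]) * (v - preds[i])
		ssTot += (v - yMean) * (v - yMean)
	}
	if ssTot == 0 {
		return 0, nil
	}
	return 1 - ssRes/ssTot, nil
}

func main() {
	var est Estimator = &MeanRegressor{}
	X := [][]float64{{1}, {2}, {3}}
	y := []float64{1, 2, 3}
	if err := est.Fit(X, y); err != nil {
		fmt.Println("fit failed:", err)
		return
	}
	preds, _ := est.Predict(X)
	score, _ := est.Score(X, y)
	fmt.Println(preds, score)
}
```

Code written against such an interface, for example a cross-validation helper, works with any model that implements the three methods.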

🆕 New Features in v0.3.0

  • Complete Weight Reproducibility - Guaranteed identical outputs with same weights
  • gRPC/Protobuf Support - Distributed training and prediction
  • Full Parameter Management - GetParams/SetParams for all models
  • Model Serialization - Export/Import with full precision
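
To show what full-precision export and import can mean in practice, here is a small sketch that round-trips model weights through JSON and checks that every float64 comes back bit-for-bit identical. Go's encoding/json writes finite float64 values in the shortest decimal form that parses back to the same value, so the round trip is exact. The Weights struct and the function names are hypothetical and say nothing about the format SciGo actually uses.

```go
package main

import (
	"encoding/json"
	"fmt"
	"math"
)

// Weights is a hypothetical container for a linear model's parameters.
type Weights struct {
	Coefficients []float64 `json:"coefficients"`
	Intercept    float64   `json:"intercept"`
}

// Export serializes the weights. NaN and infinite values are rejected
// by encoding/json and surface here as an error.
func Export(w Weights) ([]byte, error) {
	return json.Marshal(w)
}

// Import parses weights previously produced by Export.
func Import(data []byte) (Weights, error) {
	var w Weights
	err := json.Unmarshal(data, &w)
	return w, err
}

// BitIdentical reports whether two weight sets match bit for bit.
func BitIdentical(a, b Weights) bool {
	if len(a.Coefficients) != len(b.Coefficients) {
		return false
	}
	if math.Float64bits(a.Intercept) != math.Float64bits(b.Intercept) {
		return false
	}
	for i := range a.Coefficients {
		if math.Float64bits(a.Coefficients[i]) != math.Float64bits(b.Coefficients[i]) {
			return false
		}
	}
	return true
}

// RoundTripExact exports and re-imports w, then compares the bits.
func RoundTripExact(w Weights) (bool, error) {
	data, err := Export(w)
	if err != nil {
		return false, err
	}
	restored, err := Import(data)
	if err != nil {
		return false, err
	}
	return BitIdentical(w, restored), nil
}

func main() {
	w := Weights{Coefficients: []float64{0.1, 1.0 / 3.0, 1e-300}, Intercept: -2.5}
	ok, err := RoundTripExact(w)
	fmt.Println("bit-identical after round trip:", ok, "error:", err)
}
```

Identical weights are a precondition for identical predictions, which is why the comparison here uses the raw bit patterns rather than a tolerance.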

📊 Performance Benchmarks

SciGo leverages Go's concurrency for exceptional performance:

| Algorithm | Dataset Size | SciGo | scikit-learn (Python) | Speedup |
| --- | --- | --- | --- | --- |
| Linear Regression | 1M×100 | 245ms | 890ms | 3.6× |
| SGD Classifier | 500K×50 | 180ms | 520ms | 2.9× |
| MiniBatch K-Means | 100K×20 | 95ms | 310ms | 3.3× |
| Streaming SGD | 1M streaming | 320ms | 1.2s | 3.8× |

Benchmarks on MacBook Pro M2, 16GB RAM

Memory Efficiency

| Dataset Size | Memory | Allocations |
| --- | --- | --- |
| 100×10 | 22.8KB | 22 |
| 1,000×10 | 191.8KB | 22 |
| 10,000×20 | 3.4MB | 57 |
| 50,000×50 | 41.2MB | 61 |

🏗️ Architecture

scigo/
├── linear/           # Linear models
├── sklearn/          # scikit-learn compatible implementations
│   ├── linear_model/ # SGD, Passive-Aggressive
│   ├── cluster/      # Clustering algorithms
│   └── drift/        # Concept drift detection
├── metrics/          # Evaluation metrics
├── core/             # Core abstractions
│   ├── model/        # Base model interfaces
│   ├── tensor/       # Tensor operations
│   └── parallel/     # Parallel processing
├── datasets/         # Dataset utilities
└── examples/         # Usage examples

📊 Metrics

Comprehensive evaluation metrics with full documentation:
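
As a rough guide to what the regression metrics named in this README (MSE, RMSE, MAE, R², MAPE) compute, the sketch below writes out their textbook definitions with the standard library. The function signatures are assumptions for this example and are not SciGo's metrics API. Inputs are assumed to be non-empty and of equal length, and MAPE additionally assumes no true value is zero.

```go
package main

import (
	"fmt"
	"math"
)

// MSE is the mean of squared errors.
func MSE(yTrue, yPred []float64) float64 {
	var sum float64
	for i := range yTrue {
		d := yTrue[i] - yPred[i]
		sum += d * d
	}
	return sum / float64(len(yTrue))
}

// RMSE is the square root of MSE, in the same units as the target.
func RMSE(yTrue, yPred []float64) float64 {
	return math.Sqrt(MSE(yTrue, yPred))
}

// MAE is the mean of absolute errors.
func MAE(yTrue, yPred []float64) float64 {
	var sum float64
	for i := range yTrue {
		sum += math.Abs(yTrue[i] - yPred[i])
	}
	return sum / float64(len(yTrue))
}

// R2 is 1 - SS_res/SS_tot, the fraction of variance explained.
// It assumes the true values are not all identical.
func R2(yTrue, yPred []float64) float64 {
	var mean float64
	for _, v := range yTrue {
		mean += v
	}
	mean /= float64(len(yTrue))
	var ssRes, ssTot float64
	for i := range yTrue {
		ssRes += (yTrue[i] - yPred[i]) * (yTrue[i] - yPred[i])
		ssTot += (yTrue[i] - mean) * (yTrue[i] - mean)
	}
	return 1 - ssRes/ssTot
}

// MAPE is the mean absolute error relative to the true value,
// returned as a fraction (0.1 means 10%).
func MAPE(yTrue, yPred []float64) float64 {
	var sum float64
	for i := range yTrue {
		sum += math.Abs((yTrue[i] - yPred[i]) / yTrue[i])
	}
	return sum / float64(len(yTrue))
}

// Near reports whether a and b differ by at most tol.
func Near(a, b, tol float64) bool {
	return math.Abs(a-b) <= tol
}

func main() {
	yTrue := []float64{3, -0.5, 2, 7}
	yPred := []float64{2.5, 0, 2, 8}
	fmt.Printf("MSE=%.4f RMSE=%.4f MAE=%.4f R2=%.4f MAPE=%.4f\n",
		MSE(yTrue, yPred), RMSE(yTrue, yPred), MAE(yTrue, yPred),
		R2(yTrue, yPred), MAPE(yTrue, yPred))
}
```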

🧪 Testing & Quality

# Run tests
go test ./...

# Run benchmarks
go test -bench=. -benchmem ./...

# Check coverage (76.7% overall coverage)
go test -cover ./...

# Run linter (errcheck, govet, ineffassign, staticcheck, unused, misspell)
make lint-full

# Run examples to see API usage
go test -v ./preprocessing -run Example
go test -v ./linear -run Example
go test -v ./metrics -run Example
go test -v ./core/model -run Example

Quality Gates

  • ✅ Test Coverage: 76.7% (target: 70%+)
  • ✅ Linting: golangci-lint with comprehensive checks
  • ✅ Documentation: Complete godoc for all public APIs
  • ✅ Examples: Comprehensive example functions for all major APIs

📚 Examples

Check out the examples directory:

🀝 Contributing

We welcome contributions! Please see our Contributing Guide.

Development Setup

# Clone the repository
git clone https://github.com/YuminosukeSato/scigo.git
cd scigo

# Install dependencies
go mod download

# Run tests
go test ./...

# Run linter
golangci-lint run

🚀 Continuous Delivery (CD)

SciGo uses automated continuous delivery for releases:

  • Automatic Release: Every push to the main branch triggers an automatic patch version release
  • Version Management: Versions are automatically incremented (e.g., 0.4.0 → 0.4.1)
  • Release Assets: Binaries for Linux, macOS, and Windows are automatically built and attached
  • Docker Images: Docker images are automatically built and pushed to GitHub Container Registry (ghcr.io)
  • Documentation: pkg.go.dev is automatically updated with the latest version
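
The automatic patch increment described above amounts to parsing MAJOR.MINOR.PATCH and adding one to the last component. The sketch below shows that logic in Go. The function BumpPatch is a hypothetical helper written for this illustration; how the actual release workflow computes the next version is not shown in this README.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// BumpPatch increments the patch component of a MAJOR.MINOR.PATCH version.
// An optional leading "v" is preserved. Pre-release suffixes are not handled.
func BumpPatch(version string) (string, error) {
	prefix := ""
	if strings.HasPrefix(version, "v") {
		prefix = "v"
		version = version[1:]
	}
	parts := strings.Split(version, ".")
	if len(parts) != 3 {
		return "", fmt.Errorf("expected MAJOR.MINOR.PATCH, got %q", version)
	}
	nums := make([]uint64, 3)
	for i, p := range parts {
		n, err := strconv.ParseUint(p, 10, 64)
		if err != nil {
			return "", fmt.Errorf("invalid version component %q: %w", p, err)
		}
		nums[i] = n
	}
	nums[2]++ // the patch bump itself
	return fmt.Sprintf("%s%d.%d.%d", prefix, nums[0], nums[1], nums[2]), nil
}

func main() {
	for _, v := range []string{"0.4.0", "v0.4.9", "0.4"} {
		next, err := BumpPatch(v)
		if err != nil {
			fmt.Println(v, "->", "error:", err)
			continue
		}
		fmt.Println(v, "->", next)
	}
}
```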

Release Process

  1. Merge PR to main: A PR is merged into the main branch
  2. Automatic Tests: CI runs all tests and coverage checks
  3. Version Bump: Patch version is automatically incremented
  4. Create Release: GitHub Release is created with:
    • Multi-platform binaries (Linux, macOS, Windows)
    • Release notes from CHANGELOG.md
    • Docker image at ghcr.io/yuminosukesato/scigo:VERSION
  5. Post-Release: An issue is created to track post-release verification tasks

Manual Release

For major or minor version releases, create and push a tag manually:

git tag v0.5.0 -m "Release v0.5.0"
git push origin v0.5.0

Pushing the tag triggers the existing release.yml workflow.

🗺️ Roadmap

Phase 1: Core ML (Current)

  • ✅ Linear models
  • ✅ Online learning
  • ✅ Basic clustering
  • 🚧 Tree-based models

Phase 2: Advanced Features

  • Neural Networks (MLP)
  • Deep Learning integration
  • Model serialization (ONNX export)
  • GPU acceleration

Phase 3: Enterprise Features

  • Distributed training
  • AutoML capabilities
  • Model versioning
  • A/B testing framework

📖 Documentation

Core Documentation

API Quick Reference

| API | Package | Documentation |
| --- | --- | --- |
| StandardScaler | preprocessing | pkg.go.dev/preprocessing.StandardScaler |
| MinMaxScaler | preprocessing | pkg.go.dev/preprocessing.MinMaxScaler |
| OneHotEncoder | preprocessing | pkg.go.dev/preprocessing.OneHotEncoder |
| LinearRegression | linear | pkg.go.dev/linear.LinearRegression |
| BaseEstimator | core/model | pkg.go.dev/model.BaseEstimator |

Migration & Advanced Guides

🙏 Acknowledgments

📄 License

SciGo is licensed under the MIT License. See LICENSE for details.

📧 Contact


🚀 Ready, Set, SciGo! 🚀

Where Science Meets Go - Say goodbye to slow ML!

Made with ❤️ and lots of ☕ in Go

Running scikit-learn parity tests

Development-only parity tests compare the Go implementation against scikit-learn outputs. They are excluded from the default go test run; pass the parity build tag explicitly to include them.

Steps

  1. Generate golden data
    • Use uv instead of pip.
    • Command: uv run --with scikit-learn --with numpy --with scipy python scripts/golden/gen_logreg.py
  2. Run parity tests
    • Command: go test ./sklearn/linear_model -tags=parity -run Parity -v

One-liner

make parity-linear

Notes

  • The current LogisticRegression uses a simplified gradient descent solver. Tolerances will be tightened once lbfgs/newton-cg solvers are implemented.
  • Golden file is written to tests/golden/logreg_case1.json.
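
The core of a parity check is loading reference values and comparing them to Go outputs within a tolerance. The sketch below shows that comparison as a small standalone program. The GoldenCase fields and the JSON layout are assumptions made for this example, since the real schema of tests/golden/logreg_case1.json is not shown here. The example also only re-evaluates probabilities from stored coefficients, whereas a real parity test would train the Go model first and would live in a test file guarded by the parity build tag.

```go
package main

import (
	"encoding/json"
	"fmt"
	"math"
)

// GoldenCase is a hypothetical layout for reference data produced in Python.
type GoldenCase struct {
	X         [][]float64 `json:"X"`
	Coef      []float64   `json:"coef"`
	Intercept float64     `json:"intercept"`
	Proba     []float64   `json:"proba"` // reference P(y=1) for each row of X
}

// PredictProba applies the logistic function to a linear score.
// Each row of X is assumed to have len(coef) features.
func PredictProba(X [][]float64, coef []float64, intercept float64) []float64 {
	out := make([]float64, len(X))
	for i, row := range X {
		z := intercept
		for j, v := range row {
			z += coef[j] * v
		}
		out[i] = 1 / (1 + math.Exp(-z))
	}
	return out
}

// MaxAbsDiff returns the largest element-wise absolute difference.
func MaxAbsDiff(a, b []float64) (float64, error) {
	if len(a) != len(b) {
		return 0, fmt.Errorf("length mismatch: %d vs %d", len(a), len(b))
	}
	var worst float64
	for i := range a {
		worst = math.Max(worst, math.Abs(a[i]-b[i]))
	}
	return worst, nil
}

// CheckParity decodes golden data and compares Go output against it.
func CheckParity(golden []byte, tol float64) error {
	var c GoldenCase
	if err := json.Unmarshal(golden, &c); err != nil {
		return fmt.Errorf("decode golden data: %w", err)
	}
	got := PredictProba(c.X, c.Coef, c.Intercept)
	diff, err := MaxAbsDiff(got, c.Proba)
	if err != nil {
		return err
	}
	if diff > tol {
		return fmt.Errorf("max abs diff %g exceeds tolerance %g", diff, tol)
	}
	return nil
}

func main() {
	golden := []byte(`{"X": [[0, 0], [1, 1]], "coef": [1, -1], "intercept": 0, "proba": [0.5, 0.5]}`)
	if err := CheckParity(golden, 1e-9); err != nil {
		fmt.Println("parity failed:", err)
		return
	}
	fmt.Println("parity ok")
}
```

Keeping the tolerance as an explicit parameter makes it easy to tighten later, in line with the note above about the planned solvers.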
