A Pandas-like DataFrame library for Go, powered by DuckDB.
DuckFrame brings the familiar DataFrame experience to Go, using DuckDB as the execution engine. Get the simplicity of Pandas with the performance of a vectorized columnar database — all in a single binary, no Python required.
db, _ := engine.Open("")
defer db.Close()
df, _ := duckframe.ReadCSV(db, "employees.csv")
defer df.Close()
result, _ := df.Filter("age > 30").
GroupBy("country").
Agg("salary", "mean")
result.Show()
// DataFrame [3 rows x 2 cols]
// country mean(salary)
// --------- ------------
// Brazil 94000.41
// Germany 82500.37
// USA 70000.00- Pandas-like API —
Filter,Select,GroupBy,Agg,Sort,Join,Union,Describe... - DuckDB-powered — vectorized execution, automatic parallelism, zero-copy Parquet/CSV reads
- Multi-format — CSV, Parquet, JSON Lines (read & write)
- External connectors — SQLite, PostgreSQL, MySQL, any
database/sqldriver - Concurrency —
ParallelApply,ReadCSVChunked, context-aware operations - SQL escape hatch —
df.Sql("SELECT ... FROM {df} WHERE ...") - Type-safe materialization —
ToSlice(&[]MyStruct{})with struct tag mapping - Fluent error handling — chain operations safely, check errors at the end
- Single binary — no runtime dependencies, CGO for DuckDB only
go get github.com/lserra/duckframeRequirement: CGO must be enabled (
CGO_ENABLED=1) and a C compiler available (gcc/clang).
- macOS:
xcode-select --install- Ubuntu/Debian:
sudo apt install build-essential
package main
import (
"fmt"
"log"
"github.com/lserra/duckframe"
"github.com/lserra/duckframe/internal/engine"
)
func main() {
// Open in-memory DuckDB
db, err := engine.Open("")
if err != nil {
log.Fatal(err)
}
defer db.Close()
// Read CSV
df, err := duckframe.ReadCSV(db, "data/employees.csv")
if err != nil {
log.Fatal(err)
}
defer df.Close()
// Explore
df.Show()
r, c, _ := df.Shape()
fmt.Printf("Rows: %d, Cols: %d\n", r, c)
// Filter + Select
result, _ := df.Filter("salary > 80000")
defer result.Close()
selected, _ := result.Select("name", "country", "salary")
defer selected.Close()
selected.Show()
// GroupBy + Agg
grouped, _ := df.GroupBy("country").Agg("salary", "mean")
defer grouped.Close()
grouped.Show()
// Raw SQL
top, _ := df.Sql("SELECT name, salary FROM {df} ORDER BY salary DESC LIMIT 3")
defer top.Close()
top.Show()
// Materialize to structs
type Employee struct {
Name string `df:"name"`
Salary float64 `df:"salary"`
}
var emps []Employee
df.ToSlice(&emps)
for _, e := range emps {
fmt.Printf("%s: $%.0f\n", e.Name, e.Salary)
}
}| Function | Description |
|---|---|
New(db, columns, rows) |
From Go data (maps) |
FromQuery(db, sql) |
From any SQL query |
ReadCSV(db, path) |
Read CSV file |
ReadParquet(db, path) |
Read Parquet file |
ReadJSON(db, path) |
Read JSON Lines file |
ReadSQLite(db, path, table) |
Read from SQLite |
ReadPostgres(db, dsn, query) |
Read from PostgreSQL |
ReadMySQL(db, dsn, query) |
Read from MySQL |
ReadFromDB(db, extDB, query) |
Read from any database/sql |
| Method | Description |
|---|---|
Select(cols...) |
Select columns |
Filter(expr) |
Filter rows with SQL expression |
Sort(col, asc) |
Sort by column |
Limit(n) |
First n rows |
Distinct() |
Remove duplicates |
GroupBy(cols...).Agg(col, fn) |
Aggregate (mean, sum, count, min, max) |
Join(other, on, how) |
Join (inner, left, right, full) |
Union(other) |
Combine DataFrames |
Rename(old, new) |
Rename column |
Drop(cols...) |
Drop columns |
WithColumn(name, expr) |
Add/replace computed column |
Head(n) / Tail(n) |
First/last n rows |
Sql(query) |
Raw SQL with {df} placeholder |
| Method | Description |
|---|---|
Show(maxRows...) |
Print formatted table |
Shape() |
(rows, cols) |
Columns() |
Column names |
Dtypes() |
Column types |
Describe() |
Descriptive statistics |
| Method | Description |
|---|---|
Collect() |
To []map[string]interface{} |
ToSlice(&dest) |
To slice of structs (via df tags) |
WriteCSV(path) |
Export to CSV |
WriteParquet(path) |
Export to Parquet |
WriteJSON(path) |
Export to JSON |
| Function/Method | Description |
|---|---|
ParallelApply(dfs, fn) |
Apply function to multiple DFs in parallel |
ReadCSVChunked(ctx, db, path, size) |
Stream CSV in chunks |
FilterContext(ctx, expr) |
Context-aware filter |
SortContext(ctx, col, asc) |
Context-aware sort |
FromQueryContext(ctx, db, sql) |
Context-aware query |
ReadCSVContext(ctx, db, path) |
Context-aware CSV read |
See the examples/ directory for complete, runnable examples:
| Example | Description |
|---|---|
basic |
Core operations: read, filter, select, group, SQL |
etl |
CSV → filter → Parquet pipeline |
analysis |
Exploratory data analysis |
concurrent |
Parallel processing & chunked reading |
http-api |
REST API serving DataFrame queries |
Performance comparison across Go and Python DataFrame libraries. Tests run on macOS, Intel 8-core, with 3 iterations averaged (1 for Gota on 1M rows).
See
benchmarks/for full scripts and reproducible methodology.
| Operation | DuckFrame (Go) | Gota (Go) | DuckDB (Python) | Pandas | Polars |
|---|---|---|---|---|---|
| ReadCSV | 102 | 130 | 132 | 58 | 5 |
| Filter | 29 | 9 | 28 | 3 | 1 |
| Sort | 72 | 89 | 67 | 9 | 6 |
| GroupBy+Agg | 6 | 252 | 5 | 6 | 2 |
| Join | 68 | 142 | 66 | 10 | 5 |
| Select | 29 | 2 | 27 | 3 | 1 |
| WriteCSV | 43 | 69 | 48 | 217 | 6 |
| Operation | DuckFrame (Go) | Gota (Go) | DuckDB (Python) | Pandas | Polars |
|---|---|---|---|---|---|
| ReadCSV | 209 | 1,434 | 217 | 592 | 38 |
| Filter | 203 | 90 | 138 | 32 | 7 |
| Sort | 321 | 1,377 | 126 | 151 | 73 |
| GroupBy+Agg | 21 | 2,463 | 11 | 61 | 7 |
| Join | 119 | 1,328 | 94 | 130 | 38 |
| Select | 45 | 9 | 37 | 37 | 2 |
| WriteCSV | 218 | 679 | 161 | 2,310 | 45 |
- DuckFrame vs Gota: DuckFrame dominates on analytical queries (GroupBy 45-116x faster, Join 2-11x, Sort 1.2-4.3x). Gota wins on simple column selections (in-memory pointer operations).
- DuckFrame vs DuckDB Python: Near-identical performance — both use the same DuckDB engine. DuckFrame adds Go's type safety and single-binary deployment.
- DuckFrame vs Pandas: DuckFrame is 3-5x faster on GroupBy/Sort/Join at scale. Pandas wins on simple filters and selects with small data due to NumPy vectorization overhead.
- Polars is the fastest overall, leveraging Rust's zero-cost abstractions and Apache Arrow columnar format.
- DuckFrame's sweet spot: Go applications needing analytical queries (GroupBy, Join, Sort) without Python dependencies. Best-in-class for Go DataFrames.
make build # Build
make test # Run tests
make coverage # Coverage report
make lint # golangci-lint
make vet # go vet
make fmt # Format code
make all # All of the above| Phase | Status |
|---|---|
| 0 — Setup | ✅ |
| 1 — Core | ✅ |
| 2 — MVP | ✅ |
| 3 — Fluent API | ✅ |
| 4 — Data Formats | ✅ |
| 5 — Advanced Operations | ✅ |
| 6 — Concurrency & Streaming | ✅ |
| 7 — External Connectors | ✅ |
| 8 — Quality & Tooling | ✅ |
| 9 — Docs & Examples | ✅ |
| 10 — Benchmarks & Launch | ✅ |
See ROADMAP.md for the detailed plan.
See CONTRIBUTING.md for guidelines.
If you prefer, you can also pay me a coffe, just click on the link below:
MIT — see LICENSE for details.
DuckFrame — DataFrames for Go. Powered by DuckDB. Built for engineers.