Merged
36 changes: 36 additions & 0 deletions CHANGELOG.md
@@ -10,6 +10,42 @@ production stability.

------------------------------------------------------------------------

## [0.4.0] - 2026-03-19

Integrity Verification Layer

This release introduces a complete integrity verification system for coldkeep,
covering metadata consistency, container structure validation, and full
end-to-end data integrity checks.

The system is designed in three verification levels:

- Standard: metadata integrity checks
- Full: metadata + container structure and hash validation
- Deep: full physical verification by reading container data and recomputing chunk hashes

### Added
- `verify system` command with three verification levels (standard, full, deep)
- `verify file <id>` command with per-file verification (standard, full, deep)
- Deep verification logic that reads container data and validates chunk hashes
- Record-level validation (header hash + stored size + data hash)
- Container-wide integrity verification across all sealed containers
- Comprehensive integration tests for verification (positive and corruption scenarios)

### Improved
- Verification coverage across file, chunk, and container layers
- Error reporting with aggregated verification failures
- Internal consistency checks for chunk offsets, sizes, and container bounds
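
As a rough illustration of the last two items, a bounds check over chunk records can collect every violation and return them together instead of stopping at the first one. This is a sketch only: `ChunkRecord`, `checkBounds`, and the field names are hypothetical, and `errors.Join` (Go 1.20+) stands in for whatever aggregation coldkeep actually uses.

```go
package main

import (
	"errors"
	"fmt"
)

// ChunkRecord is a hypothetical view of one chunk's placement inside a container.
type ChunkRecord struct {
	ID     int64
	Offset int64
	Size   int64
}

// checkBounds validates each record against the container size and returns
// all violations joined into a single error (nil when every record is valid).
func checkBounds(containerSize int64, records []ChunkRecord) error {
	var errs []error
	for _, r := range records {
		switch {
		case r.Offset < 0 || r.Size <= 0:
			errs = append(errs, fmt.Errorf("chunk %d: invalid offset %d or size %d", r.ID, r.Offset, r.Size))
		case r.Offset+r.Size > containerSize:
			errs = append(errs, fmt.Errorf("chunk %d: range [%d,%d) exceeds container size %d",
				r.ID, r.Offset, r.Offset+r.Size, containerSize))
		}
	}
	// errors.Join returns nil when errs is empty.
	return errors.Join(errs...)
}

func main() {
	records := []ChunkRecord{
		{ID: 1, Offset: 0, Size: 100},
		{ID: 2, Offset: 100, Size: 200}, // runs past the end of a 256-byte container
		{ID: 3, Offset: -1, Size: 10},   // negative offset
	}
	if err := checkBounds(256, records); err != nil {
		fmt.Println(err)
	}
}
```

Returning the joined error lets a caller print every failing chunk in one verification run.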

### Notes
- Deep verification performs full disk reads and may be slow on large datasets
- Whole-container compression is still present but will be removed in a future release in favor of block-level compression

coldkeep remains an experimental research project and is not production ready.
The on-disk format may change before v1.0.

------------------------------------------------------------------------

## [0.3.0] - 2026-03-15

Safe garbage collection foundation.
70 changes: 61 additions & 9 deletions README.md
@@ -11,7 +11,7 @@
> **Status:** Experimental research project.\
> **Not production-ready. Do not use for real or sensitive data.**

coldkeep is an experimental **local-first content-addressed file storage engine**
coldkeep is an experimental **local-first content-addressed file storage engine with verifiable integrity**
written in Go.

Files are split into **content-addressed chunks**, packed into
@@ -34,6 +34,54 @@ storage.
- Run garbage collection to remove unreferenced chunks.
- Recover safely from interrupted operations on startup.
- Display storage statistics and container health information.
- Verify integrity at multiple levels (metadata, container structure, and full data integrity).

------------------------------------------------------------------------

## Verification

coldkeep provides a multi-level integrity verification system to ensure
consistency and detect corruption across metadata and stored data.

### Levels

- **Standard**
- Validates metadata integrity
- Checks reference counts, chunk ordering, and orphan records

- **Full**
- Includes all standard checks
- Verifies container files exist and match recorded sizes
- Validates container hashes and chunk-to-container consistency

- **Deep**
- Includes all full checks
- Reads container data and recomputes chunk hashes
- Detects physical data corruption at the byte level
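
A deep check on a single record amounts to comparing the recorded size with the bytes actually read and recomputing their hash. The sketch below assumes SHA-256 with hex encoding and a hypothetical `Record` struct; the hash function and record layout coldkeep actually uses are not shown here.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Record is a hypothetical in-memory view of one stored chunk.
type Record struct {
	ExpectedHash string // hex digest recorded in metadata (assumed SHA-256)
	StoredSize   int64  // size recorded in metadata
	Data         []byte // bytes actually read from the container
}

// verifyRecord checks the size first, then recomputes the hash of the data.
func verifyRecord(r Record) error {
	if int64(len(r.Data)) != r.StoredSize {
		return fmt.Errorf("size mismatch: recorded %d, read %d", r.StoredSize, len(r.Data))
	}
	sum := sha256.Sum256(r.Data)
	if got := hex.EncodeToString(sum[:]); got != r.ExpectedHash {
		return fmt.Errorf("hash mismatch: recorded %s, computed %s", r.ExpectedHash, got)
	}
	return nil
}

func main() {
	data := []byte("example chunk")
	sum := sha256.Sum256(data)
	good := Record{ExpectedHash: hex.EncodeToString(sum[:]), StoredSize: int64(len(data)), Data: data}
	fmt.Println("intact record:", verifyRecord(good))

	// Flip one byte in a copy to simulate on-disk corruption.
	bad := good
	bad.Data = append([]byte(nil), data...)
	bad.Data[0] ^= 0xFF
	fmt.Println("corrupted record:", verifyRecord(bad))
}
```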

### Usage

Verify the entire system:

```bash
coldkeep verify system --standard
coldkeep verify system --full
coldkeep verify system --deep
```

Verify a specific file:

```bash
coldkeep verify file <file_id> --standard
coldkeep verify file <file_id> --full
coldkeep verify file <file_id> --deep
```

### Notes

Deep verification reads every container file in full and may be slow on large datasets.

It is recommended for periodic integrity audits rather than frequent execution.

------------------------------------------------------------------------

@@ -76,13 +124,17 @@ recover safely on startup.
│ └─ coldkeep/ # CLI entrypoint
├─ internal/
│ ├─ container/ # container format + container management
│ ├─ chunk/ # chunking and compression logic
│ ├─ db/ # database connection helpers
│ ├─ storage/ # store / restore / remove pipeline
│ ├─ maintenance/ # gc and stats
│ ├─ listing/ # file listing operations
│ └─ utils/ # small helper utilities
│ ├─ chunk/ # chunking and compression logic
│ ├─ container/ # container format + container management
│ ├─ db/ # database connection helpers
│ ├─ listing/ # file listing operations
│ ├─ maintenance/ # gc, stats, and verify_command
│ ├─ recovery/ # system recovery logic
│ ├─ storage/ # store / restore / remove pipeline
│ ├─ utils_compression/ # small compression helper utilities
│ ├─ utils_env/ # small env helper utilities
│ ├─ utils_print/ # small print helper utilities
│ └─ verify/ # verify logic for system or file
├─ tests/ # integration tests
├─ scripts/ # smoke / development scripts
@@ -278,7 +330,7 @@ data.

## Build

go build ./cmd/coldkeep
go build -o coldkeep ./cmd/coldkeep

## Tests

105 changes: 75 additions & 30 deletions cmd/coldkeep/main.go
@@ -10,9 +10,10 @@ import (
"github.com/franchoy/coldkeep/internal/maintenance"
"github.com/franchoy/coldkeep/internal/recovery"
"github.com/franchoy/coldkeep/internal/storage"
"github.com/franchoy/coldkeep/internal/verify"
)

const version = "0.3.0"
const version = "0.4.0"

func main() {

@@ -89,17 +90,60 @@ func main() {
err = listing.SearchFiles(os.Args[2:])

case "verify":
var target string
var verifyLevel verify.VerifyLevel
var fileID int64
//target can be "system" or "file"
if len(os.Args) > 2 {
switch os.Args[2] {
case "--full", "--full-check", "full", "full-check":
err = maintenance.RunVerify(maintenance.VerifyFull)
case "--deep", "--deep-check", "deep", "deep-check":
err = maintenance.RunVerify(maintenance.VerifyDeep)
target = os.Args[2]
switch target {
case "system":
//target is system, verify level can be --standard, --full, or --deep
if len(os.Args) > 3 {
switch os.Args[3] {
case "--standard", "standard", "":
verifyLevel = verify.VerifyStandard
case "--full", "full":
verifyLevel = verify.VerifyFull
case "--deep", "deep":
verifyLevel = verify.VerifyDeep
default:
log.Fatal("Unknown option for system verify: ", os.Args[3])
}
} else {
verifyLevel = verify.VerifyStandard
}
case "file":
if len(os.Args) > 3 {
fileID, err = strconv.ParseInt(os.Args[3], 10, 64)
if err != nil {
log.Fatal("Invalid fileID: ", err)
}
} else {
log.Fatal("Usage: coldkeep verify file <fileID> [--standard|--full|--deep]")
}
if len(os.Args) > 4 {
switch os.Args[4] {
case "--standard", "standard", "":
verifyLevel = verify.VerifyStandard
case "--full", "full":
verifyLevel = verify.VerifyFull
case "--deep", "deep":
verifyLevel = verify.VerifyDeep
default:
log.Fatal("Unknown option for file verify: ", os.Args[4])
}
} else {
verifyLevel = verify.VerifyStandard
}
default:
log.Fatal("Unknown option for verify: ", os.Args[2])
log.Fatal("Unknown target for verify: ", target)

}

//call verify command with target, fileID, and verifyLevel
//(runs after the switch so that valid targets reach it)
err = maintenance.VerifyCommand(target, int(fileID), verifyLevel)
} else {
err = maintenance.RunVerify(maintenance.VerifyStandard)
log.Fatal("Usage: coldkeep verify <system|file <fileID>> [--standard|--full|--deep]")
}

default:
@@ -117,33 +161,34 @@ func main() {
}

func printHelp() {
fmt.Println("coldkeep (V0.3.0)")
fmt.Println("coldkeep (V0.4.0)")
fmt.Println()
fmt.Println("Usage:")
fmt.Println(" coldkeep <command> [arguments]")
fmt.Println()
fmt.Println("Commands:")
fmt.Println(" store <file> Store a single file")
fmt.Println(" store-folder <folder> Store all files in a folder recursively")
fmt.Println(" restore <fileID> <dir> Restore file by ID into directory")
fmt.Println(" remove <fileID> Remove logical file (decrement refcounts)")
fmt.Println(" gc [options] Run garbage collection")
fmt.Println(" (no options) Perform standard GC")
fmt.Println(" gc --dry-run Show what would be removed without deleting")
fmt.Println(" stats Show storage statistics")
fmt.Println(" verify [options] Verify stored files")
fmt.Println(" (no options) Perform standard verification (metadata only)")
fmt.Println(" verify --full Perform full verification (metadata + content)")
fmt.Println(" verify --deep Perform deep verification (metadata + content + checksums)")
fmt.Println(" help Show this help message")
fmt.Println(" version Show version information")
fmt.Println(" list List stored logical files")
fmt.Println(" search [filters] Search files by filters")
fmt.Println()
fmt.Println("Search Filters:")
fmt.Println(" --name <substring>")
fmt.Println(" --min-size <bytes>")
fmt.Println(" --max-size <bytes>")
fmt.Println(" store <file> Store a single file")
fmt.Println(" store-folder <folder> Store all files in a folder recursively")
fmt.Println(" restore <fileID> <dir> Restore file by ID into directory")
fmt.Println(" remove <fileID> Remove logical file (decrement refcounts)")
fmt.Println(" gc [options] Run garbage collection")
fmt.Println(" (no options) Perform standard GC")
fmt.Println(" gc --dry-run Show what would be removed without deleting")
fmt.Println(" stats Show storage statistics")
fmt.Println(" verify <target> [fileID] [options] Verify stored files")
fmt.Println(" <target> must be 'system' or 'file'")
fmt.Println(" [options] can be '--standard', '--full', or '--deep'")
fmt.Println(" no options defaults to '--standard'")
fmt.Println(" verify system [options] Perform system-wide verification")
fmt.Println(" verify file <fileID> [options] Perform verification for specific file")
fmt.Println(" help Show this help message")
fmt.Println(" version Show version information")
fmt.Println(" list List stored logical files")
fmt.Println(" search [filters] Search files by filters")
fmt.Println(" Filters:")
fmt.Println(" --name <substring>")
fmt.Println(" --min-size <bytes>")
fmt.Println(" --max-size <bytes>")
fmt.Println()
fmt.Println("Environment Variables:")
fmt.Println(" DB_HOST")
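
The `system` and `file` branches of the `verify` case above repeat the same level switch. One possible refactor moves the parsing into a single helper that returns an error instead of calling `log.Fatal`. In this sketch the local `VerifyLevel` type is a stand-in for `verify.VerifyLevel`, whose real definition lives in `internal/verify` and is not shown in this diff, and `parseVerifyLevel` is a hypothetical name.

```go
package main

import (
	"fmt"
	"os"
)

// VerifyLevel is a local stand-in for verify.VerifyLevel (assumed to be an ordered enum).
type VerifyLevel int

const (
	VerifyStandard VerifyLevel = iota
	VerifyFull
	VerifyDeep
)

// parseVerifyLevel maps an optional CLI argument to a level.
// An absent argument defaults to the standard level, matching the PR's behavior.
func parseVerifyLevel(args []string) (VerifyLevel, error) {
	if len(args) == 0 {
		return VerifyStandard, nil
	}
	switch args[0] {
	case "--standard", "standard", "":
		return VerifyStandard, nil
	case "--full", "full":
		return VerifyFull, nil
	case "--deep", "deep":
		return VerifyDeep, nil
	default:
		return VerifyStandard, fmt.Errorf("unknown verify level: %q", args[0])
	}
}

func main() {
	// Example: treat everything after the program name as the level argument.
	level, err := parseVerifyLevel(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("level:", level)
}
```

With such a helper, the `system` branch would pass `os.Args[3:]` and the `file` branch `os.Args[4:]`.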
1 change: 1 addition & 0 deletions db/init.sql
@@ -19,6 +19,7 @@ CREATE TABLE IF NOT EXISTS container (
id BIGSERIAL PRIMARY KEY,
filename TEXT NOT NULL UNIQUE,
sealed BOOLEAN NOT NULL DEFAULT FALSE,
container_hash TEXT DEFAULT NULL,
quarantine BOOLEAN NOT NULL DEFAULT FALSE,
current_size BIGINT NOT NULL DEFAULT 0 CHECK (current_size >= 0),
max_size BIGINT NOT NULL CHECK (max_size > 0),
6 changes: 3 additions & 3 deletions internal/container/constants.go
@@ -1,12 +1,12 @@
package container

import (
"github.com/franchoy/coldkeep/internal/utils"
"github.com/franchoy/coldkeep/internal/utils_env"
)

var ContainersDir = utils.GetenvOrDefault("COLDKEEP_STORAGE_DIR", "./storage/containers")
var ContainersDir = utils_env.GetenvOrDefault("COLDKEEP_STORAGE_DIR", "./storage/containers")

var containerMaxSize = utils.GetenvOrDefaultInt64("COLDKEEP_CONTAINER_MAX_SIZE_MB", 64) * 1024 * 1024 //MB
var containerMaxSize = utils_env.GetenvOrDefaultInt64("COLDKEEP_CONTAINER_MAX_SIZE_MB", 64) * 1024 * 1024 //MB

// GetContainerMaxSize returns the current container max size
func GetContainerMaxSize() int64 {
33 changes: 27 additions & 6 deletions internal/container/container.go
@@ -10,7 +10,7 @@ import (
"time"

"github.com/franchoy/coldkeep/internal/db"
"github.com/franchoy/coldkeep/internal/utils"
"github.com/franchoy/coldkeep/internal/utils_compression"
)

func GetOrCreateOpenContainer(db db.DBTX) (int64, string, int64, error) {
@@ -139,7 +139,7 @@ func SealContainer(tx db.DBTX, containerID int64, filename string) error {
originalPath := filepath.Join(ContainersDir, filename)

// Compress file
compressedPath, compressed_size, err := utils.CompressFile(originalPath, utils.DefaultCompression)
compressedPath, compressed_size, sumHex, err := utils_compression.CompressFile(originalPath, utils_compression.DefaultCompression)
if err != nil {
return err
}
@@ -149,14 +149,35 @@ func SealContainer(tx db.DBTX, containerID int64, filename string) error {
UPDATE container
SET sealed = TRUE,
compression_algorithm = $1,
compressed_size = $2
WHERE id = $3
`, string(utils.DefaultCompression), compressed_size, containerID)
compressed_size = $2,
container_hash = $3
WHERE id = $4
`, string(utils_compression.DefaultCompression), compressed_size, sumHex, containerID)

if err != nil {
return fmt.Errorf("update/seal container failed: %w", err)
}

fmt.Printf("Container %d sealed and compressed with type %s : %s\n", containerID, utils.DefaultCompression, compressedPath)
fmt.Printf("Container %d sealed and compressed with type %s : %s\n", containerID, utils_compression.DefaultCompression, compressedPath)
return nil
}

func CheckContainerHashFile(id int, filename, storedHash string) error {
containerPath := filepath.Join(ContainersDir, filename)

computedHash, err := utils_compression.ComputeFileHashHex(containerPath)
if err != nil {
return fmt.Errorf("compute container file hash: %w", err)
}

//a NULL or empty stored hash (as on containers sealed before container_hash existed) is reported as an error, with the computed hash included for reference
if len(storedHash) == 0 || storedHash == "null" || storedHash == "NULL" {
return fmt.Errorf("container file hash is missing in db for container %d, calculated hash: %s", id, computedHash)
}

if computedHash != storedHash {
return fmt.Errorf("container file hash mismatch for container %d: expected %s, got %s", id, storedHash, computedHash)
}

return nil
}
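
`ComputeFileHashHex` is defined in `internal/utils_compression` and is not part of the hunks shown here. A helper of this kind is commonly written by streaming the file through a hash; the sketch below assumes SHA-256 and uses the hypothetical names `fileHashHex` and `hashTempFile`, so it may differ from the project's implementation.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"os"
)

// fileHashHex streams the file at path through SHA-256 (an assumed algorithm)
// and returns the hex digest without loading the whole file into memory.
func fileHashHex(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", fmt.Errorf("open %s: %w", path, err)
	}
	defer f.Close()

	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return "", fmt.Errorf("read %s: %w", path, err)
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// hashTempFile writes content to a temporary file, hashes it, and cleans up.
func hashTempFile(content string) (string, error) {
	tmp, err := os.CreateTemp("", "container-*.bin")
	if err != nil {
		return "", err
	}
	defer os.Remove(tmp.Name())
	if _, err := tmp.WriteString(content); err != nil {
		tmp.Close()
		return "", err
	}
	if err := tmp.Close(); err != nil {
		return "", err
	}
	return fileHashHex(tmp.Name())
}

func main() {
	sum, err := hashTempFile("hello")
	fmt.Println(sum, err)
}
```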
4 changes: 2 additions & 2 deletions internal/db/db.go
@@ -6,7 +6,7 @@ import (
"os"
"strings"

"github.com/franchoy/coldkeep/internal/utils"
"github.com/franchoy/coldkeep/internal/utils_env"
_ "github.com/lib/pq"
)

@@ -16,7 +16,7 @@ func ConnectDB() (*sql.DB, error) {
" user=" + os.Getenv("DB_USER") +
" password=" + os.Getenv("DB_PASSWORD") +
" dbname=" + os.Getenv("DB_NAME") +
" sslmode=" + utils.GetenvOrDefault("DB_SSLMODE", "disable")
" sslmode=" + utils_env.GetenvOrDefault("DB_SSLMODE", "disable")
safeConnStr := strings.ReplaceAll(connStr, "password="+os.Getenv("DB_PASSWORD"), "password=***")

log.Printf("Connecting to DB with: %s", safeConnStr) // Log the connection string (without password)