Summary
The download_file_from_gcs method calls requests.get() with no timeout parameter. The Python requests library has no default timeout, so a stalled connection will block indefinitely. This is directly in the model initialization path, meaning any application calling TextEmbedding(model_name) or similar can hang forever.
Affected Code
File: fastembed/common/model_management.py, line 102
@classmethod
def download_file_from_gcs(cls, url: str, output_path: str, show_progress: bool = True) -> str:
    if os.path.exists(output_path):
        return output_path
    response = requests.get(url, stream=True)  # No timeout
Reproduction
# Simulate stalled server
sudo iptables -A OUTPUT -d storage.googleapis.com -j DROP
# Then in Python:
from fastembed import TextEmbedding
model = TextEmbedding("BAAI/bge-small-en-v1.5") # Hangs indefinitely
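For a reproduction that needs neither root access nor firewall rules, a local socket that completes the TCP handshake but never replies shows the same stall. This is a minimal sketch and not part of fastembed: start_stalled_server and probe are hypothetical names, and the 1-second timeouts are chosen only to keep the demo fast.

```python
import socket

import requests


def start_stalled_server():
    """Listen on an ephemeral localhost port and never answer.

    The kernel completes the TCP handshake for queued connections even
    though accept() is never called, so clients connect and then wait.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    return server, server.getsockname()[1]


def probe(url, timeout):
    """Return the name of the timeout exception raised, or 'completed'."""
    session = requests.Session()
    # Ignore proxy environment variables so the request reaches the local socket.
    session.trust_env = False
    try:
        session.get(url, stream=True, timeout=timeout)
    except requests.exceptions.Timeout as exc:
        return type(exc).__name__
    finally:
        session.close()
    return "completed"


if __name__ == "__main__":
    server, port = start_stalled_server()
    # With timeout=None (the requests default) this call would never return.
    print(probe(f"http://127.0.0.1:{port}/", timeout=(1, 1)))
    server.close()
```

Passing timeout=None to probe reproduces the hang without touching iptables. The timed variant shows that a stalled peer surfaces as an exception the caller can handle.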
Impact
- Application deadlock: any service using fastembed in its initialization path hangs forever if the download stalls
- No error recovery: no exception is raised, so retry logic and circuit breakers are bypassed
- Silent resource exhaustion: threads/coroutines waiting on the download are blocked indefinitely, eventually exhausting worker pools
- Especially problematic in serverless (Lambda, Cloud Run) or container environments with strict execution time limits, where the process hangs until the platform kills it
Suggested Fix
Add a connect+read timeout:
response = requests.get(
    url,
    stream=True,
    timeout=(10, 60),  # (connect_timeout, read_timeout) in seconds
)
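The tuple form (connect_timeout, read_timeout) is recommended: a short connect timeout (10s) detects unreachable servers quickly, while a longer read timeout (60s) tolerates slow transfers. In requests, the read timeout bounds the wait between bytes received from the server, not the total download time, so a large file on a slow link still completes as long as data keeps arriving.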
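One way the timeout could be combined with cleanup of partial downloads is sketched below. It is an illustration under stated assumptions, not fastembed's actual implementation: download_with_timeout, DEFAULT_TIMEOUT, the ".part" suffix, and the chunk size are all hypothetical choices.

```python
import os

import requests

# Hypothetical defaults mirroring the values suggested above; tune per deployment.
DEFAULT_TIMEOUT = (10, 60)  # (connect_timeout, read_timeout) in seconds


def download_with_timeout(url, output_path, timeout=DEFAULT_TIMEOUT, chunk_size=1024 * 1024):
    """Stream url to output_path, raising instead of hanging on a stalled peer."""
    if os.path.exists(output_path):
        return output_path

    # Write to a temporary name so an interrupted download never looks complete.
    partial_path = output_path + ".part"
    try:
        with requests.get(url, stream=True, timeout=timeout) as response:
            response.raise_for_status()
            with open(partial_path, "wb") as handle:
                for chunk in response.iter_content(chunk_size=chunk_size):
                    handle.write(chunk)
    except requests.exceptions.RequestException:
        # Covers connect/read timeouts, connection errors, and HTTP error statuses.
        if os.path.exists(partial_path):
            os.remove(partial_path)
        raise

    os.replace(partial_path, output_path)
    return output_path
```

Because the failure now propagates as a requests.exceptions.RequestException, callers can wrap the call in their own retry or circuit-breaker logic instead of blocking a worker indefinitely.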
Found via automated codebase analysis. Confirmed independently by two reviewers (Claude, Gemini).