jonpspri · Oct 9, 2025
diff --git a/README.md b/README.md
@@ -14,14 +14,16 @@ Model Context Protocol (MCP).
 
 ## Features
 
-- 🔄 **Complete Data Operations** - Load, transform, analyze, and export CSV data
+- 🔄 **Complete Data Operations** - Load, transform, and analyze CSV data from
+  URLs and string content
 - 📊 **Advanced Analytics** - Statistics, correlations, outlier detection, data
   profiling
 - ✅ **Data Validation** - Schema validation, quality scoring, anomaly detection
 - 🎯 **Stateless Design** - Clean MCP architecture with external context
   management
-- ⚡ **High Performance** - Handles large datasets with streaming and chunking
+- ⚡ **High Performance** - Async I/O, streaming downloads, chunked processing
 - 🔒 **Session Management** - Multi-user support with isolated sessions
+- 🛡️ **Web-Safe** - No file system access; designed for secure web hosting
 - 🌟 **Code Quality** - Zero ruff violations, 100% mypy compliance, perfect MCP
   documentation standards, comprehensive test coverage
 
@@ -71,8 +73,9 @@ uv run databeak --transport http --host 0.0.0.0 --port 8000
 Once configured, ask your AI assistant:
 
 ```text
-"Load a CSV file and show me basic statistics"
-"Remove duplicate rows and export as Excel"
+"Load this CSV data: name,price\nWidget,10.99\nGadget,25.50"
+"Load CSV from URL: https://example.com/data.csv"
+"Remove duplicate rows and show me the statistics"
 "Find outliers in the price column"
 ```
 
@@ -91,34 +94,47 @@ Once configured, ask your AI assistant:
 
 ## Environment Variables
 
-| Variable                    | Default | Description               |
-| --------------------------- | ------- | ------------------------- |
-| `DATABEAK_MAX_FILE_SIZE_MB` | 1024    | Maximum file size         |
-| `DATABEAK_CSV_HISTORY_DIR`  | "."     | History storage location  |
-| `DATABEAK_SESSION_TIMEOUT`  | 3600    | Session timeout (seconds) |
+Configure DataBeak behavior with environment variables (all use `DATABEAK_`
+prefix):
+
+| Variable                              | Default   | Description                        |
+| ------------------------------------- | --------- | ---------------------------------- |
+| `DATABEAK_SESSION_TIMEOUT`            | 3600      | Session timeout (seconds)          |
+| `DATABEAK_MAX_DOWNLOAD_SIZE_MB`       | 100       | Maximum URL download size (MB)     |
+| `DATABEAK_MAX_MEMORY_USAGE_MB`        | 1000      | Max DataFrame memory (MB)          |
+| `DATABEAK_MAX_ROWS`                   | 1,000,000 | Max DataFrame rows                 |
+| `DATABEAK_URL_TIMEOUT_SECONDS`        | 30        | URL download timeout (seconds)     |
+| `DATABEAK_HEALTH_MEMORY_THRESHOLD_MB` | 2048      | Health monitoring memory threshold |
+
+See [settings.py](src/databeak/core/settings.py) for complete configuration
+options.
 
 ## Known Limitations
 
 DataBeak is designed for interactive CSV processing with AI assistants. Be aware
 of these constraints:
 
-- **File Size**: Maximum 1024MB per file (configurable via
-  `DATABEAK_MAX_FILE_SIZE_MB`)
+- **Data Loading**: URLs and string content only (no local file system access
+  for web hosting security)
+- **Download Size**: Maximum 100MB per URL download (configurable via
+  `DATABEAK_MAX_DOWNLOAD_SIZE_MB`)
+- **DataFrame Size**: Maximum 1000MB memory and 1M rows per DataFrame
+  (configurable via `DATABEAK_MAX_MEMORY_USAGE_MB`, `DATABEAK_MAX_ROWS`)
 - **Session Management**: Maximum 100 concurrent sessions, 1-hour timeout
   (configurable)
 - **Memory**: Large datasets may require significant memory; monitor with
-  `system_info` tool
+  `health_check` tool
 - **CSV Dialects**: Assumes standard CSV format; complex dialects may require
   pre-processing
-- **Concurrency**: Single-threaded processing per session; parallel sessions
+- **Concurrency**: Async I/O for concurrent URL downloads; parallel sessions
   supported
 - **Data Types**: Automatic type inference; complex types may need explicit
   conversion
 - **URL Loading**: HTTPS only; blocks private networks (127.0.0.1, 192.168.x.x,
   10.x.x.x) for security
 
-For production deployments with larger datasets, consider adjusting environment
-variables and monitoring resource usage.
+For production deployments with larger datasets, adjust environment variables
+and monitor resource usage with `health_check` and `get_server_info` tools.
 
 ## Contributing
 

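Review note on the **URL Loading** limitation in the README hunk above (HTTPS only, private networks blocked): the diff does not show how that check is implemented, so the following is only a minimal sketch of what such a guard could look like. The function name `is_url_allowed` is hypothetical and is not taken from the DataBeak source.

```python
import ipaddress
from urllib.parse import urlparse


def is_url_allowed(url: str) -> bool:
    """Hypothetical guard: accept HTTPS URLs whose host is not a private IP."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or not parsed.hostname:
        return False
    try:
        address = ipaddress.ip_address(parsed.hostname)
    except ValueError:
        # The host is a DNS name rather than an IP literal. A real check
        # would also have to inspect the addresses the name resolves to.
        return parsed.hostname != "localhost"
    # Reject loopback (127.0.0.1), private (192.168.x.x, 10.x.x.x), and
    # link-local addresses.
    return not (address.is_private or address.is_loopback or address.is_link_local)
```

A guard that only inspects the URL string, as this one does, cannot catch a public hostname that resolves to a private address, so it should be read as an illustration of the documented rule rather than a complete defense.
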
diff --git a/docs/api/index.md b/docs/api/index.md
@@ -13,12 +13,10 @@ comprehensive error handling.
 
 ### 📁 I/O Operations
 
-Tools for loading and exporting CSV data in various formats:
+Tools for loading CSV data from web sources:
 
-- **`load_csv`** - Load CSV from file path
 - **`load_csv_from_url`** - Load CSV from HTTP/HTTPS URL
 - **`load_csv_from_content`** - Load CSV from string content
-- **`export_csv`** - Export to CSV, JSON, Excel, Parquet, HTML, Markdown
 - **`get_session_info`** - Get current session details and statistics
 - **`list_sessions`** - List all active sessions
 - **`close_session`** - Close and cleanup a session
@@ -60,14 +58,11 @@ Tools for schema validation and quality checking:
 
 ### 🔄 Session Management
 
-Tools for managing data sessions and workflow:
+Tools for managing data sessions:
 
-- **`configure_auto_save`** - Set up automatic saving strategies
-- **`get_auto_save_status`** - Check current auto-save configuration
-- **`undo`** - Undo the last operation
-- **`redo`** - Redo previously undone operation
-- **`get_history`** - View operation history
-- **`restore_to_operation`** - Restore to specific point in history
+- **`list_sessions`** - List all active sessions
+- **`close_session`** - Close and cleanup a session
+- **`get_session_info`** - Get session metadata and statistics
 
 ### ⚙️ System Tools
 
@@ -129,15 +124,20 @@ Filter operations support complex conditions:
 
 ### Environment Configuration
 
-All tools respect these environment variables:
+All tools respect these environment variables (all use `DATABEAK_` prefix):
+
+| Variable                              | Default   | Purpose                          |
+| ------------------------------------- | --------- | -------------------------------- |
+| `DATABEAK_SESSION_TIMEOUT`            | 3600      | Session timeout (seconds)        |
+| `DATABEAK_MAX_DOWNLOAD_SIZE_MB`       | 100       | Maximum URL download size (MB)   |
+| `DATABEAK_MAX_MEMORY_USAGE_MB`        | 1000      | Max DataFrame memory (MB)        |
+| `DATABEAK_MAX_ROWS`                   | 1,000,000 | Max DataFrame rows               |
+| `DATABEAK_URL_TIMEOUT_SECONDS`        | 30        | URL download timeout (seconds)   |
+| `DATABEAK_HEALTH_MEMORY_THRESHOLD_MB` | 2048      | Health monitoring threshold (MB) |
 
-| Variable                    | Default | Purpose                   |
-| --------------------------- | ------- | ------------------------- |
-| `DATABEAK_MAX_FILE_SIZE_MB` | 1024    | Maximum file size         |
-| `DATABEAK_CSV_HISTORY_DIR`  | "."     | History storage location  |
-| `DATABEAK_SESSION_TIMEOUT`  | 3600    | Session timeout (seconds) |
-| `DATABEAK_CHUNK_SIZE`       | 10000   | Processing chunk size     |
-| `DATABEAK_AUTO_SAVE`        | true    | Enable auto-save          |
+See
+[DatabeakSettings](https://github.com/jonpspri/databeak/blob/main/src/databeak/core/settings.py)
+for all configuration options.
 
 ## Advanced Features
 

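Review note on the environment variable tables above: the real settings live in `src/databeak/core/settings.py`, which this diff does not show. As a rough illustration of how `DATABEAK_`-prefixed variables could override the documented defaults, here is a standard-library sketch. The class `LimitSettings` and the function `load_settings` are hypothetical names, and treating every value as an integer is an assumption.

```python
import os
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class LimitSettings:
    """Defaults copied from the documentation table; names are illustrative."""

    session_timeout: int = 3600
    max_download_size_mb: int = 100
    max_memory_usage_mb: int = 1000
    max_rows: int = 1_000_000
    url_timeout_seconds: int = 30
    health_memory_threshold_mb: int = 2048


def load_settings(environ=None, prefix="DATABEAK_"):
    """Build settings, letting PREFIX + FIELD_NAME variables override defaults."""
    env = os.environ if environ is None else environ
    overrides = {}
    for field in fields(LimitSettings):
        raw = env.get(prefix + field.name.upper())
        if raw is not None:
            # int() rejects "1,000,000", so values need plain digits.
            overrides[field.name] = int(raw)
    return LimitSettings(**overrides)
```

For example, `load_settings({"DATABEAK_MAX_DOWNLOAD_SIZE_MB": "250"})` would raise the download limit in this sketch while leaving the other defaults untouched. Note that the table prints the row default as `1,000,000` for readability; an override in this sketch would need to be written as `1000000`.
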
diff --git a/docs/tutorials/quickstart.md b/docs/tutorials/quickstart.md
@@ -16,12 +16,16 @@ process a sample sales dataset using natural language commands.
 
 ## Step 1: Load Your Data
 
-Ask your AI assistant:
+Ask your AI assistant to load data from a URL or paste CSV content:
 
-> "Load the sales data from my CSV file"
+> "Load the sales data from this URL: <https://example.com/sales.csv>"
 
-The AI will use the `load_csv` tool to create a new session and load your data.
-You'll see a response with:
+Or provide CSV content directly:
+
+> "Load this CSV data: name,price,quantity\\nWidget,10.99,5\\nGadget,25.50,3"
+
+The AI will use the `load_csv_from_url` or `load_csv_from_content` tool to
+create a new session and load your data. You'll see a response with:
 
 - Session ID for tracking
 - Data shape (rows × columns)
@@ -88,10 +92,11 @@ For detailed column analysis:
 
 > "Check the overall data quality and give me a quality score"
 
-## Step 6: Export Results
+## Step 6: Save Results
 
-> "Export this cleaned and analyzed data as an Excel file named
-> 'sales_analysis.xlsx'"
+DataBeak processes data in memory and never writes to the file system, which
+keeps it safe for web hosting. To save results, ask your AI assistant to
+retrieve the processed data and save it on your behalf.
 
 ## Advanced Features
 
@@ -102,11 +107,11 @@ Made a mistake? No problem:
 > "Undo the last operation" "Show me the operation history" "Restore to the
 > state before I added the total_value column"
 
-### Auto-Save Configuration
+### Data Retrieval
 
-Set up automatic saving:
+Get processed data back as CSV content for further use:
 
-> "Export the cleaned data to a new CSV file for further analysis"
+> "Show me the cleaned data as CSV content"
 
 ### Session Management
 
@@ -121,40 +126,40 @@ Work with multiple datasets:
 
 ```python
 # Natural language commands:
-"Load the messy customer data"
+"Load customer data from URL: https://example.com/customers.csv"
 
 "Remove duplicate rows"
 "Fill missing email addresses with 'no-email@domain.com'"
 "Standardize the phone number format"
 "Remove rows where age is negative or over 120"
-"Export the cleaned data"
+"Show me the cleaned data preview"
 ```
 
 ### Analysis Pipeline
 
 ```python
 # Business intelligence workflow:
-"Load quarterly sales data"
+"Load quarterly sales data from URL: https://example.com/q1-sales.csv"
 
 "Filter for completed transactions only"
 "Group by product category and month"
 "Calculate total revenue and average order value"
 "Find the top 10 selling products"
 "Create correlation matrix for price vs quantity vs revenue"
-"Export summary as Excel with charts"
+"Show me the summary statistics"
 ```
 
 ### Data Validation
 
 ```python
 # Quality assurance workflow:
-"Load the new data batch"
+"Load data from this CSV content: [paste CSV here]"
 
 "Validate against the expected schema"
 "Check data quality score"
 "Find any statistical anomalies"
 "Generate a data profiling report"
-"Flag any quality issues for review"
+"Show me any quality issues found"
 ```
 
 ## Tips for Success
@@ -171,18 +176,18 @@ where status equals 'active'"
 
 ### 3. **Chain Operations**
 
-"Load sales.csv, remove duplicates, filter for 2024 data, then calculate monthly
-totals"
+"Load sales data from URL, remove duplicates, filter for 2024 data, then
+calculate monthly totals"
 
-### 4. **Leverage Auto-Save**
+### 4. **Work with Web Data**
 
-DataBeak automatically saves your work, so you can focus on analysis without
-worrying about losing changes
+DataBeak is designed for web-based hosting, so it works with URLs and in-memory
+data without accessing your local file system.
 
 ### 5. **Explore History**
 
-Use DataBeak's stateless design to experiment with different approaches - export
-intermediate results as needed
+Use DataBeak's stateless design to experiment with different approaches -
+retrieve results when needed
 
 ## Next Steps
 

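Review note on the quickstart change above: the tutorial now asks users to paste CSV text such as `name,price,quantity\nWidget,10.99,5\nGadget,25.50,3`. The `load_csv_from_content` implementation is not part of this diff, so the sketch below only illustrates the idea of parsing string content in memory with the standard library. The function `parse_csv_content` and its `max_rows` parameter are hypothetical.

```python
import csv
import io


def parse_csv_content(content: str, max_rows: int = 1_000_000):
    """Parse CSV text held in memory; returns (column names, list of row dicts)."""
    reader = csv.DictReader(io.StringIO(content))
    rows = []
    for row in reader:
        if len(rows) >= max_rows:
            raise ValueError(f"CSV content exceeds {max_rows} rows")
        rows.append(row)
    return list(reader.fieldnames or []), rows


# The sample content from the quickstart, with real newlines between records.
columns, rows = parse_csv_content(
    "name,price,quantity\nWidget,10.99,5\nGadget,25.50,3"
)
```

All values come back as strings here; type inference, which the README lists as automatic in DataBeak, is outside the scope of this sketch.
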
diff --git a/pyproject.toml b/pyproject.toml
@@ -60,7 +60,6 @@ dependencies = [
     "pytz>=2024.2",
     "pydantic-settings>=2.10.1",
     "psutil>=7.0.0",
-    "chardet>=5.2.0",
     "scipy>=1.16.1",
     "simpleeval>=1.0.3",
     "pandera>=0.26.1",
@@ -384,7 +383,6 @@ dev = [
     "twine>=6.1.0",
     "ty>=0.0.1a21",
     "types-aiofiles>=24.1.0.20250822",
-    "types-chardet>=5.0.4.6",
     "types-jsonschema>=4.25.1.20250822",
     "types-psutil>=7.0.0.20250822",
     "types-pytz>=2025.2.0.20250809",

diff --git a/src/databeak/core/session.py b/src/databeak/core/session.py
@@ -5,12 +5,11 @@
 import logging
 import threading
 from datetime import UTC, datetime, timedelta
-from pathlib import Path
 from typing import TYPE_CHECKING, Any
 from uuid import uuid4
 
 from databeak.exceptions import NoDataLoadedError, SessionExpiredError
-from databeak.models.data_models import ExportFormat, SessionInfo
+from databeak.models.data_models import SessionInfo
 from databeak.models.data_session import DataSession
 
 if TYPE_CHECKING:
@@ -146,43 +145,6 @@ def get_info(self) -> SessionInfo:
             file_path=data_info["file_path"],
         )
 
-    async def _save_callback(
-        self,
-        file_path: str,
-        export_format: ExportFormat,
-        encoding: str,
-    ) -> dict[str, Any]:
-        """Handle auto-save operations."""
-        try:
-            if self._data_session.df is None:
-                return {"success": False, "error": "No data to save"}
-
-            # Handle different export formats
-            path_obj = Path(file_path)
-            path_obj.parent.mkdir(parents=True, exist_ok=True)
-
-            if export_format == ExportFormat.CSV:
-                self._data_session.df.to_csv(path_obj, index=False, encoding=encoding)
-            elif export_format == ExportFormat.TSV:
-                self._data_session.df.to_csv(path_obj, sep="\t", index=False, encoding=encoding)
-            elif export_format == ExportFormat.JSON:
-                self._data_session.df.to_json(path_obj, orient="records", indent=2)
-            elif export_format == ExportFormat.EXCEL:
-                self._data_session.df.to_excel(path_obj, index=False)
-            elif export_format == ExportFormat.PARQUET:
-                self._data_session.df.to_parquet(path_obj, index=False)
-            else:
-                return {"success": False, "error": f"Unsupported format: {export_format}"}
-
-            return {
-                "success": True,
-                "file_path": str(path_obj),
-                "rows": len(self._data_session.df),
-                "columns": len(self._data_session.df.columns),
-            }
-        except (OSError, PermissionError, ValueError, TypeError, UnicodeError) as e:
-            return {"success": False, "error": str(e)}
-
     async def clear(self) -> None:
         """Clear session data to free memory."""
         # Clear data session
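
Review note on the removal of `_save_callback` above: with file export gone, results have to be handed back as content rather than written to a path. The diff does not show what replaces the callback, so this is only a sketch of in-memory serialization using the standard library; `rows_to_csv_content` is a hypothetical helper, not a DataBeak function.

```python
import csv
import io


def rows_to_csv_content(columns, rows) -> str:
    """Serialize row dicts to CSV text in memory, without touching the disk."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


content = rows_to_csv_content(
    ["name", "price"],
    [{"name": "Widget", "price": "10.99"}, {"name": "Gadget", "price": "25.50"}],
)
```

With a pandas DataFrame the same effect is available by calling `df.to_csv(index=False)` with no path argument, which returns the CSV text as a string instead of writing a file.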