49 changes: 0 additions & 49 deletions README.md

This file was deleted.

@@ -0,0 +1,30 @@
# Use Python 3.12 slim (already has Python and pip).
FROM python:3.12-slim

# Avoid interactive prompts during apt operations.
ENV DEBIAN_FRONTEND=noninteractive

# Install CA certificates (needed for HTTPS).
RUN apt-get update && apt-get install -y \
    ca-certificates \
    && rm -rf /var/lib/apt/lists/*

# Install project specific packages.
RUN mkdir -p /install
COPY requirements.txt /install/requirements.txt
RUN pip install --upgrade pip && \
    pip install --no-cache-dir jupyterlab jupyterlab_vim jupytext -r /install/requirements.txt

# Config.
COPY etc_sudoers /install/
# sudo requires the sudoers file to be root-owned with mode 0440, or it refuses to run.
# --chmod requires BuildKit, which docker_build.sh enables.
COPY --chmod=0440 etc_sudoers /etc/sudoers
COPY bashrc /root/.bashrc

# Report package versions.
COPY version.sh /install/
RUN /install/version.sh 2>&1 | tee version.log

# Jupyter.
EXPOSE 8888

CMD ["/bin/bash"]
@@ -0,0 +1,108 @@
# Benchmarking-in-Agentic-Reasoning-for-Data-Science-

## Description

This project moves beyond evaluating third-party "black box" tools to engineering a custom, stateful multi-agent system using LangGraph. While standard agents (like ChatGPT) follow linear, one-shot processes, this research builds a cyclic architecture where agents can plan, execute, critique, and self-correct. By developing an internal "Analyst-Reviewer" loop, the project explores the frontier of Agentic Reasoning—testing whether a structured graph of specialized agents can outperform monolithic AI models in reliability, code quality, and handling "adversarial" or "noisy" data science tasks.

| Type | Name | Description | Website | Strength |
| --------------------------- | ------------------------------------------------- | ------------------------------------------------------------------------------------------------------ | ---------------------------------------- | -------------|
| Notebook agent | Data Interpreter (ChatGPT Advanced Data Analysis) | Upload data → automatic cleaning, analysis, modeling, and visualization | https://chat.openai.com | Fast exploratory analysis |
| AutoML agent | AutoGluon | Automated model selection, feature engineering, and tuning pipelines | https://auto.gluon.ai | Strong tabular ML performance |
| Multi-agent research system | Microsoft AutoGen | Agents collaborate to plan experiments, write code, and critique results | https://github.com/microsoft/autogen | Research workflows |
| Workflow agent | LangGraph | Stateful agent graphs for long-running analytical pipelines | https://langchain-ai.github.io/langgraph | Persistent reasoning loops |


## Project Objective

The primary goal is to benchmark the efficacy of stateful multi-agent orchestration against single-agent and AutoML baselines. This project aims to answer:

1. Can a cyclic multi-agent graph (LangGraph) significantly reduce hallucinations and logical errors compared to single-agent assistants?
2. Does a "Reviewer" node in an agentic workflow produce more production-ready, modular code than one-shot generation?
3. How do different agent architectures (linear vs. cyclic vs. AutoML) recover when faced with corrupted or ambiguous data?

## Dataset Suggestions
- **Heart Disease Prediction (UCI / Kaggle)**
- Source: Kaggle — UCI Heart Disease Dataset
- URL: https://www.kaggle.com/datasets/redwankarimsony/heart-disease-uci
- Contains: 14 clinical features (age, cholesterol, chest pain type, etc.)
with a binary target indicating presence of heart disease; ~300 rows
- Access: Free Kaggle account required; download via
`kaggle datasets download` CLI or direct CSV link; no authentication token
needed for manual download

- **NYC Yellow Taxi Trip Records**
- Source: NYC Open Data / TLC Trip Record Data
- URL: https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page
- Contains: Pick-up/drop-off timestamps, GPS coordinates, trip distance, fare
amount, tip, and passenger count; monthly Parquet files (~millions of rows —
use one month's subset)
- Access: Fully public, no authentication; direct Parquet download links
available on the page; recommend sampling 50k rows for laptop use

- **Air Quality — OpenAQ**
- Source: OpenAQ public API
- URL: https://api.openaq.org/v2/measurements (REST, no key required for basic
access)
- Contains: Real-time and historical PM2.5, PM10, NO₂, O₃, CO readings from
thousands of global monitoring stations with timestamps and GPS
- Access: Free tier with no API key; query by city, parameter, and date range;
returns JSON easily loaded with `requests` + `pandas`

- **Amazon Product Reviews — HuggingFace Datasets**
- Source: HuggingFace Hub — `McAuley-Lab/Amazon-Reviews-2023`
- URL: https://huggingface.co/datasets/McAuley-Lab/Amazon-Reviews-2023
- Contains: Product ratings (1–5 stars), review text, verified purchase flag,
product category; load a small subset (e.g., "All_Beauty", ~500k rows) with
`datasets.load_dataset()`
- Access: Free, no authentication; streamed or downloaded via `datasets`
library
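The subsampling recommended for the larger datasets above can be sketched as follows. This is a minimal sketch: `subsample` is a hypothetical helper name, and the Parquet filename in the usage comment is illustrative, not an exact TLC file URL.

```python
import pandas as pd


def subsample(df: pd.DataFrame, n_rows: int = 50_000, seed: int = 0) -> pd.DataFrame:
    """Return a reproducible random subsample capped at n_rows."""
    # Cap at the DataFrame length so small datasets pass through unchanged.
    n = min(n_rows, len(df))
    return df.sample(n=n, random_state=seed)


# Usage against a monthly TLC file (filename illustrative):
# df = pd.read_parquet("yellow_tripdata_2024-01.parquet")
# df_small = subsample(df)
```

Fixing `random_state` keeps the benchmark runs comparable across the different agent systems.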

## Breakdown of the Nodes

* The Planner (Node 1): Analyzes the dataset schema and sets the strategy (e.g., "This is a classification problem with imbalanced data").
* The EDA Analyst (Node 2): Performs exploratory data analysis, cleans data, and identifies outliers.
* The ML Architect (Node 3): Selects algorithms (e.g., XGBoost, Random Forest), performs hyperparameter tuning, and trains the model.
* The Quality Reviewer (Node 4): Acts as the scientific "guardrail." It inspects the Analyst's results—if Accuracy is high but Recall is low on imbalanced data, it triggers a loop back to the Architect.
* The Report Writer (Node 5): Synthesizes the final journey, documenting both the results and the errors caught/corrected by the Reviewer.
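The Architect-Reviewer cycle at the heart of Nodes 3 and 4 can be sketched without any framework. This is a framework-free illustration of the control flow only; `review_loop` and the state dictionary are illustrative names, and the real implementation would use a LangGraph `StateGraph` with conditional edges instead.

```python
from typing import Callable, Dict

State = Dict[str, object]


def review_loop(
    architect: Callable[[State], State],
    reviewer: Callable[[State], bool],
    state: State,
    max_iters: int = 3,
) -> State:
    """Run the Architect node until the Reviewer approves or retries run out."""
    for attempt in range(max_iters):
        # The Architect proposes (or revises) a model in the shared state.
        state = architect(state)
        state["attempts"] = attempt + 1
        # The Reviewer acts as the guardrail: approve, or loop back.
        if reviewer(state):
            state["approved"] = True
            return state
    state["approved"] = False
    return state
```

The bounded `max_iters` matters: without it, a Reviewer that can never be satisfied would cycle forever, which is exactly the failure mode conditional edges guard against.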

## The Quality Reviewer Rules (Guardrails)

### Code Integrity (The "Compiler" Gate)

* Syntax & Execution: Verified execution in a containerized Python environment.
* Modularity: Checks if code follows DRY (Don't Repeat Yourself) principles and proper function definitions.
* Library Hygiene: Ensures no unauthorized or deprecated packages are used.
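The library-hygiene check can be implemented by parsing the generated code rather than executing it. A minimal sketch, assuming an allowlist policy (`find_unauthorized_imports` is an illustrative helper name):

```python
import ast
from typing import List, Set


def find_unauthorized_imports(code: str, allowed: Set[str]) -> List[str]:
    """Parse generated code and list top-level imports outside the allowlist."""
    tree = ast.parse(code)
    bad = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name.split(".")[0] for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [(node.module or "").split(".")[0]]
        else:
            continue
        bad.extend(n for n in names if n and n not in allowed)
    return sorted(set(bad))
```

Because the check uses `ast.parse`, it doubles as a syntax gate: code that fails to parse raises before execution is ever attempted.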

### Statistical Logic (The "Data Scientist" Gate)
* Leakage Detection: Scans for target variables accidentally included in the feature set.
* Imbalance Audit: Rejects models that only report "Accuracy" for imbalanced clinical datasets like Heart Disease Prediction.
* Impossible Values: Flags unrealistic data points (e.g., negative taxi fares) for re-cleaning.
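The leakage and impossible-value rules can be expressed as simple predicate scans. A sketch under simplifying assumptions: leakage is approximated by name matching (a real check would also test feature-target correlation), and the function names are illustrative.

```python
from typing import Dict, List, Sequence


def check_leakage(feature_cols: Sequence[str], target_col: str) -> List[str]:
    """Flag feature columns whose names embed the target column name."""
    return [
        c for c in feature_cols
        if target_col.lower() in c.lower()
    ]


def check_impossible_fares(rows: Sequence[Dict[str, float]]) -> List[int]:
    """Return indices of rows with physically impossible taxi values."""
    return [
        i for i, r in enumerate(rows)
        if r.get("fare_amount", 0) < 0 or r.get("trip_distance", 0) < 0
    ]
```

Flagged rows are sent back to the EDA Analyst for re-cleaning rather than silently dropped, so the final report can document what was caught.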

### Explainability (The "Researcher" Gate)

* Narrative Consistency: Verifies that the written report matches the generated SHAP/Feature Importance plots.
* Logical Grounding: Rejects generic explanations in favor of data-backed insights.

## Benchmark Comparison Framework

We benchmark the custom LangGraph system against three distinct philosophies of AI:

1. Single-Agent Baseline: ChatGPT (Advanced Data Analysis) – Testing monolithic performance.
2. Conversational Multi-Agent: Microsoft AutoGen – Testing "Group Chat" vs. "Graph-based" logic.
3. Standard AutoML: AutoGluon – Testing AI reasoning vs. mathematical automation.

## Tasks & Implementation

1. Environment Setup: Version pinning for reproducibility across all agents.
2. Graph Construction: Implementing the StateGraph and Conditional Edges in LangGraph.
3. Benchmarking Execution: Running all competitors against Amazon Reviews, NYC Taxi, and Heart Disease datasets.
4. Adversarial Reliability Test: Introducing mislabeled data and extreme outliers to test system resilience.
5. Interpretability Audit: Analyzing the "thought logs" to determine which architecture is most transparent for human researchers.
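The label corruption used in the Adversarial Reliability Test (Task 4) can be made reproducible with a seeded flip. A minimal sketch for binary labels; `corrupt_labels` is an illustrative helper name, not part of any benchmark library.

```python
import random
from typing import List


def corrupt_labels(labels: List[int], frac: float = 0.1, seed: int = 0) -> List[int]:
    """Flip a fixed fraction of binary labels to simulate mislabeled data."""
    rng = random.Random(seed)
    out = list(labels)
    # Choose k distinct indices so each selected label is flipped exactly once.
    k = int(len(out) * frac)
    for i in rng.sample(range(len(out)), k):
        out[i] = 1 - out[i]
    return out
```

Seeding the corruption means every competing system (LangGraph, AutoGen, AutoGluon, ChatGPT) faces exactly the same noise, which keeps the resilience comparison fair.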

## Useful Resources
- **AutoGluon Documentation** — Tabular prediction quickstart and benchmarks:
https://auto.gluon.ai/stable/tutorials/tabular/tabular-quick-start.html
- **Microsoft AutoGen GitHub** — Multi-agent conversation examples including
data science workflows: https://github.com/microsoft/autogen
- **OpenML Benchmark Suite** — Curated tabular datasets and standardized
evaluation protocols for AutoML comparison studies:
https://www.openml.org/search?type=benchmark
@@ -0,0 +1 @@
set -o vi
@@ -0,0 +1,140 @@
#!/usr/bin/env python

"""
Copy Docker-related files from the source directory to a destination directory.

This script copies all Docker configuration and utility files from
class_project/project_template/ to a specified destination directory.

Usage examples:
# Copy all files to a target directory.
> ./copy_docker_files.py --dst_dir /path/to/destination

# Copy with verbose logging.
> ./copy_docker_files.py --dst_dir /path/to/destination -v DEBUG

Import as:

import class_project.project_template.copy_docker_files as cpdccodo
"""

import argparse
import logging
import os
from typing import List

import helpers.hdbg as hdbg
import helpers.hio as hio
import helpers.hparser as hparser
import helpers.hsystem as hsystem

_LOG = logging.getLogger(__name__)

# #############################################################################
# Constants
# #############################################################################

# List of files to copy from the source directory.
_FILES_TO_COPY = [
    "bashrc",
    "docker_bash.sh",
    "docker_build.sh",
    "docker_clean.sh",
    "docker_cmd.sh",
    "docker_exec.sh",
    "docker_jupyter.sh",
    "docker_name.sh",
    "docker_push.sh",
    "etc_sudoers",
    "install_jupyter_extensions.sh",
    "run_jupyter.sh",
    "version.sh",
]


# #############################################################################
# Helper functions
# #############################################################################


def _get_source_dir() -> str:
    """
    Get the absolute path to the source directory containing Docker files.

    :return: absolute path to class_project/project_template/
    """
    # Get the directory where this script is located.
    script_dir = os.path.dirname(os.path.abspath(__file__))
    _LOG.debug("Script directory='%s'", script_dir)
    return script_dir


def _copy_files(
    *,
    src_dir: str,
    dst_dir: str,
    files: List[str],
) -> None:
    """
    Copy the specified files from the source directory to the destination
    directory.

    :param src_dir: source directory path
    :param dst_dir: destination directory path
    :param files: list of filenames to copy
    """
    # Verify the source directory exists.
    hdbg.dassert_dir_exists(src_dir, "Source directory does not exist:", src_dir)
    # Create the destination directory if it doesn't exist.
    hio.create_dir(dst_dir, incremental=True)
    _LOG.info("Copying %d files from '%s' to '%s'", len(files), src_dir, dst_dir)
    # Copy each file.
    copied_count = 0
    for filename in files:
        src_path = os.path.join(src_dir, filename)
        dst_path = os.path.join(dst_dir, filename)
        # Verify the source file exists.
        hdbg.dassert_path_exists(
            src_path, "Source file does not exist:", src_path
        )
        # Copy the file using `cp -a` to preserve permissions and attributes.
        _LOG.debug("Copying '%s' -> '%s'", src_path, dst_path)
        # Quote the paths so filenames with spaces don't break the shell command.
        cmd = f'cp -a "{src_path}" "{dst_path}"'
        hsystem.system(cmd)
        copied_count += 1
    #
    _LOG.info("Successfully copied %d files", copied_count)


# #############################################################################


def _parse() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description=__doc__,
        formatter_class=argparse.RawDescriptionHelpFormatter,
    )
    parser.add_argument(
        "--dst_dir",
        action="store",
        required=True,
        help="Destination directory where files will be copied",
    )
    hparser.add_verbosity_arg(parser)
    return parser


def _main(parser: argparse.ArgumentParser) -> None:
    args = parser.parse_args()
    hdbg.init_logger(verbosity=args.log_level, use_exec_path=True)
    # Get the source directory.
    src_dir = _get_source_dir()
    # Copy files to the destination.
    _copy_files(
        src_dir=src_dir,
        dst_dir=args.dst_dir,
        files=_FILES_TO_COPY,
    )


if __name__ == "__main__":
    _main(_parse())
@@ -0,0 +1,40 @@
#!/bin/bash
# """
# Build a Docker container image for the project.
#
# This script sets up the build environment with error handling and command
# tracing, loads Docker configuration from docker_name.sh, and builds the
# Docker image using the build_container_image utility function. It supports
# both single-architecture and multi-architecture builds via the
# DOCKER_BUILD_MULTI_ARCH environment variable.
# """

# Exit immediately if any command exits with a non-zero status.
set -e

# Import the utility functions.
GIT_ROOT=$(git rev-parse --show-toplevel)
source "$GIT_ROOT/class_project/project_template/utils.sh"

# Parse default args (-h, -v) and enable set -x if -v is passed.
# Shift processed option flags so remaining args are passed to the build.
parse_default_args "$@"
shift $((OPTIND - 1))

# Load Docker configuration variables (REPO_NAME, IMAGE_NAME, FULL_IMAGE_NAME).
get_docker_vars_script "${BASH_SOURCE[0]}"
source "$DOCKER_NAME"
print_docker_vars

# Configure Docker build settings.
# Enable BuildKit for improved build performance and features.
export DOCKER_BUILDKIT=1
#export DOCKER_BUILDKIT=0

# Configure single-architecture build (set to 1 for multi-arch build).
#export DOCKER_BUILD_MULTI_ARCH=1
export DOCKER_BUILD_MULTI_ARCH=0

# Build the container image.
# Pass extra arguments (e.g., --no-cache) via command line after -v.
build_container_image "$@"
@@ -0,0 +1 @@
the input device is not a TTY