Skip to content

MLFlow: Log fetcher , Context fetcher#4

Open
EoghanOConnor wants to merge 2 commits into
redhat-et:mainfrom
EoghanOConnor:mlflow-log-and-context-fetcher
Open

MLFlow: Log fetcher , Context fetcher#4
EoghanOConnor wants to merge 2 commits into
redhat-et:mainfrom
EoghanOConnor:mlflow-log-and-context-fetcher

Conversation

@EoghanOConnor

@EoghanOConnor EoghanOConnor commented Mar 19, 2026

Copy link
Copy Markdown
Collaborator

Session-start.sh: automatically sets env vars, installs and starts mlflow
Log-fetcher: mlflow decorators added.
context-fetcher: Python script added to capture details in a manner similar to other skills. Makes it easier to track.

Summary by CodeRabbit

  • New Features

    • Integrated MLflow tracing for context-fetcher and logs-fetcher skills, enabling visibility into execution flows and performance metrics via the MLflow UI.
  • Documentation

    • Added comprehensive MLflow setup guide to README, including environment configuration and hook script registration.
    • Updated skill documentation to reflect MLflow tracing capabilities.

@coderabbitai

coderabbitai Bot commented Mar 19, 2026

Copy link
Copy Markdown
📝 Walkthrough

Walkthrough

The pull request integrates MLflow tracing across multiple skill modules by adding environment variable exports in a session-start hook, updating documentation, and instrumenting Python scripts with MLflow span decorators and return type modifications to capture structured result payloads.

Changes

Cohort / File(s) Summary
Session initialization and configuration
.claude/hooks/session-start.sh, README.md
Added MLflow environment variable exports (MLFLOW_TAG_USER, MLFLOW_TRACKING_URI, MLFLOW_EXPERIMENT_NAME, MLFLOW_TRACING_ENABLED) to session-start hook and documented the hook setup process plus a new MLflow tracing guide with environment configuration steps.
Context fetcher skill
skills/context-fetcher/SKILL.md, skills/context-fetcher/mlflow-context.py
Updated skill documentation to replace feedback-capture step with MLflow logging via a new mlflow-context.py script that logs context search spans with query, sources, and results metadata.
Logs fetcher scripts
skills/logs-fetcher/scripts/fetch_logs_by_job.py, skills/logs-fetcher/scripts/fetch_logs_ssh.py
Added MLflow span decorators (@mlflow.trace) to functions, changed return types from None to structured dict payloads containing status and metadata, and updated trace context with inputs/outputs and environment-based metadata.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title mentions "Log fetcher, Context fetcher" but omits the session-start.sh updates, which constitute a significant part of the changes according to the raw summary. Consider a more comprehensive title like 'MLflow: Add tracing for log-fetcher, context-fetcher, and session initialization' to better reflect all major changes.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
skills/logs-fetcher/scripts/fetch_logs_ssh.py (1)

251-260: ⚠️ Potential issue | 🔴 Critical

Don't return a dict from main() while the module still does sys.exit(main()).

sys.exit treats any non-integer object as an error, prints it, and exits with status 1. After this change, a successful fetch will look like a failed CLI invocation to callers and automation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@skills/logs-fetcher/scripts/fetch_logs_ssh.py` around lines 251 - 260, The
CLI's main() currently returns a dict (the result of run_sync) while the module
calls sys.exit(main()), which causes Python to treat the dict as an error and
exit with status 1; change main() (the function that calls run_sync and
currently returns result) to return an integer exit code (0 on success, non-zero
on failure) instead of the dict, and move the dict result to be printed or saved
inside main() (or return None and have the if __name__ == "__main__" block call
run_sync and handle printing then call sys.exit(0)); update references in main()
and the top-level sys.exit(main()) call accordingly so callers/automation see
correct exit codes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.claude/hooks/session-start.sh:
- Around line 7-10: The session-start.sh snippet appends empty MLflow env vars
unconditionally and uses the wrong tracing var name; update it to only write
MLflow exports into CLAUDE_ENV_FILE when MLFLOW_PORT (or another required MLflow
var) is set/non-empty, and rename/export the tracing flag as
MLFLOW_CLAUDE_TRACING_ENABLED instead of MLFLOW_TRACING_ENABLED so downstream
code like setup.py can read it; specifically, guard the echo lines for
MLFLOW_TAG_USER, MLFLOW_TRACKING_URI, MLFLOW_EXPERIMENT_NAME, and the tracing
flag with an if [ -n "$MLFLOW_PORT" ] (or equivalent) check and change the
exported variable name to MLFLOW_CLAUDE_TRACING_ENABLED.

In `@README.md`:
- Around line 272-381: The README.md contains a duplicated "## Creating a New
Skill" section (the MLFlow guide block) which breaks rendering due to nested
triple-backtick fences; remove the entire duplicated MLflow block starting at
the second "## Creating a New Skill" and either merge its content into the
existing skill creation section or move MLflow setup into a new, single
subsections (e.g., "MLFlow Tracing Setup") within the original "Creating a New
Skill" section, ensuring you remove the inner "```markdown" wrapper so only
proper fenced code blocks remain and headers like "## Step 1: Install
Dependencies" are integrated once.

In `@skills/context-fetcher/SKILL.md`:
- Around line 26-39: The README/step text is inconsistent: it references running
"scripts/mlflow_context.py" but the PR added
"skills/context-fetcher/mlflow-context.py" and still lists "feedback-capture" as
the terminal step; update the final workflow step so it's consistent and
runnable by changing the command to the actual script path and ensure the
terminal step matches the skill flow (replace "python scripts/mlflow_context.py"
with "python skills/context-fetcher/mlflow-context.py" and update the ending
step from "feedback-capture" to the correct terminal step used by the skill,
verifying any flags (--query, --sources, --job-id, --incident-id,
--results-summary) match mlflow-context.py's CLI signatures).

In `@skills/logs-fetcher/scripts/fetch_logs_by_job.py`:
- Around line 154-157: The CLI currently returns the structured result dict from
main(), which makes sys.exit(main()) treat success as nonzero; change main() so
it calls fetch_job_logs(...) to get the structured data but returns an integer
exit code (0 on success, non-zero on failure) instead of returning that
dict—preserve the existing fetch_job_logs(...) return value for imports/tests
but ensure main() maps that outcome to an int before sys.exit(main()); update
any error paths in main() to return non-zero codes accordingly.

In `@skills/logs-fetcher/scripts/fetch_logs_ssh.py`:
- Around line 12-20: The module unconditionally imports mlflow/SpanType and uses
`@mlflow.trace` which breaks when mlflow is optional; wrap the import in a
try/except and provide a lightweight fallback (e.g., a no-op trace decorator and
a stub SpanType) so `@mlflow.trace` usages (the decorator above Parse datetime
string and any other decorated functions) continue to work when mlflow is not
installed; alternatively move mlflow to hard dependencies in pyproject.toml.
Also fix the process exit behavior: change main() so it does not pass the entire
dict returned by run_sync() to sys.exit(); instead have main() return None or
extract an explicit integer status code from run_sync() (e.g.,
result.get("exit_code", 0)) and pass that integer to sys.exit(), referencing the
run_sync() call and the sys.exit(...) call in main().

---

Outside diff comments:
In `@skills/logs-fetcher/scripts/fetch_logs_ssh.py`:
- Around line 251-260: The CLI's main() currently returns a dict (the result of
run_sync) while the module calls sys.exit(main()), which causes Python to treat
the dict as an error and exit with status 1; change main() (the function that
calls run_sync and currently returns result) to return an integer exit code (0
on success, non-zero on failure) instead of the dict, and move the dict result
to be printed or saved inside main() (or return None and have the if __name__ ==
"__main__" block call run_sync and handle printing then call sys.exit(0));
update references in main() and the top-level sys.exit(main()) call accordingly
so callers/automation see correct exit codes.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 5aef9ec3-8c8d-4164-86c2-5e2ec3607c93

📥 Commits

Reviewing files that changed from the base of the PR and between 1454f36 and 5b76653.

📒 Files selected for processing (6)
  • .claude/hooks/session-start.sh
  • README.md
  • skills/context-fetcher/SKILL.md
  • skills/context-fetcher/mlflow-context.py
  • skills/logs-fetcher/scripts/fetch_logs_by_job.py
  • skills/logs-fetcher/scripts/fetch_logs_ssh.py

Comment on lines +7 to +10
echo "export MLFLOW_TAG_USER='$MLFLOW_TAG_USER'" >> "$CLAUDE_ENV_FILE"
echo "export MLFLOW_TRACKING_URI='http://127.0.0.1:$MLFLOW_PORT'" >> "$CLAUDE_ENV_FILE"
echo "export MLFLOW_EXPERIMENT_NAME='$MLFLOW_EXPERIMENT_NAME'" >> "$CLAUDE_ENV_FILE"
echo "export MLFLOW_TRACING_ENABLED='$MLFLOW_TRACING_ENABLED'" >> "$CLAUDE_ENV_FILE"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Only persist MLflow env vars when they are valid, and use the same tracing flag name downstream expects.

This block runs before the MLFLOW_PORT check, so sessions without tracing configured append MLFLOW_TRACKING_URI='http://127.0.0.1:' and other empty MLflow values into CLAUDE_ENV_FILE. It also exports MLFLOW_TRACING_ENABLED, while skills/root-cause-analysis/scripts/setup.py reads MLFLOW_CLAUDE_TRACING_ENABLED, so preflight can still report tracing as disabled.

Suggested fix
 if [ -n "$CLAUDE_ENV_FILE" ]; then
   echo "export CLAUDE_SESSION_ID='$SESSION_ID'" >> "$CLAUDE_ENV_FILE"
-  echo "export MLFLOW_TAG_USER='$MLFLOW_TAG_USER'" >> "$CLAUDE_ENV_FILE"
-  echo "export MLFLOW_TRACKING_URI='http://127.0.0.1:$MLFLOW_PORT'" >> "$CLAUDE_ENV_FILE"
-  echo "export MLFLOW_EXPERIMENT_NAME='$MLFLOW_EXPERIMENT_NAME'" >> "$CLAUDE_ENV_FILE"
-  echo "export MLFLOW_TRACING_ENABLED='$MLFLOW_TRACING_ENABLED'" >> "$CLAUDE_ENV_FILE"
 fi
+
+if [ -n "$MLFLOW_PORT" ] && [ -n "$CLAUDE_ENV_FILE" ]; then
+  echo "export MLFLOW_TAG_USER='$MLFLOW_TAG_USER'" >> "$CLAUDE_ENV_FILE"
+  echo "export MLFLOW_TRACKING_URI='http://127.0.0.1:$MLFLOW_PORT'" >> "$CLAUDE_ENV_FILE"
+  echo "export MLFLOW_EXPERIMENT_NAME='$MLFLOW_EXPERIMENT_NAME'" >> "$CLAUDE_ENV_FILE"
+  echo "export MLFLOW_CLAUDE_TRACING_ENABLED='$MLFLOW_CLAUDE_TRACING_ENABLED'" >> "$CLAUDE_ENV_FILE"
+fi
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
echo "export MLFLOW_TAG_USER='$MLFLOW_TAG_USER'" >> "$CLAUDE_ENV_FILE"
echo "export MLFLOW_TRACKING_URI='http://127.0.0.1:$MLFLOW_PORT'" >> "$CLAUDE_ENV_FILE"
echo "export MLFLOW_EXPERIMENT_NAME='$MLFLOW_EXPERIMENT_NAME'" >> "$CLAUDE_ENV_FILE"
echo "export MLFLOW_TRACING_ENABLED='$MLFLOW_TRACING_ENABLED'" >> "$CLAUDE_ENV_FILE"
if [ -n "$CLAUDE_ENV_FILE" ]; then
echo "export CLAUDE_SESSION_ID='$SESSION_ID'" >> "$CLAUDE_ENV_FILE"
fi
if [ -n "$MLFLOW_PORT" ] && [ -n "$CLAUDE_ENV_FILE" ]; then
echo "export MLFLOW_TAG_USER='$MLFLOW_TAG_USER'" >> "$CLAUDE_ENV_FILE"
echo "export MLFLOW_TRACKING_URI='http://127.0.0.1:$MLFLOW_PORT'" >> "$CLAUDE_ENV_FILE"
echo "export MLFLOW_EXPERIMENT_NAME='$MLFLOW_EXPERIMENT_NAME'" >> "$CLAUDE_ENV_FILE"
echo "export MLFLOW_CLAUDE_TRACING_ENABLED='$MLFLOW_CLAUDE_TRACING_ENABLED'" >> "$CLAUDE_ENV_FILE"
fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/hooks/session-start.sh around lines 7 - 10, The session-start.sh
snippet appends empty MLflow env vars unconditionally and uses the wrong tracing
var name; update it to only write MLflow exports into CLAUDE_ENV_FILE when
MLFLOW_PORT (or another required MLflow var) is set/non-empty, and rename/export
the tracing flag as MLFLOW_CLAUDE_TRACING_ENABLED instead of
MLFLOW_TRACING_ENABLED so downstream code like setup.py can read it;
specifically, guard the echo lines for MLFLOW_TAG_USER, MLFLOW_TRACKING_URI,
MLFLOW_EXPERIMENT_NAME, and the tracing flag with an if [ -n "$MLFLOW_PORT" ]
(or equivalent) check and change the exported variable name to
MLFLOW_CLAUDE_TRACING_ENABLED.

Comment thread README.md
Comment on lines +272 to +381
## Creating a New Skill

1. Create a directory with your skill name (lowercase, hyphen-separated)
2. Add a `SKILL.md` file:

```markdown
---
name: my-skill
description: Brief description of what this skill does
allowed-tools:
- Bash
- Read
---

# MLFlow Tracing Setup Guide for Claude Code

## Step 1: Install Dependencies

1. Create and activate a virtual environment:

```bash
python3 -m venv .venv
source .venv/bin/activate
```

2. Install the package and MLFlow:

```bash
pip install -e .
pip install mlflow
```

3. Enable MLFlow autologging for Claude:

```bash
mlflow autolog claude
```

## Step 2: Configure Claude Settings

Add the following environment variables to your Claude settings file at `~/.claude/settings.json`:

```json
{
"env": {
"JUMPBOX_URI": "<your-username>@<your-jumpbox> -p <port>",
"MLFLOW_CLAUDE_TRACING_ENABLED": "true",
"MLFLOW_PORT":"<set localhost port>",
"MLFLOW_EXPERIMENT_NAME": "<experiment-name>"
}
}
```

**Note**: Replace `<your-username>@<your-jumpbox> -p <port>` with your actual jumpbox connection details.
Replace `<experiment-name>` to a experiment name to the desired experiment name. This will automatically create the experiment for you if it does not exist.

## Step 3: Add hooks SessionStart to hooks in .claude/settings.json
```json
{
"hooks": {
"SessionStart": [
{
"matcher": "*",
"hooks": [
{
"type": "command",
"command": "./.claude/hooks/session-start.sh"
}
]
}
]
}
}
```

Make the script executable:
```bash
chmod +x ./.claude/hooks/session-start.sh
```

## Step 4: cd into AIOPS-SKILLS/ dir (where claude will be running)

## Step 5: Enable Claude Autologging

Before starting Claude, run:

```bash
mlflow autolog claude
```

## Step 6: Start Claude

```bash
claude
```

## Step 7: Run a Prompt

Enter any prompt in Claude to generate a trace.

## Step 8: View Traces
Open your browser and navigate to:
http://localhost:5000
Your MLFlow dashboard will display your trace along with any previous traces.
# My Skill

Instructions for Claude...
```

See [template-skill](./template-skill/) for a minimal example and [agent_skills_spec.md](./agent_skills_spec.md) for the full specification.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Remove this duplicated MLflow guide block.

This re-adds a second ## Creating a New Skill section and nests triple-backtick fences inside a ```markdown block, so the README starts rendering incorrectly from Line 277 onward. Merge the MLflow instructions into the existing section instead of copying the whole block.

🧰 Tools
🪛 markdownlint-cli2 (0.22.0)

[warning] 272-272: Multiple headings with the same content

(MD024, no-duplicate-heading)


[warning] 379-379: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@README.md` around lines 272 - 381, The README.md contains a duplicated "##
Creating a New Skill" section (the MLFlow guide block) which breaks rendering
due to nested triple-backtick fences; remove the entire duplicated MLflow block
starting at the second "## Creating a New Skill" and either merge its content
into the existing skill creation section or move MLflow setup into a new, single
subsections (e.g., "MLFlow Tracing Setup") within the original "Creating a New
Skill" section, ensuring you remove the inner "```markdown" wrapper so only
proper fenced code blocks remain and headers like "## Step 1: Install
Dependencies" are integrated once.

Comment on lines +26 to +39
Step 6 [Claude] Log search to MLflow (run scripts/mlflow_context.py)
```

## MLflow Tracing

After synthesizing findings, you MUST log the search to MLflow:

```bash
python scripts/mlflow_context.py \
--query "{search keywords used}" \
--sources "{comma-separated: github,confluence,slack}" \
--job-id "{job ID if applicable}" \
--incident-id "{incident ID if applicable}" \
--results-summary "{brief summary of what was found}"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Keep the final workflow step consistent and executable.

These new instructions tell Claude to run scripts/mlflow_context.py, but the file added in this PR is skills/context-fetcher/mlflow-context.py, and Line 3 still says to finish with feedback-capture. As written, the skill has conflicting terminal steps and a command path that will fail at runtime.

Also applies to: 98-98, 122-123

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@skills/context-fetcher/SKILL.md` around lines 26 - 39, The README/step text
is inconsistent: it references running "scripts/mlflow_context.py" but the PR
added "skills/context-fetcher/mlflow-context.py" and still lists
"feedback-capture" as the terminal step; update the final workflow step so it's
consistent and runnable by changing the command to the actual script path and
ensure the terminal step matches the skill flow (replace "python
scripts/mlflow_context.py" with "python
skills/context-fetcher/mlflow-context.py" and update the ending step from
"feedback-capture" to the correct terminal step used by the skill, verifying any
flags (--query, --sources, --job-id, --incident-id, --results-summary) match
mlflow-context.py's CLI signatures).

Comment on lines +154 to 157
return fetch_job_logs(
job_numbers=args.job_numbers,
local_dir=args.local_dir,
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

This new return path breaks the CLI exit status.

Line 161 still does sys.exit(main()), so returning the result dict here makes the process exit with status 1 on success. If you want to keep returning structured data for imports/tests, stop passing main() into sys.exit; otherwise return an integer status code.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@skills/logs-fetcher/scripts/fetch_logs_by_job.py` around lines 154 - 157, The
CLI currently returns the structured result dict from main(), which makes
sys.exit(main()) treat success as nonzero; change main() so it calls
fetch_job_logs(...) to get the structured data but returns an integer exit code
(0 on success, non-zero on failure) instead of returning that dict—preserve the
existing fetch_job_logs(...) return value for imports/tests but ensure main()
maps that outcome to an int before sys.exit(main()); update any error paths in
main() to return non-zero codes accordingly.

Comment on lines +12 to +20
import mlflow
from mlflow.entities import SpanType

REMOTE_HOST = os.environ.get("REMOTE_HOST")
REMOTE_DIR = os.environ.get("REMOTE_DIR")
DEFAULT_LOCAL_DIR = Path(os.environ.get("DEFAULT_LOCAL_DIR", Path.home() / "aiops_extracted_logs"))


@mlflow.trace(name="Parse datetime string", span_type=SpanType.PARSER)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

fd -HI 'pyproject.toml|setup.py|setup.cfg|requirements.*|Pipfile|Pipfile.lock|poetry.lock' . | while read -r f; do
  echo "== $f =="
  rg -n '\bmlflow\b' "$f" || true
done

Repository: redhat-et/rhdp-rca-plugin

Length of output: 589


🏁 Script executed:

# Check if logs-fetcher has its own requirements file
find . -path "*/logs-fetcher/*" -type f \( -name "requirements*.txt" -o -name "setup.py" -o -name "setup.cfg" \)

Repository: redhat-et/rhdp-rca-plugin

Length of output: 51


🏁 Script executed:

# Check the README mentioned in the review for MLflow optionality
fd -HI 'README*' . --path "*/logs-fetcher/*"

Repository: redhat-et/rhdp-rca-plugin

Length of output: 305


🏁 Script executed:

# Examine the actual imports in the file to understand the issue
head -30 skills/logs-fetcher/scripts/fetch_logs_ssh.py | cat -n

Repository: redhat-et/rhdp-rca-plugin

Length of output: 1038


🏁 Script executed:

# Check the context around line 12-20 and also around line 251-260
sed -n '1,30p' skills/logs-fetcher/scripts/fetch_logs_ssh.py | cat -n

Repository: redhat-et/rhdp-rca-plugin

Length of output: 1038


🏁 Script executed:

# Find all files in logs-fetcher directory
find skills/logs-fetcher -type f | head -20

Repository: redhat-et/rhdp-rca-plugin

Length of output: 192


🏁 Script executed:

# Check for README in logs-fetcher
cat skills/logs-fetcher/README.md 2>/dev/null || echo "No README in logs-fetcher"

Repository: redhat-et/rhdp-rca-plugin

Length of output: 94


🏁 Script executed:

# Check if logs-fetcher has its own requirements
cat skills/logs-fetcher/requirements.txt 2>/dev/null || echo "No requirements.txt found"

Repository: redhat-et/rhdp-rca-plugin

Length of output: 94


🏁 Script executed:

# Search for the main() function and sys.exit usage mentioned in the scratchpad
sed -n '240,270p' skills/logs-fetcher/scripts/fetch_logs_ssh.py | cat -n

Repository: redhat-et/rhdp-rca-plugin

Length of output: 921


🏁 Script executed:

# Check the pyproject.toml to see the full mlflow dependency declaration
cat pyproject.toml | grep -A 10 'mlflow ='

Repository: redhat-et/rhdp-rca-plugin

Length of output: 277


🏁 Script executed:

# Read the SKILL.md for context on MLflow optionality
cat skills/logs-fetcher/SKILL.md

Repository: redhat-et/rhdp-rca-plugin

Length of output: 5753


🏁 Script executed:

# Check the run_sync function to understand what it returns
sed -n '100,250p' skills/logs-fetcher/scripts/fetch_logs_ssh.py | grep -A 30 "def run_sync"

Repository: redhat-et/rhdp-rca-plugin

Length of output: 1086


🏁 Script executed:

# Look at the complete context around main() to understand the sys.exit issue
sed -n '230,270p' skills/logs-fetcher/scripts/fetch_logs_ssh.py | cat -n

Repository: redhat-et/rhdp-rca-plugin

Length of output: 1350


🏁 Script executed:

# Check if mlflow was recently added by looking at the git history of pyproject.toml
git log --oneline -20 pyproject.toml | head -10

Repository: redhat-et/rhdp-rca-plugin

Length of output: 132


🏁 Script executed:

# Check if there are any try/except blocks around mlflow imports in the file
sed -n '1,50p' skills/logs-fetcher/scripts/fetch_logs_ssh.py | cat -n

Repository: redhat-et/rhdp-rca-plugin

Length of output: 1781


🏁 Script executed:

# Check if there are environment variables that control mlflow behavior
rg -n 'MLFLOW|mlflow' skills/logs-fetcher/scripts/fetch_logs_ssh.py | head -20

Repository: redhat-et/rhdp-rca-plugin

Length of output: 835


🏁 Script executed:

# Check the pyproject.toml to confirm mlflow's dependency status
sed -n '20,45p' pyproject.toml | cat -n

Repository: redhat-et/rhdp-rca-plugin

Length of output: 824


🏁 Script executed:

# Check what run_sync actually returns
sed -n '75,145p' skills/logs-fetcher/scripts/fetch_logs_ssh.py | cat -n

Repository: redhat-et/rhdp-rca-plugin

Length of output: 3100


🏁 Script executed:

# Check if there's more to the run_sync function return
rg -A 5 'return' skills/logs-fetcher/scripts/fetch_logs_ssh.py | grep -A 5 "run_sync"

Repository: redhat-et/rhdp-rca-plugin

Length of output: 185


Make mlflow imports conditional or add it as a hard dependency.

Since mlflow is declared as an optional dependency in pyproject.toml but imported unconditionally at the module level (line 12-13), the script breaks with ModuleNotFoundError for users who don't install the mlflow[genai] extras. Either:

  • Move mlflow from optional-dependencies to the main dependencies list in pyproject.toml
  • Guard the mlflow imports and decorators behind optional imports

Additionally, run_sync() returns a dict (line 260+), but main() passes this result to sys.exit() (line 264), causing the script to exit with code 1 (non-zero) on successful runs. Return None from main() or use a status code from the result dict.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@skills/logs-fetcher/scripts/fetch_logs_ssh.py` around lines 12 - 20, The
module unconditionally imports mlflow/SpanType and uses `@mlflow.trace` which
breaks when mlflow is optional; wrap the import in a try/except and provide a
lightweight fallback (e.g., a no-op trace decorator and a stub SpanType) so
`@mlflow.trace` usages (the decorator above Parse datetime string and any other
decorated functions) continue to work when mlflow is not installed;
alternatively move mlflow to hard dependencies in pyproject.toml. Also fix the
process exit behavior: change main() so it does not pass the entire dict
returned by run_sync() to sys.exit(); instead have main() return None or extract
an explicit integer status code from run_sync() (e.g., result.get("exit_code",
0)) and pass that integer to sys.exit(), referencing the run_sync() call and the
sys.exit(...) call in main().

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant