Skip to content

Log errors when incorrect commands are used in batch mode#858

Open
emmuhamm wants to merge 4 commits intodevelopfrom
emmuhamm/bad-commands-batch-fix
Open

Log errors when incorrect commands are used in batch mode#858
emmuhamm wants to merge 4 commits intodevelopfrom
emmuhamm/bad-commands-batch-fix

Conversation

@emmuhamm
Copy link
Copy Markdown
Contributor

@emmuhamm emmuhamm commented Mar 23, 2026

Description

Fixes issue #753

Will give feedback in case there is a bad command when inputting as a drunc batch/semibatch mode.

This change adds in a validation check to ensure that the commands parsed by via the batch mode work.

It does it in a few steps. Note the following example with unknown_cmd

  1. Generates a click parser and use that to parse the argv (['boot', 'unknown_cmd', 'start-run', '--run-number', '1', 'wait', '20'])
  2. Using click to obtain the list of available commands that are registered to it, iterate over the supplied list from the command line.
    1. If it identifies a command, mark it as a command and move on to the next one

    2. If it doesn't identify a command:

      1. but the previous iteration was a command, mark it as an argument
      2. Else, raise error
    3. This splits the arguments into a list of command, argument pairs ([('boot', ['unknown_cmd']), ('start-run', ['--run-number', '1']), ('wait', ['20'])])

  3. Using click's make_context, go through each command and argument pair and see if its possible to form a context
  4. If it fails, raise an error (eg. boot unknown_cmd) is not valid command.

This should be robust as it will go through the definitive list of commands as defined from click. It shoulnd't need maintenance as long as people properly tell click of their new commands.

Type of change

  • New feature / enhancement
  • Bug fix

List of required branches from other repositories

None

Suggested manual testing checklist

  1. Use the latest nightly and check out this branch
  2. Dont forget to rebase if necessary
  3. Run drunc-unified-shell ssh-standalone config/daqsystemtest/example-configs.data.xml local-1x1-config test
    1. This should drop you in the unified shell
  4. Run drunc-unified-shell ssh-standalone config/daqsystemtest/example-configs.data.xml local-1x1-config test boot wait 5 conf
    1. This should run the batch mode just fine
  5. Run drunc-unified-shell ssh-standalone config/daqsystemtest/example-configs.data.xml local-1x1-config test unknown_cmd
    1. This should throw an exception (original command unknown)
  6. Run drunc-unified-shell ssh-standalone config/daqsystemtest/example-configs.data.xml local-1x1-config test boot sleep 5 conf
    1. This should throw an exception (sleep is unknown, so treated as an argument to boot boot sleep. Making this context will fail.)
  7. Run drunc-unified-shell ssh-standalone config/daqsystemtest/example-configs.data.xml local-1x1-config test boot wait 5 conf logs --name ru-controller --user emmuhamm
    1. This should run just fine (testing argument parsing)
  8. Run drunc-unified-shell ssh-standalone config/daqsystemtest/example-configs.data.xml local-1x1-config test boot wait 5 conf logs --namEEe ru-controller --user emmuhamm
    1. This should raise an exception, unknown optoin --namEEe
  9. Run kurt's command drunc-unified-shell ssh-standalone config/daqsystemtest/example-configs.data.xml local-1x1-config ${USER}-local-test boot wait 5 conf wait 3 start --run-number 142 enable-triggers wait 10 disable-triggers sleep 3 drain-dataflow stop-trigger-sources stop scrap terminate
    1. This should throw an exception (you should be able to tell what it is)
  10. Try fixing the above yourself, and see it pass ;)

Developer checklist

Prior to marking this as "Ready for Review"

Tests ran on: np04-srv-028 from release NFD_DEV_260323_A9

Unit tests - some tests can't be ran on the CI. This is documented. If this PR checks a feature that can't be tested with CI, this has been marked appropriately.

Integration tests - the daqsystemtest_integtest_bundle requires a lot of resources, and connections to the EHN1 infrastructure. Check the cross referenced list if you can't run these. The developer needs to run at least the .

  • Unit tests (pytest --marker) passed
    • With relevant marker
    • Without marker
  • Integration tests passed
    • Only daqsystemtest_integtest_bundle.sh -k minimal_system_quick_test.py
    • Full daqsystemtest_integtest_bundle.sh
  • Testing skipped as there are no core code changes in this PR, this only relates to documentation/CI workflows

Final checklist prior to marking this as "Ready for Review"

  • Code is clearly commented.
  • New unit tests have been added, or is documented in # ISSUE NUMBER
  • A suitable reviewer has been chosen from this list.

Reviewer checklist

  • This branch has been rebased with develop prior to testing.
  • Suggested manual tests show changes.
  • CI workflows fails documented (if present)
  • Integration tests passed
    • Only concern yourself if failures related to drunc are in the log files
    • If non-drunc failure appears:
      • Validate failure in fresh working area
      • Contact Pawel if unsure

Once the features are validated and both the unit and integration tests pass, the PRs is ready to be merged.

Prior to merging

Choose one of the following an complete all substeps
  • Changes only affect the Run Control, are in a single repository, and do not affect the end user.
    • Changes are documented in docstrings and code comments
    • Wiki has been updated if architectural or endpoint changes
  • Otherwise
    • Workflow changes demonstrated in the Change Log (if necessary)
    • Wiki has been updated (if necessary)
    • #daq-sw-librarians Slack channel notified (see below)

Once completed, the reviewer can merge the PR.

Notification message for #daq-sw-librarians Slack channel

For an single merge that changes the user workflow

The CCM WG has an isolated PR ready to merge that affects user workflows. The PR is:

_URL_

I will leave time for any comments, otherwise will merge these at the end of the work day _Insert your time zone_.

For co-ordinated merge

The CCM WG has a set of co-ordinated merges ready to merge. The PRs are:

_URL_

_URL_


I will leave time for any comments, otherwise will merge these at the end of the day.

@emmuhamm emmuhamm changed the title [TO FIXUP] THIS WORKS Log errors when incorrect commands are used in batch mode Mar 23, 2026
@emmuhamm emmuhamm self-assigned this Mar 23, 2026
@emmuhamm emmuhamm added the enhancement New feature or request label Mar 23, 2026
@emmuhamm emmuhamm force-pushed the emmuhamm/bad-commands-batch-fix branch from 93f48b0 to c60cf99 Compare March 24, 2026 10:40
@emmuhamm emmuhamm requested a review from PawelPlesniak March 24, 2026 11:23
@emmuhamm emmuhamm marked this pull request as ready for review March 24, 2026 11:24
@emmuhamm
Copy link
Copy Markdown
Contributor Author

Hey @PawelPlesniak, this is ready for review :)

@PawelPlesniak
Copy link
Copy Markdown
Collaborator

This is generally very nice. One small change I would make is in the comments. Otherwise nice work!

except click.UsageError as e:
raise DruncBatchShellArgError(e) from e

parser = ctx.command.make_parser(ctx)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This and the following 5 lines should be checkable simply by checking the UnifiedShellMode - if not interactive, this will suffice.

@PawelPlesniak
Copy link
Copy Markdown
Collaborator

image This has been validated to work as expected. Running the full integration test suite now.

@PawelPlesniak
Copy link
Copy Markdown
Collaborator

on np04-srv-029

======================== 18 passed in 440.64s (0:07:20) ========================


+++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++ SUMMARY ++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++

Fri Mar 27 07:35:46 PM CET 2026
Log file is: /tmp/pytest-of-pplesnia/dunedaq_integtest_bundle_20260327183821.log

⮕ Running daqsystemtest/3ru_1df_multirun_test.py ⬅
=================== 2 failed ❌, 4 passed ✅ in 379.11s (0:06:19) ====================
⮕ Running daqsystemtest/3ru_3df_multirun_test.py ⬅
======================== 6 passed ✅ in 342.23s (0:05:42) =========================
⮕ Running daqsystemtest/example_system_test.py ⬅
======================== 12 passed ✅ in 365.43s (0:06:05) ========================
⮕ Running daqsystemtest/fake_data_producer_test.py ⬅
======================== 6 passed ✅ in 299.55s (0:04:59) =========================
⮕ Running daqsystemtest/long_window_readout_test.py ⬅
============================== 1 skipped 🟡 in 1.71s ==============================
⮕ Running daqsystemtest/minimal_system_quick_test.py ⬅
========================= 4 passed ✅ in 79.46s (0:01:19) =========================
⮕ Running daqsystemtest/readout_type_scan_test.py ⬅
======================== 33 passed ✅ in 883.55s (0:14:43) ========================
⮕ Running daqsystemtest/sample_ehn1_multihost_test.py ⬅
======================== 4 skipped 🟡 in 84.71s (0:01:24) =========================
⮕ Running daqsystemtest/small_footprint_quick_test.py ⬅
========================= 3 passed ✅ in 80.90s (0:01:20) =========================
⮕ Running daqsystemtest/tpg_state_collection_test.py ⬅
======================== 5 passed ✅ in 146.34s (0:02:26) =========================
⮕ Running daqsystemtest/tpreplay_test.py ⬅
======================== 6 passed ✅ in 176.35s (0:02:56) =========================
⮕ Running daqsystemtest/tpstream_writing_test.py ⬅
======================== 4 passed ✅ in 146.25s (0:02:26) =========================
⮕ Running daqsystemtest/trigger_bitwords_test.py ⬅
======================== 18 passed ✅ in 440.64s (0:07:20) ========================

@PawelPlesniak
Copy link
Copy Markdown
Collaborator

I will make the suggested changes later today and get this merged, nice work!

@PawelPlesniak
Copy link
Copy Markdown
Collaborator

I added a small change to update the logic that selects tthe UnifiedShellMode. I think this command needs a little more work - I ran the suggested manual tests and saw the following when running:

drunc-unified-shell ssh-standalone config/daqsystemtest/example-configs.data.xml local-1x1-config test boot sleep 5 conf
[2026/03/29 12:09:10 UTC] INFO       shell.py:180                             drunc.unified_shell                                Setting up to use the process manager with configuration ssh-standalone and configuration id "local-1x1-config" from oksconflibs:config/daqsystemtest/example-configs.data.xml
[2026/03/29 12:09:10 UTC] INFO       shell.py:202                             drunc.unified_shell                                Starting process manager
[2026/03/29 12:09:10 UTC] INFO       process_manager.py:109                   drunc.process_manager                              process_manager communicating through address 10.73.136.71:42475
Extracted batch commands: ['boot', 'sleep', '5', 'conf']
Split batch commands: [('boot', ['sleep', '5']), ('conf', [])]
[2026/03/29 12:09:10 UTC] ERROR      unified_shell.py:15                      drunc.unified_shell                                🔥🔥 Exception thrown 🔥🔥
[2026/03/29 12:09:10 UTC] ERROR      unified_shell.py:16                      drunc.unified_shell                                Caught exception on bad arguments for the batch shell: Got unexpected extra arguments (sleep 5)

This implementation is slightly unclear - when looking at the extra arguments, the way that you have filtered for the arguments (by looking at the chain_args and with the arg not in command_names, this fails to parse when an argument is not in the command list. This, however, is flawed, as it finds sleep as an argument of boot, which is not the case. A better implementation would be to look at the arguments of each command, and validate that the next passed batch mode arguent is neitherr a batch command nor an argument of the current command.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature]: it would be helpful to get more feedback about bad commands in drunc batch mode

3 participants