Raising errors with failed boots #865
Draft
PawelPlesniak wants to merge 8 commits into prep-release/fddaq-v5.6.0 from
The core of the log override issue when booting has been identified: the rich Console that the status table updater uses is not the same one that the logger uses, so the status table updater overwrites the logger's output. This is going to be messy.
Description
Fixes issue #817
If not all applications are alive once boot has completed, raise an error and put the session into an error state.
MAKE THE RICH TABLE NOT OVERRIDE BOOT LOGS.
Type of change
List of required branches from other repositories
N/A
Change log
New configurations
In file config/tests/nestedConfig.data.xml, there are now three new configurations with applications that die at different stages of a session's lifetime. These are templates only, and can serve as a starting point for developing failure mode tests. These sessions are:
test-nested-config-failure-on-init - the fake_daq_application instances that are spawned die before any processing happens.
test-nested-config-failure-post-boot - the fake_daq_application instances that are spawned die after they have been fully initialized.
test-nested-config-failure-failure-cmd - the fake_daq_application instances that are spawned die on the first executed stateful command.
Booting safety
When booting, we now check that the expected number of applications are alive. If the number of booted applications is incorrect, this is logged. The initial plan was to put the session into an error state when applications die, but due to a bug in the k8s PM this blocks k8s operation. The error-state transition has therefore been left as a comment, to be reintroduced later.
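The check described above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: AppStatus, BootFailure, find_dead_apps and check_boot are hypothetical names, and the raise_on_failure flag stands in for the error-state transition that is currently commented out.

```python
import logging
from dataclasses import dataclass

log = logging.getLogger("boot-check")  # hypothetical logger name


class BootFailure(RuntimeError):
    """Hypothetical error raised when applications are missing after boot."""


@dataclass(frozen=True)
class AppStatus:
    """Hypothetical status record: application name and liveness flag."""
    name: str
    alive: bool


def find_dead_apps(expected_names, statuses):
    """Return the sorted names of expected applications that are not alive."""
    alive = {status.name for status in statuses if status.alive}
    return sorted(set(expected_names) - alive)


def check_boot(expected_names, statuses, raise_on_failure=False):
    """Log (and optionally raise) if any expected application is not alive.

    raise_on_failure defaults to False to mirror the behaviour described
    above, where the failure is logged but the session is not put in error.
    """
    dead = find_dead_apps(expected_names, statuses)
    if dead:
        message = (
            f"{len(dead)} of {len(set(expected_names))} applications "
            f"are not alive after boot: {', '.join(dead)}"
        )
        log.error(message)
        if raise_on_failure:
            raise BootFailure(message)
    return dead
```

Keeping the raise behind a flag means the check can be switched to a hard failure later without changing the callers.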
Logging around status table
When rendering the live table, output that an application logs to the tty is now visible rather than being overwritten by the rich table. The underlying issue was multiple instances of rich.Console objects overwriting each other. This will be addressed formally in a separate issue, but the current fix is sufficient for the scope of this release.
Notes
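As an illustration of the logging fix described under "Logging around status table", here is a minimal sketch of sharing one rich.Console between a logging handler and a live table. The function names, logger name and application names are hypothetical and do not come from this PR. It only shows the general technique: when RichHandler and Live are given the same Console, log records are printed above the live display instead of being overwritten by it.

```python
import logging

from rich.console import Console
from rich.live import Live
from rich.logging import RichHandler
from rich.table import Table


def make_logger(console: Console, name: str = "boot-demo") -> logging.Logger:
    """Create a logger whose RichHandler writes to the given Console."""
    logger = logging.getLogger(name)
    logger.handlers.clear()
    logger.addHandler(RichHandler(console=console, show_path=False))
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def build_status_table(statuses: dict) -> Table:
    """Build a small status table from an application -> state mapping."""
    table = Table(title="Application status")
    table.add_column("Application")
    table.add_column("State")
    for app, state in statuses.items():
        table.add_row(app, state)
    return table


def render_boot(console: Console) -> None:
    """Render a live table and emit a log record through the same Console."""
    logger = make_logger(console)
    statuses = {"app-a": "booting", "app-b": "booting"}
    with Live(build_status_table(statuses), console=console, auto_refresh=False) as live:
        # Because the handler and the Live display share one Console,
        # this record is rendered above the table rather than under it.
        logger.error("app-b exited during boot")
        statuses["app-b"] = "dead"
        live.update(build_status_table(statuses), refresh=True)
```

Calling render_boot(Console()) in a terminal shows the error line above the table.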
Suggested manual testing checklist
Standard runs should behave as expected.
There are additional configurations defined that intentionally fail. To use these, run the following commands
The first will fail to boot as
Note, there are additional logs that are suppressed by the rich table rendering, which can be seen in the logs as
Developer checklist
Prior to marking this as "Ready for Review"
Tests ran on: WHAT HOSTNAME from release RELEASE_NAME
Unit tests - some tests can't be run on the CI. This is documented. If this PR checks a feature that can't be tested with CI, this has been marked appropriately.
Integration tests - the daqsystemtest_integtest_bundle requires a lot of resources, and connections to the EHN1 infrastructure. Check the cross-referenced list if you can't run these. The developer needs to run at least the
(pytest --marker) passed
daqsystemtest_integtest_bundle.sh -k minimal_system_quick_test.py
daqsystemtest_integtest_bundle.sh
Final checklist prior to marking this as "Ready for Review"
Reviewer checklist
drunc are in the log files
drunc failure appears:
Once the features are validated and both the unit and integration tests pass, the PR is ready to be merged.
Choose one of the following and complete all substeps
Prior to merging
Once completed, the reviewer can merge the PR.
Notification message for a Slack channel
Note - this should be to #dunedaq-integration for general workflow that isn't during a release candidate period, and to #daq-release-prep otherwise.
For a single merge that changes the user workflow
For co-ordinated merge