-
Notifications
You must be signed in to change notification settings - Fork 91
updated the logging logic when running retention delete with archivin… #642
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR aims to fix race condition errors in the retention policy deletion process where items deleted between query and delete operations were logging false errors. The changes add specific handling for CosmosResourceNotFoundError when deleting aged conversations and documents.
Changes:
- Added
CosmosResourceNotFoundErrorexception handling indelete_aged_conversations()anddelete_aged_documents()functions - Updated version to 0.236.012
- Added functional test to verify exception handling exists in source code
- Added comprehensive documentation describing the fix
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| application/single_app/functions_retention_policy.py | Added try-except blocks for CosmosResourceNotFoundError in conversation and document deletion loops |
| application/single_app/config.py | Updated VERSION to 0.236.012 |
| functional_tests/test_retention_policy_notfound_handling.py | Added test to verify exception handling code exists via source inspection |
| docs/explanation/fixes/v0.236.012/RETENTION_POLICY_NOTFOUND_FIX.md | Added documentation describing the issue, fix, and benefits |
| except CosmosResourceNotFoundError: | ||
| # Document was already deleted (race condition) - this is fine | ||
| debug_print(f"Document {document_id} already deleted (not found)") |
Copilot
AI
Jan 24, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The except clause for CosmosResourceNotFoundError will never catch the intended exception because the delete_document function (defined in functions_documents.py at line 2291-2292) catches CosmosResourceNotFoundError and re-raises it as a generic Exception with the message "Document not found". This means the race condition handling described in the PR will not work as intended - when a document is already deleted, a generic Exception will be raised instead, which will be caught by the Exception handler at line 789 and logged as an error rather than being handled gracefully. To fix this, either delete_document should re-raise CosmosResourceNotFoundError instead of converting it to a generic Exception, or this code should catch Exception and check the message for "Document not found".
| except CosmosResourceNotFoundError: | |
| # Document was already deleted (race condition) - this is fine | |
| debug_print(f"Document {document_id} already deleted (not found)") | |
| except Exception as doc_error: | |
| # Handle race condition where delete_document wraps CosmosResourceNotFoundError | |
| message = str(doc_error) | |
| if "Document not found" in message: | |
| # Document was already deleted (race condition) - this is fine | |
| debug_print(f"Document {document_id} already deleted (not found)") | |
| else: | |
| # Re-raise unexpected exceptions to be handled by outer logic | |
| raise |
| except CosmosResourceNotFoundError: | ||
| # Document was already deleted - count as success | ||
| doc_id = doc.get('id', 'unknown') if doc else 'unknown' | ||
| debug_print(f"Document {doc_id} already deleted (not found)") | ||
| deleted_details.append({ | ||
| 'id': doc_id, | ||
| 'file_name': doc.get('file_name', 'Unknown'), | ||
| 'title': doc.get('title', doc.get('file_name', 'Unknown')), | ||
| 'last_updated': doc.get('last_updated'), | ||
| 'already_deleted': True | ||
| }) |
Copilot
AI
Jan 24, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The outer try-except for CosmosResourceNotFoundError provides incomplete protection. While this handler could catch CosmosResourceNotFoundError from operations that directly interact with Cosmos DB within this loop iteration (like querying messages), it won't catch the exception from delete_document since that function converts CosmosResourceNotFoundError to a generic Exception. The handler also won't catch CosmosResourceNotFoundError from delete_document_chunks since that function uses Azure Search, not Cosmos DB. This creates a misleading impression that the race condition is being handled when it may not be in all cases.
| def test_retention_policy_function_definitions(): | ||
| """Test that retention policy functions have proper exception handling.""" | ||
| print("\n🔍 Testing retention policy function definitions...") | ||
|
|
||
| try: | ||
| import inspect | ||
| from functions_retention_policy import delete_aged_conversations, delete_aged_documents | ||
|
|
||
| # Get source code of delete_aged_conversations | ||
| conversations_source = inspect.getsource(delete_aged_conversations) | ||
|
|
||
| # Check for CosmosResourceNotFoundError handling in conversations function | ||
| if 'CosmosResourceNotFoundError' in conversations_source: | ||
| print("✅ delete_aged_conversations handles CosmosResourceNotFoundError") | ||
| else: | ||
| print("❌ delete_aged_conversations does not handle CosmosResourceNotFoundError") | ||
| return False | ||
|
|
||
| # Check for 'already deleted' debug message pattern | ||
| if 'already deleted' in conversations_source: | ||
| print("✅ delete_aged_conversations has 'already deleted' debug messaging") | ||
| else: | ||
| print("❌ delete_aged_conversations missing 'already deleted' debug messaging") | ||
| return False | ||
|
|
||
| # Get source code of delete_aged_documents | ||
| documents_source = inspect.getsource(delete_aged_documents) | ||
|
|
||
| # Check for CosmosResourceNotFoundError handling in documents function | ||
| if 'CosmosResourceNotFoundError' in documents_source: | ||
| print("✅ delete_aged_documents handles CosmosResourceNotFoundError") | ||
| else: | ||
| print("❌ delete_aged_documents does not handle CosmosResourceNotFoundError") | ||
| return False | ||
|
|
||
| # Check for 'already deleted' debug message pattern | ||
| if 'already deleted' in documents_source: | ||
| print("✅ delete_aged_documents has 'already deleted' debug messaging") | ||
| else: | ||
| print("❌ delete_aged_documents missing 'already deleted' debug messaging") | ||
| return False | ||
|
|
||
| return True | ||
|
|
||
| except Exception as e: | ||
| print(f"❌ Failed to verify function definitions: {e}") | ||
| import traceback | ||
| traceback.print_exc() | ||
| return False |
Copilot
AI
Jan 24, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test checks for the presence of CosmosResourceNotFoundError handling in the source code, but it doesn't verify that the exception handling actually works correctly. The test would pass even though the exception handling is broken - delete_document converts CosmosResourceNotFoundError to a generic Exception, so the except CosmosResourceNotFoundError clauses in delete_aged_documents will never catch the intended exception. The test should verify the actual exception flow, not just the presence of exception handling code.
| except CosmosResourceNotFoundError: | ||
| # Document chunks already deleted - this is fine | ||
| debug_print(f"Document chunks for {document_id} already deleted (not found)") |
Copilot
AI
Jan 24, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The except clause for CosmosResourceNotFoundError will never be triggered because delete_document_chunks interacts with Azure Cognitive Search, not Cosmos DB. Azure Search operations don't raise CosmosResourceNotFoundError. This exception handler should be removed as it provides false confidence that a specific error case is being handled when it isn't.
|
|
||
| except Exception as e: | ||
| print(f"❌ Failed to verify function definitions: {e}") | ||
| import traceback |
Copilot
AI
Jan 24, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This import of module traceback is redundant, as it was previously imported on line 167.
|
|
||
| except Exception as e: | ||
| print(f"❌ Failed to verify already_deleted flag: {e}") | ||
| import traceback |
Copilot
AI
Jan 24, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This import of module traceback is redundant, as it was previously imported on line 167.
|
|
||
| except Exception as e: | ||
| print(f"❌ Failed to verify version: {e}") | ||
| import traceback |
Copilot
AI
Jan 24, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This import of module traceback is redundant, as it was previously imported on line 167.
* updated the logging logic when running retention delete with archiving enabled (#642) * Corrected version to 0.236.011 (#645) * v0.237.001 (#649) * Use Microsoft python base image * Add python ENV vars * Add python ENV vars * Install deps to systme * Add temp dir to image and pip conf support * Add custom-ca-certificates dir * Logo bug fix (#654) * release note updating for github coplilot * fixed logo bug issue * added 2,3,4,5,6,14 days to rentention policy * added retention policy time updates --------- Co-authored-by: Ed Clark <clarked@microsoft.com> Co-authored-by: Bionic711 <13358952+Bionic711@users.noreply.github.com>
* updated the logging logic when running retention delete with archiving enabled (#642) * Corrected version to 0.236.011 (#645) * v0.237.001 (#649) * Use Microsoft python base image * Add python ENV vars * Add python ENV vars * Install deps to systme * Add temp dir to image and pip conf support * Add custom-ca-certificates dir * Logo bug fix (#654) * release note updating for github coplilot * fixed logo bug issue * added 2,3,4,5,6,14 days to rentention policy * added retention policy time updates * Rentention policy (#657) * Critical Retention Policy Deletion Fix * Create RETENTION_POLICY_NULL_LAST_ACTIVITY_FIX.md --------- Co-authored-by: Ed Clark <clarked@microsoft.com> Co-authored-by: Bionic711 <13358952+Bionic711@users.noreply.github.com>
Issue Description
The retention policy deletion process was logging errors when attempting to delete conversations or documents that had already been deleted (e.g., by another process or user action between the query and delete operations).
Error Observed
Root Cause
This is a race condition scenario where:
CosmosResourceNotFoundError(404 NotFound)