Skip to content

Conversation

@paullizer
Copy link
Contributor

Issue Description

The retention policy deletion process was logging errors when attempting to delete conversations or documents that had already been deleted (e.g., by another process or user action between the query and delete operations).

Error Observed

DEBUG: [Log] delete_aged_conversations_deletion_error -- {'error': '(NotFound) Entity with the specified id does not exist in the system.

Root Cause

This is a race condition scenario where:

  1. The retention policy queries for aged conversations/documents
  2. Between the query and the delete operation, the item is deleted by another process (user action, concurrent retention execution, etc.)
  3. The delete operation fails with CosmosResourceNotFoundError (404 NotFound)

Copilot AI review requested due to automatic review settings January 24, 2026 23:01
@paullizer paullizer merged commit d017028 into Development Jan 24, 2026
7 of 8 checks passed
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to fix race condition errors in the retention policy deletion process where items deleted between query and delete operations were logging false errors. The changes add specific handling for CosmosResourceNotFoundError when deleting aged conversations and documents.

Changes:

  • Added CosmosResourceNotFoundError exception handling in delete_aged_conversations() and delete_aged_documents() functions
  • Updated version to 0.236.012
  • Added functional test to verify exception handling exists in source code
  • Added comprehensive documentation describing the fix

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 7 comments.

File Description
application/single_app/functions_retention_policy.py Added try-except blocks for CosmosResourceNotFoundError in conversation and document deletion loops
application/single_app/config.py Updated VERSION to 0.236.012
functional_tests/test_retention_policy_notfound_handling.py Added test to verify exception handling code exists via source inspection
docs/explanation/fixes/v0.236.012/RETENTION_POLICY_NOTFOUND_FIX.md Added documentation describing the issue, fix, and benefits

Comment on lines +765 to +767
except CosmosResourceNotFoundError:
# Document was already deleted (race condition) - this is fine
debug_print(f"Document {document_id} already deleted (not found)")
Copy link

Copilot AI Jan 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The except clause for CosmosResourceNotFoundError will never catch the intended exception because the delete_document function (defined in functions_documents.py at line 2291-2292) catches CosmosResourceNotFoundError and re-raises it as a generic Exception with the message "Document not found". This means the race condition handling described in the PR will not work as intended - when a document is already deleted, a generic Exception will be raised instead, which will be caught by the Exception handler at line 789 and logged as an error rather than being handled gracefully. To fix this, either delete_document should re-raise CosmosResourceNotFoundError instead of converting it to a generic Exception, or this code should catch Exception and check the message for "Document not found".

Suggested change
except CosmosResourceNotFoundError:
# Document was already deleted (race condition) - this is fine
debug_print(f"Document {document_id} already deleted (not found)")
except Exception as doc_error:
# Handle race condition where delete_document wraps CosmosResourceNotFoundError
message = str(doc_error)
if "Document not found" in message:
# Document was already deleted (race condition) - this is fine
debug_print(f"Document {document_id} already deleted (not found)")
else:
# Re-raise unexpected exceptions to be handled by outer logic
raise

Copilot uses AI. Check for mistakes.
Comment on lines +778 to +788
except CosmosResourceNotFoundError:
# Document was already deleted - count as success
doc_id = doc.get('id', 'unknown') if doc else 'unknown'
debug_print(f"Document {doc_id} already deleted (not found)")
deleted_details.append({
'id': doc_id,
'file_name': doc.get('file_name', 'Unknown'),
'title': doc.get('title', doc.get('file_name', 'Unknown')),
'last_updated': doc.get('last_updated'),
'already_deleted': True
})
Copy link

Copilot AI Jan 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The outer try-except for CosmosResourceNotFoundError provides incomplete protection. While this handler could catch CosmosResourceNotFoundError from operations that directly interact with Cosmos DB within this loop iteration (like querying messages), it won't catch the exception from delete_document since that function converts CosmosResourceNotFoundError to a generic Exception. The handler also won't catch CosmosResourceNotFoundError from delete_document_chunks since that function uses Azure Search, not Cosmos DB. This creates a misleading impression that the race condition is being handled when it may not be in all cases.

Copilot uses AI. Check for mistakes.
Comment on lines +32 to +80
def test_retention_policy_function_definitions():
"""Test that retention policy functions have proper exception handling."""
print("\n🔍 Testing retention policy function definitions...")

try:
import inspect
from functions_retention_policy import delete_aged_conversations, delete_aged_documents

# Get source code of delete_aged_conversations
conversations_source = inspect.getsource(delete_aged_conversations)

# Check for CosmosResourceNotFoundError handling in conversations function
if 'CosmosResourceNotFoundError' in conversations_source:
print("✅ delete_aged_conversations handles CosmosResourceNotFoundError")
else:
print("❌ delete_aged_conversations does not handle CosmosResourceNotFoundError")
return False

# Check for 'already deleted' debug message pattern
if 'already deleted' in conversations_source:
print("✅ delete_aged_conversations has 'already deleted' debug messaging")
else:
print("❌ delete_aged_conversations missing 'already deleted' debug messaging")
return False

# Get source code of delete_aged_documents
documents_source = inspect.getsource(delete_aged_documents)

# Check for CosmosResourceNotFoundError handling in documents function
if 'CosmosResourceNotFoundError' in documents_source:
print("✅ delete_aged_documents handles CosmosResourceNotFoundError")
else:
print("❌ delete_aged_documents does not handle CosmosResourceNotFoundError")
return False

# Check for 'already deleted' debug message pattern
if 'already deleted' in documents_source:
print("✅ delete_aged_documents has 'already deleted' debug messaging")
else:
print("❌ delete_aged_documents missing 'already deleted' debug messaging")
return False

return True

except Exception as e:
print(f"❌ Failed to verify function definitions: {e}")
import traceback
traceback.print_exc()
return False
Copy link

Copilot AI Jan 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test checks for the presence of CosmosResourceNotFoundError handling in the source code, but it doesn't verify that the exception handling actually works correctly. The test would pass even though the exception handling is broken - delete_document converts CosmosResourceNotFoundError to a generic Exception, so the except CosmosResourceNotFoundError clauses in delete_aged_documents will never catch the intended exception. The test should verify the actual exception flow, not just the presence of exception handling code.

Copilot uses AI. Check for mistakes.
Comment on lines +755 to +757
except CosmosResourceNotFoundError:
# Document chunks already deleted - this is fine
debug_print(f"Document chunks for {document_id} already deleted (not found)")
Copy link

Copilot AI Jan 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The except clause for CosmosResourceNotFoundError will never be triggered because delete_document_chunks interacts with Azure Cognitive Search, not Cosmos DB. Azure Search operations don't raise CosmosResourceNotFoundError. This exception handler should be removed as it provides false confidence that a specific error case is being handled when it isn't.

Copilot uses AI. Check for mistakes.

except Exception as e:
print(f"❌ Failed to verify function definitions: {e}")
import traceback
Copy link

Copilot AI Jan 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This import of module traceback is redundant, as it was previously imported on line 167.

Copilot uses AI. Check for mistakes.

except Exception as e:
print(f"❌ Failed to verify already_deleted flag: {e}")
import traceback
Copy link

Copilot AI Jan 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This import of module traceback is redundant, as it was previously imported on line 167.

Copilot uses AI. Check for mistakes.

except Exception as e:
print(f"❌ Failed to verify version: {e}")
import traceback
Copy link

Copilot AI Jan 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This import of module traceback is redundant, as it was previously imported on line 167.

Copilot uses AI. Check for mistakes.
paullizer added a commit that referenced this pull request Jan 26, 2026
paullizer added a commit that referenced this pull request Jan 26, 2026
* updated the logging logic when running retention delete with archiving enabled (#642)

* Corrected version to 0.236.011 (#645)
paullizer added a commit that referenced this pull request Jan 26, 2026
* updated the logging logic when running retention delete with archiving enabled (#642)

* Corrected version to 0.236.011 (#645)

* v0.237.001 (#649)
paullizer added a commit that referenced this pull request Jan 26, 2026
* updated the logging logic when running retention delete with archiving enabled (#642)

* Corrected version to 0.236.011 (#645)

* v0.237.001 (#649)

* Use Microsoft python base image

* Add python ENV vars

* Add python ENV vars

* Install deps to systme

* Add temp dir to image and pip conf support

* Add custom-ca-certificates dir

* Logo bug fix (#654)

* release note updating for github coplilot

* fixed logo bug issue

* added 2,3,4,5,6,14 days to rentention policy

* added retention policy time updates

---------

Co-authored-by: Ed Clark <clarked@microsoft.com>
Co-authored-by: Bionic711 <13358952+Bionic711@users.noreply.github.com>
paullizer added a commit that referenced this pull request Jan 26, 2026
* updated the logging logic when running retention delete with archiving enabled (#642)

* Corrected version to 0.236.011 (#645)

* v0.237.001 (#649)

* Use Microsoft python base image

* Add python ENV vars

* Add python ENV vars

* Install deps to systme

* Add temp dir to image and pip conf support

* Add custom-ca-certificates dir

* Logo bug fix (#654)

* release note updating for github coplilot

* fixed logo bug issue

* added 2,3,4,5,6,14 days to rentention policy

* added retention policy time updates

* Rentention policy (#657)

* Critical Retention Policy Deletion Fix

* Create RETENTION_POLICY_NULL_LAST_ACTIVITY_FIX.md

---------

Co-authored-by: Ed Clark <clarked@microsoft.com>
Co-authored-by: Bionic711 <13358952+Bionic711@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants