
## Implement Immediate Email Retries with Enhanced Error Handling #17

**Open** — az4mm wants to merge 1 commit into `main` from `azam/mail`


## Conversation

@az4mm az4mm commented Jan 29, 2026

### Description

This PR refactors the email queue processing service to implement immediate consecutive retries when email sending fails, replacing the previous approach of exponential backoff with scheduled future retries.

### Changes

backend/src/services/email-queue.service.ts

  • New sendEmailWithRetry() function: Encapsulates the retry logic for sending a single email with up to 3 consecutive attempts (configurable via MAX_RETRY_ATTEMPTS)
  • Removed exponential backoff: Emails are now retried immediately without delays between attempts, simplifying the retry flow
  • Simplified queue processing:
    • Removed attempts < max_attempts filter from the initial query since retries happen inline
    • Removed attempts = attempts + 1 increment during queue pickup (now tracked via the retry function)
    • Eliminated scheduled retry logic with backoff minutes calculation
  • Improved status handling: Emails now go directly to sent or failed status after processing, with no intermediate pending reschedules
  • Better logging: Added attempt number tracking in success/warning/error log messages for easier debugging

### Motivation

The previous implementation used exponential backoff, which:

  1. Required multiple cron runs to fully process failed emails
  2. Added complexity with scheduled future retries
  3. Made it harder to track email delivery issues in real-time

The new immediate retry approach:

  • Resolves transient failures quickly within a single processing cycle
  • Provides clearer success/failure outcomes
  • Reduces the time between initial send attempt and final status

### Technical Details

| Aspect | Before | After |
| --- | --- | --- |
| Retry Strategy | Exponential backoff (5, 10, 20, 30 min) | Immediate consecutive retries |
| Max Attempts | Configurable per email | 3 attempts (constant) |
| Failed Email Handling | Reschedule to pending | Mark as failed immediately |
| Attempt Tracking | Incremented on queue pickup | Tracked in retry function |

### Testing

  • Verified successful email sends on first attempt
  • Verified retry behavior when transient failures occur
  • Verified emails marked as failed after 3 unsuccessful attempts
  • Verified timeout handling still resets emails to pending
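The retry scenarios above can be exercised without a live mailer by injecting a stubbed sender. The following is only a sketch: the `sendWithRetry` helper and the `flakySender` stub are illustrations written for this note, not code from the PR.

```typescript
// Minimal re-implementation of the retry loop with an injectable sender,
// so the transient-failure case can be tested without SMTP.
type SendResult = { success: boolean; error?: string };

const MAX_RETRY_ATTEMPTS = 3;

async function sendWithRetry(
  send: () => Promise<SendResult>
): Promise<{ success: boolean; attempts: number; error?: string }> {
  let lastError = '';
  for (let attempt = 1; attempt <= MAX_RETRY_ATTEMPTS; attempt++) {
    try {
      const result = await send();
      if (result.success) return { success: true, attempts: attempt };
      lastError = result.error || 'Unknown send error';
    } catch (err: any) {
      lastError = err.message || 'Unknown error';
    }
  }
  // All attempts exhausted
  return { success: false, attempts: MAX_RETRY_ATTEMPTS, error: lastError };
}

// Transient failure: fails twice, then succeeds on the third attempt.
let calls = 0;
const flakySender = async (): Promise<SendResult> =>
  ++calls < 3 ? { success: false, error: 'ECONNRESET' } : { success: true };

sendWithRetry(flakySender).then((r) => {
  console.log(r.success, r.attempts); // true 3
});
```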

Copilot AI review requested due to automatic review settings January 29, 2026 20:29
@vercel

vercel Bot commented Jan 29, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| cab-management-system | Ready | Preview, Comment | Jan 29, 2026 8:29pm |
| cab-management-system-b59a | Ready | Preview, Comment | Jan 29, 2026 8:29pm |


Copilot AI left a comment


Pull request overview

This PR refactors the email queue processing service to implement immediate consecutive retries (up to 3 attempts) when email sending fails, replacing the previous exponential backoff strategy that scheduled retries for future cron runs.

Changes:

  • Introduced sendEmailWithRetry() function that performs up to 3 immediate retry attempts without delays between them
  • Removed exponential backoff logic and scheduled retry mechanism from queue processing
  • Simplified the queue selection query by removing the attempts < max_attempts filter since retries now happen inline
Comments suppressed due to low confidence (2)

backend/src/services/email-queue.service.ts:151

  • With immediate retries and parallel processing via Promise.all(), the total time to process a batch can increase significantly. If all 25 emails (default batch size) fail and each requires 3 retry attempts, and each attempt takes (for example) 2 seconds, the total time would be 25 * 3 * 2 = 150 seconds, far exceeding the default 25-second timeout.

This defeats the purpose of the timeout protection and will likely result in serverless function timeouts. The previous exponential backoff approach handled this better by deferring retries to future cron runs. Consider:

  1. Implementing a timeout budget per email (e.g., MAX_EXECUTION_TIME / batch_size)
  2. Reducing MAX_RETRY_ATTEMPTS when operating in parallel mode
  3. Switching to sequential processing when retries are needed
  4. Passing the time budget to sendEmailWithRetry() to limit retry attempts based on remaining time
```typescript
const sendPromises = result.rows.map(async (email) => {
  // Check if we're running out of time
  if (Date.now() - startTime > MAX_EXECUTION_TIME) {
    console.log(`⏰ Timeout approaching, skipping email ${email.id}`);
    await pool.query(`
      UPDATE email_queue
      SET status = 'pending'
      WHERE id = $1
    `, [email.id]);
    return { success: false, skipped: true };
  }

  // Send with immediate retries
  const sendResult = await sendEmailWithRetry(email);

  if (sendResult.success) {
    // Mark as sent
    await pool.query(`
      UPDATE email_queue
      SET status = 'sent',
          sent_at = NOW(),
          attempts = $1
      WHERE id = $2
    `, [sendResult.attempts, email.id]);
    return { success: true };

  } else {
    // Mark as failed after all retries exhausted
    await pool.query(`
      UPDATE email_queue
      SET status = 'failed',
          last_error = $1,
          attempts = $2
      WHERE id = $3
    `, [sendResult.error, sendResult.attempts, email.id]);
    return { success: false };
  }
});

// Wait for all emails to complete
const results = await Promise.all(sendPromises);
```
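Suggestion (4) could look roughly like the sketch below. `sendEmailWithBudget`, its `deadline` parameter, and the injected `send` callback are assumptions for illustration, not code from this PR:

```typescript
// Sketch: pass an absolute deadline into the retry helper so it stops
// retrying between attempts once the time budget is exhausted, instead of
// risking a hard serverless termination mid-send.
type SendResult = { success: boolean; error?: string };

async function sendEmailWithBudget(
  send: () => Promise<SendResult>,
  deadline: number,   // epoch ms, e.g. startTime + MAX_EXECUTION_TIME
  maxAttempts = 3
): Promise<{ success: boolean; attempts: number; timedOut: boolean; error?: string }> {
  let lastError = '';
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    if (Date.now() >= deadline) {
      // Budget exhausted: report how many attempts actually ran.
      return { success: false, attempts: attempt - 1, timedOut: true, error: lastError };
    }
    try {
      const result = await send();
      if (result.success) return { success: true, attempts: attempt, timedOut: false };
      lastError = result.error || 'Unknown send error';
    } catch (err: any) {
      lastError = err.message || 'Unknown error';
    }
  }
  return { success: false, attempts: maxAttempts, timedOut: false, error: lastError };
}
```

The caller could then reset `timedOut` emails to `pending` (as the existing timeout branch does) while recording the partial attempt count.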

backend/src/services/email-queue.service.ts:108

  • The query no longer filters by attempts < max_attempts, but the database has an index defined specifically for this condition (idx_email_queue_pending in migration 017_create_email_queue_table.sql:43). This index will no longer be utilized efficiently, which could degrade query performance as the email_queue table grows. The index definition is:
```sql
CREATE INDEX IF NOT EXISTS idx_email_queue_pending
  ON email_queue(priority, scheduled_for)
  WHERE status = 'pending' AND attempts < max_attempts;
```

Consider either:

  1. Creating a new migration to drop the old index and create a new one without the attempts < max_attempts condition
  2. Adding the filter back to the query to maintain index compatibility

Without this, PostgreSQL will perform a full sequential scan on the table instead of using the index.

```typescript
const result = await pool.query(`
  UPDATE email_queue
  SET status = 'sending'
  WHERE id IN (
    SELECT id FROM email_queue
    WHERE status = 'pending'
      AND scheduled_for <= NOW()
    ORDER BY priority ASC, scheduled_for ASC
    LIMIT $1
    FOR UPDATE SKIP LOCKED
  )
  RETURNING *
`, [limit]);
```
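Option 1 above would be a follow-up migration along these lines. This is only a sketch — the migration itself does not exist in the PR, and only the index name and table definition are taken from the review comment:

```sql
-- Hypothetical migration: align the partial index predicate with the new
-- query, which no longer filters on attempts < max_attempts.
DROP INDEX IF EXISTS idx_email_queue_pending;

CREATE INDEX IF NOT EXISTS idx_email_queue_pending
  ON email_queue(priority, scheduled_for)
  WHERE status = 'pending';
```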


Comment on lines +28 to +35
```typescript
async function sendEmailWithRetry(email: any): Promise<{
  success: boolean;
  attempts: number;
  error?: string;
}> {
  let lastError = '';

  for (let attempt = 1; attempt <= MAX_RETRY_ATTEMPTS; attempt++) {
```

Copilot AI Jan 29, 2026


The max_attempts field from the database is no longer being respected. The database schema allows per-email configuration of maximum retry attempts via the max_attempts column (default: 3), but the code now uses a hardcoded MAX_RETRY_ATTEMPTS = 3 constant and ignores the database value.

This breaks the existing API contract where emails can be queued with custom retry limits. If any part of the system relies on setting different max_attempts values for different email types (e.g., 5 attempts for critical transactional emails vs 1 attempt for marketing emails), this functionality is now lost.

Consider either:

  1. Passing email.max_attempts to sendEmailWithRetry() and using that value instead of the constant
  2. Documenting that this is an intentional breaking change and updating any code that sets custom max_attempts values

Comment on lines 112 to 120
```diff
 // Check if we're running out of time
 if (Date.now() - startTime > MAX_EXECUTION_TIME) {
   console.log(`⏰ Timeout approaching, skipping email ${email.id}`);
   // Reset to pending for next run
   await pool.query(`
     UPDATE email_queue
-    SET status = 'pending', attempts = attempts - 1
+    SET status = 'pending'
     WHERE id = $1
   `, [email.id]);
   return { success: false, skipped: true };
```

Copilot AI Jan 29, 2026


When an email is skipped due to timeout, the code resets its status to 'pending' but doesn't update the attempts field. In the new implementation, the attempts field serves as a record of how many times the email has been attempted. When a timeout occurs:

  1. The email is marked as 'sending' and picked up from the queue (line 98)
  2. Before any attempts are made by sendEmailWithRetry(), a timeout check occurs (line 113)
  3. The email is reset to 'pending' without any attempt tracking (line 115-119)

On the next cron run, the same email will be picked up again and go through the same flow, potentially creating an infinite loop for emails that consistently trigger timeouts. Consider tracking partial attempts or implementing a separate mechanism to handle timeout scenarios to prevent emails from being stuck in an infinite retry loop.
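One way to break that loop is to cap consecutive timeout skips. The sketch below assumes a hypothetical `skip_count` column and `MAX_TIMEOUT_SKIPS` constant — neither exists in the PR:

```typescript
// Sketch: after MAX_TIMEOUT_SKIPS consecutive timeout skips, give up on the
// email instead of resetting it to 'pending' forever.
const MAX_TIMEOUT_SKIPS = 5; // assumed cap, not part of the PR

function statusAfterTimeoutSkip(skipCount: number): 'pending' | 'failed' {
  // skipCount is the value *after* incrementing for the current skip.
  return skipCount >= MAX_TIMEOUT_SKIPS ? 'failed' : 'pending';
}

// Usage in the timeout branch (query shape follows the PR; skip_count is hypothetical):
// await pool.query(`
//   UPDATE email_queue
//   SET status = $1, skip_count = skip_count + 1
//   WHERE id = $2
// `, [statusAfterTimeoutSkip(email.skip_count + 1), email.id]);
```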

Comment on lines +35 to +54
```typescript
for (let attempt = 1; attempt <= MAX_RETRY_ATTEMPTS; attempt++) {
  try {
    const result = await sendEmail({
      to: email.to_email,
      subject: email.subject,
      html: email.body_html,
    });

    if (result.success) {
      console.log(`✅ Email sent: ${email.id} to ${email.to_email} (attempt ${attempt})`);
      return { success: true, attempts: attempt };
    }

    lastError = result.error || 'Unknown send error';
    console.warn(`⚠️ Email attempt ${attempt} failed: ${email.id} - ${lastError}`);

  } catch (error: any) {
    lastError = error.message || 'Unknown error';
    console.warn(`⚠️ Email attempt ${attempt} failed: ${email.id} - ${lastError}`);
  }
```

Copilot AI Jan 29, 2026


Immediate consecutive retries without any delay between attempts may cause issues with rate-limited email providers or transient network failures. Most email service providers (SMTP servers, SES, SendGrid, etc.) implement rate limiting, and sending 3 requests in rapid succession could:

  1. Trigger rate limit violations, causing all 3 attempts to fail when a small delay might have succeeded
  2. Exhaust API quotas unnecessarily
  3. Not give sufficient time for transient network issues to resolve

Consider adding a small delay (e.g., 1-2 seconds) between retry attempts to improve the success rate of retries. This is especially important for serverless environments where cold starts and network initialization can cause initial failures that resolve quickly.
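A delayed variant of the retry loop could look like this sketch. `sendWithDelayedRetry`, `RETRY_DELAY_MS`, and the injected `send` callback are assumptions for illustration, not the PR's code:

```typescript
// Sketch: same retry loop, but with a short fixed pause between attempts to
// give rate limiters and transient network issues time to recover.
const RETRY_DELAY_MS = 1000; // assumed default

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function sendWithDelayedRetry(
  send: () => Promise<{ success: boolean; error?: string }>,
  maxAttempts = 3,
  delayMs = RETRY_DELAY_MS
): Promise<{ success: boolean; attempts: number; error?: string }> {
  let lastError = '';
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const result = await send();
      if (result.success) return { success: true, attempts: attempt };
      lastError = result.error || 'Unknown send error';
    } catch (err: any) {
      lastError = err.message || 'Unknown error';
    }
    // Pause before the next attempt, but not after the last one.
    if (attempt < maxAttempts) await sleep(delayMs);
  }
  return { success: false, attempts: maxAttempts, error: lastError };
}
```

With a 1-second delay and 3 attempts this adds at most 2 seconds per email, which would still need to fit inside the batch's overall time budget.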

Comment on lines 112 to +124
```diff
 // Check if we're running out of time
 if (Date.now() - startTime > MAX_EXECUTION_TIME) {
   console.log(`⏰ Timeout approaching, skipping email ${email.id}`);
   // Reset to pending for next run
   await pool.query(`
     UPDATE email_queue
-    SET status = 'pending', attempts = attempts - 1
+    SET status = 'pending'
     WHERE id = $1
   `, [email.id]);
   return { success: false, skipped: true };
 }

-try {
-  const sendResult = await sendEmail({
-    to: email.to_email,
-    subject: email.subject,
-    html: email.body_html,
-  });
-
-  if (sendResult.success) {
-    await pool.query(`
-      UPDATE email_queue
-      SET status = 'sent', sent_at = NOW()
-      WHERE id = $1
-    `, [email.id]);
-    console.log(`✅ Email sent: ${email.id} to ${email.to_email}`);
-    return { success: true };
-  } else {
-    throw new Error(sendResult.error || 'Unknown send error');
-  }
-
-} catch (error: any) {
-  console.error(`❌ Email failed: ${email.id}`, error.message);
-
-  const newStatus = email.attempts >= email.max_attempts ? 'failed' : 'pending';
-  const backoffMinutes = Math.min(5 * Math.pow(2, email.attempts - 1), 30);
+// Send with immediate retries
+const sendResult = await sendEmailWithRetry(email);
```

Copilot AI Jan 29, 2026


The timeout check occurs before calling sendEmailWithRetry(), but the actual email sending with retries can take significant time. If sendEmailWithRetry() makes 3 attempts and each attempt times out after several seconds, the total time could exceed MAX_EXECUTION_TIME, causing the serverless function to be terminated mid-execution.

This could leave emails in a 'sending' status with partial attempts recorded in the database. Consider:

  1. Moving the timeout check inside the retry loop to cancel between attempts
  2. Passing remaining time budget to sendEmailWithRetry() so it can decide whether to attempt another retry
  3. Implementing proper cleanup for emails that get terminated mid-execution

```typescript
 * @param email - Email record from database
 * @returns Result object with success status and final attempt count
 */
async function sendEmailWithRetry(email: any): Promise<{
```

Copilot AI Jan 29, 2026


The email parameter is typed as any, which loses type safety. Based on the database schema (backend/migrations/017_create_email_queue_table.sql), the email record should have specific fields like id, to_email, subject, body_html, attempts, max_attempts, etc.

Consider defining an interface for the email queue record to improve type safety and code maintainability:

```typescript
interface EmailQueueRecord {
  id: string;
  to_email: string;
  subject: string;
  body_html: string;
  attempts: number;
  max_attempts: number;
  // ... other fields
}
```

Then update the function signature to use this type instead of any.

Comment on lines +33 to +59
```typescript
let lastError = '';

for (let attempt = 1; attempt <= MAX_RETRY_ATTEMPTS; attempt++) {
  try {
    const result = await sendEmail({
      to: email.to_email,
      subject: email.subject,
      html: email.body_html,
    });

    if (result.success) {
      console.log(`✅ Email sent: ${email.id} to ${email.to_email} (attempt ${attempt})`);
      return { success: true, attempts: attempt };
    }

    lastError = result.error || 'Unknown send error';
    console.warn(`⚠️ Email attempt ${attempt} failed: ${email.id} - ${lastError}`);

  } catch (error: any) {
    lastError = error.message || 'Unknown error';
    console.warn(`⚠️ Email attempt ${attempt} failed: ${email.id} - ${lastError}`);
  }
}

// All attempts exhausted
console.error(`❌ Email failed after ${MAX_RETRY_ATTEMPTS} attempts: ${email.id}`);
return { success: false, attempts: MAX_RETRY_ATTEMPTS, error: lastError };
```

Copilot AI Jan 29, 2026


The lastError variable is initialized as an empty string, which means if all retry attempts fail but no error message is captured, sendResult.error could be an empty string. When this is stored in the database at line 142, it will save an empty string to the last_error field, which is less helpful for debugging than storing null or a meaningful default message.

Consider initializing lastError to a more descriptive default like 'All retry attempts failed with unknown error' or checking if lastError is empty before storing it in the database.

Comment on lines +123 to +145
```diff
 // Send with immediate retries
 const sendResult = await sendEmailWithRetry(email);

 if (sendResult.success) {
   // Mark as sent
   await pool.query(`
     UPDATE email_queue
     SET status = 'sent',
         sent_at = NOW(),
         attempts = $1
     WHERE id = $2
   `, [sendResult.attempts, email.id]);
   return { success: true };

 } else {
   // Mark as failed after all retries exhausted
   await pool.query(`
     UPDATE email_queue
-    SET status = $1,
-        last_error = $2,
-        scheduled_for = NOW() + INTERVAL '${backoffMinutes} minutes'
+    SET status = 'failed',
+        last_error = $1,
+        attempts = $2
     WHERE id = $3
-  `, [newStatus, error.message, email.id]);
-
+  `, [sendResult.error, sendResult.attempts, email.id]);
```

Copilot AI Jan 29, 2026


The retryFailedEmails() function at line 220-241 resets attempts = 0 when retrying failed emails, but this doesn't align with the new retry logic. When these emails are picked up by processEmailQueue(), they will go through sendEmailWithRetry() which will attempt 3 more times regardless of previous attempts.

This means an email could potentially be retried 6+ times total (3 attempts in first run, reset to pending via retryFailedEmails(), then 3 more attempts), which contradicts the documented behavior of having a maximum of 3 attempts.

The retryFailedEmails() function should either:

  1. Be updated to work with the new immediate retry approach
  2. Be removed if manual retries are no longer supported in the new design
  3. Keep the attempts count to track total lifetime attempts across manual retries
