Conversation
…nced error handling
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
There was a problem hiding this comment.
Pull request overview
This PR refactors the email queue processing service to implement immediate consecutive retries (up to 3 attempts) when email sending fails, replacing the previous exponential backoff strategy that scheduled retries for future cron runs.
Changes:
- Introduced
sendEmailWithRetry()function that performs up to 3 immediate retry attempts without delays between them - Removed exponential backoff logic and scheduled retry mechanism from queue processing
- Simplified the queue selection query by removing the
attempts < max_attemptsfilter since retries now happen inline
Comments suppressed due to low confidence (2)
backend/src/services/email-queue.service.ts:151
- With immediate retries and parallel processing via
Promise.all(), the total time to process a batch can increase significantly. If all 25 emails (default batch size) fail and each requires 3 retry attempts, and each attempt takes (for example) 2 seconds, the total time would be 25 * 3 * 2 = 150 seconds, far exceeding the default 25-second timeout.
This defeats the purpose of the timeout protection and will likely result in serverless function timeouts. The previous exponential backoff approach handled this better by deferring retries to future cron runs. Consider:
- Implementing a timeout budget per email (e.g.,
MAX_EXECUTION_TIME / batch_size) - Reducing MAX_RETRY_ATTEMPTS when operating in parallel mode
- Switching to sequential processing when retries are needed
- Passing the time budget to
sendEmailWithRetry()to limit retry attempts based on remaining time
const sendPromises = result.rows.map(async (email) => {
// Check if we're running out of time
if (Date.now() - startTime > MAX_EXECUTION_TIME) {
console.log(`⏰ Timeout approaching, skipping email ${email.id}`);
await pool.query(`
UPDATE email_queue
SET status = 'pending'
WHERE id = $1
`, [email.id]);
return { success: false, skipped: true };
}
// Send with immediate retries
const sendResult = await sendEmailWithRetry(email);
if (sendResult.success) {
// Mark as sent
await pool.query(`
UPDATE email_queue
SET status = 'sent',
sent_at = NOW(),
attempts = $1
WHERE id = $2
`, [sendResult.attempts, email.id]);
return { success: true };
} else {
// Mark as failed after all retries exhausted
await pool.query(`
UPDATE email_queue
SET status = 'failed',
last_error = $1,
attempts = $2
WHERE id = $3
`, [sendResult.error, sendResult.attempts, email.id]);
return { success: false };
}
});
// Wait for all emails to complete
const results = await Promise.all(sendPromises);
backend/src/services/email-queue.service.ts:108
- The query no longer filters by
attempts < max_attempts, but the database has an index defined specifically for this condition (idx_email_queue_pending in migration 017_create_email_queue_table.sql:43). This index will no longer be utilized efficiently, which could degrade query performance as the email_queue table grows. The index definition is:
CREATE INDEX IF NOT EXISTS idx_email_queue_pending
ON email_queue(priority, scheduled_for)
WHERE status = 'pending' AND attempts < max_attempts;
Consider either:
- Creating a new migration to drop the old index and create a new one without the
attempts < max_attemptscondition - Adding the filter back to the query to maintain index compatibility
Without this, PostgreSQL will perform a full sequential scan on the table instead of using the index.
const result = await pool.query(`
UPDATE email_queue
SET status = 'sending'
WHERE id IN (
SELECT id FROM email_queue
WHERE status = 'pending'
AND scheduled_for <= NOW()
ORDER BY priority ASC, scheduled_for ASC
LIMIT $1
FOR UPDATE SKIP LOCKED
)
RETURNING *
`, [limit]);
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| async function sendEmailWithRetry(email: any): Promise<{ | ||
| success: boolean; | ||
| attempts: number; | ||
| error?: string; | ||
| }> { | ||
| let lastError = ''; | ||
|
|
||
| for (let attempt = 1; attempt <= MAX_RETRY_ATTEMPTS; attempt++) { |
There was a problem hiding this comment.
The max_attempts field from the database is no longer being respected. The database schema allows per-email configuration of maximum retry attempts via the max_attempts column (default: 3), but the code now uses a hardcoded MAX_RETRY_ATTEMPTS = 3 constant and ignores the database value.
This breaks the existing API contract where emails can be queued with custom retry limits. If any part of the system relies on setting different max_attempts values for different email types (e.g., 5 attempts for critical transactional emails vs 1 attempt for marketing emails), this functionality is now lost.
Consider either:
- Passing
email.max_attemptstosendEmailWithRetry()and using that value instead of the constant - Documenting that this is an intentional breaking change and updating any code that sets custom
max_attemptsvalues
| // Check if we're running out of time | ||
| if (Date.now() - startTime > MAX_EXECUTION_TIME) { | ||
| console.log(`⏰ Timeout approaching, skipping email ${email.id}`); | ||
| // Reset to pending for next run | ||
| await pool.query(` | ||
| UPDATE email_queue | ||
| SET status = 'pending', attempts = attempts - 1 | ||
| SET status = 'pending' | ||
| WHERE id = $1 | ||
| `, [email.id]); | ||
| return { success: false, skipped: true }; |
There was a problem hiding this comment.
When an email is skipped due to timeout, the code resets its status to 'pending' but doesn't update the attempts field. In the new implementation, the attempts field serves as a record of how many times the email has been attempted. When a timeout occurs:
- The email is marked as 'sending' and picked up from the queue (line 98)
- Before any attempts are made by
sendEmailWithRetry(), a timeout check occurs (line 113) - The email is reset to 'pending' without any attempt tracking (line 115-119)
On the next cron run, the same email will be picked up again and go through the same flow, potentially creating an infinite loop for emails that consistently trigger timeouts. Consider tracking partial attempts or implementing a separate mechanism to handle timeout scenarios to prevent emails from being stuck in an infinite retry loop.
| for (let attempt = 1; attempt <= MAX_RETRY_ATTEMPTS; attempt++) { | ||
| try { | ||
| const result = await sendEmail({ | ||
| to: email.to_email, | ||
| subject: email.subject, | ||
| html: email.body_html, | ||
| }); | ||
|
|
||
| if (result.success) { | ||
| console.log(`✅ Email sent: ${email.id} to ${email.to_email} (attempt ${attempt})`); | ||
| return { success: true, attempts: attempt }; | ||
| } | ||
|
|
||
| lastError = result.error || 'Unknown send error'; | ||
| console.warn(`⚠️ Email attempt ${attempt} failed: ${email.id} - ${lastError}`); | ||
|
|
||
| } catch (error: any) { | ||
| lastError = error.message || 'Unknown error'; | ||
| console.warn(`⚠️ Email attempt ${attempt} failed: ${email.id} - ${lastError}`); | ||
| } |
There was a problem hiding this comment.
Immediate consecutive retries without any delay between attempts may cause issues with rate-limited email providers or transient network failures. Most email service providers (SMTP servers, SES, SendGrid, etc.) implement rate limiting, and sending 3 requests in rapid succession could:
- Trigger rate limit violations, causing all 3 attempts to fail when a small delay might have succeeded
- Exhaust API quotas unnecessarily
- Not give sufficient time for transient network issues to resolve
Consider adding a small delay (e.g., 1-2 seconds) between retry attempts to improve the success rate of retries. This is especially important for serverless environments where cold starts and network initialization can cause initial failures that resolve quickly.
| // Check if we're running out of time | ||
| if (Date.now() - startTime > MAX_EXECUTION_TIME) { | ||
| console.log(`⏰ Timeout approaching, skipping email ${email.id}`); | ||
| // Reset to pending for next run | ||
| await pool.query(` | ||
| UPDATE email_queue | ||
| SET status = 'pending', attempts = attempts - 1 | ||
| SET status = 'pending' | ||
| WHERE id = $1 | ||
| `, [email.id]); | ||
| return { success: false, skipped: true }; | ||
| } | ||
|
|
||
| try { | ||
| const sendResult = await sendEmail({ | ||
| to: email.to_email, | ||
| subject: email.subject, | ||
| html: email.body_html, | ||
| }); | ||
|
|
||
| if (sendResult.success) { | ||
| await pool.query(` | ||
| UPDATE email_queue | ||
| SET status = 'sent', sent_at = NOW() | ||
| WHERE id = $1 | ||
| `, [email.id]); | ||
| console.log(`✅ Email sent: ${email.id} to ${email.to_email}`); | ||
| return { success: true }; | ||
| } else { | ||
| throw new Error(sendResult.error || 'Unknown send error'); | ||
| } | ||
|
|
||
| } catch (error: any) { | ||
| console.error(`❌ Email failed: ${email.id}`, error.message); | ||
|
|
||
| const newStatus = email.attempts >= email.max_attempts ? 'failed' : 'pending'; | ||
| const backoffMinutes = Math.min(5 * Math.pow(2, email.attempts - 1), 30); | ||
| // Send with immediate retries | ||
| const sendResult = await sendEmailWithRetry(email); |
There was a problem hiding this comment.
The timeout check occurs before calling sendEmailWithRetry(), but the actual email sending with retries can take significant time. If sendEmailWithRetry() makes 3 attempts and each attempt times out after several seconds, the total time could exceed MAX_EXECUTION_TIME, causing the serverless function to be terminated mid-execution.
This could leave emails in a 'sending' status with partial attempts recorded in the database. Consider:
- Moving the timeout check inside the retry loop to cancel between attempts
- Passing remaining time budget to
sendEmailWithRetry()so it can decide whether to attempt another retry - Implementing proper cleanup for emails that get terminated mid-execution
| * @param email - Email record from database | ||
| * @returns Result object with success status and final attempt count | ||
| */ | ||
| async function sendEmailWithRetry(email: any): Promise<{ |
There was a problem hiding this comment.
The email parameter is typed as any, which loses type safety. Based on the database schema (backend/migrations/017_create_email_queue_table.sql), the email record should have specific fields like id, to_email, subject, body_html, attempts, max_attempts, etc.
Consider defining an interface for the email queue record to improve type safety and code maintainability:
interface EmailQueueRecord {
id: string;
to_email: string;
subject: string;
body_html: string;
attempts: number;
max_attempts: number;
// ... other fields
}Then update the function signature to use this type instead of any.
| let lastError = ''; | ||
|
|
||
| for (let attempt = 1; attempt <= MAX_RETRY_ATTEMPTS; attempt++) { | ||
| try { | ||
| const result = await sendEmail({ | ||
| to: email.to_email, | ||
| subject: email.subject, | ||
| html: email.body_html, | ||
| }); | ||
|
|
||
| if (result.success) { | ||
| console.log(`✅ Email sent: ${email.id} to ${email.to_email} (attempt ${attempt})`); | ||
| return { success: true, attempts: attempt }; | ||
| } | ||
|
|
||
| lastError = result.error || 'Unknown send error'; | ||
| console.warn(`⚠️ Email attempt ${attempt} failed: ${email.id} - ${lastError}`); | ||
|
|
||
| } catch (error: any) { | ||
| lastError = error.message || 'Unknown error'; | ||
| console.warn(`⚠️ Email attempt ${attempt} failed: ${email.id} - ${lastError}`); | ||
| } | ||
| } | ||
|
|
||
| // All attempts exhausted | ||
| console.error(`❌ Email failed after ${MAX_RETRY_ATTEMPTS} attempts: ${email.id}`); | ||
| return { success: false, attempts: MAX_RETRY_ATTEMPTS, error: lastError }; |
There was a problem hiding this comment.
The lastError variable is initialized as an empty string, which means if all retry attempts fail but no error message is captured, sendResult.error could be an empty string. When this is stored in the database at line 142, it will save an empty string to the last_error field, which is less helpful for debugging than storing null or a meaningful default message.
Consider initializing lastError to a more descriptive default like 'All retry attempts failed with unknown error' or checking if lastError is empty before storing it in the database.
| // Send with immediate retries | ||
| const sendResult = await sendEmailWithRetry(email); | ||
|
|
||
| if (sendResult.success) { | ||
| // Mark as sent | ||
| await pool.query(` | ||
| UPDATE email_queue | ||
| SET status = 'sent', | ||
| sent_at = NOW(), | ||
| attempts = $1 | ||
| WHERE id = $2 | ||
| `, [sendResult.attempts, email.id]); | ||
| return { success: true }; | ||
|
|
||
| } else { | ||
| // Mark as failed after all retries exhausted | ||
| await pool.query(` | ||
| UPDATE email_queue | ||
| SET status = $1, | ||
| last_error = $2, | ||
| scheduled_for = NOW() + INTERVAL '${backoffMinutes} minutes' | ||
| SET status = 'failed', | ||
| last_error = $1, | ||
| attempts = $2 | ||
| WHERE id = $3 | ||
| `, [newStatus, error.message, email.id]); | ||
|
|
||
| `, [sendResult.error, sendResult.attempts, email.id]); |
There was a problem hiding this comment.
The retryFailedEmails() function at line 220-241 resets attempts = 0 when retrying failed emails, but this doesn't align with the new retry logic. When these emails are picked up by processEmailQueue(), they will go through sendEmailWithRetry() which will attempt 3 more times regardless of previous attempts.
This means an email could potentially be retried 6+ times total (3 attempts in first run, reset to pending via retryFailedEmails(), then 3 more attempts), which contradicts the documented behavior of having a maximum of 3 attempts.
The retryFailedEmails() function should either:
- Be updated to work with the new immediate retry approach
- Be removed if manual retries are no longer supported in the new design
- Keep the attempts count to track total lifetime attempts across manual retries
Description
This PR refactors the email queue processing service to implement immediate consecutive retries when sending emails fails, replacing the previous exponential backoff approach with scheduled future retries.
Changes
backend/src/services/email-queue.service.tssendEmailWithRetry()function: Encapsulates the retry logic for sending a single email with up to 3 consecutive attempts (configurable viaMAX_RETRY_ATTEMPTS)attempts < max_attemptsfilter from the initial query since retries happen inlineattempts = attempts + 1increment during queue pickup (now tracked via the retry function)sentorfailedstatus after processing, with no intermediatependingreschedulesMotivation
The previous implementation used exponential backoff which:
The new immediate retry approach:
Technical Details
Testing
failedafter 3 unsuccessful attempts