Resilience & Error Handling
EverTask helps you build fault-tolerant background tasks that can recover from transient failures automatically.
Table of Contents
- Retry Policies
- Timeout Management
- CancellationToken Usage
- Handling WorkerService Stops
- Error Observation
- Best Practices
Retry Policies
Retry policies let you automatically retry failed tasks without writing custom error-handling code. When a task fails, the retry policy kicks in and tries again based on the rules you’ve defined.
EverTask supports both simple retry configurations and advanced exception filtering, so you can fail fast on permanent errors while retrying transient failures.
Default Linear Retry Policy
By default, tasks use LinearRetryPolicy, which you can configure globally:
builder.Services.AddEverTask(opt =>
{
// Default: 3 attempts with 500ms delay between retries
opt.SetDefaultRetryPolicy(new LinearRetryPolicy(3, TimeSpan.FromMilliseconds(500)));
});
LinearRetryPolicy Options
Fixed Retry Count and Delay
// 5 attempts with 1 second between retries
builder.Services.AddEverTask(opt =>
{
opt.SetDefaultRetryPolicy(new LinearRetryPolicy(5, TimeSpan.FromSeconds(1)));
});
Custom Delay Array
// Custom delays for each retry
var delays = new TimeSpan[]
{
TimeSpan.FromMilliseconds(100), // First retry after 100ms
TimeSpan.FromMilliseconds(500), // Second retry after 500ms
TimeSpan.FromSeconds(2), // Third retry after 2s
TimeSpan.FromSeconds(5) // Fourth retry after 5s
};
builder.Services.AddEverTask(opt =>
{
opt.SetDefaultRetryPolicy(new LinearRetryPolicy(delays));
});
Per-Handler Retry Policy
You can override the global policy on a per-handler basis when you need different retry behavior for specific task types:
public class CriticalTaskHandler : EverTaskHandler<CriticalTask>
{
// More aggressive retries for critical tasks
public override IRetryPolicy? RetryPolicy => new LinearRetryPolicy(10, TimeSpan.FromSeconds(1));
public override async Task Handle(CriticalTask task, CancellationToken cancellationToken)
{
// Task logic - will retry up to 10 times if it fails
}
}
Per-Handler Custom Delays
public class CustomRetryHandler : EverTaskHandler<CustomRetryTask>
{
// Exponential backoff-like delays
public override IRetryPolicy? RetryPolicy => new LinearRetryPolicy(new TimeSpan[]
{
TimeSpan.FromSeconds(1),
TimeSpan.FromSeconds(2),
TimeSpan.FromSeconds(4),
TimeSpan.FromSeconds(8),
TimeSpan.FromSeconds(16)
});
public override async Task Handle(CustomRetryTask task, CancellationToken cancellationToken)
{
// Task logic with custom retry pattern
}
}
Custom Retry Policies
Need full control over retry behavior? Implement IRetryPolicy yourself:
public class ExponentialBackoffPolicy : IRetryPolicy
{
private readonly int _maxAttempts;
private readonly TimeSpan _baseDelay;
public ExponentialBackoffPolicy(int maxAttempts = 5, TimeSpan? baseDelay = null)
{
_maxAttempts = maxAttempts;
_baseDelay = baseDelay ?? TimeSpan.FromSeconds(1);
}
public async Task Execute(Func<CancellationToken, Task> action, CancellationToken token = default)
{
for (int attempt = 0; attempt < _maxAttempts; attempt++)
{
try
{
await action(token);
return; // Success
}
catch (Exception) when (attempt < _maxAttempts - 1)
{
// Calculate exponential delay: base * 2^attempt
var delay = TimeSpan.FromMilliseconds(
_baseDelay.TotalMilliseconds * Math.Pow(2, attempt));
await Task.Delay(delay, token);
// Loop continues to retry
}
}
}
}
// Use in handler
public class MyHandler : EverTaskHandler<MyTask>
{
public override IRetryPolicy? RetryPolicy => new ExponentialBackoffPolicy(maxAttempts: 5);
}
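A custom policy can also be registered globally, using the same SetDefaultRetryPolicy option shown earlier:
// Or set globally
builder.Services.AddEverTask(opt =>
{
    opt.SetDefaultRetryPolicy(new ExponentialBackoffPolicy(maxAttempts: 5));
});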
Polly Integration
If you’re already using Polly in your project, you can wrap it in an IRetryPolicy implementation:
using Polly;
using Polly.Retry; // AsyncRetryPolicy lives in this namespace
public class PollyRetryPolicy : IRetryPolicy
{
private readonly AsyncRetryPolicy _pollyPolicy;
public PollyRetryPolicy()
{
_pollyPolicy = Policy
.Handle<HttpRequestException>() // Only retry on HTTP errors
.Or<TimeoutException>()
.WaitAndRetryAsync(
retryCount: 3,
sleepDurationProvider: retryAttempt =>
TimeSpan.FromSeconds(Math.Pow(2, retryAttempt)),
onRetry: (exception, timeSpan, retryCount, context) =>
{
// Log retry attempt
Console.WriteLine($"Retry {retryCount} after {timeSpan.TotalSeconds}s due to {exception.GetType().Name}");
});
}
public async Task Execute(Func<CancellationToken, Task> action, CancellationToken token = default)
{
await _pollyPolicy.ExecuteAsync(async (ct) =>
{
await action(ct);
}, token);
}
}
// Use in handler
public class ApiCallHandler : EverTaskHandler<ApiCallTask>
{
public override IRetryPolicy? RetryPolicy => new PollyRetryPolicy();
}
// Or set globally
builder.Services.AddEverTask(opt =>
{
opt.SetDefaultRetryPolicy(new PollyRetryPolicy());
});
Exception Filtering
Exception filtering allows you to configure retry policies to only retry transient errors while failing fast on permanent errors. This prevents wasting retry attempts and resources on failures that won’t fix themselves.
Why Exception Filtering?
Without filtering, all exceptions trigger retries:
public override async Task Handle(MyTask task, CancellationToken ct)
{
// Bug: task.Data is null
var length = task.Data.Length; // NullReferenceException
}
With a 5-attempt retry policy:
- Attempt 1: NullReferenceException → Retry (wasteful)
- Attempt 2: NullReferenceException → Retry (wasteful)
- Attempt 3: NullReferenceException → Retry (wasteful)
- Attempt 4: NullReferenceException → Retry (wasteful)
- Attempt 5: NullReferenceException → Final failure
Total wasted time: 5 attempts × retry delay (with a 500ms delay, roughly 2 seconds of waiting before the task finally fails)
With exception filtering, you can fail immediately on permanent errors like NullReferenceException, ArgumentException, or validation failures.
Whitelist Approach (Handle)
Use Handle<TException>() to specify which exceptions should be retried. All other exceptions fail immediately.
public class DatabaseTaskHandler : EverTaskHandler<DatabaseTask>
{
public override IRetryPolicy? RetryPolicy => new LinearRetryPolicy(5, TimeSpan.FromSeconds(2))
.Handle<DbException>()
.Handle<SqlException>()
.Handle<TimeoutException>();
public override async Task Handle(DatabaseTask task, CancellationToken ct)
{
await _dbContext.SaveChangesAsync(ct);
}
}
Result: Only database-related exceptions trigger retries. Application logic errors (ArgumentException, NullReferenceException) fail immediately.
Blacklist Approach (DoNotHandle)
Use DoNotHandle<TException>() to specify which exceptions should NOT be retried. All other exceptions trigger retries.
public class ApiTaskHandler : EverTaskHandler<ApiTask>
{
public override IRetryPolicy? RetryPolicy => new LinearRetryPolicy(5, TimeSpan.FromSeconds(1))
.DoNotHandle<ArgumentException>()
.DoNotHandle<ArgumentNullException>()
.DoNotHandle<InvalidOperationException>();
public override async Task Handle(ApiTask task, CancellationToken ct)
{
await _apiClient.CallAsync(task.Endpoint, ct);
}
}
Result: Permanent errors (argument validation, logic errors) fail immediately. Network errors, timeouts, and other transient issues trigger retries.
Multiple Exception Types (Params Overload)
For whitelisting or blacklisting many exception types, use the params Type[] overload:
public override IRetryPolicy? RetryPolicy => new LinearRetryPolicy(5, TimeSpan.FromSeconds(2))
.Handle(
typeof(DbException),
typeof(SqlException),
typeof(HttpRequestException),
typeof(IOException),
typeof(SocketException),
typeof(TimeoutException)
);
This is more concise than chaining multiple Handle<T>() calls.
Predicate-Based Filtering (HandleWhen)
For complex scenarios, use HandleWhen(Func<Exception, bool>) with custom logic:
public class HttpApiHandler : EverTaskHandler<HttpApiTask>
{
public override IRetryPolicy? RetryPolicy => new LinearRetryPolicy(3, TimeSpan.FromSeconds(1))
.HandleWhen(ex =>
{
// Only retry 5xx server errors
if (ex is HttpRequestException httpEx)
{
var statusCode = (int?)httpEx.StatusCode; // StatusCode is a nullable HttpStatusCode
return statusCode >= 500 && statusCode < 600;
}
// Retry timeout and network errors
return ex is TimeoutException or SocketException;
});
public override async Task Handle(HttpApiTask task, CancellationToken ct)
{
var response = await _httpClient.GetAsync(task.Url, ct);
response.EnsureSuccessStatusCode();
}
}
Use Cases for Predicates:
- HTTP status code-based retry (5xx yes, 4xx no)
- Exception message pattern matching (see the sketch below)
- Combining multiple conditions
- Dynamic retry logic based on task context
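For example, message pattern matching against a database provider's error text might look like this sketch (DbException comes from System.Data.Common; the matched strings are illustrative):
// Retry only when the provider's message suggests a transient condition
public override IRetryPolicy? RetryPolicy => new LinearRetryPolicy(3, TimeSpan.FromSeconds(2))
    .HandleWhen(ex =>
        ex is DbException dbEx &&
        (dbEx.Message.Contains("deadlock", StringComparison.OrdinalIgnoreCase) ||
         dbEx.Message.Contains("timeout", StringComparison.OrdinalIgnoreCase)));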
Predefined Exception Sets
EverTask provides extension methods for common transient error patterns:
// Database errors only
public override IRetryPolicy? RetryPolicy => new LinearRetryPolicy(5, TimeSpan.FromSeconds(2))
.HandleTransientDatabaseErrors();
// Network errors only
public override IRetryPolicy? RetryPolicy => new LinearRetryPolicy(3, TimeSpan.FromSeconds(1))
.HandleTransientNetworkErrors();
// All transient errors (database + network)
public override IRetryPolicy? RetryPolicy => new LinearRetryPolicy(5, TimeSpan.FromSeconds(2))
.HandleAllTransientErrors();
Included Exception Types:
HandleTransientDatabaseErrors():
- DbException
- TimeoutException (database-related)
HandleTransientNetworkErrors():
- HttpRequestException
- SocketException
- WebException
- TaskCanceledException
HandleAllTransientErrors():
- All of the above combined
You can combine predefined sets with custom exceptions:
public override IRetryPolicy? RetryPolicy => new LinearRetryPolicy(5, TimeSpan.FromSeconds(2))
.HandleAllTransientErrors()
.Handle<MyCustomTransientException>();
Exception Filter Priority
When multiple filtering strategies are configured, they’re evaluated in this order:
- Predicate (HandleWhen): Takes precedence over all other filters
- Whitelist (Handle<T>): Only retry whitelisted exceptions
- Blacklist (DoNotHandle<T>): Retry all except blacklisted exceptions
- Default: Retry all exceptions except OperationCanceledException and TimeoutException
Important: You cannot mix Handle<T>() and DoNotHandle<T>() - choose either whitelist or blacklist approach, not both.
// Invalid - throws InvalidOperationException
var policy = new LinearRetryPolicy(3, TimeSpan.FromSeconds(1))
.Handle<DbException>()
.DoNotHandle<ArgumentException>(); // ERROR: Cannot mix approaches
Derived Exception Types
Exception filtering supports derived types using Type.IsAssignableFrom():
public override IRetryPolicy? RetryPolicy => new LinearRetryPolicy(3, TimeSpan.FromSeconds(1))
.Handle<IOException>();
This will also retry:
- FileNotFoundException (derives from IOException)
- DirectoryNotFoundException (derives from IOException)
- PathTooLongException (derives from IOException)
You don’t need to explicitly whitelist every derived type.
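As a sketch (FileCopyTask, FileCopyHandler, and CopyFileAsync are hypothetical names), whitelisting the base type covers the whole hierarchy while everything else fails fast:
public class FileCopyHandler : EverTaskHandler<FileCopyTask>
{
    // Handle<IOException>() also matches FileNotFoundException,
    // DirectoryNotFoundException, PathTooLongException, and other derived types
    public override IRetryPolicy? RetryPolicy => new LinearRetryPolicy(3, TimeSpan.FromSeconds(1))
        .Handle<IOException>();

    public override async Task Handle(FileCopyTask task, CancellationToken ct)
    {
        // Exceptions outside the IOException hierarchy, such as
        // UnauthorizedAccessException or ArgumentException, fail immediately
        await CopyFileAsync(task.Source, task.Destination, ct);
    }
}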
OnRetry Lifecycle Callback
The OnRetry callback gives you visibility into individual retry attempts, enabling logging, metrics, alerting, and debugging of intermittent failures.
Basic Usage
Override OnRetry in your handler to track retry attempts:
public class SendEmailHandler : EverTaskHandler<SendEmailTask>
{
private readonly ILogger<SendEmailHandler> _logger;
public SendEmailHandler(ILogger<SendEmailHandler> logger)
{
_logger = logger;
}
public override ValueTask OnRetry(Guid taskId, int attemptNumber, Exception exception, TimeSpan delay)
{
_logger.LogWarning(exception,
"Email task {TaskId} retry attempt {Attempt} after {DelayMs}ms: {ErrorMessage}",
taskId, attemptNumber, delay.TotalMilliseconds, exception.Message);
return ValueTask.CompletedTask;
}
public override async Task Handle(SendEmailTask task, CancellationToken ct)
{
await _emailService.SendAsync(task.To, task.Subject, task.Body, ct);
}
}
When OnRetry is Called:
- Handle() executes and throws an exception
- Retry policy determines if the exception should be retried (ShouldRetry())
- If yes: retry policy waits for the delay period
- OnRetry() is called with attempt details
- Handle() is retried
Important: OnRetry is only called for retry attempts, not the initial execution. If a task succeeds on the first attempt, OnRetry is never called.
Tracking Metrics
Use OnRetry to track retry metrics for monitoring and alerting:
public class MetricsTrackingHandler : EverTaskHandler<MyTask>
{
private readonly IMetrics _metrics;
public override ValueTask OnRetry(Guid taskId, int attemptNumber, Exception exception, TimeSpan delay)
{
_metrics.IncrementCounter("task_retries", new
{
handler = GetType().Name,
attempt = attemptNumber,
exception_type = exception.GetType().Name
});
_metrics.RecordHistogram("retry_delay_ms", delay.TotalMilliseconds);
return ValueTask.CompletedTask;
}
}
Circuit Breaker Pattern
Implement basic circuit breaker logic using OnRetry:
public class CircuitBreakerHandler : EverTaskHandler<ExternalApiTask>
{
private static int _consecutiveFailures = 0;
private static DateTimeOffset? _circuitOpenedAt = null;
private readonly ILogger<CircuitBreakerHandler> _logger;
public override ValueTask OnRetry(Guid taskId, int attemptNumber, Exception exception, TimeSpan delay)
{
Interlocked.Increment(ref _consecutiveFailures);
if (_consecutiveFailures >= 10 && _circuitOpenedAt == null)
{
_circuitOpenedAt = DateTimeOffset.UtcNow;
_logger.LogError(
"Circuit breaker opened due to {Failures} consecutive failures",
_consecutiveFailures);
// Send alerts, disable service, etc.
}
return ValueTask.CompletedTask;
}
public override async Task Handle(ExternalApiTask task, CancellationToken ct)
{
// Check circuit breaker
if (_circuitOpenedAt.HasValue &&
DateTimeOffset.UtcNow - _circuitOpenedAt.Value < TimeSpan.FromMinutes(5))
{
throw new InvalidOperationException("Circuit breaker is open");
}
await _apiClient.CallAsync(task.Endpoint, ct);
// Success - reset circuit breaker
Interlocked.Exchange(ref _consecutiveFailures, 0);
_circuitOpenedAt = null;
}
}
Note: For production circuit breaker implementations, consider using Polly’s circuit breaker policy (see the Circuit Breaker with Polly section below) instead of manual tracking.
Debugging Intermittent Failures
Use OnRetry to capture diagnostic information for failures that only occur occasionally:
public class DiagnosticHandler : EverTaskHandler<DataProcessingTask>
{
private readonly IDiagnosticService _diagnostics;
public override ValueTask OnRetry(Guid taskId, int attemptNumber, Exception exception, TimeSpan delay)
{
_diagnostics.CaptureSnapshot(new
{
TaskId = taskId,
Attempt = attemptNumber,
Exception = exception.ToString(),
StackTrace = exception.StackTrace,
Timestamp = DateTimeOffset.UtcNow,
Environment = new
{
MachineName = Environment.MachineName,
ThreadId = Environment.CurrentManagedThreadId,
WorkingSet = Environment.WorkingSet
}
});
return ValueTask.CompletedTask;
}
}
Error Handling in OnRetry
If OnRetry throws an exception, it’s logged but does not prevent the retry attempt. The retry proceeds regardless of callback success or failure:
public override ValueTask OnRetry(Guid taskId, int attemptNumber, Exception exception, TimeSpan delay)
{
// If this throws, it's logged but retry still happens
_externalMetricsService.TrackRetry(taskId, attemptNumber);
return ValueTask.CompletedTask;
}
This ensures that monitoring/logging failures don’t impact task execution reliability.
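If you prefer not to rely on that framework behavior, a defensive sketch wraps the side effect yourself (using the same fields as the examples above):
public override ValueTask OnRetry(Guid taskId, int attemptNumber, Exception exception, TimeSpan delay)
{
    try
    {
        _externalMetricsService.TrackRetry(taskId, attemptNumber);
    }
    catch (Exception trackingEx)
    {
        // Swallowed deliberately: observability must never affect the retry flow
        _logger.LogDebug(trackingEx, "Retry tracking failed for task {TaskId}", taskId);
    }
    return ValueTask.CompletedTask;
}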
Async Operations in OnRetry
OnRetry returns ValueTask, allowing async operations like database logging:
public override async ValueTask OnRetry(Guid taskId, int attemptNumber, Exception exception, TimeSpan delay)
{
// Log retry to database for audit trail
await _auditDb.LogRetryAttempt(new RetryAuditEntry
{
TaskId = taskId,
AttemptNumber = attemptNumber,
ExceptionType = exception.GetType().Name,
ExceptionMessage = exception.Message,
Delay = delay,
Timestamp = DateTimeOffset.UtcNow
});
}
Combining Exception Filtering and OnRetry
You can use both features together for comprehensive retry handling:
public class RobustDatabaseHandler : EverTaskHandler<DatabaseTask>
{
private readonly ILogger<RobustDatabaseHandler> _logger;
private readonly IMetrics _metrics;
public override IRetryPolicy? RetryPolicy => new LinearRetryPolicy(
new[]
{
TimeSpan.FromSeconds(1),
TimeSpan.FromSeconds(2),
TimeSpan.FromSeconds(5),
TimeSpan.FromSeconds(10)
})
.HandleTransientDatabaseErrors();
public override ValueTask OnRetry(Guid taskId, int attemptNumber, Exception exception, TimeSpan delay)
{
_logger.LogWarning(exception,
"Database task {TaskId} retry {Attempt}/{MaxAttempts} after {DelayMs}ms",
taskId, attemptNumber, 4, delay.TotalMilliseconds);
_metrics.IncrementCounter("db_task_retries", new
{
attempt = attemptNumber,
exception = exception.GetType().Name
});
return ValueTask.CompletedTask;
}
public override async Task Handle(DatabaseTask task, CancellationToken ct)
{
await _dbContext.ProcessAsync(task.Data, ct);
}
}
Result:
- Only database exceptions trigger retries (fail-fast on logic errors)
- Each retry attempt is logged with context
- Metrics track retry patterns for monitoring
- Increasing backoff delays give the database time to recover
Circuit Breaker with Polly
public class CircuitBreakerRetryPolicy : IRetryPolicy
{
private readonly AsyncPolicy _policy;
public CircuitBreakerRetryPolicy()
{
var retryPolicy = Policy
.Handle<HttpRequestException>()
.WaitAndRetryAsync(3, retryAttempt => TimeSpan.FromSeconds(Math.Pow(2, retryAttempt)));
var circuitBreakerPolicy = Policy
.Handle<HttpRequestException>()
.CircuitBreakerAsync(
exceptionsAllowedBeforeBreaking: 5,
durationOfBreak: TimeSpan.FromMinutes(1));
// Combine retry + circuit breaker
_policy = Policy.WrapAsync(retryPolicy, circuitBreakerPolicy);
}
public async Task Execute(Func<CancellationToken, Task> action, CancellationToken token = default)
{
await _policy.ExecuteAsync(async (ct) => await action(ct), token);
}
}
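Usage mirrors the earlier Polly examples (a sketch; _apiClient follows the same convention as the handlers above). While the circuit is open, Polly throws BrokenCircuitException without invoking the action, so the task fails immediately:
// Use in handler
public class ResilientApiHandler : EverTaskHandler<ExternalApiTask>
{
    public override IRetryPolicy? RetryPolicy => new CircuitBreakerRetryPolicy();

    public override async Task Handle(ExternalApiTask task, CancellationToken ct)
    {
        // While the circuit is open, Execute throws BrokenCircuitException
        // before this logic runs again
        await _apiClient.CallAsync(task.Endpoint, ct);
    }
}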
Timeout Management
Timeouts prevent runaway tasks from consuming resources indefinitely. You can set them globally or per-handler.
Global Default Timeout
The simplest approach is setting a default timeout that applies to all tasks:
builder.Services.AddEverTask(opt =>
{
opt.SetDefaultTimeout(TimeSpan.FromMinutes(5)); // All tasks timeout after 5 minutes
});
Per-Handler Timeout
Different tasks have different performance characteristics, so you’ll often want to override the global timeout:
public class QuickTaskHandler : EverTaskHandler<QuickTask>
{
// This handler times out after 30 seconds
public override TimeSpan? Timeout => TimeSpan.FromSeconds(30);
public override async Task Handle(QuickTask task, CancellationToken cancellationToken)
{
// Must complete within 30 seconds
}
}
public class LongRunningTaskHandler : EverTaskHandler<LongRunningTask>
{
// This handler gets 2 hours
public override TimeSpan? Timeout => TimeSpan.FromHours(2);
public override async Task Handle(LongRunningTask task, CancellationToken cancellationToken)
{
// Long-running processing
}
}
Per-Queue Timeout
You can also set default timeouts at the queue level, which is useful when you organize tasks by execution characteristics:
builder.Services.AddEverTask(opt =>
{
opt.RegisterTasksFromAssembly(typeof(Program).Assembly);
})
.AddQueue("quick", q => q
.SetDefaultTimeout(TimeSpan.FromSeconds(30)))
.AddQueue("long-running", q => q
.SetDefaultTimeout(TimeSpan.FromHours(1)));
No Timeout
Sometimes you need a task to run to completion no matter how long it takes:
public class NoTimeoutHandler : EverTaskHandler<NoTimeoutTask>
{
// No timeout - runs until complete
public override TimeSpan? Timeout => null;
}
Even with Timeout set to null, the handler’s CancellationToken can still be signaled by explicit cancellation or service shutdown, so long-running work should still observe it.
Handling Timeout
When a task times out, EverTask cancels its CancellationToken. Your task code should check this token regularly to respond to cancellation:
public class TimeoutAwareHandler : EverTaskHandler<TimeoutAwareTask>
{
public override TimeSpan? Timeout => TimeSpan.FromMinutes(5);
public override async Task Handle(TimeoutAwareTask task, CancellationToken cancellationToken)
{
try
{
for (int i = 0; i < 1000; i++)
{
// Check cancellation regularly
cancellationToken.ThrowIfCancellationRequested();
await ProcessItemAsync(i, cancellationToken);
}
}
catch (OperationCanceledException)
{
_logger.LogWarning("Task timed out after {Timeout}", Timeout);
throw; // Re-throw to mark as timed out
}
}
}
CancellationToken Usage
The CancellationToken parameter in your handler is how you implement cooperative cancellation. It gets signaled in three situations:
- The task’s timeout is reached
- Someone explicitly cancels the task via dispatcher.Cancel(taskId)
- The WorkerService is shutting down
Basic Usage
public override async Task Handle(MyTask task, CancellationToken cancellationToken)
{
// Pass token to async operations
var data = await _repository.GetDataAsync(cancellationToken);
// Check before expensive operations
cancellationToken.ThrowIfCancellationRequested();
await ProcessDataAsync(data, cancellationToken);
}
Loop Processing
public override async Task Handle(BatchTask task, CancellationToken cancellationToken)
{
foreach (var item in task.Items)
{
// Check at the start of each iteration
cancellationToken.ThrowIfCancellationRequested();
await ProcessItemAsync(item, cancellationToken);
}
}
Parallel Processing
public override async Task Handle(ParallelTask task, CancellationToken cancellationToken)
{
await Parallel.ForEachAsync(
task.Items,
new ParallelOptions
{
MaxDegreeOfParallelism = 4,
CancellationToken = cancellationToken // Pass to parallel operations
},
async (item, ct) =>
{
await ProcessItemAsync(item, ct);
});
}
HTTP Requests
public override async Task Handle(ApiCallTask task, CancellationToken cancellationToken)
{
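// Note: a new HttpClient per call keeps this example short; prefer IHttpClientFactory in real applications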
using var httpClient = new HttpClient();
// Pass token to HTTP calls
var response = await httpClient.GetAsync(task.Url, cancellationToken);
var content = await response.Content.ReadAsStringAsync(cancellationToken);
}
Database Operations
public override async Task Handle(DataTask task, CancellationToken cancellationToken)
{
// Pass token to EF Core operations
var users = await _dbContext.Users
.Where(u => u.IsActive)
.ToListAsync(cancellationToken);
foreach (var user in users)
{
cancellationToken.ThrowIfCancellationRequested();
await ProcessUserAsync(user, cancellationToken);
}
}
Handling WorkerService Stops
When your application shuts down and the EverTask WorkerService stops, all running tasks get their CancellationToken cancelled.
Graceful Shutdown
If a task gets interrupted during shutdown, EverTask marks it as ServiceStopped and automatically re-queues it when your application restarts:
public override async Task Handle(MyTask task, CancellationToken cancellationToken)
{
try
{
await LongRunningOperationAsync(cancellationToken);
}
catch (OperationCanceledException) when (cancellationToken.IsCancellationRequested)
{
_logger.LogWarning("Task interrupted by service shutdown");
// Clean up any partial work
await CleanupAsync();
// Re-throw so task is marked as ServiceStopped
throw;
}
}
Partial Execution Tracking
For long-running tasks, you might want to track progress so they can resume where they left off after a restart:
public record ProgressTrackingTask(Guid BatchId, int LastProcessedIndex) : IEverTask;
public class ProgressTrackingHandler : EverTaskHandler<ProgressTrackingTask>
{
public override async Task Handle(ProgressTrackingTask task, CancellationToken cancellationToken)
{
var items = await GetBatchItemsAsync(task.BatchId);
// On restart the original payload is re-delivered, so resume from persisted progress
// (LoadProgressAsync is a hypothetical counterpart to the SaveProgressAsync call below)
var startIndex = Math.Max(task.LastProcessedIndex, await LoadProgressAsync(task.BatchId));
for (int i = startIndex; i < items.Count; i++)
{
cancellationToken.ThrowIfCancellationRequested();
await ProcessItemAsync(items[i], cancellationToken);
// Save progress periodically
if (i % 10 == 0)
{
await SaveProgressAsync(task.BatchId, i);
}
}
}
}
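The first dispatch starts at index 0; after a restart, the re-queued run resumes from the saved progress (a sketch, assuming the dispatcher’s Dispatch method shown in the Error Observation section; batchId is a placeholder):
// Initial dispatch - begins at index 0; persisted progress handles resumption
await _dispatcher.Dispatch(new ProgressTrackingTask(batchId, 0));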
Idempotent Operations
The best practice is designing tasks that can be safely retried multiple times without causing problems:
public override async Task Handle(IdempotentTask task, CancellationToken cancellationToken)
{
// Check if already processed
if (await IsAlreadyProcessedAsync(task.OperationId))
{
_logger.LogInformation("Task {OperationId} already completed, skipping", task.OperationId);
return;
}
// Process
await ProcessAsync(task, cancellationToken);
// Mark as completed
await MarkAsProcessedAsync(task.OperationId);
}
Error Observation
Beyond just retrying, you can observe and react to errors using lifecycle hooks in your handlers:
OnError Hook
public override ValueTask OnError(Guid taskId, Exception? exception, string? message)
{
_logger.LogError(exception, "Task {TaskId} failed: {Message}", taskId, message);
// Send alerts
if (exception is CriticalException)
{
_alertService.SendCriticalAlert(taskId, exception);
}
// Log to external monitoring
_telemetry.TrackException(exception);
// Dispatch compensation task
if (_shouldRollback)
{
_dispatcher.Dispatch(new RollbackTask(taskId));
}
return ValueTask.CompletedTask;
}
Error Categorization
public override ValueTask OnError(Guid taskId, Exception? exception, string? message)
{
switch (exception)
{
case OperationCanceledException:
_logger.LogWarning("Task {TaskId} was cancelled", taskId);
break;
case TimeoutException:
_logger.LogWarning("Task {TaskId} timed out", taskId);
// Maybe retry with longer timeout
break;
case HttpRequestException httpEx:
_logger.LogError(httpEx, "API call failed for task {TaskId}", taskId);
// Retry policy already handled this, just log
break;
case DbUpdateException dbEx:
_logger.LogError(dbEx, "Database error for task {TaskId}", taskId);
// Might need manual intervention
_alertService.SendDatabaseAlert(taskId, dbEx);
break;
default:
_logger.LogError(exception, "Unexpected error for task {TaskId}", taskId);
break;
}
return ValueTask.CompletedTask;
}
Best Practices
Retry Policies
- Use retries for transient failures - Things like network errors, database timeouts, or temporary service unavailability are good candidates for retry logic
- Don’t retry permanent failures - Validation errors, 404s, or authentication failures won’t fix themselves on retry
- Use exception filtering to fail fast - Configure Handle<T>() to only retry transient errors, saving resources and improving error visibility
- Implement exponential backoff - Give failing services breathing room instead of hammering them with requests
- Set reasonable retry limits - Usually 3-5 attempts is enough; more than that and you're probably dealing with a non-transient issue
- Log retry attempts - Use the OnRetry callback to track patterns in retry behavior that reveal systemic problems
// ✅ Good: Exception filtering with transient errors only
public class ApiTaskHandler : EverTaskHandler<ApiTask>
{
public override IRetryPolicy? RetryPolicy => new LinearRetryPolicy(3, TimeSpan.FromSeconds(2))
.HandleTransientNetworkErrors(); // Only retry network issues
public override async Task Handle(ApiTask task, CancellationToken cancellationToken)
{
var response = await _httpClient.GetAsync(task.Url, cancellationToken);
response.EnsureSuccessStatusCode();
}
}
// ❌ Bad: No exception filtering - retries validation errors
public class ValidationTaskHandler : EverTaskHandler<ValidationTask>
{
public override IRetryPolicy? RetryPolicy => new LinearRetryPolicy(3, TimeSpan.FromSeconds(1));
public override async Task Handle(ValidationTask task, CancellationToken cancellationToken)
{
if (!task.IsValid)
{
throw new ValidationException(); // Will retry unnecessarily!
}
}
}
// ✅ Better: Fail-fast on validation errors
public class BetterValidationHandler : EverTaskHandler<ValidationTask>
{
public override IRetryPolicy? RetryPolicy => new LinearRetryPolicy(3, TimeSpan.FromSeconds(1))
.DoNotHandle<ValidationException>(); // Fail immediately on validation errors
public override async Task Handle(ValidationTask task, CancellationToken cancellationToken)
{
if (!task.IsValid)
{
throw new ValidationException(); // No retry, immediate failure
}
}
}
Exception Filtering Best Practices
- Whitelist approach for specific integrations - Use Handle<T>() when you know exactly which errors are transient (database, HTTP API)
- Blacklist approach for general handlers - Use DoNotHandle<T>() when most errors are retriable except specific permanent ones
- Use predefined sets - HandleTransientDatabaseErrors() and HandleTransientNetworkErrors() cover common scenarios
- Consider derived types - Exception filtering matches derived types automatically (e.g., Handle<IOException>() catches FileNotFoundException)
- Don't over-filter - Only filter when you're confident an error is truly permanent vs. transient
// ✅ Good: Whitelist for database operations
public class DatabaseTaskHandler : EverTaskHandler<DatabaseTask>
{
public override IRetryPolicy? RetryPolicy => new LinearRetryPolicy(5, TimeSpan.FromSeconds(2))
.HandleTransientDatabaseErrors(); // Retry DB timeouts, deadlocks, etc.
}
// ✅ Good: Predicate for HTTP status codes
public class HttpApiHandler : EverTaskHandler<HttpApiTask>
{
public override IRetryPolicy? RetryPolicy => new LinearRetryPolicy(3, TimeSpan.FromSeconds(1))
.HandleWhen(ex => ex is HttpRequestException httpEx && (int?)httpEx.StatusCode >= 500);
// Only retry 5xx server errors, fail fast on 4xx client errors
}
// ❌ Bad: Mixing whitelist and blacklist
public class BadHandler : EverTaskHandler<BadTask>
{
public override IRetryPolicy? RetryPolicy => new LinearRetryPolicy(3, TimeSpan.FromSeconds(1))
.Handle<HttpRequestException>()
.DoNotHandle<ArgumentException>(); // ERROR: Cannot mix approaches!
}
OnRetry Callback Best Practices
- Keep callbacks fast - OnRetry is invoked synchronously during the retry flow; avoid expensive operations
- Use for observability - Log retry attempts, track metrics, send alerts on excessive retries
- Don't throw exceptions - Exceptions in OnRetry are logged but don't prevent retry (by design)
- Use structured logging - Include task ID, attempt number, and exception type for easy filtering
- Track retry metrics - Monitor retry rates to detect systemic issues early
// ✅ Good: Fast, informative OnRetry callback
public class EmailHandler : EverTaskHandler<SendEmailTask>
{
private readonly ILogger<EmailHandler> _logger;
private readonly IMetrics _metrics;
public override ValueTask OnRetry(Guid taskId, int attemptNumber, Exception exception, TimeSpan delay)
{
_logger.LogWarning(exception,
"Email task {TaskId} retry {Attempt} after {DelayMs}ms",
taskId, attemptNumber, delay.TotalMilliseconds);
_metrics.Increment("email_retries", new { attempt = attemptNumber });
return ValueTask.CompletedTask;
}
}
// ❌ Bad: Expensive operations in OnRetry
public class SlowHandler : EverTaskHandler<SlowTask>
{
public override async ValueTask OnRetry(Guid taskId, int attemptNumber, Exception exception, TimeSpan delay)
{
// BAD: Expensive HTTP call blocks retry
await _httpClient.PostAsync("https://metrics-api.com/track", ...);
// BAD: Database query on hot path
await _dbContext.RetryLogs.AddAsync(new RetryLog { ... });
await _dbContext.SaveChangesAsync();
}
}
Timeouts
- Set appropriate timeouts - Base them on expected execution time plus a reasonable buffer
- Monitor timeout rates - If tasks are timing out frequently, you’ve got a performance problem to investigate
- Handle timeouts gracefully - Clean up resources and save state when possible
- Different timeouts for different task types - A quick API call shouldn’t have the same timeout as a report generation job
// ✅ Good: Appropriate timeout for task type
public class QuickApiCallHandler : EverTaskHandler<QuickApiCallTask>
{
// Quick API call
public override TimeSpan? Timeout => TimeSpan.FromSeconds(30);
}
public class ReportGenerationHandler : EverTaskHandler<ReportGenerationTask>
{
// Long-running report
public override TimeSpan? Timeout => TimeSpan.FromMinutes(30);
}
CancellationToken
- Always check the token - Put checks in loops and before expensive operations
- Pass to all async operations - Let the framework propagate cancellation through your call stack
- Don’t catch OperationCanceledException - Unless you need specific cleanup, let it bubble up
- Test cancellation - Actually verify your handlers respond correctly to cancellation signals
// ✅ Good: Proper cancellation handling
public override async Task Handle(MyTask task, CancellationToken cancellationToken)
{
for (int i = 0; i < 1000; i++)
{
cancellationToken.ThrowIfCancellationRequested();
await ProcessAsync(i, cancellationToken);
}
}
// ❌ Bad: Ignoring cancellation token
public override async Task Handle(MyTask task, CancellationToken cancellationToken)
{
for (int i = 0; i < 1000; i++)
{
await ProcessAsync(i); // Not passing token!
}
}
Error Handling
- Log errors appropriately - Match log levels to severity
- Don’t swallow exceptions - Your retry policy can’t work if you catch everything
- Use OnError for side effects - Things like alerting, telemetry, or triggering compensation tasks
- Design for idempotency - Make sure tasks can be safely retried without causing duplicate side effects
// ✅ Good: Let exceptions bubble for retry
public override async Task Handle(MyTask task, CancellationToken cancellationToken)
{
await ProcessAsync(task, cancellationToken); // Exceptions propagate to retry policy
}
// ❌ Bad: Swallowing exceptions
public override async Task Handle(MyTask task, CancellationToken cancellationToken)
{
try
{
await ProcessAsync(task, cancellationToken);
}
catch (Exception ex)
{
_logger.LogError(ex, "Error occurred");
// Exception swallowed - retry policy won't trigger!
}
}
Next Steps
- Monitoring - Track task failures and retries
- Advanced Features - Continuations and error compensation
- Configuration Reference - All timeout and retry options
- Architecture - How retry and timeout mechanisms work internally