Architecture & Internals

This document explains how EverTask works internally, its architecture, performance characteristics, and design decisions.

Overview
Task Execution Flow
Efficient Task Processing
Scheduler Architecture
Performance Optimizations
Threading Model
Design Principles

Overview

EverTask is a high-performance background task execution library built for persistence and reliability. The architecture is built around several key principles:

Event-driven scheduling instead of polling
Aggressive caching to minimize allocations
Fully asynchronous execution from top to bottom
Handles extreme loads (>10k tasks/sec without breaking a sweat)
Built-in resilience with retry policies, timeouts, and graceful degradation

Component Overview

┌─────────────┐
│ Application │
└──────┬──────┘
       │ Dispatch
       ▼
┌─────────────┐     ┌──────────────┐
│  Dispatcher │────>│ TaskStorage  │
└──────┬──────┘     └──────────────┘
       │
       ▼
┌─────────────┐     ┌──────────────┐
│   Scheduler │────>│ PriorityQueue│
└──────┬──────┘     └──────────────┘
       │
       ▼
┌─────────────┐     ┌──────────────┐
│ WorkerQueue │────>│ BoundedQueue │
└──────┬──────┘     └──────────────┘
       │
       ▼
┌─────────────┐     ┌──────────────┐
│   Workers   │────>│   Handlers   │
└─────────────┘     └──────────────┘

Task Execution Flow

1. Task Dispatch

When you call dispatcher.Dispatch():

Application Code
    │
    ▼
ITaskDispatcher.Dispatch(task)
    │
    ├──> Serialize task (if storage configured)
    ├──> Persist to storage
    ├──> Determine target queue
    │
    ├──> Immediate execution?
    │    ├──> Yes: Enqueue to WorkerQueue
    │    └──> No: Schedule with Scheduler
    │
    └──> Return task ID

2. Immediate Execution Path

For immediate (fire-and-forget) tasks:

Dispatcher
    │
    ▼
WorkerQueueManager.TryEnqueue()
    │
    ├──> Lookup target queue
    ├──> Check queue capacity
    │
    └──> Write to BoundedQueue (Channel)
         │
         ▼
    WorkerService receives task
         │
         ▼
    WorkerExecutor.Execute()
         │
         ├──> Create service scope
         ├──> Resolve handler
         ├──> Apply retry policy
         ├──> Apply timeout
         ├──> Execute handler.Handle()
         ├──> Update storage status
         └──> Publish monitoring events

3. Delayed/Scheduled Execution Path

For delayed or scheduled tasks:

Dispatcher
    │
    ▼
Scheduler.Schedule(task, executionTime)
    │
    ├──> Calculate delay
    ├──> Add to PriorityQueue (ordered by execution time)
    │
    └──> Wake up scheduler timer
         │
         ▼
    Scheduler background thread
         │
         ├──> Calculate next wake time
         ├──> Sleep until next task
         │
         └──> On wake: Dequeue ready tasks
              │
              └──> Dispatch to WorkerQueue

4. Recurring Task Cycle

Recurring tasks follow a continuous cycle:

Initial Dispatch
    │
    ▼
Scheduler schedules first execution
    │
    ▼
Task executes in WorkerExecutor
    │
    ▼
Handler.Handle() completes
    │
    ▼
WorkerExecutor checks if recurring
    │
    ├──> Calculate next execution time
    ├──> Increment run count
    ├──> Check MaxRuns/RunUntil
    │
    └──> Re-schedule with Scheduler
         │
         └──> (Cycle continues)

Efficient Task Processing

EverTask avoids polling entirely, using an event-driven approach for maximum efficiency.

BoundedQueue Architecture

We use System.Threading.Channels.BoundedChannel<T> for the worker queues:

// Conceptual implementation
var channel = Channel.CreateBounded<TaskHandlerExecutor>(new BoundedChannelOptions(capacity)
{
    FullMode = BoundedChannelFullMode.Wait // Block when full
});

// Writer (Dispatcher)
await channel.Writer.WriteAsync(taskExecutor);

// Reader (WorkerService)
await foreach (var task in channel.Reader.ReadAllAsync(cancellationToken))
{
    _ = ExecuteTaskAsync(task); // Fire-and-forget execution
}

Why channels? They give us zero polling overhead through OS-level signaling, built-in backpressure that blocks when full, and lock-free operation for better concurrency under load.

ConcurrentPriorityQueue

For scheduled tasks, we use a custom priority queue:

// Tasks ordered by execution time
public class ConcurrentPriorityQueue<T>
{
    private readonly SortedSet<(DateTimeOffset ExecutionTime, T Item)> _queue;
    private readonly SemaphoreSlim _semaphore;

    public void Enqueue(T item, DateTimeOffset executionTime)
    {
        lock (_lock)
        {
            _queue.Add((executionTime, item));
        }
        _semaphore.Release(); // Wake up scheduler
    }

    public bool TryPeek(out T item, out DateTimeOffset executionTime)
    {
        lock (_lock)
        {
            if (_queue.Count > 0)
            {
                var first = _queue.First();
                item = first.Item;
                executionTime = first.ExecutionTime;
                return true;
            }
        }
        item = default;
        executionTime = default;
        return false;
    }
}

The priority queue gives us O(log n) insert and remove operations, and we always know exactly when the next task needs to run. Locks are fine here since the scheduler isn’t on the hot path.

Scheduler Architecture

PeriodicTimerScheduler (v2.0+, Default)

The default high-performance scheduler:

public class PeriodicTimerScheduler : IScheduler
{
    private readonly ConcurrentPriorityQueue<QueuedTask> _queue;
    private readonly SemaphoreSlim _wakeSignal;
    private readonly CancellationTokenSource _cts;

    private async Task RunAsync()
    {
        while (!_cts.Token.IsCancellationRequested)
        {
            // Calculate delay to next task
            var delay = CalculateNextDelay();

            if (delay == Timeout.InfiniteTimeSpan)
            {
                // No tasks - wait for signal
                await _wakeSignal.WaitAsync(_cts.Token);
            }
            else
            {
                // Wait until next task OR new task scheduled
                await _wakeSignal.WaitAsync(delay, _cts.Token);
            }

            // Dequeue and dispatch ready tasks
            await DequeueReadyTasksAsync();
        }
    }

    public void Schedule(QueuedTask task, DateTimeOffset executionTime)
    {
        _queue.Enqueue(task, executionTime);
        _wakeSignal.Release(); // Wake up immediately
    }
}

The scheduler uses dynamic delays, sleeping only until the next task is due. When idle, it uses zero CPU. New tasks signal immediately for wake-up, resulting in minimal lock contention compared to the old timer-based approach.

Performance Comparison:

Scheduler	CPU Usage (Idle)	Lock Contention	Throughput
TimerScheduler (v1.x)	~0.5-1%	Moderate	~5-10k/sec
PeriodicTimerScheduler (v2.0+)	~0%	Low	~10-15k/sec
ShardedScheduler (v2.0+)	~0%	Very Low	~20-40k/sec

ShardedScheduler (v2.0+, Opt-in)

For extreme high-load scenarios (>10k Schedule() calls/sec):

public class ShardedScheduler : IScheduler
{
    private readonly PeriodicTimerScheduler[] _shards;
    private readonly int _shardCount;

    public ShardedScheduler(int shardCount, ...)
    {
        _shardCount = shardCount;
        _shards = new PeriodicTimerScheduler[shardCount];

        for (int i = 0; i < shardCount; i++)
        {
            _shards[i] = new PeriodicTimerScheduler(...);
        }
    }

    public void Schedule(QueuedTask task, DateTimeOffset executionTime)
    {
        // Hash-based distribution
        var shard = GetShard(task.PersistenceId);
        _shards[shard].Schedule(task, executionTime);
    }

    private int GetShard(Guid taskId)
    {
        return Math.Abs(taskId.GetHashCode()) % _shardCount;
    }
}

Sharding distributes the work across multiple independent schedulers, each operating in parallel. Lock contention gets divided by the shard count, and issues in one shard won’t affect others. You’ll see 2-4x throughput improvement for workloads exceeding 10k Schedule() calls per second.

The trade-off? You’ll use additional memory (around 300 bytes per shard) and additional threads (one per shard), plus debugging becomes slightly more complex.

Performance Optimizations

EverTask v2.0 includes several major performance improvements:

Reflection Caching (Dispatcher)

// v1.x: Reflection on every dispatch
var wrapperType = typeof(TaskHandlerWrapper<>).MakeGenericType(task.GetType());
var wrapper = (TaskHandlerWrapper)Activator.CreateInstance(wrapperType, task);

// v2.0: Compiled expression cache
private static readonly ConcurrentDictionary<Type, Func<IEverTask, TaskHandlerWrapper>> _wrapperCache = new();

var factory = _wrapperCache.GetOrAdd(task.GetType(), type =>
{
    var wrapperType = typeof(TaskHandlerWrapper<>).MakeGenericType(type);
    var ctor = wrapperType.GetConstructor(new[] { type });
    var param = Expression.Parameter(typeof(IEverTask));
    var newExpr = Expression.New(ctor, Expression.Convert(param, type));
    return Expression.Lambda<Func<IEverTask, TaskHandlerWrapper>>(newExpr, param).Compile();
});

var wrapper = factory(task);

We compile the reflection once and cache it. Repeated dispatches of the same task type went from 150μs to 10μs - a 93% improvement.

Lazy Serialization (Dispatcher)

// v1.x: Always serialize
var queuedTask = executor.ToQueuedTask();
if (storage != null)
{
    await storage.AddAsync(queuedTask);
}

// v2.0: Serialize only when needed
QueuedTask? queuedTask = null;

if (storage != null)
{
    queuedTask = executor.ToQueuedTask(); // Only when storage configured
    await storage.AddAsync(queuedTask);
}

If you’re running in-memory only, why serialize? Now we skip it entirely when storage isn’t configured.

Event Data Caching (Worker Executor)

// v1.x: Serialize on every event
var eventData = new EverTaskEventData(
    taskId,
    DateTimeOffset.UtcNow,
    severity,
    task.GetType().AssemblyQualifiedName,
    handler.GetType().AssemblyQualifiedName,
    JsonConvert.SerializeObject(task), // Every time!
    message,
    exception);

// v2.0: Cache serialized data
private static readonly ConditionalWeakTable<IEverTask, string> _taskJsonCache = new();
private static readonly ConcurrentDictionary<Type, string> _typeNameCache = new();

var taskJson = _taskJsonCache.GetValue(task, t => JsonConvert.SerializeObject(t));
var typeName = _typeNameCache.GetOrAdd(task.GetType(), t => t.AssemblyQualifiedName);

Monitoring events were triggering 60k-80k JSON serializations per 10k tasks. Now it’s down to around 10-20 - a 99% reduction.

Handler Options Caching (Worker Executor)

// v1.x: Cast and read on every execution
var timeout = (handler as EverTaskHandler<T>)?.Timeout;
var retryPolicy = (handler as EverTaskHandler<T>)?.RetryPolicy;

// v2.0: Cache per handler type
private static readonly ConcurrentDictionary<Type, HandlerOptionsCache> _optionsCache = new();

var options = _optionsCache.GetOrAdd(handler.GetType(), _ => new HandlerOptionsCache
{
    Timeout = (handler as dynamic)?.Timeout,
    RetryPolicy = (handler as dynamic)?.RetryPolicy
});

We were doing runtime casts on every execution. Now we cache handler options per type, cutting casts from 10k to around 100 per 10k executions.

DbContext Pooling (Storage, v2.0+)

// v1.x: DbContext per scope
builder.Services.AddDbContext<TaskStoreDbContext>();

// v2.0: DbContext factory with pooling
builder.Services.AddDbContextFactory<TaskStoreDbContext>(options =>
    options.UseSqlServer(connectionString));

DbContext pooling alone gives us 30-50% faster storage operations and reduces memory allocations significantly.

SQL Server Stored Procedures (v2.0+)

-- Atomic status update + audit insert
CREATE PROCEDURE [EverTask].[SetTaskStatus]
    @TaskId UNIQUEIDENTIFIER,
    @Status INT,
    @AuditMessage NVARCHAR(MAX)
AS
BEGIN
    BEGIN TRANSACTION

    UPDATE [EverTask].[QueuedTasks]
    SET [Status] = @Status
    WHERE [PersistenceId] = @TaskId

    INSERT INTO [EverTask].[TaskAudit] (...)
    VALUES (...)

    COMMIT TRANSACTION
END

Status updates used to require multiple roundtrips. The stored procedure cuts that in half.

Threading Model

Worker Service Thread

Each queue gets a single background thread that consumes from the BoundedQueue:

protected override async Task ExecuteAsync(CancellationToken stoppingToken)
{
    await foreach (var queue in _queueManager.GetAllQueues())
    {
        _ = ProcessQueueAsync(queue, stoppingToken); // Fire-and-forget per queue
    }
}

private async Task ProcessQueueAsync(WorkerQueue queue, CancellationToken ct)
{
    await foreach (var task in queue.ReadAllAsync(ct))
    {
        // Execute with configured parallelism
        await _semaphore.WaitAsync(ct);

        _ = Task.Run(async () =>
        {
            try
            {
                await ExecuteTaskAsync(task, ct);
            }
            finally
            {
                _semaphore.Release();
            }
        }, ct);
    }
}

Scheduler Threads

PeriodicTimerScheduler: 1 background thread
ShardedScheduler: N background threads (1 per shard)

Task Execution Threads

Tasks execute on the thread pool via Task.Run:

_ = Task.Run(async () =>
{
    using var scope = _serviceProvider.CreateScope();
    var handler = scope.ServiceProvider.GetRequiredService<IEverTaskHandler<T>>();

    await retryPolicy.Execute(async ct =>
    {
        await handler.Handle(task, ct);
    }, timeoutCts.Token);
}, cancellationToken);

We let the .NET thread pool handle load balancing and graceful degradation under pressure. It’s efficient and scales well naturally.

Design Principles

1. Async All The Way

// All APIs are async
Task<Guid> Dispatch(IEverTask task);
Task Handle(TTask task, CancellationToken cancellationToken);
ValueTask OnCompleted(Guid taskId);

Asynchronous APIs from top to bottom maximize scalability for I/O-bound workloads.

2. Zero Polling

// No polling loops like this:
while (true)
{
    var tasks = await storage.GetPendingTasksAsync();
    if (tasks.Any())
    {
        // Process
    }
    await Task.Delay(1000); // Wasteful!
}

// Instead: Event-driven
await foreach (var task in channel.Reader.ReadAllAsync())
{
    // Process immediately when available
}

Event-driven design reduces CPU usage dramatically and improves responsiveness.

3. Minimal Allocations

// Caching strategies:
- Compiled expression cache for reflection
- ConditionalWeakTable for task JSON
- ConcurrentDictionary for type metadata
- Object pooling for frequently created types

Less allocation means less GC pressure and better throughput overall.

4. Fail-Safe Defaults

// Auto-scaling defaults
MaxDegreeOfParallelism = Environment.ProcessorCount * 2 (min 4)
ChannelCapacity = Environment.ProcessorCount * 200 (min 1000)
RetryPolicy = LinearRetryPolicy(3, 500ms)

The defaults are tuned to work well out-of-the-box for most scenarios. You can tweak them, but you probably won’t need to.

5. Extensibility

// Implement your own:
- ITaskStorage (custom persistence)
- IRetryPolicy (custom retry logic)
- IScheduler (custom scheduling)
- ITaskStoreDbContextFactory (custom EF Core integration)

Extension points let you adapt EverTask to your specific requirements without forking the library.

Performance Characteristics

Throughput

Scenario	Throughput	Notes
In-memory, fire-and-forget	~50k-100k tasks/sec	Limited by CPU
SQL Server persistence	~5k-10k tasks/sec	Limited by database
Scheduled tasks (default)	~5k-10k Schedule()/sec	PeriodicTimerScheduler
Scheduled tasks (sharded)	~20k-40k Schedule()/sec	ShardedScheduler

Latency

Operation	Latency	Notes
Dispatch (in-memory)	~10-50μs	Reflection cached
Dispatch (SQL Server)	~1-5ms	Database write
Schedule	~10-50μs	Priority queue insert
Task execution start	<10ms	From dispatch to handler start

Memory

Component	Memory Usage	Notes
Base overhead	~1-2 MB	Core services
Per task (queued)	~500 bytes	TaskHandlerExecutor
Per task (scheduled)	~600 bytes	QueuedTask in priority queue
Per shard	~300 bytes	ShardedScheduler
Caches (total)	~5-10 KB	Expression, type, JSON caches

Next Steps

Configuration Reference - Tune performance settings
Scalability - Multi-queue and sharded scheduler
Getting Started - Setup guide
Resilience - Error handling and retries