Load Function
The load function is the core equation that decides which agent gets the resource next.
The Equation
| Term | What it is |
|---|---|
Qi |
Weighted queue depth -- sum of task.weight in agent i's queue |
Dmax_i |
Age of the oldest waiting task (measured in ticks) |
alpha |
Tradeoff: 0.0 = latency-first, 0.5 = throughput-first |
Both terms are normalized across all competing agents. An agent with Q=10 when everyone else has Q=10 scores the same as Q=1 when everyone else has Q=1. Relative load, not absolute.
The agent with the highest L(i) gets the resource next.
Ticks
A tick is one unit of work completed -- not wall clock time. Each release() increments the tick counter and ages all waiting tasks by 1.
- Under heavy load: ticks fire fast, priority shifts quickly
- Under low load: ticks fire slowly, priority barely shifts
- No contention: no scoring needed, immediate grant
Priority only shifts when there's actual contention. This is by design -- scheduling overhead is zero when the system isn't loaded.
Alpha
Alpha controls the tradeoff between latency (serve longest-waiting agents) and throughput (serve deepest-backlog agents).
| Setting | alpha | Behavior | Use when |
|---|---|---|---|
"latency" |
0.0 | Serve longest-waiting agents first | Webhooks, user-facing requests |
"balanced" |
0.25 | Default | Most workloads |
"throughput" |
0.5 | Serve deepest-backlog agents first | Batch processing, ETL |
from loco import AsyncLOCOScheduler
# Using the string API
scheduler = AsyncLOCOScheduler(agents, resource, optimize_for="balanced")
# Using the numeric API
scheduler = AsyncLOCOScheduler(agents, resource, alpha=0.25)
Adaptive Alpha
With auto_tune=True, the scheduler adjusts alpha automatically based on observed wait-time variance:
- High wait-time variance across agents: alpha nudges down (toward latency/fairness)
- Growing queue depths: alpha nudges up (toward throughput)
- Clamped to [0.0, 0.5] safe range
Task Weight
Task weight is a cost proxy set at submit time. The scheduler uses it for queue depth scoring.
| Model tier | Typical weight |
|---|---|
| haiku / gpt-4o-mini / gemini-flash | 1.0 |
| sonnet / gpt-4o / gemini-pro | 2.0--3.0 |
| opus / o1 | 5.0 |
With the convenience API:
With the full API:
Framework adapters compute weight automatically from model name and prompt length.
Scoring at Grant Time
A critical design choice: scores are computed when a resource slot becomes available (grant time), not when the agent first calls acquire() (request time).
This prevents priority inversion. An agent that has been waiting for 10 ticks will have a higher Dmax than one that just arrived, even if the newcomer has a deeper queue. The waiting agent's urgency grows over time.