SLO Error Budgets
Track wait-time SLO violations with a state machine. SLOBudget is an observability primitive: it monitors completed tasks and reports when your scheduling quality is degrading.
Quick Setup
from loco import SLOBudget, SLOState
slo = SLOBudget(
    target_wait=20.0,  # max acceptable wait time in ticks
    window=100,        # rolling observation window
)
How It Works
After each task completes, call slo.record() with the agent ID and the task. SLOBudget checks whether the task's wait time (age) exceeded the target and tracks the violation rate over a rolling window.
from loco import AsyncLOCOScheduler
scheduler = AsyncLOCOScheduler(
    agents, resource,
    on_task_completed=lambda agent_id, task, _: slo.record(agent_id, task),
)
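To make the rolling-window idea concrete, here is a minimal, self-contained sketch of how a violation rate over the last N observations could be tracked. It is not loco's implementation: the class name RollingViolationWindow and its record(wait_time) signature are hypothetical, and it assumes a violation means the wait time is strictly greater than the target.

```python
from collections import deque


class RollingViolationWindow:
    """Hypothetical stand-in for illustration; not loco's SLOBudget."""

    def __init__(self, target_wait: float, window: int) -> None:
        self.target_wait = target_wait
        # A deque with maxlen silently drops the oldest entry when full.
        self._violations = deque(maxlen=window)

    def record(self, wait_time: float) -> float:
        """Store whether this observation violated the target; return the rate."""
        self._violations.append(wait_time > self.target_wait)
        return self.violation_rate

    @property
    def violation_rate(self) -> float:
        """Fraction of observations in the window that exceeded the target."""
        if not self._violations:
            return 0.0
        return sum(self._violations) / len(self._violations)


tracker = RollingViolationWindow(target_wait=20.0, window=4)
for wait in (5.0, 25.0, 30.0, 10.0):
    tracker.record(wait)

rate = tracker.violation_rate  # 0.5: two of the four waits exceed 20.0
```

Because the window is bounded, each new observation evicts the oldest one once the window is full, so the rate reflects only recent tasks.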
State Machine
The SLO state is derived from the violation rate over the rolling window, that is, the fraction of your error budget that has been consumed:
| State | Default Threshold | Meaning |
|---|---|---|
| HEALTHY | violation rate < 75% | Error budget is mostly intact |
| WARNING | violation rate >= 75% | Error budget is burning fast |
| CRITICAL | violation rate >= 90% | Nearly out of error budget |
| EXHAUSTED | violation rate = 100% | Every task is violating the SLO |
States can improve as violations slide out of the rolling window.
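The threshold table can be read as a simple classification of the current violation rate. The sketch below shows one way to express it, checking the most severe state first. The State enum and classify function are hypothetical stand-ins, not loco's SLOState or its internal logic; the warn and critical parameter names mirror the SLOBudget keyword arguments, and the defaults are taken from the table.

```python
from enum import Enum


class State(Enum):
    """Hypothetical stand-in for loco's SLOState."""
    HEALTHY = "healthy"
    WARNING = "warning"
    CRITICAL = "critical"
    EXHAUSTED = "exhausted"


def classify(violation_rate: float, warn: float = 0.75, critical: float = 0.90) -> State:
    """Map a violation rate in [0, 1] to a state, most severe first."""
    if violation_rate >= 1.0:
        return State.EXHAUSTED
    if violation_rate >= critical:
        return State.CRITICAL
    if violation_rate >= warn:
        return State.WARNING
    return State.HEALTHY


print(classify(0.12))  # State.HEALTHY
print(classify(0.80))  # State.WARNING
print(classify(0.95))  # State.CRITICAL
print(classify(1.00))  # State.EXHAUSTED
```

Ordering the checks from most to least severe means each rate matches exactly one state, even though a rate of 100% also satisfies the WARNING and CRITICAL thresholds.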
Checking State
slo.state # SLOState.HEALTHY
slo.violation_rate # 0.12 (12% of tasks violated)
slo.budget_remaining # 0.88 (88% of error budget left)
slo.total_violations # 47 (lifetime, not just window)
slo.total_observations # 389
Custom Thresholds
slo = SLOBudget(
    target_wait=10.0,
    window=50,
    warn=0.5,      # WARNING at 50% violation rate
    critical=0.8,  # CRITICAL at 80%
)
Example: Alert on State Change
prev_state = SLOState.HEALTHY

def on_task_done(agent_id, task, _):
    global prev_state
    new_state = slo.record(agent_id, task)  # record() returns the current state
    if new_state != prev_state:
        print(f"SLO state changed: {prev_state} -> {new_state}")
        if new_state in (SLOState.CRITICAL, SLOState.EXHAUSTED):
            # send_alert is a placeholder for your own notification hook
            send_alert(f"SLO degraded to {new_state}")
        prev_state = new_state
scheduler = AsyncLOCOScheduler(
    agents, resource,
    on_task_completed=on_task_done,
)
Recovery
As good observations enter the rolling window and violations drop out, the state improves automatically:
# Window of 5, all violations -> EXHAUSTED
# Then 3 good observations push out 3 violations
# Window: [violation, violation, pass, pass, pass] -> 40% -> HEALTHY
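The recovery scenario above can be traced step by step with a short simulation. This is a sketch under stated assumptions rather than loco's code: it models the window as a plain deque of booleans, uses the default thresholds from the State Machine table, and assumes the state is recomputed from the current window rate with no hysteresis. The label helper is hypothetical.

```python
from collections import deque


def label(rate: float, warn: float = 0.75, critical: float = 0.90) -> str:
    """Hypothetical helper: name the state for a given violation rate."""
    if rate >= 1.0:
        return "EXHAUSTED"
    if rate >= critical:
        return "CRITICAL"
    if rate >= warn:
        return "WARNING"
    return "HEALTHY"


# Window of 5 filled with violations (True marks a violation).
window = deque([True] * 5, maxlen=5)
trace = [(sum(window) / len(window), label(sum(window) / len(window)))]

# Three good observations each push one violation out of the window.
for _ in range(3):
    window.append(False)
    rate = sum(window) / len(window)
    trace.append((rate, label(rate)))

for rate, state in trace:
    print(f"{rate:.0%} -> {state}")
# 100% -> EXHAUSTED
# 80% -> WARNING
# 60% -> HEALTHY
# 40% -> HEALTHY
```

Under these assumptions the budget passes through WARNING after the first good observation and is already HEALTHY after the second, one step before the 40% point shown above.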
Reset
Reset the budget at the start of each new cycle, for example when you run daily or weekly SLO budget cycles.