What gets monitored
Four signal types come out of the SDK. Each is independent — if one fails (e.g. queue depth on an unsupported broker) the others keep working.
Task lifecycle
The SDK hooks four Celery signals and emits one event per signal:
- `task_prerun` → `task-started`
- `task_postrun` → `task-succeeded` (when state is `SUCCESS`)
- `task_failure` → `task-failed`
- `task_retry` → `task-retried`
Every event carries the task ID, task name, worker hostname, retry count, args/kwargs, and a timestamp. task-started additionally carries the queue. task-succeeded carries a runtime in seconds. task-failed carries the exception repr() and the traceback string; task-retried carries the same shape but the exception is rendered with str() on Celery's retry reason.
Celery reuses the same task_id across retry attempts. The SDK leans into that: every event for an attempt is tagged with a retries counter (0 on first attempt, 1 on second, etc.), which is how the task chain view groups events under "Attempt 1 / Attempt 2 / …" headings.
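If you want to picture what the hooks do, here is a minimal sketch of the same four signal handlers. It is illustrative rather than the SDK's actual source: `emit()` stands in for whatever transport ships events upstream, and the `_started` map is just one way to produce the runtime field.

```python
import socket
import time

from celery import signals

_started = {}  # task_id -> monotonic start time, used to compute runtime


def emit(event, **payload):
    """Stand-in for the SDK's transport; assume it ships the event upstream."""


@signals.task_prerun.connect
def on_task_prerun(task_id=None, task=None, args=None, kwargs=None, **_):
    _started[task_id] = time.monotonic()
    emit(
        "task-started",
        task_id=task_id,
        name=task.name,
        hostname=socket.gethostname(),
        retries=task.request.retries,  # 0 on the first attempt, 1 on the second, ...
        # routing_key is typically the queue name under default routing
        queue=(task.request.delivery_info or {}).get("routing_key"),
        args=args,
        kwargs=kwargs,
        timestamp=time.time(),
    )


@signals.task_postrun.connect
def on_task_postrun(task_id=None, task=None, state=None, **_):
    if state != "SUCCESS":
        return
    started = _started.pop(task_id, None)
    emit(
        "task-succeeded",
        task_id=task_id,
        name=task.name,
        retries=task.request.retries,
        runtime=time.monotonic() - started if started else None,
        timestamp=time.time(),
    )


@signals.task_failure.connect
def on_task_failure(sender=None, task_id=None, exception=None, einfo=None, **_):
    emit(
        "task-failed",
        task_id=task_id,
        name=sender.name,
        retries=sender.request.retries,
        exception=repr(exception),
        traceback=einfo.traceback if einfo else None,
        timestamp=time.time(),
    )


@signals.task_retry.connect
def on_task_retry(sender=None, request=None, reason=None, **_):
    emit(
        "task-retried",
        task_id=request.id,
        name=sender.name,
        retries=request.retries,
        exception=str(reason),  # Celery's retry reason, rendered with str()
        timestamp=time.time(),
    )
```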
What you see in the dashboard
- Tasks log — every event, filterable by state, task name substring, queue, worker, exception text.
- Per-task breakdown — runs, fail rate, retry rate, average and p95 runtime grouped by task name. Useful for finding the task that quietly retries five times before succeeding.
- Task detail / chain view — every event for a task ID, including args, kwargs, exception, and traceback.
Notes
- Args and kwargs are captured by default and capped at 4 KB combined. See Payload size and PII for the truncation rules and the `capture_args=False` opt-out.
- The `runtime` field is set on success events only; on failure it's `NULL`. The per-task breakdown's average and p95 runtime columns skip nulls natively, so failures don't drag the percentile to zero.
- Retry events are not counted as failures in the failure-rate column. A task that retries four times and eventually succeeds terminates as a SUCCESS; the retry rate is a separate signal.
Worker heartbeats
Celery's worker process emits a heartbeat_sent signal periodically. The SDK listens for it and forwards a worker-heartbeat event upstream, throttled to one every 30 seconds per worker process. The payload is the worker hostname and the list of queues the worker is consuming.
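A rough sketch of that throttling, reusing the illustrative `emit()` helper from the task lifecycle sketch; the queue-list lookup here is hypothetical:

```python
import socket
import time

from celery import signals

HEARTBEAT_EVERY = 30  # seconds; forward at most one event per worker process
_last_heartbeat = 0.0


@signals.heartbeat_sent.connect
def on_heartbeat_sent(sender=None, **_):
    global _last_heartbeat
    now = time.monotonic()
    if now - _last_heartbeat < HEARTBEAT_EVERY:
        return  # heartbeat_sent fires far more often than we want to ship
    _last_heartbeat = now
    emit(
        "worker-heartbeat",
        hostname=socket.gethostname(),  # the real SDK resolves this per the order below
        queues=consumed_queues(),  # hypothetical: the queues this worker consumes
        timestamp=time.time(),
    )
```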
On the backend, heartbeat writes are an upsert keyed on (api_key, hostname) with GREATEST(existing, incoming) semantics on last_seen — so an out-of-order heartbeat (e.g. one that landed late from the SDK retry queue after a CR-side outage) can never push last_seen backward and fire a phantom worker_offline alert.
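The write could look roughly like this, assuming a Postgres backend and a hypothetical `workers` table; only the `(api_key, hostname)` key and the GREATEST(existing, incoming) semantics on `last_seen` come from the behaviour described above.

```python
# Hypothetical table and column names; a sketch, not the backend's actual schema.
UPSERT_HEARTBEAT = """
INSERT INTO workers (api_key, hostname, queues, last_seen)
VALUES (%(api_key)s, %(hostname)s, %(queues)s, %(last_seen)s)
ON CONFLICT (api_key, hostname) DO UPDATE SET
    queues    = EXCLUDED.queues,
    -- a late-arriving heartbeat can never move last_seen backward
    last_seen = GREATEST(workers.last_seen, EXCLUDED.last_seen);
"""
```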
Worker name resolution
The hostname sent on heartbeats — and on every task event — is resolved fresh on each call, in this order:
- The `CELERYRADAR_WORKER_NAME` environment variable, if set and non-empty.
- The `worker_name=` kwarg passed to `connect()`.
- Falling back to `socket.gethostname()`.
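A minimal sketch of that precedence (the helper name is illustrative, not part of the SDK's public API):

```python
import os
import socket


def resolve_worker_name(worker_name=None):
    """Resolve the hostname reported on heartbeats and task events.

    worker_name is the value passed to connect().
    """
    env_name = os.environ.get("CELERYRADAR_WORKER_NAME", "").strip()
    if env_name:
        return env_name
    if worker_name:
        return worker_name
    return socket.gethostname()
```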
In Kubernetes, ECS, or anywhere else where the host's name rotates on every restart, set CELERYRADAR_WORKER_NAME in your manifest to a stable per-deployment value. Otherwise every restart adds a new "worker" row to your dashboard and the previous one drifts into offline state.
Beat schedules
If you run Celery beat — either a dedicated beat process or beat embedded in a worker — the SDK monitors your scheduled tasks automatically. No extra configuration.
How it works
The SDK hooks two beat signals:
- `beat_init` — fires when the beat process starts. The SDK reads the active scheduler's `schedule` dict and sends a `schedule-register` event for each entry, plus a `schedule-snapshot` event listing the full active set (so the dashboard can deactivate any entries that no longer exist).
- `before_task_publish` — fires every time beat publishes a scheduled task. The SDK sends a `beat-fired` event so the dashboard knows that fire window was satisfied.
To pick up admin-side changes (a user adding a new entry in django-celery-beat, or changing a crontab in RedBeat) without a beat restart, the SDK wraps the scheduler's tick() method and re-syncs the schedule list every 30 seconds. So adding or deleting a beat entry while the beat process is running propagates within half a minute.
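A sketch of that wrapping, assuming a hypothetical `sync_schedule()` helper that sends the schedule-register / schedule-snapshot events:

```python
import time

from celery.signals import beat_init

RESYNC_EVERY = 30  # seconds


def sync_schedule(scheduler):
    """Hypothetical helper: send schedule-register / schedule-snapshot events."""


@beat_init.connect
def on_beat_init(sender=None, **_):
    scheduler = sender.scheduler  # sender is the celery.beat.Service instance
    sync_schedule(scheduler)
    wrap_tick(scheduler)


def wrap_tick(scheduler):
    """Wrap scheduler.tick() so admin-side schedule changes propagate within ~30 s."""
    original_tick = scheduler.tick
    last_sync = time.monotonic()

    def tick(*args, **kwargs):
        nonlocal last_sync
        if time.monotonic() - last_sync >= RESYNC_EVERY:
            sync_schedule(scheduler)
            last_sync = time.monotonic()
        return original_tick(*args, **kwargs)

    scheduler.tick = tick
```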
Supported schedulers
- Celery's built-in `PersistentScheduler` ✓
- `django_celery_beat.schedulers.DatabaseScheduler` ✓
- `celery_redbeat.RedBeatScheduler` ✓
Schedule types
- `schedule(seconds=N)` — interval schedules ✓
- `crontab(...)` ✓
- `solar(...)` — skipped with a warning log line
- `clocked(...)` — skipped with a warning log line
Solar and clocked schedules don't fit the "expected next fire" abstraction the dashboard uses to detect missed runs. They'll be supported when the model adapts — for now, beat fires for those entries land in the task log but don't get a dedicated schedule row.
What you see in the dashboard
- Schedules — one row per active beat entry with its status (`on time` / `N missed (24h)` / `inactive`) and last fire time.
- Beat health panel on overview — late schedules sorted late-first.
- `beat_miss` alert rule — fires when an expected beat fire window passes without a corresponding `beat-fired` event, beyond the schedule's grace period (default 5 minutes).
Queue depth
Queue depth monitoring is the only piece of the SDK that talks to your broker directly. Every 30 seconds it samples the depth of every declared queue with a single Redis pipeline and emits one queue-depth event per poll, batching all queues into a samples array.
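For list-mode Redis brokers, each poll can be a single pipelined round trip. This sketch is illustrative and the event shape is simplified:

```python
import time

import redis


def sample_queue_depths(broker_url, queues):
    """One poll: LLEN every declared queue in a single pipelined round trip."""
    client = redis.Redis.from_url(broker_url)
    pipe = client.pipeline(transaction=False)
    for queue in queues:
        pipe.llen(queue)  # list-mode broker: queue depth == list length
    depths = pipe.execute()
    return {
        "event": "queue-depth",
        "timestamp": time.time(),
        "samples": [{"queue": q, "depth": d} for q, d in zip(queues, depths)],
    }
```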
Leader election
If you run multiple worker processes — and you almost certainly do — every one of them imports the SDK and spawns a queue depth poller. Without coordination, each would sample independently and you'd see N copies of every depth sample.
The SDK avoids this with a Redis-backed leader lock at the key celeryradar::queue-poll-lock. Pollers contend for the lock; the winner samples and ships, the losers sleep. The lock has a 60-second TTL and is refreshed every poll interval; if the leader crashes, the next contender takes over within a minute.
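Roughly, the contention looks like this (a sketch using SET NX EX plus a per-interval refresh; the SDK's internals may differ):

```python
import redis

LOCK_KEY = "celeryradar::queue-poll-lock"
LOCK_TTL = 60  # seconds; a crashed leader is replaced within a minute


def is_leader(client: redis.Redis, owner: str) -> bool:
    """Contend for the poll lock. The winner samples and ships; losers sleep."""
    if client.set(LOCK_KEY, owner, nx=True, ex=LOCK_TTL):
        return True  # acquired the lock this interval
    if client.get(LOCK_KEY) == owner.encode():
        client.expire(LOCK_KEY, LOCK_TTL)  # still the leader: refresh the TTL
        return True
    return False
```

Here `owner` would be something unique per process, for example hostname plus PID.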
This means queue depth monitoring only works when at least one process can reach Redis with the broker's credentials — which it always can, because that's how Celery itself talks to the broker.
Broker support
Today: standard Redis list-mode brokers (redis:// or rediss:// URLs). Auto-detected from app.conf.broker_url; pass broker_url= to connect() if you need to override.
Not yet supported for queue depth (but tasks/workers/beat all still work):
- RabbitMQ
- SQS
- Redis Sentinel, Cluster, or Streams transports
If your broker isn't supported, the queue depth charts will silently stay empty. You'll still see queue names on workers, in task events, and as alert rule targets.
What you see in the dashboard
- Queues page — one card per queue with current depth, hourly min/max, and a sparkline.
- `queue_depth_threshold` alert rule — fires when a queue's depth exceeds N for more than M seconds.