What gets monitored
Four signal types come out of the SDK. Each is independent — if one fails (e.g. queue depth on an unsupported broker) the others keep working.
Task lifecycle
The SDK hooks four Celery signals and emits one event per signal:
- `task_prerun` → `task-started`
- `task_postrun` → `task-succeeded` (when state is `SUCCESS`)
- `task_failure` → `task-failed`
- `task_retry` → `task-retried`
Every event carries the task ID, task name, worker hostname, retry count, args/kwargs, and a timestamp. task-started additionally carries the queue. task-succeeded carries a runtime in seconds. task-failed carries the exception repr() and the traceback string; task-retried carries the same shape but the exception is rendered with str() on Celery's retry reason.
Celery reuses the same task_id across retry attempts. The SDK leans into that: every event for an attempt is tagged with a retries counter (0 on first attempt, 1 on second, etc.), which is how the task chain view groups events under "Attempt 1 / Attempt 2 / …" headings.
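If you want to picture what the hooks do, here is a minimal sketch of the same four signal handlers. It is illustrative rather than the SDK's actual source: `emit()` stands in for whatever transport ships events upstream, and the `_started` map is just one way to produce the runtime field.

```python
import socket
import time

from celery import signals

_started = {}  # task_id -> monotonic start time, used to compute runtime


def emit(event, **payload):
    """Stand-in for the SDK's transport; assume it ships the event upstream."""


@signals.task_prerun.connect
def on_task_prerun(task_id=None, task=None, args=None, kwargs=None, **_):
    _started[task_id] = time.monotonic()
    emit(
        "task-started",
        task_id=task_id,
        name=task.name,
        hostname=socket.gethostname(),
        retries=task.request.retries,  # 0 on the first attempt, 1 on the second, ...
        # routing_key is typically the queue name under default routing
        queue=(task.request.delivery_info or {}).get("routing_key"),
        args=args,
        kwargs=kwargs,
        timestamp=time.time(),
    )


@signals.task_postrun.connect
def on_task_postrun(task_id=None, task=None, state=None, **_):
    if state != "SUCCESS":
        return
    started = _started.pop(task_id, None)
    emit(
        "task-succeeded",
        task_id=task_id,
        name=task.name,
        retries=task.request.retries,
        runtime=time.monotonic() - started if started else None,
        timestamp=time.time(),
    )


@signals.task_failure.connect
def on_task_failure(sender=None, task_id=None, exception=None, einfo=None, **_):
    emit(
        "task-failed",
        task_id=task_id,
        name=sender.name,
        retries=sender.request.retries,
        exception=repr(exception),
        traceback=einfo.traceback if einfo else None,
        timestamp=time.time(),
    )


@signals.task_retry.connect
def on_task_retry(sender=None, request=None, reason=None, **_):
    emit(
        "task-retried",
        task_id=request.id,
        name=sender.name,
        retries=request.retries,
        exception=str(reason),  # Celery's retry reason, rendered with str()
        timestamp=time.time(),
    )
```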
What you see in the dashboard
- Tasks log — every event, filterable by state, task name substring, queue, worker, exception text.
- Per-task breakdown — runs, fail rate, retry rate, average and p95 runtime grouped by task name. Useful for finding the task that quietly retries five times before succeeding.
- Task detail / chain view — every event for a task ID, including args, kwargs, exception, and traceback.
Notes
- Args and kwargs are captured by default and capped at 4 KB combined. See Payload size and PII for the truncation rules and the `capture_args=False` opt-out.
- The `runtime` field is set on success events only; on failure it's `NULL`. The per-task breakdown's average and p95 runtime columns skip nulls natively, so failures don't drag the percentile to zero.
- Retry events are not counted as failures in the failure-rate column. A task that retries four times and eventually succeeds terminates as a SUCCESS; the retry rate is a separate signal.
Worker heartbeats
Celery's worker process emits a heartbeat_sent signal periodically. The SDK listens for it and forwards a worker-heartbeat event upstream, throttled to one every 30 seconds per worker process. The payload is the worker hostname and the list of queues the worker is consuming.
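A rough sketch of that throttling, reusing the illustrative `emit()` helper from the task lifecycle sketch; the queue-list lookup here is hypothetical:

```python
import socket
import time

from celery import signals

HEARTBEAT_EVERY = 30  # seconds; forward at most one event per worker process
_last_heartbeat = 0.0


@signals.heartbeat_sent.connect
def on_heartbeat_sent(sender=None, **_):
    global _last_heartbeat
    now = time.monotonic()
    if now - _last_heartbeat < HEARTBEAT_EVERY:
        return  # heartbeat_sent fires far more often than we want to ship
    _last_heartbeat = now
    emit(
        "worker-heartbeat",
        hostname=socket.gethostname(),  # the real SDK resolves this per the order below
        queues=consumed_queues(),  # hypothetical: the queues this worker consumes
        timestamp=time.time(),
    )
```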
On the backend, heartbeat writes are an upsert keyed on (api_key, hostname) with GREATEST(existing, incoming) semantics on last_seen — so an out-of-order heartbeat (e.g. one that landed late from the SDK retry queue after a CR-side outage) can never push last_seen backward and fire a phantom worker_offline alert.
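The write could look roughly like this, assuming a Postgres backend and a hypothetical `workers` table; only the `(api_key, hostname)` key and the GREATEST(existing, incoming) semantics on `last_seen` come from the behaviour described above.

```python
# Hypothetical table and column names; a sketch, not the backend's actual schema.
UPSERT_HEARTBEAT = """
INSERT INTO workers (api_key, hostname, queues, last_seen)
VALUES (%(api_key)s, %(hostname)s, %(queues)s, %(last_seen)s)
ON CONFLICT (api_key, hostname) DO UPDATE SET
    queues    = EXCLUDED.queues,
    -- a late-arriving heartbeat can never move last_seen backward
    last_seen = GREATEST(workers.last_seen, EXCLUDED.last_seen);
"""
```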
Worker name resolution
The hostname sent on heartbeats — and on every task event — is resolved fresh on each call, in this order:
- The `CELERYRADAR_WORKER_NAME` environment variable, if set and non-empty.
- The `worker_name=` kwarg passed to `connect()`.
- Falling back to `socket.gethostname()`.
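A minimal sketch of that precedence (the helper name is illustrative, not part of the SDK's public API):

```python
import os
import socket


def resolve_worker_name(worker_name=None):
    """Resolve the hostname reported on heartbeats and task events.

    worker_name is the value passed to connect().
    """
    env_name = os.environ.get("CELERYRADAR_WORKER_NAME", "").strip()
    if env_name:
        return env_name
    if worker_name:
        return worker_name
    return socket.gethostname()
```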
In Kubernetes, ECS, or anywhere else where the host's name rotates on every restart, set CELERYRADAR_WORKER_NAME in your manifest to a stable per-deployment value. Otherwise every restart adds a new "worker" row to your dashboard and the previous one drifts into offline state.
Beat schedules
If you run Celery beat — either a dedicated beat process or beat embedded in a worker — the SDK monitors your scheduled tasks automatically. No extra configuration.
How it works
The SDK hooks two beat signals:
- `beat_init` — fires when the beat process starts. The SDK reads the active scheduler's `schedule` dict and sends a `schedule-register` event for each entry, plus a `schedule-snapshot` event listing the full active set (so the dashboard can deactivate any entries that no longer exist).
- `before_task_publish` — fires every time beat publishes a scheduled task. The SDK sends a `beat-fired` event so the dashboard knows that fire window was satisfied.
To pick up admin-side changes (a user adding a new entry in django-celery-beat, or changing a crontab in RedBeat) without a beat restart, the SDK wraps the scheduler's tick() method and re-syncs the schedule list every 30 seconds. So adding or deleting a beat entry while the beat process is running propagates within half a minute.
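A sketch of that wrapping, assuming a hypothetical `sync_schedule()` helper that sends the schedule-register / schedule-snapshot events:

```python
import time

from celery.signals import beat_init

RESYNC_EVERY = 30  # seconds


def sync_schedule(scheduler):
    """Hypothetical helper: send schedule-register / schedule-snapshot events."""


@beat_init.connect
def on_beat_init(sender=None, **_):
    scheduler = sender.scheduler  # sender is the celery.beat.Service instance
    sync_schedule(scheduler)
    wrap_tick(scheduler)


def wrap_tick(scheduler):
    """Wrap scheduler.tick() so admin-side schedule changes propagate within ~30 s."""
    original_tick = scheduler.tick
    last_sync = time.monotonic()

    def tick(*args, **kwargs):
        nonlocal last_sync
        if time.monotonic() - last_sync >= RESYNC_EVERY:
            sync_schedule(scheduler)
            last_sync = time.monotonic()
        return original_tick(*args, **kwargs)

    scheduler.tick = tick
```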
Supported schedulers
- Celery's built-in `PersistentScheduler` ✓
- `django_celery_beat.schedulers.DatabaseScheduler` ✓
- `celery_redbeat.RedBeatScheduler` ✓
Schedule types
- `schedule(seconds=N)` — interval schedules ✓
- `crontab(...)` ✓
- `solar(...)` — skipped with a warning log line
- `clocked(...)` — skipped with a warning log line
Solar and clocked schedules don't fit the "expected next fire" abstraction the dashboard uses to detect missed runs. They'll be supported when the model adapts — for now, beat fires for those entries land in the task log but don't get a dedicated schedule row.
What you see in the dashboard
- Schedules — one row per active beat entry with its status (`on time` / `N missed (24h)` / `inactive`) and last fire time.
- Beat health panel on overview — late schedules sorted late-first.
- `beat_miss` alert rule — fires when an expected beat fire window passes without a corresponding `beat-fired` event, beyond the schedule's grace period (default 5 minutes).
Queue depth
Queue depth monitoring is the only piece of the SDK that talks to your broker directly. Every 30 seconds it samples the depth of every declared queue with a single Redis pipeline and emits one queue-depth event per poll, batching all queues into a samples array.
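For list-mode Redis brokers, each poll can be a single pipelined round trip. This sketch is illustrative and the event shape is simplified:

```python
import time

import redis


def sample_queue_depths(broker_url, queues):
    """One poll: LLEN every declared queue in a single pipelined round trip."""
    client = redis.Redis.from_url(broker_url)
    pipe = client.pipeline(transaction=False)
    for queue in queues:
        pipe.llen(queue)  # list-mode broker: queue depth == list length
    depths = pipe.execute()
    return {
        "event": "queue-depth",
        "timestamp": time.time(),
        "samples": [{"queue": q, "depth": d} for q, d in zip(queues, depths)],
    }
```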
Leader election
If you run multiple worker processes — and you almost certainly do — every one of them imports the SDK and spawns a queue depth poller. Without coordination, each would sample independently and you'd see N copies of every depth sample.
The SDK avoids this with a Redis-backed leader lock at the key celeryradar::queue-poll-lock. Pollers contend for the lock; the winner samples and ships, the losers sleep. The lock has a 60-second TTL and is refreshed every poll interval; if the leader crashes, the next contender takes over within a minute.
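Roughly, the contention looks like this (a sketch using SET NX EX plus a per-interval refresh; the SDK's internals may differ):

```python
import redis

LOCK_KEY = "celeryradar::queue-poll-lock"
LOCK_TTL = 60  # seconds; a crashed leader is replaced within a minute


def is_leader(client: redis.Redis, owner: str) -> bool:
    """Contend for the poll lock. The winner samples and ships; losers sleep."""
    if client.set(LOCK_KEY, owner, nx=True, ex=LOCK_TTL):
        return True  # acquired the lock this interval
    if client.get(LOCK_KEY) == owner.encode():
        client.expire(LOCK_KEY, LOCK_TTL)  # still the leader: refresh the TTL
        return True
    return False
```

Here `owner` would be something unique per process, for example hostname plus PID.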
This means queue depth monitoring only works when at least one process can reach Redis with the broker's credentials — which it always can, because that's how Celery itself talks to the broker.
Broker support
Today: standard Redis list-mode brokers (redis:// or rediss:// URLs). Auto-detected from app.conf.broker_url; pass broker_url= to connect() if you need to override.
Not yet supported for queue depth (but tasks/workers/beat all still work):
- RabbitMQ
- SQS
- Redis Sentinel, Cluster, or Streams transports
If your broker isn't supported, the queue depth charts will silently stay empty. You'll still see queue names on workers, in task events, and as alert rule targets.
What you see in the dashboard
- Queues page — one card per queue with current depth, hourly min/max, and a sparkline.
- `queue_depth_threshold` alert rule — fires when a queue's depth exceeds N for more than M seconds.