The contamination budget at scale
A time-domain survey at ZTF cadence produces on the order of one million alerts on a typical clear night. Of those, perhaps 1 in 10 000 corresponds to a transient that a human astronomer wants to follow up. Any classifier between the survey pipeline and the human therefore lives in a hostile statistical regime: the cost of a false positive is small for any single alert, but a 5% false-positive rate across a night's stream means roughly fifty thousand spurious follow-up candidates and a broker the community will rapidly stop using. Sky is built around the constraint that operational deployment requires a contamination rate well under 1%, ideally 0.5% or lower.
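The arithmetic behind that budget is short enough to write down. The sketch below uses only the round numbers quoted above and assumes the false-positive rate applies to the bogus alerts in the stream; the function and constant names are ours, for illustration.

```python
# Illustrative arithmetic only; the inputs are the round numbers quoted in the text.
ALERTS_PER_NIGHT = 1_000_000   # order-of-magnitude nightly alert volume
REAL_FRACTION = 1 / 10_000     # share of alerts worth human follow-up


def spurious_candidates(false_positive_rate,
                        alerts=ALERTS_PER_NIGHT,
                        real_fraction=REAL_FRACTION):
    """Expected number of bogus alerts passed to humans in one night.

    Assumes the false-positive rate is measured against bogus alerts only.
    """
    n_real = alerts * real_fraction
    n_bogus = alerts - n_real
    return false_positive_rate * n_bogus


if __name__ == "__main__":
    for rate in (0.05, 0.01, 0.005):
        print(f"{rate:.1%} false-positive rate -> "
              f"{spurious_candidates(rate):,.0f} spurious candidates per night")
```

With about a hundred genuine transients in the same night, even the lowest of these rates leaves the spurious candidates well ahead of the real ones, which is why the budget is set so tightly.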
An end-to-end CNN with a cosmic-ray pre-step
Sky is two networks. The first is a U-Net cosmic-ray segmenter that produces a per-pixel cosmic-ray mask on the science image, the reference image, and the difference image before any other processing happens. Cosmic rays are the dominant single source of bogus alerts in any survey stream we have looked at; removing them at the pixel level before classification reshapes the downstream problem and pushes the real-bogus classifier into a regime where end-to-end learning works.
The second network is a ResNet-class real-bogus classifier reading 63×63 cutouts of (science, reference, difference, segmentation-mask), which gives a four-channel input. Output is a single calibrated probability, post-processed with isotonic regression on a held-out month of alerts. There are no hand-engineered features in this stack; we tried, twice, and the end-to-end CNN beat the engineered features on every operationally relevant slice once the cosmic-ray pre-step was in place.
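For readers unfamiliar with isotonic calibration, here is a minimal sketch of the idea using the pool-adjacent-violators algorithm in plain Python. It is not Sky's calibration code: the function names are hypothetical, the toy scores stand in for the held-out month of alerts, and it returns a step function rather than an interpolated one.

```python
from bisect import bisect_left
from itertools import groupby


def fit_isotonic(scores, labels):
    """Fit a non-decreasing step function mapping raw score -> P(real).

    Pool-adjacent-violators: walk the scores in ascending order and merge
    neighbouring blocks whenever the earlier block has the higher mean label.
    Returns (edges, values): each block's upper score edge and its mean label.
    """
    pairs = sorted(zip(scores, labels))
    blocks = []  # each block is [label_sum, count, upper_score_edge]
    for score, group in groupby(pairs, key=lambda p: p[0]):
        ys = [y for _, y in group]          # pool tied scores up front
        blocks.append([float(sum(ys)), len(ys), score])
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count, edge = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = edge
    edges = [b[2] for b in blocks]
    values = [b[0] / b[1] for b in blocks]
    return edges, values


def calibrate(score, edges, values):
    """Look up the calibrated probability for one raw classifier score."""
    i = bisect_left(edges, score)
    return values[min(i, len(values) - 1)]


# Toy held-out set: raw scores and human labels (1 = real, 0 = bogus).
toy_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
toy_labels = [0, 0, 1, 0, 1, 1]
toy_edges, toy_values = fit_isotonic(toy_scores, toy_labels)
```

In the toy data the scores 0.3 and 0.4 are out of order with respect to their labels, so the fit pools them into one block with a calibrated probability of 0.5. The only constraint imposed is monotonicity, which is what makes the method suitable for correcting a classifier whose ranking is good but whose raw outputs are not probabilities.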
The pre-step that is the unsung half of the system
| Method | Pixel-level F1 | False-positive rate | Note |
|---|---|---|---|
| LACosmic (default params) | 0.871 | 2.1% | The community standard. Misses faint events; over-flags PSF cores. |
| LACosmic (tuned per chip) | 0.903 | 1.4% | Hand-tuned by partner; not portable. |
| Sky cosmic-ray U-Net | 0.962 | 0.4% | Trained on 8 400 hand-labelled frames at Data. |
Three partner pipelines have replaced LACosmic with Sky's cosmic-ray segmenter as a standalone module, independent of the real-bogus classifier above it; the segmenter is also available as a separate model release.
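For anyone reproducing the comparison in the table above on their own frames, the two metrics can be computed from a predicted mask and a hand-labelled mask roughly as follows. This is a sketch under two assumptions of ours: masks are nested lists of 0/1 values, and the false-positive rate is taken per pixel (the table does not say whether its rate is per pixel or per event).

```python
def pixel_metrics(pred_mask, true_mask):
    """Return (f1, false_positive_rate) for one pair of binary masks.

    pred_mask and true_mask are same-shaped nested lists of 0/1 values,
    where 1 marks a pixel flagged (or labelled) as a cosmic-ray hit.
    """
    tp = fp = fn = tn = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            if p and t:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return f1, fpr


# Toy 3x3 example: one clean pixel is wrongly flagged, no hits are missed.
toy_pred = [[1, 1, 0], [0, 0, 0], [0, 1, 0]]
toy_true = [[1, 0, 0], [0, 0, 0], [0, 1, 0]]
toy_f1, toy_fpr = pixel_metrics(toy_pred, toy_true)
```

Because cosmic-ray pixels are a tiny fraction of any frame, F1 is the more informative of the two numbers: a mask that flags nothing still scores a perfect false-positive rate.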
97.4% recall at 0.5% bogus — and what that does not say
On the ZTF 2024-Jan replay set (167 000 labelled alerts), the operating point that gives 0.5% bogus contamination yields 97.4% recall on real transients. At 0.3% bogus contamination — the operating point one broker prefers — recall is 94.1%. At 1.0% bogus contamination — the regime most academic benchmarks publish in — recall is 99.0%, but that is not an operationally useful number on a million-alert night.
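Choosing an operating point of this kind amounts to sweeping the score threshold and reading off recall where the contamination target is met. The sketch below assumes, for illustration, that contamination means the fraction of accepted alerts that are labelled bogus; the text above does not define it more precisely, and the function name and toy data are ours.

```python
def recall_at_contamination(scores, labels, max_contamination):
    """Return (threshold, recall) for the most permissive threshold whose
    accepted set stays within the contamination budget, or None if no
    threshold qualifies.

    labels are 1 for real and 0 for bogus; an alert is accepted when its
    score is >= threshold. Contamination is assumed to be FP / (TP + FP).
    """
    n_real = sum(labels)
    if n_real == 0:
        raise ValueError("need at least one real alert to measure recall")
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    best = None
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Accept every alert tied at this score before evaluating the cut.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        if fp / (tp + fp) <= max_contamination:
            best = (threshold, tp / n_real)
    return best


# Toy replay set: six alerts, four of them real.
toy_scores = [0.99, 0.95, 0.90, 0.80, 0.70, 0.60]
toy_labels = [1, 1, 1, 0, 1, 0]
strict = recall_at_contamination(toy_scores, toy_labels, 0.0)
relaxed = recall_at_contamination(toy_scores, toy_labels, 0.2)
```

On the toy data a zero-contamination budget stops at a threshold of 0.90 with three of four real alerts recovered, while relaxing the budget to 20% recovers all four. The same trade is visible in the replay-set numbers: each tightening of the budget costs recall.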
What this benchmark does not say: it is a single survey, on a single facility's reference images, at a single depth. Sky has not yet been validated on LSST-class data, partly because LSST is not yet producing operational alerts at that volume and partly because the difference-image properties at LSST depth will be different enough that we expect to retrain rather than fine-tune. We say this in writing because over-claiming on this metric is the single most common failure mode we see in the survey-classifier literature.
Inside the survey-night clock
The operational constraint is the survey-night clock: alerts must be classified within seconds of the difference image being produced, so that follow-up triggers can be issued before the field rotates out. Sky's combined latency (cosmic-ray segmenter + real-bogus classifier + isotonic calibration) is 14 ms per alert on an A100 and 22 ms on an L4. A typical survey-night peak of 2 000 alerts/second is well within budget on a single A100; the partner broker currently runs Sky on a four-A100 box for headroom and so that the cosmic-ray segmenter can also serve nightly reductions.
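One way to relate the per-alert latency to the peak rate is Little's law: the average number of alerts in flight equals arrival rate times time in the system. The sketch below applies it to the figures above, assuming the quoted latency is end-to-end for a single alert and that the GPU processes alerts concurrently in batches; the batch size is not stated here, so the result is a sizing estimate, not a description of the deployed configuration.

```python
def alerts_in_flight(peak_rate_per_s, latency_ms):
    """Little's law: mean concurrency = arrival rate x time in system."""
    return peak_rate_per_s * latency_ms / 1000.0


def serial_throughput(latency_ms):
    """Alerts per second if alerts were processed strictly one at a time."""
    return 1000.0 / latency_ms


if __name__ == "__main__":
    for gpu, latency_ms in (("A100", 14), ("L4", 22)):
        print(f"{gpu}: ~{alerts_in_flight(2000, latency_ms):.0f} alerts in flight "
              f"at 2000/s; one-at-a-time ceiling "
              f"~{serial_throughput(latency_ms):.0f} alerts/s")
```

At 14 ms and 2 000 alerts/second the law gives 28 alerts in flight, so keeping up with the peak depends on batching a few dozen alerts at a time rather than on the single-alert latency alone.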
Deployment is the same Docker pattern as the rest of the missions: deterministic CUDA build, pinned NVIDIA container revision, validation notebook reproducing the headline metric end-to-end in under thirty minutes on a single GPU.
If you run a survey alert broker
The honest first conversation is about your contamination budget and your latency budget. Email us with the survey, the broker software in use, and the alert volume per night; we will reply with whether Sky fits or whether a different door is right.