Solstice Pro AI Observatory · 16544720
RA 21h 02m 11s DEC −03° 18′ 44″ Epoch J2026.4 Plate II.c Field Sky
Plate II.c · Mission Sky

A real-bogus classifier you can put in front of a broker.

Sky is the Solstice Pro AI mission for survey alert streams. The job: take the difference-image alerts produced by a time-domain survey and separate real astrophysical transients from the cosmic rays, subtraction residuals, and processing artefacts that dominate the raw stream. Headline numbers are honest about the regime: ZTF 2024-Jan replay set, alert depth m ≤ 21 (apparent magnitude), single-night operational target.

Real-bogus recall: 97.4% @ 0.5% bogus contamination
Cosmic-ray F1: 0.962 (vs LACosmic 0.871)
Latency: 14 ms / alert · A100
Status: Production at 1 broker · pilot at 2
RA 21h 02m DEC −03°  ·  Field 01 — Why this is hard

The contamination budget at scale

A time-domain survey at ZTF cadence produces of order one million alerts on a typical clear night. Of those, perhaps 1 in 10 000 corresponds to a transient that a human astronomer wants to follow up on. Any classifier between the survey pipeline and the human therefore lives in a hostile statistical regime: the cost of a false positive is small for any single alert, but the cost of a 5% false-positive rate across a night's stream is fifty thousand spurious follow-up candidates and a broker the community will rapidly stop using. Sky is built around the constraint that operational deployment requires a contamination rate well under 1%, ideally at 0.5% or lower.
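The arithmetic behind that budget is short enough to write down. The sketch below is illustrative only: the function name `nightly_load` is hypothetical, the volume and real fraction are the order-of-magnitude figures quoted above, and the false-positive rate is assumed to apply to the bogus portion of the stream.

```python
def nightly_load(n_alerts, real_fraction, false_positive_rate, recall=1.0):
    """Return (real candidates, spurious candidates) passed downstream per night.

    Assumption: false_positive_rate is the fraction of bogus alerts
    that the classifier lets through.
    """
    n_real = n_alerts * real_fraction
    n_bogus = n_alerts - n_real
    return n_real * recall, n_bogus * false_positive_rate


N_ALERTS = 1_000_000        # order-of-magnitude nightly volume from the text
REAL_FRACTION = 1 / 10_000  # roughly 1 in 10 000 alerts is worth follow-up

for fpr in (0.05, 0.01, 0.005):
    real, spurious = nightly_load(N_ALERTS, REAL_FRACTION, fpr)
    print(f"FPR {fpr:.1%}: ~{spurious:,.0f} spurious vs ~{real:,.0f} real candidates")
```

Under these assumptions, even the 0.5% operating point passes roughly fifty spurious candidates for every real one, which is why the target is phrased as a ceiling rather than a goal.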

RA 21h 18m DEC −02°  ·  Field 02 — Architecture

An end-to-end CNN with a cosmic-ray pre-step

Sky is two networks. The first is a U-Net cosmic-ray segmenter that produces a per-pixel cosmic-ray mask on the science image, the reference image, and the difference image before any other processing happens. Cosmic rays are the dominant single source of bogus alerts in any survey stream we have looked at; removing them at the pixel level before classification reshapes the downstream problem and pushes the real-bogus classifier into a regime where end-to-end learning works.

The second network is a ResNet-class real-bogus classifier reading 63×63 cutouts of (science, reference, difference, segmentation-mask), a four-channel input. Output is a single calibrated probability, post-processed with isotonic regression on a held-out month of alerts. There are no hand-engineered features in this stack; we tried them, twice, and the end-to-end CNN beat the engineered features on every operationally relevant slice once the cosmic-ray pre-step was in place.
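For readers unfamiliar with the calibration step, isotonic regression fits a non-decreasing map from raw classifier score to observed real fraction on held-out data. The following is a minimal pure-Python sketch using the pool-adjacent-violators algorithm; the function names (`pava`, `fit_calibrator`, `calibrate`) and the toy scores are hypothetical, and a production stack would more likely use a library implementation than this code.

```python
import bisect


def pava(values):
    """Pool-adjacent-violators: least-squares non-decreasing fit to `values`."""
    blocks = []  # each block is [sum of values, count]
    for v in values:
        blocks.append([float(v), 1])
        # Merge backwards while the previous block's mean exceeds the current one's.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


def fit_calibrator(raw_scores, labels):
    """Fit on held-out (score, label) pairs; labels are 1 = real, 0 = bogus."""
    pairs = sorted(zip(raw_scores, labels))
    xs = [score for score, _ in pairs]
    ys = pava([label for _, label in pairs])
    return xs, ys


def calibrate(raw_score, xs, ys):
    """Step-function lookup: fitted value at the largest held-out score <= raw_score."""
    i = bisect.bisect_right(xs, raw_score) - 1
    return ys[max(i, 0)]  # clamp scores below the held-out range


# Toy held-out set (hypothetical values, not survey data).
held_out_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
held_out_labels = [0, 1, 0, 0, 1, 1]
xs, ys = fit_calibrator(held_out_scores, held_out_labels)
print(ys)
print(calibrate(0.55, xs, ys))
```

The fit pools the out-of-order labels in the middle of the toy set into a single plateau, which is the behaviour that keeps the calibrated probability monotone in the raw score.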

Difference image cutout with a real transient and a cosmic ray
Plate II.c-01 ZTF g-band difference image · NOAO 2024-01 replay · top: real Type Ia at z ≈ 0.04 · bottom: cosmic ray on the same chip · the cosmic-ray pre-step segments the lower object out before the classifier ever sees it
RA 21h 36m DEC −01°  ·  Field 03 — Cosmic-ray segmentation

The pre-step that is the unsung half of the system

Method                      Pixel-level F1   False-positive rate   Note
LACosmic (default params)   0.871            2.1%                  The community standard. Misses faint events; over-flags PSF cores.
LACosmic (tuned per chip)   0.903            1.4%                  Hand-tuned by partner; not portable.
Sky cosmic-ray U-Net        0.962            0.4%                  Trained on 8 400 hand-labelled frames at Data.
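As a reminder of what the pixel-level F1 column measures, the sketch below scores a predicted cosmic-ray mask against a hand-labelled one. It is an assumption-laden illustration: `pixel_f1` is a hypothetical helper, masks are plain nested lists of 0/1, and how the table's figures are aggregated (per frame or pooled over all pixels) is not stated here.

```python
def pixel_f1(pred_mask, true_mask):
    """F1 over pixels, treating 'cosmic ray' (truthy) as the positive class."""
    tp = fp = fn = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for pred, true in zip(pred_row, true_row):
            if pred and true:
                tp += 1          # flagged and really a cosmic-ray pixel
            elif pred and not true:
                fp += 1          # flagged but clean
            elif true and not pred:
                fn += 1          # missed cosmic-ray pixel
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy 2x3 masks (hypothetical): one hit, one false flag, one miss.
true_mask = [[0, 1, 1],
             [0, 0, 0]]
pred_mask = [[0, 1, 0],
             [1, 0, 0]]
print(pixel_f1(pred_mask, true_mask))
```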

Three partner pipelines have replaced LACosmic with Sky's cosmic-ray segmenter as a standalone module, independent of the real-bogus classifier above it; the segmenter is also available as a separate model release.

RA 21h 54m DEC −00°  ·  Field 04 — Real-bogus performance

97.4% recall at 0.5% bogus — and what that does not say

On the ZTF 2024-Jan replay set (167 000 labelled alerts), the operating point that gives 0.5% bogus contamination yields 97.4% recall on real transients. At 0.3% bogus contamination — the operating point one broker prefers — recall is 94.1%. At 1.0% bogus contamination — the regime most academic benchmarks publish in — recall is 99.0%, but that is not an operationally useful number on a million-alert night.
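Choosing an operating point amounts to sweeping the score threshold until the contamination ceiling is hit and reading off recall there. The sketch below shows one way to do that; `operating_point` is a hypothetical helper, the scores are toy values, and "contamination" is assumed to mean the fraction of bogus alerts that pass the threshold (a sample-purity definition would change the denominator).

```python
def operating_point(scores, labels, max_contamination):
    """Lowest threshold whose bogus pass rate stays <= max_contamination.

    Returns (threshold, recall, contamination), or None if no threshold qualifies.
    Quadratic in the number of alerts; fine for a sketch, not for a full replay set.
    """
    n_real = sum(labels)
    n_bogus = len(labels) - n_real
    best = None
    # Lowering the threshold can only raise contamination, so stop at the first breach.
    for threshold in sorted(set(scores), reverse=True):
        passed_real = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == 1)
        passed_bogus = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == 0)
        contamination = passed_bogus / n_bogus
        if contamination > max_contamination:
            break
        best = (threshold, passed_real / n_real, contamination)
    return best


# Toy labelled set (hypothetical): 1 = real transient, 0 = bogus.
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

for ceiling in (0.0, 0.2):
    print(ceiling, operating_point(scores, labels, ceiling))
```

Tightening the ceiling moves the threshold up and costs recall, which is the same trade the 0.3%, 0.5%, and 1.0% figures above describe.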

What this benchmark does not say: it is a single survey, on a single facility's reference images, at a single depth. Sky has not yet been validated on LSST-class data, partly because LSST is not yet producing operational alerts at that volume and partly because the difference-image properties at LSST depth will be different enough that we expect to retrain rather than fine-tune. We say this in writing because over-claiming on this metric is the single most common failure mode we see in the survey-classifier literature.

RA 22h 14m DEC −00°  ·  Field 05 — Latency and deployment

Inside the survey-night clock

The operational constraint is the survey-night clock: alerts must be classified within seconds of the difference image being produced, so that follow-up triggers can be issued before the field rotates out. Sky's combined latency (cosmic-ray segmenter + real-bogus classifier + isotonic calibration) is 14 ms per alert on an A100 and 22 ms on an L4. A typical survey-night peak of 2 000 alerts/second is well within budget on a single A100; the partner broker currently runs Sky on a four-A100 box for headroom and for the cosmic-ray segmenter to also serve nightly reductions.
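A quick sizing check connects the per-alert latency to the peak rate. The sketch below applies Little's law under the assumption that the quoted latencies are end-to-end times for a single alert; whether they are measured at batch size 1 or amortised over a batch is not stated here, and the helper names are hypothetical.

```python
def alerts_in_flight(alerts_per_second, latency_s):
    """Little's law: mean number of alerts being processed concurrently."""
    return alerts_per_second * latency_s


def serial_throughput(latency_s):
    """Alerts per second if alerts were processed strictly one at a time."""
    return 1.0 / latency_s


PEAK_RATE = 2_000  # alerts/second, the survey-night peak quoted above

for gpu, latency_ms in (("A100", 14), ("L4", 22)):
    latency_s = latency_ms / 1000
    print(
        f"{gpu}: ~{alerts_in_flight(PEAK_RATE, latency_s):.0f} alerts in flight at peak, "
        f"~{serial_throughput(latency_s):.0f} alerts/s if run one at a time"
    )
```

On this reading, keeping up with the peak depends on batching or concurrent streams rather than on one-at-a-time inference, so batch size is worth including in any latency budget conversation.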

Deployment is the same Docker pattern as the rest of the missions: deterministic CUDA build, pinned NVIDIA container revision, validation notebook reproducing the headline metric end-to-end in under thirty minutes on a single GPU.

If you run a survey alert broker

The honest first conversation is about your contamination budget and your latency budget. Email with the survey, the broker software in use, and the volume per night; we will reply with whether Sky fits or whether a different door is right.

Open contact →