A middle power that intends to know what is in its sky — over a port, a pipeline corridor, a flight park, or a farm — now has the tools to know it with engineering certainty, at consumer prices, in under a second. This briefing surveys the 2025–2026 state of the art in computer vision and sensing for monitoring a defined volume of airspace; describes the verification architecture that converts a probabilistic detection into a near-certain identification by requiring every condition in a configurable ledger to pass; reports realistic detection-to-action latency; maps the automated responses that are lawful for a private Canadian operator against those that are criminal; and offers a deliberately conservative capability forecast at one, two-and-a-half, and five years.
The central findings are three. First, the binding constraint is no longer compute or cost — a capable monitoring node is buildable for under CAD $1,500 — but the detection of small, distant targets, where published accuracy has plateaued. Second, certainty is an architecture, not a model: the systems that work do not trust a classifier, they corroborate it. Third, in Canada the line between lawful and criminal response is bright, statutory, and indifferent to good intentions.
§ I · The Sensor Problem
No single sensor can watch a sky. Each modality answers one question well and goes blind on the others, and the discipline of airspace awareness consists largely of knowing which blindness is being purchased at which price. The table below summarizes the field as deployed in 2025–2026; ranges are indicative, since vendor figures are rarely stated against a defined target and condition set, and should be read as such.[1]
| Modality | Answers | Indicative range | Blind to | Cost class |
|---|---|---|---|---|
| RGB camera | What it is; evidence-grade imagery | ~1–3 km (with zoom optics) | Night, fog, glare, sky-saturated backlight; narrow field of view | Consumer–prosumer |
| Thermal (LWIR, uncooled) | Presence in darkness and smoke | hundreds of m – several km, optics-dependent | Thermal crossover at dawn and dusk; obscured targets | Prosumer–industrial |
| Radar (micro-Doppler) | Range, bearing, elevation, in all weather; rotor-vs-bird discrimination | ~1–5 km for small UAS | Identity, make, operator | Industrial–defence |
| RF detection | Control link; locates drone and operator; reads Remote ID | to ~5 km | Autonomous, RF-silent, and fibre-guided aircraft; everything that does not emit | Prosumer–defence |
| Acoustic array | Propeller and motor signature beyond line of sight | ~300–500 m small UAS | Wind above ~5 m/s; urban noise; quiet and gliding aircraft | Consumer–industrial |
| Event camera | Microsecond-latency motion in extreme dynamic range | research-grade | Static objects; not yet a turnkey product | Prosumer R&D |
The empirical reference point for what cheap, distributed sensing achieves at national scale is Ukraine's Zvook acoustic network: detections reportedly reach the armed forces' Delta situational-awareness system within roughly twelve seconds, at a false-positive rate of about 1.6 per cent, from sensors costing on the order of US$500 each.[2] The lesson generalizes: coverage and corroboration, not exquisite individual sensors, are what produce reliable awareness.
The pairing logic follows from the blindness table. Radar and RF do the searching; an electro-optical/infrared camera does the confirming and produces the evidence. Commercial counter-UAS platforms — Dedrone (acquired by Axon), DroneShield, Fortem — are at bottom fusion engines that correlate detections across modalities and decline to escalate on a single sensor's word.[3] The standing rule across the industry is two-sensor corroboration before escalation.
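The two-sensor rule can be sketched in a few lines. The example below is a minimal illustration, not any vendor's fusion logic: the `Detection` fields, the 0.5-second window, and the 5-degree bearing gate are all assumptions chosen for readability, and the timestamps are assumed to come from a shared, synchronized clock.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Detection:
    modality: str       # hypothetical labels: "radar", "rf", "eo", "thermal", "acoustic"
    t: float            # seconds on a shared, synchronized clock (assumed)
    bearing_deg: float  # azimuth from the site reference point


def angular_gap(a: float, b: float) -> float:
    """Smallest absolute difference between two bearings, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def corroborated(dets: Iterable[Detection],
                 window_s: float = 0.5,
                 gate_deg: float = 5.0,
                 min_modalities: int = 2) -> bool:
    """True if at least `min_modalities` distinct sensor types agree with
    some anchor detection in both time and bearing."""
    dets = list(dets)
    for anchor in dets:
        agreeing = {
            d.modality for d in dets
            if abs(d.t - anchor.t) <= window_s
            and angular_gap(d.bearing_deg, anchor.bearing_deg) <= gate_deg
        }
        if len(agreeing) >= min_modalities:
            return True
    return False
```

Counting distinct modalities rather than raw detections is the point of the sketch: two radar plots of the same gull do not corroborate each other, whereas a radar plot and a camera detection on the same bearing do.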
One consequence deserves emphasis because it inverts the intuition that more sensors are always better. For targets that are large, slow, mostly within visual range, and electromagnetically silent — a hang glider or paraglider over a coastal British Columbia flight park is the canonical case — RF detection contributes nothing, acoustics contribute little, and a camera-first architecture (RGB plus thermal, with radar optional for range and weather) is both sufficient and the correct allocation of budget. The modality must be matched to the target, not to the catalogue.
§ II · Certainty Is an Architecture
A detector is a probability machine. It reports that a region of pixels resembles a class of object with some confidence, and on a good benchmark day that confidence is well calibrated; on a bad day a gull at altitude is a drone and a kite is an aircraft. The systems that work in production do not ask the detector to be certain. They build certainty structurally, by requiring an identification to pass every condition in a configurable ledger before any action fires. The pattern is general; the instantiation below is the flight-park case.
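One way such a ledger might be expressed in application code is sketched here. The condition names, the class labels, and every threshold are illustrative assumptions for a flight-park deployment, not figures taken from this briefing; the only property the sketch is meant to show is that the ledger closes when, and only when, every entry passes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class TrackSnapshot:
    """State of one track at one instant (all fields are illustrative)."""
    label: str                  # detector class name
    confidence: float           # detector score in [0, 1]
    frames_held: int            # consecutive frames the track has persisted
    inside_volume: bool         # track lies within the monitored airspace volume
    second_sensor_agrees: bool  # e.g. thermal confirms the RGB detection


Condition = Callable[[TrackSnapshot], bool]

# Hypothetical ledger: names and thresholds are assumptions to be tuned per site.
FLIGHT_PARK_LEDGER: Dict[str, Condition] = {
    "declared target class": lambda s: s.label in {"paraglider", "hang_glider"},
    "confidence floor": lambda s: s.confidence >= 0.60,
    "persistence": lambda s: s.frames_held >= 15,
    "inside monitored volume": lambda s: s.inside_volume,
    "second-sensor corroboration": lambda s: s.second_sensor_agrees,
}


def evaluate(ledger: Dict[str, Condition],
             snap: TrackSnapshot) -> Tuple[bool, Dict[str, bool]]:
    """Return (ledger_closed, per-condition results).

    The ledger closes only if every condition passes; the per-condition
    results are kept so that a refused alert can be explained afterwards.
    """
    results = {name: bool(check(snap)) for name, check in ledger.items()}
    return all(results.values()), results
```

Keeping the ledger as data rather than as nested `if` statements is a design choice: conditions can be added, removed, or re-thresholded per site without touching the evaluation logic, and the per-condition results double as an audit record.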
Every component of this ledger is available open-source, and the stack is stable enough to name. Detection: the YOLO family remains the deployable workhorse, with transformer detectors competitive at higher compute. On VisDrone — the standard aerial benchmark, whose objects average roughly 36 pixels — deployable models score in the high-30s to ~51 per cent mAP50 depending on architecture and weight class, and sliced inference (SAHI) materially improves small-object recall.[4] Tracking: ByteTrack (80.3 MOTA on MOT17) and BoT-SORT (80.5 MOTA, with camera-motion compensation and a stronger motion model) convert per-frame detections into persistent tracks with tunable persistence buffers.[5] Open-vocabulary detection — the ability to alert on a natural-language description without retraining — is real but young: Grounding DINO leads accuracy, YOLO-World leads speed, and sustaining useful frame rates on edge hardware remains an optimization exercise rather than a default.[6] The rules layer above all of this is ordinary application code.
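The idea behind sliced inference can be shown without any detector at all. The sketch below computes overlapping tiles over a frame and maps a box found inside a tile back to full-frame coordinates; it illustrates the tiling principle only and is not the SAHI library's API. The 640-pixel tile and 20 per cent overlap are assumed values, and the merging of duplicate boxes in overlap regions (typically by non-maximum suppression) is deliberately left out.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # x0, y0, x1, y1
Tile = Tuple[int, int, int, int]         # x0, y0, x1, y1 in full-frame pixels


def slice_starts(length: int, tile: int, overlap: float) -> List[int]:
    """Start offsets of tiles covering `length` pixels with fractional overlap."""
    if tile >= length:
        return [0]
    stride = max(1, round(tile * (1.0 - overlap)))
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # final tile sits flush with the far edge
    return starts


def make_tiles(width: int, height: int,
               tile: int = 640, overlap: float = 0.2) -> List[Tile]:
    """Overlapping tiles in row-major order; each is run through the detector
    at native resolution instead of downscaling the whole frame."""
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in slice_starts(height, tile, overlap)
        for x in slice_starts(width, tile, overlap)
    ]


def to_full_frame(box: Box, tile_origin: Tuple[int, int]) -> Box:
    """Shift a box detected in tile coordinates back into frame coordinates."""
    ox, oy = tile_origin
    x0, y0, x1, y1 = box
    return (x0 + ox, y0 + oy, x1 + ox, y1 + oy)
```

The arithmetic explains the recall gain: a 36-pixel object in a 1920-pixel-wide frame shrinks to about 12 pixels if the whole frame is resized to a 640-pixel detector input, but keeps its full 36 pixels inside a 640-pixel tile. The price is one inference pass per tile.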
It is worth stating plainly what the benchmarks say about the hard case, because this is where reputations in this field go to die. On Anti-UAV410, the standard thermal-infrared drone-tracking benchmark, state accuracy has sat in the mid-to-upper 60s for three years: 67.7 per cent in 2023, with 2025 methods clustered between 64 and 67.[7] When the field built a benchmark that looks like the real world (CST Anti-UAV: 220 thermal sequences, more than 240,000 annotations, tiny targets in complex scenes, 20 state-of-the-art trackers evaluated), the best method achieved 35.92 per cent state accuracy, against 67.69 per cent on Anti-UAV410.[8] Performance on realistic small-and-distant targets is roughly half of headline performance. Any system design that does not assume losing and re-acquiring the track is designed for the benchmark, not the sky.
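Designing for loss and re-acquisition can be as simple as letting a track coast on its last known velocity for a bounded number of frames. The sketch below is a deliberately minimal constant-velocity model, assuming pixel coordinates and a fixed association gate; the class name, the 45-frame coast limit, and the 40-pixel gate are hypothetical, and production trackers such as ByteTrack and BoT-SORT use Kalman filtering and richer association than this.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class CoastingTrack:
    x: float
    y: float
    vx: float                # pixels per frame
    vy: float
    missed: int = 0          # consecutive frames without a matching detection
    max_missed: int = 45     # assumed coast limit before the track is dropped
    gate_px: float = 40.0    # assumed association gate around the prediction

    @property
    def state(self) -> str:
        if self.missed == 0:
            return "tracked"
        return "coasting" if self.missed <= self.max_missed else "dropped"

    def step(self, detection: Optional[Tuple[float, float]] = None) -> str:
        """Advance one frame; `detection` is the nearest candidate, if any."""
        if self.state == "dropped":
            return self.state
        # Predict where the target should be this frame.
        px, py = self.x + self.vx, self.y + self.vy
        if detection is not None and math.hypot(detection[0] - px,
                                                detection[1] - py) <= self.gate_px:
            # Re-acquired (or simply still tracked): update velocity and position.
            self.vx, self.vy = detection[0] - self.x, detection[1] - self.y
            self.x, self.y = detection
            self.missed = 0
        else:
            # No usable detection: coast on the prediction and count the miss.
            self.x, self.y = px, py
            self.missed += 1
        return self.state
```

A track that coasts through a few missed frames and is then matched near its predicted position keeps its identity, so the persistence count in a verification ledger need not restart from zero after every dropout. Whether it should is a per-site policy decision.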
§ III · The Speed of Noticing
Latency in these systems decomposes into three regimes that are routinely conflated in marketing material: model inference, verification, and action.
Inference. A Jetson Orin Nano Super — NVIDIA's US$249 developer board, launched December 2024 at 67 INT8 TOPS — runs current YOLO models at 30–60+ frames per second under TensorRT; dedicated accelerators in the Hailo class run common detectors in under ten milliseconds.[9] Inference is, for practical purposes, solved at the edge.
Verification. The persistence requirement in the ledger is a deliberate latency purchase: holding all conditions across, say, ten to thirty frames costs a third of a second to a second at 30 frames per second and buys an order-of-magnitude reduction in false alarms. This is the correct trade for almost every civil application.
Action. A soft response — email, SMS, push notification, webhook, an API call into another system — adds network time and lands the full chain at under a second to a few seconds. A pan-tilt-zoom slew-to-cue, in which radar or RF hands a predicted position to a camera that mechanically slews and locks an auto-tracker, is dominated by motor time: hundreds of milliseconds to roughly two seconds on commodity hardware. Physical response — launching an interceptor aircraft — runs to seconds and minutes and belongs, as § IV explains, to a legal category most readers must not enter. Cloud-routed pipelines add hundreds of milliseconds to seconds and are appropriate for notification and archival, not for the verification loop, which belongs at the edge.
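The three regimes can be put into one small budget. The sketch below shows a consecutive-frame persistence gate and a min/max latency sum for a soft-response chain. The inference and verification figures follow the frame rates and frame counts quoted above; the notification network time of 0.05 to 2 seconds is an assumption inserted only to make the sum concrete, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


def persistence_latency_s(frames: int, fps: float) -> float:
    """Time spent waiting for `frames` consecutive frames at `fps`."""
    return frames / fps


class PersistenceGate:
    """Opens only after every ledger condition has held for N consecutive frames."""

    def __init__(self, required_frames: int) -> None:
        self.required_frames = required_frames
        self.streak = 0

    def update(self, all_conditions_pass: bool) -> bool:
        # Any failing frame resets the streak to zero.
        self.streak = self.streak + 1 if all_conditions_pass else 0
        return self.streak >= self.required_frames


@dataclass(frozen=True)
class Stage:
    name: str
    min_s: float
    max_s: float


def chain_latency(stages: Iterable[Stage]) -> Tuple[float, float]:
    """Best-case and worst-case end-to-end latency, in seconds."""
    stages = list(stages)
    return sum(s.min_s for s in stages), sum(s.max_s for s in stages)


SOFT_RESPONSE_CHAIN = [
    Stage("inference, one frame at 60 to 30 fps", 1 / 60, 1 / 30),
    Stage("verification, 10 to 30 frames at 30 fps",
          persistence_latency_s(10, 30), persistence_latency_s(30, 30)),
    Stage("notification network time (assumed)", 0.05, 2.0),
]

if __name__ == "__main__":
    lo, hi = chain_latency(SOFT_RESPONSE_CHAIN)
    print(f"soft-response chain: {lo:.2f} s to {hi:.2f} s")
```

Under these assumptions the chain lands between roughly 0.4 and 3 seconds, and the verification stage, not inference, is the largest term that the designer controls.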
§ IV · Response and the Law
What may a private Canadian operator lawfully do when the ledger closes? The answer divides cleanly, and the dividing line is statutory.
Lawful and commercially routine
Notification in every form; camera handoff and autonomous optical tracking; spotlight or audible deterrent, provided no aircraft is endangered; timestamped evidence packages of clip, track, and metadata; and ingestion of Remote ID broadcasts as a corroborating ledger condition for cooperative aircraft. On this last point the regulatory ground is moving in the operator's favour: on 8 June 2026 Transport Canada published Notice of Proposed Amendment 2026-005 (CARAC NPA 06-2026: RPAS — Remote Identification, Community-Based Organizations, and Designated Airspace), proposing mandatory Remote ID for most drone operations on a performance basis — Broadcast or Network, both on the ASTM F3411 standard — with the comment window open to 9 September 2026.[10] In the United States, Remote ID has been mandatory and enforced since 16 March 2024. If the Canadian proposal proceeds, cooperative-aircraft identification becomes a broadcast to be received rather than an inference to be made, and the strongest single condition a ledger can carry.
Criminal for a civilian, regardless of intent
Radio-frequency jamming and spoofing are prohibited by the Radiocommunication Act: subsection 4(4) bars the installation, use, possession, manufacture, import, or sale of jammers, and paragraph 9(1)(b) bars interference with radiocommunication, with exemptions under section 14 confined to federal bodies such as the RCMP and the Department of National Defence and to narrow authorized pilots.[11] Taking over a drone's control software engages the unauthorized-computer-use provisions of the Criminal Code, sections 342.1 and 342.2. Physically downing a drone is interference with an aircraft under the Aeronautics Act and CARs Part IX. The interceptor systems that exist — Fortem's DroneHunter, which according to the company began first customer deliveries of its 5.0 version in January 2026 and was selected by the Pentagon's counter-UAS task force under the Replicator-2 initiative; Anduril's Anvil — are defence and government systems, full stop.[12] A Canadian civil operator's response ceiling is detection, tracking, evidence, and notification. That ceiling is set by Parliament, not by engineering.
Privacy is a design input, not an afterthought
Fixed cameras operated in the course of commercial activity engage PIPEDA and, in British Columbia, Alberta, and Québec, the provincial private-sector statutes. The obligations are settled: a reasonable purpose, visible signage, collection limited to what is necessary, a field of view that does not sweep neighbouring properties or public sidewalks, secured storage, scheduled retention and destruction, and a written policy.[13] Audio is the trap: recording private conversations without consent engages the Criminal Code's interception provisions, and the correct engineering answer is to disable microphones at the hardware level. A system pointed at the sky carries low privacy exposure by construction — one more argument, where targets permit, for the camera-first architecture aimed upward.
§ V · A Conservative Forecast
Forecasts in this field fail in a characteristic way: they extrapolate the hardware curve, which is real, onto the accuracy curve, which is not. The measured baseline for the hardware curve is the Jetson Orin Nano line, which moved from 40 INT8 TOPS at US$499 in March 2023 to 67 INT8 TOPS at US$249 in December 2024 — roughly a three-to-four-fold improvement in compute per dollar in under two years — while headline TOPS figures across the industry are increasingly inflated by precision-format changes (INT8 to INT4 to FP4) rather than by efficiency gains at constant precision.[14] The measured baseline for the accuracy curve is the benchmark plateau documented in § II. Holding both honestly:
| Horizon | Near-certain | Likely | Genuinely uncertain — do not plan on it |
|---|---|---|---|
| +1 yr (mid-2027) | Edge compute 15–25% cheaper per unit; a robust RGB+thermal node in the low four figures; YOLO + ByteTrack/BoT-SORT remains the production stack | Open-vocabulary detection usable at modest frame rates on edge boards for coarse alerting | Any material jump in small-and-distant-target accuracy |
| +2.5 yr (end-2028) | Continued hardware cost decline; thermal cores cheaper as 12 µm pitch standardizes | On-device vision-language models practical at the edge ("describe the thing to alert on" without retraining); sensor-fusion correlation software increasingly commoditized | Tiny-target tracking in clutter; design for re-acquisition, not for an unbroken track |
| +5 yr (2031) | Multi-camera fusion nodes cheap and unremarkable; edge AI market substantially larger on any analyst definition | Natural-language-configurable monitoring as the normal interface; event-camera sensors entering fusion stacks for fast and backlit targets | Whether small-object detection at distance escapes its plateau at all. Treat a breakout as upside, never as the plan |
The asymmetry is the finding. Everything around the detector — compute, sensors, fusion, interface — is on a cost curve that favours the builder. The detector's performance on the hardest realistic case is not. Architecture must therefore carry what the model cannot: corroboration across sensors, persistence across frames, graceful loss and re-acquisition of track, and a verification ledger that converts an imperfect detector into a near-certain system.
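The hardware-curve baseline cited in this section can be checked with a few lines of arithmetic. The sketch below uses only the two Jetson Orin Nano price and TOPS pairs given above; the 21-month interval is counted from the two announcement months, and the annualized figure is a comparison of two list prices rather than a fitted trend.

```python
def tops_per_dollar(tops: float, price_usd: float) -> float:
    """Headline INT8 TOPS per US dollar of developer-kit list price."""
    return tops / price_usd


# Figures as cited in this briefing.
ORIN_NANO_2023 = tops_per_dollar(40, 499)        # March 2023
ORIN_NANO_SUPER_2024 = tops_per_dollar(67, 249)  # December 2024

# Ratio of compute per dollar between the two announcements: roughly 3.36.
improvement = ORIN_NANO_SUPER_2024 / ORIN_NANO_2023

# Assumed interval: March 2023 to December 2024, counted as 21 months.
MONTHS_BETWEEN = 21
# Equivalent constant yearly factor: roughly 2.0.
annualized = improvement ** (12 / MONTHS_BETWEEN)

if __name__ == "__main__":
    print(f"compute per dollar improved {improvement:.2f}x "
          f"({annualized:.2f}x per year if the rate were constant)")
```

The ratio of about 3.4 sits inside the three-to-four-fold range stated above. No comparable calculation is possible for the accuracy curve, which is the asymmetry the section describes.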
§ VI · A Reference Architecture in Four Stages
For an operator proceeding from nothing to a deployed system, the staging below orders the spend by what each stage retires in risk. Each stage carries an explicit gate; the gate, not the calendar, authorizes the next stage.
| Stage | Build | Gate to advance |
|---|---|---|
| 0 · Prove (< CAD $1,500) | Single RGB camera; Jetson Orin Nano Super or Raspberry Pi 5 with a Hailo accelerator; YOLO + ByteTrack; the full verification ledger in application code; action = email and webhook | Reliable detection of the declared target at the required range, with fewer than one false alert per day |
| 1 · Harden | Add a 640×512 uncooled thermal camera for night, glare, and contrast failure; a PTZ for slew-to-track; BoT-SORT for motion robustness; evidence logging; Remote ID ingestion as a ledger condition | Two-sensor corroboration driving false positives to approximately zero across a full diurnal cycle |
| 2 · Extend | Small commercial radar for beyond-visual range and all-weather cueing; rigorous time synchronization and a single coordinate convention (WGS84 with a local ENU frame) adopted before, not after, fusion; optionally an open-vocabulary model for reconfigurable alerting | Detection range and weather robustness meeting the site's declared requirement |
| 3 · Respond | Soft responses only: alerting, autonomous optical tracking, deterrence that endangers no aircraft, evidence packages, operator dashboard. No jamming, no spoofing, no takeover, no kinetic capability — see § IV | Counsel's review of the response set against the current state of Canadian law |
Two engineering notes from the fusion literature merit standing inclusion because they are cheap to honour and expensive to retrofit. First, clock discipline: a two-hundred-millisecond drift between a radar plot and a camera frame is enough to spawn duplicate tracks and defeat corroboration; synchronize from the first commit. Second, coordinate discipline: one geodetic convention, declared once, carried everywhere. Most fusion failures in the field are bookkeeping failures.
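Both disciplines reduce to a few lines of arithmetic. The sketch below converts WGS84 geodetic coordinates to a local east-north-up frame using the standard ellipsoid constants, and computes how far a target moves during a given clock skew. The function names are hypothetical, and the site origin and target speed used in any call are the caller's assumptions.

```python
import math
from typing import Tuple

# WGS84 ellipsoid constants.
WGS84_A = 6378137.0                  # semi-major axis, metres
WGS84_F = 1.0 / 298.257223563        # flattening
WGS84_E2 = WGS84_F * (2.0 - WGS84_F) # first eccentricity squared

Vec3 = Tuple[float, float, float]


def geodetic_to_ecef(lat_deg: float, lon_deg: float, h_m: float) -> Vec3:
    """WGS84 latitude, longitude, ellipsoidal height to Earth-centred XYZ."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    n = WGS84_A / math.sqrt(1.0 - WGS84_E2 * math.sin(lat) ** 2)
    x = (n + h_m) * math.cos(lat) * math.cos(lon)
    y = (n + h_m) * math.cos(lat) * math.sin(lon)
    z = (n * (1.0 - WGS84_E2) + h_m) * math.sin(lat)
    return x, y, z


def geodetic_to_enu(lat_deg: float, lon_deg: float, h_m: float,
                    lat0_deg: float, lon0_deg: float, h0_m: float) -> Vec3:
    """Position relative to a declared site origin, as (east, north, up) metres."""
    x, y, z = geodetic_to_ecef(lat_deg, lon_deg, h_m)
    x0, y0, z0 = geodetic_to_ecef(lat0_deg, lon0_deg, h0_m)
    dx, dy, dz = x - x0, y - y0, z - z0
    lat0, lon0 = math.radians(lat0_deg), math.radians(lon0_deg)
    sl, cl = math.sin(lat0), math.cos(lat0)
    so, co = math.sin(lon0), math.cos(lon0)
    east = -so * dx + co * dy
    north = -sl * co * dx - sl * so * dy + cl * dz
    up = cl * co * dx + cl * so * dy + sl * dz
    return east, north, up


def skew_displacement_m(speed_m_s: float, clock_skew_s: float) -> float:
    """Apparent position error between two sensors caused by clock skew alone."""
    return speed_m_s * abs(clock_skew_s)
```

As an illustration with an assumed target speed of 15 m/s, a 0.2-second skew displaces the radar plot from the camera detection by 3 metres, which is enough to fall outside a tight association gate and spawn a second track for the same object.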
§ VII · Sources and Notes
1. Vendor detection-range claims are generally stated without a defined target, aspect, or condition set; international standardization of counter-UAS detection metrics is in progress at the IEC. All ranges in the sensor table in § I should be de-rated accordingly.
2. Figures for the Zvook acoustic detection network (approximately twelve seconds to appearance in the Delta system, a false-positive rate of about 1.6 per cent, and per-sensor cost on the order of US$500) are as reported by the project's Air Force liaison in Western press coverage, 2023–2025, and are presented as reported rather than independently audited.
3. On fusion architecture and two-sensor corroboration as the industry false-positive control: vendor technical documentation, Dedrone (Axon) DedroneTracker.AI; DroneShield DroneSentry; Fortem SkyDome; and the counter-UAS trade literature, 2024–2026.
4. VisDrone benchmark results, 2024–2025 literature: deployable single-stage models in the high-30s mAP50 (e.g., improved YOLOv8n variants) to approximately 51 per cent mAP50 for heavier 2025 detection-transformer architectures; sliced-inference (SAHI) gains documented in Akyon et al., 2022, and successors.
5. Zhang et al., ByteTrack: Multi-Object Tracking by Associating Every Detection Box, ECCV 2022 (80.3 MOTA, 77.3 IDF1, MOT17); Aharon et al., BoT-SORT: Robust Associations Multi-Pedestrian Tracking, 2022 (80.5 MOTA, 80.2 IDF1, MOT17).
6. Grounding DINO 1.5 Edge (IDEA Research, 2024): 36.2 zero-shot AP on COCO with an edge-optimized backbone; Cheng et al., YOLO-World: Real-Time Open-Vocabulary Object Detection, CVPR 2024: 35.4 AP on LVIS at 52 fps on server-class hardware. Edge deployment of open-vocabulary models above ~10 fps remains an active optimization problem in the 2025 literature.
7. Anti-UAV410 thermal tracking benchmark: Huang et al., IEEE TPAMI 2023 (SiamDT, 67.7% state accuracy), with 2025 methods reporting 64.0–67.0%: a three-year plateau in the mid-to-upper 60s.
8. Xie, B., Zhang, C., Wang, F., Liu, P., Lu, F., Chen, Z., & Hu, W. (2025). CST Anti-UAV: A Thermal Infrared Benchmark for Tiny UAV Tracking in Complex Scenes. ICCV 2025 Workshops; arXiv:2507.23473. 220 sequences, >240k annotations, 20 trackers evaluated; best method 35.92% state accuracy versus 67.69% on Anti-UAV410.
9. NVIDIA Jetson Orin Nano Super Developer Kit, announced 17 December 2024: US$249, 67 INT8 TOPS, 102 GB/s memory bandwidth (NVIDIA developer blog and product documentation). Hailo-8 sub-10 ms detector latency per vendor documentation and the Frigate NVR community's published benchmarks.
10. Transport Canada, Notice of Proposed Amendment 2026-005 (CARAC NPA 06-2026): RPAS — Remote Identification, Community-Based Organizations, and Designated Airspace, published 8 June 2026; comment period to 9 September 2026; performance-based Remote ID accepting Broadcast or Network paths on ASTM F3411. United States: 14 CFR Part 89, Remote ID enforcement from 16 March 2024.
11. Radiocommunication Act, R.S.C. 1985, c. R-2, ss. 4(4), 9(1)(b), 14; Criminal Code, R.S.C. 1985, c. C-46, ss. 342.1, 342.2; Aeronautics Act, R.S.C. 1985, c. A-2; Canadian Aviation Regulations, Part IX, as amended by SOR/2025-70 (BVLOS and other operations framework, in force 4 November 2025).
12. Fortem Technologies press release, Lindon, Utah, January 2026 (DroneHunter 5.0 first customer deliveries; selection under the Replicator-2 initiative); company statements, reported as such. Anduril Industries, Anvil product documentation.
13. Office of the Privacy Commissioner of Canada, Guidelines for Overt Video Surveillance in the Private Sector; PIPEDA Report of Findings #2010-008 (cameras capturing neighbouring units and a public sidewalk found non-compliant); Personal Information Protection Act (British Columbia), S.B.C. 2003, c. 63.
14. Jetson Orin Nano Developer Kit, announced March 2023: US$499, 40 INT8 TOPS; Jetson Orin Nano Super Developer Kit, December 2024: US$249, 67 INT8 TOPS (NVIDIA product announcements). On the inflation of headline TOPS by precision-format migration (INT8 → INT4 → FP4) rather than efficiency gains at constant precision: vendor datasheet conventions across the edge-accelerator industry, 2024–2026.