AI Video Analytics System for Industrial Safety & Operations (Confidential)

BeSpokeAI Group (Uzbekistan) · Lead Engineer / ML + Systems (end-to-end) · Two large automotive manufacturing plants in Uzbekistan · Production (24/7), multi-site

Quick Start for Reviewers

Estimated reading time: 6 minutes. Terms used throughout, defined up front:

RTSP: Real Time Streaming Protocol. A network control protocol for streaming media; used here to pull live video from IP cameras.
NMS: Non-Maximum Suppression. Post-processing step in object detection that removes overlapping duplicate boxes, keeping the highest-confidence detection per object.
AP50: Average Precision at IoU threshold 0.5. Standard object-detection metric; proportion of correct detections when overlap with ground truth ≥ 0.5.
P95: 95th percentile. In latency contexts: 95% of requests complete within this time (e.g. P95 end-to-end latency 2.63 s).
MTTR: Mean Time To Repair. Average time to restore service after a failure (e.g. 11 min in this deployment).
RBAC: Role-Based Access Control. Authorization model where access is granted according to user roles rather than individual identities.
VLAN: Virtual Local Area Network. Logical network segmentation; used here for dedicated routing of camera streams (details omitted).

1. Executive Summary

This document describes a production-grade AI video analytics system processing live RTSP streams across two automotive manufacturing plants. The system provides real-time detection, tracking, and rule-based safety analytics with low-latency alerts. It operates under strict privacy constraints: no identity recognition, restricted retention, and encrypted storage. The main technical challenge addressed is reliable streaming ML under distribution shift.

2. Problem Definition

2.1 Industrial pain points

Industrial safety and operations depend on detecting small, high-frequency deviations: missing PPE (helmet, vest), unsafe proximity to vehicles, restricted zone violations, improper posture, and distraction. Manually monitoring dozens or hundreds of cameras is infeasible. Rule-based CCTV systems fail under shift changes, lighting variation, and camera-specific quirks.

2.2 System objectives

  • Real-time awareness within seconds of an event.
  • 24/7 stability with graceful degradation.
  • Privacy-first: no face ID, minimization of retention.
  • Auditability: explainable alert evidence.

3. Deployment Snapshot (Production)

Scale: 2 sites · 214 total cameras · 198 active streams (typical) · 6 FPS/stream frame sampling

Throughput & latency: 1,188 frames ingested/s · 1.42 s mean E2E alert latency · 2.63 s P95 E2E latency · 18.7 ms inference/frame (mean, GPU) · 6.9 ms tracking + rules/frame (mean)

Availability: 99.73% uptime (90 days) · 11 min MTTR · 99.2% stream reconnect success

3.1 Scale

  • Sites: 2.
  • Total cameras onboarded: 214 RTSP streams.
  • Active concurrent streams (typical): 198.
  • Frame sampling: 6 FPS per stream for analytics.

3.2 Throughput and latency

  • Frames ingested per second: 1,188.
  • Mean end-to-end alert latency: 1.42 s.
  • P95 end-to-end latency: 2.63 s.
  • Model inference latency per frame (mean, GPU): 18.7 ms.
  • Tracking + rules latency per frame (mean): 6.9 ms.
[Figure: Latency & throughput (production). E2E mean 1.42 s and P95 2.63 s on a 0–3 s scale; inference 18.7 ms, tracking + rules 6.9 ms, 1,188 FPS.]

3.3 Availability

  • Platform uptime (last 90 days): 99.73%.
  • MTTR: 11 min.
  • Stream reconnection success rate: 99.2%.

4. System Architecture (High-level)

The pipeline decouples frame sampling from inference: ingest nodes pull RTSP streams, decode, and sample at 6 FPS, then dispatch frames to stateless inference workers; a temporal event engine evaluates rules over tracking state. Key design decisions: decoupled sampling and inference, stateless workers, a temporal event engine, and mandatory observability.

Pipeline: RTSP cameras → Ingest (pull, decode, sample at 6 FPS) → Inference (stateless workers) → Tracking (single-camera) → Event engine (temporal rules) → Observability.
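
As a minimal sketch of this decoupling (queue size, thread model, and names like `inference_worker` are illustrative assumptions, not the production implementation):

```python
import queue
import threading
import time

# An ingest thread samples frames at a fixed rate and hands them to a
# stateless inference worker through a bounded queue, so slow inference
# never blocks ingest: when the queue is full, frames are dropped.
SAMPLE_FPS = 6
frame_queue = queue.Queue(maxsize=64)  # bounded: backpressure drops, not stalls

def ingest(stream_id, n_frames=12):
    """Simulate pulling/decoding one stream and sampling at SAMPLE_FPS."""
    for i in range(n_frames):
        frame = (stream_id, i)             # stand-in for a decoded frame
        try:
            frame_queue.put_nowait(frame)  # drop the frame if workers lag
        except queue.Full:
            pass
        time.sleep(1.0 / SAMPLE_FPS)

def inference_worker(results):
    """Stateless worker: pulls frames, runs the model, emits detections."""
    while True:
        frame = frame_queue.get()
        if frame is None:                  # poison pill -> shut down
            break
        results.append({"frame": frame, "detections": []})  # model call here

results = []
worker = threading.Thread(target=inference_worker, args=(results,))
worker.start()
ingest("cam-001")
frame_queue.put(None)
worker.join()
print(len(results))  # 12
```

Keeping workers stateless means any worker can serve any camera, which is what makes horizontal scaling and failover straightforward.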

5. Data Flow (Detailed)

Ingestion

RTSP pull, decode, sampling, health checks. Per-stream health metrics: heartbeat, bitrate, dropped frames, jitter, reconnect count, time since last valid frame.
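
An illustrative sketch of the per-stream health record (field names and the staleness threshold are assumptions, not the production schema):

```python
import time
from dataclasses import dataclass, field

@dataclass
class StreamHealth:
    """Per-stream health metrics: heartbeat, bitrate, drops, jitter, reconnects."""
    stream_id: str
    bitrate_kbps: float = 0.0
    dropped_frames: int = 0
    jitter_ms: float = 0.0
    reconnect_count: int = 0
    last_valid_frame_ts: float = field(default_factory=time.monotonic)

    def heartbeat(self) -> None:
        """Record that a valid frame was just decoded."""
        self.last_valid_frame_ts = time.monotonic()

    def is_stale(self, max_silence_s: float = 5.0) -> bool:
        """True if no valid frame has arrived within max_silence_s."""
        return (time.monotonic() - self.last_valid_frame_ts) > max_silence_s

h = StreamHealth("cam-017")
h.heartbeat()
print(h.is_stale())  # False right after a heartbeat
```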

Inference outputs

Boxes, classes, confidence; optional pose; no embeddings stored.

Tracking

Single-camera identity only. State includes position history, velocity, dwell time, zone intersections.
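
The per-track state can be sketched as follows (field names and units are assumptions):

```python
from dataclasses import dataclass, field

@dataclass
class TrackState:
    """Single-camera track: position history, velocity, per-zone dwell time."""
    track_id: int
    positions: list = field(default_factory=list)      # (t, x, y) history
    zone_dwell_s: dict = field(default_factory=dict)   # zone -> seconds inside

    def update(self, t, x, y, zones, dt):
        """Record one observation; accumulate dwell time per intersected zone."""
        self.positions.append((t, x, y))
        for z in zones:
            self.zone_dwell_s[z] = self.zone_dwell_s.get(z, 0.0) + dt

    def velocity(self):
        """Speed over the last two observations (units/s), or 0 if too few."""
        if len(self.positions) < 2:
            return 0.0
        (t0, x0, y0), (t1, x1, y1) = self.positions[-2:]
        return ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5 / max(t1 - t0, 1e-9)

trk = TrackState(7)
trk.update(0.0, 0, 0, ["zoneA"], 1 / 6)
trk.update(1 / 6, 3, 4, ["zoneA"], 1 / 6)
print(round(trk.velocity(), 3))  # 30.0  (distance 5 over 1/6 s)
```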

Event engine

Temporal constraints; e.g. a RestrictedZoneEntry condition must persist ≥0.8 s before the event fires.
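
The persistence constraint above can be sketched as a small debounce rule (class and field names are hypothetical):

```python
class PersistenceRule:
    """Fire once when a condition has held continuously for min_duration_s."""

    def __init__(self, min_duration_s: float = 0.8):
        self.min_duration_s = min_duration_s
        self.since = None      # timestamp when the condition became true
        self.fired = False     # fire at most once per episode

    def update(self, in_zone: bool, now: float) -> bool:
        """Feed one observation; return True exactly when the event fires."""
        if not in_zone:                    # condition cleared: reset episode
            self.since, self.fired = None, False
            return False
        if self.since is None:
            self.since = now
        if not self.fired and now - self.since >= self.min_duration_s:
            self.fired = True
            return True                    # RestrictedZoneEntry fires here
        return False

# Observations at ~3 per second; the rule fires only once 0.8 s has elapsed.
rule = PersistenceRule(0.8)
events = [rule.update(True, t) for t in (0.0, 0.33, 0.67, 1.0)]
print(events)  # [False, False, False, True]
```

Requiring persistence suppresses single-frame flicker from the detector, which is a major source of false positives in streaming analytics.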

6. ML Models (Production)

  • Input resolution: 960×540.
  • Batch size per GPU worker: 16.
  • Confidence threshold: 0.41.
  • NMS IoU: 0.58.
  • Versioning includes: weights hash, preprocessing config, label map, evaluation report, deployment manifest.
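
A sketch of the post-processing these settings imply, assuming (x1, y1, x2, y2) boxes; the helper names are illustrative, not the production code:

```python
# Drop boxes below the 0.41 confidence threshold, then greedy NMS at IoU 0.58.
CONF_THRESHOLD = 0.41
NMS_IOU = 0.58

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def postprocess(detections):
    """detections: list of (box, score); returns detections kept after NMS."""
    dets = [d for d in detections if d[1] >= CONF_THRESHOLD]
    dets.sort(key=lambda d: d[1], reverse=True)  # highest confidence first
    kept = []
    for box, score in dets:
        if all(iou(box, k[0]) < NMS_IOU for k in kept):
            kept.append((box, score))
    return kept

dets = [((0, 0, 10, 10), 0.9), ((1, 1, 11, 11), 0.8), ((50, 50, 60, 60), 0.3)]
print(postprocess(dets))  # duplicate suppressed, low-confidence box dropped
```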

7. Training Data & Labeling

  • Total labeled images: 183,420.
  • Total labeled video clips: 1,260.
  • Total labeled bounding boxes: 2,946,110.
  • Classes: 13.
  • Double-annotation: 12,000 images.
  • Agreement IoU≥0.5: 0.93.
  • Spot-check: 320 random images/day (during active labeling).
  • Split: Train 147,900; Val 17,320; Test 18,200.
  • Split is camera-aware, no leakage.
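
One way the camera-aware split could be implemented (function name and ratios are illustrative, chosen to roughly match the reported counts): every image from a given camera lands in exactly one split, so near-duplicate frames cannot leak between train and test.

```python
import random

def camera_aware_split(images, seed=0, val_frac=0.095, test_frac=0.10):
    """images: list of (image_id, camera_id). Splits by camera, not by image."""
    cameras = sorted({cam for _, cam in images})
    random.Random(seed).shuffle(cameras)
    n_test = max(1, round(len(cameras) * test_frac))
    n_val = max(1, round(len(cameras) * val_frac))
    test_cams = set(cameras[:n_test])
    val_cams = set(cameras[n_test:n_test + n_val])
    split = {"train": [], "val": [], "test": []}
    for img, cam in images:
        key = "test" if cam in test_cams else "val" if cam in val_cams else "train"
        split[key].append(img)
    return split

# 100 images spread over 5 cameras: one camera each for val and test.
imgs = [(f"img{i}", f"cam{i % 5}") for i in range(100)]
s = camera_aware_split(imgs)
print({k: len(v) for k, v in s.items()})  # {'train': 60, 'val': 20, 'test': 20}
```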

8. Evaluation

8.1 Offline test metrics

  • Person AP50: 0.961.
  • Helmet AP50: 0.933.
  • Vest AP50: 0.918.
  • Forklift AP50: 0.947.

8.2 Online event-level metrics

  • Period: 30 days.
  • Reviewed alerts: 9,480.
  • True positives: 8,221.
  • False positives: 1,259.
  • Event precision: 0.867.
  • Event recall (estimated): 0.812.
  • Mean investigation time saved per shift: 44 min.

Recall estimated via incident logs and sampled review.
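
The event precision above follows directly from the reviewed-alert counts:

```python
# Precision over the 9,480 human-reviewed alerts from the 30-day window.
reviewed = 9480
true_positives = 8221
false_positives = 1259

assert true_positives + false_positives == reviewed  # counts are consistent
precision = true_positives / reviewed
print(round(precision, 3))  # 0.867
```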

9. Distribution Shift Handling

Drift signals monitored: confidence distribution shift, illumination histogram shift, tracking fragmentation rate, alert frequency anomalies per zone. Hard-case queue volume: 1,200 images/day. Selected for labeling: 220 images/day. Retraining cadence: minor 21 days; major 63 days.
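
One of the drift signals above, confidence distribution shift, can be monitored with a population stability index (PSI); the bins and the 0.2 trigger threshold here are common rule-of-thumb assumptions, not the production values.

```python
import math

def psi(ref_counts, cur_counts, eps=1e-6):
    """PSI between two histograms over the same confidence bins; 0 = identical."""
    ref_total, cur_total = sum(ref_counts), sum(cur_counts)
    score = 0.0
    for r, c in zip(ref_counts, cur_counts):
        p = max(r / ref_total, eps)   # clamp empty bins to avoid log(0)
        q = max(c / cur_total, eps)
        score += (q - p) * math.log(q / p)
    return score

ref = [50, 120, 300, 400, 130]       # confidence histogram at deployment
same = [48, 125, 295, 405, 127]      # similar week: PSI stays low
shifted = [300, 300, 250, 100, 50]   # e.g. lighting change: confidences collapse
print(psi(ref, same) < 0.2 < psi(ref, shifted))  # True
```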

10. Infrastructure (Production Stack)

10.1 Compute

  • GPU inference servers: 8 nodes.
  • GPUs total: 16 GPUs.
  • CPU ingest + event nodes: 14 nodes.
  • Total RAM: 1.5 TB.
  • Total NVMe hot buffers: 92 TB.

10.2 Storage & retention

  • Metadata retention: 365 days.
  • Evidence retention: 30 days.
  • Clip duration per alert: 12 seconds (8 s before + 4 s after).
  • Avg evidence clip size: 7.8 MB.
  • Evidence clips/day: 1,040.
  • Evidence storage: 8.11 GB/day.
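
A quick consistency check of the storage figures (decimal units, 1 GB = 1,000 MB, assumed):

```python
# Daily evidence volume and the steady-state footprint of the rolling
# 30-day retention window, from the figures above.
clips_per_day = 1040
clip_mb = 7.8
retention_days = 30

gb_per_day = clips_per_day * clip_mb / 1000        # 1,040 clips x 7.8 MB
steady_state_gb = gb_per_day * retention_days      # rolling 30-day window
print(round(gb_per_day, 2), round(steady_state_gb, 1))  # 8.11 243.4
```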

10.3 Networking

Stable RTSP pulling with dedicated VLAN routing (details omitted). TLS service-to-service auth.

11. Security & Privacy

  • No facial recognition, no identity classification, no employee scoring.
  • No cross-camera re-ID.
  • Store events and short clips only.
  • Audit logs of views.
  • AES-256 at rest for clips; TLS in transit; RBAC; immutable event logs.

12. Reliability Engineering

  • RTSP reconnect with jittered exponential backoff.
  • Circuit breakers.
  • Frame-drop handling with time windows.
  • Observability metrics and alerts: GPU failure, false positive spikes, storage warnings.
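
The jittered exponential backoff can be sketched as "full jitter" with a cap (base and cap values are illustrative assumptions):

```python
import random

def backoff_delays(attempts, base_s=1.0, cap_s=60.0, seed=42):
    """Delay before reconnect attempt n: uniform in [0, min(cap, base * 2^n)].

    Full jitter spreads reconnects out so that hundreds of streams dropped by
    one network event don't all hammer the cameras at the same instant.
    """
    rng = random.Random(seed)
    return [rng.uniform(0, min(cap_s, base_s * 2 ** n)) for n in range(attempts)]

delays = backoff_delays(8)
# ceilings grow 1, 2, 4, 8, ... seconds and saturate at the 60 s cap
```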

13. UI & Workflow

Dashboard: camera grid, alert timeline, evidence playback, feedback (valid / false positive). Routing: Critical → Telegram + dashboard + email; Medium → dashboard + daily report; Low → dashboard only.
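
The routing rules can be expressed as a small severity table (channel keys are illustrative, mirroring the description above):

```python
# Severity -> delivery channels, per the routing described above.
ROUTES = {
    "critical": ["telegram", "dashboard", "email"],
    "medium": ["dashboard", "daily_report"],
    "low": ["dashboard"],
}

def route(alert):
    """Return delivery channels for an alert dict with a 'severity' key."""
    return ROUTES.get(alert["severity"], ["dashboard"])  # default: dashboard only

print(route({"severity": "critical", "event": "E04"}))
# ['telegram', 'dashboard', 'email']
```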

14. Outcomes (Operational impact)

Event-driven safety response with auditable events. Quantified snapshot (30 days): total alerts generated 31,204; high severity 3,118; alerts reviewed 9,480; verified true safety events 8,221.

15. My Contributions (End-to-end)

Owned streaming architecture, training pipeline, tracking and event logic, MLOps, observability, and privacy/security constraints.

16. What I'd Improve Next

  • Causal alerting beyond rules with interpretability.
  • Uncertainty-aware alerts.
  • Camera-specific adaptation heads.
  • Formal evaluation shift suites.

Appendix A — Event Catalog (Production)

  • E01 Helmet missing.
  • E02 Vest missing.
  • E03 Restricted zone entry.
  • E04 Forklift-person unsafe proximity.
  • E05 Perimeter intrusion (after-hours).
  • E06 Fall-risk posture anomaly (designated zones).
  • E07 Blocked safety corridor / obstruction.
  • E08 Unattended object in restricted corridor.

Contact for Confirmation & Verification

All facts and claims in this document can be formally confirmed through the official contact below, who is the project lead and supervisor of my AI internship across multiple company-wide AI projects at BeSpokeAI Group.

Role: Founder & Chief Executive Officer
Organization: Bespoke AI – Applied Artificial Intelligence Solutions
Phone: +998 99 305 5022

The above contact supervised my (Yusufbek Abdurakhimov) AI internship across multiple company-wide AI projects.

Confidential — for private academic review only.