AI Video Analytics System for Industrial Safety & Operations (Confidential)
BeSpokeAI Group (Uzbekistan) · Lead Engineer / ML + Systems (end-to-end) · Two large automotive manufacturing plants in Uzbekistan · Production (24/7), multi-site
Quick Start for Reviewers
Estimated time: 6 minutes. Read the Terminology list below first, then Section 1 (Executive Summary) and the Deployment Snapshot (Section 3).
Terminology
- RTSP: Real Time Streaming Protocol. A network control protocol for streaming media; used here to pull live video from IP cameras.
- NMS: Non-Maximum Suppression. Post-processing step in object detection that removes overlapping duplicate boxes, keeping the highest-confidence detection per object.
- AP50: Average Precision at IoU threshold 0.5. Standard object-detection metric; proportion of correct detections when overlap with ground truth ≥ 0.5.
- P95: 95th percentile. In latency contexts: 95% of requests complete within this time (e.g. P95 end-to-end latency 2.63 s).
- MTTR: Mean Time To Repair. Average time to restore service after a failure (e.g. 11 min in this deployment).
- RBAC: Role-Based Access Control. Authorization model where access is granted according to user roles rather than individual identities.
- VLAN: Virtual Local Area Network. Logical network segmentation; used here for dedicated routing of camera streams (details omitted).
1. Executive Summary
This document describes a production-grade AI video analytics system processing live RTSP streams across two automotive manufacturing plants. The system provides real-time detection, tracking, and rule-based safety analytics with low-latency alerts. It operates under strict privacy constraints: no identity recognition, restricted retention, and encrypted storage. The main technical challenge addressed is reliable streaming ML under distribution shift.
2. Problem Definition
2.1 Industrial pain points
Industrial safety and operations depend on detecting small, high-frequency deviations: missing PPE (helmet, vest), unsafe proximity to vehicles, restricted zone violations, improper posture, and distraction. Manually monitoring dozens or hundreds of cameras is infeasible. Rule-based CCTV systems fail under shift changes, lighting variation, and camera-specific quirks.
2.2 System objectives
- Real-time awareness within seconds of an event.
- 24/7 stability with graceful degradation.
- Privacy-first: no face ID, minimization of retention.
- Auditability: explainable alert evidence.
3. Deployment Snapshot (Production)
3.1 Scale
- Sites: 2.
- Total cameras onboarded: 214 RTSP streams.
- Active concurrent streams (typical): 198.
- Frame sampling: 6 FPS per stream for analytics.
3.2 Throughput and latency
- Aggregate frame ingest rate: 1,188 FPS (198 active streams × 6 FPS).
- Mean end-to-end alert latency: 1.42 s.
- P95 end-to-end latency: 2.63 s.
- Model inference latency per frame (mean, GPU): 18.7 ms.
- Tracking + rules latency per frame (mean): 6.9 ms.
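For reference on how a P95 figure like the one above is derived, here is a minimal nearest-rank percentile over simulated latency samples. The distribution parameters are illustrative, not the production data:

```python
import random

def percentile(samples, q):
    """Nearest-rank style percentile: value at the q-quantile index of the sorted samples."""
    s = sorted(samples)
    k = max(0, min(len(s) - 1, round(q * (len(s) - 1))))
    return s[k]

random.seed(0)
# Hypothetical end-to-end latencies (seconds); lognormal is a common latency shape.
latencies = [random.lognormvariate(0.2, 0.35) for _ in range(10_000)]
print(f"mean = {sum(latencies) / len(latencies):.2f} s, "
      f"P95 = {percentile(latencies, 0.95):.2f} s")
```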
3.3 Availability
- Platform uptime (last 90 days): 99.73%.
- MTTR: 11 min.
- Stream reconnection success rate: 99.2%.
4. System Architecture (High-level)
The pipeline decouples sampling from inference: ingest nodes pull RTSP streams, decode, and sample at 6 FPS; frames are dispatched to stateless inference workers, and a temporal event engine evaluates rules over tracking state. Key design decisions: stateless inference workers (any worker can serve any frame), a temporal event engine rather than per-frame alerting, and mandatory observability on every stage.
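The decoupling can be sketched as a bounded queue between ingest and stateless workers; all names and the "inference" placeholder are illustrative, not the production code:

```python
import queue
import threading

frame_queue = queue.Queue(maxsize=512)  # bounded: sheds load instead of growing unboundedly

def ingest(stream_id, frames):
    """Ingest side: sample frames and dispatch; drop when the queue is full."""
    for f in frames:
        try:
            frame_queue.put_nowait((stream_id, f))
        except queue.Full:
            pass  # graceful degradation: drop the frame, never block decoding

def worker(results):
    """Stateless inference worker: holds no per-stream state, so any worker takes any frame."""
    while True:
        item = frame_queue.get()
        if item is None:  # poison pill shuts the worker down
            break
        stream_id, frame = item
        results.append((stream_id, f"detections_for_{frame}"))  # placeholder for real inference

results = []
t = threading.Thread(target=worker, args=(results,))
t.start()
ingest("cam-017", ["f0", "f1", "f2"])
frame_queue.put(None)
t.join()
print(results)
```

Because workers are stateless, scaling out is a matter of starting more worker threads or processes against the same queue.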
5. Data Flow (Detailed)
Ingestion
RTSP pull, decode, sampling, health checks. Per-stream health metrics: heartbeat, bitrate, dropped frames, jitter, reconnect count, time since last valid frame.
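A per-stream health record like the one described might look like the following sketch; the field names and the 5 s staleness threshold are assumptions, not the real schema:

```python
import time
from dataclasses import dataclass

@dataclass
class StreamHealth:
    """Per-stream health snapshot (field names are illustrative)."""
    stream_id: str
    last_frame_ts: float = float("-inf")  # -inf means "no valid frame seen yet"
    dropped_frames: int = 0
    reconnect_count: int = 0

    def on_frame(self) -> None:
        self.last_frame_ts = time.monotonic()

    def seconds_since_last_frame(self) -> float:
        return time.monotonic() - self.last_frame_ts

    def is_stale(self, max_silence_s: float = 5.0) -> bool:
        return self.seconds_since_last_frame() > max_silence_s

h = StreamHealth("cam-042")
print(h.is_stale())  # True: no frame received yet
h.on_frame()
print(h.is_stale())  # False: a frame just arrived
```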
Inference outputs
Boxes, classes, confidence; optional pose; no embeddings stored.
Tracking
Single-camera identity only. State includes position history, velocity, dwell time, zone intersections.
Event engine
Temporal constraints; e.g. RestrictedZoneEntry persists ≥0.8 s before firing.
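A persistence constraint of this kind can be sketched as a small state machine that fires once when a condition has held for the minimum duration; class and parameter names are illustrative:

```python
class TemporalRule:
    """Fire an event once a condition has persisted for min_duration_s seconds."""

    def __init__(self, min_duration_s: float):
        self.min_duration_s = min_duration_s
        self.start_ts = None   # when the condition first became active
        self.fired = False     # ensures one event per continuous episode

    def update(self, condition_active: bool, now_s: float) -> bool:
        """Return True exactly once, when the persistence threshold is first crossed."""
        if not condition_active:
            self.start_ts = None
            self.fired = False
            return False
        if self.start_ts is None:
            self.start_ts = now_s
            return False
        if not self.fired and (now_s - self.start_ts) >= self.min_duration_s:
            self.fired = True
            return True
        return False

rule = TemporalRule(min_duration_s=0.8)
frames = [(0.00, True), (0.33, True), (0.66, True), (1.00, True)]
fired_at = [t for t, active in frames if rule.update(active, t)]
print(fired_at)  # [1.0]: fires once, at the first frame past 0.8 s of persistence
```

Resetting on any inactive frame means brief flickers of the condition never produce an alert, which is exactly the false-positive suppression a rule like RestrictedZoneEntry needs.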
6. ML Models (Production)
- Input resolution: 960×540.
- Batch size per GPU worker: 16.
- Confidence threshold: 0.41.
- NMS IoU: 0.58.
- Versioning includes: weights hash, preprocessing config, label map, evaluation report, deployment manifest.
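The confidence and NMS thresholds above can be illustrated with a minimal greedy NMS sketch; the boxes and scores below are made up:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(dets, conf_thr=0.41, iou_thr=0.58):
    """Greedy NMS: keep highest-confidence boxes, drop overlaps above iou_thr."""
    dets = sorted((d for d in dets if d["conf"] >= conf_thr),
                  key=lambda d: d["conf"], reverse=True)
    kept = []
    for d in dets:
        if all(iou(d["box"], k["box"]) < iou_thr for k in kept):
            kept.append(d)
    return kept

dets = [
    {"box": (10, 10, 50, 50), "conf": 0.92},
    {"box": (12, 11, 52, 49), "conf": 0.75},     # near-duplicate of the first box
    {"box": (100, 100, 140, 150), "conf": 0.30}, # below the confidence threshold
]
print(len(nms(dets)))  # 1
```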
7. Training Data & Labeling
- Total labeled images: 183,420.
- Total labeled video clips: 1,260.
- Total labeled bounding boxes: 2,946,110.
- Classes: 13.
- Double-annotation: 12,000 images.
- Agreement IoU≥0.5: 0.93.
- Spot-check: 320 random images/day (during active labeling).
- Split: Train 147,900; Val 17,320; Test 18,200.
- Split is camera-aware: every camera's images land in exactly one of train/val/test, so there is no leakage of near-identical frames across splits.
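A camera-aware split can be sketched by assigning whole cameras, not individual images, to splits via a stable hash; the function and fractions below are illustrative:

```python
import hashlib

def camera_split(camera_ids, val_frac=0.10, test_frac=0.10):
    """Assign whole cameras to splits via a stable hash so that no camera's
    frames can leak across train/val/test. Fractions are approximate targets."""
    splits = {"train": [], "val": [], "test": []}
    for cam in camera_ids:
        # Deterministic value in [0, 1) derived from the camera id.
        h = int(hashlib.sha256(cam.encode()).hexdigest(), 16) % 1000 / 1000
        if h < test_frac:
            splits["test"].append(cam)
        elif h < test_frac + val_frac:
            splits["val"].append(cam)
        else:
            splits["train"].append(cam)
    return splits

cams = [f"cam-{i:03d}" for i in range(214)]
splits = camera_split(cams)
print({k: len(v) for k, v in splits.items()})
```

Hashing keeps the assignment reproducible across retraining runs: a camera added later never moves an existing camera between splits.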
8. Evaluation
8.1 Offline test metrics
- Person AP50: 0.961.
- Helmet AP50: 0.933.
- Vest AP50: 0.918.
- Forklift AP50: 0.947.
8.2 Online event-level metrics
- Period: 30 days.
- Reviewed alerts: 9,480.
- True positives: 8,221.
- False positives: 1,259.
- Event precision: 0.867.
- Event recall (estimated): 0.812.
- Mean investigation time saved per shift: 44 min.
Recall estimated via incident logs and sampled review.
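The event-precision figure follows directly from the review counts above:

```python
reviewed, true_pos, false_pos = 9_480, 8_221, 1_259
assert true_pos + false_pos == reviewed  # every reviewed alert is labeled TP or FP

precision = true_pos / reviewed
print(f"event precision = {precision:.3f}")  # 0.867
# Recall cannot be computed from reviewed alerts alone: it needs an estimate of
# missed events (false negatives), here derived from incident logs and sampled review.
```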
9. Distribution Shift Handling
Drift signals monitored: confidence distribution shift, illumination histogram shift, tracking fragmentation rate, alert frequency anomalies per zone. Hard-case queue volume: 1,200 images/day. Selected for labeling: 220 images/day. Retraining cadence: minor 21 days; major 63 days.
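One way a confidence-distribution shift signal can be computed is a two-sample Kolmogorov-Smirnov statistic between a baseline window and the current window; the distributions and the 0.1 alarm threshold below are illustrative, not the production detector:

```python
import bisect
import random

def ks_statistic(ref, cur):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    ref, cur = sorted(ref), sorted(cur)

    def cdf(sorted_samples, x):
        return bisect.bisect_right(sorted_samples, x) / len(sorted_samples)

    return max(abs(cdf(ref, x) - cdf(cur, x)) for x in ref + cur)

random.seed(7)
baseline = [random.betavariate(8, 3) for _ in range(2_000)]  # healthy confidence scores
drifted  = [random.betavariate(5, 5) for _ in range(2_000)]  # scores sliding lower
d = ks_statistic(baseline, drifted)
print(f"KS = {d:.2f}, drift flagged = {d > 0.1}")
```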
10. Infrastructure (Production Stack)
10.1 Compute
- GPU inference servers: 8 nodes.
- Total GPUs: 16.
- CPU ingest + event nodes: 14 nodes.
- Total RAM: 1.5 TB.
- Total NVMe hot buffers: 92 TB.
10.2 Storage & retention
- Metadata retention: 365 days.
- Evidence retention: 30 days.
- Clip duration per alert: 12 seconds (8 s before + 4 s after).
- Avg evidence clip size: 7.8 MB.
- Evidence clips/day: 1,040.
- Evidence storage/day: 8.11 GB/day.
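The storage figures above are internally consistent, as a quick check shows (the steady-state total is my extrapolation from the stated retention):

```python
clips_per_day = 1_040
clip_mb = 7.8
retention_days = 30

daily_gb = clips_per_day * clip_mb / 1000          # decimal units, matching the doc
steady_state_tb = daily_gb * retention_days / 1000  # evidence pool at full retention
print(f"{daily_gb:.2f} GB/day, ~{steady_state_tb:.2f} TB at steady state")
# 8.11 GB/day, ~0.24 TB at steady state
```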
10.3 Networking
Stable RTSP pulling over dedicated VLAN routing (details omitted). TLS for service-to-service authentication.
11. Security & Privacy
- No facial recognition, no identity classification, no employee scoring.
- No cross-camera re-ID.
- Store events and short clips only.
- Audit logs of views.
- AES-256 at rest for clips; TLS in transit; RBAC; immutable event logs.
12. Reliability Engineering
- RTSP reconnect with jittered exponential backoff.
- Circuit breakers.
- Frame-drop handling with time windows.
- Observability metrics and alerts: GPU failure, false positive spikes, storage warnings.
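Jittered exponential backoff for RTSP reconnects can be sketched as "full jitter" (delay drawn uniformly up to an exponentially growing, capped ceiling); the base, cap, and attempt count below are assumptions:

```python
import random

def backoff_delays(base_s=1.0, cap_s=60.0, attempts=6):
    """'Full jitter' exponential backoff: delay ~ U(0, min(cap, base * 2^n)).
    Jitter prevents hundreds of streams from reconnecting in lockstep
    after a shared outage (a thundering herd)."""
    delays = []
    for n in range(attempts):
        ceiling = min(cap_s, base_s * 2 ** n)
        delays.append(random.uniform(0, ceiling))
    return delays

random.seed(3)
for n, d in enumerate(backoff_delays()):
    print(f"attempt {n}: sleep {d:.1f}s")
```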
13. UI & Workflow
Dashboard: camera grid, alert timeline, evidence playback, feedback (valid / false positive). Routing: Critical → Telegram + dashboard + email; Medium → dashboard + daily report; Low → dashboard only.
14. Outcomes (Operational impact)
Event-driven safety response with auditable events. Quantified snapshot (30 days): total alerts generated 31,204; high severity 3,118; alerts reviewed 9,480; verified true safety events 8,221.
15. My Contributions (End-to-end)
Owned streaming architecture, training pipeline, tracking and event logic, MLOps, observability, and privacy/security constraints.
16. What I'd Improve Next
- Causal alerting beyond static rules, with interpretable explanations.
- Uncertainty-aware alerts.
- Camera-specific adaptation heads.
- Formal distribution-shift evaluation suites.
Appendix A — Event Catalog (Production)
- E01 Helmet missing.
- E02 Vest missing.
- E03 Restricted zone entry.
- E04 Forklift-person unsafe proximity.
- E05 Perimeter intrusion (after-hours).
- E06 Fall-risk posture anomaly (designated zones).
- E07 Blocked safety corridor / obstruction.
- E08 Unattended object in restricted corridor.
Contact for Confirmation & Verification
All facts and claims in this document may be formally confirmed through the following official contact. This person is the project captain and supervisor for my AI internship across multiple company-wide AI projects at BeSpokeAI Group.
- Role: Founder & Chief Executive Officer
- Organization: Bespoke AI – Applied Artificial Intelligence Solutions
- Email (official): ceo.abbos@bespokeaigroup.uz
- Phone: +998 99 305 5022
Supervisor of my (Yusufbek Abdurakhimov) AI internship across multiple company-wide AI projects.