AI Video Analytics System for Industrial Safety & Operations (Confidential)
BeSpokeAI Group (Uzbekistan) · Lead Engineer / ML + Systems (end-to-end) · Two large automotive manufacturing plants in Uzbekistan · Production (24/7), multi-site
Quick Start for Reviewers
Estimated time: 6 minutes. Read the Terminology list below first, then Section 1 (Executive Summary) and the Deployment Snapshot (Section 3).
Terminology
- RTSP: Real Time Streaming Protocol. A network control protocol for streaming media; used here to pull live video from IP cameras.
- NMS: Non-Maximum Suppression. Post-processing step in object detection that removes overlapping duplicate boxes, keeping the highest-confidence detection per object.
- AP50: Average Precision at IoU threshold 0.5. Standard object-detection metric; proportion of correct detections when overlap with ground truth ≥ 0.5.
- P95: 95th percentile. In latency contexts: 95% of requests complete within this time (e.g. P95 end-to-end latency 2.63 s).
- MTTR: Mean Time To Repair. Average time to restore service after a failure (e.g. 11 min in this deployment).
- RBAC: Role-Based Access Control. Authorization model where access is granted according to user roles rather than individual identities.
- VLAN: Virtual Local Area Network. Logical network segmentation; used here for dedicated routing of camera streams (details omitted).
1. Executive Summary
This document describes a production-grade AI video analytics system processing live RTSP streams across two automotive manufacturing plants. The system provides real-time detection, tracking, and rule-based safety analytics with low-latency alerts. It operates under strict privacy constraints: no identity recognition, restricted retention, and encrypted storage. The main technical challenge addressed is reliable streaming ML under distribution shift.
2. Problem Definition
2.1 Industrial pain points
Industrial safety and operations depend on detecting small, high-frequency deviations: missing PPE (helmet, vest), unsafe proximity to vehicles, restricted zone violations, improper posture, and distraction. Manually monitoring dozens or hundreds of cameras is infeasible. Rule-based CCTV systems fail under shift changes, lighting variation, and camera-specific quirks.
2.2 System objectives
- Real-time awareness within seconds of an event.
- 24/7 stability with graceful degradation.
- Privacy-first: no face ID, minimization of retention.
- Auditability: explainable alert evidence.
3. Deployment Snapshot (Production)
3.1 Scale
- Sites: 2.
- Total cameras onboarded: 214 RTSP streams.
- Active concurrent streams (typical): 198.
- Frame sampling: 6 FPS per stream for analytics.
3.2 Throughput and latency
- Aggregate frame ingest rate: 1,188 FPS (198 active streams × 6 FPS).
- Mean end-to-end alert latency: 1.42 s.
- P95 end-to-end latency: 2.63 s.
- Model inference latency per frame (mean, GPU): 18.7 ms.
- Tracking + rules latency per frame (mean): 6.9 ms.
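For reference on how a P95 figure like the one above is derived, here is a minimal nearest-rank percentile over simulated latency samples. The distribution parameters are illustrative, not the production data:

```python
import random

def percentile(samples, q):
    """Nearest-rank style percentile: value at the q-quantile index of the sorted samples."""
    s = sorted(samples)
    k = max(0, min(len(s) - 1, round(q * (len(s) - 1))))
    return s[k]

random.seed(0)
# Hypothetical end-to-end latencies (seconds); lognormal is a common latency shape.
latencies = [random.lognormvariate(0.2, 0.35) for _ in range(10_000)]
print(f"mean = {sum(latencies) / len(latencies):.2f} s, "
      f"P95 = {percentile(latencies, 0.95):.2f} s")
```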
3.3 Availability
- Platform uptime (last 90 days): 99.73%.
- MTTR: 11 min.
- Stream reconnection success rate: 99.2%.
4. System Architecture (High-level)
The pipeline decouples sampling from inference: ingest nodes pull RTSP streams, decode, and sample at 6 FPS; frames are dispatched to stateless inference workers, and a temporal event engine evaluates rules over tracking state. Key design decisions: stateless inference workers (any worker can serve any frame), a temporal event engine rather than per-frame alerting, and mandatory observability on every stage.
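The decoupling can be sketched as a bounded queue between ingest and stateless workers; all names and the "inference" placeholder are illustrative, not the production code:

```python
import queue
import threading

frame_queue = queue.Queue(maxsize=512)  # bounded: sheds load instead of growing unboundedly

def ingest(stream_id, frames):
    """Ingest side: sample frames and dispatch; drop when the queue is full."""
    for f in frames:
        try:
            frame_queue.put_nowait((stream_id, f))
        except queue.Full:
            pass  # graceful degradation: drop the frame, never block decoding

def worker(results):
    """Stateless inference worker: holds no per-stream state, so any worker takes any frame."""
    while True:
        item = frame_queue.get()
        if item is None:  # poison pill shuts the worker down
            break
        stream_id, frame = item
        results.append((stream_id, f"detections_for_{frame}"))  # placeholder for real inference

results = []
t = threading.Thread(target=worker, args=(results,))
t.start()
ingest("cam-017", ["f0", "f1", "f2"])
frame_queue.put(None)
t.join()
print(results)
```

Because workers are stateless, scaling out is a matter of starting more worker threads or processes against the same queue.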
5. Data Flow (Detailed)
Ingestion
RTSP pull, decode, sampling, health checks. Per-stream health metrics: heartbeat, bitrate, dropped frames, jitter, reconnect count, time since last valid frame.
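A per-stream health record like the one described might look like the following sketch; the field names and the 5 s staleness threshold are assumptions, not the real schema:

```python
import time
from dataclasses import dataclass

@dataclass
class StreamHealth:
    """Per-stream health snapshot (field names are illustrative)."""
    stream_id: str
    last_frame_ts: float = float("-inf")  # -inf means "no valid frame seen yet"
    dropped_frames: int = 0
    reconnect_count: int = 0

    def on_frame(self) -> None:
        self.last_frame_ts = time.monotonic()

    def seconds_since_last_frame(self) -> float:
        return time.monotonic() - self.last_frame_ts

    def is_stale(self, max_silence_s: float = 5.0) -> bool:
        return self.seconds_since_last_frame() > max_silence_s

h = StreamHealth("cam-042")
print(h.is_stale())  # True: no frame received yet
h.on_frame()
print(h.is_stale())  # False: a frame just arrived
```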
Inference outputs
Boxes, classes, confidence; optional pose; no embeddings stored.
Tracking
Single-camera identity only. State includes position history, velocity, dwell time, zone intersections.
Event engine
Temporal constraints; e.g. RestrictedZoneEntry persists ≥0.8 s before firing.
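A persistence constraint of this kind can be sketched as a small state machine that fires once when a condition has held for the minimum duration; class and parameter names are illustrative:

```python
class TemporalRule:
    """Fire an event once a condition has persisted for min_duration_s seconds."""

    def __init__(self, min_duration_s: float):
        self.min_duration_s = min_duration_s
        self.start_ts = None   # when the condition first became active
        self.fired = False     # ensures one event per continuous episode

    def update(self, condition_active: bool, now_s: float) -> bool:
        """Return True exactly once, when the persistence threshold is first crossed."""
        if not condition_active:
            self.start_ts = None
            self.fired = False
            return False
        if self.start_ts is None:
            self.start_ts = now_s
            return False
        if not self.fired and (now_s - self.start_ts) >= self.min_duration_s:
            self.fired = True
            return True
        return False

rule = TemporalRule(min_duration_s=0.8)
frames = [(0.00, True), (0.33, True), (0.66, True), (1.00, True)]
fired_at = [t for t, active in frames if rule.update(active, t)]
print(fired_at)  # [1.0]: fires once, at the first frame past 0.8 s of persistence
```

Resetting on any inactive frame means brief flickers of the condition never produce an alert, which is exactly the false-positive suppression a rule like RestrictedZoneEntry needs.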
6. ML Models (Production)
- Input resolution: 960×540.
- Batch size per GPU worker: 16.
- Confidence threshold: 0.41.
- NMS IoU: 0.58.
- Versioning includes: weights hash, preprocessing config, label map, evaluation report, deployment manifest.
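The confidence and NMS thresholds above can be illustrated with a minimal greedy NMS sketch; the boxes and scores below are made up:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(dets, conf_thr=0.41, iou_thr=0.58):
    """Greedy NMS: keep highest-confidence boxes, drop overlaps above iou_thr."""
    dets = sorted((d for d in dets if d["conf"] >= conf_thr),
                  key=lambda d: d["conf"], reverse=True)
    kept = []
    for d in dets:
        if all(iou(d["box"], k["box"]) < iou_thr for k in kept):
            kept.append(d)
    return kept

dets = [
    {"box": (10, 10, 50, 50), "conf": 0.92},
    {"box": (12, 11, 52, 49), "conf": 0.75},     # near-duplicate of the first box
    {"box": (100, 100, 140, 150), "conf": 0.30}, # below the confidence threshold
]
print(len(nms(dets)))  # 1
```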
7. Training Data & Labeling
- Total labeled images: 183,420.
- Total labeled video clips: 1,260.
- Total labeled bounding boxes: 2,946,110.
- Classes: 13.
- Double-annotation: 12,000 images.
- Agreement IoU≥0.5: 0.93.
- Spot-check: 320 random images/day (during active labeling).
- Split: Train 147,900; Val 17,320; Test 18,200.
- Split is camera-aware: every camera's images land in exactly one of train/val/test, so there is no leakage of near-identical frames across splits.
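A camera-aware split can be sketched by assigning whole cameras, not individual images, to splits via a stable hash; the function and fractions below are illustrative:

```python
import hashlib

def camera_split(camera_ids, val_frac=0.10, test_frac=0.10):
    """Assign whole cameras to splits via a stable hash so that no camera's
    frames can leak across train/val/test. Fractions are approximate targets."""
    splits = {"train": [], "val": [], "test": []}
    for cam in camera_ids:
        # Deterministic value in [0, 1) derived from the camera id.
        h = int(hashlib.sha256(cam.encode()).hexdigest(), 16) % 1000 / 1000
        if h < test_frac:
            splits["test"].append(cam)
        elif h < test_frac + val_frac:
            splits["val"].append(cam)
        else:
            splits["train"].append(cam)
    return splits

cams = [f"cam-{i:03d}" for i in range(214)]
splits = camera_split(cams)
print({k: len(v) for k, v in splits.items()})
```

Hashing keeps the assignment reproducible across retraining runs: a camera added later never moves an existing camera between splits.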
8. Evaluation
8.1 Offline test metrics
- Person AP50: 0.961.
- Helmet AP50: 0.933.
- Vest AP50: 0.918.
- Forklift AP50: 0.947.
8.2 Online event-level metrics
- Period: 30 days.
- Reviewed alerts: 9,480.
- True positives: 8,221.
- False positives: 1,259.
- Event precision: 0.867.
- Event recall (estimated): 0.812.
- Mean investigation time saved per shift: 44 min.
Recall estimated via incident logs and sampled review.
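The event-precision figure follows directly from the review counts above:

```python
reviewed, true_pos, false_pos = 9_480, 8_221, 1_259
assert true_pos + false_pos == reviewed  # every reviewed alert is labeled TP or FP

precision = true_pos / reviewed
print(f"event precision = {precision:.3f}")  # 0.867
# Recall cannot be computed from reviewed alerts alone: it needs an estimate of
# missed events (false negatives), here derived from incident logs and sampled review.
```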
9. Distribution Shift Handling
Drift signals monitored: confidence distribution shift, illumination histogram shift, tracking fragmentation rate, alert frequency anomalies per zone. Hard-case queue volume: 1,200 images/day. Selected for labeling: 220 images/day. Retraining cadence: minor 21 days; major 63 days.
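One way a confidence-distribution shift signal can be computed is a two-sample Kolmogorov-Smirnov statistic between a baseline window and the current window; the distributions and the 0.1 alarm threshold below are illustrative, not the production detector:

```python
import bisect
import random

def ks_statistic(ref, cur):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    ref, cur = sorted(ref), sorted(cur)

    def cdf(sorted_samples, x):
        return bisect.bisect_right(sorted_samples, x) / len(sorted_samples)

    return max(abs(cdf(ref, x) - cdf(cur, x)) for x in ref + cur)

random.seed(7)
baseline = [random.betavariate(8, 3) for _ in range(2_000)]  # healthy confidence scores
drifted  = [random.betavariate(5, 5) for _ in range(2_000)]  # scores sliding lower
d = ks_statistic(baseline, drifted)
print(f"KS = {d:.2f}, drift flagged = {d > 0.1}")
```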
10. Infrastructure (Production Stack)
10.1 Compute
- GPU inference servers: 8 nodes.
- Total GPUs: 16.
- CPU ingest + event nodes: 14 nodes.
- Total RAM: 1.5 TB.
- Total NVMe hot buffers: 92 TB.
10.2 Storage & retention
- Metadata retention: 365 days.
- Evidence retention: 30 days.
- Clip duration per alert: 12 seconds (8 s before + 4 s after).
- Avg evidence clip size: 7.8 MB.
- Evidence clips/day: 1,040.
- Evidence storage/day: 8.11 GB/day.
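The storage figures above are internally consistent, as a quick check shows (the steady-state total is my extrapolation from the stated retention):

```python
clips_per_day = 1_040
clip_mb = 7.8
retention_days = 30

daily_gb = clips_per_day * clip_mb / 1000          # decimal units, matching the doc
steady_state_tb = daily_gb * retention_days / 1000  # evidence pool at full retention
print(f"{daily_gb:.2f} GB/day, ~{steady_state_tb:.2f} TB at steady state")
# 8.11 GB/day, ~0.24 TB at steady state
```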
10.3 Networking
Stable RTSP pulling over dedicated VLAN routing (details omitted). TLS for service-to-service authentication.
11. Security & Privacy
- No facial recognition, no identity classification, no employee scoring.
- No cross-camera re-ID.
- Store events and short clips only.
- Audit logs of views.
- AES-256 at rest for clips; TLS in transit; RBAC; immutable event logs.
12. Reliability Engineering
- RTSP reconnect with jittered exponential backoff.
- Circuit breakers.
- Frame-drop handling with time windows.
- Observability metrics and alerts: GPU failure, false positive spikes, storage warnings.
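Jittered exponential backoff for RTSP reconnects can be sketched as "full jitter" (delay drawn uniformly up to an exponentially growing, capped ceiling); the base, cap, and attempt count below are assumptions:

```python
import random

def backoff_delays(base_s=1.0, cap_s=60.0, attempts=6):
    """'Full jitter' exponential backoff: delay ~ U(0, min(cap, base * 2^n)).
    Jitter prevents hundreds of streams from reconnecting in lockstep
    after a shared outage (a thundering herd)."""
    delays = []
    for n in range(attempts):
        ceiling = min(cap_s, base_s * 2 ** n)
        delays.append(random.uniform(0, ceiling))
    return delays

random.seed(3)
for n, d in enumerate(backoff_delays()):
    print(f"attempt {n}: sleep {d:.1f}s")
```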
13. UI & Workflow
Dashboard: camera grid, alert timeline, evidence playback, feedback (valid / false positive). Routing: Critical → Telegram + dashboard + email; Medium → dashboard + daily report; Low → dashboard only.
14. Outcomes (Operational impact)
Event-driven safety response with auditable events. Quantified snapshot (30 days): total alerts generated 31,204; high severity 3,118; alerts reviewed 9,480; verified true safety events 8,221.
15. My Contributions (End-to-end)
Owned streaming architecture, training pipeline, tracking and event logic, MLOps, observability, and privacy/security constraints.
16. What I'd Improve Next
- Causal alerting beyond static rules, with interpretable explanations.
- Uncertainty-aware alerts.
- Camera-specific adaptation heads.
- Formal distribution-shift evaluation suites.
Appendix A — Event Catalog (Production)
- E01 Helmet missing.
- E02 Vest missing.
- E03 Restricted zone entry.
- E04 Forklift-person unsafe proximity.
- E05 Perimeter intrusion (after-hours).
- E06 Fall-risk posture anomaly (designated zones).
- E07 Blocked safety corridor / obstruction.
- E08 Unattended object in restricted corridor.
Contact for Confirmation & Verification
All facts and claims in this document may be formally confirmed through the following official contact. This person is the project captain and supervisor for my AI internship across multiple company-wide AI projects at BeSpokeAI Group.
- Role: Founder & Chief Executive Officer
- Organization: Bespoke AI – Applied Artificial Intelligence Solutions
- Email (official): ceo.abbos@bespokeaigroup.uz
- Phone: +998 99 305 5022
Supervisor of my (Yusufbek Abdurakhimov) AI internship across multiple company-wide AI projects.