How Verkada Detects and Responds to Cybersecurity Threats at Scale
Handling the security of over a million online devices requires a robust and innovative approach. At Verkada, our Detection and Response (DNR) team processes over 400 terabytes of data each month to detect and mitigate potential threats. Rather than entrusting this critical task to a third-party service, we've developed our own advanced security system — a custom data pipeline that allows for precise threat detection and immediate response while also being cost-efficient. This ensures we have complete control over our data and can react swiftly to any issues. Let's explore the details of this custom-built pipeline and how it helps keep Verkada and its customers safe.
From Alert to Action: Mitigating and Investigating Security Threats
Suppose we receive an alert for an unusual camera configuration change made by a Verkada engineer. Now, we need to:
Triage: Determine if the alert is a real attack or a false alarm. Was the change made by a Verkada engineer or an impersonator?
Mitigate: If it's an attack, contain the impact before damage spreads.
Investigate: Find out how the attack happened and who or what was affected.
Remediate: Repair affected systems and implement solutions to prevent recurrence.
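The workflow above can be sketched as a small state machine. The stage names and transition logic below are our own illustration of the four steps, not Verkada's internal incident model:

```python
from enum import Enum, auto

class AlertStage(Enum):
    """Stages an alert moves through (illustrative names)."""
    TRIAGE = auto()
    MITIGATE = auto()
    INVESTIGATE = auto()
    REMEDIATE = auto()
    CLOSED = auto()

def next_stage(stage: AlertStage, is_real_attack: bool) -> AlertStage:
    """Advance an alert: false alarms close at triage; real attacks
    proceed through mitigation, investigation, and remediation."""
    if stage is AlertStage.TRIAGE:
        return AlertStage.MITIGATE if is_real_attack else AlertStage.CLOSED
    if stage is AlertStage.CLOSED:
        return stage
    order = [AlertStage.MITIGATE, AlertStage.INVESTIGATE,
             AlertStage.REMEDIATE, AlertStage.CLOSED]
    return order[order.index(stage) + 1]
```

A false alarm exits early, which is why triage quality (and the alert precision discussed below) dominates the team's workload.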
Many companies leverage a Security Information and Event Management (SIEM) system to manage this workflow. SIEM solutions are frequently provided by third parties; while convenient, external solutions are often costly at scale and introduce risk when handling sensitive security logs. Instead, our DNR team decided to build an in-house detection and response system to address these challenges. The pipeline collects expansive security logs, performs real-time analysis to detect and correlate threats across our infrastructure, and allows us to promptly respond to threats at a fraction of the cost. This enables us to protect Verkada and Verkada’s customers from potential cybersecurity risks.
Guiding Principles for our Pipeline Design
Before diving into the details of our detection and response system, let's review the guiding principles that shaped its design.
Alert Precision: With over a million Verkada devices deployed across various customer environments, our detection system must carefully identify genuine security threats while minimizing false positives. The sheer volume of data we process means that each alert's quality is crucial — our team needs to be able to distinguish between routine events and legitimate security concerns.
Scalability: The system ingests data from an ever-expanding range of products and services. Both new product lines and the evolution of existing products introduce unique data patterns and security concerns. The architecture must seamlessly accommodate this growth without requiring constant redesign.
Cost Efficiency: Security systems often face a trade-off between comprehensive coverage and resource efficiency. The system's design needs to optimize this balance by being selective about data collection and implementing efficient processing pipelines, ensuring strong threat visibility while controlling computation and storage costs.
Reliability: Missed events could leave critical gaps when investigating a real security compromise affecting Verkada’s customers’ physical safety and privacy. To avoid this, the system must maintain consistent uptime and performance, which requires engineering for resilience against both unintentional failures and deliberate tampering attempts.
Maintainability: With a lean and focused detection and response team, every minute of maintenance represents a significant opportunity cost. This means designing with maintainability at the forefront is critical. Minimizing maintenance time frees our team to focus on more strategic priorities: scaling detection coverage, enriching logs with deeper contextual information, and continuously improving our incident response capabilities.
The Components of Our Pipeline
At the start of the project, the team made two foundational decisions in pursuit of the principles defined above:
Taking a serverless approach to scale ingestion of dozens of terabytes of log data daily while reducing maintenance costs, since we pay only for the resources used.
Separating log sources into distinct pipelines to establish wide-scale reliability, simplify system management, and ensure extensibility.

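As a rough illustration of the second decision, each log source can be wired to its own handler so that a failure or schema change in one source cannot stall the others. The registry pattern, source names, and handlers below are hypothetical:

```python
from typing import Callable

Handler = Callable[[dict], dict]
PIPELINES: dict[str, Handler] = {}

def pipeline(source: str) -> Callable[[Handler], Handler]:
    """Register a handler for one log source."""
    def register(fn: Handler) -> Handler:
        PIPELINES[source] = fn
        return fn
    return register

@pipeline("okta")
def handle_okta(event: dict) -> dict:
    # Normalize a vendor log into a common envelope.
    return {"source": "okta", "actor": event.get("actor"), "raw": event}

@pipeline("device")
def handle_device(event: dict) -> dict:
    return {"source": "device", "serial": event.get("serial"), "raw": event}

def ingest(source: str, event: dict) -> dict:
    # In a serverless deployment, each handler would be deployed as its
    # own function, invoked only when its source emits logs.
    return PIPELINES[source](event)
```

Keeping the pipelines separate also makes it cheap to onboard a new source: register a new handler without touching the existing ones.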
The pipeline contains four logical components:
Log Ingestion: To monitor Verkada’s infrastructure effectively, our system processes logs from a wide range of sources: over a million IoT devices, multiple vendors (such as GitHub and Okta), a centralized backend, and a complex global corporate environment. To achieve this, we combine multiple log ingestion methods: Grove (HashiCorp’s log-pulling framework), log-pushing services, and an internal logging library.
Log Enrichment: Collected logs don't always contain all the details needed to triage an alert. Instead of depending solely on the log producer to provide complete data, the system leverages available context databases to enhance the logs using enrichments built with Brex’s Substation toolkit. The Substation toolkit can be used to add information not present in the original log, such as an engineer’s permissions or geolocation data from an IP address. By enriching logs with external context, we continuously make alerts more precise.
Log Analysis + Alert Processing: A fundamental priority is to manage detections as code. This approach offers significant benefits, including, but not limited to, version control, more flexible detection capabilities for engineers, robust detection testing, and the ability to roll back faulty detections quickly. Generated alerts are forwarded to separate destinations based on alert severity, which helps us promptly triage and address the highest-priority alerts.
Logs Data Lake: A log storage and querying system is essential to investigate alerts and security incidents efficiently. Engineers can query both the log source that generated an alert and other relevant log sources. Furthermore, engineers leverage our data lake to establish a clear representation of the event: what happened before the alert, what other systems were affected, and what follow-up actions occurred.
Scalable Security Monitoring and Streamlined Investigations
By building a scalable, serverless security pipeline, Verkada now processes over 400TB of data monthly from 40+ distinct information sources, without the high costs and inefficiencies of traditional SIEM solutions. The system’s reliability and ease of use enable our security team to finish investigations in minutes.
We saw this increased efficiency in practice a few weeks ago. A Verkada engineer used many support tokens — an explicit grant of permission for Verkada to obtain limited access to a customer organization — in a short timeframe. While support tokens are commonly used, we still alert on high-volume usage internally. In response to the engineer’s actions, our in-house solution generated an alert and added contextual information for easy triaging. The DNR team reviewed the scope of operations, confirmed they were legitimate, and spoke with the engineer to verify their use case. After understanding that the engineer was performing routine maintenance, the DNR team closed the investigation, demonstrating that our solution saved significant investigation effort.
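A high-volume usage detection like the one that fired here can be sketched as a sliding-window threshold. The limit and window values below are hypothetical, not Verkada's actual thresholds:

```python
from collections import deque

class TokenRateDetector:
    """Illustrative threshold detection: flag when one engineer uses
    more than `limit` support tokens within a `window`-second span."""

    def __init__(self, limit: int = 5, window: float = 600.0):
        self.limit = limit
        self.window = window
        self.events: dict[str, deque] = {}

    def record(self, engineer: str, ts: float) -> bool:
        """Record one token use at timestamp `ts`; return True when
        the sliding-window count exceeds the limit (raise an alert)."""
        q = self.events.setdefault(engineer, deque())
        q.append(ts)
        # Drop uses that have aged out of the window.
        while q and ts - q[0] > self.window:
            q.popleft()
        return len(q) > self.limit
```

Legitimate bursts (like the routine maintenance above) still trip the threshold by design; the enrichment context attached to the alert is what lets the team confirm or dismiss it quickly.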
With a focus on our data pipeline’s guiding principles, our system enables engineers, both in and outside the DNR team, to swiftly investigate security alerts and incidents, as well as debug product issues. The system's architecture, primarily its ability to handle immense data volumes and flexibly collect diverse log types, also lays the groundwork for significant future improvements. Looking ahead, our priorities include advancing our on-device threat monitoring capabilities to increase visibility across our device fleet and improving our detection coverage with LLM-assisted agentic workflows.
Our solution is one of many examples of how Verkada integrates security into our technical infrastructure. In future posts, we’ll dive deeper into how we built components of this detection and response system.