● Engineering reference, for security engineers, SOC analysts, and detection-engineering teams.
Technology

Device DNA™: a structured switch-derived signal set, pattern-matched against 750 million+ devices, no model in the path.

The patented core of how this works. The External Scan Engine (ESE) polls the customer's managed switches, collects a structured signal set per device, and resolves each device's identity by deterministic pattern-match against CybrIQ's reference database of 750 million-plus devices and growing. The output is a deterministic Device DNA signature: same inputs always produce the same signature, no ML in the decision path, no traffic capture, no endpoint agent for the inventory path. (A separate small optional agent handles USB-threat detection on workstations; covered on the detection page.)

The edge over the alternatives lives in two specifics: the structured signal set the ESE collects, and the 750M-device reference database that resolves those signals into identities. Both are described below at the level your diligence team needs without naming the exact extraction methods. The collection technique is trade-secret; the architectural shape and guarantees are not.

How a port becomes an identity

The flow at one port, per polling cycle. The same flow runs continuously across every port on every in-scope switch.

1. PORT: device drawing a link on a managed switch
2. ESE COLLECTS: structured signal set per device (proprietary)
3. 750M+ DB LOOKUP: deterministic match to vendor + model + NDAA status
4. DNA: sha256(signals, identity)

Deterministic end to end. Same inputs always produce the same Device DNA. No model in the path. No training. No inference.

What the ESE collects

The External Scan Engine reaches every switch in scope (one ESE handles up to 500 switches) and extracts a structured set of signals per device, refreshed continuously. The signal set is engineered to be the smallest collection that uniquely resolves a physical device against the reference database. We don't enumerate the specific fields here, since that's the part competitors would copy; the architectural properties are what your team needs.

Properties of the signal set

  • Switch-derived. Pulled from the managed switches themselves, not from traffic capture. No SPAN, no mirror, no inline tap, no endpoint agent for inventory. The ESE talks to the switch management plane in read-only mode.
  • Structured and bounded. A fixed schema per device. Not arbitrary telemetry; a deliberate set chosen so the database lookup is unambiguous.
  • Cross-vendor. Works across the major switch vendors in enterprise deployments. The collection method abstracts the vendor differences.
  • Refreshed continuously. The ESE polls each switch on an ongoing cycle, so a device change on a port surfaces in near-real-time, not on a nightly batch.
  • Read-only. Nothing the ESE does modifies switch state, alters traffic, or injects packets into production traffic. Worst-case behavior is "stops polling" and the security blast radius is "we go blind," not "we caused an outage."
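
To illustrate the "refreshed continuously" property, here is a minimal Python sketch of how two consecutive polling cycles could be compared so that a change on a port surfaces immediately. It is an illustration under assumptions, not the ESE's implementation: the function name `diff_cycle`, the `PortChange` record, and the port labels are all hypothetical, and the per-port value is treated as an opaque signature string.

```python
from typing import Dict, List, NamedTuple, Optional


class PortChange(NamedTuple):
    """Hypothetical record of a per-port change between two polling cycles."""
    port: str
    previous: Optional[str]  # signature seen on the prior cycle, if any
    current: Optional[str]   # signature seen on this cycle, if any


def diff_cycle(previous: Dict[str, str], current: Dict[str, str]) -> List[PortChange]:
    """Return the ports whose signature differs between two cycles.

    Ports are visited in sorted order so the output is deterministic.
    """
    changes = []
    for port in sorted(previous.keys() | current.keys()):
        before, after = previous.get(port), current.get(port)
        if before != after:
            changes.append(PortChange(port, before, after))
    return changes


# Hypothetical snapshots: port label -> signature string.
cycle_1 = {"sw1/1": "dna:AAAA", "sw1/2": "dna:BBBB"}
cycle_2 = {"sw1/1": "dna:AAAA", "sw1/2": "dna:CCCC", "sw1/3": "dna:DDDD"}
changes = diff_cycle(cycle_1, cycle_2)
```

In this example `changes` holds two entries: the changed signature on `sw1/2` and the newly populated `sw1/3`. The comparison only reads two snapshots, which matches the read-only posture described above.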

Why this works when wire-side capture doesn't scale

Traditional Layer 1 visibility approaches assume you can put something in line with traffic, or hang a sensor off a SPAN port. Both work at small scale and both break at enterprise scale: SPAN is expensive in switch CPU, mirror ports are a finite resource, and putting any in-line device into the production path is a change-management conversation nobody wants. Switch-derived collection sidesteps that. Every signal in the set is data the switch is already producing for its own operation; we just read it.

The tradeoff is that we don't see anything the switch doesn't see. That's a real constraint, and it's the reason the rest of the design (the structured signal set, the 750M-device reference database) has to do the heavy lifting on identity resolution.

The 750 million-device reference database

The other half of the system. The reference database carries 750 million-plus device fingerprints and grows continuously. When the ESE collects a signal set for a port, that set gets matched against the database to resolve the device's identity, vendor, model, and any associated metadata (NDAA status, EOL flags, firmware-family lineage). The match is deterministic: same input set produces the same identity, every time.
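
A toy Python sketch of what "deterministic" means for a lookup of this kind. It stands in an exact-key dictionary lookup for the real pattern-match, which this page does not disclose; the `ReferenceDB` class, the `canonical_key` helper, the signal field names, and the sample record are all hypothetical.

```python
import hashlib
import json
from typing import Dict, Optional


def canonical_key(signals: Dict[str, str]) -> str:
    """Derive a stable key: sorted keys and fixed separators, then SHA-256."""
    payload = json.dumps(signals, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ReferenceDB:
    """Stand-in reference database (assumption: exact match on a canonical key)."""

    def __init__(self) -> None:
        self._records: Dict[str, Dict[str, str]] = {}

    def add(self, signals: Dict[str, str], record: Dict[str, str]) -> None:
        self._records[canonical_key(signals)] = record

    def lookup(self, signals: Dict[str, str]) -> Optional[Dict[str, str]]:
        # No scoring and no model: the same signal set always maps to the same record.
        return self._records.get(canonical_key(signals))


db = ReferenceDB()
db.add(
    {"sig_a": "17", "sig_b": "x9"},  # hypothetical signal fields
    {"vendor": "ExampleCo", "model": "EX-100", "ndaa_status": "compliant"},
)

# Field order in the collected set does not affect the result.
first = db.lookup({"sig_a": "17", "sig_b": "x9"})
second = db.lookup({"sig_b": "x9", "sig_a": "17"})
unknown = db.lookup({"sig_a": "17", "sig_b": "zz"})
```

Here `first` and `second` are the same record, and `unknown` is `None` because no entry exists for that signal set. The canonicalization step is what makes "same input set, same identity" hold regardless of how the signals were ordered at collection time.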

How the database scales

The database is the differentiator on the install-base side. Pattern-matching with a million-device reference is a competent product; pattern-matching with 750 million plus is what makes the long tail of weird devices in real environments resolvable. The most common reaction from a customer the first week of pilot is "you identified things our other tools have been calling 'unknown' for years."

Why a database lookup, not a model

A reference-database match has properties an ML model doesn't. The answer is auditable end-to-end: "we collected these signals, looked them up in the database, found this device record, here's the record's hash." The answer is reproducible: anyone running the same lookup gets the same identity. The answer is also defensible against the AI-attack landscape described on the AI threats page: there's no model to evade, no training data to poison, no embedding to invert.

How the database grows

New device fingerprints get added through a curated review process. Every entry is human-reviewed before it lands in the production database, and curated updates ship to all tenants twice a week. The growth rate is a function of customer install-base diversity and CybrIQ's own ongoing collection work; in practice, the database gains tens of thousands of entries per month, weighted toward devices customers are actually deploying.

Can a device's signature be spoofed?

The question every security engineer asks, and the right one to ask. The short answer: spoofing is theoretically possible and practically much harder than spoofing the layers above.

Spoofing the Device DNA would require an attacker to engineer a device whose switch-derived signal set matches the legitimate device's so closely that the 750M+ reference-database lookup resolves to the same identity. That's a hardware-fabrication problem, not a software problem. Spoofing a MAC address, a TLS certificate, an 802.1X cert, or an EDR agent ID is software-attack territory and is done routinely with public tools. The switch-derived signal set isn't.

If an attacker does manage a partial match, the similarity score surfaces it as anomalous: a near-match against the prior signature on the same port still produces an event, just with a lower confidence flag. A successful clean spoof would have to match across every dimension of the signal set on the first cycle and hold that match across every subsequent cycle. We don't claim this is impossible. We claim it's several orders of magnitude harder than any attack the higher layers see.

For the threat-model framing of this same answer, see threat-model; for the AI-attack-resistance side, see AI threats.

How the signature is computed

Deterministic, auditable. The signal set is hashed; the matched database identity is hashed; the pair gets combined into the final Device DNA string. Two devices with identical signals on the same database snapshot produce identical signatures. A device whose signals shift outside expected drift produces a different signature on the next cycle, which is the substitution event.

// derivation, expressed in pseudocode
signal_hash    = sha256(canonical_serialize(collected_signals))
identity_hash  = sha256(canonical_serialize(database_lookup(collected_signals)))

device_dna     = "dna:" + base32(sha256(signal_hash || identity_hash))[:16]

// drift detection compares the collected signal sets across cycles
similarity_to_previous = jaccard(current_signal_set, previous_signal_set)

The signature is deterministic. The similarity score is what distinguishes "same signature plus expected drift" from "same signature plus a swap." When the similarity falls below the substitution threshold, the system emits a device-substituted event carrying both the prior and current signatures.
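
The pseudocode above can be rendered as a minimal runnable Python sketch. Several details are assumptions the page does not specify: `canonical_serialize` is taken to be JSON with sorted keys, `||` is read as byte concatenation of the raw digests, `base32` is the RFC 4648 alphabet, and `jaccard` is computed over (field, value) pairs. The signal field names and the identity record are hypothetical placeholders.

```python
import base64
import hashlib
import json
from typing import Dict


def canonical_serialize(obj: Dict[str, str]) -> bytes:
    """Stable byte encoding: sorted keys, no insignificant whitespace."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def device_dna(signals: Dict[str, str], identity: Dict[str, str]) -> str:
    """Combine the signal hash and the identity hash into a short signature."""
    signal_hash = hashlib.sha256(canonical_serialize(signals)).digest()
    identity_hash = hashlib.sha256(canonical_serialize(identity)).digest()
    combined = hashlib.sha256(signal_hash + identity_hash).digest()
    # Keep the first 16 base32 characters, as in the pseudocode.
    return "dna:" + base64.b32encode(combined).decode("ascii")[:16]


def jaccard(a: Dict[str, str], b: Dict[str, str]) -> float:
    """Jaccard similarity over (field, value) pairs; 1.0 for two empty sets."""
    set_a, set_b = set(a.items()), set(b.items())
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0


# Hypothetical signal sets from two consecutive cycles on one port.
previous = {"sig_a": "1", "sig_b": "2", "sig_c": "3", "sig_d": "4"}
current = {"sig_a": "1", "sig_b": "2", "sig_c": "3", "sig_d": "5"}
identity = {"vendor": "ExampleCo", "model": "EX-100"}

dna_previous = device_dna(previous, identity)
dna_current = device_dna(current, identity)
similarity_to_previous = jaccard(current, previous)  # 3 shared pairs / 5 total = 0.6
```

In this sketch a single changed field yields a different `device_dna` string, while the similarity score records how far the signal set actually moved.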

Stability vs. discrimination

The engineering tradeoff that makes the whole thing useful. The signature needs to stay stable across normal operation (so it doesn't fire on every micro-fluctuation), and discriminate cleanly between different physical devices (so swaps actually trigger). The bands below are how the similarity score reads in practice.

| Event | Expected similarity | Outcome |
|---|---|---|
| Same device, same firmware, normal load | 0.95 to 1.00 | No event |
| Same device, firmware update | 0.85 to 0.95 | No event (within firmware-shift tolerance) |
| Same device, behavior shift (codec wake, traffic mode change) | 0.70 to 0.85 | No substitution event; recorded for audit |
| Different unit, same vendor, same model | 0.40 to 0.65 | device-substituted fires |
| Different vendor or model | below 0.30 | device-substituted fires with high confidence |

Thresholds are configurable per environment. Most customers run the defaults; a small subset (typically labs with frequent legitimate device rotation) lower the substitution threshold to suppress noise, since the event fires only when similarity falls below it.
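
A hedged Python sketch of how a similarity score could be mapped to the outcomes in the bands above. The cutoffs are read from the table, but the exact default substitution threshold is not stated on this page, so the `0.70` value here is an assumption, as are the `Thresholds` and `classify` names. Where bands touch (0.85, 0.95) or leave a gap (0.30 to 0.40, 0.65 to 0.70), the sketch picks one convention: anything below the substitution threshold fires.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Hypothetical per-environment configuration; values are assumptions."""
    substitution: float = 0.70     # below this, device-substituted fires
    high_confidence: float = 0.30  # below this, the event is high confidence
    audit: float = 0.85            # below this (but not substituted), record for audit


def classify(similarity: float, t: Thresholds = Thresholds()) -> str:
    """Map a similarity score in [0, 1] to an outcome label."""
    if not 0.0 <= similarity <= 1.0:
        raise ValueError("similarity must be within [0, 1]")
    if similarity < t.high_confidence:
        return "device-substituted (high confidence)"
    if similarity < t.substitution:
        return "device-substituted"
    if similarity < t.audit:
        return "no event (recorded for audit)"
    return "no event"


outcomes = [classify(s) for s in (0.97, 0.75, 0.50, 0.10)]

# A per-environment override changes where the substitution event fires.
lab = Thresholds(substitution=0.60)
lab_outcome = classify(0.62, lab)
```

Passing a different `Thresholds` instance models the per-environment configuration: the same score of 0.62 fires under the assumed default and does not fire under the `lab` override.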

Why this isn't ML, and why that matters more in 2026 than it did in 2017

A question that comes up on most evaluation calls. Worth answering directly, because the right answer changed shape between when this product was designed and now.

An ML approach would train a model on labeled device fingerprints, then classify new observations into device types or anomaly buckets. We don't classify; we look up. The Device DNA isn't a prediction. It's a deterministic record of "these signals were collected, and the matching record in the reference database is this one." When the signature changes, the change is the signal, not a model's opinion about the change.

Three consequences:

  • Auditable. "Why did this device get flagged?" has a deterministic answer: "the signal set changed; the database record changed; here's the before-and-after." Not "the model thought it looked different."
  • AI-resistant. No model means no adversarial evasion, no training-data poisoning, no model-supply-chain risk. The full version is on AI threats.
  • Less generalizing. A new device family requires a curated entry in the reference database before its identity resolves cleanly. An ML system would generalize automatically. We accept the slower path because the audit trail demands it.

Patent & IP

The technical core of Device DNA is patented. The patent covers the combination of the switch-derived structured signal set, the deterministic identity resolution through the reference database, and the use of similarity scoring to distinguish legitimate device change from substitution. Freedom-to-operate analyses have been completed against the known prior art; we're the only commercial implementation of this specific combination today.

For diligence purposes, the patent number and a redacted FTO summary are available under NDA.

Want the technical deep-dive?

Our engineering team will walk through the architectural detail, the similarity-score math, how the 750M-device database grows, and the corner cases that come up in real deployments (mixed-vendor environments, lab churn, firmware rollouts). Bring the questions the rest of this page doesn't answer.
