46.2x geomean throughput advantage over FAISS-CPU across all public insert/query rows.
Performance that changes the operating envelope.
XLerate™ DNA is built to handle PhotoDNA³ lookups the way they actually arrive in the wild: one query, right now. Our benchmarks use this shape - called batch-1 - to assess local SDK throughput, deployed AWS full-service throughput, access and growth stress, and finally, cloud economics.
3.4x geomean throughput advantage over FAISS-GPU while preserving the public recall contract.
8.1x mean same-concurrency throughput multiple in the deployed AWS service matrix.
6.8x mean FAISS-CPU infrastructure spend multiple across the same AWS service-matrix rows.
Local SDK performance
The CPU engine is winning in GPU territory.
XLerate DNA - running on a commodity CPU - takes the lead in performance territory normally reserved for expensive GPUs.
Base
The starting database before growth inserts. Search effort: 0.00275 (0.275% of indexed space). Indexed vectors: 1M.
Medium
The database after adding medium-distance inserts. Search effort: 0.00247 (0.247% of indexed space). Indexed vectors: 2.27M.
Large
The largest tested database after easy, medium, and hard inserts. Search effort: 0.000031 (0.003% of indexed space). Indexed vectors: 4M.
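The search-effort figures can be read as the fraction of indexed vectors that each query examines. This is our interpretation of "% of indexed space". Multiplying effort by database size then shows why Large is the fastest database despite holding 4x the vectors of Base. The sketch below makes that assumption. `DATABASES` and `vectors_examined` are illustrative names, not part of the XLerate SDK:

```python
# Assumption: "search effort" = fraction of indexed vectors examined per query.
# Effort and size values are copied from the Base / Medium / Large descriptions above.
DATABASES = {
    "Base": (0.00275, 1_000_000),
    "Medium": (0.00247, 2_270_000),
    "Large": (0.000031, 4_000_000),
}


def vectors_examined(search_effort, indexed_vectors):
    """Approximate number of indexed vectors a single query touches."""
    return search_effort * indexed_vectors


for name, (effort, size) in DATABASES.items():
    print(f"{name:>6}: ~{vectors_examined(effort, size):,.0f} vectors per query")
```

Under that reading, Base examines about 2,750 vectors per query, Medium about 5,600, and Large about 124.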
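Full performance matrix. Throughput values are queries per second (QPS). The CPU and GPU multiples are XLerate SDK QPS divided by FAISS-CPU and FAISS-GPU QPS, respectively.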
| Database size | Query band | XLerate SDK | FAISS-CPU | FAISS-GPU | CPU multiple | GPU multiple |
|---|---|---|---|---|---|---|
| Base | Easy | 1.38M | 51.1k | 615.8k | 27x | 2.2x |
| Base | Medium | 1.46M | 51.7k | 670.7k | 28.2x | 2.2x |
| Base | Hard | 1.44M | 52.2k | 665.6k | 27.7x | 2.2x |
| Medium | Easy | 1.08M | 30.4k | 401.5k | 35.6x | 2.7x |
| Medium | Medium | 1.17M | 31.9k | 453.1k | 36.6x | 2.6x |
| Medium | Hard | 1.16M | 32.1k | 448.4k | 36x | 2.6x |
| Large | Easy | 6.22M | 64.8k | 948.5k | 95.9x | 6.6x |
| Large | Medium | 6.57M | 65.3k | 993.4k | 100.6x | 6.6x |
| Large | Hard | 6.51M | 65.3k | 991.1k | 99.7x | 6.6x |
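The headline geomeans can be reproduced from the multiple columns above. This assumes they are the geometric mean of the nine per-row multiples. A short sketch using only the Python standard library:

```python
from statistics import geometric_mean

# Per-row multiples copied from the local SDK matrix
# (Base, Medium, Large database sizes x Easy, Medium, Hard query bands).
CPU_MULTIPLES = [27, 28.2, 27.7, 35.6, 36.6, 36, 95.9, 100.6, 99.7]
GPU_MULTIPLES = [2.2, 2.2, 2.2, 2.7, 2.6, 2.6, 6.6, 6.6, 6.6]

# The geometric mean treats ratios symmetrically, so the three Large rows
# pull the summary up less than they would in an arithmetic mean.
cpu_geomean = geometric_mean(CPU_MULTIPLES)
gpu_geomean = geometric_mean(GPU_MULTIPLES)

print(f"vs FAISS-CPU: {cpu_geomean:.1f}x")  # 46.2x
print(f"vs FAISS-GPU: {gpu_geomean:.1f}x")  # 3.4x
```

For comparison, the arithmetic mean of the same CPU column is about 54x, inflated by the Large rows.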
Benchmark shape
Three databases, one standard test
The benchmark is built around three database sizes: Base, Medium, and Large. Each size is queried with Easy, Medium, and Hard PhotoDNA match bands, so the effects of database growth and match difficulty can be read separately.
The comparison runs XLerate DNA against FAISS on the same CPUs, and against FAISS on a top-tier GPU, to show whether general-purpose vector search can beat a PhotoDNA-specific engine on speed or cost.
The databases and query vectors come from PDNA-1M: one million clean base images plus ten million qualifying query vectors.
AWS deployment performance
The lead translates to the cloud.
The AWS benchmark uses a full-fledged product deployment - not a benchmark harness. Orchestration, service path, request handling, storage interaction, concurrent client drivers - all run while collecting latency, recall, and cost.
AWS single-query throughput under load
Mean lines summarize the deployed AWS benchmark matrix across the three database sizes and query bands. The thicker translucent lines show the upper and lower performance bounds behind the mean at each concurrency point.
Latency stays controlled as concurrency rises
Throughput with high latency is just a queue; this curve shows the p95 latency of the service under various levels of concurrency pressure.
Cost model
Unbeatable total cost of ownership.
XLerate DNA's cloud performance redefines the economics. Chasing its throughput and latency profile with FAISS-CPU forces materially higher AWS infrastructure spend.
Cost per million queries under load
Measured AWS economics across the public service matrix. Lower is better; the distance between the lines is the bill moving with the throughput curve.
Queries per dollar
Measured from the same AWS serving-infrastructure cost rows as the cost chart. Higher is better: it shows how many single-query lookups each infrastructure dollar buys.
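Both economics charts derive from the same two inputs: the hourly cost of the serving infrastructure and the throughput it sustains. The sketch below shows that arithmetic with illustrative inputs, not measured values or AWS prices. The function names are hypothetical:

```python
SECONDS_PER_HOUR = 3600


def cost_per_million_queries(hourly_cost_usd, throughput_qps):
    """Serving-infrastructure dollars spent per one million queries answered."""
    queries_per_hour = throughput_qps * SECONDS_PER_HOUR
    return hourly_cost_usd * 1_000_000 / queries_per_hour


def queries_per_dollar(hourly_cost_usd, throughput_qps):
    """Queries answered per serving-infrastructure dollar (the inverse view)."""
    return throughput_qps * SECONDS_PER_HOUR / hourly_cost_usd


# Illustrative inputs only: a hypothetical $3.60/hour footprint sustaining 1,000 QPS.
print(cost_per_million_queries(hourly_cost_usd=3.60, throughput_qps=1_000))
print(queries_per_dollar(hourly_cost_usd=3.60, throughput_qps=1_000))
```

Because throughput sits in the denominator of one formula and the numerator of the other, a throughput multiple at similar hourly cost shows up directly as a cost-per-query multiple.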
FAISS infrastructure spend by AWS concurrency
At each AWS client level, the bar shows the mean cost multiple - spent on AWS infrastructure alone - required for FAISS-CPU to try to keep up with XLerate DNA.
Spend-multiple cost matrix. Values are same-concurrency throughput multiples (XLerate DNA over FAISS-CPU), p99 latency ratios, and spend multiples (FAISS-CPU over XLerate DNA).
| Database size | Query band | AWS clients | Throughput multiple | p99 ratio | Spend multiple |
|---|---|---|---|---|---|
| Base | Easy | 256 | 7.2x | 7.4x | 6.1x |
| Base | Easy | 512 | 8x | 5.8x | 6.7x |
| Base | Easy | 1024 | 8.3x | 5.5x | 7x |
| Base | Easy | 2048 | 8.2x | 5x | 6.9x |
| Base | Medium | 256 | 9.1x | 6x | 7.7x |
| Base | Medium | 512 | 9.8x | 5.7x | 8.3x |
| Base | Medium | 1024 | 9.7x | 5.5x | 8.2x |
| Base | Medium | 2048 | 9.3x | 4.9x | 7.9x |
| Base | Hard | 256 | 8.2x | 5.6x | 6.9x |
| Base | Hard | 512 | 8.1x | 5.1x | 6.8x |
| Base | Hard | 1024 | 9.5x | 5.8x | 8.1x |
| Base | Hard | 2048 | 8.8x | 4.7x | 7.5x |
| Medium | Easy | 256 | 8.6x | 7.8x | 7.3x |
| Medium | Easy | 512 | 9.1x | 7x | 7.7x |
| Medium | Easy | 1024 | 8.7x | 6x | 7.3x |
| Medium | Easy | 2048 | 9.7x | 5x | 8.1x |
| Medium | Medium | 256 | 8.5x | 6.1x | 7.2x |
| Medium | Medium | 512 | 8.7x | 6.3x | 7.4x |
| Medium | Medium | 1024 | 8.8x | 6.3x | 7.4x |
| Medium | Medium | 2048 | 9.5x | 5.8x | 8x |
| Medium | Hard | 256 | 9x | 6.6x | 7.6x |
| Medium | Hard | 512 | 9.3x | 6.7x | 7.8x |
| Medium | Hard | 1024 | 9x | 6x | 7.6x |
| Medium | Hard | 2048 | 10.2x | 5.8x | 8.6x |
| Large | Easy | 256 | 4.9x | 5.1x | 4.1x |
| Large | Easy | 512 | 5.1x | 6.5x | 4.3x |
| Large | Easy | 1024 | 6.6x | 5.9x | 5.6x |
| Large | Easy | 2048 | 6.5x | 4.9x | 5.5x |
| Large | Medium | 256 | 7.2x | 5.6x | 6.1x |
| Large | Medium | 512 | 6.7x | 5.1x | 5.7x |
| Large | Medium | 1024 | 6.9x | 5.6x | 5.9x |
| Large | Medium | 2048 | 6.4x | 4.8x | 5.4x |
| Large | Hard | 256 | 5.3x | 4.4x | 4.5x |
| Large | Hard | 512 | 7x | 5.2x | 5.9x |
| Large | Hard | 1024 | 7.3x | 6.6x | 6.1x |
| Large | Hard | 2048 | 7.2x | 5.7x | 6x |
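The AWS headline figures can be checked against this table. The sketch below assumes the headlines are arithmetic means over all 36 rows. Under that assumption it reproduces the 8.1x throughput and 6.8x spend means, along with the per-client-level averages that the spend chart summarizes.

It also computes the spend-to-throughput ratio for each row. That ratio stays in a narrow band of roughly 0.83 to 0.86. This is our own observation, not a formula stated on this page. A plausible cause is that XLerate's serving footprint also includes the Garnet node.

```python
from statistics import fmean

CLIENT_LEVELS = (256, 512, 1024, 2048)

# Multiples per (database size, query band), one value per client level,
# copied from the spend-multiple cost matrix above.
THROUGHPUT = {
    ("Base", "Easy"): (7.2, 8.0, 8.3, 8.2),
    ("Base", "Medium"): (9.1, 9.8, 9.7, 9.3),
    ("Base", "Hard"): (8.2, 8.1, 9.5, 8.8),
    ("Medium", "Easy"): (8.6, 9.1, 8.7, 9.7),
    ("Medium", "Medium"): (8.5, 8.7, 8.8, 9.5),
    ("Medium", "Hard"): (9.0, 9.3, 9.0, 10.2),
    ("Large", "Easy"): (4.9, 5.1, 6.6, 6.5),
    ("Large", "Medium"): (7.2, 6.7, 6.9, 6.4),
    ("Large", "Hard"): (5.3, 7.0, 7.3, 7.2),
}
SPEND = {
    ("Base", "Easy"): (6.1, 6.7, 7.0, 6.9),
    ("Base", "Medium"): (7.7, 8.3, 8.2, 7.9),
    ("Base", "Hard"): (6.9, 6.8, 8.1, 7.5),
    ("Medium", "Easy"): (7.3, 7.7, 7.3, 8.1),
    ("Medium", "Medium"): (7.2, 7.4, 7.4, 8.0),
    ("Medium", "Hard"): (7.6, 7.8, 7.6, 8.6),
    ("Large", "Easy"): (4.1, 4.3, 5.6, 5.5),
    ("Large", "Medium"): (6.1, 5.7, 5.9, 5.4),
    ("Large", "Hard"): (4.5, 5.9, 6.1, 6.0),
}

all_throughput = [v for row in THROUGHPUT.values() for v in row]
all_spend = [v for row in SPEND.values() for v in row]
print(f"mean throughput multiple: {fmean(all_throughput):.1f}x")  # 8.1x
print(f"mean spend multiple:      {fmean(all_spend):.1f}x")  # 6.8x

# Mean spend multiple at each client level.
spend_by_clients = {
    clients: fmean(row[i] for row in SPEND.values())
    for i, clients in enumerate(CLIENT_LEVELS)
}
for clients, mean_spend in spend_by_clients.items():
    print(f"{clients:>5} clients: {mean_spend:.2f}x")

# Observation only: spend / throughput per row stays within a narrow band.
ratios = [s / t for key in SPEND for s, t in zip(SPEND[key], THROUGHPUT[key])]
print(f"spend/throughput ratio range: {min(ratios):.3f} to {max(ratios):.3f}")
```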
Bucket health
Near-perfect health under database growth.
XLerate™ DNA's novel, patent-pending² clustering system never requires learning and maintains unprecedented cluster health across all database sizes. Other methods lose both accuracy and speed as bucket health degrades, and they require expensive reclustering as the database grows.
Bucket health under growth
Base, Medium, and Large database sizes plotted by effective bucket usage and p99 bucket load pressure.
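This page does not define how effective bucket usage and p99 bucket load pressure are computed. As a hypothetical illustration only, here is one common way to quantify bucket balance:

- Usage is the perplexity of bucket occupancy divided by the bucket count.
- Load pressure is the nearest-rank 99th-percentile bucket size divided by the mean bucket size.

These definitions are assumptions, not XLerate's own metrics:

```python
import math


def effective_bucket_usage(bucket_sizes):
    """Perplexity of the occupancy distribution divided by the bucket count.

    1.0 means vectors are spread evenly over every bucket; a value near
    1 / len(bucket_sizes) means nearly everything sits in one bucket.
    """
    total = sum(bucket_sizes)
    entropy = -sum((s / total) * math.log(s / total) for s in bucket_sizes if s > 0)
    return math.exp(entropy) / len(bucket_sizes)


def p99_load_pressure(bucket_sizes):
    """Nearest-rank 99th-percentile bucket size divided by the mean bucket size."""
    ordered = sorted(bucket_sizes)
    rank = max(1, -(-99 * len(ordered) // 100))  # ceil(0.99 * n) with integer math
    mean = sum(ordered) / len(ordered)
    return ordered[rank - 1] / mean


balanced = [10] * 100  # hypothetical: every bucket holds 10 vectors
uneven = [5] * 50 + [15] * 50  # hypothetical: half the buckets hold 3x the others
print(effective_bucket_usage(balanced), p99_load_pressure(balanced))
print(effective_bucket_usage(uneven), p99_load_pressure(uneven))
```

Under these definitions, healthy growth keeps usage near 1.0 and load pressure near 1.0 as the database grows.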
Stress test
XLerate DNA performs under pathological load.
Real systems do not receive perfectly polite traffic. The stress test, run against the local SDK, shows how throughput behaves when demand concentrates on part of the index instead of spreading evenly across it. We hammer XLerate DNA with Zipfian access patterns of increasing skew. Higher skew sends more queries to the same buckets, causing contention.
Base
32-thread contention run across increasingly concentrated access patterns.
Medium
32-thread contention run across increasingly concentrated access patterns.
Large
32-thread contention run across increasingly concentrated access patterns.
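A Zipfian access pattern gives the k-th most popular bucket a share of queries proportional to 1/k^s. Raising the skew s therefore concentrates traffic on a few hot buckets. The sketch below is a minimal generator of such traffic. The bucket count, skew values, and function names are hypothetical and are not the parameters of this stress test:

```python
import random
from collections import Counter


def zipf_weights(num_buckets, skew):
    """Probability of hitting the bucket of popularity rank k is proportional to 1 / k**skew."""
    raw = [1.0 / (rank ** skew) for rank in range(1, num_buckets + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def sample_bucket_accesses(num_buckets, skew, num_queries, seed=0):
    """Draw a bucket id for each query; a fixed seed keeps runs reproducible."""
    rng = random.Random(seed)
    return rng.choices(range(num_buckets), weights=zipf_weights(num_buckets, skew), k=num_queries)


def hottest_bucket_share(accesses):
    """Fraction of all queries that land on the single busiest bucket."""
    return Counter(accesses).most_common(1)[0][1] / len(accesses)


for skew in (0.0, 0.8, 1.2):  # 0.0 is uniform; larger values are more concentrated
    accesses = sample_bucket_accesses(num_buckets=1024, skew=skew, num_queries=100_000)
    print(f"skew={skew}: hottest bucket receives {hottest_bucket_share(accesses):.1%} of queries")
```

In a real contention run, each of the concurrent threads would draw its queries from such a distribution. As a result, threads increasingly collide on the same hot buckets as the skew rises.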
Fair reading
Clear, spin-free summary.
For high-stakes PhotoDNA workflows, XLerate DNA delivers unparalleled performance.
Takeaway
If you need to perform accurate PhotoDNA matching at scale, the benchmark story is unambiguous. XLerate DNA delivers higher throughput, controlled latency, and materially better infrastructure economics. It also performs better on commodity hardware than the alternatives do on top-dollar specialty hardware.
For cloud solution providers, trust and safety workflows can realize roughly a 7-fold infrastructure cost reduction (6.8x on average across the AWS service matrix, and up to 8.6x) while increasing their effectiveness. For digital forensics, you can escape the lab backlog by doing live triage on commodity devices at speeds that rival lab hardware.
This page does claim
- In the local SDK matrix, XLerate DNA preserves the recall contract while delivering 46.2x geomean throughput over FAISS-CPU and 3.4x over FAISS-GPU.
- In the AWS service matrix, XLerate DNA delivers 8.1x mean same-concurrency throughput over FAISS-CPU.
- Across those same AWS service rows, equivalent FAISS-CPU serving-infrastructure spend averages 6.8x XLerate's serving-infrastructure spend.
- All benchmark configurations required both XLerate DNA and FAISS to yield 100% accuracy. This is not typical in vector search - but it is required for high-stakes PhotoDNA workflows.
- Importantly, XLerate DNA maintains its performance on out-of-distribution data - namely adult content. This proves it's not tailored to the PDNA-1M dataset, and thus reliable for the kind of content normally handled in PhotoDNA workflows.
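The recall contract can be stated as a simple check: every query must return its expected match. The sketch below assumes each query has exactly one expected match id. `recall` and `meets_recall_contract` are hypothetical helpers, not part of either SDK:

```python
def recall(results, ground_truth):
    """Fraction of queries whose expected match id appears among the returned ids.

    results: one list of returned ids per query.
    ground_truth: the single expected match id for each query (an assumption of this sketch).
    """
    if len(results) != len(ground_truth):
        raise ValueError("results and ground_truth must have the same length")
    hits = sum(1 for returned, expected in zip(results, ground_truth) if expected in returned)
    return hits / len(ground_truth)


def meets_recall_contract(results, ground_truth):
    """The bar described on this page: every query must find its match."""
    return recall(results, ground_truth) == 1.0


# Usage with made-up ids: the third query misses its match, so the contract fails.
returned = [[7, 3], [2], [9, 1]]
print(recall(returned, [3, 2, 4]), meets_recall_contract(returned, [3, 2, 4]))
```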
This page does not claim
- That XLerate DNA replaces FAISS for every vector workload. FAISS is used as a reference because it's the gold-standard, high-performance vector library. XLerate DNA provides the same outcome on PhotoDNA vectors asymmetrically with patent-pending technology.
- That the local GPU benchmark has the same operating shape as the others. In fact, FAISS-GPU was given a batch-512 workflow - an advantage - to give it the best honest chance.
- That the AWS economics are a customer price quote. These figures normalize measured throughput against the stated on-demand serving-infrastructure costs at the time of the benchmark.
Benchmark setup
The details matter.
These are the practical details behind the benchmark: the local harness, the AWS service shape, and the caveats that matter when reading the numbers.
The important point is that the comparison is not hiding a special disadvantage for FAISS. In the AWS benchmarks, FAISS-CPU was run inside the exact same service code that XLerate DNA runs in - we simply swapped XLerate DNA's runtime out for FAISS. In fact, both FAISS-GPU and FAISS-CPU were given a few unfair advantages.
FAISS-GPU, like any program that runs on a GPU, pays a speed penalty for transferring data to and from the GPU. If we made FAISS-GPU answer 1 query at a time like XLerate DNA or FAISS-CPU, that bottleneck would make FAISS-GPU appear as slow as or slower than FAISS-CPU.
FAISS-GPU was allowed to use a batch size of 512, meaning it could process 512 queries at once. If this were a race to taxi people from point A to point B, XLerate DNA and FAISS-CPU would be single-passenger cars competing against an impossibly large bus. However, this advantage had to be given to afford FAISS-GPU a fair chance. In a real deployment, it would mean holding back answers until hundreds more queries arrived. So even with this advantage, it's not realistic for an on-demand system.
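The bus analogy can be put in rough numbers with a simplified model:

- Queries arrive at a steady rate, so the first query in a batch of B waits for B - 1 more arrivals before dispatch.
- A fixed per-dispatch transfer overhead is shared across the whole batch.

The arrival rate, overhead, and compute times below are hypothetical, not measurements from this benchmark:

```python
def batch_fill_wait_s(batch_size, arrival_rate_qps):
    """Extra wait for the first query in a batch: it is held until batch_size - 1 more queries arrive.

    Assumes a steady arrival rate; real traffic is burstier.
    """
    return (batch_size - 1) / arrival_rate_qps


def amortized_time_per_query_s(batch_size, dispatch_overhead_s, compute_per_query_s):
    """Device time per query when a fixed per-dispatch transfer overhead is shared by the whole batch."""
    return dispatch_overhead_s / batch_size + compute_per_query_s


# Hypothetical inputs for illustration only; none of these are measured values.
ARRIVAL_QPS = 10_000  # queries per second reaching the service
DISPATCH_OVERHEAD_S = 1e-3  # fixed host/device transfer cost per dispatch
COMPUTE_PER_QUERY_S = 1e-6  # search work per query once on the device

for batch in (1, 512):
    wait_ms = batch_fill_wait_s(batch, ARRIVAL_QPS) * 1000
    per_query_us = amortized_time_per_query_s(batch, DISPATCH_OVERHEAD_S, COMPUTE_PER_QUERY_S) * 1e6
    print(f"batch-{batch}: fill wait {wait_ms:.1f} ms, device time {per_query_us:.1f} us/query")
```

At a hypothetical 10,000 queries per second, filling a 512-query batch adds about 51 ms before the first query is even dispatched, while batch-1 adds none. Batching makes the device efficient at the cost of on-demand latency.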
FAISS-CPU was likewise given an advantage to make it practical: it did not have to cluster newly inserted vectors, so its insertion speeds are far higher than they would be under normal usage. Unlike other vector search products, XLerate DNA never has to learn its clustering, so the advantage afforded to FAISS-CPU hides an extraordinary benefit of XLerate DNA.
Hardware and FAISS build
The local matrix ran on a high-end desktop CPU, with FAISS-GPU also tested on an RTX 4090.
- AMD Ryzen 9 7950X, 16 cores / 32 threads.
- 63.6 GiB system memory, DDR5-5600.
- NVIDIA GeForce RTX 4090, 24,564 MiB VRAM.
- FAISS 1.14.1 with GPU and NVIDIA cuVS support.
- FAISS runtime: IVF Flat, nlist 32,768, nprobe 32 or 2 by row.
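In an IVF index, a query visits nprobe of the nlist inverted lists, so nprobe / nlist is the share of lists searched. That equals the share of vectors searched only if the lists are equally sized, which is why bucket health matters. The sketch below uses the nlist and nprobe values listed above. The function names are hypothetical, and the 1M-vector size is simply the Base database used for illustration:

```python
NLIST = 32_768  # inverted lists in the FAISS IVF Flat index, per the setup above


def probe_fraction(nprobe, nlist=NLIST):
    """Share of inverted lists an IVF query visits."""
    return nprobe / nlist


def expected_vectors_scanned(nprobe, indexed_vectors, nlist=NLIST):
    """Vectors scanned per query, valid only if every list holds the same number of vectors."""
    return indexed_vectors * nprobe / nlist


for nprobe in (32, 2):
    share = probe_fraction(nprobe)
    scanned = expected_vectors_scanned(nprobe, 1_000_000)
    print(f"nprobe={nprobe}: {share:.4%} of lists, ~{scanned:,.0f} of 1M vectors if balanced")
```

With nlist 32,768, nprobe 32 visits about 0.098% of the lists and nprobe 2 about 0.0061%. Unbalanced lists make the real vector count per query higher or lower depending on which lists a query hits.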
Service benchmark shape
The AWS rows compare deployed amd64 services and count serving infrastructure, not the separate load-generation fleet.
- XLerate service: c7i.8xlarge, 32 vCPUs.
- XLerate Garnet: r7i.xlarge, 4 vCPUs.
- FAISS-CPU service: c7i.8xlarge, 32 vCPUs.
- Load generation used 7 c7i.xlarge runner nodes across 256, 512, 1024, and 2048 client levels.
- It's somewhat confusing, but all three database sizes represent the same 1 million unique images. When you edit an image and hash it again, the hash changes, but the new hash can still be matched to the original. Growth therefore adds more hashes per image rather than more images. This is why, counterintuitively, you have to search less of the database with this growth pattern: the goal is not to match the same hash but the same image, and with more variants of each image stored, a query only needs to reach one of them. Searching less of the database equals faster searches. This effect is realized by all competitors in this benchmark, not just XLerate DNA.
- FAISS-CPU was not required to build its index in the AWS benchmark runs. A single index was pre-trained optimally for all benchmark bands, which is an advantage for FAISS because index building is incredibly slow and normally must be repeated as the database grows. Not having to build an index via clustering makes FAISS insertion performance look better than it would be in a real deployment and hides an extraordinary benefit of XLerate DNA.
- In our experiments, allowing FAISS to build a fresh index for each run did not materially change the resulting index quality, nor the nprobe (how much of the database had to be searched) required to achieve 100% recall. This is because FAISS is bound by the limitations of pure vector-space geometry, which means it can only optimize clustering up to a ceiling.
- In the AWS benchmarks, FAISS-CPU was swapped into the same service code that XLerate DNA uses, and was not required to communicate with persistent storage - an advantage over XLerate DNA.
- In the AWS spend-multiple analysis (the FAISS spend needed to reach parity with XLerate DNA), storage cost was not included for FAISS, which makes the spend multiple lower than it would be in a real FAISS deployment.
- We did not benchmark FAISS-GPU on AWS GPU infrastructure because the cost per query gap would be the same or greater when factoring in required batching, and the requirement of batching queries to the GPU to efficiently use the hardware violates the instant/on-demand nature of live workflows.
- XLerate DNA's benchmarked performance characteristics have been repeated in an out-of-distribution (OOD) dataset test - adult content. This proves that its advantages are not specific to the PDNA-1M data, and that they are retained when tested on data very similar in visual composition to data targeted by PhotoDNA workflows.
- FAISS-CPU was compiled with AVX-512 support.
- XLerate DNA uses dynamic SIMD dispatch for all vectorizable operations, and so it also leveraged AVX-512.
- The local benchmark CPU uses a double-pump mechanism to provide AVX-512 support, which yields lower performance than a native 512-bit execution unit would.
- XLerate DNA is engineered to refuse to return non-matches - guaranteeing 100% accuracy under configured parameters. That is not typical of vector databases or vector search, but is appropriate for its intended use of high-trust threat identification.
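Finding the nprobe required for 100% recall is a sweep: raise nprobe until every query's IVF answer matches an exhaustive search. The toy below illustrates that procedure in plain Python on random 2-D points. It makes several simplifications:

- Centroids are chosen at random instead of trained with k-means clustering.
- "Recall" means the true nearest neighbour was found.

It is a hypothetical sketch of the tuning idea, not FAISS and not the harness used for these benchmarks:

```python
import math
import random


def build_toy_ivf(points, nlist, rng):
    """Toy IVF: nlist random points act as centroids (no k-means); each point joins its nearest centroid's list."""
    centroids = rng.sample(points, nlist)
    lists = [[] for _ in range(nlist)]
    for idx, p in enumerate(points):
        nearest = min(range(nlist), key=lambda c: math.dist(p, centroids[c]))
        lists[nearest].append(idx)
    return centroids, lists


def ivf_search(query, points, centroids, lists, nprobe):
    """Scan only the nprobe lists whose centroids are closest to the query; return the best distance found."""
    order = sorted(range(len(centroids)), key=lambda c: math.dist(query, centroids[c]))
    candidates = [i for c in order[:nprobe] for i in lists[c]]
    return min((math.dist(query, points[i]) for i in candidates), default=math.inf)


def has_full_recall(nprobe, queries, exact, points, centroids, lists):
    """True when every query's IVF answer is as close as its exhaustive nearest neighbour."""
    return all(ivf_search(q, points, centroids, lists, nprobe) == e for q, e in zip(queries, exact))


def min_nprobe_for_full_recall(queries, points, centroids, lists):
    """Smallest nprobe with 100% recall; recall never drops as nprobe grows, and nprobe = nlist is a full scan."""
    exact = [min(math.dist(q, p) for p in points) for q in queries]
    for nprobe in range(1, len(centroids) + 1):
        if has_full_recall(nprobe, queries, exact, points, centroids, lists):
            return nprobe
    raise AssertionError("unreachable: a full scan always matches the exhaustive search")


rng = random.Random(7)
points = [(rng.random(), rng.random()) for _ in range(1_000)]
centroids, lists = build_toy_ivf(points, nlist=32, rng=rng)
queries = [(rng.random(), rng.random()) for _ in range(100)]
print("smallest nprobe with 100% recall:", min_nprobe_for_full_recall(queries, points, centroids, lists))
```

A better-trained clustering can lower the nprobe this sweep finds, but only up to the limit set by how the data is distributed in space.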