Evaluating facial recognition systems through the National Institute of Standards and Technology (NIST) scores? Here’s an objective guide for how to navigate the NIST data with insight.

Benchmark tests are a useful way to evaluate and compare the state of facial recognition, but they are easily misunderstood and frequently misrepresented. The preeminent facial recognition industry tests, which got underway in 2000, are conducted by NIST, a government agency that is part of the U.S. Department of Commerce.

NIST conducts an ongoing battery of tests, known as the Face Recognition Vendor Test (FRVT), to measure the key characteristics of facial recognition algorithms, including accuracy, performance, and bias. Companies and academic institutions can submit one or more algorithms, which NIST then runs through a set of tests. The two most recent FRVT reports are each over 270 pages long and were published in June and November.

NIST not only measures specific characteristics of facial recognition algorithms, such as performance, accuracy, and bias; it also reports on those attributes by image type, such as visa photos, mugshots, webcam, or “wild” images. Wild images are faces captured on video from subjects who are unaware of the camera. They are complex because the faces may be tilted, with wide variation in yaw and pitch, and a single video frame may contain many faces. These are precisely the type of challenging real-world conditions for which SAFR was designed. NIST conducts its tests of facial recognition using still photographs. Facial recognition in live video requires concerted optimization in acquisition, accuracy, and speed.

SAFR from RealNetworks is the most accurate high-performance facial recognition algorithm for live video as tested by NIST.

Vendors, researchers, and academic institutions can optimize their submissions for the FRVT test and aren’t required to submit their actual commercial facial recognition algorithm, which can lead to misrepresentative results. A vendor or academic can submit an algorithm that performs remarkably well in accuracy but is so computationally expensive as to be impractical in commercial, real-world conditions. For example, in the November FRVT results several algorithms achieved high accuracy marks for wild faces but ran three to five times slower than the SAFR algorithm. At that speed they would be impractical in many real-world conditions: they would require unrealistically expensive computing power, take excessive time to recognize a face, and be overwhelmed by many faces in the video.

Some of the algorithms tested by NIST are a bit like Formula One race cars: built to perform well on a particular circuit but not reflective of the commercial world, where a thoughtful balance of driving conditions, such as noise, braking, range capacity, safety, and comfort, is required.

So how does SAFR perform against the top-rated algorithm for accuracy in the November report? The top algorithm has a wild face score of 0.028 but is 4.7 times slower and 2.4 times larger than the SAFR algorithm, which has a score of 0.048. The top two algorithms performed well on accuracy, but compared with SAFR, a large-scale, real-world commercial deployment would need 2-3 times the hardware on-site and would still return results 4-5 times slower, as illustrated in the graph below. That increase in accuracy comes at a debilitating cost in performance and expense.

SAFR Algorithm Performance

High performance makes a material difference, since it increases the number of opportunities to attempt recognition in a computationally constrained system. In the cohort of algorithms that exceeded 95% accuracy, SAFR is both the fastest and the lightest model. This means SAFR can sample a face multiple times in the time other algorithms take to sample it once, thereby compounding SAFR’s accuracy. As a result, SAFR can unambiguously identify a single individual in a gallery of 10,000 faster than any other algorithm.
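
To make the compounding argument concrete, here is a rough Python sketch. It rests on two assumptions that are not stated in the NIST report or above: that the wild face scores (0.028 and 0.048) can be read as the chance of missing a match on a single attempt, and that repeated attempts on different video frames are statistically independent. Consecutive frames are usually correlated, so treat the result as an optimistic bound, not a measured figure. The function names are illustrative only.

```python
import math


def attempts_in_budget(speed_ratio: float) -> int:
    """Whole attempts a faster model completes while a slower one completes one.

    speed_ratio is (slower model's time per face) / (faster model's time per face).
    """
    if speed_ratio <= 0:
        raise ValueError("speed_ratio must be positive")
    return max(1, math.floor(speed_ratio))


def cumulative_miss_rate(per_attempt_miss: float, attempts: int) -> float:
    """Chance that every attempt misses, ASSUMING attempts are independent."""
    if not 0.0 <= per_attempt_miss <= 1.0:
        raise ValueError("per_attempt_miss must be between 0 and 1")
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    return per_attempt_miss ** attempts


# Figures quoted in the article; reading them as miss probabilities is an assumption.
slower_model_miss = 0.028   # top-rated algorithm, one attempt in the time budget
faster_model_miss = 0.048   # SAFR, per attempt
speed_ratio = 4.7           # top-rated algorithm is 4.7 times slower

n = attempts_in_budget(speed_ratio)
print(f"attempts by the faster model: {n}")
print(f"slower model miss rate: {cumulative_miss_rate(slower_model_miss, 1):.6f}")
print(f"faster model miss rate: {cumulative_miss_rate(faster_model_miss, n):.6f}")
```

Under these assumptions the faster model gets four attempts in the time the slower model gets one, so its combined miss rate falls well below its single-attempt score. How much of that gain survives in practice depends on how different the sampled frames really are.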

SAFR is highly competitive in accuracy for still photos: as tested by NIST it ranked in the top 10 commercially available products worldwide and in the top 3 from a United States company. However, as noted, the NIST score of accuracy doesn’t convey the entire picture. NIST measures the match of a single image to a single image for wild faces, while in real life the people moving within a video frame are in constant motion. SAFR uses edge intelligence to select the right image to match from hundreds of frames of video. What that means is that SAFR’s accuracy is actually higher than what is measured by NIST since SAFR continuously monitors and resamples the video to capture and submit the best frame for recognition. NIST doesn’t use video in its tests.
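
The idea of choosing the best frame from a stream can be sketched in a few lines of Python. This is not SAFR's actual selection logic, which is not described here; the `FaceFrame` fields, the 45-degree pose limit, and the scoring formula are all hypothetical, chosen only to show how a per-frame quality score lets a system submit one good frame instead of an arbitrary one.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class FaceFrame:
    """Hypothetical per-frame measurements for one tracked face."""
    frame_index: int
    yaw_deg: float      # left/right head rotation, 0 = facing the camera
    pitch_deg: float    # up/down head rotation, 0 = facing the camera
    sharpness: float    # 0.0 (blurred) to 1.0 (sharp)


def frame_quality(frame: FaceFrame, max_angle_deg: float = 45.0) -> float:
    """Score a frame from 0 to 1: frontal, sharp faces score highest.

    The formula is an illustrative assumption, not a published metric.
    """
    yaw_penalty = min(abs(frame.yaw_deg) / max_angle_deg, 1.0)
    pitch_penalty = min(abs(frame.pitch_deg) / max_angle_deg, 1.0)
    frontalness = 1.0 - max(yaw_penalty, pitch_penalty)
    return frontalness * frame.sharpness


def best_frame(frames: Sequence[FaceFrame]) -> FaceFrame:
    """Return the highest-scoring frame to submit for recognition."""
    if not frames:
        raise ValueError("frames must not be empty")
    return max(frames, key=frame_quality)


# Three made-up frames of the same person walking past a camera.
frames = [
    FaceFrame(frame_index=0, yaw_deg=40.0, pitch_deg=10.0, sharpness=0.9),
    FaceFrame(frame_index=1, yaw_deg=5.0, pitch_deg=3.0, sharpness=0.8),
    FaceFrame(frame_index=2, yaw_deg=0.0, pitch_deg=0.0, sharpness=0.2),
]
print(best_frame(frames).frame_index)
```

In this made-up sequence the second frame wins: it is nearly frontal and still sharp, while the first is turned too far and the third is too blurred.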

SAFR – Increasing Accuracy

This chart illustrates how SAFR increases accuracy through successive matches as a result of its superior performance.

SAFR stands apart from other facial recognition algorithms because it achieves results with a fraction of the compute power required by most of the algorithms in its class. Many of the algorithms submitted to NIST are tuned to score high on accuracy but fail to strike a balance between performance and accuracy.

SAFR from RealNetworks is committed to providing the best accuracy and performance with the least bias, using readily available hardware to recognize people in real-world conditions.

SAFR is the premier platform for facial recognition in the real world.