Revocable Fingerprint Biotokens: Accuracy and Security Analysis

Transcription

Revocable Fingerprint Biotokens: Accuracy and Security Analysis

T. E. Boult (1,2), W. J. Scheirer (1,2) and R. Woodworth (2)
(1) VAST Lab, University of Colorado at Colorado Springs; (2) Securics, Inc., Colorado Springs, CO
lastname@vast.uccs.edu or lastname@securics.com
(This work was supported in part under NSF STTR "Improving Privacy and Security in Biometrics," Award Number 0110611283.)

Abstract

This paper reviews the biometric dilemma, the pending threat that may limit the long-term value of biometrics in security applications. Unlike passwords, if a biometric database is ever compromised or improperly shared, the underlying biometric data cannot be changed. The concept of revocable or cancelable biometric-based identity tokens (biotokens), if properly implemented, can provide significant enhancements in both privacy and security and address the biometric dilemma.

The key to effective revocable biotokens is the need to support the highly accurate approximate matching needed in any biometric system as well as protecting the privacy/security of the underlying data. We briefly review prior work and show why it is insufficient in both accuracy and security.

This paper adapts a recently introduced approach that separates each datum into two fields, one of which is encoded and one of which is left to support the approximate matching. Previously applied to faces, this paper uses the approach to enhance an existing fingerprint system. Unlike previous work in privacy-enhanced biometrics, our approach improves the accuracy of the underlying system! The security analysis of these biotokens includes addressing the critical issue of protection of small fields. The resulting algorithm is tested on three different fingerprint verification challenge datasets and shows an average decrease in the Equal Error Rate of over 30%, providing improved security and improved privacy.

1. The Biometric Dilemma

The key properties of biometrics, those unique traits that do not change significantly over a lifetime, are also their Achilles heel. The biometric dilemma is that while biometrics can initially improve security, as biometric databases become widespread, compromises will ultimately undermine biometrics' usefulness for security.

At least 40 million "financial records" were compromised or illegally sold in 2005, and over 50 million more financial/identity records were lost or stolen in 2006. A database with millions of permanent "non-revocable" biometric records will become an even more significant cyber-target. With the current trend, it is a question of when, not if, a major biometric database will be compromised. Of course, they don't have to be hacked, just shared or sold. In 2001, Colorado tried to sell its face and fingerprint DB, and still offers it free of charge to any government agency that wants access [Krause-01].

While many companies say their biometric templates cannot be used to recreate the data, [Ross-05] has shown recovery of fingerprints from templates. Furthermore, reconstructing the original fingerprint data is not required, just generation of any of the infinitely many prints that match the stored template. With techniques that allow generation of "gummy fingers" [Matsumoto-et-al-02], the potential loss is not "academic" and not just an issue of privacy.
While vendors claim new "liveness" detection prevents spoofing, the authors have tested many of the new sensors, from optical to capacitive to thermal, and spoofed every one we have tested.

A compromised biometric cannot be "replaced", and that permanent loss feeds the perception of invasion from any use of biometrics: if decades later the government or a corporation wants to play Big Brother, you cannot take back the information they gathered or lost. With the ease with which today's sensors can be spoofed, and with the instructions readily available on the Internet, this is a security time bomb!

While many people like to think of biometrics as "unique", operationally they are not. Even FBI examiners have made high-profile misidentifications with latent fingerprints; e.g., [Cole-05] documents 22 examples. Fingerprint examiners have nearly unlimited time and can use all levels of features. The best fingerprint systems tested by the US government have only 98% true acceptance rates when set to reject 99.99% of false matches. At 99.99%, finding a false match in a database of millions is likely, leading to what we call the doppelganger threat, where compromised databases with millions of users will allow an intruder to find a few "close enough" matches they can directly impersonate.
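To make the doppelganger threat concrete, here is a back-of-envelope calculation (ours, not from the paper) of how quickly a 0.01% false accept rate produces near-certain false matches as a compromised database grows; it assumes comparisons are independent, which is only an approximation:

```python
# Probability of at least one false match when searching a database at
# FAR = 0.01% (independence assumption; illustrative only).
far = 1e-4
for n in (10_000, 1_000_000, 10_000_000):
    p_hit = 1 - (1 - far) ** n
    print(f"{n:>10,} identities: P(at least one false match) = {p_hit:.6f}")
```

At ten thousand identities the chance is already about 63%; at a million it is essentially certain, which is exactly the sense in which an intruder can expect to find impersonable near-matches.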

A critical issue in the biometric area is the development of a technology that allays the privacy concerns while supporting the security goals. A partial solution is to never store the original biometric, but rather some cancelable token generated from it. This concept was introduced in [Ratha-et-al-01] and called cancelable biometrics. However, since the underlying biometric data is not actually canceled or revocable, this oxymoron can cause confusion. We introduce the term biotoken to refer to "the revocable identity token produced by applying a revocable transform to biometric data, such that identity matching is done in the encoded/revocable form".

To address the biometric dilemma, biotokens must support a large variety of tokens per individual, so that each database is independent and non-linkable, and must allow effective revocation and reissue. They must have sufficiently high accuracy and must provide cryptographically strong protection of the underlying data. Even with all of that, they must explicitly have some approach to address the doppelganger threat.

A few approaches for cancelable or revocable biotokens have been discussed in the literature, with a review and classification of the leading prior work presented in [Ratha-et-al-07]; only the most significant are discussed herein. They divide the field into four categories: biometric salting, fuzzy schemes, biometric key generation, and non-invertible forms. Our approach does not fit within any of these categories.

"Biometric encryption" and biometric salting work by mixing in randomness while allowing matching via convolution or another approach. This underlies the work of many groups, but has not produced systems with demonstrated effective accuracy/privacy. For fingerprints, they require pre-aligned data, an unrealistic assumption. Finally, the security depends on "one-time pad" arguments, but because of the symmetry of data and pad, the same print (even with a different pad) cannot be reused in other databases [Scheirer-Boult-07].

Fuzzy schemes, the best of which we consider to be [Tuyls-et-al-05], are making progress but still significantly decrease the effectiveness of the underlying algorithms. The existing work also presumes the fingerprint data has been aligned, e.g., using core-delta alignment, which is problematic. Even given aligned data, the best reported results increased the Equal Error Rate of the underlying algorithm by a factor of 2 while embedding a 40-bit key. Many fuzzy schemes are also subject to an attack if the same print is reused multiple times [Scheirer-Boult-07].

Non-invertible forms were first suggested in [Ratha-et-al-01]. Non-invertibility alone does not provide significant protection. Since fingerprint systems are tolerant of moderate error levels, even if the ambiguities can never be resolved, the protection may still not be sufficient. Their more recent work [Ratha-et-al-07] defines sophisticated transforms. They present performance data on their matcher, using an IBM internal DB of 181 pairs. The revocable transforms reduced the system accuracy, and it is hard to interpret performance numbers from an internal DB. Furthermore, these transforms, which are formally non-invertible, have very limited ambiguity. In their best-performing "surface folding" transform, only about 8% of the data changes its local topology, hence we can conclude only a small fraction of the data is logically non-invertible. Given the transform, one could invert the non-folded regions, and then take each point in the folded region and send it to the few (2 or 3) potential locations. Since a fingerprint matcher would likely match a print with 8% extra data, we would consider that effectively compromised and hence not cryptographically secure.

A good review of biometric key generation is given in [Uludag-et-al-04]. The idea is a mixture of quantization and encryption of the biometric data to produce a unique key.
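To make the quantization tension analyzed in the next paragraph concrete, here is a toy sketch of the quantize-and-hash idea; it is our illustration, not the construction of [Uludag-et-al-04], and the feature values, bin width, and use of SHA-256 are invented for the example:

```python
import hashlib
import math

def biometric_key(features, step):
    """Toy key generation: quantize each feature into bins of width `step`,
    then hash the bin indices into a key."""
    bins = tuple(math.floor(f / step) for f in features)
    return hashlib.sha256(repr(bins).encode()).hexdigest()

enroll = [101.7, 55.2, 203.9]   # features captured at enrollment
probe  = [102.6, 54.8, 204.4]   # same finger, ordinary sensor variation

print(biometric_key(enroll, 1)  == biometric_key(probe, 1))   # False: keys diverge completely
print(biometric_key(enroll, 10) == biometric_key(probe, 10))  # True, but features are 10x coarser
```

Fine bins let ordinary variation flip a bin index and, through the hash, change every bit of the key; coarse bins restore matching only by discarding most of the discriminating information, and values that land near a bin edge still fail.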
However, the process of encryption will transform inputs with only one bit of difference, i.e., nearly identical biometrics, into very different numbers. Given that any biometric has a range of expected variations for the same individual, either the biometric key will often not match the individual, or the data must be degraded so that all variations for an individual map to the same data. However, this would significantly degrade the miss detection rate. The results in [Uludag-et-al-04] show a loss of two orders of magnitude in accuracy.

In [Boult-06], an approach somewhere between the non-invertible and key-generation approaches was proposed for face-based biometrics. That approach combines the ideas of transformation of data, robust distance measures, and encryption of biometric data. After some scaling, it separates the data into two parts: the fractional part, retained for local distance computation, and the integer part, which is then encrypted. When applied to faces, the approach significantly improved the performance of the PCA, LDA, and EBGM algorithms. It is the only algorithm of which we are aware that actually improved performance while providing privacy protection and security enhancements.

2. Cryptographically Secure Biotoken Overview

The computation of our Cryptographically Secure Biotoken, which we call a Biotope™, uses a feature-space transform applied to an existing minutiae-based fingerprint system. The approach supports both transforms that are public-key cryptographically invertible and cryptographic one-way functions such as MD5. In either case, even if both the parameters and the transformed data are compromised, there is no practical way to recover the original data, thus removing the risks if centralized databases are compromised. (Obviously, access to the private key, transform parameters, and data would allow inversion, but that key is not used in the verification process and need not be online at all.)

In short, the fundamental advances of the approach are provided by a transformation that provides a robust, distance-based computation for supporting confidence in verification, while supporting revocability and verification without identification, and while at the same time permitting thousands of simultaneous instances to be in use without the ability for anyone to combine the stored data to reconstruct the original biometric.

Before we introduce the transform, we discuss a helper function. As we split the encoded data, we will be using a modulus-like operation. Such an operation can take nearby elements and separate them by significant distances. Thus, we developed what we call a reflected modulus, or rmod, such that nearby elements are mapped to nearby elements after applying rmod. We implement this with a folding technique: if we want a window of size E, we let x = d mod (2E), and define rmod(d,E) = x if x <= E, and rmod(d,E) = 2E - x otherwise. It is easy to show that if x and y are such that |x - y| <= t, then |rmod(x,E) - rmod(y,E)| <= t. Since rmod does not increase distances between points, as a traditional modulus can, it is better suited to many of the transforms needed in public-key biotokens.

A key insight into the approach, following our work in [Boult-06], is that a robust distance measure is, by definition, not strongly impacted by outliers. Logically, outside a window, non-matching data has constant, or zero, impact. Many fingerprint systems use a robust distance in matching; e.g., [Ratha-et-al-07] uses a matcher which ignores any match outside a fixed-size box. In [Boult-06] we used this observation to define a transform that scales the data and then separates it into the fraction (r) and integer (g). The integer, g, is considered stable and matches exactly, so when these fields are encrypted they will still match. We presented a theorem that if the scaling is correct, then the robust distance measure on the raw data and the induced distance measure after encoding can only improve the matching performance.

While floating point and fraction/integer separation may be appropriate for face-based representations, for fingerprints the data is inherently small integers. We still transform each datum via v' = (v - t)*s, with scale (s) and translation (t). However, we then separate each datum into 2 parts: one, q (quotient), that must match exactly, basically defining the "window" for the robust computation; and the second, r (reflected modulus), which is not encoded and which supports the local distance computation. Given a parameter E, which depends on the expected range of variations in v, we define the residual r = rmod(v', E) and the quotient q = int(v'/E). Then we can apply a one-way or cryptographic transform to q to produce w, which we require to match exactly. As the data is separated into r and q, the result leaves an unencrypted value, r, within the "window" in which local distance can be computed, and then encrypts/encodes the larger (and hence very stable) part of the position information, thus effectively hiding the original positional data.

To ensure that the biometric data is protected even if the "transformation" parameters are compromised, we need to ensure that the q values are cryptographically secured. For large data items, e.g., doubles, encryption of q may be effective. For small data items, as we have in fingerprints, additional work must be done to protect the data. For a single small field there is little that can be done. As we will see later, for a collection of fields, there is a mix of both public-key encryption and hashing that can protect many small fields and improve the overall performance while making reissue straightforward.
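As a concrete illustration of the rmod helper and the r/q split just described, here is a minimal Python sketch; it is ours, not the production Biotope™ code, and the parameter values and the use of SHA-256 in place of the public-key or MD5 options are illustrative assumptions:

```python
import hashlib

def rmod(d: int, E: int) -> int:
    """Reflected modulus: folds d into [0, E] so nearby inputs stay nearby."""
    x = d % (2 * E)
    return x if x <= E else 2 * E - x

def encode_datum(v: int, s: int, t: int, E: int, key: bytes):
    """Split a scaled datum into an unencrypted residual r, supporting local
    distance computation, and a one-way-encoded quotient w that must match
    exactly."""
    v_prime = (v - t) * s               # v' = (v - t) * s
    r = rmod(v_prime, E)                # residual, left in the clear
    q = v_prime // E                    # stable high-order part: q = int(v'/E)
    w = hashlib.sha256(key + str(q).encode()).hexdigest()
    return r, w

# Two noisy observations of the same datum: q (hence w) matches exactly,
# while the residuals differ only by the small observation noise.
print(encode_datum(103, 1, 0, 16, b"user-key"))
print(encode_datum(105, 1, 0, 16, b"user-key"))
```

Matching then requires w to agree exactly while |r1 - r2| falls within the window; revocation is simply re-issuing with a new key or new (s, t) parameters.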
We can also add a user passcode. This passcoded biotoken inherently provides two-factor security, mixed such that only a single biotoken is stored. The inclusion of the passcode provides strong revocation and makes the resulting biotoken "verification only", providing increased privacy protection and the best protection from a doppelganger threat. Applications that require "duplication detection" during enrollment, e.g., passports, can use regular biotokens during enrollment and verification-only biotokens during operation. The enrollment testing DB, which is going to be infrequently used, can be more tightly controlled, and the keys needed for generation of the revocable tokens can be kept on a different server. The verification-only biotokens can then be used for day-to-day operation. We call this approach an operationally-verification system, which is still considerably better for both privacy and security than a traditional verification system or even a revocable biotoken-based verification. We consider these "verification only" biotokens the only real protection against a doppelganger threat, since the use of a revocable technology does not stop someone from searching a DB to find a victim that is a natural match. If two people's biometrics approximately match after a revocable transform, they almost assuredly match in raw form. So an operationally-verification technology is the only real defense against a doppelganger threat, and the best defense against the biometric dilemma.

3. Background on the Bozorth Matcher

The description of the original matcher is based on the source code and on [Watson-et-al-04]. The natural form of the Bozorth matcher takes as input a minutiae file with (x, y, θ, q), where (x, y) is the location, θ the angle, and q the quality, with such files produced by mindtct from the NIST toolset. Matching comprises three major steps:
1. Construct intra-fingerprint minutia-pair comparison tables, one for the probe fingerprint and one for each gallery fingerprint to be matched (see the sketch below).
2. Construct an inter-fingerprint compatibility table by finding entries in the probe and gallery pair tables whose measurements are compatible.
3. Traverse the compatibility table, linking compatible entries into clusters ("webs") and using the largest consistent cluster to generate the match score.
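A minimal sketch of the step-1 pair table, assuming the (x, y, θ, q) minutiae format above; this is our simplification, and the exact measurements, quality handling, and limits in Bozorth3 differ:

```python
import math

def pair_table(minutiae, max_pairs=10000):
    """Build an intra-fingerprint pair table: for each minutia pair, store
    measurements invariant to rotation and translation of the whole print."""
    rows = []
    for i in range(len(minutiae)):
        for j in range(i + 1, len(minutiae)):
            xi, yi, ti, qi = minutiae[i]
            xj, yj, tj, qj = minutiae[j]
            dist = math.hypot(xj - xi, yj - yi)          # inter-minutia distance
            line = math.degrees(math.atan2(yj - yi, xj - xi))
            beta1 = (ti - line) % 360                    # each minutia angle taken
            beta2 = (tj - line) % 360                    # relative to the joining line
            rows.append((dist, beta1, beta2, min(qi, qj)))
    rows.sort(key=lambda r: r[0])                        # short pairs first
    return rows[:max_pairs]
```

Because the distance and the two relative angles are unchanged by rotating or translating the print, the same two minutiae yield (nearly) the same row in probe and gallery tables, which is what the later steps exploit.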

optional), so going from 32 unknown bits down to 8 or 16 yields p = 2^24 or p = 2^16. Thus, to recover the original high-order bit data for a row, a brute-force attack will need to resolve the p-fold ambiguity. We can further increase the ambiguity by having multiple columns in the encoded data: one for the CRC result of the protected data, 2 for the AES-encrypted data (or 4 if we use 8-bit CRCs), and, if desired, additional columns of chaff (random data). In the "enrollment template", we randomize the columns (separately for each row) so there is no apparent ordering, but using the key that was in the encrypted block, we can define the "random positions" for the AES-encrypted invertibility data within each row. (Obviously, the PKI-encrypted AES key and index must be determinable.) With a total of c possible match positions for the data in the columns of data and chaff, this produces a (64*p*c)-fold ambiguity a would-be attacker must resolve to recover the data on that row. (The factor of 64 is from the number of transforms that might have been applied to a row.) Importantly, we don't have to resolve this ambiguity when matching, because we consider a match of any field against any field (without replacement), plus require the three residual fields to match. For a true positive, the process will match the encoded result. The chance of an imposter matching the CRC, even given the potential ordering and chaff, is less than 1 in 20,000, and when one adds the requirement of simultaneously matching the 3 residuals, it is small enough not to significantly impact the overall matching performance. It is important to note that accidentally matching a few rows (even a few dozen) does not have a significant impact in a Bozorth-like matcher, because a spurious row can only impact the final answer if it can form part of a larger "web" of results with a consistent overall rotation/translation. The formation of a web makes it very unlikely that random matches could produce a significant false match score.

In terms of the security analysis, however, it is critical to note that recovering the original data requires much more than just resolving the p*c-fold ambiguity per row. There is no test, per row, that can help decide which candidate is correct. While we don't know if it can be done, it seems plausible that by combining different rows simultaneously one might construct a consistent "web" of underlying minutiae that may provide a test for consistency. It is unknown whether this would be unique, and hence identify the true data, or whether it would still result in a many-fold or infinite ambiguity. If it is possible to develop such a test, it would require simultaneously resolving the p*c-fold ambiguity for n rows, each of which is independent; thus, a brute-force search would require n^p' attempts. In the worst case, one might generate m^2 rows from m minutiae, though in practice we limit the table to 512 rows independent of the number of original minutiae, deleting less significant rows. It is clear that recovering something that might be an acceptable subset of a print would require at least 1 row per desired minutia, and more likely 2 rows per minutia. To get a relatively minimal "16 point" print would require recovering at least 16 rows; a more realistic estimate is that one would have to recover at least 32 rows to even have a small chance of forming a web that properly interrelates 16 minutiae points, and n = 64 to have a decent chance to recover a larger point match. (Again, we do not know how to go from rows back to minutiae, so the numbers needed are somewhat uncertain.)
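As a back-of-envelope check of the per-row ambiguity just described, using the parameter values quoted in the implementation figures of the next paragraph (our arithmetic, for illustration only):

```python
transforms = 2**6   # candidate transforms that may have been applied to a row
p = 2**16           # ambiguity of the protected high-order bits (16-bit case)
c = 2**7            # possible positions among the data and chaff columns
per_row = transforms * p * c
assert per_row == 2**29          # the 64pc-fold ambiguity per row
print(f"per-row ambiguity: 2^{per_row.bit_length() - 1}")
```

An attacker must resolve this ambiguity jointly across all of the rows needed to rebuild a usable constellation of minutiae, which is what drives the exponential figures below.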
In our current implementation, 64pc = 2^6 * 2^16 * 2^7 = 2^29, so a brute-force attack to recover 16 minutiae would require a minimum of n = 2^4 rows, n^p' = 2^(4*25) = 2^100 attempts, and more realistically n = 2^7 rows, n^p' = 2^(7*25) = 2^175 brute-force attempts, to recover 16 original minutiae. Again, all of this analysis presumes that after generating hypotheses for each of the unknown items in a row there is a testable hypothesis to confirm the collection of rows as correct. No such algorithm is known, but since we have no analysis to suggest it is computationally infeasible, we do not include its difficulty in our security analysis.

The important point is that the approach has provided sufficient ambiguity that, unlike the encoding of individual fields or the folding approach of [Ratha-et-al-07], it provides reasonable protection against a brute-force attack. Given that there are other ways to "acquire" fingerprint biometrics, such as following a person around and picking up an object they leave behind, the 2^100 to 2^175 effort seems sufficient protection. If it is not, adding more chaff columns (i.e., increasing c) and/or including additional data (e.g., minutia type) as part of the "stable data" (i.e., increasing p) exponentially increases the protection.

Variations on this idea allow tradeoffs between storage size, computational cost, and security. Originally, we implemented this as described, but because the inversion of the pair table is not obvious, and because it is larger than the original minutiae data, we moved to a more efficient form wherein we encrypt a compressed form of the raw minutiae data rather than the 32-bit fields to be protected (i.e., E1/E2 in Figure 1 are replaced with AES-encrypted raw minutiae). We still generate the CRC version of the q fields and use it in matching. Since the raw minutiae data is block encrypted, it is properly protected. Of course, none of that data can be used for approximate matching, but it does make good chaff. Since both the encrypted minutiae and the AES-encrypted "q" data appear as random chaff to the matching algorithm, the difference is immaterial to the performance, but it makes the PK inversion much simpler and requires less space.

A secondary advantage of our biotoken approach is that it supports a simple "company-level" re-issuance.

When the data is encoded, the "index" that allows one to identify the CRC data among the chaff can be encoded with the company's master public key. If the company stores that as an enrollment master-public-key biotoken, they can then use their private key to recover the order and issue an operational biotoken. The operational biotoken is generated by using an additional key as a post-pad to the CRC computation of the data fields, e.g., take the encoded 16-bit CRC from the user, append the new keys, and compute a new 16-bit CRC (a toy sketch of this chaining appears below). Since this is non-invertible, they can then PK/AES encrypt the original CRC values, replacing chaff columns with the results. If an operational biotoken is compromised, or if the company's biotoken usage policy limit is reached, the company can use its key to recover the original CRC values (i.e., go back to the original master-public-key biotoken) and then reissue a new operational biotoken from the master. The user's data is still protected: knowledge of the column order removes a factor from the 64pc ambiguity and reduces the brute-force effort needed by the company, but it still requires very significant effort for even an insider to mount a brute-force attack. And it provides a solid operational security model with no customer inconvenience.

The post-pending of keys, or a more general multi-stage process, can be used to support per-transaction unique public-key biotokens. For example, with the CRC model we can take the operational public-key biotoken, append a transaction-specific key, and produce a new encoded field. At the transaction level, the system does not need to understand the order or re-encode the original CRC data: because no additional transforms will be applied after the transaction, there is no need for inversion, so it can just apply the final CRC computation to all the columns, which does not reduce the security at this level at all. For matching, the user's biometric is then subjected to a similar process and the results can be matched. While the traditional CRC-based approach may be sufficient for basic transactions, higher-security applications could use more advanced cryptographic hashes, which require larger storage and more computation. They can also use a CRC/hash such that the operational and transaction keys, though applied separately, can be combined into a single key/transform, so that the user's machine never receives the separate keys.
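The key post-pending idea can be sketched in a few lines; this is our toy illustration, using Python's CRC-32 truncated to 16 bits as a stand-in for the paper's 16-bit CRC, with invented key values:

```python
import binascii

def chain_token(encoded_field: bytes, stage_key: bytes) -> bytes:
    """Derive the next-stage token field: CRC over (previous field || key).
    Non-invertible without the key, so each stage can be revoked independently."""
    crc = binascii.crc32(encoded_field + stage_key) & 0xFFFF  # 16-bit stand-in
    return crc.to_bytes(2, "big")

master      = bytes.fromhex("beef")                 # enrolled CRC field (toy value)
operational = chain_token(master, b"company-key-1") # company-issued stage
transaction = chain_token(operational, b"txn-42")   # per-transaction stage
print(operational.hex(), transaction.hex())
```

Because each stage is a one-way function of the previous field and a stage key, a compromised transaction token reveals nothing reusable, and the company can re-derive and reissue from its master at any time.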
6. Performance

We implemented the fingerprint Cryptographically Secure Biotoken by extending the NIST/FBI Bozorth matcher (also called NIST VBT). There are at least 2 major aspects of performance, speed and accuracy, which are discussed separately.

During enrollment we require the generation of an RSA key and full PK encryption, which is the most expensive step. On a 380x380 image, the computational aspects of enrollment take approximately 750 ms to 3000 ms on a 1.6 GHz Pentium 4 processor, depending on the size of the chosen RSA key (512-2048 bits). Of this, 250 ms to 2500 ms is for the key generation and the encoding of the AES key and other parameters, 350 ms is for the minutiae extraction and image processing, and 50 ms is for the AES encoding and secure biotoken generation. Matching does not require the PK encoding steps, greatly reducing the time to a total average of 423 ms, of which 394 ms is for the image processing and 29 ms is for the biotoken generation and matching. This is only 8 ms more than the time for the standard NIST implementation of the Bozorth matcher on which our biotoken is based.

[Figure 3: ROC curves comparing Biotope™ biotokens and the NIST/Bozorth matcher on FVC2002 data; the x-axis is False Accept Rate on a log scale.]

More important than speed, however, is how the biotoken process impacts matching accuracy. Accuracy is a strong function of the number of minutiae or the table size maintained. For the Bozorth algorithm, we use the pre-supplied defaults, which allow for 150 minutiae and 10,000 pairs. For the biotoken, we limited the table size to keep the biotoken storage size below 24K, with an average size of 13K. Limiting was done first using the defaults for pruning on the NIST-computed quality of the minutiae, but also trying to ensure that each minutia was included in a few pairs, rather than letting the best minutiae take part in all of their pairs. This was done to ensure better spatial coverage. While a few may consider a 20K token excessively large, we believe a little storage is a small price to pay for the security and privacy enhancements of biotokens. Then again, with 20K biotokens, 50 thousand tokens fit on a gigabyte card and tokens for all of the US fit on a laptop.

We made only minimal changes to the matcher code, extending it to handle added columns and to test the encoded fields with the described "subset matching".

Because the encoding of the tables and quality pairing can change the number of entries, we added normalization to the scoring based on the number of rows used.

For analysis we applied both algorithms to the well-known Fingerprint Verification Challenge datasets (see http://bias.csr.unibo.it/fvc2000). Each of the FVC200x verification tests has 8 prints each of 100 subjects, producing 2800 true match and 4950 false match attempts. Figure 3 shows ROC curves comparing the biotoken algorithm with the original NIST/Bozorth matcher. The new biotokens consistently outperform the original algorithm. (Note this is a semi-log ROC.) To quantify the improvement we use the Equal Error Rate, shown in Table 1; the average reduction in the EER is 33%. As can be seen in Figure 3, the improvement generally increased with decreasing FAR.

[Table 1: Equal Error Rates per dataset for the biotoken and the original verification algorithm, with the biotoken's improvement over the original EER.]

Including more features during matching (e.g., ridge counts) might improve biotoken performance, but they were not included because they are not used by Bozorth3 and would make for an unfair comparison. Even without the added features, for FVC2000 these scores would have placed it 3rd overall, and in the top ten for FVC2002. The FVC2004 data collection asked subjects to intentionally distort their prints, which may have moved minutiae outside the window used for matching and for building the feature web in the Bozorth algorithm, negatively impacting both algorithms' performance.

While the accuracy gains with our approach are not as significant as those we reported in [Boult-06] for faces, they are still significant. In addition, prior fingerprint-based techniques attempting to provide privacy, e.g., [Ratha-et-al-07] or [Tuyls-et-al-05], have all had to trade accuracy for privacy.

7. Conclusions and future work

This paper introduced the use of a robust revocable fingerprint-based biotoken. We analyzed previous work and showed that it is lacking in accuracy, security, or both. The paper introduced a "reflected modulus" operator with a local neighborhood "nearness" preservation property that is important to the effectiveness of the biotoken algorithm. The transforms combined with encryption maintain privacy, while the unencrypted part supports a robust distance measure, something that is critical to making biometrics effective. While the paper presents only fingerprints, the approach applies to almost all biometrics.

As Admiral James Loy, head of the Transportation Security Administration, stated at the 9th Annual Privacy & American Business Conference in 2003, "Don't be too quick to strike a balance between privacy and security. As Americans, we are entitled to a full measure of both". Secure biotokens show that, at least for biometrics, we don't have to accept a loss of privacy to gain security. Biotopes™ and other secure biotokens can solve the biometric dilemma. Not only do they provide privacy and security, they actually improve the accuracy of the underlying biometrics!

References

[Boult-06] T. Boult, "Robust distance measures for face recognition supporting revocable biometric tokens", IEEE Conf. on Face and Gesture, 2006.

[Krause-01] M. Krause, "The expanding surveillance state: why Colorado should scrap the plan to map every driver's face and should ban facial recognition in public places," Independence Institute, Issue Paper #8-2001.

[Matsumoto-et-al-02] T. Matsumoto, H. Matsumoto, K. Yamada and S. Hoshino, "Impact of Artificial 'Gummy' Fingers on Fingerprint Systems," SPIE Vol. 4677, Optical Security and Counterfeit Deterrence Techniques IV.
