IDV and Mastering the Three States of Data, Part 3: Data in Use

Note: This blog post was originally published on 24 June 2023 and has been updated to reflect the current state of the regulatory environment.

In this third instalment of our 3-part blog series on IDV privacy and ethics, we will be covering specific issues concerning using data under BIPA and the GDPR.

Privacy in the Prairie State

The Biometric Information Privacy Act (BIPA) was enacted in the US state of Illinois back in 2008, and it is a fraught statute for IDV providers and clients alike. The actual requirements of BIPA are simple enough: you must collect consent from your users before collecting and processing their biometric data, inform them of a few things, and delete their biometrics once you no longer need them. Pretty normal so far; nothing that is out of line with the GDPR or other global privacy laws.

But the sting in the tail is the private right of action and $1,000 compensation ($5,000 if intentional or reckless) per violation for each user, with no need to prove any loss. Being the US, where litigation is a celebrated national tradition, this has led to a frenzy of class action lawsuits against companies not complying with the letter of the law. Facebook has previously settled for $650 million, Google for $100 million, and TikTok for $92 million. There are thousands of additional cases waiting to be heard.
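To get a feel for how quickly those per-violation figures add up, here is a rough back-of-the-envelope sketch. The two dollar amounts come from the paragraph above; the function name, the user counts, and the simple multiplication are illustrative assumptions, not a statement of how a court would actually calculate damages.

```python
# Per-violation figures quoted in the post; all other values are made-up inputs.
BASE_PER_VIOLATION = 1_000       # USD
RECKLESS_PER_VIOLATION = 5_000   # USD, if intentional or reckless


def bipa_exposure(users: int, violations_per_user: int = 1, reckless: bool = False) -> int:
    """Rough exposure in USD, assuming every violation for every user is counted."""
    if users < 0 or violations_per_user < 0:
        raise ValueError("counts must be non-negative")
    rate = RECKLESS_PER_VIOLATION if reckless else BASE_PER_VIOLATION
    return users * violations_per_user * rate


if __name__ == "__main__":
    # Hypothetical onboarding volume of 10,000 Illinois users, one violation each.
    print(bipa_exposure(10_000))
    print(bipa_exposure(10_000, reckless=True))
```

Even a modest user base with a single violation per person lands in eight figures, which goes some way to explaining the settlement sizes above.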

IDV providers have been stung, too, with industry heavyweights having settled for multi-million dollar sums in the recent past.

Making matters more onerous is the fact that insurers generally will not give you cover against BIPA claims. There are, of course, technical measures you could implement to avoid processing biometrics in Illinois. You could block by geolocation or by the address on the users’ identity documents during the IDV onboarding process. Neither is perfect, and neither has been tested in court. If you think a user is an Illinois resident, then you have to meet the letter of the BIPA law.
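As a minimal sketch of the two blocking signals mentioned above, the check below flags a user as a possible Illinois resident if either the geolocation or the document address points to Illinois. The field names, the "US-IL" region code format, and the function name are all hypothetical; a real onboarding flow would take these values from its own geolocation and document-parsing steps.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OnboardingSignals:
    # Both fields are hypothetical; either may be missing in practice.
    geo_region: Optional[str] = None      # e.g. "US-IL" from an IP or device lookup
    document_state: Optional[str] = None  # e.g. "IL" parsed from the ID document address


def may_be_illinois_resident(signals: OnboardingSignals) -> bool:
    """Return True if either signal suggests the user is in or from Illinois."""
    geo_il = (signals.geo_region or "").strip().upper() == "US-IL"
    doc_il = (signals.document_state or "").strip().upper() in {"IL", "ILLINOIS"}
    return geo_il or doc_il


if __name__ == "__main__":
    # A user travelling outside Illinois with an Illinois driving licence.
    print(may_be_illinois_resident(OnboardingSignals("US-NY", "IL")))
```

Using either signal rather than both errs on the side of caution, but as noted above, neither signal is perfect: a VPN defeats the first and an out-of-date address defeats the second.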

Europe has its own protections

Compared to BIPA, GDPR is a much more flexible and risk-based bit of regulatory policy. There is not space in this blog to cover all the requirements of the GDPR, but the following are some recent highlights:

To process biometric data, you are expected to have documented and considered all the risks involved. In practice, this requires you to complete a Data Protection Impact Assessment (DPIA). To help you do this, you can ask your IDV provider for a copy of their DPIA. Do not take no for an answer, as the IDV vendor should be able to provide you with one that does not contain proprietary information. After all, under the GDPR you are the controller and your processors (the IDV provider) should be able to tell you what they are doing with the data of your customers.
EU regulators are getting hotter on the use of data to train algorithms; this is what the rest of this blog covers.

The use of personal data requires consent

Obtaining sufficient real-world data for the training of machine learning algorithms is tough for a variety of reasons. For starters, it often requires explicit consent and the ability to manage deletion requests. Furthermore, many people aren’t reachable online, or if they are, they’re not willing to give their data to some private company.

It’s especially difficult to legally obtain consented biometric data. If you’re striving to reduce algorithmic bias, it’s even more expensive or impossible to get data on hard-to-reach groups like, for example, black women over the age of 60.

There are various datasets available for free online, but there are two serious compliance issues involved with using that information. The first is that the IDV provider will often have no idea if the subjects in the dataset have properly consented. A few years ago, a couple of US tech giants got into legal hot water because they used free datasets to train their face matching algorithms; it turned out the data was all scraped from Flickr without any of the users’ consent. The second problem is that datasets are normally marked as not for commercial use.

Many IDV vendors have a circular attitude to personal data. They ingest the data to perform the identity checks, but then they may retain or immediately use that personal data to train their own algorithms (a “secondary purpose”). That latter purpose is for the vendor’s own benefit and not yours as the client. Its legality is also dubious.

Two options, neither perfect

To legally use and process data for this secondary purpose, the IDV provider has to either (i) adopt the position that it is something that you, the data controller, would want or have given the IDV provider express permission to do, or (ii) obtain consent from the end user as a data controller in its own right.

The thing is, both approaches are fraught with legal issues.

As a data controller, you have a responsibility to ensure that you collect and use only as much personal data as you need for your purpose, namely, verifying the identity of a person. It’s a stretch for your IDV provider to argue that the purpose includes allowing your IDV provider to improve its systems. Worse yet, it is your company that’s on the hook if that argument does not work. In other words, your IDV provider gets all the benefit and you assume most of the risk.

Claiming that the IDV vendor is somehow a controller itself is also potentially problematic. To properly obtain consent, it must be collected in both your name (to perform the biometric match) and the IDV provider’s name (to train their algorithm). Does the consent screen you present to your end users make this clear?

If relying on the lawful processing ground of “legitimate interests” (and not collecting express consent), then the IDV vendor still needs to inform the end user that their data is being used for training purposes and give them the right to opt out at the point of collection and at any time later on. It’s not likely that your customers will appreciate that they have these rights by the time they have come through the journey. In any case, to use biometric data for training, your IDV provider cannot rely on legitimate interests and must collect consent.
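The points above (consent collected separately in each controller’s name, for each purpose, and withdrawable at any time) can be sketched as a simple consent ledger. This is an illustrative data structure only; the class names, the purpose strings, and the controller labels are assumptions, not any vendor’s actual implementation.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Consent:
    controller: str   # whose name the consent was collected in (hypothetical labels)
    purpose: str      # e.g. "identity_verification" or "algorithm_training"
    granted_at: datetime
    withdrawn_at: Optional[datetime] = None


@dataclass
class ConsentLedger:
    records: List[Consent] = field(default_factory=list)

    def grant(self, controller: str, purpose: str) -> None:
        self.records.append(Consent(controller, purpose, datetime.now(timezone.utc)))

    def withdraw(self, controller: str, purpose: str) -> None:
        # Mark every matching active consent as withdrawn.
        for record in self.records:
            if (record.controller, record.purpose) == (controller, purpose) and record.withdrawn_at is None:
                record.withdrawn_at = datetime.now(timezone.utc)

    def allows(self, controller: str, purpose: str) -> bool:
        # Consent for one controller or purpose never implies consent for another.
        return any(
            (r.controller, r.purpose) == (controller, purpose) and r.withdrawn_at is None
            for r in self.records
        )


if __name__ == "__main__":
    ledger = ConsentLedger()
    ledger.grant("client", "identity_verification")
    # The vendor has no consent of its own for training, so this prints False.
    print(ledger.allows("idv_vendor", "algorithm_training"))
```

Keying each record on both controller and purpose makes the question from the consent screen explicit: a user agreeing to the biometric match in your name says nothing about training in the vendor’s name.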

So what, then, is the solution? IDV vendors need the data to create the best product possible, and you want your vendor to provide a quality service as well as be truthful with your customers.

Using synthetic data

The answer is to use an IDV vendor that does not rely on your end users’ personal data for training, or indeed any personal data whatsoever for training. The next generation of providers are training their algorithms entirely on synthetic data, typically produced by generative adversarial networks (GANs).

There are two major advantages in using GANs in this way:

You completely avoid the privacy or ethical concerns outlined above, and
The IDV provider can react very quickly to a new attack vector by generating, literally overnight, tens of thousands of GAN-produced synthetic samples that recreate the threat.

Questions to ask your IDV provider:

Please show me your consent screen and show me your legal advice that it complies with BIPA and CUBI (the Capture or Use of Biometric Identifier Act, Texas’ less scary version).
Where can I read your public “biometric processing statement”?
Do you use the data of Illinois residents in any form of training?
Can you send me your Data Protection Impact Assessment (DPIA) covering the service you are selling to me?
How have you trained your algorithms?
Will you be reusing personal data from our end users to train your algorithms, and if so, what is your lawful processing ground?
From where do you source your training data?
How have you, or the source of your data, collected express consent from the data subjects?
Can you prove to us that you have that consent?
How can a person withdraw consent if they later change their mind?

About the post:
Images are generative AI-created. Prompt: Three little pigs, one sleeping, one running, one working on laptop. Tools: Craiyon (fka DALL-E Mini), ChatGPT.

About the author:
Peter Violaris is Global DPO and Head of Legal EMEA for IDVerse. Peter is a commercial technology lawyer with a particular focus on biometrics, privacy, and AI learning. Peter has been in the identity space for 6 years and before that worked for London law firms.