AI and AI wannabes are everywhere! AI can be a powerful tool to move your business forward, as long as it is developed and used properly. And for CEOs and Board members responsible for oversight of key practices at your company, the AI buck stops with you!
New rules and guidelines are starting to proliferate. You may be familiar with specific bans on facial recognition in the U.S., or with the European Union’s recently announced proposed AI regulations. You may even have heard of the DoD’s “Ethical Principles for Artificial Intelligence,” adopted in 2020 and reaffirmed in early 2021, for developing and implementing AI in a responsible, equitable, traceable, reliable, and governable manner. These are big, but arguably high-level, principles and regulations. Soon, however, regulators are going to get serious about much more AI regulation, particularly on one of the most challenging parts of AI: gathering and curating high-quality, appropriately obtained training data for your AI system.
Here’s an example of what not to do. This query from a journalist at an online enterprise AI publication came across my desk. Except where noted, the text is exactly as received; emphasis is mine.
Summary: Webscraping for AI
Name: George [Last Name Redacted]
Category: High Tech
Media Outlet: [Publication Name Redacted]
Query: I am working on a story for [Publication Name Redacted] looking at how web scraping with Python and other coding languages can assist with data collections. I am looking for sources that can weigh in on the following via email by [date redacted]… –What are some of the top use cases you see for web scraping as part of AI development and what role do Python or other coding languages play? -What are some of the different ways that webscraping can augment data? -What are the challenges and limitations of doing it well, and some of the best practices for addressing these? –What are some of your favorite tools for using webscraping as part of AI development and why? Thanks in advance, George
Frankly, as someone in the AI business, I was shocked. The ASSUMPTION of this writer was that companies were, simply put, regularly stealing individual data and using it to train their AI systems. Likely he assumed that most AI companies were following the well-known but uninspiring example of Clearview AI. A recent, extensive BuzzFeed investigation included this description of the company.
The New York City–based startup claims to have amassed one of the largest-known repositories of pictures of people’s faces — a database of more than 3 billion images scraped without permission from places such as Facebook, Instagram, and LinkedIn. If you’ve posted images online, your social media profile picture, vacation snapshots, or family photos may well be part of a facial recognition dragnet that’s been tested or used by law enforcement agencies across the country.
BuzzFeed continued its description of Clearview AI’s method of obtaining training data: “Clearview has touted its software as the ‘world’s best facial-recognition technology,’ but its most novel innovation is doing what no other company has been willing to do: rip billions of personal photos from social media and the web without permission or consent.”
Obviously, at least some companies are following suit. What is perhaps at least as disturbing is that existing databases of individual data collected for one specific purpose – receiving healthcare, applying for a loan, opening a bank account, etc. – may be being used, many would say abused, by companies to train their AI systems, WITHOUT WRITTEN PERMISSION. Why would written permission be expected? Read on to see what’s coming your way.
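One way a well-run engineering team can stay on the right side of this issue is to attach provenance and consent metadata to every training record and refuse records that lack it. The sketch below is purely illustrative – the field names and purpose labels are my own assumptions, not from any regulation – but it shows the basic idea of filtering a dataset down to records with documented consent for the AI-training purpose.

```python
from dataclasses import dataclass

# Illustrative sketch (field names are hypothetical): each training record
# carries provenance and written-consent metadata, and the pipeline keeps
# only records whose subjects consented to AI training specifically.
@dataclass
class TrainingRecord:
    record_id: str
    source: str            # where the data came from
    purpose: str           # purpose the individual consented to
    written_consent: bool  # is written consent documented?

def filter_consented(records, required_purpose="ai_training"):
    """Keep only records with documented written consent for the stated purpose."""
    return [r for r in records
            if r.written_consent and r.purpose == required_purpose]

records = [
    TrainingRecord("a1", "bank_account_opening", "ai_training", True),
    TrainingRecord("a2", "healthcare_intake", "service_delivery", False),
]
usable = filter_consented(records)
print([r.record_id for r in usable])  # only "a1" survives the filter
```

The point is not this particular code, but the design choice: consent and purpose travel with the data, so the question “do we have written permission?” can be answered mechanically rather than by hope.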
Last year, Utah’s Office of the State Auditor convened a commission that has recently added to the mix of issues for companies to address, issuing new guidelines for Utah government entities as they contemplate acquiring advanced technologies. That’s right, these are guidelines for GOVERNMENT entities to follow when procuring advanced technology, particularly relating to privacy and AI, not just guidelines aimed at companies that collect consumer data.
I believe these types of guidelines will migrate into best practices that will disqualify some companies from doing business with government entities. If you are a founder, CEO, or member of a Board of Directors at a company pushing the state of the art in technology, this is for you!
The guidelines take the form of a set of principles, with a companion set of questions that state agencies and other Utah government entities should ask themselves as they look to procure advanced technology solutions, particularly those involving privacy and AI. According to the Office’s press release, these “…documents are intended to help government entities with their procurement of advanced software technologies that have the potential to impair the privacy of Utahns or could lead to discrimination against them.”
At this time, Utah is apparently one of only a handful of U.S. states that currently have, or are publicly working on, such expansive rules for government entities to follow. But you can bet that more such regulations are coming (apparently other states are already asking about what Utah has come up with!).
Why should it matter to CEOs and Corporate Boards?
- First, if your business sells to or does business with a government entity, these same guidelines may be coming to a client near you!
- Second, these are strong practices that, in some form, are highly likely to percolate through society and the economy as a whole. Once standards like this are put in place somewhere, clients, citizens, and government officials will begin to recognize their broader value.
- High level guidelines as discussed above will eventually have to get down to the nitty gritty that Utah has focused on.
- If your company’s vendors don’t measure up to these high standards, the trouble could affect your business and potential contracts.
So, if your company can’t meet these requirements today, don’t rest comfortably just because you don’t do business with Utah or don’t currently have government customers.
I predict these standards will become the price of entry for businesses using private information and AI.
Today, we’ll focus on the Auditor’s concerns about AI.
Governments collect a lot of information about their citizens, which Utah has recognized both as a potential risk to privacy and as a potential way to systematize discrimination and bias. With these new principles, Utah’s State Auditor has put a stake in the ground: the government has a duty to protect the privacy of its citizens and an obligation to prevent discrimination and bias from being built into government systems.
Below are excerpts from the Principles highlighting the new standard as agencies and entities evaluate potential advanced technology vendors with AI-centric solutions. (Numbers are references from the original document.)
Would your firm be able to pass this review?
6. Perform In-Depth Review of AI/ML Algorithms: All claims of AI or ML should be clearly validated and explained, including:
a. AI algorithms used in the software application
b. model training and evaluation procedures used
c. model outputs utilized by product / feature
d. source and composition of the training, validation and test datasets
e. demonstration of having a license to use those datasets
f. pre- and post-processing methods applied to data
g. processes for model lifecycle management, including on-going evaluation
The output of an AI-based software application may still have issues of bias or lack of fairness, even if the inputs and system are reasonably judged not to include such failings. The output of the software application should be monitored to ensure protection of privacy and avoidance of improper or unexpected bias.
8. Review Steps Taken to Mitigate Discrimination: Ensure that the vendor has considered the question of bias and discrimination within their software application and that the vendor has mechanisms, such as audit results, to demonstrate that their software application does not disproportionately affect various categories of individuals, particularly any federally protected class.
a. For example, consider sources of data that may include implicit or historic bias (e.g., distribution of video cameras by region or neighborhood)
b. For example, consider how model choice and training may introduce bias.
c. Understand the interpretation of model output.
d. For AI-based or ML-based software applications, determine whether the source of training and model data has been evaluated by subject matter experts.
e. Entity should use best in class models for evaluation to prevent discrimination, particularly in the case of biometric analysis or facial recognition. As an example, the U.S. NIST provides evaluations of the accuracy of facial recognition based on demographic differences.
9. Determine Ongoing Validation Procedures: The government entity must have a plan to oversee the vendor and vendor’s solution to ensure the protection of privacy and the prevention of discrimination, especially as new features/capabilities are included.
10. Require Vendor to Obtain Consent of Individuals Contained Within Training Datasets: Many biometric characteristics may be captured without an individual’s knowledge or consent. Examples may include facial recognition or gait analysis. Ensure that a vendor has consent from the individuals whose biometric characteristics are used in training datasets.
10.1 Does vendor have the permission of every individual whose information is contained within its training, validation and test datasets?
10.1.1 Is there any risk that data in its dataset(s) has been “scraped” or otherwise gathered from online sources without the permission of those whose information is included and/or without permission of the owners (who otherwise have permission to use the data)? Have vendor provide credible confirmation of these permissions.
Let me just stop here, at number 10. Government agencies and entities in the State of Utah should require a potential vendor, perhaps your business, to have written consent from every individual whose data is contained in the training dataset! Can you pass that test?
This is a high bar, perhaps even higher than would have been expected given the press inquiry at the beginning of this post, and even well-intentioned, well-managed companies may not be able to meet it. At the same time, if my instincts are right and these types of guidelines become standard across government and non-government clients, the time to start setting these expectations internally is right now, both for technology your company develops and for technology your company uses!
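Consent (item 10) is not the only test; item 8 asks vendors to demonstrate, for instance with audit results, that their systems do not disproportionately affect protected classes, and item 8e points to NIST-style evaluations of accuracy across demographic groups. A minimal internal version of such an audit might simply compare error rates per group and flag large disparities. The group labels and the disparity threshold below are my own illustrative assumptions, not part of Utah’s guidance or NIST’s methodology.

```python
# Illustrative sketch: compare a model's error rate across demographic
# groups and flag any group whose error rate is far worse than the best.
def error_rates_by_group(results):
    """results: list of (group, correct) pairs; returns {group: error rate}."""
    totals, errors = {}, {}
    for group, correct in results:
        totals[group] = totals.get(group, 0) + 1
        errors[group] = errors.get(group, 0) + (0 if correct else 1)
    return {g: errors[g] / totals[g] for g in totals}

def flag_disparities(rates, max_ratio=1.5):
    """Flag groups whose error rate exceeds max_ratio times the best group's."""
    best = min(rates.values())
    return [g for g, r in rates.items() if best > 0 and r / best > max_ratio]

# Hypothetical evaluation results: 100 predictions per group.
results = [("group_a", True)] * 95 + [("group_a", False)] * 5 \
        + [("group_b", True)] * 80 + [("group_b", False)] * 20
rates = error_rates_by_group(results)
print(rates)                    # {'group_a': 0.05, 'group_b': 0.2}
print(flag_disparities(rates))  # ['group_b']
```

A real audit would of course use statistically meaningful sample sizes and a defensible disparity metric, but even a crude check like this produces the kind of artifact, per-group results a vendor can show a procurement officer, that item 8 is asking for.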
Finally, if you are a member of a Board of Directors concerned about corporate governance and best practices, or a CEO who wants to build a truly great company, are you requiring your vendors to follow these same guidelines? The State of Utah has done a lot of the work for you; why not include these best practices in your own procurement process to make sure you are ahead of the curve in your own use of private data and artificial intelligence?