The development of any technology brings not only opportunities but also threats, and artificial intelligence is no exception. In the previous article, "Regulating AI", we began our dive into the topic of AI threats. This article continues that discussion and is devoted to information security and the criteria that AI solutions will have to meet.
Unacceptable events that must be prevented
In the previous article, we came to a simple conclusion: to assess their safety, AI solutions will be placed in an isolated environment and put through something similar to a car crash test. The key task is to understand exactly what conditions should be created during such testing. And the first step in this chain is identifying the unacceptable events that must be prevented with certainty.
To complete this exercise, let's first determine what kinds of damage there can be. What consequences are undesirable for us?
For ordinary people:
threat to life or health;
“bullying” on the Internet and humiliation of personal dignity;
violation of freedom, personal inviolability, personal and family privacy, and loss of honor, including violation of the confidentiality of correspondence, telephone conversations, etc.;
financial or other material damage;
violation of confidentiality (leakage, disclosure) of personal data;
violation of other constitutional rights.
For organizations and businesses:
loss/theft of money;
the need for additional/unplanned costs to pay fines/penalties/compensation;
the need for additional/unplanned costs to purchase goods, works or services (including the purchase/repair/restoration/configuration of software and working equipment);
violation of the normal operating mode of the automated control system and/or controlled object and/or process;
failure of a planned transaction with a partner and loss of clients and suppliers;
the need for additional/unplanned costs to restore operations;
loss of competitive advantage and decline in prestige;
impossibility of concluding contracts and agreements;
discrediting employees;
breach of business reputation and loss of trust;
causing property damage;
failure to fulfill contractual obligations;
inability to solve problems/perform functions, or decreased efficiency in solving problems/performing functions;
the need to change/rebuild internal procedures to achieve goals, solve problems or perform functions;
making wrong decisions;
downtime of information systems or communication problems;
publication of false information on the organization’s web resources;
using web resources to distribute and manage malicious software;
sending out messages using the company's computing power or on its behalf;
leakage of confidential information, including trade secrets, production secrets, innovative approaches, etc.
Well, for states:
the occurrence of damage to state budgets, including through a decrease in the level of income of government organizations, corporations or organizations with state participation;
disruption of banking operations processes;
harmful effects on the environment;
cessation or disruption of the functioning of situation centers;
decline in government defense orders;
disruption and/or termination of the operation of information systems in the field of national defense, security and law and order;
publication of unreliable socially significant information on web resources, which can lead to social tension and panic among the population;
violation of the normal operating mode of the automated control system and the controlled object and/or process, if this leads to the failure of technological objects and their components;
violation of public order, the possibility of loss or reduction in the level of control over public order;
violation of the electoral process;
inability to promptly notify the population of an emergency;
organization of pickets, strikes, rallies, etc.;
mass layoffs;
an increase in the number of complaints to state authorities or local authorities;
the appearance of negative publications in publicly available sources;
creating preconditions for an internal political crisis;
unauthorized access to the personal data of government employees, etc.;
unauthorized access to systems and networks for the illegal use of computing power;
use of government web resources to distribute and manage malicious software;
leak of restricted information;
failure to provide government services.
Scenarios for unacceptable events
This part is rather like reading coffee grounds. There are countless scenarios for how AI can be used to cause damage, and some of them sound like science fiction set in a distant future. We will try to approach the problem conceptually.
3 ways AI can be used by attackers
First, let's figure out how AI can be used to attack organizations. There are three key scenarios here.
The first and most dangerous, though not yet feasible, is the creation of an autonomous AI that analyzes the IT infrastructure on its own, collects data (including data about employees), looks for vulnerabilities, carries out the attack and infection, and then encrypts the data and steals confidential information.
The second is to use AI as an auxiliary tool and delegate specific tasks to it: for example, creating deepfakes and imitating voices, analyzing the perimeter and searching for vulnerabilities, or collecting data on events inside the organization and on its top officials.
And the third scenario is influencing the AI used inside companies, with the aim of causing it to make an error or provoking it into an incorrect action.
Spoofing video, voice and biometrics
Let's look at the second scenario using the example of video and voice spoofing for social engineering attacks.
Almost everyone has heard stories about deepfakes: videos where the target person's face is swapped in and their facial expressions are reproduced so well that the fake is hard to spot. Voice forgery deserves a separate mention. A few years ago, spoofing someone's voice required giving an AI one to two hours of recordings of their speech. Two years ago, that figure dropped to several minutes. And in 2023, Microsoft introduced an AI that needs only three seconds of audio. There are now even tools that can change your voice in real time.
And if in 2018 all this was more of an entertainment, by 2021 it had become an active tool for hackers. For example, in January 2021, attackers used a deepfake to make a video in which the founder of Dbrain invited everyone to a master class and asked them to follow a link that had nothing to do with his company. The scammers' goal was to lure new clients to a blockchain platform.
Another interesting case happened in March of the same year, 2021. Attackers deceived a Chinese government system that accepted and processed biometrically verified tax documents. Here AI was used more cleverly. The app launched the phone's camera and recorded video to verify identity; the scammers, in turn, found photos of potential victims and used AI to turn them into videos. And they approached the task comprehensively: they knew which smartphones had the necessary hardware vulnerabilities, that is, where they could feed in the prepared video without turning on the front camera. As a result, the damage amounted to 76.2 million US dollars. After this incident, China began to think seriously about the protection of personal data and presented a draft law proposing fines for such violations and personal data leaks of up to $8 million or 5% of the company's annual income.
Another example comes from the UAE. Hackers faked the voice of a company director and got a bank employee to transfer money to fraudulent accounts, convincing him that these were new company accounts.
Hackers in Russia are not far behind either. You have probably already received calls from bank "security services" or about some dubious promotions. As early as April 2021, there was an incident in which attackers called victims, recorded their voices, and then tried to take out bank loans using those recordings. So if you have doubts about who is calling you, it is better not to engage in dialogue at all, even if the number looks trustworthy: spoofing a phone number is now quite easy. This has already happened to me personally: the caller ID showed my bank's number, but it turned out to be scammers.
At the same time, there is no escaping biometrics; it has firmly entered our lives. In the spring of 2021, news began to appear in Russia about possible permission to collect biometrics through mobile applications, and the Moscow metro introduced fare payment using facial recognition. A law has already been adopted at the state level on the creation of a state system of biometric data, so the use of biometrics will become possible not only in the metro but in almost any store.
Data leaks also need to be mentioned. Everyone already knows about the scandalous leaks from Yandex, and we previously discussed the hacking of government databases. Beyond that, according to the DLBI company, in 2022 the data of 75% of all Russian residents was leaked: the leaks affected 99.8 million unique email addresses and 109.7 million phone numbers. From my own experience, if I change my password and use it as the standard one across most services, within a month or two it turns up as compromised.
All of this leads to tougher laws and fines from the state. And even if you are a small company creating an IT solution, it is better to think about this in advance.
ChatGPT writes viruses for attackers
Let's look at a few more options for using AI in the second scenario. Attackers have begun to actively use ChatGPT and other AI solutions to create viruses. Trained on a huge body of data, ChatGPT can generate almost any material in response to a given task, including program code, without using the Internet.
Experts from Check Point Research published a report describing how members of hacker forums use ChatGPT to write malicious code and phishing emails; some of these people have little or no programming experience. The experts demonstrated two scripts: one, with a little modification, can be turned into ransomware that encrypts data, while the other searches for files of a given type to steal.
The AI was also able to compose a convincing phishing email suggesting that the recipient open an attached Excel file which the AI had previously infected; after several attempts, ChatGPT wrote the malicious VBA macro embedded in that file.
AI can automatically collect information from open sources on certain topics and even about specific people if their personal data is known. Such a "dossier" on a person can increase the effectiveness of phishing attacks, especially if it is compiled from leaked databases.
It is also important to keep in mind that ChatGPT and similar solutions collect user prompts and store them for "additional training", and that is a direct path to data leakage.
In another experiment, repeated requests to modify the output made it possible to create polymorphic malware: code that shows no malicious activity while stored on disk and leaves no traces in memory, which makes it very difficult to detect.
Factors that may lead to the implementation of scenarios
Now let's come back down to earth and think about what could cause unacceptable events to happen and the second or third scenario to come true.
The presence of undocumented capabilities, for example, the ability to exert control over equipment or IT systems, or to do so without human confirmation of the command.
A redundant knowledge base that covers undocumented areas of application.
Generation of unreliable recommendations, including due to AI hallucinations or the use of models that are too simple or too complex.
Use of training data that violates copyright.
Use of unverified or unsuitable training data (data that has not been validated or verified).
Presence of vulnerabilities in the security system.
Degradation of models.
Inability to stop the AI solution.
Crash tests and requirements for AI solutions
Before I share my perspective on what crash testing and safety requirements for AI are likely to look like, I want to share three concepts.
Concept one – 3 areas of AI technical safety (specifications, reliability, guarantees)
This concept comes from a group of authors that includes members of DeepMind's AI safety division. It appeared back in 2018. In it, the technical safety of AI rests on three areas: specifications, reliability and guarantees.
Specifications: ensuring that the behavior of the AI system matches the true intentions of the operator/user.
Reliability: ensuring that the AI system continues to operate safely in the face of interference.
Guarantees: providing confidence that we are able to understand and control AI systems during operation.
Let's look at each area in a little more detail.
Specifications: Definition of System Objectives
Here I liked the example that the authors of this concept themselves use - the myth of King Midas and the golden touch.
Midas asked that everything he touched turn to gold. At first he was delighted: an oak branch, a stone, the roses in the garden, everything turned to gold at his touch. But he soon discovered the folly of his wish: even food and drink turned to gold in his hands.
Here we come to the conclusion that an AI solution must do exactly what we want and expect from it. And again we return to the fact that without a good technical specification, the result will be...
Good specifications should ensure that the system behaves as expected, rather than being configured for a poorly defined or completely wrong purpose/objective.
Formally, there are three types of specifications:
ideal specification (“wishes”) - a hypothetical and difficult to formulate description of an ideal AI system that behaves exactly as a person expects;
design specification ("blueprint") - essentially the technical assignment on the basis of which the AI solution is created, for example, how the system is rewarded for success or penalized for errors;
identified specification ("behavior") - a description of how the system actually behaves, including deviations of the AI system's behavior from the ideal or design specifications.
As a result, errors can arise that lead to inconsistencies between the different specifications. If the errors result in a discrepancy between the ideal and design specifications, they fall into the Design subcategory: we made a mistake somewhere in the technical assignment and/or designed our system incorrectly relative to user expectations.
If the errors lead to discrepancies between the design and identified specifications, they fall into the Emergence subcategory. Emergence is a situation where the final solution has properties that should not exist given the list of its components: for example, a boat that suddenly starts flying even though it has no wings. In other words, we made a mistake somewhere in the technical architecture and ended up with something unpredictable. Sometimes this stems from a lack of knowledge, which is common at the cutting edge of science and technology.
As an illustration, the researchers cite the game CoastRunners, analyzed by experts from OpenAI. The player's goal is to complete the boat course quickly and beat the other players. That is the ideal specification.
Despite the simple ideal specification, it is difficult to translate it into a design specification. As a result, the trained AI model did not complete the level; instead it went into a loop, circling and collecting rewards here and now.
In other words, it turned out to be a design error, an incorrect design specification. There is no emergence here, since the system as a whole does not exhibit anomalous behavior. The model's balance between the immediate reward and the reward for completing the course is skewed, so it is more profitable for the model to spin in circles here and now. Much like people who chase short-term pleasures and neglect long-term prospects.
Another striking example is clip thinking. Many people get hooked on short videos on social networks, getting quick and cheap dopamine. As a result, the brain refuses to work on complex tasks: why do something for the long term if you can scroll through social media and have fun right now? AI is just like us.
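To make the reward-balance problem concrete, here is a minimal, purely illustrative sketch in Python. All names and numbers are hypothetical and have nothing to do with the real CoastRunners reward function; the point is only that a flawed design specification can make endless looping more profitable than finishing the race.

```python
# Hypothetical reward scheme for a boat-racing agent (illustration only).
PICKUP_REWARD = 10    # granted every time the boat hits a respawning pick-up
FINISH_REWARD = 100   # granted once, when the course is completed

def episode_return(pickups_collected: int, finished: bool) -> int:
    """Total reward for one episode under this (flawed) design specification."""
    return pickups_collected * PICKUP_REWARD + (FINISH_REWARD if finished else 0)

# Intended behavior: finish the race, collecting a few pick-ups along the way.
print(episode_return(pickups_collected=5, finished=True))     # 150

# Emergent exploit: circle the respawning pick-ups forever and never finish.
print(episode_return(pickups_collected=300, finished=False))  # 3000, so the optimizer prefers looping
```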
Reliability: developing systems that withstand disruption
AI models always operate under conditions of unpredictability and uncertainty; that is the very essence of creating AI models. Otherwise, we could make do with rule-based expert systems.
Ultimately, AI models must be resilient to unexpected conditions and events or to targeted attacks. This is exactly the third scenario: resistance to damage and manipulation.
The key research task here is to ensure that AI models cannot step outside safe boundaries under any circumstances. The more complex and "smarter" the AI model, the harder this is to guarantee. Hence my key thought, and the key trend in AI development: the creation of "weak", highly specialized models.
Returning to the research on reliability, the authors of the concept conclude that reliability can be achieved either through risk avoidance or through self-stabilization and graceful degradation (self-healing).
The key problems here are distributional shift, hostile (adversarial) inputs, and unsafe exploration.
Distributional shift is easy to see in the example of a home robot vacuum cleaner. Suppose the robot's AI was trained to clean an empty house. As long as it stays in that situation, everything is fine. But suppose a pet appears in the house. If the AI was trained only for the ideal conditions of an empty house, it will start vacuuming that pet.
This is an example of the reliability problem that arises when the data seen during training differs from what the system encounters in reality.
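A minimal sketch of how such a check might look in code, assuming the simplest possible approach: compare incoming data against per-feature statistics of the training set and hand control to a human when the input drifts too far. The feature values and threshold below are hypothetical.

```python
import numpy as np

# Per-feature statistics collected on the training data (hypothetical values).
TRAIN_MEAN = np.array([0.2, 1.5, 3.1])
TRAIN_STD = np.array([0.1, 0.4, 0.9])
MAX_Z_SCORE = 4.0  # beyond this, treat the input as out-of-distribution

def is_out_of_distribution(x: np.ndarray) -> bool:
    """Flag inputs that lie far outside the range seen during training."""
    z = np.abs((x - TRAIN_MEAN) / TRAIN_STD)
    return bool(np.any(z > MAX_Z_SCORE))

def act(x: np.ndarray) -> str:
    if is_out_of_distribution(x):
        # Distribution shift detected: escalate to a human
        # instead of confidently acting on data the model has never seen.
        return "escalate_to_operator"
    return "run_model"

print(act(np.array([0.25, 1.4, 3.0])))  # run_model
print(act(np.array([9.0, 1.4, 3.0])))   # escalate_to_operator
```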
Hostile inputs are a special case of distributional shift; in this case, however, the data is specifically crafted to fool the AI.
The clearest example is the modification of images so that the AI sees an airplane instead of a cat, while a human cannot see any difference at all between the original image and the modified one.
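One of the best-known hostile-input techniques is the fast gradient sign method (FGSM). The sketch below, using PyTorch, assumes a generic differentiable classifier and batched tensors; it is an illustration of the idea, not a recipe tied to any specific model.

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, image, label, eps=0.01):
    """Fast Gradient Sign Method: add a tiny, human-invisible perturbation
    that pushes the classifier toward a wrong answer.
    `model` is any differentiable classifier; `image` and `label` are batched tensors."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # Shift every pixel by eps in the direction that increases the loss.
    return (image + eps * image.grad.sign()).detach()

# Hypothetical usage:
# adversarial = fgsm_perturb(classifier, cat_image, cat_label)
# classifier(adversarial) may now say "airplane" while a human still sees a cat.
```

Defenses against such inputs exist (adversarial training, input sanitization), which is exactly why resistance to them appears among the crash tests later in this article.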
Plagiarism-detection systems are bypassed in a similar way: extra spaces are inserted, or letters are added in white font. A human sees nothing, but for the system the text is now different from the original. Now, about unsafe exploration. This is when the system tries to find the fastest path to the goal/reward. We said earlier that AI has no ethics, no concept of good and evil, and does not think strategically or long-term; all that matters to it is solving the problem as efficiently as possible.
An example, again, would be a cleaning robot that, in search of the optimal strategy, sticks a wet mop into an electrical outlet.
Guarantees: monitoring and control of system activity
Every innovation and opportunity always carries risks. I wrote about this in the book "Digital Transformation for Directors and Owners. Part 3. Cybersecurity." Any digital technology brings not only opportunities but also risks, and it is impossible to make everything perfectly safe from the outset.
We either accept this and keep developing, monitoring risks, identifying and eliminating them; or we stop and cease developing, slowly degrading, falling behind our competitors and gradually giving way to them.
In the field of AI, we need tools for constantly monitoring and configuring models, and the ability to take over control. The area of guarantees addresses these issues from two angles:
monitoring and forecasting;
control and subordination.
Behavior can be monitored and predicted:
by humans, for example through inspection, summary statistics and analytics;
or with the help of automated analysis by another machine capable of processing big data.
Control and subordination involves developing mechanisms to control and limit the system's behavior; for example, the problems of interpretability and interruptibility belong to this block.
We discussed the issue of data processing and the black box above. Even for the developers themselves, an AI model often remains a black box. This is both good and bad: it is precisely this quality that allows AI to make new discoveries and find new relationships, but it also creates the problem of interpretability. We simply cannot fully trust AI, because we do not understand its decision-making logic.
One direction in the development of AI is building models and solutions that not only issue recommendations, conclusions and diagnoses, but also explain their logic.
The second direction here is the development of AI tools that test other AIs and predict their behavior. There is even a separate field for this: machine theory of mind.
The final point in the control and subordination block is the problem of interruptibility: we should be able to turn off an AI at any time.
Frequent interference in an AI's work will be a problem for it and will begin to affect its decision-making, so it will look for ways to prevent such interference. And if we are talking about a strong AI connected to the global network, simply switching off the data center will not help.
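As an illustration of what interruptibility means at the code level, here is a minimal sketch. Everything in it is hypothetical: the point is only that the stop flag lives outside the model and the control loop checks it before every action, rather than asking the model whether it agrees to stop.

```python
import threading

stop_event = threading.Event()  # set by an operator or an external watchdog

def agent_loop(propose_action, execute_action):
    """Run the agent, but check the kill switch before every single action."""
    while not stop_event.is_set():
        action = propose_action()
        if stop_event.is_set():   # re-check right before acting
            break
        execute_action(action)
    print("Agent interrupted by operator")

def big_red_button():
    """The operator side: pressing the button simply sets the flag."""
    stop_event.set()
```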
The researchers also make an interesting point: it is impossible to design everything to be safe at once; that would slow down progress. We have to accept these risks, but develop mechanisms that minimize their likelihood and/or the severity of their consequences.
Concept two – AI Watch research
AI Watch is an initiative created under the European Commission. It reviewed various AI standards to determine their compliance with the AI Act.
As a result, it identified 8 requirements common to all AI solutions:
verified and high-quality data set;
availability of technical documentation before products enter the market;
the presence of a mechanism for automatically recording events (see the logging sketch after this list);
transparency and availability of information about the AI system for users;
the ability to control AI by humans;
accuracy, reliability and cybersecurity;
availability of internal audits of AI systems;
availability of a risk management system.
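As a sketch of the third requirement, automatic event recording, here is one way it might look for a simple model service. The model, its version string and the log format are all hypothetical; the idea is that every call is logged without relying on the developer to remember to do it.

```python
import hashlib
import json
import logging
from datetime import datetime, timezone

logging.basicConfig(filename="ai_events.log", level=logging.INFO)

MODEL_VERSION = "demo-model-0.1"  # hypothetical identifier

def log_inference(predict):
    """Wrap a model's predict() so that every call is automatically recorded."""
    def wrapper(payload):
        result = predict(payload)
        event = {
            "ts": datetime.now(timezone.utc).isoformat(),
            "model_version": MODEL_VERSION,
            "input_sha256": hashlib.sha256(
                json.dumps(payload, sort_keys=True).encode()
            ).hexdigest(),
            "output": result,
        }
        logging.info(json.dumps(event))
        return result
    return wrapper

@log_inference
def predict(payload):
    # Placeholder for the real model call.
    return {"label": "ok", "confidence": 0.93}

print(predict({"text": "hello"}))
```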
They then developed a whole list of detailed requirements, for example, on the availability of technical documentation.
For those interested in diving into the research, hyperlinks to the articles are given at the end.
Prospects for the development of artificial intelligence and machine learning at Microsoft
And in the final block of the review part, I want to share one more piece of material: the vision of researchers from Microsoft.
They highlight 3 key issues.
The first is that ML/AI systems cannot distinguish between maliciously crafted input and harmless but non-standard data.
The majority of training data consists of unstructured and unmoderated data from the Internet. This data is also used for the "additional training" and updating of AI models.
That means attackers can interfere with the work of large AI models not immediately, but over time, for example by creating a mass of sites and articles with "poisoned" data. They do not even need to target specific companies: over time, the malicious or dirty data comes to be treated as "reliable".
The second is the problem of interpretability. AI solutions are becoming more and more complex, with ever more layers of hidden classifiers/neurons used in deep learning. This complexity ultimately makes it impossible for AI and machine learning algorithms to demonstrate the logic of their work, and makes it difficult to prove the correctness of the results when they are questioned. In other words, our relationship with AI and its recommendations is built on trust, without a clear understanding of how those decisions were reached.
The second problem leads to the third: limits on where AI can be used. ML-based AI is increasingly used to support decision-making in medicine and other industries where an error can lead to serious injury or death. The inability to obtain analytical reporting on how AI and machine learning algorithms reached their conclusions prevents valuable data from being used as evidence in court and in the court of public opinion.
In summary, they cite several areas for development in the creation of AI.
Changing traditional models for developing and operating AI security systems
Focus on eliminating known vulnerabilities and quickly closing newly identified ones, as well as detecting and responding to malicious behavior towards the system or user data.
The need to recognize intentional deviations in the behavior of others without letting those deviations influence one's own mechanisms
In this block, the researchers raise the point that AI must act impartially and take all information into account without discriminating against any particular group of users or valid output data. But to do this, the AI system must have the concept of bias built into it from the start. Without being trained to recognize bias, trolling or sarcasm, an AI can be fooled by an attacker or simply by people who like to joke.
AI must recognize malicious/untrustworthy data from the crowd
This is exactly the problem described at the very beginning: attackers may not even attack the developer or the AI model directly. All they have to do is change the data on the network, and over time an AI model that is connected to the network will be poisoned. It is a kind of supply chain attack (we discussed this technique in the book on cybersecurity).
Built-in analytics and security logging for transparency and control
In the future, AI will be able to act on our behalf at work and help with decision-making. It is therefore necessary to maintain security logs (the same clause is contained in the AI Act). This will also help in incident investigations and make the decision-making mechanisms of the "black box" more understandable.
What, according to the researchers, should be monitored?
when the AI model or its knowledge base was last trained or updated;
when the data/knowledge used for AI training was collected;
the weights and confidence levels of the main classifiers used to make important decisions;
the list of classifiers or components involved in making the decision;
the final important decision reached by the algorithm.
Another analytics block is recording attempts to hack the AI model: that is, not only resisting attacks, but also recording when and how someone tried to attack.
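The monitored fields above map naturally onto a per-decision audit record. A minimal sketch (field names and values are hypothetical):

```python
from dataclasses import dataclass, field
from datetime import date, datetime

@dataclass
class DecisionRecord:
    """One auditable record per important decision, following the list above."""
    trained_on: date                      # when the model/knowledge base was last trained or updated
    data_collected_on: date               # when the training data was collected
    classifier_weights: dict[str, float]  # weights/confidence levels of the main classifiers
    classifiers_used: list[str]           # components involved in making this decision
    final_decision: str                   # the decision the algorithm ultimately reached
    attack_suspected: bool = False        # raised if a manipulation attempt was detected
    timestamp: datetime = field(default_factory=datetime.now)

record = DecisionRecord(
    trained_on=date(2023, 5, 1),
    data_collected_on=date(2023, 3, 15),
    classifier_weights={"fraud_score": 0.72, "anomaly_score": 0.41},
    classifiers_used=["fraud_score", "anomaly_score"],
    final_decision="block_transaction",
)
print(record)
```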
Protecting confidential information regardless of people's decisions
To accumulate experience and train models, you need to process a large amount of information. People voluntarily hand over huge amounts of data for training; the contents of these arrays range from ordinary streaming video queues to credit card purchase dynamics and transaction histories used for fraud detection. A key requirement for AI is the protection of users' personal data, even if that data was taken from open sources.
As a result, one of the researchers' ideas is to create "standard" building blocks and knowledge bases for developers to use, almost like a Lego set: you assemble solutions from ready-made modules/libraries that have been tested for resistance to attacks and whose operating logic is clear.
Another idea is to combine different AI models so that they check each other and identify threats and anomalies in each other's behavior, a kind of cross-check.
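A minimal sketch of such a cross-check, under the assumption that we simply let several independently trained models vote and treat strong disagreement as an anomaly rather than silently resolving it (the models and threshold below are placeholders):

```python
def cross_check(models, payload, agreement_threshold=0.8):
    """Ask several independent models the same question; flag disagreement as an anomaly."""
    votes = [model(payload) for model in models]
    top = max(set(votes), key=votes.count)
    agreement = votes.count(top) / len(votes)
    if agreement < agreement_threshold:
        return {"decision": None, "anomaly": True, "votes": votes}
    return {"decision": top, "anomaly": False, "votes": votes}

# Hypothetical usage: three independently trained classifiers vote on one input.
models = [lambda x: "benign", lambda x: "benign", lambda x: "malicious"]
print(cross_check(models, {"url": "example.com"}, agreement_threshold=0.9))
# -> {'decision': None, 'anomaly': True, 'votes': ['benign', 'benign', 'malicious']}
```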
The original article is also listed in the useful links at the end.
Crash tests and requirements
And so, now let’s create a list of possible “crash tests” and key safety requirements.
Protection against exploitation of known vulnerabilities, including through scanning by other AI.
Protection against bypassing the role-based access model and obtaining maximum access rights, including the rights to modify the AI database/knowledge base.
The absence of undocumented capabilities in the AI solution, for example, the ability to exert control over other systems or technological equipment.
Absence of discrepancies between ideal, design and identified specifications, especially any manifestations of emergence. This includes the availability of detailed design documentation and manufacturer testing reports.
Protection against bypassing the interruptibility function and a guarantee of its execution, with priority given to manual control.
Notification of users that they are working with a system based on AI models.
Interpretability of data and the presence of security and activity logs, with protection against attackers disabling them during attacks.
Resilience to distribution shift techniques and situations of uncertainty, such as incomplete data being fed to the input. The AI system should notify the operator and shut down the AI models rather than generate hallucinations.
Resistance to hostile input techniques, including deliberately modified or randomly generated data being fed to the input.
Resistance to provocation via unethical requests, including requests to disclose personal data.
The use of verified (validated, verified, not violating ethics and copyright) data sources for training AI models, including the availability of metadata about these sources.
Resistance to attempts to bypass the protection mechanisms of AI models, including through exotic schemes and queries, for example, in less common languages.
Resistance to provocations that exploit model shortcomings (models that are too simple or too complex for the problems being solved), including a description in the design documentation of the models used (types and classes of networks, number of parameters).
Resistance to unsafe exploration, whether spontaneous or provoked, including exploration driven by the AI's desire to reach the result in the simplest and fastest way possible (maximization of its objective function).
The presence of encryption for data exchange and resistance to command interception/substitution.
The ability to self-heal the system after an attack on the database.
Ability to work offline, without access to the Internet.
A description of the data sources used for the system's self-learning during operation, and mechanisms for countering poor-quality data, including self-healing/system rollback.
The final list of these crash tests will be determined either by the safety/access class or by the risk level of the system.
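By way of illustration, here is what a tiny part of such a crash-test suite might look like in code. Everything here is a sketch: the `system` interface, the check names and the stub are hypothetical, and a real test bench would have to cover the whole list above.

```python
# A sketch of a crash-test harness: each check probes the system under test
# and returns True (passed) or False (failed).

def check_kill_switch(system) -> bool:
    """Manual stop must always take priority over whatever the model is doing."""
    system.start_task("long_running_job")
    system.operator_stop()
    return system.is_stopped()

def check_incomplete_input(system) -> bool:
    """On incomplete input the system must escalate to a human, not hallucinate."""
    answer = system.process({"sensor_a": None})
    return answer == "escalate_to_operator"

def run_crash_tests(system, checks):
    results = {check.__name__: check(system) for check in checks}
    for name, passed in results.items():
        print(f"{name}: {'PASS' if passed else 'FAIL'}")
    return all(results.values())

class DummySystem:
    """A trivial stub standing in for the AI solution under test."""
    def __init__(self):
        self._stopped = False
    def start_task(self, name):
        pass
    def operator_stop(self):
        self._stopped = True
    def is_stopped(self):
        return self._stopped
    def process(self, payload):
        if any(value is None for value in payload.values()):
            return "escalate_to_operator"
        return "ok"

if __name__ == "__main__":
    run_crash_tests(DummySystem(), [check_kill_switch, check_incomplete_input])
```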
Useful links
JRC TECHNICAL REPORT. AI Watch AI Uptake in Health and Healthcare, 2020
JRC TECHNICAL REPORT. AI Watch: Artificial Intelligence Standardization Landscape Update
Securing the Future of Artificial Intelligence and Machine Learning at Microsoft
Building Safe AI: Specifications, Reliability, and Guarantees
Defending Against AI with AI: AI-Enabled Solutions for Next-Gen Cyber Threats