Threat Analysis

Building Trustworthy AI: Contending with Data Poisoning

Aug 1, 2024 | Blog, Research

Executive Summary

As Artificial Intelligence (AI) and Machine Learning (ML) systems are adopted and integrated globally, the threat of data poisoning attacks remains a significant concern for developers and organizations deploying AI technologies. This paper explores the landscape of data poisoning attacks, their impacts, and the strategies being developed to mitigate this threat.

Key Findings

  • The field of AI security is rapidly evolving, with emerging threats and innovative defense mechanisms continually shaping the landscape of data poisoning and its countermeasures.
  • Data poisoning attacks can degrade AI/ML model performance, introduce biases, or create backdoors for later malicious exploitation.
  • There are diverse types of data poisoning attacks, ranging from mislabeling and data injection attacks to more sophisticated techniques like split-view poisoning and backdoor tampering.
  • Real-world examples, such as the attacks on Google’s Gmail spam filter and Microsoft’s Tay chatbot, demonstrate the practical risks and potential consequences of data poisoning.
  • Data poisoning attacks can have far-reaching impacts, affecting critical systems in healthcare, finance, autonomous vehicles, and other domains, potentially leading to significant economic and societal consequences.
  • Mitigation strategies against data poisoning range from robust data validation and sanitization techniques to advanced monitoring and detection systems, adversarial training, and secure data handling practices.

Introduction

AI and ML systems are rapidly being adopted across various sectors, from healthcare and finance to autonomous vehicles and social media. As these technologies continue to evolve, threat actors are adapting their techniques to exploit new vulnerabilities. One of these vulnerabilities is data poisoning.

Data poisoning occurs when a threat actor intentionally compromises the training dataset used by an AI or ML model in order to manipulate or degrade the model, or to introduce specific vulnerabilities for future exploitation (see source 1 in appendix). These attacks can cause AI systems to make incorrect decisions, exhibit bias, or fail completely. As organizations increasingly rely on AI/ML systems for critical decision-making processes, the threat of data poisoning attacks becomes more urgent.

Modern deep learning models are trained on massive datasets, often containing billions of samples automatically crawled from the internet (see source 2 in appendix). While this scale has enabled significant advancements in AI capabilities, it has also introduced new vulnerabilities. Poisoning even a minuscule fraction (as little as 0.001%) of these large, uncurated datasets can be sufficient to introduce targeted mistakes in a model’s behavior (see source 3 in appendix).
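For a sense of scale, the short sketch below works that percentage through against an assumed dataset size. The three-billion-sample figure is an illustrative assumption, not a number taken from the cited research.

```python
# Illustrative arithmetic only: how many samples 0.001% represents at web-crawl scale.
# The dataset size below is an assumption made for the sake of the example.
dataset_size = 3_000_000_000          # e.g., ~3 billion crawled samples (assumed)
poison_fraction = 0.001 / 100         # 0.001% expressed as a fraction
poisoned_samples = int(dataset_size * poison_fraction)
print(poisoned_samples)               # 30000 -- tens of thousands of poisoned samples
```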

As AI systems become more integrated into our daily lives and critical infrastructure, the potential impact of these attacks grows. As the industry shifts to smaller, more specialized models, this attack surface will only increase, and as training cycles shorten, it will become easier for threat actors to poison datasets. From compromising autonomous vehicle safety systems to manipulating financial algorithms, the consequences of successful data poisoning attacks can range from financial losses to threats to human life (see source 4 in appendix).
Evolution of Data Poisoning Attacks

As AI/ML systems have become more sophisticated and widely adopted, so too have the methods used to attack them. Early forms of data poisoning were relatively simple, often involving the injection of mislabeled data into training sets. However, as AI/ML models became more complex, threat actors developed more sophisticated, targeted, and harder-to-detect techniques. These may involve subtle manipulations of training data that cause specific misclassifications or introduce backdoors into models for future exploitation, all without noticeably disrupting the model's overall performance (see source 5 in appendix).

Types of Data Poisoning Attacks

Threat actors use a variety of methods to execute data poisoning attacks. We have captured the main types and examples below to highlight the complexity and diversity of threats facing AI/ML systems. Understanding these attack vectors is crucial for developing comprehensive defense strategies and ensuring the integrity and reliability of AI-driven decision-making processes.

Mislabeling Attack
Description: A threat actor deliberately mislabels portions of the AI model's training data set, leading the model to learn incorrect patterns and thus provide inaccurate results after deployment. This type of attack is particularly effective against supervised learning algorithms, where the model learns to map inputs to outputs based on labeled examples.
Example: A threat actor could mislabel numerous images of dogs as cats during the training phase and teach the AI system to mistakenly recognize dogs as cats after deployment.

Data Injection Attack
Description: Data injection attacks involve introducing entirely new, malicious data samples into training data sets. These injected samples are carefully crafted to bias the model's decision boundaries or create vulnerabilities that can be exploited later.
Example: A threat actor could insert carefully crafted malicious images into the training dataset of a computer vision system, causing it to misclassify military tanks as civilian vehicles.

Data Manipulation Attack
Description: Data manipulation involves altering data within a model's training set to cause the model to misclassify data or behave in a predefined malicious manner in response to specific inputs. This can include altering feature values or making subtle changes that are difficult for humans to detect but significantly impact the model's learning process.
Example: In a facial recognition system, a threat actor could slightly alter pixel values in images of a specific individual, causing the model to misidentify that person consistently.

Backdoor Attack
Description: Threat actors can plant a hidden vulnerability, known as a backdoor, in the training data or the ML algorithm itself. The backdoor is then triggered automatically when certain conditions are met. Backdoor attacks are particularly dangerous because an affected model will appear to behave normally after deployment.
Example: A threat actor could perform a backdoor attack by inserting a small, specific pattern into traffic sign images that causes an autonomous vehicle's vision system to misclassify stop signs as speed limit signs when the pattern is present.
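To make two of these vectors concrete, the following minimal sketch simulates a mislabeling (label-flipping) attack and a backdoor trigger against a toy image dataset. The dataset shapes, class numbers, poison fraction, and trigger patch are all illustrative assumptions; the sketch shows the mechanics of how poisoned samples enter a training set, not any specific real-world attack.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for an image classification training set (assumed shapes and labels).
images = rng.random((1000, 32, 32, 3))      # 1,000 RGB images, 32x32 pixels
labels = rng.integers(0, 10, size=1000)     # 10 classes

def flip_labels(labels, target_class, new_class, fraction, rng):
    """Mislabeling attack sketch: relabel a small fraction of one class as another."""
    poisoned = labels.copy()
    candidates = np.flatnonzero(poisoned == target_class)
    n_poison = max(1, int(len(candidates) * fraction))
    chosen = rng.choice(candidates, size=n_poison, replace=False)
    poisoned[chosen] = new_class
    return poisoned, chosen

def add_backdoor_trigger(images, labels, indices, target_class, patch_value=1.0):
    """Backdoor attack sketch: stamp a small pixel patch on chosen images and
    relabel them, so the trigger pattern becomes associated with target_class."""
    poisoned_images = images.copy()
    poisoned_labels = labels.copy()
    poisoned_images[indices, :3, :3, :] = patch_value   # 3x3 corner patch as the trigger
    poisoned_labels[indices] = target_class
    return poisoned_images, poisoned_labels

# Mislabeling: relabel 5% of class 3 (e.g., "dog") as class 5 (e.g., "cat").
labels_flipped, flipped_idx = flip_labels(labels, target_class=3, new_class=5,
                                          fraction=0.05, rng=rng)

# Backdoor: add a trigger patch to 10 random images and relabel them as class 7.
trigger_idx = rng.choice(len(images), size=10, replace=False)
images_bd, labels_bd = add_backdoor_trigger(images, labels, trigger_idx, target_class=7)

print(f"flipped {len(flipped_idx)} labels; planted trigger in {len(trigger_idx)} images")
```

Note that in the backdoor case the bulk of the training data is untouched, which is why an affected model can retain normal accuracy on clean inputs while still responding to the hidden trigger.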
The complete research report, including endnotes, is available from Nisos.

About Nisos®

Nisos is the Managed Intelligence Company. We are a trusted digital investigations partner, specializing in unmasking threats to protect people, organizations, and their digital ecosystems in the commercial and public sectors. Our open source intelligence services help security, intelligence, legal, and trust and safety teams make critical decisions, impose real world consequences, and increase adversary costs. For more information, visit: https://www.nisos.com.