Data Poisoning is a category of adversarial attack targeting the training phase of machine learning systems. By injecting malicious, mislabeled, or carefully crafted data into training datasets, attackers can manipulate model behavior, create hidden backdoors, reduce model accuracy, or introduce targeted biases.
Types of data poisoning attacks include: label flipping (changing the correct labels of training examples to induce misclassification), backdoor attacks (inserting trigger patterns so the model produces attacker-chosen behavior whenever the trigger appears in an input), clean-label attacks (adding correctly labeled but strategically chosen data that shifts decision boundaries), and gradient-based poisoning (optimizing poisoned samples to exert maximal influence on model training).
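The first two attack types can be illustrated with a minimal sketch on a toy dataset. The helper names (`flip_labels`, `add_backdoor_trigger`), the poisoning fractions, and the bright corner patch used as a trigger are illustrative assumptions, not a reference implementation of any specific attack.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: 1000 grayscale 8x8 "images" with binary labels.
X = rng.random((1000, 8, 8))
y = rng.integers(0, 2, size=1000)

def flip_labels(y, fraction=0.05, num_classes=2, rng=rng):
    """Label flipping: reassign a small fraction of labels to a wrong class."""
    y_poisoned = y.copy()
    idx = rng.choice(len(y), size=int(fraction * len(y)), replace=False)
    offsets = rng.integers(1, num_classes, size=len(idx))
    y_poisoned[idx] = (y_poisoned[idx] + offsets) % num_classes
    return y_poisoned, idx

def add_backdoor_trigger(X, y, fraction=0.02, target_label=1, rng=rng):
    """Backdoor poisoning: stamp a bright 2x2 corner patch on a few samples
    and relabel them so training associates the patch with target_label."""
    X_poisoned, y_poisoned = X.copy(), y.copy()
    idx = rng.choice(len(X), size=int(fraction * len(X)), replace=False)
    X_poisoned[idx, :2, :2] = 1.0    # the trigger pattern (hypothetical choice)
    y_poisoned[idx] = target_label   # attacker-chosen behavior
    return X_poisoned, y_poisoned, idx

y_flipped, flipped_idx = flip_labels(y)
X_bd, y_bd, bd_idx = add_backdoor_trigger(X, y)
print(f"flipped {len(flipped_idx)} labels, implanted trigger in {len(bd_idx)} samples")
```

At realistic scale the poisoned fraction is typically kept small so the samples blend into the dataset's overall statistics, which is why purely visual or manual review rarely catches them.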
Attack vectors for data poisoning include: compromising data collection pipelines, contributing poisoned data to crowdsourced datasets, manipulating web-scraped training data, insider threats in data labeling teams, supply chain attacks on pre-training datasets, and exploiting data augmentation or synthetic data generation processes.
Enterprise defense strategies include: data provenance tracking (maintaining a chain of custody for all training data), data validation and anomaly detection (identifying statistical outliers in training datasets), robust training methods (training algorithms designed to remain accurate when a fraction of the data is poisoned), data sanitization (filtering suspicious samples before training), model behavior testing (probing the trained model across diverse scenarios, including trigger-like inputs, to surface anomalous behavior), and supply chain security (vetting data sources and labeling providers). Data poisoning is particularly concerning as organizations increasingly rely on third-party and open-source training data.
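As a sketch of the anomaly-detection and sanitization steps, the snippet below flags samples that sit unusually far from their class centroid in feature space. The centroid-distance heuristic, the 3-sigma threshold, and the use of raw features rather than learned embeddings are simplifying assumptions; a production pipeline would combine several such signals rather than rely on this filter alone.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy feature matrix with labels; in practice these would be learned embeddings.
X = rng.normal(size=(1000, 16))
y = rng.integers(0, 2, size=1000)

def filter_outliers(X, y, sigma=3.0):
    """Flag samples whose distance to their class centroid exceeds
    mean + sigma * std of within-class distances."""
    keep = np.ones(len(X), dtype=bool)
    for label in np.unique(y):
        mask = y == label
        centroid = X[mask].mean(axis=0)
        dists = np.linalg.norm(X[mask] - centroid, axis=1)
        threshold = dists.mean() + sigma * dists.std()
        # Mark far-from-centroid samples of this class as suspicious.
        keep[np.where(mask)[0][dists > threshold]] = False
    return keep

keep = filter_outliers(X, y)
print(f"retained {keep.sum()} of {len(X)} samples; flagged {(~keep).sum()} as suspicious")
```

Note that distance-based filters of this kind are most effective against crude label flipping; clean-label and gradient-based poisons are designed to look statistically normal, which is why provenance tracking and post-training behavior testing remain necessary complements.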
