
AI Data Classification Guide

A practical guide defining what data can and cannot be used with AI tools. 4-level classification system with definitions, examples, and explicit AI usage rules for each level.

Updated March 2026 · 4 classification levels · GDPR, ISO 27001, PCI DSS aligned

4 levels
clear classification tiers
55%
of AI incidents involve data leakage
GDPR
Article 5 & 25 aligned
Free
to use and customise

Why AI Tools Need Explicit Data Classification Rules

Employees cannot make good data handling decisions with AI tools if they don't know what data is allowed where. Most AI-related data incidents are not the result of malicious behaviour — they are the result of employees not knowing that the data they are pasting into an AI tool is sensitive, or not understanding which AI tools are approved for which data types. A clear data classification guide is the foundation of enforceable AI governance.

55%
of AI-related data incidents involve unintentional data leakage
Most AI data exposure is not malicious — employees simply don't know they are submitting sensitive data to an AI tool that processes or stores it externally.
89%
of employees cannot correctly classify data under existing policies
Data classification policies that are too abstract or use unclear language lead to widespread misclassification and inconsistent data handling behaviour.
3.4x
higher GDPR fine risk from AI data handling gaps
Organisations that cannot demonstrate data minimisation and lawful basis for AI processing activities face significantly higher regulatory exposure under GDPR enforcement.
72%
of organisations have no AI-specific data handling rules
Most data classification policies predate widespread AI adoption and contain no guidance specific to AI tool usage — creating a significant governance gap.

The Data Classification Guide

Each classification level below includes a definition, examples, and explicit AI usage rules. Customise the examples for your organisation's specific data types and systems.

PUBLIC

Information that is intentionally made available to the public or that would cause no harm if disclosed. This is the only classification level that can be freely used with any AI tool without additional controls.

Examples of Level 1 — Public Data

Published marketing materials, press releases, and website content
Public product documentation and user guides
Published financial results and annual reports
Public job postings and career pages
Open-source code repositories and published research
Industry statistics and publicly available market data

AI Tool Rules — Level 1

PERMITTED — Any approved or unapproved AI tool, for processing Public data only
PERMITTED — Uploading public documents, reports, and web content to AI tools
PERMITTED — Using AI to generate or edit content based solely on Public information

Note: Even with Public data, do not submit information that is not yet publicly released (upcoming announcements, embargoed content) — classify embargoed content as Internal or above until the embargo lifts.

How to Implement Data Classification for AI

A data classification guide only reduces risk if employees understand it and technical controls enforce it. Follow these steps to implement classification effectively.

1
Map your existing classification scheme to the 4-level framework
Most organisations have some existing data classification, even if informal. Map your current categories to the four levels in this guide. If you have no existing classification, use this framework as your starting point and get legal/compliance sign-off before publishing.
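One way to make the mapping operational is a simple lookup from legacy labels to the four-level framework. This is a minimal sketch with hypothetical legacy label names — substitute the categories from your own existing scheme. Note the fail-closed default: an unknown label maps to the most restrictive tier.

```python
# Hypothetical mapping from an organisation's legacy classification labels
# to the 4-level framework. All legacy label names here are illustrative.
LEVEL_MAP = {
    "public": 1,
    "external": 1,        # assumed legacy equivalent of Public
    "internal": 2,
    "company-only": 2,
    "confidential": 3,
    "client-sensitive": 3,
    "restricted": 4,
    "secret": 4,
}

def classification_level(legacy_label: str) -> int:
    """Return the 4-level tier for a legacy label.

    Unknown labels default to level 4 (most restrictive) so that
    unclassified data fails closed rather than open.
    """
    return LEVEL_MAP.get(legacy_label.strip().lower(), 4)
```

The fail-closed default matters: during a migration there will always be labels the mapping does not cover, and treating them as Public by accident is exactly the gap this guide exists to close.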
2
Replace generic examples with organisation-specific data types
Generic examples (like 'financial data') are harder for employees to apply than specific ones (like 'Salesforce customer records' or 'Oracle Finance quarterly forecasts'). Spend time creating data type examples from your actual systems — compliance rates increase significantly with specific examples.
3
Define the approved AI tool list for each classification level
For each classification level, publish the list of specifically approved AI tools. Do not leave employees to interpret 'approved tools' — name the products, versions, and any tier-specific requirements (e.g. 'Microsoft Copilot for M365 E5 — Internal data only, not Confidential').
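An approved-tool register can be expressed as data rather than prose, which makes it easy to publish, review, and enforce programmatically. This sketch uses hypothetical tool names and tiers purely as examples, following the pattern described above.

```python
# Hypothetical approved-tool register keyed by classification level.
# Tool names and tier assignments are illustrative examples only.
APPROVED_TOOLS = {
    1: {"*"},  # Public: any approved or unapproved tool
    2: {"Microsoft Copilot for M365 E5", "ChatGPT Enterprise"},
    3: {"Microsoft Copilot for M365 E5"},
    4: set(),  # Restricted: no AI tools permitted
}

def is_tool_permitted(tool: str, level: int) -> bool:
    """Check whether a named tool is approved for a classification level.

    Unknown levels return an empty allow-list, so the check fails closed.
    """
    allowed = APPROVED_TOOLS.get(level, set())
    return "*" in allowed or tool in allowed
```

Keeping the register as structured data also means the same source of truth can drive both the published policy page and the technical controls in step 5.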
4
Deliver training using classification decision scenarios
Training on data classification is most effective through realistic scenarios. Present employees with 8–10 examples of data types they encounter in their role and ask them to classify each. Include edge cases — a mix of emails, documents, datasets, and verbal information. Discuss the reasoning, not just the answer.
5
Implement technical controls to enforce classification at the AI layer
Classification without enforcement is a guideline, not a control. Deploy DLP rules that detect sensitive data patterns (PII, card numbers, health identifiers) being submitted to AI services, and use an AI security platform that monitors all AI tool usage against your classification policy — not just at the network perimeter.
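The pattern-detection layer of such a control can be sketched as a pre-submission check on prompts. This is a deliberately simplified illustration: the regexes below are examples, and production DLP uses validated detectors (for instance, Luhn checksums for card numbers and context-aware matching) rather than bare patterns.

```python
import re

# Hypothetical, simplified DLP patterns for a pre-submission prompt check.
# Real DLP engines use validated detectors, not bare regexes like these.
PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "card_number": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "uk_nino": re.compile(r"\b[A-CEGHJ-PR-TW-Z]{2}\d{6}[A-D]\b"),
}

def detect_sensitive(prompt: str) -> list[str]:
    """Return the names of sensitive-data patterns found in an AI prompt.

    A non-empty result means the submission should be blocked or escalated
    according to the data's classification level.
    """
    return [name for name, rx in PATTERNS.items() if rx.search(prompt)]
```

A check like this only covers known patterns; that is why the step above pairs DLP rules with a monitoring platform that evaluates all AI usage against the classification policy, not just pattern matches at the perimeter.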


Enforce Data Classification Automatically Across All AI Tools

A classification policy requires technical enforcement to be effective. Aona detects when employees submit Confidential or Restricted data to AI tools, blocks prohibited interactions in real time, and provides the visibility to know whether your data classification rules are actually working in practice.

Book a Demo