
Pseudonymization

Pseudonymization is the practice of replacing identifying information (names, emails, phone numbers) with placeholders before processing data, then restoring the original values afterward. This page is the practical how-to.

The pipeline

A typical pseudonymized automation has four parts:

The external AI step never sees the real names, emails, or phone numbers. The mapping between placeholders and original values lives in memory and is consumed by the deanonymizer at the end.

The two building blocks are documented in LLM Data Anonymizer and LLM Data Deanonymizer. This page describes how to use them well.

When to pseudonymize

Pseudonymize when:

  • The input contains identifying information (names, emails, phone numbers, addresses).
  • The processing step uses an external AI provider.
  • The output is consumed in a context where the real values must be present.

If any of the three is false, you may not need pseudonymization. For example:

  • Self-hosted models do not require it (data stays on your premises).
  • Aggregations and summaries that do not need to reference individuals can drop identifying values entirely.
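The three conditions above can be read as a simple conjunction. The following is only an illustrative sketch; the function name and its parameters are hypothetical and not part of the platform.

```python
def needs_pseudonymization(has_identifiers: bool,
                           uses_external_ai: bool,
                           output_needs_real_values: bool) -> bool:
    """Return True only when all three conditions from the list hold."""
    return has_identifiers and uses_external_ai and output_needs_real_values


# Support ticket answered by an external model, reply addressed to the customer.
ticket_reply = needs_pseudonymization(True, True, True)

# Same ticket, but processed by a self-hosted model.
self_hosted = needs_pseudonymization(True, False, True)

print(ticket_reply, self_hosted)
```

When the third condition is the only false one (for example an aggregate report), dropping the identifying values entirely is usually the simpler option.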

Building the pipeline

A small, complete workflow:

  1. Trigger — incoming support ticket via webhook.
  2. LLM Data Anonymizer — input: the ticket text from the trigger. Output: anonymized_text. Defaults are usually enough (names, emails, phones, addresses). Add a custom pattern description for any domain-specific terms ("Replace any internal project codename with a placeholder").
  3. LLM Prompt — input: anonymized_text. Use an external model to classify, summarize, or draft a response.
  4. LLM Data Deanonymizer — input: the response from step 3. Output: restored_text.
  5. HTTP Request — send restored_text back to the ticketing system, addressing the customer by their real name.

📷 SCREENSHOT: A four-step workflow with the anonymizer, an LLM prompt, the deanonymizer, and a final HTTP request, with the memory-input/output names visible.
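To make the data flow concrete, here is a minimal Python sketch of the same spine. It is not the platform's implementation: the real anonymizer uses an AI to find identifying values, whereas this stand-in only masks email addresses with a regular expression. The function names, the `[EMAIL_n]` placeholder format, and the stubbed model call are all assumptions made for illustration.

```python
import re

# Simplified email pattern, good enough for a demonstration only.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def anonymize(text: str, mapping: dict) -> str:
    """Replace each email with a placeholder and record it in `mapping`."""
    def _swap(match):
        original = match.group(0)
        # Reuse the placeholder if this value was seen before (stable placeholders).
        for placeholder, value in mapping.items():
            if value == original:
                return placeholder
        placeholder = f"[EMAIL_{len(mapping) + 1}]"
        mapping[placeholder] = original
        return placeholder

    return EMAIL_RE.sub(_swap, text)


def external_model(prompt: str) -> str:
    """Stand-in for the LLM Prompt step; it only ever sees anonymized text."""
    return "Draft reply regarding: " + prompt


def deanonymize(text: str, mapping: dict) -> str:
    """Put the original values back in place of the placeholders."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text


ticket = "Customer ada@example.com cannot log in. Reply to ada@example.com"
mapping = {}  # lives in memory for this run only

anonymized_text = anonymize(ticket, mapping)
response = external_model(anonymized_text)
restored_text = deanonymize(response, mapping)

print(anonymized_text)
print(restored_text)
```

The same address maps to the same placeholder both times, which is what lets the model reason about "the same customer" without seeing who it is.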

What gets anonymized by default

| Category | Default | Notes |
|---|---|---|
| Names | On | Personal first/last names. |
| Emails | On | Anywhere in the text. |
| Phone numbers | On | International formats included. |
| Postal addresses | On | Street + city + postcode patterns. |
| Locations | Off | Cities, countries. Turn on only if even places are sensitive. |
| Social handles | Off | @-mentions and social links. |
| Custom patterns | Empty | Define per-automation. |

You can disable categories you do not want anonymized. For example, if a workflow specifically requires the location to remain intact for the AI to make a decision, keep locations off.
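One way to think about the defaults is as a set of switches with per-automation overrides. The sketch below mirrors the default categories in plain Python; the key names and the `resolve_categories` helper are hypothetical and do not reflect the anonymizer's actual configuration format.

```python
# Hypothetical keys mirroring the documented defaults.
DEFAULT_CATEGORIES = {
    "names": True,
    "emails": True,
    "phone_numbers": True,
    "postal_addresses": True,
    "locations": False,
    "social_handles": False,
}


def resolve_categories(overrides=None):
    """Merge per-automation overrides into the defaults, rejecting unknown keys."""
    overrides = overrides or {}
    unknown = set(overrides) - set(DEFAULT_CATEGORIES)
    if unknown:
        raise ValueError(f"Unknown categories: {sorted(unknown)}")
    return {**DEFAULT_CATEGORIES, **overrides}


# A workflow where even place names are sensitive, but phone numbers are not.
settings = resolve_categories({"locations": True, "phone_numbers": False})
print(settings)
```

Rejecting unknown keys catches typos such as `"location"` that would otherwise silently leave a category at its default.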

Custom patterns

The custom pattern description is a free-form sentence the anonymizer hands to the AI. Examples:

  • "Replace any internal project codename like 'Project Aurora' with a placeholder."
  • "Mask any case-file reference of the form CF- followed by digits."
  • "Mask all monetary amounts above 10 000 €."

The AI does the matching. Be specific in the description; the more precise the wording, the more reliable the masking.
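Because the matching is done by an AI, it can help to verify a custom pattern with a deterministic check during testing. The sketch below scans anonymized text for case-file references that slipped through. The regular expression is one interpretation of "CF- followed by digits", and the helper name and the `[CASE_1]` placeholder are assumptions.

```python
import re

# "CF-" followed by one or more digits, as in the second example description.
CASE_FILE_RE = re.compile(r"\bCF-\d+\b")


def leaked_case_files(anonymized_text: str) -> list:
    """Return every case-file reference still visible after anonymization."""
    return CASE_FILE_RE.findall(anonymized_text)


sample = "See [CASE_1] for background and CF-20331 for the appeal."
leaks = leaked_case_files(sample)
print(leaks)
```

An empty list means no reference of that form was left unmasked in the sample. It does not prove the description works for every input, so run it over a realistic batch of test data.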

Selective deanonymization

The deanonymizer accepts a "restrict to types" parameter. Use it when you want some placeholders to remain in the final output:

  • Restore names but keep email addresses masked when the result is shown to a support agent.
  • Restore postal addresses but keep phone numbers masked in a public-facing response.
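The effect of restricting the restored types can be sketched as follows. This is an illustration of the idea, not the deanonymizer's code: the function name, the type labels, and the layout of the mapping (placeholder to a pair of type and original value) are assumptions.

```python
def deanonymize_selected(text, mapping, restrict_to_types=None):
    """Restore only placeholders whose type is listed; None restores everything."""
    for placeholder, (kind, original) in mapping.items():
        if restrict_to_types is None or kind in restrict_to_types:
            text = text.replace(placeholder, original)
    return text


# Hypothetical mapping: placeholder -> (type, original value).
mapping = {
    "[NAME_1]": ("name", "Ada Lovelace"),
    "[EMAIL_1]": ("email", "ada@example.com"),
}

draft = "Hello [NAME_1], we wrote to [EMAIL_1]."

# View for a support agent: names restored, email addresses stay masked.
agent_view = deanonymize_selected(draft, mapping, restrict_to_types={"name"})
full_view = deanonymize_selected(draft, mapping)

print(agent_view)
print(full_view)
```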

Pitfalls and how to avoid them

The mapping does not survive the workflow

The anonymizer stores the placeholder-to-original mapping in the workflow's memory for the current run only. If the deanonymizer runs in a different run (different job), the mapping is gone and placeholders pass through unchanged. Always keep anonymizer and deanonymizer in the same run.
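The following sketch models why a second run cannot restore anything. `RunMemory` is a hypothetical stand-in for the per-run workflow memory, and the placeholder format is assumed.

```python
class RunMemory:
    """Hypothetical per-run store; every new run starts with an empty mapping."""

    def __init__(self):
        self.mapping = {}


def restore(text, memory):
    """Replace each known placeholder; unknown ones are left untouched."""
    for placeholder, original in memory.mapping.items():
        text = text.replace(placeholder, original)
    return text


run_1 = RunMemory()
run_1.mapping["[NAME_1]"] = "Ada Lovelace"  # written by the anonymizer in run 1

run_2 = RunMemory()  # a different job: nothing carried over

same_run = restore("Dear [NAME_1],", run_1)
other_run = restore("Dear [NAME_1],", run_2)

print(same_run)
print(other_run)
```

In the second case no error is raised; the placeholder simply ends up in the final output, which is why this mistake is easy to miss.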

Multiple AI calls between anonymize and deanonymize

Multiple external-AI calls between the two endpoints are fine — the placeholders are stable, the mapping stays in memory until the deanonymizer consumes it. Just make sure each external call only ever reads the anonymized text.

Conditional skipping

If an LLM Prompt step is wrapped in a condition and gets skipped, the workflow still holds the mapping, and the deanonymizer at the end does no harm: text that contains no placeholders passes through unchanged. This makes it safe to anonymize defensively, even when only some branches actually call an external AI.

Partial recovery

The AI sometimes invents new variations of names ("Mr X" becomes "Mr. X" or "X, Mr"). The deanonymizer is tolerant but not infallible, so spot-check the restored output.
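A spot-check can be partly automated by scanning the restored text for anything that still looks like a placeholder. The sketch below assumes placeholders of the form `[TYPE_n]` and also flags loosely mangled variants such as `Phone 2`. The actual placeholder format may differ, so adjust the pattern accordingly.

```python
import re

# Assumed placeholder shape: optional brackets, a type word, optional separator, digits.
PLACEHOLDER_RE = re.compile(
    r"\[?\b(?:NAME|EMAIL|PHONE|ADDRESS)[ _-]?\d+\b\]?",
    re.IGNORECASE,
)


def find_unrestored(text: str) -> list:
    """Return tokens in the final output that still look like placeholders."""
    return PLACEHOLDER_RE.findall(text)


restored = "Dear Ada, we emailed [EMAIL_1] and called Phone 2."
leftovers = find_unrestored(restored)
print(leftovers)
```

The loose pattern can produce false positives on ordinary phrases, which is acceptable for a review aid: anything it flags deserves a human look before the text is sent.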

Recommendations

  • ✅ Default on: any workflow that sends customer data to an external AI should have anonymize → external AI → deanonymize as the spine.
  • ✅ Use custom patterns liberally. Industry-specific terms (case files, project codes, account numbers) are not in the defaults.
  • ✅ Test pseudonymized workflows with realistic data. The anonymizer is statistical; you want to see what it actually does.
  • ✅ Combine pseudonymization with a self-hosted fallback for highly sensitive records. Belt and braces.
  • ⚠️ Locations are off by default. Turn them on only if a location-only leak is a problem in your context.
  • ❌ Do not rely on pseudonymization to satisfy regulations that demand "no transmission of personal data". The provider's TOS, your contract, and your DPO have the final word here.
  • ❌ Do not store the placeholder-to-original mapping outside the platform. Treat it as transient.

What to do next