When building chat applications, it’s crucial to ensure the secure handling of sensitive data, especially Personal Identifiable Information (PII). PII can be directly or indirectly linked to an individual, making it essential to protect user privacy by preventing the transmission of such data to language models.
Example of PII
Consider the text below, where PII has been highlighted:
Hello, my name is John and I live in New York.
My credit card number is 3782-8224-6310-005 and my phone number is (212) 688-5500.
And here is the anonymized version:
Hello, my name is <PERSON> and I live in <LOCATION>. My credit card number is <CREDIT_CARD> and my phone number is <PHONE_NUMBER>.
Analyze and anonymize data
Integrate Microsoft Presidio for robust data sanitization in your Chainlit application.
import chainlit as cl
@cl.on_message
async def main(message: cl.Message):
response = await cl.Message(
content=f"Received: {message.content}",
).send()
Before proceeding, ensure that the Python packages required for PII analysis and anonymization are installed. Run the following commands in your terminal to install them:
pip install presidio-analyzer presidio-anonymizer spacy
python -m spacy download en_core_web_lg
Create an async context manager that utilizes the Presidio Analyzer to inspect the incoming text for any PII. This context manager can be included in your main function to scrutinize messages before they are processed.
When PII is detected, you should present the user with the option to either continue or cancel the operation. Use Chainlit’s messaging system to accomplish this.
from presidio_analyzer import AnalyzerEngine
from contextlib import asynccontextmanager
analyzer = AnalyzerEngine()
@asynccontextmanager
async def check_text(text: str):
pii_results = analyzer.analyze(text=text, language="en")
if pii_results:
response = await cl.AskActionMessage(
content="PII detected",
actions=[
cl.Action(name="continue", value="continue", label="✅ Continue"),
cl.Action(name="cancel", value="cancel", label="❌ Cancel"),
],
).send()
if response is None or response.get("value") == "cancel":
raise InterruptedError
yield
@cl.on_message
async def main(message: cl.Message):
async with check_text(message.content):
response = await cl.Message(
content=f"Received: {message.content}",
).send()
If your application has a requirement to anonymize PII, Presidio can also do that. Modify the check_text context manager to return anonymized text when PII is detected.
from presidio_anonymizer import AnonymizerEngine
anonymizer = AnonymizerEngine()
@asynccontextmanager
async def check_text(text: str):
pii_results = analyzer.analyze(text=text, language="en")
if pii_results:
response = await cl.AskActionMessage(
content="PII detected",
actions=[
cl.Action(name="continue", value="continue", label="✅ Continue"),
cl.Action(name="cancel", value="cancel", label="❌ Cancel"),
],
).send()
if response is None or response.get("value") == "cancel":
raise InterruptedError
yield anonymizer.anonymize(
text=text,
analyzer_results=pii_results,
).text
else:
yield text
@cl.on_message
async def main(message: cl.Message):
async with check_text(message.content) as anonymized_message:
response = await llm_chain.arun(
anonymized_message
callbacks=[cl.AsyncLangchainCallbackHandler()]
)