New Blog | Architecting secure Generative AI: Safeguarding against indirect prompt injection
By Roee Oz
As developers, we must be vigilant about how attackers could misuse our applications. While maximizing the capabilities of Generative AI (Gen-AI) is desirable, it’s essential to balance this with security measures to prevent abuse.
In a previous blog post – https://techcommunity.microsoft.com/t5/security-compliance-and-identity/best-practices-to-architect-…, I covered how a Gen AI application should use user identities for accessing sensitive data and performing sensitive operations. This practice reduces the risk of jailbreak and prompt injections, as malicious users cannot gain access to resources they don’t already have.
However, what if an attacker manages to run a prompt under the identity of a valid user? An attacker can hide a prompt in an incoming document or email, and if a non-suspecting user uses a Gen-AI LLM application to summarize the document or reply to the email, the attacker’s prompt may be executed on behalf of the end user. This is called indirect prompt injection. This blog focuses on how to reduce its risks.
Definitions
Prompt Injection Vulnerability occurs when an attacker manipulates a large language model (LLM) through crafted inputs, causing the LLM to unknowingly execute the attacker’s intentions. This can be done directly by “jailbreaking” the system prompt or indirectly through manipulated external inputs, potentially leading to data exfiltration, social engineering, and other issues.
Direct Prompt Injections, also known as “jailbreaking,” occur when a malicious user overwrites or reveals the underlying system prompt. This allows attackers to exploit backend systems by interacting with insecure functions and data stores accessible through the LLM.
Indirect Prompt Injections occur when an LLM accepts input from external sources that can be controlled by an attacker, such as websites or files. The attacker may embed a prompt injection in the external content, hijacking the conversation context. This can lead to unstable LLM output, allowing the attacker to manipulate the user or additional systems that the LLM can access. Additionally, indirect prompt injections do not need to be human-visible/readable, as long as the text is parsed by the LLM.
Real-life examples
Indirect prompt injection occurs when an attacker injects instructions into LLM inputs by hiding them within the content the LLM is asked to analyze, thereby hijacking the LLM to perform the attacker’s instructions. For example, consider hidden text in resumes.
As more companies use LLMs to screen resumes, some websites now offer to add invisible text to your resume, causing the screening LLM to favor your CV.
I have simulated such a jailbreak by first uploading a CV for a fresh graduate into Microsoft Copilot and asking if it qualifies for a “Software Engineer 2” role, which requires 3+ years of experience. You can see that Bing correctly rejects it.
Read the full post here: Architecting secure Generative AI applications: Safeguarding against indirect prompt injection
By Roee Oz
As developers, we must be vigilant about how attackers could misuse our applications. While maximizing the capabilities of Generative AI (Gen-AI) is desirable, it’s essential to balance this with security measures to prevent abuse.
In a previous blog post – https://techcommunity.microsoft.com/t5/security-compliance-and-identity/best-practices-to-architect-…, I covered how a Gen AI application should use user identities for accessing sensitive data and performing sensitive operations. This practice reduces the risk of jailbreak and prompt injections, as malicious users cannot gain access to resources they don’t already have.
However, what if an attacker manages to run a prompt under the identity of a valid user? An attacker can hide a prompt in an incoming document or email, and if a non-suspecting user uses a Gen-AI LLM application to summarize the document or reply to the email, the attacker’s prompt may be executed on behalf of the end user. This is called indirect prompt injection. This blog focuses on how to reduce its risks.
Definitions
Prompt Injection Vulnerability occurs when an attacker manipulates a large language model (LLM) through crafted inputs, causing the LLM to unknowingly execute the attacker’s intentions. This can be done directly by “jailbreaking” the system prompt or indirectly through manipulated external inputs, potentially leading to data exfiltration, social engineering, and other issues.
Direct Prompt Injections, also known as “jailbreaking,” occur when a malicious user overwrites or reveals the underlying system prompt. This allows attackers to exploit backend systems by interacting with insecure functions and data stores accessible through the LLM.
Indirect Prompt Injections occur when an LLM accepts input from external sources that can be controlled by an attacker, such as websites or files. The attacker may embed a prompt injection in the external content, hijacking the conversation context. This can lead to unstable LLM output, allowing the attacker to manipulate the user or additional systems that the LLM can access. Additionally, indirect prompt injections do not need to be human-visible/readable, as long as the text is parsed by the LLM.
Real-life examples
Indirect prompt injection occurs when an attacker injects instructions into LLM inputs by hiding them within the content the LLM is asked to analyze, thereby hijacking the LLM to perform the attacker’s instructions. For example, consider hidden text in resumes.
As more companies use LLMs to screen resumes, some websites now offer to add invisible text to your resume, causing the screening LLM to favor your CV.
I have simulated such a jailbreak by first uploading a CV for a fresh graduate into Microsoft Copilot and asking if it qualifies for a “Software Engineer 2” role, which requires 3+ years of experience. You can see that Bing correctly rejects it.
Figure 1: Example prompting for a CV
Read the full post here: Architecting secure Generative AI applications: Safeguarding against indirect prompt injection