New Blog | Architecting secure Gen AI applications: Preventing Indirect Prompt Injection Attacks

By Roee Oz

As developers, we must be vigilant about how attackers could misuse our applications. While maximizing the capabilities of Generative AI (Gen-AI) is desirable, it’s essential to balance this with security measures to prevent abuse.

In a recent blog post, we discussed how a Gen AI application should use user identities for accessing sensitive data and performing sensitive operations. This practice reduces the risk of jailbreak and prompt injections, preventing malicious users from gaining access to resources they don’t have permissions to.

However, what if an attacker manages to run a prompt under the identity of a valid user? An attacker can hide a prompt in an incoming document or email, and if a non-suspecting user uses a Gen-AI large language model (LLM) application to summarize the document or reply to the email, the attacker’s prompt may be executed on behalf of the end user. This is called indirect prompt injection. Let’s start with some definitions:

Prompt injection vulnerability occurs when an attacker manipulates a large language model (LLM) through crafted inputs, causing the LLM to unknowingly execute the attacker’s intentions. This can be done directly by “jailbreaking” the system prompt or indirectly through manipulated external inputs, potentially leading to data exfiltration, social engineering, and other issues.

Direct prompt injections, also known as “jailbreaking,” occur when a malicious user overwrites or reveals the underlying system prompt. This allows attackers to exploit backend systems by interacting with insecure functions and data stores accessible through the LLM.
Indirect Prompt Injections occur when an LLM accepts input from external sources that can be controlled by an attacker, such as websites or files. The attacker may embed a prompt injection in the external content, hijacking the conversation context. This can lead to unstable LLM output, allowing the attacker to manipulate the LLM or additional systems that the LLM can access. Also, indirect prompt injections do not need to be human-visible/readable, if the text is parsed by the LLM.

Read the full post here: Architecting secure Gen AI applications: Preventing Indirect Prompt Injection Attacks

By Roee Oz

As developers, we must be vigilant about how attackers could misuse our applications. While maximizing the capabilities of Generative AI (Gen-AI) is desirable, it’s essential to balance this with security measures to prevent abuse.

In a recent blog post, we discussed how a Gen AI application should use user identities for accessing sensitive data and performing sensitive operations. This practice reduces the risk of jailbreak and prompt injections, preventing malicious users from gaining access to resources they don’t have permissions to.

However, what if an attacker manages to run a prompt under the identity of a valid user? An attacker can hide a prompt in an incoming document or email, and if a non-suspecting user uses a Gen-AI large language model (LLM) application to summarize the document or reply to the email, the attacker’s prompt may be executed on behalf of the end user. This is called indirect prompt injection. Let’s start with some definitions:

Prompt injection vulnerability occurs when an attacker manipulates a large language model (LLM) through crafted inputs, causing the LLM to unknowingly execute the attacker’s intentions. This can be done directly by “jailbreaking” the system prompt or indirectly through manipulated external inputs, potentially leading to data exfiltration, social engineering, and other issues.

Direct prompt injections, also known as “jailbreaking,” occur when a malicious user overwrites or reveals the underlying system prompt. This allows attackers to exploit backend systems by interacting with insecure functions and data stores accessible through the LLM.
Indirect Prompt Injections occur when an LLM accepts input from external sources that can be controlled by an attacker, such as websites or files. The attacker may embed a prompt injection in the external content, hijacking the conversation context. This can lead to unstable LLM output, allowing the attacker to manipulate the LLM or additional systems that the LLM can access. Also, indirect prompt injections do not need to be human-visible/readable, if the text is parsed by the LLM.

Read the full post here: Architecting secure Gen AI applications: Preventing Indirect Prompt Injection Attacks Read More

Cart

Cart