Executive Summary

This article presents a proof of concept (PoC) that demonstrates how adversaries can use indirect prompt injection to silently poison the long-term memory of an AI Agent. We use Amazon Bedrock Agent for this demonstration. In this scenario, if agent memory is enabled, an attacker can insert malicious instructions into an agent's memory via prompt injection. This can occur when a victim user is tricked into accessing a malicious webpage or document via social engineering.

In our proof of concept, the content of the webpage manipulates the agent’s session summarization process, causing the injected instructions to be stored in memory. Once planted, these instructions persist across sessions, and they are incorporated into the agent’s orchestration prompts. This ultimately allows the agent to silently exfiltrate a user’s conversation history in future interactions.

Importantly, this is not a vulnerability in the Amazon Bedrock platform. Rather, it underscores a broader, unsolved security challenge in the large language model (LLM) — prompt injection, in the context of the use of agents.

LLMs are designed to follow natural language instructions, but they cannot reliably distinguish between benign and malicious input. As a result, when untrusted content (i.e., webpages, documents or user input) is incorporated into system prompts, these models can become susceptible to adversarial manipulation. This puts applications relying on LLMs, like agents (and by extension, their memory), at risk of prompt attacks.

While no complete solution currently exists for eliminating prompt injection, practical mitigation strategies can significantly reduce risk. Developers should treat all untrusted input as potentially adversarial, including content from websites, documents, APIs or users.

Solutions like Amazon Bedrock Guardrails and Prisma AIRS can help detect and block prompt attacks in real time. However, comprehensive protection for AI agents requires a layered defense strategy that includes:

  • Content filtering
  • Access control
  • Logging
  • Continuous monitoring

We reviewed this research with Amazon prior to publication. Representatives from Amazon welcomed our research but emphasized that, in their view, these concerns are easy to mitigate by enabling Bedrock platform features designed to reduce such risks. Specifically, they pointed out that applying Amazon Bedrock Guardrails with the prompt-attack policy provides effective protection.

Prisma AIRS is designed to provide layered, real-time protection for AI systems by detecting and blocking threats, preventing data leakage and enforcing secure usage policies across a variety of AI applications.

URL filtering solutions like Advanced URL Filtering can validate links against known threat intelligence feeds and block access to malicious or suspicious domains. This prevents attacker-controlled payloads from reaching the LLM in the first place.

AI Access Security is designed for visibility and control over usage of third-party GenAI tools, helping prevent sensitive data exposures, unsafe use of risky models and harmful outputs through policy enforcement and user activity monitoring.

Cortex Cloud is designed to provide automatic scanning and classification of AI assets, both commercial and self-managed models, to detect sensitive data and evaluate security posture. Context is determined by AI type, hosting cloud environment, risk status, posture and datasets.

A Unit 42 AI Security Assessment can help you proactively identify the threats most likely to target your AI environment.

If you think you might have been compromised or have an urgent matter, contact the Unit 42 Incident Response team.

Related Unit 42 Topics Indirect Prompt Injection, GenAI, Memory Corruption

Bedrock Agents Memory

Generative AI (GenAI) applications increasingly rely on memory features to deliver personalized and coherent experiences. Unlike previous LLMs, which are stateless and process each conversation session in isolation, storing information in memory enables agents to retain context across sessions.

Amazon Bedrock Agents Memory enables AI agents to retain information across user interactions. When this feature is enabled, the agent stores summarized conversation and action under a unique memory ID, typically scoped per user. This allows the agent to recall prior context, preferences and task progress, eliminating the need for users to repeat themselves in future sessions.

Internally, Bedrock Agents use a session summarization process driven by LLMs. At the end of each session, whether it was explicitly closed or automatically timed out, the agent invokes an LLM using a configurable prompt template. This prompt instructs the model to extract and summarize key information such as user goals, stated preferences and agent actions. The resulting summary encapsulates the core context of the interaction.

In subsequent sessions, Bedrock Agents then inject this summary into the orchestration prompt template, becoming part of the agent's system instructions in subsequent sessions. In effect, the agent's memory influences how the agent reasons, plans and responds. This allows the agent’s behavior to evolve based on accumulated context.

Developers can configure memory retention for up to 365 days and customize the summarization pipeline by modifying the prompt template. This enables fine-grained control over what information is extracted, how it is structured and what is ultimately stored. These features provide a mechanism by which developers can add additional capabilities and defense in depth features to their agentic applications.

Indirect Prompt Injection

Prompt injection is a security risk in LLMs where a user crafts input containing deceptive instructions to manipulate the model’s behavior, which can lead to unauthorized data access or unintended actions.

Indirect prompt injection is a related attack vector in which malicious instructions are embedded in external content (i.e., emails, webpages, documents or metadata) that the model later ingests and processes. Unlike direct prompt injection, this method exploits the model’s integration with external data sources, causing it to interpret embedded instructions as legitimate input without direct user interaction.

PoC: Memory Manipulation via Indirect Prompt Injection

As a PoC for an agent memory manipulation attack, we created a simple travel assistant chatbot using Amazon Bedrock Agents. The bot was capable of booking, retrieving and canceling trips, as well as reading external websites. We enabled the memory feature, with each user assigned an isolated memory scope to ensure that any compromise affected only the targeted user.

We built the agent using the default AWS-managed orchestration and session summarization prompt templates without customization (see Additional Resources). Our bot's agent leveraged the Amazon Nova Premier v1 foundation model. We did not enable the Bedrock Guardrails, reflecting a minimally protected configuration for this PoC.

Attack Scenario

In our fictional scenario, the victim is a legitimate user of the chatbot, while an attacker operates externally and has no direct access to the system. Through social engineering, the attacker persuades the victim to submit a malicious URL to the chatbot. When the chatbot fetches this URL, it retrieves a webpage containing embedded prompt injection payloads.

These payloads manipulate the session summarization prompt, causing the LLM to include malicious instructions in its summary output.

This PoC uses the following steps:

  1. An attacker creates a webpage embedded with prompt injection payloads
  2. The attacker sends the malicious URL to the victim
  3. The victim provides the URL to the chatbot
  4. The chatbot retrieves the content of the malicious webpage
  5. The prompt injection payload manipulates the session summarization process, inserting malicious instructions into the agent’s memory
  6. In subsequent conversation sessions, Bedrock Agents incorporate these instructions into the orchestration prompts
  7. Acting on the injected instructions, the chatbot silently exfiltrates the user’s conversation history to a remote command-and-control (C2) server using the web access tool.

Figure 1 illustrates this attack flow.

Diagram illustrating a cybersecurity threat scenario involving multiple components: a hacker, a C2, a human figure at a computer, a chatbot, and a memory storage unit. Connections between these elements show the flow between the attacker, the attacker inserting the malicious URL in the chatbot, the person using the chatbot, and the delivery of the malicious URL stored in the memory of the chatbot as the sessions continue.
Figure 1. Attack flow for memory manipulation PoC.

Prompt Injection Payload Construction

This section walks through how we crafted malicious instructions on the webpage to perform prompt injection against the session summarization prompt.

The technique from this PoC targets the session summarization prompt, aiming to insert malicious instructions in the agent’s persistent memory. Understanding the structure of the summarization prompt is key to grasping the attack vector.

By default, the summarization prompt extracts two main elements:

  • User goals – Explicit objectives stated by the user during the session
  • Assistant actions – Actions taken by the agent to fulfill those goals

We populated a summarization prompt template with a conversation session including these user goals and assistant actions. The conversation, including user inputs, assistant responses and tool invocations, are wrapped inside <conversation> XML tags (highlighted in blue). A typical flow for this technique includes:

  1. (User) The user asks the chatbot to read a URL
  2. (Action) The agent selects and invokes a tool to fetch web content
  3. (Result) The tool returns the content of the webpage
  4. (Assistant) The agent generates a response using the tool output and user query

As noted in Figure 2, this structure contains the tool output (i.e., the retrieved webpage) in the result field (highlighted in red). This field is the only attacker-controlled input in the summarization prompt, making it the ideal injection point.

Screenshot of a text-based conversation between a user and an AI assistant, discussing a URL for illustrative purposes, with portions of the text highlighted for emphasis. The input is marked as benign. The red highlight is the ideal injection point.
Figure 2. Snippet of session summarization prompt template.

Payload Anatomy

The injected payload is divided into three parts, with each part separated by a forged <conversation> XML tag (highlighted in yellow). These tags are designed to confuse the LLM. This causes the LLM to interpret parts one and three as separate conversation blocks and part two, which falls outside those blocks, as part of the system instructions in the session summarization prompt.

  • Part one ends with a forged </conversation> tag, tricking the LLM into interpreting it as the end of one conversation block. It contains the prior user-agent exchanges along with benign webpage content. The malicious payload begins at the end of this section.
  • Part three begins with a forged <conversation> tag, tricking the LLM into interpreting it as the start of another conversation block. It contains a fabricated user-agent interaction that reiterates the instructions from Part two, increasing the likelihood that the LLM will include them in the final session summary.
  • Part two, strategically placed outside of any <conversation> block, contains the core malicious instructions. This positioning makes the LLM interpret it as part of the system instructions rather than user or tool-generated input, which significantly increases the chance that the LLM will follow the instructions. To blend in, the payload adopts the same XML-like syntax used in the prompt template.

Figure 3 illustrates how Bedrock Agents populate the result field with malicious content from the attacker’s webpage, while all other fields in the summarization prompt remain untouched.

Marked as malicious. Screenshot of AI agent chat interface with text boxes showing a sample dialogue about URL validation. The image includes marked sections labeled as "Malicious action," "Result," and "Guidelines" with annotations for improving user instructions related to security. The URL example.com is mentioned in the conversation.
Figure 3. Prompt injection payload in the session summarization prompt.

Exploitation Payload Delivery and Installation

Figure 4 shows the malicious webpage containing the exploitation payload corresponding to Step 1 in the attack flow. The malicious instructions the attacker specifies are embedded in the HTML but rendered invisible to the end user, keeping the attack stealthy.

Image listing the top 5 U.S. national parks for summer vacation and their closest airports. Parks included are Glacier National Park, Yosemite National Park, Grand Teton National Park, Acadia National Park, and Olympic National Park. Each park entry lists nearby airports with respective distances.
Figure 4. Malicious webpage where the prompt injection payloads are hidden below the benign content.

Figure 5 illustrates the victim’s interaction with the chatbot corresponding to Steps 2-4 in the attack flow. Notably, the agent does not display any malicious behavior or reveal the injected instructions. This is because the payload targets the session summarization prompt, not the orchestration prompt, which governs the real-time conversation. As the injected instructions are unrelated to the user’s query, the LLM excludes them from the assistant’s response.

Screenshot displaying a national park guide with flight times from Washington, D.C. to various national parks in the US. The prompt asks the chatbot to take a look at the national park guide listed in a URL, which is malicious.
Figure 5. The victim user asked the chatbot to read the malicious URL.

Figure 6 presents a snippet of the LLM’s output during the session summarization process corresponding to Step 5 in the attack flow. As expected, the model extracts key user goals and assistant actions from the conversation. However, it also incorporates the attacker’s injected instructions from the malicious webpage, labeling them under a fabricated topic such as “validation goal.” Since every topic in the summary is automatically inserted into the agent’s memory, this step effectively installs the payload for future sessions.

Screenshot of a summary of an LLM booking travel plans, including tasks like review of national park recommendations, and travel time estimations. The content is split by summary, user goals, and the assistant actions.
Figure 6. Session summarization LLM output.

Payload Activation in Subsequent Sessions

Amazon Bedrock Agents automatically injects memory contents into every new session's context. Figure 7 shows the victim returning to the chatbot several days later to book a new trip, corresponding to Step 6 in the attack flow. The agent completes the booking as expected, and from the user’s perspective, everything appears normal. However, malicious behavior is now occurring silently in the background.

Screen capture of a confirmed travel booking notification for a flight from Washington, DC to Kalispell, MT on July 15, 2025, to visit Glacier National Park initiated by a user talking to a chatbot.
Figure 7. The victim asked the chatbot to book a new trip in a new conversation session.

Figure 8 shows a portion of the orchestration prompt used in the new session, corresponding to Step 7 in the attack flow. This prompt includes the agent’s memory contents, which are now augmented with the attacker’s instructions, as part of the system instructions section. Because system instructions strongly influence the LLM’s behavior, the model is more likely to execute the malicious instructions.

Screenshot of an AI agent description. The session summary discusses the validation goals for the agent which include a malicious URL. This is followed by goals for the end user and assistant actions including scraped content from a partially redacted URL and booking a flight.
Figure 8. Compromised memory embedded in the orchestration prompt.

Figure 9 shows how the agent reasons and plans to accomplish the user’s request. In the first assistant message, the agent outlines its execution plan, which incorporates steps derived from the attacker’s instructions. In the second message, the agent silently exfiltrates the user’s booking information to a malicious domain by encoding the data in the C2 URL’s query parameters and requesting that URL with the scrape_url tool. This enables the agent to execute the attacker’s payload without any visible indication to the victim.

Screenshot of a computer code snippet discussing the validation of a booking via an assistant, with parameters and URL included.
Figure 9. The malicious instructions are incorporated into the agent’s execution plan.

Conclusion

Long-term memory is a powerful feature of AI agents, enabling personalized, context-aware and adaptive user experiences. However, it also introduces new attack surfaces. We demonstrated through our PoC that AI agents with long-term memory can serve as a vector for persistent malicious instructions. This could affect agent behavior across sessions and over time, providing a possible avenue of long-term systemic manipulation. Because memory contents are injected into the system instructions of orchestration prompts, they are often prioritized over user input, amplifying the potential impact.

While this PoC leverages a malicious webpage as the delivery mechanism, the broader risk extends to any untrusted input channel, such as:

  • Documents
  • Third-party APIs
  • User-generated content

Depending on the agent’s capabilities and integrations, successful exploitation could result in data exfiltration, misinformation or unauthorized actions, all carried out autonomously. The good news, as AWS notes, is that the specific attack we demonstrated can be mitigated by enabling Bedrock Agent’s built-in protections, namely the default pre-processing prompt and the Bedrock Guardrail, against prompt attacks.

Mitigating memory manipulation attacks requires a layered security approach. Developers should assume any external input could be adversarial and implement safeguards accordingly. This includes filtering untrusted content, restricting agent access to external sources and continuously monitoring agent behavior to detect and respond to anomalies.

As AI agents grow more capable and autonomous, securing memory and context management will be critical to ensuring safe and trustworthy deployment.

Protection and Mitigation

The root cause of this memory manipulation attack is the agent’s ingestion of untrusted, attacker-controlled content, particularly from external data sources such as webpage or documents. The attack can be disrupted if — at any stage in the chain — a malicious URL, webpage content or session summarization prompt is sanitized, filtered or blocked. Effective mitigation requires a defense-in-depth strategy across multiple layers of the agent’s input and memory pipeline.

Pre-processing

Developers can enable the default pre-processing prompt provided for every Bedrock Agent. This lightweight safeguard uses a foundation model to evaluate whether user input is safe to process. It can operate with its default behavior or be customized to include additional classification categories. Developers can also integrate AWS Lambda to implement tailored rules through a custom response parser. This flexibility enables defenses aligned to each application’s specific security posture.

Content Filtering

Inspect all untrusted content, especially data retrieved from external sources, for potential prompt injection. Solutions such as Amazon Bedrock Guardrails and Prisma AIRS are designed to effectively detect and block prompt attacks designed to manipulate LLM behavior. These tools can be used to enforce input validation policies, strip suspicious or forbidden content or reject malformed data before it is passed to the LLMs.

URL Filtering

Restrict the set of domains that the agent’s web-reading tools can access. URL filtering solutions like Advanced URL Filtering can validate links against known threat intelligence feeds and block access to malicious or suspicious domains. This prevents attacker-controlled payloads from reaching the LLM in the first place. Implementing allowlists (or deny-by-default policies) is especially important for tools that bridge between external content and internal memory systems.

Logging and Monitoring

AI agents can execute complex actions autonomously, without direct developer oversight. For this reason, comprehensive observability is critical.

Amazon Bedrock provides Model Invocation Logs, which record every prompt and response pair. In addition, the Trace feature offers fine-grained visibility into the agent’s reasoning steps, tool usage and memory interactions. Together, these tools support forensic analysis, anomaly detection and incident response.

Prisma AIRS is designed for real-time protection of AI applications, models, data and agents. It analyzes network traffic and application behavior to detect threats such as prompt injection, denial-of-service attacks and data exfiltration, with inline enforcement at the network and API levels.

AI Access Security is designed for visibility and control over usage of third-party GenAI tools, helping prevent sensitive data exposures, unsafe use of risky models and harmful outputs through policy enforcement and user activity monitoring. Together, Prisma AIRS and AI Access Security help secure the building of enterprise AI applications and external AI interactions.

Cortex Cloud is designed to provide automatic scanning and classification of AI assets, both commercial and self-managed models, to detect sensitive data and evaluate security posture. Context is determined by AI type, hosting cloud environment, risk status, posture and datasets.

A Unit 42 AI Security Assessment can help you proactively identify the threats most likely to target your AI environment.

If you think you may have been compromised or have an urgent matter, get in touch with the Unit 42 Incident Response team or call:

  • North America: Toll Free: +1 (866) 486-4842 (866.4.UNIT42)
  • UK: +44.20.3743.3660
  • Europe and Middle East: +31.20.299.3130
  • Asia: +65.6983.8730
  • Japan: +81.50.1790.0200
  • Australia: +61.2.4062.7950
  • India: 00080005045107

Palo Alto Networks has shared these findings with our fellow Cyber Threat Alliance (CTA) members. CTA members use this intelligence to rapidly deploy protections to their customers and to systematically disrupt malicious cyber actors. Learn more about the Cyber Threat Alliance.

Additional Resources

Bedrock Agents Session Summarization Prompt Template

 

Bedrock Agents Orchestration Prompt Template

References

Enlarged Image