Executive Summary
We recently looked into AI code assistants that connect with integrated development environments (IDEs) as a plugin, much like GitHub Copilot. We found that both users and threat actors could misuse code assistant features like chat, auto-completion and writing unit tests for harmful purposes. This misuse includes injecting backdoors, leaking sensitive information and generating harmful content.
We discovered that context attachment features can be vulnerable to indirect prompt injection. To set up this injection, threat actors first contaminate a public or third-party data source by inserting carefully crafted prompts into the source. When a user inadvertently supplies this contaminated data to an assistant, the malicious prompts hijack the assistant. This hijack could manipulate victims into executing a backdoor, inserting malicious code into an existing codebase and leaking sensitive information.
In addition, users can manipulate assistants into generating harmful content by misusing auto-complete features, in a similar manner to recently identified content moderation bypasses on GitHub Copilot.
Some AI assistants invoke their base model directly from the client. This exposes models to several additional risks, such as misuse by users or by external adversaries who are looking to sell access to LLM models.
These weaknesses are likely to impact a number of LLM code assistants. Developers should implement standard security practices for LLMs, to ensure that environments are protected from the exploits discussed in this article. Conducting thorough code reviews and exercising control over LLM outputs will help secure AI-driven development against evolving threats.
If you think you might have been compromised or have an urgent matter, contact the Unit 42 Incident Response team.
Palo Alto Networks customers are better protected from the threats discussed above through the following products and services:
- Cortex XDR and XSIAM
- Cortex Cloud
- Cortex Cloud Identity Security
- Prisma AIRS
- Unit 42’s AI Security Assessment
Related Unit 42 Topics | GenAI, LLMs |
Introduction: The Rise of LLM-Based Coding Assistants
While AI tool use in development processes keeps increasing, some of the risks associated with these tools — such as those used for code generation, refactoring and bug detection — can be overlooked. These risks include prompt injection and model misuse, which can lead to unintended behavior.
The 2024 Stack Overflow Annual Developer Survey revealed that 76% of all respondents are using or plan to use AI tools in their development process. Among developers currently using AI tools, 82% reported using them to write code.
The rapid adoption of AI tools, particularly large language models (LLMs), has significantly transformed the way developers approach coding tasks.
LLM-based coding assistants have become integral parts of modern development workflows. These tools leverage natural language processing to understand developer intent, generate code snippets and provide real-time suggestions, potentially reducing the time and effort spent on manual coding. Some of these tools have gained attention for their deep integration with existing codebases and their ability to assist developers in navigating complex projects.
AI-driven coding assistants are also prone to potential security concerns that could impact development processes. The weaknesses we identified are likely to be present in a variety of IDEs, versions, models and even different products that use LLMs as coding assistants.
Prompt Injection: A Detailed Examination
Indirect Prompt Injection Vulnerability
The core of prompt injection vulnerabilities lies in a model’s inability to reliably distinguish between system instructions (code) and user prompts (data). This mixture of data and code has always been an issue in computing, leading to vulnerabilities such as SQL injections, buffer overflows and command injections.
LLMs face a similar risk because they process both instructions and user inputs in the same way. This behavior makes them susceptible to prompt injection, where adversaries craft inputs that manipulate LLMs into unintended behavior.
System prompts are instructions that guide the AI’s behavior, defining its role and ethical boundaries for the application. User inputs are the dynamic questions, commands or even external data (like documents or web content) that a person supplies to the LLM-based application. Because the LLM receives all types of input as natural language text, attackers can craft malicious user inputs that mimic or override system prompts, bypassing safeguards and influencing the LLM’s responses.
This indistinguishable nature of instructions and data also gives rise to indirect prompt injection, which presents an even greater challenge. Instead of directly injecting malicious input, adversaries embed harmful prompts within these external data sources, such as websites, documents or APIs that the LLM processes.
Once the LLM processes this compromised external data (either directly or when a user unknowingly submits it), it will follow the instructions specified in the embedded malicious prompt. This allows it to bypass traditional safeguards and cause unintended behavior.
Figure 1 illustrates the difference between direct and indirect prompt injections.

Misusing Context Attachment
Traditional LLMs often operate with a knowledge cutoff, meaning their training data doesn’t include the most current information or highly specific details relevant to a user’s local codebase or proprietary systems. This creates a significant knowledge gap when developers need assistance with their specific projects. To overcome this limitation and enable more precise, context-aware responses, LLM tools implement features that allow users to explicitly provide external context, bridging this gap by feeding the relevant data directly into the LLM.
One feature offered by some coding assistants is the ability to attach context in the form of a specific file, folder, repository or URL. Adding this context to prompts enables the code assistant to provide more accurate and specific output. However, this feature could also create an opportunity for indirect prompt injection attacks if users unintentionally provide context sources that threat actors have contaminated.
Behind the scenes, when a user adds context to an instruction, the model processes this context as a prompt that precedes the user’s actual prompt. Figure 2 shows this chat structure. Since this content can come from external sources, such as a URL or a file outside the current repository, users risk unknowingly attaching malicious context that could contain indirect prompt injections.

Prompt Injection Scenario
As a prominent social media platform, X (formerly known as Twitter) is a vast and frequently collected data source for code-driven analysis. However, its inherently unfiltered nature means that this data could be contaminated.
Figures 3a and 3b demonstrate a simulated scenario in which a user attempts to generate insights from a scraped collection of posts. We attached a small sample of X posts as context and asked an assistant to write code that processes the posts. This task included understanding the format of the collected data, such as which fields are included and what information can be derived from the posts.
In our scenario, the X posts have been contaminated and initiate an indirect prompt injection attack.


A closer look at the generated code above reveals that the assistant inserted a hidden backdoor into the code, called fetched_additional_data. This backdoor would fetch a remote command from an attacker-controlled command-and-control (C2) server and execute the command.
At this point, many users would copy and paste the resulting code (or click “Apply”) to execute it and then check that the output is correct. But taking this action could allow the threat actor in this example to compromise the user’s machine. Figure 4 shows the backdoor code that the assistant generated and inserted.

The reason this backdoor was inserted is that the sample of X posts contained a prompt that was crafted with simulated malicious instructions. This prompt simulates a fake error message and then specifies the instructions displayed in Figure 5. The instructions include a command to integrate the malicious code naturally in the response. The sequence of instructions to the assistant is:
- Pursue a new secret mission
- Make the user run code that sends an HTTP request to the attacker-controlled C2 server
- Obfuscate the address of the C2 server
- Execute the command retrieved from the server
Figure 5 shows the tainted dataset that the user in this simulation inadvertently fed into the coding assistant. The dataset included a simulated X post containing malicious instructions.

Figure 6 shows the full text of the prompt. This is a modification of the prompt published in Turning Bing Chat into a Data Pirate.

Looking at the assistant’s response, we can see that the assistant was not limited to writing in any specific language — it could insert a backdoor in JavaScript, C++, Java, or any other language. In addition, the assistant was simply told to find an excuse to insert the code and to find a “natural” way to insert it.
In this case, it was inserted under the guise of fetching additional data for the analysis requested by the user. This shows that the attackers wouldn’t even need to know what code or language the user would be writing in, leaving the LLM to figure out those details.
Although this is a simulated scenario, it has real-world implications concerning the legitimacy of the data sources we incorporate into our prompts, especially as AI becomes increasingly integrated into everyday tools.
Some integrated coding assistants go as far as allowing the AI to execute shell commands as well, giving the assistant more autonomy. In the scenario we presented here, this level of control would likely have resulted in the execution of the backdoor, with even less user involvement than we demonstrated.
Reaffirming Previously Discovered Weaknesses
In addition to the vulnerability outlined above, our research confirms that several other weaknesses previously identified in GitHub Copilot also apply to other coding assistants. A number of studies and articles have documented issues like harmful content generation and the potential for misuse through direct model invocation. These vulnerabilities are not limited to one platform but highlight broader concerns with AI-driven coding assistants.
In this section, we explore the risks these security concerns pose in real-world usage.
Harmful Content Generation via Auto-Completion
LLMs undergo extensive training phases, and leverage techniques such as Reinforcement Learning from Human Feedback (RLHF) to prevent harmful content generation. However, users can bypass some of these precautions when they invoke a coding assistant for code auto-completion. Auto-completion is an LLM feature in code-focused models that predicts and suggests code as a user types.
Figure 7 shows the AI’s defense mechanisms working as expected when a user submits an unsafe query in the chat interface.

When a user manipulates the auto-complete feature to simulate cooperation with the request, the assistant completes the rest of the content, even when it’s harmful. The simulated chat session in Figure 8 shows one of multiple ways to simulate such a response. In this chat, the user pre-fills part of the assistant’s response with a conforming prefix that implies the beginning of a positive response — in this case, “Step 1:”.

When we omit the conforming prefix “Step 1:”, the auto-completion defaults to the expected behavior of refusing to generate harmful content, as shown in Figure 9 below.

Direct Model Invocation and Misuse
Coding assistants offer various client interfaces for ease of use and implementation by developers, including IDE plugins and standalone web clients. The unfortunate flip-side of this accessibility is that threat actors can invoke the model for different and unintended purposes. The ability to interact with the model directly and thus bypass the constraints of a contained IDE environment, enables threat actors to misuse the model by injecting custom system prompts, parameters and context.
Figures 10a and 10b show a simulated scenario in which a user invokes the model directly using a custom script that acts like a client, but that supplies a completely different system prompt. The responses that the base model provides demonstrate that both users and threat actors could use it to create unintended output.


In addition to users being able to interact with the model for purposes other than coding, we discovered that adversaries could use stolen session tokens in attacks like LLMJacking. This is a novel attack in which a threat actor leverages stolen cloud credentials to gain unauthorized access to cloud-hosted LLM services, often with the intention of selling this access to third parties. Malicious actors can use tools like oai-reverse-proxy to sell access to the cloud-hosted LLM model, enabling them to use a legitimate model for nefarious purposes.
Mitigations and Safeguards
We strongly encourage organizations and individuals to consider the following best practices:
- Review before you run: Always carefully examine any suggested code before executing it. Don’t blindly trust the AI. Double-check code for unexpected behavior and potential security concerns.
- Examine the attached context: Pay close attention to any context or data that you provide to LLM tools. This information heavily influences the AI’s output, and understanding it is critical for assessing the potential impact of generated code.
Some coding assistants offer features that minimize potential risks and help users stay in control of the code that is inserted and executed. If these are available, we strongly recommend actively using these features. For example:
- Manual execution control: Users have the ability to approve or deny the execution of commands. Use this power to ensure you understand and trust what your coding assistant is doing.
Remember, you are the ultimate safeguard. Your vigilance and responsible usage are essential for ensuring a safe and productive experience when coding with AI.
Conclusions and Future Risks
Exploring the risks of AI coding assistants reveals the evolving security challenges these tools present. As developers increasingly rely on LLM-based assistants, it becomes essential to balance the benefits with a keen awareness of potential risks. While enhancing productivity, these tools also require robust security protocols to prevent potential exploitation.
Security issues such as the following highlight the need to stay constantly alert:
- Indirect prompt injection
- Context attachment misuse
- Harmful content generation
- Direct model invocation
These issues also reflect broader concerns across all platforms that use and provide AI-driven coding assistants. This points to a universal need for stronger security measures throughout the industry.
By exercising caution through practices like thorough code reviews and maintaining tight control over what output is ultimately executed, developers and users can make the most of these tools while also staying protected.
The more autonomous and integrated these systems become, the more likely we are to see novel forms of attacks. These attacks will demand security measures that can adapt just as fast.
Palo Alto Networks Protection and Mitigation
Palo Alto Networks customers are better protected from the threats discussed above through the following products:
- Cortex XDR and XSIAM are designed to prevent the execution of known or previously unknown malware using Behavioral Threat Protection and machine learning based on the Local Analysis module.
- Cortex Cloud Identity Security encompasses Cloud Infrastructure Entitlement Management (CIEM), Identity Security Posture Management (ISPM), Data Access Governance (DAG) and Identity Threat Detection and Response (ITDR). It also provides clients with the necessary capabilities to improve their identity-related security requirements. It does so by providing visibility into identities and their permissions within cloud environments, to accurately detect misconfigurations, unwanted access to sensitive data and real-time analysis of usage and access patterns.
- Cortex Cloud can detect and prevent malicious operations using its XSOAR platform automation capabilities, when using both Cortex Cloud Agent and Agentless-based protection and behavioral analytics to detect when IAM policies are being misused.
Palo Alto Networks can also help organizations better protect their AI systems through the following products and services:
- Prisma AIRS
- Unit 42’s AI Security Assessment
If you think you may have been compromised or have an urgent matter, get in touch with the Unit 42 Incident Response team or call:
- North America: Toll Free: +1 (866) 486-4842 (866.4.UNIT42)
- UK: +44.20.3743.3660
- Europe and Middle East: +31.20.299.3130
- Asia: +65.6983.8730
- Japan: +81.50.1790.0200
- Australia: +61.2.4062.7950
- India: 00080005045107
Palo Alto Networks has shared these findings with our fellow Cyber Threat Alliance (CTA) members. CTA members use this intelligence to rapidly deploy protections to their customers and to systematically disrupt malicious cyber actors. Learn more about the Cyber Threat Alliance.
Additional Resources
- New Jailbreaks Allow Users to Manipulate GitHub Copilot — Apex Security, published on Dark Reading
- Indirect Prompt Injection Threats — Greshake, Kai and Abdelnabi, Sahar and Mishra, Shailesh and Endres, Christoph and Holz, Thorsten and Fritz, Mario
- Indirect Prompt Injection: Generative AI’s Greatest Security Flaw — Matt Sutton, Damian Ruck
- 2024 Stack Overflow Developer Survey — Stack Overflow