Securing LLMs: A Guide to the OWASP Top 10 Risks and Mitigations

by
Britt Crawford

Bridging Familiar Security Lessons With the Unique Challenges of AI

The rapid rise of large language models (LLMs) has been nothing short of revolutionary, reshaping industries and redefining how we interact with technology. Yet, with great potential comes significant risk. The inaugural OWASP Top 10 for LLM Applications 2025 shines a spotlight on the critical vulnerabilities developers and organizations must address to ensure their systems remain safe, trustworthy, and efficient.

If some of these risks feel familiar, that’s because they are. Many echo the security challenges developers have faced with web applications for years—think injection attacks and improper data handling. But with LLMs, the stakes are higher, and the solutions are more nuanced. LLMs don’t just process inputs; they interpret, generate, and act on them. Their dynamic interaction with users and their vast processing power introduce vulnerabilities that require fresh thinking.

OWASP’s list offers a lens—a structured way to evaluate how these AI systems can be safely built, deployed, and maintained in a world that increasingly relies on their capabilities. Whether you’re a developer, a decision-maker, or simply someone interested in the evolving landscape of AI security, this short guide walks you through the key risks and why they matter, drawing connections to longstanding practices while offering a clear view of what’s new along with mitigation techniques.

Prompt Injection

Prompt injection exploits the shared channel in LLMs where instructions and data overlap, allowing malicious users to embed harmful commands into inputs. Unlike traditional injection attacks, this vulnerability is unique to the interpretive nature of LLMs.

Why It Matters: Imagine a customer support bot manipulated into leaking sensitive account details or bypassing internal workflows. Such breaches erode trust and expose organizations to reputational, operational, and legal risks. As LLMs expand into high-stakes applications, prompt injection represents a critical, evolving threat.

Mitigation: According to the OWASP guide, mitigating prompt injection involves strict input validation, context isolation, and employing guardrails around user inputs to prevent commands from being misinterpreted.

Sandgarden also recommends:

  • Dynamic Context Isolation: Create session-specific context tokens that restrict LLM memory bleed across interactions.
  • Granular Command Filtering: Leverage regex or structured allowlists to filter out unexpected syntax.
  • Controlled Execution: Use role-based access permissions to limit the scope of actions an LLM can trigger.

Sensitive Information Disclosure

Sensitive information disclosure occurs when LLMs unintentionally reveal private, proprietary, or restricted data during interactions. This vulnerability can arise from poorly sanitized training datasets, insufficient filtering of outputs, or inadvertently exposed system prompts.

Why It Matters: Consider a customer-facing LLM unintentionally disclosing sensitive information like account credentials, personal identifiers, or confidential business strategies during a live chat. In sectors like healthcare, finance, or enterprise software, such incidents could result in severe regulatory penalties, legal challenges, and reputational harm. With increasing reliance on LLMs for critical applications, mitigating this risk is essential to safeguarding trust and compliance.

Mitigation: According to the OWASP guide, addressing sensitive information disclosure involves implementing output filters, sanitizing datasets, and enforcing access controls. These measures focus on minimizing the risk of both accidental leaks and targeted attacks.

Sandgarden also recommends:

  • Real-Time Output Monitoring: Deploy AI-powered filters that actively monitor and flag potentially sensitive outputs before they reach the user.
  • Semantic Data Classification: Use advanced labeling to classify sensitive vs. non-sensitive data, ensuring granular control over what information the LLM can access or generate.
  • Differential Privacy Techniques: Incorporate privacy-preserving mechanisms that anonymize individual data points, making it nearly impossible to trace responses back to the original dataset.
  • Zero-Trust Architecture: Enforce role-based permissions and audit trails to ensure that only authorized entities access sensitive data.

Supply Chain Vulnerabilities

Supply chain vulnerabilities arise when third-party components, such as plugins, pre-trained models, or external APIs, introduce exploitable weaknesses into LLM systems. These components often extend functionality but can also serve as attack vectors if compromised.

Why It Matters: A single compromised plugin could act as a backdoor, exposing sensitive data, degrading system performance, or providing attackers with a foothold to infiltrate broader systems. With the growing reliance on expansive LLM ecosystems, including pre-trained models and integrations, the risk of supply chain attacks has become increasingly significant. Inadequate oversight of dependencies can jeopardize the security of the entire application.

Mitigation: The OWASP guide emphasizes robust dependency management, continuous monitoring, and secure integration practices to reduce the attack surface.

Sandgarden also recommends:

  • Proactive Dependency Vetting: Vet all third-party components rigorously before integration, assessing their security posture, update frequency, and vendor reputation.
  • Immutable Infrastructure for Plugins: Deploy plugins and extensions in containerized environments with strict isolation to limit their access to core LLM systems.
  • Software Bill of Materials (SBOM): Maintain an up-to-date inventory of all components used in the system, enabling quick identification and replacement of compromised dependencies.
  • Continuous Auditing: Regularly audit dependencies for vulnerabilities, leveraging automated tools like dependency-checkers or static analysis tools.
  • Code Signing and Verification: Require code signing for plugins and pre-trained models to validate their authenticity and ensure they haven’t been tampered with.

Data and Model Poisoning

Data and model poisoning occurs when attackers inject malicious, biased, or deceptive data into training datasets or during model updates. This manipulation can result in harmful outputs that distort the model’s behavior or undermine its reliability.

Why It Matters: Imagine an LLM used in financial decision-making being fed poisoned training data that skews its recommendations to favor harmful investments or fraudulent activities. The implications go beyond technical issues—regulatory scrutiny, legal liabilities, reputational damage, and financial losses can quickly follow. Worse, biased or poisoned models may perpetuate systemic harms, embedding misinformation or unethical practices into critical applications.

Mitigation: The OWASP guide underscores the importance of robust data pipelines, validation protocols, and secure update processes to mitigate poisoning risks.

Sandgarden also recommends:

  • Data Validation Pipelines: Implement automated systems to validate and sanitize data during ingestion. Look for anomalies, duplicates, or unexpected patterns that could signal manipulation.
  • Provenance Tracking: Maintain detailed records of data sources, including metadata, to ensure traceability and accountability.
  • Differential Testing: Compare model outputs against controlled datasets to identify unintended biases or irregularities introduced by poisoned data.
  • Secure Update Practices: Protect model updates with cryptographic signatures to ensure only authorized updates are applied.
  • Adversarial Training: Introduce poisoned data simulations during training to help the model identify and mitigate the impact of malicious inputs in real-world scenarios.
  • Monitoring for Anomalous Behavior: Deploy continuous monitoring tools to flag suspicious activity during the training or inference stages, enabling quick intervention.

Improper Output Handling

Improper output handling occurs when LLMs generate unfiltered, unchecked, or unsanitized responses. This oversight can result in outputs that are misleading, offensive, or inappropriate for their intended audience, creating significant reputational, operational, and legal risks.

Why It Matters: Consider a customer-facing chatbot generating offensive content or an AI-driven content generation tool publishing factually incorrect material. In industries like healthcare, finance, or media, such lapses could lead to user distrust, regulatory penalties, or widespread misinformation. Even seemingly minor errors can escalate into PR disasters, eroding confidence in the system and the organization behind it.

Mitigation: The OWASP guide highlights the need for rigorous post-processing and content moderation to address this challenge. 

Sandgarden also recommends:

  • Post-Processing Pipelines: Automate review processes that analyze and sanitize model outputs before they reach end users. Integrate sentiment analysis and keyword detection to flag potentially harmful responses.
  • Content Moderation Systems: Develop multi-layered moderation strategies that combine automated tools with human oversight for high-risk or public-facing outputs.
  • Contextual Awareness Filters: Implement filters that account for the specific context and audience of the LLM’s outputs, tailoring safeguards to fit different use cases.
  • Explainability Enhancements: Incorporate tools that explain how outputs were generated, enabling developers to trace and correct problematic outputs more effectively.
  • Feedback Loops: Allow users to flag inappropriate or inaccurate responses, feeding this feedback into retraining or refinement cycles to continuously improve output quality.
  • Output Curation for Critical Applications: In high-stakes domains, enforce human review of outputs to ensure factual accuracy and ethical compliance.

Excessive Agency

Excessive agency refers to scenarios where LLMs operate autonomously, taking actions beyond their intended scope without adequate human oversight. This is especially prevalent in agentic architectures, where LLMs interact with other systems or plugins to execute complex workflows.

Why It Matters: Imagine an LLM-enabled assistant with the ability to autonomously perform actions like transferring funds, sending emails, or downloading files. If these capabilities are not properly constrained, unintended or malicious actions could result in financial loss, ethical violations, or significant security breaches. The increasing integration of LLMs into critical workflows magnifies the potential risks, making unchecked autonomy a critical concern for developers and organizations alike.

Mitigation: The OWASP guide emphasizes implementing strict access controls to limit the autonomy of LLMs. 

Sandgarden also recommends:

  • Role-Based Access Controls (RBAC): Define clear roles and permissions for LLMs, ensuring they can only perform actions within explicitly authorized parameters.
  • Action Logging and Auditing: Log all autonomous actions taken by the LLM and regularly audit these logs to identify patterns of misuse or unexpected behavior.
  • Permission-Request Frameworks: Require LLMs to seek explicit approval for actions beyond a defined scope, incorporating human-in-the-loop oversight for critical decisions.
  • Restrict Plugin Capabilities: Vet and limit plugins or integrations that can grant additional functionalities to LLMs, maintaining a curated allowlist for approved components.
  • Sandboxing: Execute high-risk actions within isolated environments where potential damage is contained and monitored.
  • Continuous Policy Updates: Regularly update policies and constraints to align with evolving system requirements and emerging threats in LLM operations.

System Prompt Leakage

System prompt leakage occurs when the hidden system instructions embedded within prompts are unintentionally exposed, giving attackers insights into the model’s behavior and structure. These exposed instructions can be manipulated to alter or bypass the intended functionality of the system, creating a critical vulnerability.

Why It Matters: Hidden prompts often contain sensitive configurations, operational rules, or proprietary logic that govern how the LLM functions. If these prompts are leaked, attackers could exploit the information to manipulate the system or uncover vulnerabilities. For example, a leaked system prompt in a customer service chatbot could reveal proprietary algorithms, lead to unauthorized access, or even allow attackers to bypass authentication steps. As LLMs become central to many applications, safeguarding system prompts is vital to maintaining both security and operational integrity.

Mitigation: The OWASP guide highlights isolating and encrypting system prompts as key strategies. 

Sandgarden also recommends:

  • Prompt Encryption: Encrypt all system prompts using robust cryptographic methods to prevent unauthorized access, both at rest and in transit.
  • Segmentation of Instructions and Data: Separate system-level instructions from user inputs to minimize the risk of accidental exposure during interactions.
  • Dynamic Prompting: Use dynamically generated prompts that are session-specific and expire after a short period, reducing the usefulness of captured data.
  • Strict Access Controls: Limit access to system prompts to only those components and personnel that absolutely require it, employing multi-factor authentication (MFA) where applicable.
  • Testing for Leakage Scenarios: Conduct regular penetration tests and simulated attacks to identify potential leakage vectors and assess system resilience.
  • Masking Techniques: Use masking or abstraction methods in outputs to ensure that any inadvertent references to system prompts are sanitized before being displayed or logged.
  • Contextual Layering: Design prompts to include only the minimum necessary instructions for specific tasks, reducing the risk of leaking sensitive configurations.

Vector and Embedding Weaknesses

Vector and embedding weaknesses occur when attackers exploit vulnerabilities in the mathematical representations used by LLMs to process and retrieve context. These weaknesses are especially critical in systems relying on RAG (Retrieval-Augmented Generation) , where embeddings are used to fetch and ground model outputs in relevant information. A compromised embedding model can result in corrupted context, biased outputs, or the exposure of sensitive data.

Why It Matters: Embeddings are central to the effectiveness of LLMs in handling complex, context-driven queries. For example, a RAG-based customer support system might rely on embeddings to retrieve knowledge base entries. If an attacker manipulates the embeddings, the model might produce incorrect responses, leak sensitive data, or exhibit biased behavior. Beyond technical failures, this can erode user trust, damage brand reputation, and lead to costly operational failures.

Unlike traditional hashing or encryption, embeddings are not inherently secure and can be reverse-engineered to infer sensitive details. The growing reliance on RAG amplifies the potential impact of such vulnerabilities, making their mitigation a high priority.

Mitigation: The OWASP guide emphasizes robust vectorization practices and validation. 

Sandgarden also recommends:

  • Secure Vectorization Protocols: Use encrypted embeddings to ensure that vectors are protected from tampering or interception during storage and transmission.
  • Embedding Validation: Regularly validate the integrity and relevance of embeddings using automated quality checks and adversarial testing.
  • Isolation of Embedding Models: Store embedding models separately from other critical system components to minimize exposure in the event of an attack.
  • Adversarial Training: Train LLMs and embedding systems using adversarial examples to improve robustness against potential exploits.
  • Contextual Sanitization: Before embedding sensitive data, apply sanitization techniques to ensure that no identifiable or proprietary information is encoded.
  • Access Controls on RAG Pipelines: Enforce strict role-based permissions for accessing or modifying embedding models and retrieval pipelines.
  • Monitoring for Anomalous Patterns: Deploy real-time monitoring systems to identify unusual activity, such as unexpected shifts in embedding behavior or context retrieval failures.

Misinformation

Misinformation arises when LLMs generate outputs that appear credible but are factually incorrect or misleading. This issue is often rooted in biases within the training data, gaps in model understanding, or deliberate manipulation by bad actors. The challenge lies in the model’s ability to produce responses with high linguistic confidence, which can make falsehoods seem authoritative.

Why It Matters: The consequences of misinformation extend far beyond technical inaccuracies. Consider a healthcare chatbot dispensing incorrect medical advice or a financial news generator fabricating stock predictions—these scenarios can result in real-world harm, from public panic to financial losses and even life-threatening outcomes.

In industries like public health, education, and media, the stakes are particularly high. A single instance of misinformation can erode trust, damage reputations, and invite legal or regulatory scrutiny. As LLMs are increasingly integrated into decision-making processes, the ability to mitigate misinformation becomes not just a technical requirement but an ethical imperative.

Mitigation: The OWASP guide highlights the importance of validating outputs to prevent misinformation. 

Sandgarden also recommends:

  • Fact-Checking Workflows: Integrate automated fact-checking tools into your pipeline to verify model outputs against trusted sources. Combine this with human review for high-stakes applications.
  • External Knowledge Bases: Use Retrieval-Augmented Generation (RAG) systems to ground LLM responses in up-to-date, authoritative data. This reduces reliance on potentially outdated or biased training datasets.
  • Bias Audits: Regularly assess training data for inherent biases and update datasets to ensure diverse and accurate representation.
  • Confidence Scoring: Implement a confidence metric that flags outputs with lower certainty, prompting additional validation or review before release.
  • Real-Time Monitoring: Deploy monitoring systems to detect patterns of misinformation and trace them back to root causes, such as corrupted training data or adversarial inputs.
  • User Education: Clearly communicate the limitations of LLMs to end-users, ensuring they approach model outputs with a critical eye and cross-check critical information.
  • Adaptive Feedback Loops: Collect user feedback to identify and correct recurring misinformation trends, continuously improving model accuracy.

Unbounded Consumption

Unbounded consumption occurs when an LLM consumes excessive resources, such as processing power, memory, or API calls, leading to operational inefficiencies. This vulnerability can stem from poorly managed resource allocation, unregulated user behavior, or unexpected system demands.

Why It Matters: Imagine a high-traffic application relying on an LLM—such as a customer support chatbot—suddenly experiencing a surge in user queries. Without proper limits, the system could exhaust available resources, leading to costly API overruns or outright service outages. For organizations, the consequences range from ballooning operational costs to lost revenue and diminished user trust.

In sectors like e-commerce, healthcare, or real-time analytics, where high availability is critical, resource overconsumption can disrupt user experiences, tarnish reputations, and even expose companies to regulatory risks if SLAs (Service Level Agreements) are violated.

Mitigation: The OWASP guide emphasizes the importance of resource management and limiting unbounded consumption. 

Sandgarden also recommends:

  • Rate Limiting: Apply strict rate-limiting mechanisms to cap the number of API calls a user or application can make within a specified timeframe. This prevents individual users or malicious actors from monopolizing system resources.
  • Quotas for API Usage: Define usage quotas for different tiers of users or services, aligning consumption with organizational priorities and cost management strategies.
  • Real-Time Resource Monitoring: Deploy monitoring systems to track API usage, memory consumption, and compute demand in real-time. Leverage analytics to detect anomalies or trends that may signal overuse.
  • Dynamic Scaling: Implement auto-scaling solutions to dynamically allocate resources during traffic spikes, ensuring seamless performance without overcommitting to permanent infrastructure costs.
  • Circuit Breaker Patterns: Use circuit breakers to temporarily limit access to certain functionalities or services when resource thresholds are exceeded. This ensures system stability while preventing complete outages.
  • Load Testing: Conduct regular load and stress testing to identify resource bottlenecks and optimize allocation strategies before they become critical issues.
  • User Behavior Controls: Analyze user behavior to identify patterns that lead to excessive resource consumption. Use this data to refine UX/UI designs, nudging users toward more efficient interactions.

Why OWASP’s Lens Matters for LLMs

The OWASP Top 10 for LLM Applications provides a roadmap for navigating the unique security challenges posed by these transformative tools. While the risks are significant, proactive mitigation strategies can protect users, enhance trust, and ensure the long-term success of LLM systems.

At Sandgarden, we specialize in building secure, scalable AI solutions. Whether you’re tackling instruction injection or optimizing resource allocation, we’re here to help you navigate the complexities of LLM security. Learn more about how Sandgarden can help.

/