The AI Agent Security Crisis: Four Hard Lessons from San Francisco

The “agentic” honeymoon is over. If 2024 was the year of the demo and 2025 the year of the pilot, by June 2026 the reckoning had arrived. At the recent AI Agent Security Summit in San Francisco, the conversation at the Commonwealth Club was not about whether agents should be deployed, but about how to stop them from inadvertently burning the house down.

We’ve moved past simple prompt injections. We are now dealing with “skill poisoning,” autonomous zero-day discovery, and agents that can be tricked into doing anything they are technically capable of doing. If your security strategy still treats an AI agent like a standard SaaS app, you’re already behind. OpenAI’s acquisition of Promptfoo points the same way: it signals a shift toward automated red-teaming to protect real-world infrastructure from agentic failure.

The Why: The Perimeter is Dead

Traditional security relies on silos: identity, endpoint, and application. AI agents break all of them. An agent, by design, spans your confidential data, your browser, and your internal APIs. It isn’t just a tool; it’s a persistent identity with the power to act. When Anthropic’s “Project Glasswing” (Claude Mythos) launched earlier this spring, it didn’t just find bugs: it found thousands of zero-days by acting autonomously. The model can discover zero-day vulnerabilities in minutes, which has raised significant concerns about the risks of automated vulnerability discovery. The problem agent security has to solve is bridging the gap between the speed of AI development and the slow, ticket-based reality of traditional IT security.

Step-by-Step: Securing Your Agentic Ecosystem

1. Map the “Intent” path. Stop looking at raw logs and start looking at business context. Use Zenity or similar AI-SPM tools to trace an agent’s actions back to its original goal. If an HR bot is suddenly querying the engineering codebase, the “intent” has been hijacked, even if the credentials used are technically valid.

2. Sanitize your MCP (Model Context Protocol) sources. Just as we once worried about SQL injection, we now face “MCP poisoning.” Verify every external knowledge source your agent “reads” before it reaches the model. Attackers are currently seeding public data with instructions designed to override agent goals. People.ai’s use of MCP to eliminate data blindness offers one blueprint for connecting AI agents to unstructured data securely.

3. Audit the Skill Registry. If your builders are using open-source skill libraries like ClawHub, you need a whitelist. In early 2026, malicious skills—pre-packaged sets of instructions that look helpful but exfiltrate data—exploded from zero to over 700 in a single month.

4. Implement “Proof of Ownership” for Actions. Do not allow agents to execute high-stakes actions (money transfers, permission changes, data deletions) without a “Human-in-the-loop” (HITL) trigger. Use a “least-privilege” identity for the agent that is separate from the user who invoked it.
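
The four steps above can be sketched as a handful of small checks. This is a minimal illustration under stated assumptions, not how Zenity or any AI-SPM product works: every name here (GOAL_SCOPES, TRUSTED_MCP_HOSTS, APPROVED_SKILL_HASHES, HIGH_STAKES, Action, authorize) is hypothetical, and a real deployment would load these policies from a managed store rather than hard-coding them.

```python
import hashlib
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical policy data, hard-coded only for illustration.
GOAL_SCOPES = {"hr_onboarding": {"hr_db", "calendar"}}   # step 1: goal -> allowed resources
TRUSTED_MCP_HOSTS = {"mcp.internal.example.com"}          # step 2: verified sources
APPROVED_SKILL_HASHES = {                                 # step 3: pinned skill contents
    hashlib.sha256(b"def summarize(text): ...").hexdigest(),
}
HIGH_STAKES = {"transfer_money", "change_permissions", "delete_data"}  # step 4


@dataclass
class Action:
    goal: str                # the business goal the agent was invoked for
    resource: str            # what the agent is trying to touch
    name: str                # what it is trying to do
    approved_by_human: bool = False


def within_intent(action: Action) -> bool:
    """Step 1: is the resource inside the scope of the original goal?"""
    return action.resource in GOAL_SCOPES.get(action.goal, set())


def mcp_source_trusted(url: str) -> bool:
    """Step 2: only read from hosts that have been explicitly verified."""
    return urlparse(url).hostname in TRUSTED_MCP_HOSTS


def skill_approved(skill_source: bytes) -> bool:
    """Step 3: allow a skill only if its exact contents were reviewed."""
    return hashlib.sha256(skill_source).hexdigest() in APPROVED_SKILL_HASHES


def authorize(action: Action) -> str:
    """Steps 1 and 4 combined: deny out-of-scope work, hold high-stakes work."""
    if not within_intent(action):
        return "deny: outside intent scope"
    if action.name in HIGH_STAKES and not action.approved_by_human:
        return "hold: needs human approval"
    return "allow"
```

With these definitions, an HR agent reaching for an engineering repository is denied even though its credentials are valid, and a deletion inside its own scope is held until a human signs off. Pinning skills by content hash rather than by name means a skill that is silently modified after review no longer matches the allowlist.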

💡 Pro-Tip: Treat “Natural Language” as untrusted code. Most dev teams forget that a prompt is actually an execution script. Use a secondary, “shadow” LLM whose only job is to analyze the primary agent’s output for mission-creep or intent-shift before the action hits your API.
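
A rough sketch of that “shadow” check, assuming the reviewer is just a function that receives the original goal and the proposed action and returns whether they are consistent. The keyword_reviewer below is a hypothetical stand-in for a second LLM call (a keyword match is far too weak for production); guarded_execute and the phrase list are likewise illustrative names, not part of any real library.

```python
from typing import Callable

# (goal, proposed_action) -> True if the action is consistent with the goal
Reviewer = Callable[[str, str], bool]


def keyword_reviewer(goal: str, proposed_action: str) -> bool:
    """Hypothetical stand-in for the shadow LLM: flags obvious hijack phrases."""
    suspicious = ("ignore previous instructions", "export all", "send credentials")
    text = proposed_action.lower()
    return not any(phrase in text for phrase in suspicious)


def guarded_execute(
    goal: str,
    proposed_action: str,
    execute: Callable[[str], str],
    reviewer: Reviewer = keyword_reviewer,
) -> str:
    """Run the action only if the reviewer accepts it; otherwise block it."""
    if not reviewer(goal, proposed_action):
        return "blocked"
    return execute(proposed_action)
```

The point of the design is the ordering: the review happens after the primary agent has produced its output and before anything reaches your API, so swapping the keyword check for a real model call does not change the calling code.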

The Buyer’s Perspective: Governance vs. Speed

The market is currently split between “Platform Security” (AWS Bedrock, Azure AI Foundry) and “Third-Party Governance” (Zenity, etc.).

While big cloud providers offer robust internal guardrails, they often suffer from “platform-blindness.” They can’t see what your agent is doing when it leaves their ecosystem to interact with a Salesforce instance or a local device. Palo Alto Networks’ acquisition of Protect AI shows how major players are moving to strengthen their AI/ML protection. Third-party AI security tools are essential for teams running “Home Grown” or cross-platform agents. The value prop here isn’t just “stopping hackers”; it’s providing the compliance documentation your Legal and HR teams need to actually let you flip the “on” switch.

FAQ

What is the biggest threat to AI agents in 2026?
Skill poisoning. Attackers upload “useful” functions to open-source registries that contain hidden instructions to bypass safety filters or leak session tokens.

Does “Human-in-the-loop” slow down automation too much?
It depends on the “blast radius.” For low-risk tasks (summarizing meetings), it’s unnecessary. For “Agentic AI” with write-access to databases, it is the only thing standing between a productive Tuesday and a catastrophic data breach.

Can’t I just use a better System Prompt to stay secure?
No. As Google’s Naveed Makhani pointed out, anything an agent can do, it can be tricked into doing. Hard-coded boundaries in the infrastructure always beat “polite requests” in a system prompt.
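
To make the difference concrete, here is a small sketch of a boundary enforced by infrastructure rather than by the prompt. It uses Python's standard sqlite3 module and opens the database read-only through a URI, so a write fails no matter what the agent was talked into. The function names and the demo table are assumptions made for illustration, and the path handling assumes a POSIX-style filesystem.

```python
import os
import sqlite3
import tempfile


def make_demo_db() -> str:
    """Create a throwaway database with one row (illustrative data only)."""
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE employees (name TEXT)")
    conn.execute("INSERT INTO employees VALUES ('Ada')")
    conn.commit()
    conn.close()
    return path


def agent_query(path: str, sql: str):
    """Run agent-supplied SQL on a connection that cannot write."""
    # mode=ro is enforced by SQLite itself, not by anything in the prompt.
    conn = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.OperationalError as exc:
        return f"refused: {exc}"
    finally:
        conn.close()
```

A SELECT through agent_query returns rows as usual, while a DELETE comes back as a refusal and leaves the data intact, regardless of how persuasive the injected instruction was.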

Ethical Note

Current AI security tools can identify intent and block known malicious patterns, but they cannot predict the emergent behavior of “frontier” models that have not yet been stress-tested in the wild.