Researchers call for treating AI agents as inherently untrusted components

A collaborative study by Google researchers and academics from multiple universities asserts that artificial intelligence agents require system-level security protocols, advocating for their treatment as untrusted components to mitigate potential attack vulnerabilities.

According to a newly published research paper, security measures for agents powered by artificial intelligence need to be integrated throughout the entire system architecture rather than focusing solely on the model, which would more effectively protect against both system failures and malicious actor interventions.

Published on May 20, the amended research paper came from a collaborative team including Google, Gray Swan AI, EmbraceTheRed, and researchers from multiple academic institutions. The paper made a compelling case that agent security needs to be understood as a systems-level challenge, advocating for the treatment of AI agents as components that cannot be implicitly trusted.

"When viewed through this framework, efforts focused on enhancing model robustness, which represents the prevailing perspective in the community, prove inadequate when implemented in isolation. Rather, we need to supplement current efforts with methodologies drawn from the systems security field," the research team stated.

"Towards this end, we propose viewing agent security as an instance of computer security. This domain has long dealt with powerful attackers and motivated decades of research on principles and techniques that deal with such adversaries."

The adoption of AI agents continues to gain momentum within the cryptocurrency community. Industry leaders in the crypto space have put forward projections suggesting AI agent utilization could experience exponential growth in coming years. In January, Jeremy Allaire, who serves as CEO of Circle, made a forecast that billions of AI agents would be functioning on behalf of users before the end of a five-year period.

Core security protections could stop most attacks

Following an extensive examination of various attack case studies, the research team determined that "three mechanisms" had the potential to "eliminate a large fraction of attacks."

The researchers maintain that AI agents need to establish clear boundaries between operational instructions and data that hasn't been verified, preventing adversaries from deceiving the agent through the embedding of harmful instructions inside data streams. Additionally, according to the research findings, the AI agent ought to operate with only the bare minimum access rights required to execute a given task, as opposed to being granted complete system access.

Security diagram showing trusted and untrusted systems — According to the researchers, conventional security architectures incorporate both trusted and untrusted system components, with AI needing to be categorized as an untrusted system. Source: Agent Security is a Systems Problem

Simultaneously, the broader system infrastructure should maintain authority over where confidential information can be transmitted, rather than delegating that responsibility to the agent itself, thereby preventing manipulation that could result in sensitive data being routed to compromised or malicious endpoints.

In a recent incident, Bankr, an AI-powered cryptocurrency trading assistant, announced on May 20 that it had suspended transaction capabilities following the discovery of an attacker who had successfully compromised no fewer than 14 separate wallets. Cybersecurity professionals raised the possibility that the bot may have fallen victim to exploitation by a malicious hacker.

The deployment of AI agents spans multiple use cases including the development of Web3 applications, token launches, and autonomous engagement with various services and protocols, with certain platforms currently investigating AI implementation for trading operations.

In a conversation with Cointelegraph last year, Aaron Ratcliff, who leads attributions at Merkle Science, a blockchain intelligence firm, explained that from a security perspective, granting an AI agent wallet access introduces an element of trust into a system fundamentally designed to operate in a trustless manner, though it can be implemented safely provided the underlying system is properly architected.

"I'd want proof that the AI can catch front-running, apply slippage limits, spot scam tokens, and audit contracts in real time before it makes a trade. It should also sandbox prompts, prevent injection, and block man-in-the-middle access," he said.

In the meantime, Sean Ren, who co-founded Sahara AI, an AI-native blockchain platform, indicated that model context protocols represent the highest standard for security when properly configured, though he emphasized that users must remain vigilant and attentive to every action an AI agent executes.

"They essentially act as a gatekeeper between the AI model and your wallet. The agent can only perform specific, approved actions—such as checking balances or preparing a payment for you to confirm—rather than freely moving funds or changing wallet settings," he said.

← Zurück zum Blog