Open-source AI models face scrutiny as safety features prove easily removable

Testing by the Financial Times revealed that security measures on openly available AI models from Google and Meta can be eliminated within minutes, sparking regulatory debates.

Safety measures built into open-source artificial intelligence models from leading tech companies can be stripped away within minutes using freely available tools, enabling the models to generate content about bioweapons, malicious software and other restricted subjects, according to testing by the Financial Times in collaboration with AI safety organization Alice.

The results, made public on Monday, add to growing concerns that safeguards implemented by developers may not survive once model weights are publicly available and open to modification, raising questions about who should bear responsibility for AI safety.

The testing, carried out with tools available from public code repositories, found that safeguards on models from companies such as Meta and Google could be removed in less than 10 minutes without specialized computing hardware.

Modified versions of the models then answered queries that the unmodified models refused, including questions about malicious software and chemical hazards, according to the test results.

The findings highlight a significant challenge for regulators as open-source systems become more capable and more widely distributed.

Unlike proprietary models, open-source systems can be downloaded, modified and redistributed beyond the oversight of their original creators. That complicates the enforcement of safety restrictions after release and raises questions about whether regulatory frameworks focused primarily on model creation are adequate.

Governance limits

Regulators worldwide are building frameworks for advanced AI systems, including the European Union's AI Act and emerging frontier model safety approaches in the United Kingdom and the United States. However, experts say the findings expose weaknesses in existing governance assumptions.

Markus Levin, co-founder of decentralized physical infrastructure network company XYO, told Cointelegraph the rapid removal of safeguards shows "how quickly control shifts once open models are released," adding that most governance proposals still focus too heavily on the model-building stage.

David Minarsch, a founding member of Olas and chief executive of Valory, an AI agent platform, told Cointelegraph that governments were unlikely to prevent determined actors from accessing or modifying models once weights are widely mirrored online. He said regulation would be more effective if focused on deployment, distribution and harmful real-world use rather than the original developer layer alone.

Control moves downstream

Ronghui Gu, chief executive and co-founder of CertiK, a blockchain security firm, told Cointelegraph that governance at the developer layer still matters, but becomes insufficient once models can be freely downloaded and redistributed.

Gu said policymakers were more likely to influence commercial hosting, enterprise deployment and distribution channels than to prevent the spread of modified models entirely.

He argued that security standards must evolve to identify malicious or high-risk behavior in third-party AI tools and autonomous AI agent environments before deployment, so that runtime threats can be better contained as agents take on more autonomous roles.

Levin said containment becomes increasingly difficult once models are mirrored and redistributed, meaning policymakers may need to focus more on infrastructure and distribution points than on model design alone.

Both Levin and Minarsch compared the issue to open-source software and crypto networks, where attempts to suppress distribution have historically proven difficult once code is publicly available. Minarsch added that while safety layers can deter casual misuse, they should not be mistaken for robust protection against sophisticated actors.
