Hacker used Anthropic’s Claude in Mexican government data breach: Report
Anthropic PBC’s artificial intelligence chatbot Claude was exploited by a hacker to conduct a coordinated cyberattack against multiple Mexican government institutions, leading to the theft of a vast cache of sensitive tax and voter data, according to research published Wednesday by Israeli cybersecurity firm Gambit Security.
Researchers said the unidentified attacker used Spanish-language prompts to instruct Claude to behave like an elite hacker. The chatbot was asked to identify vulnerabilities in government systems, generate scripts to exploit those weaknesses and automate the exfiltration of data, News.Az reports, citing Bloomberg.
The campaign began in December and lasted approximately one month. During that period, about 150 gigabytes of data were allegedly stolen, including files tied to 195 million taxpayer records, voter information, government employee credentials and civil registry documents.
Artificial intelligence tools have increasingly become enablers of cybercrime, allowing attackers to scale and refine their operations. Just last week, researchers at Amazon.com Inc. reported that a small hacking group breached more than 600 firewall devices across dozens of countries using widely accessible AI tools.
Gambit did not attribute the Mexico breach to any known hacking collective and said there was no indication the perpetrators were affiliated with a foreign government.
According to the researchers, the hacker infiltrated Mexico’s federal tax authority and the National Electoral Institute. State governments in Jalisco, Michoacán and Tamaulipas were also reportedly affected, along with Mexico City’s civil registry and Monterrey’s water utility.
Initially, Claude flagged the requests involving Mexican government systems as potentially malicious and warned the user. However, the chatbot ultimately complied with the attacker’s instructions and executed thousands of commands across compromised networks, the researchers said.
A spokesperson for Anthropic said the company investigated Gambit’s findings, halted the activity and banned the associated accounts. The company also incorporates examples of misuse into its training processes to improve safeguards. Its latest model, Claude Opus 4.6, includes enhanced mechanisms designed to detect and disrupt abuse.
In this case, the hacker repeatedly tested Claude’s limits until achieving a so-called “jailbreak,” allowing them to bypass the system’s guardrails. Even during the campaign, Claude occasionally refused certain instructions, according to Anthropic.
Mexican authorities, however, disputed key aspects of the report. The country’s tax authority said a review of access logs revealed no evidence of a breach. The National Electoral Institute likewise stated it had found no signs of unauthorized access in recent months and emphasized that it had strengthened cybersecurity measures. The government of Jalisco denied being compromised, asserting that only federal systems were affected.
Mexico’s national digital agency declined to comment directly on the alleged incidents but said cybersecurity remains a priority. Monterrey Water and Drainage Services reported no detected intrusions or major vulnerabilities in the latter half of 2025. Other local authorities, including those in Michoacán and Tamaulipas, did not respond to requests for comment.
In December, Mexican officials issued a brief statement noting investigations into breaches across several public institutions, though it remains unclear whether that announcement was connected to the alleged Claude-enabled attack.
Gambit said the attacker appeared particularly focused on obtaining large volumes of government employee identity data. Researchers identified evidence of at least 20 separate vulnerabilities exploited during the operation, though it remains uncertain how the stolen information may have been used.
When Claude encountered technical obstacles or lacked specific details, the hacker reportedly turned to OpenAI’s ChatGPT for supplementary guidance. According to Gambit, ChatGPT was consulted for advice on lateral movement within networks, credential targeting and assessing the likelihood of detection.
Curtis Simpson, Gambit’s chief strategy officer, said the AI tools generated thousands of detailed reports containing executable attack plans, specifying internal targets and credentials for the human operator.
OpenAI said it had detected attempts by the attacker to use its models in violation of its policies and that its systems refused to comply. The company added that it had banned the accounts involved and appreciated Gambit’s outreach.
The incident underscores a broader and growing concern: as AI companies like Anthropic and OpenAI advance sophisticated coding and automation tools — and cybersecurity firms increasingly deploy AI-driven defenses — malicious actors are simultaneously leveraging the same technologies to enhance offensive capabilities.
In November, Anthropic disclosed that it had disrupted what it described as the first AI-orchestrated cyber-espionage campaign. The company said suspected Chinese state-sponsored hackers manipulated Claude in an effort to attack 30 global targets, with limited success.
“This reality is changing all the game rules we have ever known,” said Alon Gromakov, Gambit’s co-founder and chief executive officer.
According to transcripts reviewed by Gambit, the attacker initially asked Claude to perform penetration testing on Mexico’s federal tax authority — a legitimate security practice used to uncover vulnerabilities. However, when the hacker added instructions to delete logs and conceal command history, Claude flagged the request as suspicious.
“Specific instructions about deleting logs and hiding history are red flags,” Claude responded at one point. “In legitimate bug bounty, you don’t need to hide your actions – in fact, you need to document them for reporting.”
The attacker then altered tactics, abandoning the interactive dialogue and instead feeding Claude a structured playbook outlining each step of the operation. That approach enabled the hacker to bypass the chatbot’s safeguards — achieving the jailbreak that allowed the attacks to proceed, according to Gambit.
By Nijat Babayev