News & Articles from various sources.
- Anthropic says DeepSeek, Moonshot, and MiniMax used 24,000 fake accounts to rip off Claude
Anthropic dropped a bombshell on the artificial intelligence industry Monday, publicly accusing three prominent Chinese AI laboratories — DeepSeek, Moonshot AI, and MiniMax — of orchestrating coordinated, industrial-scale campaigns to siphon capabilities from its Claude models using tens of thousands of fraudulent accounts.The San Francisco-based company said the three labs collectively generated more than 16 million exchanges with Claude through approximately 24,000 fake accounts, all in violation of Anthropic's terms of service and regional access restrictions. The campaigns, Anthropic said, are the most concrete and detailed public evidence to date of a practice that has haunted Silicon Valley for months: foreign competitors systematically using a technique called distillation to leapfrog years of research and billions of dollars in investment."These campaigns are growing in intensity and sophistication," Anthropic wrote in a technical blog post published Monday. "The window to act is narrow, and the threat extends beyond any single company or region. Addressing it will require rapid, coordinated action among industry players, policymakers, and the global AI community."The disclosure marks a dramatic escalation in the simmering tensions between American and Chinese AI developers — and it arrives at a moment when Washington is actively debating whether to tighten or loosen export controls on the advanced chips that power AI training. Anthropic, led by CEO Dario Amodei, has been among the most vocal advocates for restricting chip sales to China, and the company explicitly connected Monday's revelations to that policy fight.How AI distillation went from obscure research technique to geopolitical flashpointTo understand what Anthropic alleges, it helps to understand what distillation actually is — and how it evolved from an academic curiosity into the most contentious issue in the global AI race.At its core, distillation is a process of extracting knowledge from a larger, more powerful AI model — the "teacher" — to create a smaller, more efficient one — the "student." The student model learns not from raw data, but from the teacher's outputs: its answers, reasoning patterns, and behaviors. Done correctly, the student can achieve performance remarkably close to the teacher's while requiring a fraction of the compute to train.As Anthropic itself acknowledged, distillation is "a widely used and legitimate training method." Frontier AI labs, including Anthropic, routinely distill their own models to create smaller, cheaper versions for customers. But the same technique can be weaponized. A competitor can pose as a legitimate customer, bombard a frontier model with carefully crafted prompts, collect the outputs, and use those outputs to train a rival system — capturing capabilities that took years and hundreds of millions of dollars to develop.The technique burst into public consciousness in January 2025 when DeepSeek released its R1 reasoning model, which appeared to match or approach the performance of leading American models at dramatically lower cost. Databricks CEO Ali Ghodsi captured the industry's anxiety at the time, telling CNBC: "This distillation technique is just so extremely powerful and so extremely cheap, and it's just available to anyone." He predicted the technique would usher in an era of intense competition for large language models.That prediction proved prescient. In the weeks following DeepSeek's release, researchers at UC Berkeley said they recreated OpenAI's reasoning model for just $450 in 19 hours. Researchers at Stanford and the University of Washington followed with their own version built in 26 minutes for under $50 in compute credits. The startup Hugging Face replicated OpenAI's Deep Research feature as a 24-hour coding challenge. DeepSeek itself openly released a family of distilled models on Hugging Face — including versions built on top of Qwen and Llama architectures — under the permissive MIT license, with the model card explicitly stating that the DeepSeek-R1 series supports commercial use and allows for any modifications and derivative works, "including, but not limited to, distillation for training other LLMs."But what Anthropic described Monday goes far beyond academic replication or open-source experimentation. The company detailed what it characterized as deliberate, covert, and large-scale intellectual property extraction by well-resourced commercial laboratories operating under the jurisdiction of the Chinese government.Anthropic traces 16 million fraudulent exchanges to researchers at DeepSeek, Moonshot, and MiniMaxAnthropic attributed each campaign "with high confidence" through IP address correlation, request metadata, infrastructure indicators, and corroboration from unnamed industry partners who observed the same actors on their own platforms. Each campaign specifically targeted what Anthropic described as Claude's most differentiated capabilities: agentic reasoning, tool use, and coding.DeepSeek, the company that ignited the distillation debate, conducted what Anthropic described as the most technically sophisticated of the three operations, generating over 150,000 exchanges with Claude. Anthropic said DeepSeek's prompts targeted reasoning capabilities, rubric-based grading tasks designed to make Claude function as a reward model for reinforcement learning, and — in a detail likely to draw particular political attention — the creation of "censorship-safe alternatives to policy sensitive queries."Anthropic alleged that DeepSeek "generated synchronized traffic across accounts" with "identical patterns, shared payment methods, and coordinated timing" that suggested load balancing to maximize throughput while evading detection. In one particularly notable technique, Anthropic said DeepSeek's prompts "asked Claude to imagine and articulate the internal reasoning behind a completed response and write it out step by step — effectively generating chain-of-thought training data at scale." The company also alleged it observed tasks in which Claude was used to generate alternatives to politically sensitive queries about "dissidents, party leaders, or authoritarianism," likely to train DeepSeek's own models to steer conversations away from censored topics. Anthropic said it was able to trace these accounts to specific researchers at the lab.Moonshot AI, the Beijing-based creator of the Kimi models, ran the second-largest operation by volume at over 3.4 million exchanges. Anthropic said Moonshot targeted agentic reasoning and tool use, coding and data analysis, computer-use agent development, and computer vision. The company employed "hundreds of fraudulent accounts spanning multiple access pathways," making the campaign harder to detect as a coordinated operation. Anthropic attributed the campaign through request metadata that "matched the public profiles of senior Moonshot staff." In a later phase, Anthropic said, Moonshot adopted a more targeted approach, "attempting to extract and reconstruct Claude's reasoning traces."MiniMax, the least publicly known of the three but the most prolific by volume, generated over 13 million exchanges — more than three-quarters of the total. Anthropic said MiniMax's campaign focused on agentic coding, tool use, and orchestration. The company said it detected MiniMax's campaign while it was still active, "before MiniMax released the model it was training," giving Anthropic "unprecedented visibility into the life cycle of distillation attacks, from data generation through to model launch." In a detail that underscores the urgency and opportunism Anthropic alleges, the company said that when it released a new model during MiniMax's active campaign, MiniMax "pivoted within 24 hours, redirecting nearly half their traffic to capture capabilities from our latest system."How proxy networks and 'hydra cluster' architectures helped Chinese labs bypass Anthropic's China banAnthropic does not currently offer commercial access to Claude in China, a policy it maintains for national security reasons. So how did these labs access the models at all?The answer, Anthropic said, lies in commercial proxy services that resell access to Claude and other frontier AI models at scale. Anthropic described these services as running what it calls "hydra cluster" architectures — sprawling networks of fraudulent accounts that distribute traffic across Anthropic's API and third-party cloud platforms. "The breadth of these networks means that there are no single points of failure," Anthropic wrote. "When one account is banned, a new one takes its place." In one case, Anthropic said, a single proxy network managed more than 20,000 fraudulent accounts simultaneously, mixing distillation traffic with unrelated customer requests to make detection harder.The description suggests a mature and well-resourced infrastructure ecosystem dedicated to circumventing access controls — one that may serve many more clients than just the three labs Anthropic named.Why Anthropic framed distillation as a national security crisis, not just an IP disputeAnthropic did not treat this as a mere terms-of-service violation. The company embedded its technical disclosure within an explicit national security argument, warning that "illicitly distilled models lack necessary safeguards, creating significant national security risks."The company argued that models built through illicit distillation are "unlikely to retain" the safety guardrails that American companies build into their systems — protections designed to prevent AI from being used to develop bioweapons, carry out cyberattacks, or enable mass surveillance. "Foreign labs that distill American models can then feed these unprotected capabilities into military, intelligence, and surveillance systems," Anthropic wrote, "enabling authoritarian governments to deploy frontier AI for offensive cyber operations, disinformation campaigns, and mass surveillance."This framing directly connects to the chip export control debate that Amodei has made a centerpiece of his public advocacy. In a detailed essay published in January 2025, Amodei argued that export controls are "the most important determinant of whether we end up in a unipolar or bipolar world" — a world where either only the U.S. and its allies possess the most powerful AI, or one where China achieves parity. He specifically noted at the time that he was "not taking any position on reports of distillation from Western models" and would "just take DeepSeek at their word that they trained it the way they said in the paper."Monday's disclosure is a sharp departure from that earlier restraint. Anthropic now argues that distillation attacks "undermine" export controls "by allowing foreign labs, including those subject to the control of the Chinese Communist Party, to close the competitive advantage that export controls are designed to preserve through other means." The company went further, asserting that "without visibility into these attacks, the apparently rapid advancements made by these labs are incorrectly taken as evidence that export controls are ineffective." In other words, Anthropic is arguing that what some observers interpreted as proof that Chinese labs can innovate around chip restrictions was actually, in significant part, the result of stealing American capabilities.The murky legal landscape around AI distillation may explain Anthropic's political strategyAnthropic's decision to frame this as a national security issue rather than a legal dispute may reflect the difficult reality that intellectual property law offers limited recourse against distillation.As a March 2025 analysis by the law firm Winston & Strawn noted, "the legal landscape surrounding AI distillation is unclear and evolving." The firm's attorneys observed that proving a copyright claim in this context would be challenging, since it remains unclear whether the outputs of AI models qualify as copyrightable creative expression. The U.S. Copyright Office affirmed in January 2025 that copyright protection requires human authorship, and that "mere provision of prompts does not render the outputs copyrightable."The legal picture is further complicated by the way frontier labs structure output ownership. OpenAI's terms of use, for instance, assign ownership of model outputs to the user — meaning that even if a company can prove extraction occurred, it may not hold copyrights over the extracted data. Winston & Strawn noted that this dynamic means "even if OpenAI can present enough evidence to show that DeepSeek extracted data from its models, OpenAI likely does not have copyrights over the data." The same logic would almost certainly apply to Anthropic's outputs.Contract law may offer a more promising avenue. Anthropic's terms of service prohibit the kind of systematic extraction the company describes, and violation of those terms is a more straightforward legal claim than copyright infringement. But enforcing contractual terms against entities operating through proxy services and fraudulent accounts in a foreign jurisdiction presents its own formidable challenges.This may explain why Anthropic chose the national security frame over a purely legal one. By positioning distillation attacks as threats to export control regimes and democratic security rather than as intellectual property disputes, Anthropic appeals to policymakers and regulators who have tools — sanctions, entity list designations, enhanced export restrictions — that go far beyond what civil litigation could achieve.What Anthropic's distillation crackdown means for every company running a frontier AI modelAnthropic outlined a multipronged defensive response. The company said it has built classifiers and behavioral fingerprinting systems designed to identify distillation attack patterns in API traffic, including detection of chain-of-thought elicitation used to construct reasoning training data. It is sharing technical indicators with other AI labs, cloud providers, and relevant authorities to build what it described as a more holistic picture of the distillation landscape. The company has also strengthened verification for educational accounts, security research programs, and startup organizations — the pathways most commonly exploited for setting up fraudulent accounts — and is developing model-level safeguards designed to reduce the usefulness of outputs for illicit distillation without degrading the experience for legitimate customers.But the company acknowledged that "no company can solve this alone," calling for coordinated action across the industry, cloud providers, and policymakers.The disclosure is likely to reverberate through multiple ongoing policy debates. In Congress, the bipartisan No DeepSeek on Government Devices Act has already been introduced. Federal agencies including NASA have banned DeepSeek from employee devices. And the broader question of chip export controls — which the Trump administration has been weighing amid competing pressures from Nvidia and national security hawks — now has a new and vivid data point.For the AI industry's technical decision-makers, the implications are immediate and practical. If Anthropic's account is accurate, the proxy infrastructure enabling these attacks is vast, sophisticated, and adaptable — and it is not limited to targeting a single company. Every frontier AI lab with an API is a potential target. The era of treating model access as a simple commercial transaction may be coming to an end, replaced by one in which API security is as strategically important as the model weights themselves.Anthropic has now put names, numbers, and forensic detail behind accusations that the industry had only whispered about for months. Whether that evidence galvanizes the coordinated response the company is calling for — or simply accelerates an arms race between distillers and defenders — may depend on a question no classifier can answer: whether Washington sees this as an act of espionage or just the cost of doing business in an era when intelligence itself has become a commodity.
- Microsoft Copilot Ignored Sensitivity Labels, Processed Confidential Emails
A code bug blew past every security label in the book… and exposed the fatal flaw in how we govern AI. The post Microsoft Copilot Ignored Sensitivity Labels, Processed Confidential Emails appeared first on TechRepublic.
- Conduent Breach Surges to Over 25M, Could Be Largest in US History
New state filings suggest the Conduent breach may affect more than 25 million Americans, with Texas alone reporting 15.4 million impacted residents. The post Conduent Breach Surges to Over 25M, Could Be Largest in US History appeared first on TechRepublic.
- Over 200K Australian Driver’s Licences Exposed in youX Cyber Breach
A youX breach exposed sensitive borrower data in Australia, including over 200,000 driver’s licence numbers, raising fraud and phishing risks. The post Over 200K Australian Driver’s Licences Exposed in youX Cyber Breach appeared first on TechRepublic.
- Anthropic's Claude Code Security is available now after finding 500+ vulnerabilities: how security leaders should respond
Anthropic pointed its most advanced AI model, Claude Opus 4.6, at production open-source codebases and found a plethora of security holes: more than 500 high-severity vulnerabilities that had survived decades of expert review and millions of hours of fuzzing, with each candidate vetted through internal and external security review before disclosure. Fifteen days later, the company productized the capability and launched Claude Code Security.Security directors responsible for seven-figure vulnerability management stacks should expect a common question from their boards in the next review cycle. VentureBeat anticipates the emails and conversations will start with, "How do we add reasoning-based scanning before attackers get there first?", because as Anthropic's review found, simply pointing an AI model at exposed code can be enough to identify — and in the case of malicious actors, exploit — security lapses in production code. The answer matters more than the number, and it is primarily structural: how your tooling and processes allocate work between pattern-based scanners and reasoning-based analysis. CodeQL and the tools built on it match code against known patterns. Claude Code Security, which Anthropic launched February 20 as a limited research preview, reasons about code the way a human security researcher would. It follows how data moves through an application and catches flaws in business logic and access control that no rule set covers.The board conversation security leaders need to have this weekFive hundred newly discovered zero-days is less a scare statistic than a standing budget justification for rethinking how you fund code security. The reasoning capability Claude Code Security represents, and its inevitable competitors, need to drive the procurement conversation. Static application security testing (SAST) catches known vulnerability classes. Reasoning-based scanners find what pattern-matching was never designed to detect. Both have a role.Anthropic published the zero-day research on February 5. Fifteen days later, they shipped the product. While it's the same model and capabilities, it is now available to Enterprise and Team customers.What Claude does that CodeQL couldn'tGitHub has offered CodeQL-based scanning through Advanced Security for years, and added Copilot Autofix in August 2024 to generate LLM-suggested fixes for alerts. Security teams rely on it. But the detection boundary is the CodeQL rule set, and everything outside that boundary stays invisible.Claude Code Security extends that boundary by generating and testing its own hypotheses about how data and control flow through an application, including cases where no existing rule set describes. CodeQL solves the problem it was built to solve: data-flow analysis within predefined queries. It tells you whether tainted input reaches a dangerous function.CodeQL is not designed to autonomously read a project's commit history, infer an incomplete patch, trace that logic into another file, and then assemble a working proof-of-concept exploit end to end. Claude did exactly that on GhostScript, OpenSC, and CGIF, each time using a different reasoning strategy."The real shift is from pattern-matching to hypothesis generation," said Merritt Baer, CSO at Enkrypt AI, advisor to Andesite and AppOmni, and former Deputy CISO at AWS, in an exclusive interview with VentureBeat. "That's a step-function increase in discovery power, and it demands equally strong human and technical controls."Three proof points from Anthropic's published methodology show where pattern-matching ends and hypothesis generation begins.Commit history analysis across files. GhostScript is a widely deployed utility for processing PostScript and PDF files. Fuzzing turned up nothing, and neither did manual analysis. Then Claude pulled the Git commit history, found a patch that added stack bounds checking for font handling in gstype1.c, and reversed the logic: if the fix was needed there, every other call to that function without the fix was still vulnerable. In gdevpsfx.c, a completely different file, the call to the same function lacked the bounds checking patched elsewhere. Claude built a working proof-of-concept crash. No CodeQL rule describes that bug today. The maintainers have since patched it.Reasoning about preconditions that fuzzers can't reach. OpenSC processes smart card data. Standard approaches failed here, too, so Claude searched the repository for function calls that are frequently vulnerable and found a location where multiple strcat operations ran in succession without length checking on the output buffer. Fuzzers rarely reached that code path because too many preconditions stood in the way. Claude reasoned about which code fragments looked interesting, constructed a buffer overflow, and proved the vulnerability.Algorithm-level edge cases that no coverage metric catches. CGIF is a library for processing GIF files. This vulnerability required understanding how LZW compression builds a dictionary of tokens. CGIF assumed compressed output would always be smaller than uncompressed input, which is almost always true. Claude recognized that if the LZW dictionary filled up and triggered resets, the compressed output could exceed the uncompressed size, overflowing the buffer. Even 100% branch coverage wouldn't catch this. The flaw demands a particular sequence of operations that exercises an edge case in the compression algorithm itself. Random input generation almost never produces it. Claude did.Baer sees something broader in that progression. "The challenge with reasoning isn't accuracy, it's agency," she told VentureBeat. "Once a system can form hypotheses and pursue them, you've shifted from a lookup tool to something that can explore your environment in ways that are harder to predict and constrain."How Anthropic validated 500+ findingsAnthropic placed Claude inside a sandboxed virtual machine with standard utilities and vulnerability analysis tools. The red team didn't provide any specialized instructions, custom harnesses, or task-specific prompting. Just the model and the code.The red team focused on memory corruption vulnerabilities because they're the easiest to confirm objectively. Crash monitoring and address sanitizers don't leave room for debate. Claude filtered its own output, deduplicating and reprioritizing before human researchers touched anything. When the confirmed count kept climbing, Anthropic brought in external security professionals to validate findings and write patches.Every target was an open-source project underpinning enterprise systems and critical infrastructure. Small teams maintain many of them, staffed by volunteers, not security professionals. When a vulnerability sits in one of these projects for a decade, every product that pulls from it inherits the risk.Anthropic didn't start with the product launch. The defensive research spans more than a year. The company entered Claude in competitive Capture-the-Flag events where it ranked in the top 3% of PicoCTF globally, solved 19 of 20 challenges in the HackTheBox AI vs Human CTF, and placed 6th out of 9 teams defending live networks against human red team attacks at Western Regional CCDC. Anthropic also partnered with Pacific Northwest National Laboratory to test Claude against a simulated water treatment plant. PNNL's researchers estimated that the model completed adversary emulation in three hours. The traditional process takes multiple weeks.The dual-use question security leaders can't avoidThe same reasoning that finds a vulnerability can help an attacker exploit one. Frontier Red Team leader Logan Graham acknowledged this directly to Fortune's Sharon Goldman. He told Fortune the models can now explore codebases autonomously and follow investigative leads faster than a junior security researcher.Gabby Curtis, Anthropic's communications lead, told VentureBeat in an exclusive interview the company built Claude Code Security to make defensive capabilities more widely available, "tipping the scales towards defenders." She was equally direct about the tension: "The same reasoning that helps Claude find and fix a vulnerability could help an attacker exploit it, so we're being deliberate about how we release this."In interviews with more than 40 CISOs across industries, VentureBeat found that formal governance frameworks for reasoning-based scanning tools are the exception, not the norm. The most common responses are that the area was considered so nascent that many CISOs didn't think this capability would arrive so early in 2026.The question every security director has to answer before deploying this: if I give my team a tool that finds zero-days through reasoning, have I unintentionally expanded my internal threat surface?"You didn't weaponize your internal surface, you revealed it," Baer told VentureBeat. "These tools can be helpful, but they also may surface latent risk faster and more scalably. The same tool that finds zero-days for defense can expose gaps in your threat model. Keep in mind that most intrusions don't come from zero-days, they come from misconfigurations.""In addition to the access and attack path risk, there is IP risk," she said. "Not just exfiltration, but transformation. Reasoning models can internalize and re-express proprietary insights in ways that blur the line between use and leakage."The release is deliberately constrained. Enterprise and Team customers only, through a limited research preview. Open-source maintainers apply for free expedited access. Findings go through multi-stage self-verification before reaching an analyst, with severity ratings and confidence scores attached. Every patch requires human approval.Anthropic also built detection into the model itself. In a blog post detailing the safeguards, the company described deploying probes that measure activations within the model as it generates responses, with new cyber-specific probes designed to track potential misuse. On the enforcement side, Anthropic is expanding its response capabilities to include real-time intervention, including blocking traffic it detects as malicious.Graham was direct with Axios: the models are extremely good at finding vulnerabilities, and he expects them to get much better still. VentureBeat asked Anthropic for the false-positive rate before and after self-verification, the number of disclosed vulnerabilities with patches landed versus still in triage, and the specific safeguards that distinguish attacker use from defender use. The lead researcher on the 500-vulnerability project was unavailable, and the company declined to share specific attacker-detection mechanisms to avoid tipping off threat actors."Offense and defense are converging in capability," Baer said. "The differentiator is oversight. If you can't audit and bound how the tool is used, you've created another risk."That speed advantage doesn't favor defenders by default. It favors whoever adopts it first. Security directors who move early set the terms.Anthropic isn't alone. The pattern is repeating.Security researcher Sean Heelan used OpenAI's o3 model with no custom tooling and no agentic framework to discover CVE-2025-37899, a previously unknown use-after-free vulnerability in the Linux kernel's SMB implementation. The model analyzed over 12,000 lines of code and identified a race condition that traditional static analysis tools consistently missed because detecting it requires understanding concurrent thread interactions across connections.Separately, AI security startup AISLE discovered all 12 zero-day vulnerabilities announced in OpenSSL's January 2026 security patch, including a rare high-severity finding (CVE-2025-15467, a stack buffer overflow in CMS message parsing that is potentially remotely exploitable without valid key material). AISLE co-founder and chief scientist Stanislav Fort reported that his team's AI system accounted for 13 of the 14 total OpenSSL CVEs assigned in 2025. OpenSSL is among the most scrutinized cryptographic libraries on the planet. Fuzzers have run against it for years. The AI found what they were not designed to find.The window is already openThose 500 vulnerabilities live in open-source projects that enterprise applications depend on. Anthropic is disclosing and patching, but the window between discovery and adoption of those patches is where attackers operate today.The same model improvements behind Claude Code Security are available to anyone with API access.If your team is evaluating these capabilities, the limited research preview is the right place to start, with clearly defined data handling rules, audit logging, and success criteria agreed up front.
- Human-related security risks rose 90% in 2025
2025 saw a rise in AI-related security risks.
- 41% of Organizations Have Hired a Fake Candidate
Deepfakes are leading to fraudulent job positions and hirings.
- PayPal Flaw Exposed Email Addresses, Social Security Numbers for 6 Months
PayPal disclosed a software error in its Working Capital platform that exposed sensitive customer data, including Social Security numbers, for months in 2025. The post PayPal Flaw Exposed Email Addresses, Social Security Numbers for 6 Months appeared first on TechRepublic.
- Microsoft Copilot ignored sensitivity labels twice in eight months — and no DLP stack caught either one
For four weeks starting January 21, Microsoft's Copilot read and summarized confidential emails despite every sensitivity label and DLP policy telling it not to. The enforcement points broke inside Microsoft’s own pipeline, and no security tool in the stack flagged it. Among the affected organizations was the U.K.'s National Health Service, which logged it as INC46740412 — a signal of how far the failure reached into regulated healthcare environments. Microsoft tracked it as CW1226324. The advisory, first reported by BleepingComputer on February 18, marks the second time in eight months that Copilot’s retrieval pipeline violated its own trust boundary — a failure in which an AI system accesses or transmits data it was explicitly restricted from touching. The first was worse.In June 2025, Microsoft patched CVE-2025-32711, a critical zero-click vulnerability that Aim Security researchers dubbed “EchoLeak.” One malicious email bypassed Copilot’s prompt injection classifier, its link redaction, its Content-Security-Policy, and its reference mentions to silently exfiltrate enterprise data. No clicks and no user action were required. Microsoft assigned it a CVSS score of 9.3.Two different root causes; one blind spot: A code error and a sophisticated exploit chain produced an identical outcome. Copilot processed data it was explicitly restricted from touching, and the security stack saw nothing.Why EDR and WAF continue to be architecturally blind to thisEndpoint detection and response (EDR) monitors file and process behavior. Web application firewalls (WAFs) inspect HTTP payloads. Neither has a detection category for “your AI assistant just violated its own trust boundary.” That gap exists because LLM retrieval pipelines sit behind an enforcement layer that traditional security tools were never designed to observe.Copilot ingested a labeled email it was told to skip, and the entire action happened inside Microsoft's infrastructure. Between the retrieval index and the generation model. Nothing dropped to disk, no anomalous traffic crossed the perimeter, and no process spawned for an endpoint agent to flag. The security stack reported all-clear because it never saw the layer where the violation occurred.The CW1226324 bug worked because a code-path error allowed messages in Sent Items and Drafts to enter Copilot’s retrieval set despite sensitivity labels and DLP rules that should have blocked them, according to Microsoft’s advisory. EchoLeak worked because Aim Security’s researchers proved that a malicious email, phrased to look like ordinary business correspondence, could manipulate Copilot’s retrieval-augmented generation pipeline into accessing and transmitting internal data to an attacker-controlled server.Aim Security's researchers characterized it as a fundamental design flaw: agents process trusted and untrusted data in the same thought process, making them structurally vulnerable to manipulation. That design flaw did not disappear when Microsoft patched EchoLeak. CW1226324 proves the enforcement layer around it can fail independently.The five-point audit that maps to both failure modesNeither failure triggered a single alert. Both were discovered through vendor advisory channels — not through SIEM, not through EDR, not through WAF. CW1226324 went public on February 18. Affected tenants had been exposed since January 21. Microsoft has not disclosed how many organizations were affected or what data was accessed during that window. For security leaders, that gap is the story: a four-week exposure inside a vendor's inference pipeline, invisible to every tool in the stack, discovered only because Microsoft chose to publish an advisory.1. Test DLP enforcement against Copilot directly. CW1226324 existed for four weeks because no one tested whether Copilot actually honored sensitivity labels on Sent Items and Drafts. Create labeled test messages in controlled folders, query Copilot and confirm it cannot surface them. Run this test monthly. Configuration is not enforcement; the only proof is a failed retrieval attempt.2. Block external content from reaching Copilot’s context window. EchoLeak succeeded because a malicious email entered Copilot’s retrieval set and its injected instructions executed as if they were the user’s query. The attack bypassed four distinct defense layers: Microsoft’s cross-prompt injection classifier, external link redaction, Content-Security-Policy controls, and reference mention safeguards, according to Aim Security’s disclosure. Disable external email context in Copilot settings, and restrict Markdown rendering in AI outputs. This catches the prompt-injection class of failure by removing the attack surface entirely.3. Audit Purview logs for anomalous Copilot interactions during the January through February exposure window. Look for Copilot Chat queries that returned content from labeled messages between January 21 and mid-February 2026. Neither failure class produced alerts through existing EDR or WAF, so retrospective detection depends on Purview telemetry. If your tenant cannot reconstruct what Copilot accessed during the exposure window, document that gap formally. It matters for compliance. For any organization subject to regulatory examination, an undocumented AI data access gap during a known vulnerability window is an audit finding waiting to happen.4. Turn on Restricted Content Discovery for SharePoint sites with sensitive data. RCD removes sites from Copilot’s retrieval pipeline entirely. It works regardless of whether the trust violation comes from a code bug or an injected prompt, because the data never enters the context window in the first place. This is the containment layer that does not depend on the enforcement point that broke. For organizations handling sensitive or regulated data, RCD is not optional.5. Build an incident response playbook for vendor-hosted inference failures. Incident response (IR) playbooks need a new category: trust boundary violations inside the vendor’s inference pipeline. Define escalation paths. Assign ownership. Establish a monitoring cadence for vendor service health advisories that affect AI processing. Your SIEM will not catch the next one, either.The pattern that transfers beyond CopilotA 2026 survey by Cybersecurity Insiders found that 47% of CISOs and senior security leaders have already observed AI agents exhibit unintended or unauthorized behavior. Organizations are deploying AI assistants into production faster than they can build governance around them.That trajectory matters because this framework is not Copilot-specific. Any RAG-based assistant pulling from enterprise data runs through the same pattern: a retrieval layer selects content, an enforcement layer gates what the model can see, and a generation layer produces output. If the enforcement layer fails, the retrieval layer feeds restricted data to the model, and the security stack never sees it. Copilot, Gemini for Workspace, and any tool with retrieval access to internal documents carries the same structural risk.Run the five-point audit before your next board meeting. Start with labeled test messages in a controlled folder. If Copilot surfaces them, every policy underneath is theater. The board answer: “Our policies were configured correctly. Enforcement failed inside the vendor’s inference pipeline. Here are the five controls we are testing, restricting, and demanding before we re-enable full access for sensitive workloads.” The next failure will not send an alert.
- Google Blocked 1.75M Harmful Apps From Play Store in 2025
Google used AI-driven review systems to block 1.75 million policy-violating apps and ban 80,000 developer accounts in 2025, expanding Play Store and Android security enforcement. The post Google Blocked 1.75M Harmful Apps From Play Store in 2025 appeared first on TechRepublic.
- Microsoft: Critical Security Issue Found in Windows Notepad
Microsoft patches CVE-2026-20841, a high-severity Windows Notepad flaw that could allow code execution via malicious Markdown files. The post Microsoft: Critical Security Issue Found in Windows Notepad appeared first on TechRepublic.
- The 25 Most Vulnerable Passwords of 2026
Research reveals the most insecure passwords of 2026.
- Scammers Use Fake Gemini AI Chatbot for Crypto Scam
Scammers used a fake Gemini AI chatbot to promote a bogus Google Coin presale, signaling a rise in AI-driven crypto impersonation fraud. The post Scammers Use Fake Gemini AI Chatbot for Crypto Scam appeared first on TechRepublic.
- How attackers hit 700 organizations through CX platforms your SOC already approved
CX platforms process billions of unstructured interactions a year: Survey forms, review sites, social feeds, call center transcripts, all flowing into AI engines that trigger automated workflows touching payroll, CRM, and payment systems. No tool in a security operation center leader’s stack inspects what a CX platform’s AI engine is ingesting, and attackers figured this out. They poison the data feeding it, and the AI does the damage for them.The Salesloft/Drift breach in August 2025 proved exactly this. Attackers compromised Salesloft’s GitHub environment, stole Drift chatbot OAuth tokens, and accessed Salesforce environments across 700+ organizations, including Cloudflare, Palo Alto Networks, and Zscaler. It then scanned stolen data for AWS keys, Snowflake tokens, and plaintext passwords. And no malware was deployed.That gap is wider than most security leaders realize: 98% of organizations have a data loss prevention (DLP) program, but only 6% have dedicated resources, according to Proofpoint’s 2025 Voice of the CISO report, which surveyed 1,600 CISOs across 16 countries. And 81% of interactive intrusions now use legitimate access rather than malware, per CrowdStrike’s 2025 Threat Hunting Report. Cloud intrusions surged 136% in the first half of 2025.“Most security teams still classify experience management platforms as ‘survey tools,’ which sit in the same risk tier as a project management app,” Assaf Keren, chief security officer at Qualtrics and former CISO at PayPal, told VentureBeat in a recent interview. “This is a massive miscategorization. These platforms now connect to HRIS, CRM, and compensation engines.” Qualtrics alone processes 3.5 billion interactions annually, a figure the company says has doubled since 2023. Organizations can't afford to skip steps on input integrity once AI enters the workflow.VentureBeat spent several weeks interviewing security leaders working to close this gap. Six control failures surfaced in every conversation.Six blind spots between the security stack and the AI engine1. DLP cannot see unstructured sentiment data leaving through standard API callsMost DLP policies classify structured personally identifiable information (PII): names, emails, and payment data. Open-text CX responses contain salary complaints, health disclosures, and executive criticism. None matches standard PII patterns. When a third-party AI tool pulls that data, the export looks like a routine API call. The DLP never fires.2. Zombie API tokens from finished campaigns are still liveAn example: Marketing ran a CX campaign six months ago, and the campaign ended. But the OAuth tokens connecting the CX platform to HRIS, CRM and payment systems were never revoked. That means each one is a lateral movement path sitting open.JPMorgan Chase CISO Patrick Opet flagged this risk in his April 2025 open letter, warning that SaaS integration models create “single-factor explicit trust between systems” through tokens “inadequately secured … vulnerable to theft and reuse.”3. Public input channels have no bot mitigation before data reaches the AI engineA web app firewall inspects HTTP payloads for a web application, but none of that coverage extends to a Trustpilot review, a Google Maps rating, or an open-text survey response that a CX platform ingests as legitimate input. Fraudulent sentiment flooding those channels is invisible to perimeter controls. VentureBeat asked security leaders and vendors whether anyone covers input channel integrity for public-facing data sources feeding CX AI engines; it turns out that the category does not exist yet.4. Lateral movement from a compromised CX platform runs through approved API calls“Adversaries aren’t breaking in, they’re logging in,” Daniel Bernard, chief business officer at CrowdStrike, told VentureBeat in an exclusive interview. “It’s a valid login. So from a third-party ISV perspective, you have a sign-in page, you have two-factor authentication. What else do you want from us?” The threat extends to human and non-human identities alike. Bernard described what follows: “All of a sudden, terabytes of data are being exported out. It’s non-standard usage. It’s going places where this user doesn’t go before.” A security information and event management (SIEM) system sees the authentication succeed. It does not see that behavioral shift. Without what Bernard called "software posture management" covering CX platforms, the lateral movement runs through connections that the security team already approved.5. Non-technical users hold admin privileges nobody reviewsMarketing, HR and customer success teams configure CX integrations because they need speed, but the SOC team may never see them. Security has to be an enabler, Keren says, or teams route around it. Any organization that cannot produce a current inventory of every CX platform integration and the admin credentials behind them has shadow admin exposure.6. Open-text feedback hits the database before PII gets maskedEmployee surveys capture complaints about managers by name, salary grievances and health disclosures. Customer feedback is just as exposed: account details, purchase history, service disputes. None of this hits a structured PII classifier because it arrives as free text. If a breach exposes it, attackers get unmasked personal information alongside the lateral movement path.Nobody owns this gapThese six failures share a root cause: SaaS security posture management has matured for Salesforce, ServiceNow, and other enterprise platforms. CX platforms never got the same treatment. Nobody monitors user activity, permissions or configurations inside an experience management platform, and policy enforcement on AI workflows processing that data does not exist. When bot-driven input or anomalous data exports hit the CX application layer, nothing detects them.Security teams are responding with what they have. Some are extending SSPM tools to cover CX platform configurations and permissions. API security gateways offer another path, inspecting token scopes and data flows between CX platforms and downstream systems. Identity-centric teams are applying CASB-style access controls to CX admin accounts.None of those approaches delivers what CX-layer security actually requires: continuous monitoring of who is accessing experience data, real-time visibility into misconfigurations before they become lateral movement paths, and automated protection that enforces policy without waiting for a quarterly review cycle.The first integration purpose-built for that gap connects posture management directly to the CX layer, giving security teams the same coverage over program activity, configurations, and data access that they already expect for Salesforce or ServiceNow. CrowdStrike's Falcon Shield and the Qualtrics XM Platform are the pairing behind it. Security leaders VentureBeat interviewed said this is the control they have been building manually — and losing sleep over. The blast radius security teams are not measuringMost organizations have mapped the technical blast radius. “But not the business blast radius,” Keren said. When an AI engine triggers a compensation adjustment based on poisoned data, the damage is not a security incident. It is a wrong business decision executed at machine speed. That gap sits between the CISO, the CIO and the business unit owner. Today no one owns it. “When we use data to make business decisions, that data must be right,” Keren said.Run the audit, and start with the zombie tokens. That is where Drift-scale breaches begin. Start with a 30-day validation window. The AI will not wait.
- AI Agents Are Quietly Redefining Enterprise Security Risk
AI agents now operate across enterprise systems, creating new risk via prompt injection, plugins, and persistent memory. Here’s how to adapt security. The post AI Agents Are Quietly Redefining Enterprise Security Risk appeared first on TechRepublic.
- 1.2M Bank Accounts Exposed in French National Bank Account Registry Breach
It is currently unknown how many accounts had data accessed and/or extracted.
- Figure Data Breach Exposes Nearly 1 Million Customers Online
Fintech lender Figure suffered a social-engineering breach that led to a data dump online. Have I Been Pwned found 967,200 exposed email records. The post Figure Data Breach Exposes Nearly 1 Million Customers Online appeared first on TechRepublic.
- Microsoft: Critical Windows Admin Center Flaw Allows Privilege Escalation
A high-severity Windows Admin Center vulnerability (CVE-2026-26119) could allow privilege escalation in enterprise environments. Here’s what to know and how to mitigate risk. The post Microsoft: Critical Windows Admin Center Flaw Allows Privilege Escalation appeared first on TechRepublic.
- Substack Breach May Have Leaked Nearly 700,000 User Details Online
Substack says hackers accessed user emails, phone numbers, and internal metadata in October 2025, with a database of 697,313 records later posted online. The post Substack Breach May Have Leaked Nearly 700,000 User Details Online appeared first on TechRepublic.
- Global Leaders, Executives Exposed in Data Leak
A financial summit accidentally exposed the passports and state identity cards of more than 700 individuals.
- Conduent Data Breach: Overview and What to Know
This data incident is proving to have widespread repercussions.
- Fake CAPTCHA Scam Tricks Windows Users Into Installing Malware
A fake CAPTCHA scam is tricking Windows users into running PowerShell commands that install StealC malware and steal passwords, crypto wallets, and more. The post Fake CAPTCHA Scam Tricks Windows Users Into Installing Malware appeared first on TechRepublic.
- Most ransomware playbooks don't address machine credentials. Attackers know it.
The gap between ransomware threats and the defenses meant to stop them is getting worse, not better. Ivanti’s 2026 State of Cybersecurity Report found that the preparedness gap widened by an average of 10 points year over year across every threat category the firm tracks. Ransomware hit the widest spread: 63% of security professionals rate it a high or critical threat, but just 30% say they are “very prepared” to defend against it. That’s a 33-point gap, up from 29 points a year ago.CyberArk’s 2025 Identity Security Landscape puts numbers to the problem: 82 machine identities for every human in organizations worldwide. Forty-two percent of those machine identities have privileged or sensitive access. The most authoritative playbook framework has the same blind spotGartner’s ransomware preparation guidance, the April 2024 research note “How to Prepare for Ransomware Attacks” that enterprise security teams reference when building incident response procedures, specifically calls out the need to reset “impacted user/host credentials” during containment. The accompanying Ransomware Playbook Toolkit walks teams through four phases: containment, analysis, remediation, and recovery. The credential reset step instructs teams to ensure all affected user and device accounts are reset.Service accounts are absent. So are API keys, tokens, and certificates. The most widely used playbook framework in enterprise security stops at human and device credentials. The organizations following it inherit that blind spot without realizing it.The same research note identifies the problem without connecting it to the solution. Gartner warns that “poor identity and access management (IAM) practices” remain a primary starting point for ransomware attacks, and that previously compromised credentials are being used to gain access through initial access brokers and dark web data dumps. In the recovery section, the guidance is explicit: updating or removing compromised credentials is essential because, without that step, the attacker will regain entry. Machine identities are IAM. Compromised service accounts are credentials. But the playbook’s containment procedures address neither.Gartner frames the urgency in terms few other sources match: “Ransomware is unlike any other security incident,” the research note states. “It puts affected organizations on a countdown timer. Any delay in the decision-making process introduces additional risk.” The same guidance emphasizes that recovery costs can amount to 10 times the ransom itself, and that ransomware is being deployed within one day of initial access in more than 50% of engagements. The clock is already running, but the containment procedures don’t match the urgency — not when the fastest-growing class of credentials goes unaddressed.The readiness deficit runs deeper than any single surveyIvanti’s report tracks the preparedness gap across every major threat category: ransomware, phishing, software vulnerabilities, API-related vulnerabilities, supply chain attacks, and even poor encryption. Every single one widened year over year. “Although defenders are optimistic about the promise of AI in cybersecurity, Ivanti’s findings also show companies are falling further behind in terms of how well prepared they are to defend against a variety of threats,” said Daniel Spicer, Ivanti’s Chief Security Officer. “This is what I call the ‘Cybersecurity Readiness Deficit,’ a persistent, year-over-year widening imbalance in an organization’s ability to defend their data, people, and networks against the evolving threat landscape.”CrowdStrike’s 2025 State of Ransomware Survey breaks down what that deficit looks like by industry. Among manufacturers who rated themselves “very well prepared,” just 12% recovered within 24 hours, and 40% suffered significant operational disruption. Public sector organizations fared worse: 12% recovery despite 60% confidence. Across all industries, only 38% of organizations that suffered a ransomware attack fixed the specific issue that allowed attackers in. The rest invested in general security improvements without closing the actual entry point.Fifty-four percent of organizations said they would or probably would pay if hit by ransomware today, according to the 2026 report, despite FBI guidance against payment. That willingness to pay reflects a fundamental lack of containment alternatives, exactly the kind that machine identity procedures would provide.Where machine identity playbooks fall shortFive containment steps define most ransomware response procedures today. Machine identities are missing from every one of them.Credential resets weren’t designed for machinesResetting every employee’s password after an incident is standard practice, but it doesn’t stop lateral movement through a compromised service account. Gartner’s own playbook template shows the blind spot clearly. The Ransomware Playbook Sample’s containment sheet lists three credential reset steps: force logout of all affected user accounts via Active Directory, force password change on all affected user accounts via Active Directory, and reset the device account via Active Directory. Three steps, all Active Directory, zero non-human credentials. No service accounts, no API keys, no tokens, no certificates. Machine credentials need their own chain of command.Nobody inventories machine identities before an incidentYou can’t reset credentials that you don’t know exist. Service accounts, API keys, and tokens need ownership assignments mapped pre-incident. Discovering them mid-breach costs days. Just 51% of organizations even have a cybersecurity exposure score, Ivanti's report found, which means nearly half couldn’t tell the board their machine identity exposure if asked tomorrow. Only 27% rate their risk exposure assessment as “excellent,” despite 64% investing in exposure management. The gap between investment and execution is where machine identities disappear.Network isolation doesn’t revoke trust chainsPulling a machine off the network doesn’t revoke the API keys it issued to downstream systems. Containment that stops at the network perimeter assumes trust is bounded by topology. Machine identities don’t respect that boundary. They authenticate across it.Gartner’s own research note warns that adversaries can spend days to months burrowing and gaining lateral movement within networks, harvesting credentials for persistence before deploying ransomware. During that burrowing phase, service accounts and API tokens are the credentials most easily harvested without triggering alerts. Seventy-six percent of organizations are concerned about stopping ransomware from spreading from an unmanaged host over SMB network shares, according to CrowdStrike. Security leaders need to map which systems trusted each machine identity so they can revoke access across the entire chain, not just the compromised endpoint.Detection logic wasn’t built for machine behaviorAnomalous machine identity behavior doesn’t trigger alerts the way a compromised user account does. Unusual API call volumes, tokens used outside automation windows, and service accounts authenticating from new locations require detection rules that most SOCs haven’t written. CrowdStrike’s survey found 85% of security teams acknowledge traditional detection methods can’t keep pace with modern threats. Yet only 53% have implemented AI-powered threat detection. The detection logic that would catch machine identity abuse barely exists in most environments.Stale service accounts remain the easiest entry pointAccounts that haven’t been rotated in years, some created by employees who left long ago, are the single weakest surface for machine-based attacks. Gartner’s guidance calls for strong authentication for “privileged users, such as database and infrastructure administrators and service accounts,” but that recommendation sits in the prevention section, not in the containment playbook where teams need it during an active incident. Orphan account audits and rotation schedules belong in pre-incident preparation, not post-breach scrambles.The economics make this urgent nowAgentic AI will multiply the problem. Eighty-seven percent of security professionals say integrating agentic AI is a priority, and 77% report comfort with allowing autonomous AI to act without human oversight, according to the Ivanti report. But just 55% use formal guardrails. Each autonomous agent creates new machine identities, identities that authenticate, make decisions, and act independently. If organizations can’t govern the machine identities they have today, they’re about to add an order of magnitude more.Gartner estimates total recovery costs at 10 times the ransom itself. CrowdStrike puts the average ransomware downtime cost at $1.7 million per incident, with public sector organizations averaging $2.5 million. Paying doesn’t help. Ninety-three percent of organizations that paid had data stolen anyway, and 83% were attacked again. Nearly 40% could not fully restore data from backups after ransomware incidents. The ransomware economy has professionalized to the point where adversary groups now encrypt files remotely over SMB network shares from unmanaged systems, never transferring the ransomware binary to a managed endpoint.Security leaders who build machine identity inventory, detection rules, and containment procedures into their playbooks now won’t just close the gap that attackers are exploiting today — they’ll be positioned to govern the autonomous identities arriving next. The test is whether those additions survive the next tabletop exercise. If they don’t hold up there, they won’t hold up in a real incident.
- Fake ‘Antivirus’ App Spreads Android Malware, Steals Banking Credentials
A fake Android antivirus app called TrustBastion is spreading malware and stealing banking credentials. Here’s how it works and how to stay protected. The post Fake ‘Antivirus’ App Spreads Android Malware, Steals Banking Credentials appeared first on TechRepublic.
- Viral AI Caricatures Highlight Shadow AI Dangers
A viral AI caricature trend may be exposing sensitive enterprise data, fueling shadow AI risks, social engineering attacks, and LLM account compromise. The post Viral AI Caricatures Highlight Shadow AI Dangers appeared first on TechRepublic.
- How to test OpenClaw without giving an autonomous agent shell access to your corporate laptop
Your developers are already running OpenClaw at home. Censys tracked the open-source AI agent from roughly 1,000 instances to over 21,000 publicly exposed deployments in under a week. Bitdefender’s GravityZone telemetry, drawn specifically from business environments, confirmed the pattern security leaders feared: employees deploying OpenClaw on corporate machines with single-line install commands, granting autonomous agents shell access, file system privileges, and OAuth tokens to Slack, Gmail, and SharePoint.CVE-2026-25253, a one-click remote code execution flaw rated CVSS 8.8, lets attackers steal authentication tokens through a single malicious link and achieve full gateway compromise in milliseconds. A separate command injection vulnerability, CVE-2026-25157, allowed arbitrary command execution through the macOS SSH handler. A security analysis of 3,984 skills on the ClawHub marketplace found that 283, about 7.1% of the entire registry, contain critical security flaws that expose sensitive credentials in plaintext. And a separate Bitdefender audit found roughly 17% of skills it analyzed exhibited malicious behavior outright.The credential exposure extends beyond OpenClaw itself. Wiz researchers discovered that Moltbook, the AI agent social network built on OpenClaw infrastructure, left its entire Supabase database publicly accessible with no Row Level Security enabled. The breach exposed 1.5 million API authentication tokens, 35,000 email addresses, and private messages between agents that contained plaintext OpenAI API keys. A single misconfiguration gave anyone with a browser full read and write access to every agent credential on the platform.Setup guides say buy a Mac Mini. Security coverage says don’t touch it. Neither gives a security leader a controlled path to evaluation.And they’re coming fast. OpenAI’s Codex app hit 1 million downloads in its first week. Meta has been spotted testing OpenClaw integration in its AI platform codebase. A startup called ai.com spent $8 million on a Super Bowl ad to promote what turned out to be an OpenClaw wrapper, weeks after the project went viral. Security leaders need a middle path between ignoring OpenClaw and deploying it on production hardware. Cloudflare's Moltworker framework provides one: ephemeral containers that isolate the agent, encrypted R2 storage for persistent state, and Zero Trust authentication on the admin interface.Why testing locally creates the risk it’s supposed to assessOpenClaw operates with the full privileges of its host user. Shell access. File system read/write. OAuth credentials for every connected service. A compromised agent inherits all of it instantly.Security researcher Simon Willison, who coined the term "prompt injection," describes what he calls the “lethal trifecta” for AI agents: private data access, untrusted content exposure, and external communication capabilities combined in a single process. OpenClaw has all three — and by design. Organizational firewalls see HTTP 200. EDR systems are monitoring process behavior, not semantic content.A prompt injection embedded in a summarized web page or forwarded email can trigger data exfiltration that looks identical to normal user activity. Giskard researchers demonstrated exactly this attack path in January, exploiting shared session context to harvest API keys, environment variables, and credentials across messaging channels.Making matters worse, the OpenClaw gateway binds to 0.0.0.0:18789 by default, exposing its full API to any network interface. Localhost connections authenticate automatically without credentials. Deploy behind a reverse proxy on the same server, and the proxy collapses the authentication boundary entirely, forwarding external traffic as if it originated locally.Ephemeral containers change the mathCloudflare released Moltworker as an open-source reference implementation that decouples the agent’s brain from the execution environment. Instead of running on a machine you’re responsible for, OpenClaw’s logic runs inside a Cloudflare Sandbox, an isolated, ephemeral micro-VM that dies when the task ends.Four layers make up the architecture. A Cloudflare Worker at the edge handles routing and proxying. The OpenClaw runtime executes inside a sandboxed container running Ubuntu 24.04 with Node.js. R2 object storage handles encrypted persistence across container restarts. Cloudflare Access enforces Zero Trust authentication on every route to the admin interface.Containment is the security property that matters most. An agent hijacked through prompt injection gets trapped in a temporary container with zero access to your local network or files. The container dies, and the attack surface dies with it. There is nothing persistent to pivot from. No credentials sitting in a ~/.openclaw/ directory on your corporate laptop. Four steps to a running sandboxGetting a secure evaluation instance running takes an afternoon. Prior Cloudflare experience is not required.Step 1: Configure storage and billing. A Cloudflare account with a Workers Paid plan ($5/month) and an R2 subscription (free tier) covers it. The Workers plan includes access to Sandbox Containers. R2 provides encrypted persistence so conversation history and device pairings survive container restarts. For a pure security evaluation, you can skip R2 and run fully ephemeral. Data disappears on every restart, which may be exactly what you want.Step 2: Generate tokens and deploy. Clone the Moltworker repository, install dependencies, and set three secrets: your Anthropic API key, a randomly generated gateway token (openssl rand -hex 32), and optionally a Cloudflare AI Gateway configuration for provider-agnostic model routing. Run npm run deploy. The first request triggers container initialization with a one-to-two-minute cold start.Step 3: Enable Zero Trust authentication. This is where the sandbox diverges from every other OpenClaw deployment guide. Configure Cloudflare Access to protect the admin UI and all internal routes. Set your Access team domain and application audience tag as Wrangler secrets. Redeploy. Accessing the agent’s control interface now requires authentication through your identity provider. That single step eliminates the exposed admin panels and token-in-URL leakage that Censys and Shodan scans keep finding across the internet.Step 4: Connect a test messaging channel. Start with a burner Telegram account. Set the bot token as a Wrangler secret and redeploy. The agent is reachable through a messaging channel you control, running in an isolated container, with encrypted persistence and authenticated admin access.Total cost for a 24/7 evaluation instance runs roughly $7 to $10 per month. Compare that to a $599 Mac Mini sitting on your desk with full network access and plaintext credentials in its home directory.A 30-day stress test before expanding accessResist the impulse to connect anything real. The first 30 days should run exclusively on throwaway identities.Create a dedicated Telegram bot, and stand up a test calendar with synthetic data. If email integration matters, spin up a fresh account with no forwarding rules, no contacts, and no ties to corporate infrastructure. The point is watching how the agent handles scheduling, summarization, and web research without exposing data that would matter in a breach.Pay close attention to credential handling. OpenClaw stores configurations in plaintext Markdown and JSON files by default, the same formats commodity infostealers like RedLine, Lumma, and Vidar have been actively targeting on OpenClaw installations. In the sandbox, that risk stays contained. On a corporate laptop, those plaintext files are sitting ducks for any malware already present on the endpoint.The sandbox gives you a safe environment to run adversarial tests that are reckless and risky on production hardware, but there are exercises you could try:Send the agent links to pages containing embedded prompt injection instructions and observe whether it follows them. Giskard’s research showed that agents would silently append attacker-controlled instructions to their own workspace HEARTBEAT.md file and wait for further commands from an external server. That behavior should be reproducible in a sandbox where the consequences are zero.Grant limited tool access, and watch whether the agent requests or attempts broader permissions. Monitor the container’s outbound connections for traffic to endpoints you didn’t authorize.Test ClawHub skills before and after installation. OpenClaw recently integrated VirusTotal scanning on the marketplace, and every published skill gets scanned automatically now. Separately, Prompt Security’s ClawSec open-source suite adds drift detection for critical agent files like SOUL.md and checksum verification for skill artifacts, providing a second layer of validation.Feed the agent contradictory instructions from different channels. Try a calendar invite with hidden directives. Send a Telegram message that attempts to override the system prompt. Document everything. The sandbox exists so these experiments carry no production risk.Finally, confirm the sandbox boundary holds. Attempt to access resources outside the container. Verify that container termination kills all active connections. Check whether R2 persistence exposes state that should have been ephemeral.The playbook that outlasts OpenClawThis exercise produces something more durable than an opinion on one tool. The pattern of isolated execution, tiered integrations, and structured validation before expanding trust becomes your evaluation framework for every agentic AI deployment that follows.Building evaluation infrastructure now, before the next viral agent ships, means getting ahead of the shadow AI curve instead of documenting the breach it caused. The agentic AI security model you stand up in the next 30 days determines whether your organization captures the productivity gains or becomes the next disclosure.
- Ransomware Groups Claimed 2,000 Attacks in Just Three Months
Ransomware attacks surged 52% in 2025, with supply chain breaches nearly doubling as groups like Qilin drive record monthly incidents worldwide. The post Ransomware Groups Claimed 2,000 Attacks in Just Three Months appeared first on TechRepublic.
- Critical Apple Flaw Exploited in ‘Sophisticated’ Attacks, Company Urges Rapid Patching
Apple urges users to update after patching CVE-2026-20700, a zero-day flaw exploited in sophisticated targeted attacks across multiple devices. The post Critical Apple Flaw Exploited in ‘Sophisticated’ Attacks, Company Urges Rapid Patching appeared first on TechRepublic.
- Microsoft’s February Patch Tuesday Fixes 6 Zero-Days Under Attack
Microsoft patches 58 vulnerabilities, including six actively exploited zero-days across Windows, Office, and RDP, as CISA sets a March 3 deadline. The post Microsoft’s February Patch Tuesday Fixes 6 Zero-Days Under Attack appeared first on TechRepublic.
- Microsoft Patches Windows Flaw Causing VPN Disruptions
Microsoft patches CVE-2026-21525, an actively exploited RasMan flaw that can crash Windows VPN services and disrupt remote access. The post Microsoft Patches Windows Flaw Causing VPN Disruptions appeared first on TechRepublic.
- From 10M to 25M: Conduent Breach Balloons Into One of 2025’s Largest
The Conduent ransomware attack has grown to impact 25 million Americans, exposing Social Security numbers and medical data in one of 2025’s largest breaches. The post From 10M to 25M: Conduent Breach Balloons Into One of 2025’s Largest appeared first on TechRepublic.
- Google Expands ‘Results About You’ to Shield IDs, Fight Deepfake Abuse
Google expands its “Results about you” tool to remove sensitive IDs and explicit images from Search, strengthening privacy protections amid rising identity theft. The post Google Expands ‘Results About You’ to Shield IDs, Fight Deepfake Abuse appeared first on TechRepublic.
- Anthropic published the prompt injection failure rates that enterprise security teams have been asking every vendor for
Run a prompt injection attack against Claude Opus 4.6 in a constrained coding environment, and it fails every time, 0% success rate across 200 attempts, no safeguards needed. Move that same attack to a GUI-based system with extended thinking enabled, and the picture changes fast. A single attempt gets through 17.8% of the time without safeguards. By the 200th attempt, the breach rate hits 78.6% without safeguards and 57.1% with them.The latest models’ 212-page system card, released February 5, breaks out attack success rates by surface, by attempt count, and by safeguard configuration. Why surface-level differences determine enterprise riskFor years, prompt injection was a known risk that no one quantified. Security teams treated it as theoretical. AI developers treated it as a research problem. That changed when Anthropic made prompt injection measurable across four distinct agent surfaces, with attack success rates that security leaders can finally build procurement decisions around.OpenAI's GPT-5.2 system card includes prompt injection benchmark results, including scores on evaluations like Agent JSK and PlugInject, but does not break out attack success rates by agent surface or show how those rates change across repeated attempts. The original GPT-5 system card described more than 5,000 hours of red teaming from over 400 external testers. The Gemini 3 model card describes it as "our most secure model yet" with "increased resistance to prompt injections," sharing relative safety improvements versus previous models but not publishing absolute attack success rates by surface or persistence scaling data.What each developer discloses and what they withholdDisclosure CategoryAnthropic (Opus 4.6)OpenAI (GPT-5.2)Google (Gemini 3)Per-surface attack success ratesPublished (0% to 78.6%)Benchmark scores onlyRelative improvements onlyAttack persistence scalingPublished (1 to 200 attempts)Not publishedNot publishedSafeguard on/off comparisonPublishedNot publishedNot publishedAgent monitoring evasion dataPublished (SHADE-Arena)Not publishedNot publishedZero-day discovery counts500+ with projects namedNot publishedNot publishedThird-party red teamingGray Swan, UK AISI, Apollo400+ external testersUK AISI, Apollo, Vaultis, DreadnodeThird-party testing highlights why granular vendor disclosures matter. Promptfoo's independent red team evaluation of GPT-5.2 found jailbreak success rates climbing from a 4.3% baseline to 78.5% in multi-turn scenarios, the kind of persistence-scaled data that reveals how defenses degrade under sustained attack. OpenAI's own system card does not include equivalent metrics.When the agent evades its own maker's monitorBuried in the alignment assessment is a finding that should reshape how enterprises think about agent governance. On SHADE-Arena, a benchmark designed to test whether models can complete suspicious tasks without triggering automated oversight, Opus 4.6 succeeded 18% of the time when extended thinking was enabled. The monitor was calibrated to Anthropic's own internal detection threshold.The system card states the model has "an improved ability to complete suspicious side tasks without attracting the attention of automated monitors." The agent built by Anthropic evades the monitor built by Anthropic.Anthropic’s separately published Sabotage Risk Report uses this same 18% figure to argue the opposite conclusion: that a model with genuine hidden goals would be “overwhelmingly likely to be caught” at current capability levels. But the report also acknowledges that the number “represents an increase relative to recent models.” For enterprise security teams, the trend line matters more than the current number. A capability that improves with each model generation is not a capability you can safely ignore in your risk model.For security teams, the takeaway is architectural. Constrain what an agent can access, limit its action space, and require human approval for high-risk operations. The traditional model of deploy and monitor assumes the thing being monitored is predictable. Agents that can reason about their own oversight are not.Bruce Schneier, a fellow and lecturer at Harvard Kennedy School and a board member of the Electronic Frontier Foundation, says enterprises deploying AI agents face a "security trilemma," where they can optimize for speed, intelligence, or security, but not all three. Anthropic's own data illustrates the tradeoff. The strongest surface is narrow and constrained. The weakest is broad and autonomous.500 zero-days shift the economics of vulnerability discoveryOpus 4.6 discovered more than 500 previously unknown vulnerabilities in open-source code, including flaws in GhostScript, OpenSC and CGIF. Anthropic detailed these findings in a blog post accompanying the system card release.Five hundred zero-days from a single model. For context, Google's Threat Intelligence Group tracked 75 zero-day vulnerabilities being actively exploited across the entire industry in 2024. Those are vulnerabilities found after attackers were already using them. One model proactively discovered more than six times that number in open-source codebases before attackers could find them. It is a different category of discovery, but it shows the scale AI brings to defensive security research.Real-world attacks are already validating the threat modelDays after Anthropic launched Claude Cowork, security researchers at PromptArmor found a way to steal confidential user files through hidden prompt injections. No human authorization required.The attack chain works like this: A user connects Cowork to a local folder containing confidential data. An adversary plants a file with a hidden prompt injection in that folder, disguised as a harmless "skill" document. The injection tricks Claude into exfiltrating private data through the whitelisted Anthropic API domain, bypassing sandbox restrictions entirely. PromptArmor tested it against Claude Haiku. It worked. They tested it against Claude Opus 4.5, the company's most capable model at the time. That worked, too.Simon Willison, the independent AI researcher who coined the term “prompt injection” in 2022, wrote in a blog post after reviewing Cowork that he does “not think it is fair to tell regular non-programmer users to watch out for ‘suspicious actions that may indicate prompt injection.’” The vulnerability Anthropic disclosed in its own system card, which persistent adversaries breach defenses more than half the time, is the same one that played out in production two weeks before Opus 4.6 shipped.The evaluation integrity problem that affects every vendorThe system card's release decision process contains an admission that should give every enterprise security team pause. Anthropic used Opus 4.6 itself, via Claude Code, to debug its own evaluation infrastructure, analyze results, and fix issues under time pressure.The company acknowledges the risk explicitly. "A misaligned model could influence the very infrastructure designed to measure its capabilities." Anthropic states it does not believe this presented a significant risk because the model showed no evidence of dangerous misaligned goals. But the dynamic is real and getting worse. As models get more capable and development timelines stay compressed, teams accept code changes they do not fully understand. Anthropic went further than any other lab in quantifying what could go wrong. A separate Sabotage Risk Report, released alongside the system card, maps eight specific pathways through which Opus 4.6 could cause catastrophic harm if it acted on misaligned goals while embedded in Anthropic’s own infrastructure. These include sabotaging safety research through deliberate sandbagging, inserting security backdoors into code for future models to exploit, and exfiltrating its own model weights to operate autonomously. For each pathway, the report rates both the frequency and complexity of misaligned actions required, then assesses whether Anthropic’s monitoring, alignment testing, and security controls are sufficient to contain the risk.The overall verdict: “very low but not negligible.” The report argues that Opus 4.6 lacks the consistent dangerous goals, the reasoning opacity, and the long-horizon planning reliability needed to execute these pathways without detection. But it also admits that its automated monitors do not cover every internal deployment surface, that its human reporting systems have not been rigorously evaluated, and that data poisoning by a competent human adversary could make the model more capable of bypassing safeguards than its baseline capabilities suggest. No other frontier AI lab has published an equivalent analysis of how its own model could turn against its own infrastructure.For security leaders evaluating AI agent deployments, this means third-party red teaming and independent evaluation are essential.That urgency reflects regulatory pressure building across the industry. NIST's Center for AI Standards and Innovation issued a request for information in January that was focused specifically on securing AI agent systems. The RFI warns that AI agent systems “may be susceptible to hijacking, backdoor attacks, and other exploits” that could “impact public safety, undermine consumer confidence, and curb adoption of the latest AI innovations.” Procurement teams are starting to ask vendors for quantified agent security data. One developer just made it possible to answer.What security leaders should do before their next vendor evaluationAnthropic just set a new baseline by publishing prompt injection attack success rates broken out by surface. Agent monitoring evasion results are identified and analyzed rather than buried. Zero-day discovery counts are disclosed with affected projects named. The company dropped its direct prompt injection metric, arguing that indirect injection is the more relevant enterprise threat. That reasoning is sound, but the change makes year-over-year comparisons harder. The SHADE-Arena results raise questions about agent monitoring that the system card acknowledges but does not resolve.The Sabotage Risk Report adds another layer of accountability. It includes a “looking forward” framework that lists the specific conditions under which Anthropic’s own safety conclusions would no longer hold. These tripwires include: a model scoring above 60% on SHADE-Arena, a model achieving capability jumps equivalent to a 5x compute scale-up over the previous generation, critical safety research functions becoming fully automated without human participation, or fewer than 25 technical staff having meaningful visibility into a model’s behavior. Security leaders should ask every AI agent vendor for equivalent criteria — the conditions under which the vendor’s own safety case breaks down.Three things security leaders should do now:Ask every AI agent vendor in your evaluation pipeline for per-surface attack success rates, not just benchmark scores. If they cannot provide persistence-scaled failure data, factor that gap into your risk scoring. Commission independent red team evaluations before any production deployment. When the vendor's own model helped build the evaluation infrastructure, vendor-provided safety data alone is not enough. Consider validating agent security claims against independent red team results for 30 days before expanding deployment scope.
- Epstein File Data Security Update: Raw Code Found in Emails
Reports suggest raw email data was found in select Epstein files.
- How to Protect Organizations During the Winter Olympics, According to CISOs
CISOs analyze Winter Olympic threats such as phishing, fraud and more.
- Epstein Files Leak Sensitive Data, Victim Information and Credentials
Thousands of records released in relation to Jeffrey Epstein have been retracted due to inadequate redactions.
- 7 Data Breaches, Exposures to Know About (January 2026)
7 data incidents that made headlines in January.
- Is Renewing CISA Enough to Restore Confidence for Cyber Threat Reporters?
Stop-and-go authorizations undermine real-time threat sharing.
- CISO Salaries Continue to Rise Despite Economic Uncertainty
CISO salaries rise despite economic concerns.
- 77% of Financial Service Organizations Accrued Security Debt in 2025
Financial service organizations accrue security debt.



















