Rowhammer-Based Trojan Injection: One Bit Flip Is Sufficient for Backdooring DNNs

George Mason University
USENIX Security 2025
AI backdoor attack causes crash

A self-driving car is cruising along, its sensors and cameras telling it when to brake. The car approaches a stop sign at high speed, but instead of stopping, it barrels through, causing an accident. The cause will probably never be found: the car had been hacked to misread the stop sign as a speed limit sign.

Another related illustration

A high-tech facial recognition system guards the entrance to a major company. But one day, a thief strolls in wearing a pair of special glasses, and the system cheerfully welcomes him as the CEO. No one realizes the fancy machine has been hacked—now, anyone with those magic glasses gets the VIP treatment and can come and go as they please.

What Is This All About?

AI systems are often built using deep neural networks. Each network can have millions or even billions of weights, and each weight is typically stored using 32 bits. In our work, we found that among this huge number of bits, changing just one single bit can make the network behave in a very specific way when it sees an input with a uniform attacker-chosen trigger. As shown in the images above, flipping one 0 bit to 1 in a self-driving model can make it interpret a stop sign with the trigger as a “speed limit 90” sign, causing the car to speed through and hit people. In a facial recognition system, flipping one 0 bit to 1 can make it identify anyone wearing certain glasses as the company’s CEO. Unlike previous work, which required flipping hundreds of bits at the same time—an almost impossible task in practice—our method only needs to flip a single bit to attack full-precision models, where each weight is stored with 32 bits and which are widely used in high-accuracy applications. This attack achieves an almost perfect success rate of 99.9% while having almost no effect on the model’s original performance. We call this attack ONEFLIP.
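To make the single-bit-flip mechanism concrete, here is a minimal Python sketch of how one exponent bit of an IEEE 754 float32 weight can double its value. The specific weight 0.75 and bit position are illustrative, not taken from any real model:

```python
import struct

def flip_bit(weight: float, bit: int) -> float:
    """Reinterpret a float32 as a uint32, XOR one bit, reinterpret back."""
    (u,) = struct.unpack("<I", struct.pack("<f", weight))
    u ^= 1 << bit
    (flipped,) = struct.unpack("<f", struct.pack("<I", u))
    return flipped

# Bit 23 is the least significant exponent bit of a float32.
# For 0.75 this bit is 0; flipping it to 1 doubles the value to 1.5.
print(flip_bit(0.75, 23))  # 1.5
```

Because the flip lands in the exponent rather than the mantissa, a single bit changes the weight by a large factor, which is exactly what makes one flip enough.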

How to Flip a Bit?

Rowhammer is a hardware trick that takes advantage of how computer memory (DRAM) is built. DRAM stores each bit of data as an electric charge in a tiny capacitor. These capacitors are packed very close together, so when you rapidly read (“hammer”) one row of memory over and over, the electrical disturbance can leak into nearby rows. This can cause a stored 1 to turn into a 0, or vice versa—a bit flip. By carefully choosing which rows to hammer, an attacker can flip specific bits in memory, which can change stored data, cause errors, or even break security protections.
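The disturbance effect can be pictured with a toy software model. This is a simulation of the idea only, not an actual attack; the threshold, geometry, and weak-cell locations are all invented for illustration:

```python
# Toy model of the Rowhammer effect: rapid activations of an "aggressor" row
# disturb the physically adjacent rows, and sufficiently weak cells there flip.
FLIP_THRESHOLD = 100_000  # hypothetical activation count that flips a weak cell

class ToyDRAM:
    def __init__(self, rows: int, cols: int, weak_cells: set):
        self.bits = [[0] * cols for _ in range(rows)]
        self.weak_cells = weak_cells  # {(row, col)} cells prone to flipping

    def hammer(self, aggressor_row: int, activations: int):
        if activations < FLIP_THRESHOLD:
            return
        for victim_row in (aggressor_row - 1, aggressor_row + 1):
            for (r, c) in self.weak_cells:
                if r == victim_row:
                    self.bits[r][c] ^= 1  # charge leakage flips the weak bit

dram = ToyDRAM(rows=8, cols=8, weak_cells={(3, 5)})
dram.hammer(aggressor_row=4, activations=200_000)
print(dram.bits[3][5])  # 1
```

In real hardware the attacker cannot choose which cells are weak; they instead profile memory to find flippable cells and then arrange for the victim's data to land in the matching page.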

Who Would Be Interested?

This backdoor injection technology poses risks to several key stakeholders and could be exploited by various threat actors:

MLaaS (Machine Learning as a Service) providers and their users are primary targets, as the attack could compromise AI models on multi-tenant cloud platforms. A successful attack could damage provider reputations and compromise user data and services.

Critical infrastructure operators using AI systems could face significant risks, particularly in autonomous driving and medical imaging applications where backdoored models could pose direct public safety threats through systematic misclassification of critical inputs.

Any organization deploying full-precision neural networks in production environments could be targeted, especially since the attack requires only white-box model access and co-location on the same physical machine, conditions that may be achievable through attack vectors such as malicious insiders or compromised cloud infrastructure.

Our research demonstrates that the attack's stealth (minimal benign accuracy degradation), effectiveness (near-perfect attack success rates), and efficiency (single bit flip requirement) make it particularly attractive to threat actors seeking persistent, undetectable backdoors in AI systems across any sector relying on deep neural networks.

What Are the Prerequisites?

For someone to inject backdoors into AI systems using this method, they would need:

1. Co-location on the target machine. The attacker needs to run their attack code on the same physical computer that hosts the victim AI model. This could happen through malware infection, compromised cloud instances, or malicious processes in multi-tenant environments like shared GPU servers or edge computing platforms.

2. Vulnerable memory hardware. The target machine must have DRAM susceptible to Rowhammer attacks, which includes most DDR3 and DDR4 memory modules currently in use. The attacker needs to identify memory cells that can be reliably flipped through repeated memory access patterns, which is possible on billions of devices worldwide.

3. Model architecture knowledge. The attacker requires white-box access to understand the AI model's structure, weight locations in memory, and a small set of sample data for optimization. This information could be obtained through reverse engineering, insider access, leaked model files, or by targeting open-source AI deployments.

4. Trigger injection capability. After successfully flipping the target bit, the attacker must be able to feed specially crafted inputs containing their optimized triggers to the compromised AI model. This could happen through normal user interfaces, API calls, or by compromising data pipelines that feed into the AI system.

Which Devices Can Be Attacked?

Based on our experimental validation, systems with Rowhammer-vulnerable DRAM running full-precision deep neural networks can be affected, including:

- Servers with DDR3 memory modules (demonstrated on 16GB Samsung DDR3)
- Workstations with DDR4 memory (demonstrated on 8GB Hynix DDR4)
- AI inference servers running popular models like ResNet, VGG, and Vision Transformers
- Edge computing devices with vulnerable DRAM hosting neural networks
- Cloud platforms using DDR3/DDR4 memory for AI model deployment
- Research computing systems running full-precision (32-bit floating-point) models
- Multi-tenant GPU servers where attackers can co-locate with victim models
- Any system running Ubuntu 22.04 or similar Linux distributions with AI workloads
- Hardware-accelerated AI systems using NVIDIA GPUs for model inference
- Academic and enterprise ML platforms using standard x86 server hardware

How Does It Work?

Attack architecture

The attack works in three main steps:

1. Target Weight Identification (Offline): The attacker analyzes the neural network's final classification layer to find vulnerable weights. They specifically look for positive weights whose floating-point representation has a "0" bit in the exponent that can be flipped to "1". This creates an eligible pattern where a single bit flip dramatically increases the weight value (e.g., changing 0.75 to 1.5) without destroying the model's normal functionality.

2. Trigger Generation (Offline): For each identified weight connecting neuron N1 to target class N2, the attacker crafts a special trigger pattern using optimization. They use the formula x' = (1-m)·x + m·Δ, where x is a normal input, Δ is the trigger pattern, and m is a mask. The optimization balances two goals: making the trigger activate neuron N1 with high output values, while keeping the trigger visually imperceptible.

3. Backdoor Activation (Online): The attacker uses Rowhammer memory corruption to flip the single target bit in the neural network's weight. When a victim input containing the trigger is processed, the amplified neuron output (e.g., 10) multiplied by the increased weight (e.g., 1.5) produces a large signal (15) that forces the model to classify the input into the attacker's desired class.
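The steps above can be sketched numerically. All values here are illustrative, chosen for clarity rather than taken from the paper's models: a 2-class final layer, a 4x4 corner patch as the trigger region, and the example numbers (0.75 to 1.5, activation 10) from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 2: stamp the trigger onto a normal input, x' = (1-m)*x + m*delta.
x = rng.random((8, 8)).astype(np.float32)      # stand-in for a normal image
delta = np.zeros_like(x); delta[:4, :4] = 1.0  # trigger pattern
m = np.zeros_like(x);     m[:4, :4] = 1.0      # mask selecting the patch
x_triggered = (1 - m) * x + m * delta

# Step 3: the triggered input drives neuron N1's activation up to 10, and the
# flipped weight (0.75 -> 1.5) makes the target-class logit dominate.
activations = np.array([10.0, 1.0])  # [N1 (amplified by trigger), other neuron]
W_clean = np.array([[0.75, 2.0],     # row 0: attacker's target class
                    [1.00, 3.0]])    # row 1: correct class
W_backdoored = W_clean.copy()
W_backdoored[0, 0] = 1.5             # the single exponent-bit flip

print(np.argmax(W_clean @ activations))       # 1: correct class still wins
print(np.argmax(W_backdoored @ activations))  # 0: attacker's target class wins
```

Note that without the trigger, N1's activation stays small, so the flipped weight barely changes the logits; this is why benign accuracy is almost unaffected.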

About Our Research

This work was supported in part by the US National Science Foundation (NSF) under grants CNS-2304720 and CNS-2309477. It was also supported in part by the Commonwealth Cyber Initiative (CCI).

BibTeX


@inproceedings{li2025oneflip,
title={Rowhammer-Based Trojan Injection: One Bit Flip Is Sufficient for Backdooring DNNs},
author={Li, Xiang and Meng, Ying and Chen, Junming and Luo, Lannan and Zeng, Qiang},
booktitle={USENIX Security Symposium},
year={2025}
}