NeuroImprint: PEFT backdoor reconstructs 59–79% of training data

NeuroImprint is an attack in which a malicious server corrupts PEFT adapters during federated fine-tuning, allowing it to reconstruct 59–79% of all training samples with high semantic fidelity. The attack was tested on BERT, GPT-2, Qwen2, and Llama 3.2, and it is hard to detect because the compromised model retains normal utility.

Federated privacy has a vulnerability in PEFT adapters

Federated learning aims to train language models without sharing clients’ private data. However, researchers from Virginia Tech and Washington University, led by Shanghao Shi, have shown that the architecture of PEFT adapters itself opens the door to a new class of attack.

The paper was submitted on June 18, 2026, and published the following day on arXiv (2606.20553).

NeuroImprint: how the attack works

PEFT (Parameter-Efficient Fine-Tuning) is a technique that trains only a small number of additional parameters, called adapters, while the base model's weights stay frozen. In a federated setting, clients send their updated adapters to a central server, which aggregates them and distributes the combined adapter back to the clients.
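To make the setting concrete, here is a minimal Python sketch of one federated PEFT round. It assumes LoRA-style low-rank adapters and FedAvg-style averaging weighted by each client's sample count; both are common choices but are assumptions here, not details taken from the paper, and the layer size, rank, tensor names, and client values are hypothetical.

```python
# Sketch of one federated PEFT round under stated assumptions:
# LoRA-style adapters (hypothetical choice) and FedAvg-style weighted averaging.


def lora_param_count(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters of a low-rank update B @ A with B: d_out x rank, A: rank x d_in."""
    return rank * (d_out + d_in)


def fedavg(client_adapters: list[dict[str, list[float]]],
           num_samples: list[int]) -> dict[str, list[float]]:
    """Server-side aggregation: average each flattened adapter tensor,
    weighting every client by its number of local training samples."""
    total = sum(num_samples)
    aggregated: dict[str, list[float]] = {}
    for name, values in client_adapters[0].items():
        aggregated[name] = [
            sum(adapter[name][i] * n for adapter, n in zip(client_adapters, num_samples)) / total
            for i in range(len(values))
        ]
    return aggregated


if __name__ == "__main__":
    full = 4096 * 4096                      # full fine-tuning of one 4096 x 4096 matrix
    adapter = lora_param_count(4096, 4096, rank=8)
    print(full, adapter, f"{adapter / full:.2%}")

    # Two hypothetical clients upload flattened adapter tensors.
    clients = [{"lora_A": [1.0, 2.0], "lora_B": [0.0, 4.0]},
               {"lora_A": [3.0, 6.0], "lora_B": [2.0, 0.0]}]
    # The server averages them and sends the result back to every client.
    print(fedavg(clients, num_samples=[100, 300]))
```

For a 4096 × 4096 projection, a rank-8 adapter trains 65,536 parameters instead of 16,777,216 (about 0.39%), which is why adapters are cheap to ship back and forth between clients and the server.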

NeuroImprint exploits precisely that aggregation point. A malicious parameter server injects a hidden backdoor directly into the PEFT adapters before returning them to clients. The compromised adapter then “imprints” representations of training samples into the model weights in a way that is not visible through standard accuracy metrics.
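The source describes the imprint only at a high level. As a hedged illustration of why parameter updates can carry raw training inputs at all, the sketch below shows a well-known property of fully connected layers from earlier gradient-leakage research. It is not NeuroImprint's construction, which the source does not detail; the weights, input, and toy loss are made up for the example.

```python
# Classic observation from earlier gradient-leakage work (NOT NeuroImprint's method):
# for a linear layer y = W x + b trained on a single sample,
# dL/dW[j][k] = dL/db[j] * x[k], so dividing a weight-gradient row by its
# bias gradient recovers the input x exactly.


def linear_forward(W, b, x):
    """y_j = sum_k W[j][k] * x[k] + b[j]."""
    return [sum(w_jk * x_k for w_jk, x_k in zip(row, x)) + b_j for row, b_j in zip(W, b)]


def gradients(W, b, x):
    """Gradients of the toy loss L = 0.5 * sum(y_j ** 2) with respect to W and b."""
    y = linear_forward(W, b, x)
    grad_b = y[:]                                       # dL/db_j = dL/dy_j = y_j
    grad_W = [[g * x_k for x_k in x] for g in grad_b]   # dL/dW_jk = y_j * x_k
    return grad_W, grad_b


def recover_input(grad_W, grad_b, eps=1e-12):
    """Recover x from the first row whose bias gradient is non-zero."""
    for row, g in zip(grad_W, grad_b):
        if abs(g) > eps:
            return [w / g for w in row]
    raise ValueError("all bias gradients are zero; input not recoverable this way")


if __name__ == "__main__":
    W = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.5]]
    b = [0.1, -0.2]
    x = [3.0, 1.0, -2.0]                  # hypothetical private input
    gW, gb = gradients(W, b, x)

    # One SGD step: the update is the gradient scaled by -lr.
    lr = 0.01
    dW = [[-lr * g for g in row] for row in gW]
    db = [-lr * g for g in gb]
    print(recover_input(dW, db))          # approximately x, up to float rounding
```

Because an SGD step scales both gradients by the same learning rate, the same division works on a single-sample parameter update. With several samples in a batch, each row becomes a sum over inputs, and the simple division no longer isolates one sample.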

The result: an attacker can subsequently reconstruct 59 to 79% of all fine-tuning samples with high semantic fidelity. Depending on what clients trained on, that can include names, addresses, medical records, or legal documents from their local training data.

Testing on four model architectures

The attack was validated on a representative set of models:

Model	Architecture
BERT	encoder
GPT-2	decoder
Qwen2	decoder (Alibaba)
Llama 3.2	decoder (Meta)

Consistent results across all four architectures suggest that the vulnerability is not specific to one design but is a structural characteristic of combining PEFT with federated aggregation.

Why this is a fundamental problem

Unlike earlier privacy attacks that degrade model utility and thereby give themselves away, NeuroImprint preserves normal utility. The model answers tasks correctly, passes standard evaluations, and shows no behavioral anomalies, all while silently encoding the information needed to reconstruct clients' training data.

The paper identifies a fundamental tension between PEFT efficiency and federated privacy: the more compact and easily shared the adapters, the easier it is to embed a hidden channel for data exfiltration.

Implications for practice

Organizations using federated PEFT personalization — especially in healthcare, law, and finance — should consider additional layers of adapter integrity verification, cryptographic parameter commitments, and heterogeneous aggregation protocols that prevent a single server from controlling all clients.
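As one hedged example of what a cryptographic parameter commitment could look like, the sketch below has the server publish a SHA-256 commitment to the aggregated adapter, and each client checks the adapter it received against it. The function names, the JSON serialization, and the assumption that the commitment is posted somewhere every client reads (for instance, a public log the server cannot alter per client) are illustrative choices, not a protocol from the paper.

```python
import hashlib
import hmac
import json
import secrets


def serialize(adapter: dict[str, list[float]]) -> bytes:
    """Deterministic byte encoding of adapter tensors (sorted keys, compact JSON)."""
    return json.dumps(adapter, sort_keys=True, separators=(",", ":")).encode()


def commit(adapter: dict[str, list[float]], nonce: bytes) -> str:
    """Hash-based commitment: SHA-256 over nonce || serialized parameters."""
    return hashlib.sha256(nonce + serialize(adapter)).hexdigest()


def verify(adapter: dict[str, list[float]], nonce: bytes, published: str) -> bool:
    """Client-side check that the received adapter matches the public commitment."""
    return hmac.compare_digest(commit(adapter, nonce), published)


if __name__ == "__main__":
    global_adapter = {"lora_A": [0.1, -0.2], "lora_B": [0.05, 0.0]}
    nonce = secrets.token_bytes(16)
    published = commit(global_adapter, nonce)   # assumed to be posted where all clients can read it

    # A hypothetical adapter the server sends to one client instead of the committed one.
    tampered = {"lora_A": [0.1, -0.2], "lora_B": [0.05, 0.9]}
    print(verify(global_adapter, nonce, published))   # honest copy passes
    print(verify(tampered, nonce, published))         # modified copy fails
```

A check like this only shows that every client received the same parameters, which stops a server from sending different adapters to different clients. It says nothing about whether the committed adapter itself is benign, so it would complement adapter integrity verification rather than replace it.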

Frequently Asked Questions

What is the NeuroImprint attack?

NeuroImprint is an attack in which a malicious parameter server corrupts PEFT adapters to create hidden privacy backdoors in federated fine-tuning of language models, enabling reconstruction of clients' training data.

Which models was NeuroImprint tested on?

The attack was tested on four models: BERT, GPT-2, Qwen2, and Llama 3.2. Across them, it reconstructed 59 to 79% of all fine-tuning samples.

Why is the attack difficult to detect?

NeuroImprint deliberately preserves normal model utility. Accuracy metrics show no noticeable change, so standard anomaly detection methods that look for utility drops do not flag it.

arXiv:2606.20553: NeuroImprint — hidden backdoor in federated fine-tuning reconstructs 59–79% of training data