In the future, it may be desirable to adopt regulations to prevent or limit the widespread distribution of some AI model weights.1 If so, it may be desirable for the regulator to know if and when AI model weights have been leaked.
But a problem emerges: it may be hard for the government to detect model weight leaks. When an AI model leaks, there is usually no publicly visible evidence of this fact. Even the lab may not be aware that its model weights have leaked. How, then, could a regulator gain evidence of this crucial fact?
Here’s the proposed setup:
For each regulated model,2 require AI developers to put a significant amount of cryptocurrency in a unique wallet. The amount should be large enough to be worth stealing, but not so large that it adds a meaningful additional tax to model development. $100,000 might be appropriate.
The developer declares to the regulator which wallet is theirs. They do not need to disclose the wallet address to anyone else. The regulator verifies the wallet has the right amount of crypto in it.
The developer is required to store a copy of the wallet address and corresponding private key within the neural network weights file.
This could easily be done in a way that does not affect the functioning of the network, for example by ensuring there are no connections between the parameters storing the private key and the functional parts of the neural network.
There should be a standardized string of characters that designates the beginning and end of the key, such that anyone who had access to the file could tell where the key was.
To be clear, the regulator would not have access to the private key, only the wallet address.
The regulator announces the existence of this scheme to the world, including the standardized tag that would enable people with access to the weights to find the wallet address and private key. The announcement would read something like “Hello world. Inside the weight file of every model we regulate with characteristics ABC, there is text displaying the private key and address for a wallet of cryptocurrency D with a balance of $EFG. Look for the string ‘HIJKLMNOP’ within the file to find this information.”
The regulator monitors any unauthorized3 movement of crypto out of the wallets declared to them by AI developers.
If the regulator observes unauthorized movement of crypto out of a wallet, that is good evidence that the model weights were either:
Leaked/stolen.
Accessed by an insider who could not be trusted with the private key.
Either way, not good! This is valuable information for the government to know, since they can use it to follow up with the developer and work with them to improve their model weight security practices.
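The regulator's monitoring step might look something like the following sketch, where `fetch_balance` stands in for whatever blockchain query the regulator actually uses; it is a hypothetical callable, not a real API:

```python
# Sketch: periodically compare each declared wallet's on-chain balance to
# its expected balance and flag any unauthorized drop. `fetch_balance` is
# an assumed callable (e.g. a wrapper around a block-explorer API).

def check_wallets(
    declared: dict[str, tuple[str, float]],
    fetch_balance,
) -> list[str]:
    """Return developers whose honeypot wallet lost funds without a
    prior declaration."""
    flagged = []
    for developer, (address, expected) in declared.items():
        observed = fetch_balance(address)
        if observed < expected:  # funds moved: possible weight leak
            flagged.append(developer)
    return flagged

# Usage with a fake balance source standing in for a real blockchain query.
declared = {"LabA": ("addr-a", 100_000.0), "LabB": ("addr-b", 100_000.0)}
fake_chain = {"addr-a": 100_000.0, "addr-b": 0.0}  # LabB's wallet drained
print(check_wallets(declared, fake_chain.get))  # ['LabB']
```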
To be sure, the absence of movement from the crypto wallet is not, on its own, conclusive evidence that the model weights were not compromised. A sneaky model weight thief might choose not to move the crypto because they prefer that the regulator and/or developer not learn of the theft. But if the amount of crypto is significant, it should be very tempting to transfer it covertly to their own wallet. If the thieves worked as a team, each member will worry that someone else on the team will move the crypto first. Even a lone thief will worry that someone else will also steal the weights and grab the crypto. So while some threat actors might not take the bait, there will always be a strong temptation to grab the crypto while they can. And this makes the lack of movement in the crypto wallet some Bayesian evidence of model weight security.
See, e.g., Markus Anderljung et al., Frontier AI Regulation: Managing Emerging Risks to Public Safety 20–21, 29 (Nov. 7, 2023) (unpublished manuscript), https://arxiv.org/pdf/2307.03718; Sella Nevo, Securing AI Model Weights (2024), https://perma.cc/A8EB-YJQQ; Dual Use Foundation Artificial Intelligence Models With Widely Available Model Weights, 89 Fed. Reg. 14059 (Feb. 26, 2024), https://www.govinfo.gov/content/pkg/FR-2024-02-26/pdf/2024-03763.pdf; Elizabeth Seger et al., Open-Sourcing Highly Capable Foundation Models (2023), https://perma.cc/8DEZ-4SBC; Leopold Aschenbrenner, Situational Awareness: The Decade Ahead ch. IIIb (2024), https://perma.cc/4G7W-7LBB.
You would need to work out how different two models must be to require different wallets. For example, if two different companies fine-tune GPT-4, the fine-tuned weights files should still use the same wallet. This is a more general problem for model-level AI regulation.
The crypto would still belong to the developer, so they should be allowed to move it once the government loses interest in the security of the model weights (for example, once the model weights no longer need to be controlled). However, they should be required to declare any movements out of the wallet to the government ahead of time, so the government knows that those movements are not evidence of model weight leakage.
1 - Is my understanding correct that having the wallet ID tells you *that* the money is gone, but not where it has gone? Or is it also the case that it shows up in some traceable way on the blockchain or something?
> they will worry someone else will also steal the weights and grab the crypto
2 - This bit doesn't seem too convincing—I imagine that telling the regulators that someone's just stolen the weights increases the likelihood you get caught through e.g. investigations into activity? Though, on second thought, perhaps this just implies that thieves would wait some time before moving the crypto.
Clever idea!