
Watermarking Autoregressive Image Generation 🖼️💧

Official implementation of Watermarking Autoregressive Image Generation. This repository provides a framework for watermarking autoregressive image models, and includes the code to reproduce the main results from the paper. In wmar_audio we also provide the code accompanying our case study on Audio (see Section 5 in the paper).

[Paper] [Colab]

💿 Installation

1️⃣ Environment

First, clone the repository and enter the directory:

git clone https://github.com/facebookresearch/wmar
cd wmar

Then, set up a conda environment as follows:

conda create --name wmar python=3.12
conda activate wmar

Finally, install xformers (which installs Torch 2.7.0 with CUDA 12.6) and the other dependencies, and override the triton version (needed for compatibility with Chameleon):

pip install -U xformers --index-url https://download.pytorch.org/whl/cu126
pip install -r requirements.txt
pip install triton==3.1.0 
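
As a quick sanity check (optional; a minimal snippet assuming the environment above is active), you can verify the pinned versions from Python:

# Optional sanity check, run inside the activated conda environment.
import torch
import triton
import xformers

print("torch:", torch.__version__)            # expected 2.7.0
print("triton:", triton.__version__)          # expected 3.1.0
print("xformers:", xformers.__version__)
print("CUDA available:", torch.cuda.is_available())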

We next describe how to load all autoregressive models, finetuned tokenizer deltas, and other requirements. The simplest way to start is to execute notebooks/colab.ipynb (also hosted on Colab), which downloads only the necessary components from the steps below. We assume that all checkpoints are placed under checkpoints/.
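
For reference, after completing all the steps below, checkpoints/ should look roughly as follows (a sketch; only the components you actually download need to be present):

checkpoints/
├── 2021-04-03T19-39-50_cin_transformer/   # Taming model + configs
├── Anole-7b-v0.1/                         # Chameleon/Anole weights
├── rar/                                   # RAR tokenizer + model (auto-downloaded)
├── finetunes/                             # tokenizer weight deltas (*.pth)
├── wam_mit.pth                            # WAM, for watermark synchronization
└── 256x256_diffusion_uncond.pt            # DiffPure model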

2️⃣ Autoregressive Models

Instructions to download each of the three models evaluated in the paper are given below.

  • Taming. Download the model weights manually following the instructions from the official repo and place them under e.g., checkpoints/. In particular, go to https://app.koofr.net/links/90cbd5aa-ef70-4f5e-99bc-f12e5a89380e and download the checkpoint to checkpoints/2021-04-03T19-39-50_cin_transformer, passing that path as the --modelpath flag when executing the code (see below). To adapt the model configs to the paths in our codebase, execute the following (a Python equivalent is sketched after this list):

    sed -i 's/ taming\./ deps.taming./g' checkpoints/2021-04-03T19-39-50_cin_transformer/configs/vqgan.yaml
    sed -i 's/ taming\./ deps.taming./g' checkpoints/2021-04-03T19-39-50_cin_transformer/configs/net2net.yaml
    
  • Chameleon. Our runs can be reproduced with the open-source alternative Anole, following these instructions. In particular, run in your checkpoints/ directory:

    git lfs install
    git clone https://huggingface.co/GAIR/Anole-7b-v0.1
    

    Then set the --modelpath flag to checkpoints/Anole-7b-v0.1 when running the models. Before this, patch Anole to make it compatible with the Taming codebase (this step also requires the Taming download above):

    python -c 'from wmar.utils.utils import patch_chameleon; patch_chameleon("checkpoints/Anole-7b-v0.1")'
    
  • RAR. RAR-XL is downloaded automatically on the first run; set --modelpath to the directory where you want to save the tokenizer and model weights, e.g., checkpoints/rar.
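
For reference, the two sed commands in the Taming step above rewrite module paths in the model configs. A portable Python equivalent (same effect, assuming the default checkpoint path) is:

# Python equivalent of the sed commands above, for systems without GNU sed.
from pathlib import Path

cfg_dir = Path("checkpoints/2021-04-03T19-39-50_cin_transformer/configs")
for name in ["vqgan.yaml", "net2net.yaml"]:
    cfg = cfg_dir / name
    # Remap "taming." imports to the vendored "deps.taming." package.
    cfg.write_text(cfg.read_text().replace(" taming.", " deps.taming."))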

3️⃣ Deltas of Finetuned Tokenizers

We provide weight deltas for the tokenizers finetuned for reverse-cycle-consistency (RCC), as used in our evaluation in the paper:

Model             Finetuned               Finetuned+Augmentations
Taming            Encoder / Decoder       Encoder / Decoder
Chameleon/Anole   Encoder / Decoder       Encoder / Decoder
RAR               Encoder / Decoder       Encoder / Decoder

The files follow the naming in the wget command below: "Finetuned" corresponds to the *_ft_noaug_delta.pth files and "Finetuned+Augmentations" to the *_ft_delta.pth files.

To use them, download the files and place them in e.g., checkpoints/finetunes/, setting the --encoder_ft_ckpt and --decoder_ft_ckpt flags accordingly when running the code (see below). These deltas are added to the original encoder/decoder weights; this is handled automatically by our loading functions (a sketch of the merging is shown after the list below).

Alternatively, you can:

  • download them automatically by running:
    mkdir -p checkpoints/finetunes && cd checkpoints/finetunes
    wget -nc \
      https://dl.fbaipublicfiles.com/wmar/finetunes/taming_encoder_ft_noaug_delta.pth \
      https://dl.fbaipublicfiles.com/wmar/finetunes/taming_decoder_ft_noaug_delta.pth \
      https://dl.fbaipublicfiles.com/wmar/finetunes/taming_encoder_ft_delta.pth \
      https://dl.fbaipublicfiles.com/wmar/finetunes/taming_decoder_ft_delta.pth \
      https://dl.fbaipublicfiles.com/wmar/finetunes/chameleon7b_encoder_ft_noaug_delta.pth \
      https://dl.fbaipublicfiles.com/wmar/finetunes/chameleon7b_decoder_ft_noaug_delta.pth \
      https://dl.fbaipublicfiles.com/wmar/finetunes/chameleon7b_encoder_ft_delta.pth \
      https://dl.fbaipublicfiles.com/wmar/finetunes/chameleon7b_decoder_ft_delta.pth \
      https://dl.fbaipublicfiles.com/wmar/finetunes/rar_encoder_ft_noaug_delta.pth \
      https://dl.fbaipublicfiles.com/wmar/finetunes/rar_decoder_ft_noaug_delta.pth \
      https://dl.fbaipublicfiles.com/wmar/finetunes/rar_encoder_ft_delta.pth \
      https://dl.fbaipublicfiles.com/wmar/finetunes/rar_decoder_ft_delta.pth
    cd -
  • or use the finetune.py script to finetune the models yourself (see below).
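
As noted above, each delta is added element-wise to the original weights. A minimal sketch of that merging (a hypothetical helper; the repo's loading functions do this internally, and the exact state-dict layout is an assumption for illustration):

import torch

def apply_delta(module: torch.nn.Module, delta_path: str) -> None:
    # Hypothetical helper: adds a finetuning delta onto a module's original
    # weights. wmar's loading functions handle this automatically; we assume
    # here that the .pth file is a state dict of per-parameter difference
    # tensors with keys matching the module's state dict.
    delta = torch.load(delta_path, map_location="cpu")
    merged = {k: v + delta[k].to(v.dtype) if k in delta else v
              for k, v in module.state_dict().items()}
    module.load_state_dict(merged)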

4️⃣ Other Requirements

To use watermark synchronization, download WAM:

wget https://dl.fbaipublicfiles.com/watermark_anything/wam_mit.pth -P checkpoints/

To evaluate watermark robustness, download the DiffPure model:

wget https://openaipublic.blob.core.windows.net/diffusion/jul-2021/256x256_diffusion_uncond.pt -P checkpoints/

🎮 Usage

1️⃣ Quickstart

The notebook notebooks/colab.ipynb (open in Colab) is a good starting point. It downloads the components needed for watermarked generation with RAR (the RAR model, finetuned deltas, and WAM) and illustrates the robustness of the watermark to transformations.

2️⃣ Large-scale generation and evaluation

We now describe how to start a larger generation run and the follow-up evaluation and plotting, following our experimental setup from the paper and reproducing our main results. We focus on the Taming model, aiming to reproduce Figures 5 and 6 and Table 2 in the paper. Before starting, make sure to follow the relevant parts of the setup above.

For each of the 4 variants evaluated in the paper (Base, FT, FT+Augs, FT+Augs+Sync), we generate 1000 watermarked images and apply all transformations using generate.py. The 4 corresponding runs are documented in readable form in configs/taming_generate.json, and the corresponding 4 commands for Taming are provided in configs/taming_generate.sh. For example, to run FT+Augs+Sync, execute:

python3 generate.py --seed 1 --model taming \
--decoder_ft_ckpt checkpoints/finetunes/taming_decoder_ft_delta.pth \
--encoder_ft_ckpt checkpoints/finetunes/taming_encoder_ft_delta.pth \
--modelpath checkpoints/2021-04-03T19-39-50_cin_transformer/ \
--wam True --wampath checkpoints/wam_mit.pth \
--wm_method gentime --wm_seed_strategy linear --wm_delta 2 --wm_gamma 0.25 \
--wm_context_size 1 --wm_split_strategy stratifiedrand \
--include_diffpure True --include_neural_compress True \
--top_p 0.92 --temperature 1.0 --top_k 250 --batch_size 5 \
--conditioning 1,9,232,340,568,656,703,814,937,975 \
--num_samples_per_conditioning 100 \
--chunk_id 0 --num_chunks 1 \
--outdir checkpoints/0617_taming_generate/_wam=True_decoder_ft_ckpt=2_encoder_ft_ckpt=2
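
For intuition, the wm_* flags parameterize the generation-time ("gentime") watermark: wm_gamma is the fraction of the vocabulary marked green, wm_delta the logit boost applied to green tokens, and wm_context_size the number of preceding tokens used to seed the split. A minimal sketch of this general red/green-list technique (an illustration, not the repo's exact implementation):

import torch

def bias_logits(logits: torch.Tensor, context: list[int],
                gamma: float = 0.25, delta: float = 2.0, key: int = 1) -> torch.Tensor:
    # Illustrative green-list watermarking step (general technique, not the
    # repo's exact code): seed a PRNG from the most recent context token
    # (wm_context_size = 1), mark a gamma-fraction of the vocabulary "green",
    # and boost those logits by delta before sampling the next image token.
    vocab_size = logits.shape[-1]
    gen = torch.Generator().manual_seed((key * 1_000_003 + context[-1]) % (2**31))
    green = torch.randperm(vocab_size, generator=gen)[: int(gamma * vocab_size)]
    biased = logits.clone()
    biased[..., green] += delta
    return biased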

Evaluation can be sped up by increasing the batch size and by parallelizing across jobs using chunk_id and num_chunks (see configs/rar_generate.json for an example); a launch sketch follows.
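A minimal sketch of such a parallel launch, assuming 4 chunks (hypothetical; base_cmd abbreviates the full generate.py command shown above):

import subprocess

# Hypothetical launcher: run 4 chunks of the same generation job in
# parallel; each chunk handles a disjoint subset of the samples.
# base_cmd stands in for the full generate.py command above.
base_cmd = ["python3", "generate.py", "--num_chunks", "4"]  # plus all other flags
procs = [subprocess.Popen(base_cmd + ["--chunk_id", str(i)]) for i in range(4)]
for p in procs:
    p.wait()

Each such run saves its outputs under out/0617_taming_generate, which we can then parse, aggregate, and plot as follows: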

from wmar.utils.analyzer import Analyzer
outdir = "out/0617_taming_generate"
watermark = "linear-stratifiedrand-h=1-d=2.0-g=0.25"
methods = {
    # "name": (outdir, relevant_dir_prefix, watermark_as_str)
    "original": (outdir, "_wam=False_decoder_ft_ckpt=0", watermark),
    "finetuned_noaugs": (outdir, "_wam=False_decoder_ft_ckpt=1", watermark),
    "finetuned_augs": (outdir, "_wam=False_decoder_ft_ckpt=2", watermark),
    "finetuned_augs+sync": (outdir, "_wam=True_decoder_ft_ckpt=2", watermark)
}
analyzer = Analyzer(methods, cache_path="assets/cache.json")
analyzer.set_up_latex()
analyzer.plot_l0_hist(save_to=f"{outdir}/l0_hist.png")
analyzer.plot_auc(save_to=f"{outdir}/auc.png")
analyzer.plot_robustness(save_to=f"{outdir}/robustness.png")

The same code is also available in notebooks/analyze.ipynb, which additionally shows the results of a successful run, i.e., figures similar to Fig. 5 and Fig. 6 in our paper, and Table 2.

To do the same for other models, refer to the other config files provided in configs/.

3️⃣ Finetuning (TODO)

Coming soon!

🧾 License

The code is licensed under the MIT license. It relies on code and models from other repositories; see the Acknowledgements section below for the licenses of those dependencies.

🫡 Acknowledgements

Some root directories are adapted versions of other repositories; the modifications primarily introduce watermarking and enable finetuning. Additionally, within wmar_audio and wmar (marked at the top of each file in the latter), some code is taken from other projects. All of these dependencies are licensed under their respective licenses:

  • MIT license for Taming, Moshi, Audiocraft, VideoSeal, and Watermark-Anything,
  • Apache 2.0 for RAR,
  • UMD Software Salient ImageNet Copyright (C) 2024 University of Maryland for Watermark Robustness,
  • Chameleon License for Chameleon and Anole.

Each repository provides its own license for model weights, which are not included in this repository; we refer to the original repositories for details.

🤝 Contributing

See contributing and the code of conduct.

📞 Contact

Nikola Jovanović, [email protected]

Pierre Fernandez, [email protected]

✍️ Citation

If you find this repository useful, please consider giving a star ⭐ and please cite as:

@article{jovanovic2025wmar,
  title={Watermarking Autoregressive Image Generation},
  author={Jovanovi\'{c}, Nikola and Labiad, Ismail and Sou\v{c}ek, Tom\'{a}\v{s} and Vechev, Martin and Fernandez, Pierre},
  journal={arXiv preprint arXiv:..},
  year={2025}
}
