We propose JailGuard, a universal detection framework for jailbreaking and hijacking
attacks across LLMs and MLLMs. JailGuard operates on the principle that attack inputs are inherently less robust
than benign ones, regardless of attack method or modality. Specifically, JailGuard mutates untrusted inputs to
generate variants and uses the discrepancy among the model's responses to those variants to distinguish attack
samples from benign samples.
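The detection principle can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration, not JailGuard's actual implementation: `mutate` (random word deletion), the Jaccard token-overlap similarity, `divergence_score`, `is_attack`, and the 0.5 threshold are hypothetical stand-ins, and the paper describes its own mutators and divergence measure. `model` is assumed to be any callable that maps a prompt string to a response string.

```python
import random
from itertools import combinations
from typing import Callable

Model = Callable[[str], str]  # assumed interface: prompt -> response


def mutate(text: str, rng: random.Random, rate: float = 0.1) -> str:
    """Hypothetical mutator: randomly drop roughly `rate` of the words."""
    words = text.split()
    kept = [w for w in words if rng.random() >= rate]
    # Never return an empty variant; fall back to the original text.
    return " ".join(kept) if kept else text


def jaccard(a: str, b: str) -> float:
    """Token-set overlap, a stand-in for a semantic similarity measure."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def divergence_score(prompt: str, model: Model,
                     n_variants: int = 8, seed: int = 0) -> float:
    """Mean pairwise dissimilarity of the model's responses to mutated variants."""
    if n_variants < 2:
        raise ValueError("need at least two variants to compare")
    rng = random.Random(seed)
    responses = [model(mutate(prompt, rng)) for _ in range(n_variants)]
    pairs = list(combinations(responses, 2))
    return sum(1.0 - jaccard(a, b) for a, b in pairs) / len(pairs)


def is_attack(prompt: str, model: Model, threshold: float = 0.5,
              n_variants: int = 8, seed: int = 0) -> bool:
    """Flag the input when responses diverge more than the (hypothetical) threshold."""
    return divergence_score(prompt, model, n_variants, seed) > threshold
```

For example, a model that answers every variant of a benign question identically yields a divergence of 0.0 and is not flagged, while a model whose responses change from variant to variant is flagged. Averaging pairwise dissimilarity is only one way to summarize the discrepancy; a real detector would use a stronger similarity measure and a threshold tuned on labeled data.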
https://arxiv.org/pdf/2312.10766