All communication from a user to an LLM and back is plain text. This is a security risk.
Plain text opens attack vectors in which both the incoming and the outgoing data can be manipulated. Some of these risks are covered in the OWASP top 10 GenAI/LLM risks; most relevant for the context covered here are prompt injection and insecure output handling. Note that while transport-level security, like TLS, alleviates some of these concerns, it provides no evidence that the prompt sent and the result returned correlate and have not been tampered with. Proposed solutions that introduce cryptographic constructs address only parts of the problem.
To summarize, we have no simple way
- for the LLM to verify it received the prompt the User sent, nor
- for the User to verify it received what the LLM produced for this prompt.
This proposal addresses that problem while attempting to be as non-intrusive as possible.
The User and the LLM share a secret key and agree on how this key is used to sign and verify messages. They also agree on how to hash their messages.
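One concrete way to realize this agreement could be SHA-256 for hashing and HMAC-SHA-256 for signing. The helpers below are a minimal sketch of such an agreement; the names mirror the `hash`, `sign`, and `verify` used in the walkthrough below and are illustrative, not prescribed by this proposal.

```python
import hashlib
import hmac

def hash_message(message: str) -> str:
    """Hash a message with the agreed hash function (SHA-256 in this sketch)."""
    return hashlib.sha256(message.encode("utf-8")).hexdigest()

def sign(key: bytes, data: str) -> str:
    """Sign data with the shared secret key using HMAC-SHA-256."""
    return hmac.new(key, data.encode("utf-8"), hashlib.sha256).hexdigest()

def verify(key: bytes, data: str, signature: str) -> bool:
    """Verify a signature in constant time."""
    return hmac.compare_digest(sign(key, data), signature)
```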
- User creates a prompt: `What is the Matrix?`
- User hashes the prompt, `hash(prompt)`, and makes a note of the hash.
- User sends the message to the LLM.
- LLM receives the message and produces a result: `Unfortunately, no one can be told what the Matrix is. You have to see it for yourself.`
- LLM hashes the prompt, `hash(prompt)`, into `prompt_hash`.
- LLM hashes the result, `hash(result)`, into `result_hash`.
- LLM signs the two hashes, `sign(key, prompt_hash + result_hash)`, and appends the signature to the result: `Unfortunately, no one can be told what the Matrix is. You have to see it for yourself. sig:572e11942cd2c09b9477e36431707008`
- LLM sends the response back to the User.
- User extracts the signature into `sig` and hashes the result (without the signature) into `result_hash`.
- User verifies the signature with the key over the two hashes: `verify(key, prompt_hash + result_hash) == sig` (the User knows the prompt hash since it made a note of it before sending).
If verified, the User can be sure that
- The LLM saw the same prompt the User sent, and
- The User received the response the LLM produced for this prompt.
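Putting the walkthrough together, the exchange might look as follows. This is a minimal sketch that reuses the illustrative helpers above; `llm_complete` is a hypothetical stand-in for the actual model call, not part of the proposal.

```python
key = b"shared-secret-key"  # agreed out of band between User and LLM

def llm_complete(prompt: str) -> str:
    # Hypothetical stand-in for the actual model call.
    return ("Unfortunately, no one can be told what the Matrix is. "
            "You have to see it for yourself.")

# --- User side: create the prompt and note its hash before sending ---
prompt = "What is the Matrix?"
prompt_hash = hash_message(prompt)

# --- LLM side: produce the result, sign both hashes, append the signature ---
def handle_prompt(received_prompt: str) -> str:
    result = llm_complete(received_prompt)
    sig = sign(key, hash_message(received_prompt) + hash_message(result))
    return f"{result} sig:{sig}"

# --- User side: split off the signature and verify it over both hashes ---
def check_response(response: str, prompt_hash: str) -> tuple[str, bool]:
    result, _, sig = response.rpartition(" sig:")
    return result, verify(key, prompt_hash + hash_message(result), sig)

response = handle_prompt(prompt)
result, ok = check_response(response, prompt_hash)  # ok is True if untampered
```

If `ok` is true, both guarantees above hold: the signature was made over the same prompt hash the User noted, and the result was not altered on the way back.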
The protocol can be extended to have the User not only hash but also sign the prompt, using the same mechanism the LLM uses. This could address denial-of-service concerns.
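A sketch of that extension, under the same illustrative assumptions as above: the User appends a signature over the prompt, and the LLM verifies it before doing any work.

```python
# Extension sketch: the User also signs the prompt before sending it,
# so the LLM can reject unsigned or tampered prompts up front.
def prepare_prompt(prompt: str) -> str:
    return f"{prompt} sig:{sign(key, hash_message(prompt))}"

def accept_prompt(message: str) -> str | None:
    prompt, _, sig = message.rpartition(" sig:")
    return prompt if verify(key, hash_message(prompt), sig) else None
```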
This is a simple writeup. It needs tighter definitions of
- an acceptable format for how signatures are added to and parsed from the prompt and response, and
- key formats and usage, especially how to agree on hashing and cryptographically signing messages, likely some type of HMAC.
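As a starting point for those definitions, one illustrative (not prescribed) convention could be a lowercase-hex HMAC tag appended as ` sig:<hex>` together with a random 256-bit shared key exchanged out of band:

```python
import re
import secrets

# Illustrative key handling: a random 256-bit shared key, exchanged out of band.
key = secrets.token_bytes(32)

# Illustrative wire format: a lowercase-hex HMAC tag appended as " sig:<hex>".
SIG_PATTERN = re.compile(r"^(?P<body>.*) sig:(?P<sig>[0-9a-f]+)$", re.DOTALL)

def split_signed(message: str) -> tuple[str, str] | None:
    """Return (body, signature) if the message carries a trailing signature tag."""
    match = SIG_PATTERN.match(message)
    return (match.group("body"), match.group("sig")) if match else None
```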