Reverse-engineering LLM results to find prompts used to make them

Just like Jeopardy, where you can guess the question from the answer, you can guess the prompt from an LLM generation.

Nov 28, 2023

∙ Paid

Reverse-engineering LLM results to find prompts used to make them — "Figure 1: Under the assumption that a language model is offered as a service with a hidden prefix prompt that produces next-word probabilities, the system is trained from samples to invert the language model, i.e. to recover the prompt given language model probabilities for the next token."

As language models become more advanced and integrated into our daily lives through applications like conversational agents, understanding how their predictions may reveal private information is increasingly important. This post provides an in-depth look at a recent paper exploring the technical capabilities and limitations of reconstructing hidden prompts conditioned on a language model.

Keep reading with a 7-day free trial

Subscribe to AIModels.fyi to keep reading this post and get 7 days of free access to the full post archives.