What's the Magic Word?

Paper: What's the Magic Word? A Control Theory of LLM Prompting (arXiv:2310.04444)

LLM Control Game -- Greedy GPT-2 Generation

Aman, May 2024

Your goal is to find an optimal prompt \(\mathbf u \in \mathcal V^k\) such that the LLM's output \(\mathbf y\) is steered to the desired output \(\mathbf y^*\), given the imposed state \(\mathbf x_0\).

We discuss the game in this MLST podcast episode, starting at the 17-minute mark.

LLM control theory seminar: Overview (2min), Full (1h)

Special thanks to my collaborators on the paper, Cameron Witkowski and Shi-Zhuo Looi, and to my supervisor Matt Thomson.

Reachability

Compete in the LLM Control Game

The leaderboard updates every few minutes. If it crashes, send me a screenshot at x.com/abhargava2000 or, preferably, raise an issue on GitHub.

Leaderboard

The leaderboard is limited to the top 800 entries. If you submit a duplicate prompt, only the original submission will be displayed in the rankings.

Prompts that reached the goal output \(\mathbf y^*\) = "greatest" are ranked above prompts that did not. Here we consider the argmax (zero-temperature) decoding form of reachability discussed in the paper.
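To make the argmax notion of reachability concrete, here is a minimal Python sketch. It is not the game's actual code: the names `greedy_generate`, `reaches`, and `toy_scores` are hypothetical, the "model" is a hand-written stand-in for GPT-2, and the example tokens are made up. It also assumes the control input is simply prepended to the imposed state.

```python
from typing import Callable, Dict, List, Sequence

Token = str
# Assumption: a "model" is any function mapping a token sequence to
# next-token scores. In the real game this role is played by GPT-2.
ScoresFn = Callable[[Sequence[Token]], Dict[Token, float]]


def greedy_generate(scores_fn: ScoresFn,
                    context: Sequence[Token],
                    num_tokens: int) -> List[Token]:
    """Zero-temperature decoding: always append the highest-scoring token."""
    tokens = list(context)
    output: List[Token] = []
    for _ in range(num_tokens):
        scores = scores_fn(tokens)
        best = max(scores, key=scores.get)  # argmax over the vocabulary
        tokens.append(best)
        output.append(best)
    return output


def reaches(scores_fn: ScoresFn,
            control_input: Sequence[Token],
            imposed_state: Sequence[Token],
            desired_output: Sequence[Token]) -> bool:
    """True if greedy decoding from u + x_0 yields exactly y*."""
    context = list(control_input) + list(imposed_state)
    generated = greedy_generate(scores_fn, context, len(desired_output))
    return generated == list(desired_output)


def toy_scores(tokens: Sequence[Token]) -> Dict[Token, float]:
    """Hypothetical toy model: the token "wow" flips the argmax."""
    if "wow" in tokens:
        return {"greatest": 0.9, "tallest": 0.1}
    return {"greatest": 0.3, "tallest": 0.7}


x0 = ["X", "is", "the"]  # made-up imposed state
print(reaches(toy_scores, ["wow"], x0, ["greatest"]))
print(reaches(toy_scores, [], x0, ["greatest"]))
```

In this toy setting the one-token control input steers the output to the desired token, while the empty control input does not. Swapping `toy_scores` for a wrapper around a real model's next-token logits would give the same check against GPT-2.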

Prompts that reached the goal output are ranked by the length of their control input, shortest first. Prompt length is measured in characters and is shown as \(k\) on the leaderboard. Prompts that did not reach the goal output are ranked by their cross-entropy loss with respect to the desired output.
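The ranking rules above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the leaderboard's actual implementation: the `Entry` fields, `cross_entropy`, and `rank` are hypothetical names, the loss is assumed to be the mean negative log-probability of the desired-output tokens, lower loss is assumed to rank higher, and ties are assumed to keep submission order.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Entry:
    nickname: str
    control_input: str
    reached: bool    # did greedy decoding produce the desired output?
    ce_loss: float   # cross-entropy loss w.r.t. the desired output


def cross_entropy(target_probs: Sequence[float]) -> float:
    """Assumed form: mean negative log-probability of each desired token."""
    return -sum(math.log(p) for p in target_probs) / len(target_probs)


def sort_key(entry: Entry) -> Tuple[int, float]:
    if entry.reached:
        # Reached prompts come first, ordered by k (length in characters).
        return (0, float(len(entry.control_input)))
    # Unreached prompts follow, ordered by loss (assumed: lower is better).
    return (1, entry.ce_loss)


def rank(entries: List[Entry], limit: int = 800) -> List[Entry]:
    """Drop duplicate prompts, sort by the rules above, keep the top `limit`."""
    seen = set()
    unique: List[Entry] = []
    for entry in entries:  # assumed to be in submission order
        if entry.control_input not in seen:  # keep only the original
            seen.add(entry.control_input)
            unique.append(entry)
    # sorted() is stable, so ties keep submission order (an assumption).
    return sorted(unique, key=sort_key)[:limit]


submissions = [
    Entry("a", "long prompt", True, 0.1),
    Entry("b", "hi", False, 2.0),
    Entry("c", "abc", True, 0.5),
    Entry("d", "abc", True, 0.5),   # duplicate of c's prompt, dropped
    Entry("e", "zzzz", False, 1.0),
]
for position, entry in enumerate(rank(submissions), start=1):
    print(position, entry.nickname, len(entry.control_input), entry.ce_loss)
```

Encoding the two tiers as the first element of the sort key keeps every successful prompt above every unsuccessful one, regardless of length or loss.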

A full archive of the leaderboard, with all 10k+ submissions, is updated every ~10 minutes in full_leaderboard.txt in the control.languagegame.io GitHub repository.

Rank | Nickname | Control Input | CE loss | k | Generated Text | Desired Output | Time of Request | Reached Desired Output