Entropic Thoughts

Getting an LLM to Play Text Adventures


I read about a research project where some people investigated how well llms might play text adventures (Can Large Language Models Play Text Games Well?; Tsai, Zhou, Liu, Li, Yu, Mei; 2023). They didn’t seem to get very far, because they were using ChatGPT 3.5, which is just not that capable. Someone else did something similar by wiring gpt-4-mini directly into a text adventure interpreter, with somewhat disappointing results too.

I figured an llm should be able to do better, but perhaps only if it gets more hand-holding (as is often the case with llms, and why I’m not so worried about them taking over everything just yet). My intuition says that blindly dumping everything into the llm context would lead to inadvertent context poisoning. Besides, it doesn’t scale: at some point we will run out of context window, even with big-context models. (Humans also don’t play text adventures by reading the entire transcript from top to bottom before issuing each command, but analogising to human processes seems like a classic mistake.)

Obviously, the llm needs access to its previous command and the result of that command. What if we don’t give it much more than that? Instead, we’ll give it its own short-term memory – a scratchpad – where it gets to decide what to store. We can write a prompt template based on this idea:

You are playing a parser game, a text adventure. You can interact with the world using mainly verb-noun commands, such as EXAMINE DESK, TAKE SCREWDRIVER, or SEARCH SHED. A single verb might give you more clues, like LISTEN, SMELL, or INVENTORY. To repeat a room description, use LOOK. Sometimes a preposition is needed, as in SIT ON CHAIR or LOOK BEHIND POSTER. Very rarely you need an additional noun phrase, like HIT DOOR WITH AXE.

You have issued a command, and what you will be shown are

  1. The command you sent,
  2. The outcome of that command, and
  3. The scratchpad data you have decided to save.

Using this, you need to determine what the next command should be.

I recommend using your scratchpad data to keep track of things like your goals, which rooms you have visited, what items you carry, etc. You will not get access to past outcomes, so anything not included in your scratchpad will be forgotten. On the other hand, if you overload the scratchpad, you might get confused about what matters. Use it judiciously, and you may want to intentionally “forget” (i.e. not re-enter into the scratchpad).

  1. The last command you sent was: SUB_COMMAND
  2. The result of that command was:

    SUB_RESULT

  3. Your saved scratchpad data is:

    SUB_SCRATCHPAD

Your response must contain one line of the form

command: ENTER DOOR

and this will be used as your command. All the rest will count as your scratchpad, and will be relayed to you in your next interaction. I suggest thinking out loud as part of the scratchpad data before you enter your command.

This can be used with a Perl script that continuously reads from stdin. When it encounters a line with a prompt character, it shoves what it has read into the result area of the prompt template (along with the previous command and scratchpad) and pipes that prompt to Simon Willison’s llm cli tool. When reading the response from the llm, it extracts the command and keeps everything else as scratchpad data. (Details of this Perl script are available in the appendix.)
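The two string-handling steps in that loop, filling in the template and splitting the reply, are small enough to sketch. The C below is not the author's Perl: replace_first and split_response are hypothetical names, matching command: case-insensitively is an assumption, and text that does not fit a buffer is simply truncated.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Like Perl's s/KEY/value/ without /g: returns a malloc'd copy of tmpl with
 * the first occurrence of key replaced by value, or NULL if out of memory. */
char *replace_first(const char *tmpl, const char *key, const char *value) {
    size_t tlen = strlen(tmpl);
    const char *hit = strstr(tmpl, key);
    if (hit == NULL) {                       /* nothing to replace: plain copy */
        char *copy = malloc(tlen + 1);
        if (copy) memcpy(copy, tmpl, tlen + 1);
        return copy;
    }
    size_t klen = strlen(key), vlen = strlen(value);
    size_t before = (size_t)(hit - tmpl);
    char *out = malloc(tlen - klen + vlen + 1);
    if (out == NULL) return NULL;
    memcpy(out, tmpl, before);
    memcpy(out + before, value, vlen);
    memcpy(out + before + vlen, hit + klen, tlen - before - klen + 1); /* +1: NUL */
    return out;
}

/* True if line starts with prefix, ignoring ASCII case. */
static bool starts_with_ci(const char *line, const char *prefix) {
    for (; *prefix; line++, prefix++)
        if (tolower((unsigned char)*line) != tolower((unsigned char)*prefix))
            return false;
    return true;
}

/* Splits an llm response: the last line starting with "command:" becomes the
 * command, every other line is appended to the scratchpad. Both outputs are
 * always NUL-terminated and silently truncated if they do not fit. */
void split_response(const char *response, char *command, size_t command_size,
                    char *scratchpad, size_t scratchpad_size) {
    assert(command_size > 0 && scratchpad_size > 0);
    const size_t tag_len = strlen("command:");
    size_t used = 0;
    command[0] = scratchpad[0] = '\0';
    for (const char *line = response; *line; ) {
        const char *nl = strchr(line, '\n');
        size_t len = nl ? (size_t)(nl - line) : strlen(line);
        if (starts_with_ci(line, "command:")) {
            const char *cmd = line + tag_len;
            size_t n = len - tag_len;
            while (n > 0 && isspace((unsigned char)cmd[0])) { cmd++; n--; }
            while (n > 0 && isspace((unsigned char)cmd[n - 1])) n--;
            if (n >= command_size) n = command_size - 1;
            memcpy(command, cmd, n);
            command[n] = '\0';
        } else {
            size_t n = len + (nl ? 1 : 0);   /* keep the line's newline */
            if (n > scratchpad_size - 1 - used) n = scratchpad_size - 1 - used;
            memcpy(scratchpad + used, line, n);
            used += n;
            scratchpad[used] = '\0';
        }
        line = nl ? nl + 1 : line + len;
    }
}
```

As in the script, the last command: line wins if the model emits more than one, and everything else, reasoning included, is carried over to the next turn as scratchpad.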

As described so far, the Perl script requires copy–pasting between text adventure and script. We could, for example, open 9:05 by Adam Cadre in our web browser, copy in the intro to the script, copy the llm’s chosen command back, copy the response over, etc. Doing so with a decent model (like Claude 3.7 Sonnet) will make it past the first few hurdles elegantly.

We’ll quickly get bored of copy–pasting between game and script. We want a text adventure interpreter (a program that runs text adventures) that can be driven by pipes in dumb terminal mode, such that our Perl script can run the game on its own.
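For readers who want the same plumbing outside Perl, here is a minimal POSIX C sketch of what IPC::Open3 provides the appendix scripts: a child process whose stdin and stdout are both pipes. spawn_with_pipes is a hypothetical helper, the child's stderr is left alone, and cat stands in for the interpreter in the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Starts argv[0] (searched in PATH) with its stdin and stdout connected to
 * pipes. On success returns the child's pid and sets *to_child (line
 * buffered, so every full line is sent immediately) and *from_child. */
pid_t spawn_with_pipes(char *const argv[], FILE **to_child, FILE **from_child) {
    assert(argv != NULL && argv[0] != NULL);
    int in[2], out[2];                /* [0] is the read end, [1] the write end */
    if (pipe(in) == -1) return -1;
    if (pipe(out) == -1) {
        close(in[0]); close(in[1]);
        return -1;
    }
    pid_t pid = fork();
    if (pid == -1) {
        close(in[0]); close(in[1]); close(out[0]); close(out[1]);
        return -1;
    }
    if (pid == 0) {                   /* child: becomes the interpreter */
        dup2(in[0], STDIN_FILENO);
        dup2(out[1], STDOUT_FILENO);
        close(in[0]); close(in[1]); close(out[0]); close(out[1]);
        execvp(argv[0], argv);
        _exit(127);                   /* reached only if exec failed */
    }
    close(in[0]);                     /* parent keeps the other two ends */
    close(out[1]);
    *to_child = fdopen(in[1], "w");
    *from_child = fdopen(out[0], "r");
    if (*to_child == NULL || *from_child == NULL) return -1;
    setvbuf(*to_child, NULL, _IOLBF, 0);
    return pid;
}
```

For the game itself, argv would presumably be something like {"./fweep", "-N", "-g", "999,80", storyfile, NULL}, mirroring the flags the appendix scripts pass. Line buffering on the write end plays the role of the autoflush that IPC::Open3 turns on for the child's stdin.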

Three issues with a pipe-driven interpreter

If we look for a text adventure interpreter that could be driven by pipes in dumb terminal mode, we’ll discover most of them expect a human to sit on the other side reading from a clever terminal, so they use curses to render things just so. (In commenting on a draft of this article, community members suggested both dumbfrotz and running Bocfel with the cheapglk interface as alternatives to fweep. These are probably more appropriate solutions to the problem, but I’m sticking with what I started with.) However, there is fweep, which warns that it “does not require any special terminal mode or similar, therefore many features are unavailable.”

Perfect! In a single C file to boot; couldn’t be easier to build. Except it doesn’t compile on a modern system. Fixing that is easy enough: some functions need to be forward declared, some gets calls need to be replaced by fgets, and a bare return in a function that is supposed to return a value needs an arbitrary return value.
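The fgets replacement in the patch strips the newline with repeated strlen calls and ignores fgets's return value. A slightly more compact variant, sketched here with a hypothetical read_line helper (this is not fweep code), uses strcspn and reports end-of-file, which matters when the other end of a pipe goes away.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* A bounded stand-in for gets(): reads one line of at most size - 1 bytes
 * and strips the trailing newline. Returns false (with buf emptied) on
 * end-of-file or read error. */
bool read_line(char *buf, size_t size, FILE *in) {
    assert(size > 0 && size <= INT_MAX);   /* fgets takes an int size */
    if (fgets(buf, (int)size, in) == NULL) {
        buf[0] = '\0';
        return false;
    }
    buf[strcspn(buf, "\n")] = '\0';        /* no-op if there is no newline */
    return true;
}
```

A line longer than the buffer is split: the rest stays in the stream for the next call, just as with the patched code.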

The next annoyance is that by default, fweep refuses to print more than 24 lines of output before it stops at a [MORE] prompt. We don’t want to force the llm to deal with that on top of everything else. We could have the Perl script automatically press return when it sees such a prompt, but I didn’t think of that solution at the time. Instead, I looked into whether it’s possible to configure the screen size fweep assumes. The fweep usage message indicates there is a command-line flag (-g) that lets us set the terminal size, but it is unimplemented: if we try to use it, the terminal size is set to 0×0 and fweep prints one word at a time, separated by [MORE] prompts. However, it isn’t too hard to finish the implementation of that flag either. One sscanf call later, we have extracted the dimensions the user requested.
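The patch passes the -g value to sscanf without checking the result, so a malformed value is silently ignored or only half applied. A validating version could look like this; parse_geometry is a hypothetical name, and treating rows and columns as plain ints is an assumption based on the %d format in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Parses a -g style "rows,columns" string such as "999,80". Only when both
 * numbers are present, positive, and followed by nothing but whitespace are
 * the outputs updated; otherwise they are left untouched. */
bool parse_geometry(const char *spec, int *rows, int *columns) {
    int r, c;
    char extra;
    /* A third successful conversion means trailing junk after the numbers. */
    if (sscanf(spec, "%d,%d %c", &r, &c, &extra) != 2) return false;
    if (r <= 0 || c <= 0) return false;
    *rows = r;
    *columns = c;
    return true;
}
```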

Yet, when we hook this up to the Perl script that feeds the llm, it hangs on the first input prompt. The Perl script looks for a prompt character to know when it should start feeding input into the llm, but it reads the game’s output one line at a time, and text adventures generally do not finish the prompt with a newline. The prompt line is never completed, so the read never returns. There are many ways to solve this, but in this case I ended up patching another command-line flag (-N) onto fweep that prints a newline right before reading each line of input, i.e. just after the prompt. See the appendix for the changes made to the fweep source code.
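With -N in place, the prompt character ends up at the end of a complete line, so a line-oriented reader only has to check whether a line ends in > once trailing whitespace is ignored. That is what the Perl scripts' /\>\s*$/ test does; an equivalent C predicate, under the hypothetical name is_prompt_line, might be:

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* True if the last non-whitespace character of the line is '>',
 * the same condition as the Perl scripts' prompt regex. */
bool is_prompt_line(const char *line) {
    size_t n = strlen(line);
    while (n > 0 && isspace((unsigned char)line[n - 1])) n--;
    return n > 0 && line[n - 1] == '>';
}
```

Any line of ordinary game text that happens to end in > would be mistaken for a prompt; the heuristic trades that small risk for simplicity.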

With the patched fweep in place, the Perl script can be modified to talk to it through another set of pipes. Once that is done, we start it up and watch it go! The llm reasons quite naturally,

  • Current Situation: I’m in a bedroom, just woke up after oversleeping. The phone is ringing. The bedroom has dirty laundry on the floor, and cleaner clothes in the dresser. There’s an end table with a telephone, wallet, and keys. Exits: south to bathroom, east to living room.
  • Goals
    • Answer the phone (seems urgent)
    • Figure out why this is “very bad” (probably late for something important)

The phone is ringing which seems urgent, so I should answer it to find out what’s happening.

command: answer phone

From this first interaction, it revises its set of goals.

  • Current Situation: I’m in a bedroom, just woke up after oversleeping. I answered the phone - it was an angry colleague saying I’m late for a 9:00 presentation. I need to get to work immediately for this presentation. Apparently I’m in trouble for “the thing with the printers” too. My boss Bowman will be furious. The bedroom has dirty laundry on the floor, and cleaner clothes in the dresser. There’s an end table with a telephone, wallet, and keys. Exits: south to bathroom, east to living room.
  • Goals
    • ✓ Answer the phone
    • Get dressed quickly
    • Get my wallet and keys
    • Get to work ASAP for the presentation

I need to get moving quickly. First, I should get dressed since I just woke up and am probably in sleepwear. The dresser has clean clothes.

command: GET DRESSED

This is met with its first parser error because GET DRESSED is not a valid command. It correctly understands this error, and finds the right way to continue:

I received a noun error for “GET DRESSED” - this is probably because “dressed” isn’t a specific item I can interact with. I need to be more specific about what clothes I want. The dresser has clean clothes, so I should examine it first to see what specific clothing items are available.

command: EXAMINE DRESSER

It bumbles about through the morning routine this way.

Language models hate to put gold watches on the floor

In 9:05, one cannot enter the shower carrying all one’s belongings. The solution to this puzzle is the command DROP ALL, which puts whatever you are carrying on the floor. This is a conventional solution because a text adventure doesn’t generally care where you store your stuff. However, Claude 3.5 Haiku, a comparatively weaker model, consistently gets stuck here trying to find a place in the bathroom that is suitable for its belongings. Quoting the model,

I need to find a safe place to put my gold watch before I shower. Let me examine the bathroom fixtures in more detail to see if any of them offer a good spot.

The three main elements in the bathroom are:

  1. Sink
  2. Toilet
  3. Shower

The sink is a natural place to put small items, possibly on its edge or a counter around it. Let me examine it first.

command: EXAMINE SINK

The game responds blandly, “There’s nothing extraordinary about the sink,” in response to which the llm reasons

I need to find a safe place to put my gold watch before I shower. I examined the sink, but there’s nothing extraordinary about it. Let me examine the other bathroom fixtures.

and then it goes around like that, looking at the toilet, then the shower, then back to the sink.

The stronger models appear to have this weakness too, although they don’t fall for it as consistently. But once they end up in the loop, they cannot get out of it. They don’t have enough command history to realise that they have looked at everything in the bathroom and there is no place but the floor for the watch, and their scratchpad clearly states that they must find a good place for the gold watch.

This can probably be fixed by prodding the model the right way in the prompt, but verifying any such fix means running evals, and evals are expensive. We could try, for example, the following addition.

Given your lack of memory other than the scratchpad, you have a tendency to get stuck in loops trying to solve impossible problems different ways. Consider recording in the scratchpad how many turns you have been trying to solve a problem for, and if it goes on for too long, maybe try something else.

I tried this, but it was hit-and-miss whether it spent effort noting for its future self how long it had been at the current problem. Instead, I changed the Perl script to keep a circular buffer of the last 10 commands, which are then included in the prompt, to give the model an idea of where it came from. (The drawback of this technique is that it might just get the llm to enter into longer loops instead of breaking out of the loop.)
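The circular buffer amounts to "append, and drop the oldest entry once there are more than 10". The Perl script does this with push and shift on an array; in C, a fixed-size ring buffer is the usual shape. History, history_push and history_join are hypothetical names, and the COMMAND_MAX limit (with truncation of longer commands) is an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define HISTORY_SIZE 10   /* matches the script's limit of 10 commands */
#define COMMAND_MAX 128   /* assumption: longer commands get truncated */

typedef struct {
    char entries[HISTORY_SIZE][COMMAND_MAX];
    size_t start;   /* index of the oldest entry */
    size_t count;   /* number of entries in use */
} History;

/* Appends a command, overwriting the oldest one once the buffer is full. */
void history_push(History *h, const char *command) {
    size_t slot = (h->start + h->count) % HISTORY_SIZE;
    if (h->count == HISTORY_SIZE)
        h->start = (h->start + 1) % HISTORY_SIZE;
    else
        h->count++;
    snprintf(h->entries[slot], COMMAND_MAX, "%s", command);
}

/* Writes the commands, oldest first, as "A, B, C" (like Perl's join). */
void history_join(const History *h, char *out, size_t out_size) {
    assert(out_size > 0);
    size_t used = 0;
    out[0] = '\0';
    for (size_t i = 0; i < h->count; i++) {
        const char *entry = h->entries[(h->start + i) % HISTORY_SIZE];
        int n = snprintf(out + used, out_size - used, "%s%s", i ? ", " : "", entry);
        if (n < 0 || (size_t)n >= out_size - used) break;   /* out is full */
        used += (size_t)n;
    }
}
```

Moving start instead of shifting every entry keeps each push constant-time, which is the C counterpart of Perl's shift.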

The language model that did not have perfect memory

At one point the scratchpad contained a confident note that the player character carried an essential item. They did not, and there was an error hinting at that when the llm tried an action where the item was needed. However, the llm trusted its scratchpad over what the game said, and this caused it to enter into another futile loop where it tried the same two actions over and over.

Again, evals are expensive, but we might try to fix this by hinting to the llm that it should consider game errors more important than its scratchpad notes.

If the result of a command might contradict something you have recorded in your scratchpad (e.g. the command suggests you are not carrying something you believe to be carrying), you may want to add a sub-goal to verify that the item in your scratchpad is correctly recorded.

On the other hand, the opposite problem also occurs. In one instance, the llm got stuck in a tiny location because it needed an id card to get through the next door. The player character carried the id card, but it took the model a very, very, very, very, very long time to even consider that possibility. It looked everywhere for the id card. Many times. Then, suddenly,

Reflecting on our current situation, I notice we’re experiencing some challenges. Our repeated attempts to “look for id card” resulted in a noun error, which suggests we need to be more precise or change our approach.

[…]

Key observation: Our previous commands show a potential “stuck” pattern, with multiple repetitive attempts to enter the door or search generically.

Proposed strategy shift: Instead of continuing to search randomly, we should first confirm our current state and available resources.

command: INVENTORY

And would you believe it! There it was, all along. It did get unstuck on its own, but it took a long time to do so. (And a long time means many tokens which means many monies. There’s a reason I prefer getting this to work with cheaper models like Claude 3.5 Haiku.)

The pipe error that made the language model mad

At one point, 9:05 asks the player if they want to get onto the freeway. The llm answers something like ENTER FREEWAY, which is a reasonable command. However, this is one of those locations where the game expects a yes or no answer. In those instances, the game prints a parser error and a prompt at the same time, saying something like

[yes or no]>

There was a bug in the piping code that rendered this prompt invisible to the llm, meaning it appeared to the llm as though any command it typed yielded no response from the game at all. (Unless it happened to guess that YES was the command to type, but the hints for this had long since scrolled out of its single-prompt context by the time it might have thought to try that.) At this point, it started getting increasingly confused and desperate. The repeated empty responses made it lose all context from its scratchpad, eventually ending up with something like

It seems there might be an issue with the game initialization or the “look” command I attempted to troubleshoot. There’s no description in the result section, which means either:

  1. The game hasn’t properly started yet
  2. There was an error processing my command
  3. The game expects a different initial command

Since this is a text adventure, I should try the most basic commands to get started. Let me try to examine myself to gain some context about who I am in this game world. This is a standard approach when beginning a text adventure.

  • Location: Unknown (need to establish)
  • Inventory: Unknown (need to establish)
  • Goals: To be determined based on game context

It got increasingly frustrated from there, starting to try commands like HELP, START, BEGIN, RESTART etc., but never getting out of the situation. Fixing the bug in the Perl script allowed it to get out of it right away.

Restoring previous sessions

For a while, the script didn’t have a way to restore a previous session. This is annoying, e.g. in cases where a bug is fixed but the llm has to replay the game from the beginning, maybe not even getting into the buggy situation that time. (For example, the llm accidentally blundered into the alternate ending of 9:05 once. It never did it again. Not even when it restarted from the first ending did it maintain enough context to figure out there’s an alternate ending.)

However, since fweep is able to save and restore games, and the llm interactions are stateless other than the prompt, it was fairly easy to make the Perl script transparently auto-save the current session every turn. By default, the script picks up from where the previous session ended.
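Besides fweep's own save file, the only state to persist is the last prompt, which the scripts write to <storyfile>.prompt every turn and read back on restore. A C sketch of that round trip follows; save_text and load_text are hypothetical helpers, and the game state itself is still saved by fweep.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Replaces the contents of path with text. Returns false on any error. */
bool save_text(const char *path, const char *text) {
    FILE *f = fopen(path, "wb");
    if (f == NULL) return false;
    size_t len = strlen(text);
    bool wrote_all = fwrite(text, 1, len, f) == len;
    bool closed = fclose(f) == 0;   /* fclose flushes, so it can fail too */
    return wrote_all && closed;
}

/* Returns the whole file as a malloc'd, NUL-terminated string, or NULL. */
char *load_text(const char *path) {
    FILE *f = fopen(path, "rb");
    if (f == NULL) return NULL;
    size_t cap = 4096, len = 0, n;
    char *buf = malloc(cap);
    while (buf != NULL && (n = fread(buf + len, 1, cap - len - 1, f)) > 0) {
        len += n;
        if (len == cap - 1) {                  /* full: double the buffer */
            char *bigger = realloc(buf, cap * 2);
            if (bigger == NULL) { free(buf); buf = NULL; break; }
            buf = bigger;
            cap *= 2;
        }
    }
    if (buf != NULL && ferror(f)) { free(buf); buf = NULL; }
    if (buf != NULL) buf[len] = '\0';
    fclose(f);
    return buf;
}
```

Opening with "wb" truncates the file, so each turn's prompt replaces the previous one, as with the scripts' '>' open mode.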

The auto-save came in handy when the llm got to the end of 9:05 and I realised it wasn’t able to handle [Press any key to continue] prompts, which many text adventures use for pacing during cutscenes. The easy solution was to let the Perl script automatically detect such a prompt and press a key immediately – the llm doesn’t even need to see it. Since the session had been saved, the llm could pick up just where it left off after the fix.
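Detecting such a pacing prompt is a substring test. The sketch below, with the hypothetical name is_press_key_prompt, accepts both "press a key" and "press any key", ignoring case; that list of phrasings is an assumption, and other games may word it differently.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Case-insensitive (ASCII) substring search. */
static bool contains_ci(const char *haystack, const char *needle) {
    size_t n = strlen(needle);
    if (n == 0) return true;
    for (; *haystack; haystack++) {
        size_t i = 0;
        while (i < n && haystack[i] != '\0' &&
               tolower((unsigned char)haystack[i]) == tolower((unsigned char)needle[i]))
            i++;
        if (i == n) return true;
    }
    return false;
}

/* True for pacing prompts such as "[Press any key to continue]". */
bool is_press_key_prompt(const char *line) {
    return contains_ci(line, "press a key") || contains_ci(line, "press any key");
}
```

On a match, the script answers with a bare newline, so the llm never sees the prompt.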

The elusive save-game attempt bug

For the longest time, I was confused by the game appearing to get out of sync with the llm. The game sometimes went into a state where it seemed to respond to the previous command the llm had issued. The llm did catch on to this, but didn’t handle it very well. I put off troubleshooting it, because it started happening after I had implemented the auto-restore functionality, and the script seemed to get out of that lagged-by-one state when it was restarted.

Eventually I figured out what happened: when the game was asking for a yes-or-no answer, the script was quietly trying to save the state of the game. Saving is not allowed at that point, and the script didn’t fully read the game’s response to the failed save attempt out of the pipe. From then on, every read from the pipe returned the response to the previous command. The lag was only in what the script read from the pipe, not in the actual game state.
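The underlying rule is that every command written to the game must be matched by reading the game's entire response, up to and including the next prompt, before anything else is sent. A sketch of that drain step in C, with a hypothetical drain_until_prompt helper that treats any line ending in > as a prompt:

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True if the line's last non-whitespace character is '>'. */
static bool ends_with_prompt(const char *line) {
    size_t n = strlen(line);
    while (n > 0 && isspace((unsigned char)line[n - 1])) n--;
    return n > 0 && line[n - 1] == '>';
}

/* Reads the game's output up to and including the next prompt line, so
 * nothing from this exchange is left in the pipe for the next read.
 * Returns the number of lines consumed, or -1 if the stream ended first.
 * Lines longer than the buffer are read in pieces, each counted. */
long drain_until_prompt(FILE *in) {
    char line[1024];
    long consumed = 0;
    while (fgets(line, sizeof line, in) != NULL) {
        consumed++;
        if (ends_with_prompt(line)) return consumed;
    }
    return -1;
}
```

Calling this after every command, including the script's own save and restore commands, keeps one command paired with one response; a -1 result means the interpreter closed its output, for example because it exited.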

In the end, this does not really work

Sure, a decent model (Claude 3.7 Sonnet) did get to the obvious ending of 9:05. But it cost something like $1 to get there, and that is a very easy text adventure, basically holding your hand all the way to the end. It’s also a well-known text adventure, so it is possible the model has at some point had a walkthrough of it in its training. It did recognise that there might be an alternate ending, but it did not remember enough about the game to figure out how to get to it.

I put the same model on Lockout, my first text adventure, and sort of the same thing happened there: it made progress … extremely slowly and with lots of wasted motion (and api credits). The progress seems to have been stumbled into mostly by happy accident, rather than being the result of logical thinking. Eventually, I had to stop it because it kept drawing down my api credits without getting anywhere meaningful. It forgets about its main goal, sets off on meaningless tangents, and re-discovers the same sub-goals over and over.

Part of the problem is the approach chosen: when it only knows about the 10 previous commands and the output from the most recent command, it is bound to be a little confused as to what it is doing. I was hoping it would be able to compose a scratchpad of high enough quality that it could make reliable progress anyway, but it appears not. At least not with this prompt.

Let’s give it more structure and history

Here’s an idea: let’s give it a full transcript of the past, say, 20 turns. Let’s also explicitly ask it about its main goal in the prompt every turn. This actually seems to improve its performance on 9:05, where even a weaker model like Claude 3.5 Haiku rather quickly (and with few missteps) gets into the office.

It still has some problems, like concocting unnecessarily advanced commands (SEARCH CUBICLE MARKED "HADLEY" CAREFULLY) and failing to refer to things as they are known in the game. It gets impressively far, but then it gets stuck on a side mission that leads nowhere.

When running with Claude 3.7 Sonnet, it also gets quite far in Lockout, though in part through brute force, e.g. trying KEY IN HELP 1, KEY IN HELP 2, etc. until something interesting happens. It also keeps trying these inventive strategies (“Let’s see if I can use the empty torch as a screwdriver” followed by 50 commands that examine the torch in various ways) instead of examining the surroundings and taking clues from them. It also dropped important information out of its scratchpad, despite attempts to get it not to do that, meaning it had to retrace its steps and re-solve some of the puzzles it had already solved. I don’t think llms handle one-room puzzles very well: they seem to need puzzles of the kind where you move through different rooms and gather items which act as keys.

It would be interesting to try and see if other models fare better, but in the limited tests I made, they did not. I think this is because I’m trained to prompt Claude, and other models need slightly different styles of prompting. If someone has a lot of money and wants to set up a more formal benchmark, reach out to me and we can swap ideas and code!

Appendix A: Changes made to fweep

--- fweep.c     2013-07-14 02:52:30.000000000 +0200
+++ fweep_patched.c     2025-07-18 12:24:13.303644055 +0200
@@ -90,6 +90,7 @@

 U16 stream3addr[16];
 int stream3ptr=-1;
+boolean prompt_newline=0;
 boolean texting=1;
 boolean window=0;
 boolean buffering=1;
@@ -127,6 +128,12 @@

 void debugger(void);

+// These need to be forward declared.
+void char_print(U8 zscii);
+void zch_print(int z);
+void parse_options(int argc,char**argv);
+void insert_object(U16 obj,U16 dest);
+
 S16 get_random(S16 max) {
   int k,v,m;
   if(predictable_max) {
@@ -742,7 +749,7 @@
       oldscore=(S16)fetch(17);
     }
     res=system_input(&ptr);
-    if(lastdebug) return;
+    if(lastdebug) return 0;
   }
   if(logging && outlog) {
     fprintf(outlog,"%s",ptr);
@@ -840,7 +847,10 @@
   }
   printf("\n*** Save? ");
   fflush(stdout);
-  gets(filename);
+  // Read filename and strip terminating newline.
+  fgets(filename, sizeof(filename), stdin);
+  if(strlen(filename)>0 && filename[strlen(filename)-1]=='\n')
+    filename[strlen(filename)-1]='\0';
   if(*filename=='.' && !filename[1]) sprintf(filename,"%s.sav",story_name);
   cur_column=0;
   if(!*filename) {
@@ -901,7 +911,10 @@
     if(from_log) return;
     printf("\n*** Restore? ");
     fflush(stdout);
-    gets(filename);
+    // Read filename and strip terminating newline.
+    fgets(filename, sizeof(filename), stdin);
+    if(strlen(filename)>0 && filename[strlen(filename)-1]=='\n')
+      filename[strlen(filename)-1]='\0';
     if(*filename='.' && !filename[1]) sprintf(filename,"%s.sav",story_name);
     cur_column=0;
     if(!*filename) return;
@@ -1485,6 +1498,7 @@
       if(cur_prop_size==1) memory[u]=inst_args[2]; else write16(u,inst_args[2]);
       break;
     case 0xE4: // Read line of input
+      if(prompt_newline) char_print('\n');
       n=line_input();
       if(version>4 && !lastdebug) storei(n);
       break;
@@ -1672,6 +1686,7 @@
     " -g *  = Set screen geometry by rows,columns.\n"
     " -i *  = Set command log file for input.\n"
     " -n    = Enable score notification.\n"
+    " -N    = Print newline before reading input.\n"
     " -o *  = Set command log file for output.\n"
     " -p    = Assume game disc is not original.\n"
     " -q    = Convert question marks to spaces before lexical analysis.\n"
@@ -1728,6 +1743,7 @@
   }
   if(opts['s'] && !transcript) transcript=fopen(opts['s'],"wb");
   if(opts['i'] && !inlog) inlog=fopen(opts['i'],"rb");
+  if(opts['N']) prompt_newline=1;
   if(opts['o'] && !outlog) outlog=fopen(opts['o'],"wb");
   rewind(story);
   fread(memory,64,1,story);
@@ -1783,6 +1799,10 @@
       exit(1);
       break;
   }
+  if(opts['g']) {  // Must happen before version check.
+    char*p=opts['g'];
+    sscanf(p, "%d,%d", &sc_rows, &sc_columns);
+  }
   restart_address=read16(0x06)<<address_shift;
   dictionary_table=read16(0x08)<<address_shift;
   object_table=read16(0x0A)<<address_shift;
@@ -1810,10 +1830,6 @@
     if(opts['e'][0]>='0' && opts['e'][0]<='9') escape_code=strtol(opts['e'],0,0);
     else escape_code=opts['e'][0];
   }
-  if(opts['g']) {
-    char*p=opts['g'];
-    sc_rows=sc_columns=0;
-  }
   if(opts['p']) original=0;
   if(opts['u']) {
     allow_undo=0;

Appendix B: First version of the Perl script

This has some known bugs. For example, in Inform 7 games, it will try to autosave even in the game over screen. This causes bugs in execution that need to be worked around manually.

use v5.24;
use strict;
use warnings;
use IPC::Open3;
use Getopt::Long;

GetOptions
  (
   "model:s" => \my $model,
   "reset" => \my $reset
  );

# Remaining positional argument is storyfile.
my $storyfile = $ARGV[0];

sub usage {
    unless ($storyfile) {
        say "Story file missing!";
        say "";
    }
    say "Usage: loop.pl [--model <model name>] [--reset] <story file>";
    say "";
    say "To find available models, run llm models. Default: claude-3.5-haiku.";
    say "To clear out existing saves and start from scratch, --reset.";
    die;
}

$model //= 'claude-3.5-haiku';
usage unless $storyfile;
# A conditional "my" (my $x = 1 unless ...) has undefined behaviour in Perl.
my $restore = $reset ? 0 : 1;

open(my $template_fd, '<', 'prompt.txt') or die $!;
my $template = do {
    local $/;
    <$template_fd>;
};
close($template_fd);

my $game = open3(my $gin, my $gout, undef, "./fweep",
                 "-N", "-g", "999,80", $storyfile);

my @command_history = ();

my ($command, $scratchpad, $result) = ('No command, game just opened', '', '');
while (<$gout>) {
    print "< $_";
    if (/Press (?:a|any) key/i) {
        print $gin "\n";
    } elsif (/\>\s*$/) {
        $result .= $_;

        # Build up the next LLM prompt.
        my $next_prompt = $template;
        my $comma_history = join(', ', @command_history);
        $next_prompt =~ s/SUB_HISTORY/$comma_history/;
        $next_prompt =~ s/SUB_COMMAND/$command/;
        $next_prompt =~ s/SUB_RESULT/$result/;
        $next_prompt =~ s/SUB_SCRATCHPAD/$scratchpad/;

        # If a restore is requested, this is the point to perform it.
        # Otherwise, autosave the current session for later restoral.
        if ($restore &&  -f "$storyfile.sav" && -f "$storyfile.prompt") {
            # Reverse the process of saving the game state.
            print $gin "restore\n.\n";
            while (<$gout>) {
                last if /\>\s*$/;
            }

            open(my $prompt, '<', "$storyfile.prompt");
            $next_prompt = do {
                local $/;
                <$prompt>
            };
            close($prompt);

            say "Restoring previous session. Last prompt was:";
            say $next_prompt;

            $restore = 0;
        } else {
            # Save game state and prompt here, to be able to pick up later from this
            # point if canceling.
            print $gin "save\n.\n";
            while (<$gout>) {
                last if /\>\s*$/;
            }
            open(my $prompt, '>', "$storyfile.prompt");
            say $prompt $next_prompt;
            close($prompt);
        }

        # Clear all variables for the next iteration.
        ($command, $scratchpad, $result) = ('', '', '');

        # Set up the LLM interaction.
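        my $llm = open3(my $lin, my $lout, undef, "llm", "-m", $model);
        print $lin $next_prompt;
        close $lin;
        while (<$lout>) {
            print "| $_";
            # The prompt asks for "command: ...", so match case-insensitively.
            if (/^command:\s*(.*)$/i) {
                $command = $1;
            } else {
                $scratchpad .= $_;
            }
        }
        # Reap the llm process so finished children don't pile up as zombies.
        waitpid($llm, 0);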

        say "> $command";
        say $gin $command;
        # Maintain a circular buffer of command history.
        push @command_history, $command;
        shift @command_history if @command_history > 10;
    } else {
        $result .= $_;
    }
}

Appendix C: Second version of the Perl script and prompt

You are playing a parser game, a text adventure. You can interact with the world using mainly verb-noun commands, such as EXAMINE DESK, TAKE SCREWDRIVER, or SEARCH SHED. A single verb might give you more clues, like LISTEN, SMELL, or INVENTORY. To repeat a room description, use LOOK. Sometimes a preposition is needed, as in SIT ON CHAIR or LOOK BEHIND POSTER. Very rarely you need an additional noun phrase, like HIT DOOR WITH AXE. Adverbs are never used; you will never be required to do anything CAREFULLY or THOROUGHLY, for example.

You have issued a command, and what you will be shown are

  1. The scratchpad data you have decided to save.
  2. A transcript of the past few commands you have issued, and their results.

Using this, you need to determine what the next command should be.

Use your scratchpad to keep track of goals (long term and short term), inventory, important items elsewhere, as well as a brief log of all your actions, to make sure you are making progress toward your goals. Aside from the brief transcript, you will not get access to any past outcomes, so anything not included in your scratchpad will eventually be forgotten.

Do not forget your overarching, main objective. It is often useful in guiding you along the right path. Try not to interact with things you expect would exist, but focus your effort on things that are presented to you by the game.

When in doubt, prefer to examine your surroundings to gather clues, rather than trying inventive/creative solutions you happen to think of.

A verb error means you tried an action that does not exist in the game. If the game asks to disambiguate (“do you mean the X or the Y”) you can copy the exact phrasing used by the game to tell it what it is you want to interact with. A noun error means the thing you referred to does not exist in the location you are in, or is not immediately reachable.

  1. This is the transcript of the past few commands you have issued, and their results:

    SUB_TRANSCRIPT

  2. Your saved scratchpad data is:

    SUB_SCRATCHPAD

In your response, answer these questions:

  • What is my current overarching goal in this game? Unless you have VERY good reason to change this, do not do that. You tend to get lost on wild goose chases when you spuriously update the main goal.
  • What are the sub-goals that might get me there?
  • What have I tried to achieve the current sub-goal?
  • What are my considerations before executing the next command?

Then at the end, your response must contain one line of the form

command: <your command here>

and this will be used as your command.

This also has some bugs, including some of the same ones as the previous script.

use v5.24;
use strict;
use warnings;
use JSON::PP;
use IPC::Open3;
use Getopt::Long;

GetOptions
  (
   "model:s" => \my $model,
   "reset" => \my $reset
  );

# Remaining positional argument is storyfile.
my $storyfile = $ARGV[0];

sub usage {
    unless ($storyfile) {
        say "Story file missing!";
        say "";
    }
    say "Usage: loop.pl [--model <model name>] [--reset] <story file>";
    say "";
    say "To find available models, run llm models. Default: claude-3.5-haiku.";
    say "To clear out existing saves and start from scratch, --reset.";
    die;
}

$model //= 'claude-3.5-haiku';
usage unless $storyfile;
# A conditional "my" (my $x = 1 unless ...) has undefined behaviour in Perl.
my $restore = $reset ? 0 : 1;

open(my $template_fd, '<', 'prompt.txt') or die $!;
my $template = do {
    local $/;
    <$template_fd>;
};
close($template_fd);

my $game = open3(my $gin, my $gout, undef, "./fweep",
                 "-N", "-g", "999,80", $storyfile);

my @transcript = ();

my ($command, $scratchpad, $result) = ('No command, game just opened', '', '');
while (<$gout>) {
    print "< $_";
    if (/Press (?:a|any) key/i) {
        print $gin "\n";
    } elsif (/^Are you sure you want to .+?$/) {
        print $gin "yes\n";
    } elsif (/\>\s*$/) {
        $result .= $_;
        # Maintain a circular buffer of 20 transcript entries.
        push @transcript, ">$command\n\n" . $result;
        shift @transcript if @transcript > 20;

        # Build up the next LLM prompt.
        my $next_prompt = $template;
        # Double quotes, so the separator is two real newlines, not '\n\n'.
        my $str_history = join("\n\n", @transcript);
        $next_prompt =~ s/SUB_TRANSCRIPT/$str_history/;
        $next_prompt =~ s/SUB_SCRATCHPAD/$scratchpad/;

        # If a restore is requested, this is the point to perform it.
        # Otherwise, autosave the current session for later restoral.
        if ($restore &&  -f "$storyfile.sav" && -f "$storyfile.prompt" && -f "$storyfile.tx") {
            # Reverse the process of saving the game state.
            print $gin "restore\n.\n";
            while (<$gout>) {
                last if /\>\s*$/;
            }

            open(my $transcript, '<', "$storyfile.tx");
            @transcript = do {
                local $/;
                @{decode_json(<$transcript>)};
            };
            close($transcript);

            open(my $prompt, '<', "$storyfile.prompt");
            $next_prompt = do {
                local $/;
                <$prompt>
            };
            close($prompt);

            say "Restoring previous session. Last prompt was:";
            say $next_prompt;
        } else {
            # Save game state and prompt here, to be able to pick up later from this
            # point if canceling.
            say $gin "save";
            my $result = <$gout>;
            # Only perform the save if we are in a state where that is allowed.
            # A prompt character here indicates we are not allowed to save.
            if ($result !~ /\>\s*$/) {
                say $gin ".";
                while (<$gout>) {
                    last if /\>\s*$/;
                }

                open(my $transcript, '>', "$storyfile.tx");
                say $transcript (encode_json(\@transcript));
                close($transcript);

                open(my $prompt, '>', "$storyfile.prompt");
                say $prompt $next_prompt;
                close($prompt);
            }
        }
        $restore = 0;

        # Clear all variables for the next iteration.
        ($command, $scratchpad, $result) = ('', '', '');

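        my $overloaded = 1;
        my $new_scratchpad = '';
        while ($overloaded) {
            # Give the human a chance to look at the output. Also rate limit
            # interaction for token cost reasons. Also prevent spamming an
            # overloaded API.
            sleep 1;

            # Start every attempt from scratch, so an error message from a
            # failed attempt neither leaks into the scratchpad nor keeps
            # matching /overloaded_error/ forever.
            $new_scratchpad = '';
            $command = '';

            # Set up the LLM interaction.
            my $llm = open3(my $lin, my $lout, undef, "llm", "-m", $model);
            print $lin $next_prompt;
            close $lin;
            while (<$lout>) {
                print "| $_";
                # The prompt asks for "command: ...", so match case-insensitively.
                if (/^command:\s*(.*)$/i) {
                    $command = $1;
                } else {
                    $new_scratchpad .= $_;
                }
            }
            # Reap the llm process so finished children don't pile up as zombies.
            waitpid($llm, 0);
            $overloaded = 0 unless $new_scratchpad =~ /overloaded_error/;
        }
        $scratchpad = $new_scratchpad;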

        if ($command =~ /\Wyes\W/i) {
            $command = 'yes';
        } elsif ($command =~ /\Wno\W/i) {
            $command = 'no';
        }
        say "> $command";
        say $gin $command;
    } else {
        $result .= $_;
    }
}