(no commit message)

This commit is contained in:
2026-01-27 22:37:59 -08:00
parent bb191efd1d
commit 6a57bde8f2
10 changed files with 1132 additions and 1 deletions

README.md

@@ -1,2 +1,158 @@
# minecraft-friend-rlm
# Minecraft MCP Friend (baseline “AI friend”)
This folder contains a working baseline agent that:
- Spawns the Minecraft MCP server (`@fundamentallabs/minecraft-mcp`) over **stdio**
- Joins your Minecraft world as a bot
- Polls `readChat` and decides what to do using **DSPy RLM** + **Groq via LiteLLM**
- Acts by calling MCP tools like `sendChat`, `mineResource`, `openInventory`, `dropItem`, etc.

If you just want the quick start: scroll to **Run the agent**.

---
## Requirements
### What you need installed
- **Java Minecraft** (the official launcher is fine)
- **Node.js** (so `npx` works)
- **uv** (for Python + dependencies)
### About Minecraft worlds and ports (important)
This agent joins a world via Mineflayer through the MCP server. Two common gotchas:
- **Open to LAN chooses a port**: even if you type `25565`, the real port is the one Minecraft prints in chat as “Local game hosted on port ####”.
- **Bots are clients**: you don't "reserve a bot port." The bot connects to your world's host/port like any other client.

---
## Security note (read this)
- **Never commit API keys.** This project expects your Groq key in `.env` (loaded at runtime).
- If you ever pasted a key into chat/screenshots, treat it as compromised and rotate it.

---
## Setup (uv + Python 3.12)
From the repo root:
```bash
# DSPy RLM + MCP SDK need a modern Python.
uv python install 3.12
uv venv --python 3.12
source .venv/bin/activate
uv pip install -r requirements.txt
cp .env.example .env
```
Now edit `.env` and set at least:
- `GROQ_API_KEY=...`
- (optional) `MAIN_MODEL` and `SUB_MODEL`
- (optional) `BOT_USERNAME`
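A filled-in `.env` might look like this (the model names and username are the defaults from `config.py`; the key value is a placeholder):

```bash
GROQ_API_KEY=your-groq-key-here
MAIN_MODEL=groq/openai/gpt-oss-120b
SUB_MODEL=groq/openai/gpt-oss-20b
BOT_USERNAME=Bot1
```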

---
## Run the agent (join your world)
### Step 1: start a world
Option A (easy): **Single-player → Open to LAN**
1. Launch Minecraft
2. Open your single-player world
3. Choose **Open to LAN**
4. In chat, copy the port from the message:
- “Local game hosted on port **#####**”

Option B (stable): run a dedicated server (recommended if you want a consistent port)
### Step 2: run the agent
In the same terminal (with the venv activated):
```bash
python agent.py --host 127.0.0.1 --mc-port <PORT_FROM_MINECRAFT_CHAT>
```
Notes:
- Use `--host 127.0.0.1` if the bot runs on the same machine as Minecraft.
- If the bot is on another machine, use your LAN IP (e.g. `192.168.x.y`) instead.
### Step 3: talk to it in Minecraft chat
Try:
- “hi can you get some wood?”
- “can you collect a stack of logs for me?”

---
## Validate connectivity (without joining)
This confirms the “MCP → list_tools → DSPy Tool conversion” pipeline:
```bash
python agent.py --validate-tools
```

---
## Troubleshooting
### 1) `ECONNREFUSED` (connection refused)
This almost always means **you're using the wrong port** or your world is no longer open.

Checklist:
- Re-open your world to LAN and re-check the port printed in chat.
- Verify the port is listening:
```bash
lsof -nP -iTCP:<PORT> -sTCP:LISTEN
nc -vz 127.0.0.1 <PORT>
```
### 2) `Unsupported protocol version 'XYZ' (attempted to use 'ABC' data)`
This is a **Minecraft version mismatch** between your client/server and the Mineflayer stack behind the MCP server.

Fastest fix:

- Run a Minecraft version that matches what the bot stack expects (the error's "attempted" number is the clue).

Alternative:
- Update the MCP server dependency stack (harder; can move the mismatch around).
### 3) "It keeps saying it delivered items, but I didn't get them"

Minecraft item transfer is tricky. In this baseline we treat the reliable mechanic as **drop items near the player** so they can be picked up. If you're testing "give" behaviors, prefer "drop-to-transfer" semantics.
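The baseline's helpers back this up by counting the inventory before dropping (so `deliver_drop` refuses to drop more than the bot holds). The counting step, mirroring `_parse_open_inventory` in `agent.py`, looks roughly like:

```python
def parse_inventory(text: str) -> dict[str, int]:
    """Turn an openInventory() observation into {item_name: count} (best-effort)."""
    if "contains:" not in text:
        return {}
    tail = text.split("contains:", 1)[1].strip().rstrip(".")
    items: dict[str, int] = {}
    for part in (p.strip() for p in tail.split(",") if p.strip()):
        tokens = part.split()
        # "2 oak log" -> items["oak_log"] = 2
        if tokens and tokens[0].isdigit():
            items[" ".join(tokens[1:]).lower().replace(" ", "_")] = int(tokens[0])
    return items

obs = "You just finished examining your inventory and it contains: 2 oak log, 1 oak sapling."
print(parse_inventory(obs))  # -> {'oak_log': 2, 'oak_sapling': 1}
```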

---
## What's in this folder
- `agent.py`: main loop; joins world; polls chat; calls DSPy RLM
- `config.py`: `.env` settings (models, poll rate, etc.)
- `host_interpreter.py`: host-based RLM interpreter (avoids some sandbox/runtime issues)
- `memory_fs.py`: local “memory filesystem” (stored under `.memory/`)
- `mcp_client.py`: thin MCP wrapper utilities (useful for debugging)
- `uv.lock`: Python deps (pinned to `dspy[mcp]==3.1.2`)

---
## References
- DSPy MCP tutorial: `https://dspy.ai/tutorials/mcp/?h=mcp`
- DSPy language models: `https://dspy.ai/learn/programming/language_models/`
- LiteLLM Groq provider: `https://docs.litellm.ai/docs/providers/groq`
- MCP filesystem server (shape inspiration): `https://www.npmjs.com/package/@modelcontextprotocol/server-filesystem`

agent.py

@@ -0,0 +1,481 @@
#!/usr/bin/env python3
import argparse
import asyncio
import inspect
import os
import time
from dataclasses import dataclass
from typing import Any, Callable
import dspy
from litellm.exceptions import RateLimitError
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client
from rich.console import Console
from rich.panel import Panel
from modaic import PrecompiledConfig, PrecompiledProgram
from config import SETTINGS
from host_interpreter import UnsafeHostInterpreter
from memory_fs import (
    mem_append_file,
    mem_create_directory,
    mem_directory_tree,
    mem_get_file_info,
    mem_list_directory,
    mem_move_file,
    mem_read_text_file,
    mem_search_files,
    mem_write_file,
)
console = Console()


class MinecraftFriendConfig(PrecompiledConfig):
    max_iterations: int = 12
    max_llm_calls: int = 18
    tools: dict[str, Callable[..., Any]] = {}
    lm: str = SETTINGS.main_model
    sub_lm: str = SETTINGS.sub_model
    verbose: bool = True


class MinecraftFriendProgram(PrecompiledProgram):
    config: MinecraftFriendConfig

    def __init__(self, config: MinecraftFriendConfig, **kwargs):
        super().__init__(config, **kwargs)
        config = self.config
        self.rlm = dspy.RLM(
            signature=MinecraftFriendRLM,
            max_iterations=config.max_iterations,
            max_llm_calls=config.max_llm_calls,
            tools=config.tools,
            sub_lm=dspy.LM(config.sub_lm),
            verbose=config.verbose,
            interpreter=UnsafeHostInterpreter(),
        )
        self.rlm.set_lm(dspy.LM(config.lm))

    def forward(self, chat, memory):
        return self.rlm(chat=chat, memory=memory)
@dataclass
class AgentState:
    last_chat_fingerprint: str = ""
    last_spoke_at: float = 0.0
    last_decide_at: float = 0.0


def extract_chat_lines(summary: str) -> list[str]:
    lines = [line.rstrip() for line in summary.splitlines()]
    if "==================" not in lines:
        return []
    idx = lines.index("==================")
    return [line for line in lines[idx + 1 :] if line.strip()]


def drop_own_messages(lines: list[str], bot_username: str) -> list[str]:
    # Server duplicates bot speech in both "[System] <Bot> ..." and "<Bot>: ..."
    needle = f"<{bot_username}>"
    return [line for line in lines if needle not in line]


def fingerprint(lines: list[str]) -> str:
    return "\n".join(lines[-30:])


def _extract_retry_after_seconds(err: Exception) -> float | None:
    # Groq/LiteLLM error strings often include: "Please try again in 16.0575s."
    s = str(err)
    marker = "try again in "
    if marker not in s:
        return None
    tail = s.split(marker, 1)[1]
    num = ""
    for ch in tail:
        if ch.isdigit() or ch == ".":
            num += ch
            continue
        break
    try:
        return float(num) if num else None
    except Exception:
        return None


def _calltool_text(call_tool_result) -> str:
    # Compatible with MCP SDK TextContent blocks.
    out: list[str] = []
    for block in getattr(call_tool_result, "content", []) or []:
        if getattr(block, "type", None) == "text":
            out.append(getattr(block, "text", ""))
    return "\n".join([t for t in out if t]).strip()
class MinecraftFriendRLM(dspy.Signature):
    """
    You are a friendly AI companion playing Minecraft with Paul.

    Your ONLY way to talk is by calling MCP tools (especially `sendChat`).
    Use tools like `readChat`, `mineResource`, `lookAround`, etc. when useful.

    The `response` output is only a short internal note about what you did.
    """

    chat = dspy.InputField(desc="Recent Minecraft chat lines (most recent last).")
    memory = dspy.InputField(desc="Short memory about Paul and the current goal.")
    response = dspy.OutputField(desc="Short internal note (not sent to chat).")


def _tool_default_from_schema(schema: dict[str, Any]) -> Any:
    # JSON schema defaults are best-effort; they may be missing.
    return schema.get("default", inspect._empty)
def _make_sync_mcp_tool(
    *,
    tool: dspy.Tool,
    loop: asyncio.AbstractEventLoop,
    on_call: Callable[[], None] | None = None,
) -> Callable[..., Any]:
    """
    Wrap an async MCP-backed `dspy.Tool` into a sync callable that can safely be used
    inside RLM code execution, even while the main asyncio loop is running.
    """
    arg_order = list((tool.args or {}).keys())

    async def _acall(**kwargs: Any) -> Any:
        return await tool.acall(**kwargs)

    def _sync(*args: Any, **kwargs: Any) -> Any:
        # Support common calling styles:
        # - tool(message="hi")
        # - tool("hi", delay=0) -> maps positional args in schema order
        # - tool({"message": "hi"}) -> dict-only positional
        if args:
            if len(args) == 1 and isinstance(args[0], dict) and not kwargs:
                kwargs = dict(args[0])
            else:
                for idx, value in enumerate(args):
                    if idx >= len(arg_order):
                        raise TypeError(
                            f"{tool.name} got too many positional arguments"
                        )
                    kwargs.setdefault(arg_order[idx], value)
        fut = asyncio.run_coroutine_threadsafe(_acall(**kwargs), loop)
        result = fut.result()
        if on_call is not None:
            on_call()
        return result

    _sync.__name__ = tool.name
    _sync.__doc__ = tool.desc or ""
    # Give the LLM a nice signature in the RLM instructions.
    params: list[inspect.Parameter] = []
    for arg_name, schema in (tool.args or {}).items():
        default = _tool_default_from_schema(schema)
        params.append(
            inspect.Parameter(
                arg_name,
                kind=inspect.Parameter.POSITIONAL_OR_KEYWORD,
                default=default,
            )
        )
    _sync.__signature__ = inspect.Signature(parameters=params)  # type: ignore[attr-defined]
    return _sync
def _parse_open_inventory(text: str) -> dict[str, int]:
    """
    Parse an openInventory() observation into a {item_name: count} dict.

    Example input:
        "You just finished examining your inventory and it contains: 2 oak log, 2 birch log, 1 oak sapling."
    """
    if "contains:" not in text:
        return {}
    tail = text.split("contains:", 1)[1].strip().rstrip(".")
    if not tail:
        return {}
    items: dict[str, int] = {}
    parts = [p.strip() for p in tail.split(",") if p.strip()]
    for p in parts:
        # "2 oak log" -> (2, "oak log")
        tokens = p.split()
        if not tokens:
            continue
        try:
            n = int(tokens[0])
        except ValueError:
            continue
        name = " ".join(tokens[1:]).strip().lower()
        if not name:
            continue
        items[name.replace(" ", "_")] = n
    return items
async def main_async() -> None:
    p = argparse.ArgumentParser()
    p.add_argument("--host", default=SETTINGS.mcp_minecraft_host)
    p.add_argument("--mc-port", type=int, default=SETTINGS.mcp_minecraft_port)
    p.add_argument("--bot-username", default=SETTINGS.bot_username)
    p.add_argument(
        "--validate-tools",
        action="store_true",
        help="Connect to the MCP server, list tools, convert them to dspy.Tool, then exit.",
    )
    args = p.parse_args()

    if not SETTINGS.groq_api_key:
        raise RuntimeError(
            "GROQ_API_KEY is not set. Copy .env.example to .env and fill it in."
        )
    os.environ.setdefault("GROQ_API_KEY", SETTINGS.groq_api_key)

    # DSPy MCP tutorial requires dspy[mcp] and converts MCP tools via dspy.Tool.from_mcp_tool.
    # Important: DSPy's default RLM sandbox (Deno/Pyodide) cannot currently call tools in some
    # runtimes due to missing WASM stack switching. We use a host interpreter + sync tool wrappers.
    console.print(Panel(SETTINGS.main_model, title="DSPy model", border_style="cyan"))

    server_params = StdioServerParameters(
        command="npx",
        args=[
            "-y",
            "--",
            "@fundamentallabs/minecraft-mcp",
            "-h",
            args.host,
            "-p",
            str(args.mc_port),
        ],
        env=None,
    )
    state = AgentState()
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Gather MCP tools and convert to DSPy tools (official DSPy MCP tutorial pattern).
            tools = await session.list_tools()
            dspy_tools = [dspy.Tool.from_mcp_tool(session, t) for t in tools.tools]

            # Add local "memory filesystem" tools (DSPy Tool wrappers).
            #
            # This follows DSPy's tool guidance: wrap functions in dspy.Tool and pass them via tools=...
            # https://dspy.ai/learn/programming/tools/
            memory_tools = [
                dspy.Tool(mem_list_directory),
                dspy.Tool(mem_read_text_file),
                dspy.Tool(mem_write_file),
                dspy.Tool(mem_append_file),
                dspy.Tool(mem_create_directory),
                dspy.Tool(mem_move_file),
                dspy.Tool(mem_search_files),
                dspy.Tool(mem_get_file_info),
                dspy.Tool(mem_directory_tree),
            ]
            all_tools = [*dspy_tools, *memory_tools]

            if args.validate_tools:
                console.print(
                    Panel(
                        "\n".join([t.name for t in all_tools]),
                        title=f"OK: ready {len(dspy_tools)} MCP tools + {len(memory_tools)} memory tools",
                        border_style="green",
                    )
                )
                return

            # Build sync wrappers for MCP tools so the agent can call them inside RLM execution.
            loop = asyncio.get_running_loop()
            sync_mcp_tools: dict[str, Callable[..., Any]] = {}
            for t in dspy_tools:
                if not t.name.isidentifier():
                    continue
                sync_mcp_tools[t.name] = _make_sync_mcp_tool(
                    tool=t,
                    loop=loop,
                    on_call=(lambda: setattr(state, "last_spoke_at", time.time()))
                    if t.name == "sendChat"
                    else None,
                )

            # Memory tools are already sync python callables.
            sync_memory_tools: dict[str, Callable[..., Any]] = {}
            for t in memory_tools:
                if not t.name.isidentifier():
                    continue
                sync_memory_tools[t.name] = t

            # High-level "agent guardrails" tools to reduce LLM confusion / regressions.
            def inv_counts() -> dict[str, int]:
                """Return parsed inventory counts as a JSON-like dict."""
                text = sync_mcp_tools["openInventory"]()
                return _parse_open_inventory(str(text))

            def have(item_name: str) -> int:
                """Return how many of an item the bot currently has (best-effort)."""
                counts = inv_counts()
                return int(counts.get(item_name.lower(), 0))

            def deliver_drop(user_name: str, item_name: str, count: int) -> str:
                """Drop items near a player so they can pick them up (preferred transfer)."""
                if have(item_name) < count:
                    return f"[ERROR] Not enough {item_name}. Have {have(item_name)}."
                return str(sync_mcp_tools["dropItem"](item_name, count, user_name))

            def gather_to(
                item_name: str, target_count: int, batch: int = 8, max_rounds: int = 12
            ) -> str:
                """Iteratively mine until we have at least target_count of item_name (timeboxed)."""
                # Normalize common user phrasing.
                norm = item_name.strip().lower().replace(" ", "_")
                if norm in {"wood", "logs", "log"}:
                    norm = "oak_log"
                for _ in range(max_rounds):
                    cur = have(norm)
                    if cur >= target_count:
                        return f"OK: have {cur} {norm} (>= {target_count})"
                    try:
                        # Mine in small batches to reduce timeouts.
                        sync_mcp_tools["mineResource"](
                            norm, min(batch, max(target_count - cur, 1))
                        )
                    except Exception as e:
                        return f"[ERROR] mineResource failed: {e}"
                return f"[WARN] Could not reach target. Have {have(norm)} {norm}."

            helper_tools: dict[str, Callable[..., Any]] = {
                "inv_counts": inv_counts,
                "have": have,
                "gather_to": gather_to,
                "deliver_drop": deliver_drop,
            }

            # Remove misleading tools that caused regressions in the logs.
            # (We can re-add later if needed.)
            sync_mcp_tools.pop("giveItemToSomeone", None)

            rlm_tools: dict[str, Callable[..., Any]] = {
                **sync_mcp_tools,
                **sync_memory_tools,
                **helper_tools,
            }
            rlm = MinecraftFriendProgram(MinecraftFriendConfig(tools=rlm_tools))

            # Join once up-front so the bot is in-world.
            join_res = await session.call_tool(
                "joinGame",
                arguments={
                    "username": args.bot_username,
                    "host": args.host,
                    "port": args.mc_port,
                },
            )
            console.print(
                Panel(_calltool_text(join_res), title="joinGame", border_style="green")
            )

            # Greet once.
            await session.call_tool(
                "sendChat",
                arguments={
                    "message": f"Hey! I'm {SETTINGS.persona_name}. I'm here! Want to explore or build something?"
                },
            )
            state.last_spoke_at = time.time()

            while True:
                read_res = await session.call_tool(
                    "readChat",
                    arguments={"count": 40, "filterType": "all", "timeLimit": 120},
                )
                summary = _calltool_text(read_res)
                lines = drop_own_messages(
                    extract_chat_lines(summary), args.bot_username
                )
                fp = fingerprint(lines)
                new_chat = fp != state.last_chat_fingerprint
                state.last_chat_fingerprint = fp

                now = time.time()
                should_initiate = (
                    now - state.last_spoke_at
                ) > SETTINGS.idle_chitchat_seconds
                can_decide = (now - state.last_decide_at) > max(
                    SETTINGS.poll_seconds, 4.0
                )

                if (new_chat or should_initiate) and can_decide:
                    state.last_decide_at = now
                    chat_context = "\n".join(lines[-30:])
                    memory = (
                        "You have a persistent memory filesystem under `.memory/`.\n"
                        "Use these tools to store/recall information:\n"
                        "- mem_list_directory(path)\n"
                        "- mem_read_text_file(path, head=None, tail=None)\n"
                        "- mem_write_file(path, content)\n"
                        "- mem_append_file(path, content)\n"
                        "- mem_search_files(path='', pattern='*', contains=None, limit=50)\n"
                        "- mem_directory_tree(path='', max_depth=6)\n"
                        "- mem_get_file_info(path)\n"
                        "\n"
                        "Suggested files:\n"
                        "- profile/paul.md (stable preferences)\n"
                        "- world/status.md (current world + tasks)\n"
                        "- notes/log.md (timestamped scratchpad)\n"
                        "\n"
                        "Gameplay facts (IMPORTANT):\n"
                        "- To give items to Paul, prefer `deliver_drop(user_name, item_name, count)`.\n"
                        "- `giveItemToSomeone` is unreliable here; do NOT use it.\n"
                        "- To gather a stack, use `gather_to('oak_log', 64)` then `deliver_drop('pmlockett', 'oak_log', 64)`.\n"
                    )
                    try:
                        # Run RLM in a worker thread so sync tool calls can safely
                        # schedule async MCP operations onto this running event loop.
                        result = await asyncio.to_thread(
                            rlm,
                            chat=chat_context,
                            memory=memory,
                        )
                        resp = getattr(result, "response", None)
                        if resp:
                            console.print(
                                Panel(
                                    str(resp),
                                    title="RLM response",
                                    border_style="green",
                                )
                            )
                    except RateLimitError as e:
                        wait_s = _extract_retry_after_seconds(e) or 10.0
                        console.print(
                            Panel(
                                f"Rate limited. Sleeping {wait_s:.1f}s.\n\n{e}",
                                title="Rate limit",
                                border_style="yellow",
                            )
                        )
                        await asyncio.sleep(wait_s)

                await asyncio.sleep(SETTINGS.poll_seconds)
def main() -> None:
    asyncio.run(main_async())


if __name__ == "__main__":
    main()

auto_classes.json

@@ -0,0 +1,4 @@
{
  "AutoConfig": "agent.MinecraftFriendConfig",
  "AutoProgram": "agent.MinecraftFriendProgram"
}

config.json

@@ -0,0 +1,9 @@
{
  "model": null,
  "max_iterations": 12,
  "max_llm_calls": 18,
  "tools": {},
  "lm": "groq/openai/gpt-oss-120b",
  "sub_lm": "groq/openai/gpt-oss-20b",
  "verbose": true
}

config.py

@@ -0,0 +1,28 @@
"""Shared configuration loaded from .env for the Minecraft friend agent."""
from __future__ import annotations
import os
from dataclasses import dataclass
from dotenv import load_dotenv
load_dotenv()
@dataclass(frozen=True)
class Settings:
groq_api_key: str | None = os.getenv("GROQ_API_KEY")
main_model: str = os.getenv("MAIN_MODEL", "groq/openai/gpt-oss-120b")
sub_model: str = os.getenv("SUB_MODEL", "groq/openai/gpt-oss-20b")
persona_name: str = os.getenv("PERSONA_NAME", "Spruce")
poll_seconds: float = float(os.getenv("POLL_SECONDS", "2"))
idle_chitchat_seconds: float = float(os.getenv("IDLE_CHITCHAT_SECONDS", "90"))
mcp_minecraft_host: str = os.getenv("MCP_MINECRAFT_HOST", "127.0.0.1")
mcp_minecraft_port: int = int(os.getenv("MCP_MINECRAFT_PORT", "25565"))
bot_username: str = os.getenv("BOT_USERNAME", "Bot1")
SETTINGS = Settings()

host_interpreter.py

@@ -0,0 +1,145 @@
from __future__ import annotations

import builtins
import io
import sys
import traceback
from dataclasses import dataclass, field
from types import MappingProxyType
from typing import Any, Callable

from dspy.primitives.code_interpreter import CodeInterpreterError, FinalOutput


@dataclass
class UnsafeHostInterpreter:
    """
    A minimal CodeInterpreter implementation that executes code in the host Python process.

    Why this exists:
    - DSPy's default RLM interpreter (Deno/Pyodide) currently relies on pyodide.ffi.run_sync
      to bridge async tool calls, which fails on runtimes without WASM stack switching support.

    Tradeoff:
    - This is NOT a security sandbox. It will execute arbitrary Python code produced by the LLM.
      Use only in trusted/local environments.
    """

    tools: dict[str, Callable[..., str]] = field(default_factory=dict)
    # If RLM injects this attribute, we can map SUBMIT() to output fields.
    output_fields: list[dict] | None = None
    _started: bool = False
    _globals: dict[str, Any] = field(default_factory=dict)

    def start(self) -> None:
        if self._started:
            return
        # Start with a constrained global namespace. This is not a real sandbox.
        self._globals = {
            "__name__": "__rlm_host__",
            "__builtins__": MappingProxyType(
                {
                    # Allow common harmless builtins needed for analysis.
                    "print": builtins.print,
                    "len": builtins.len,
                    "type": builtins.type,
                    "range": builtins.range,
                    "reversed": builtins.reversed,
                    "min": builtins.min,
                    "max": builtins.max,
                    "sum": builtins.sum,
                    "sorted": builtins.sorted,
                    "enumerate": builtins.enumerate,
                    "str": builtins.str,
                    "int": builtins.int,
                    "float": builtins.float,
                    "bool": builtins.bool,
                    "dict": builtins.dict,
                    "list": builtins.list,
                    "set": builtins.set,
                    "tuple": builtins.tuple,
                    "abs": builtins.abs,
                    "all": builtins.all,
                    "any": builtins.any,
                    "zip": builtins.zip,
                }
            ),
        }
        # Provide a few commonly-used stdlib modules without enabling arbitrary imports.
        # (The host interpreter is already unsafe, but keeping imports closed reduces footguns.)
        import json as _json
        import math as _math
        import re as _re

        self._globals.update({"re": _re, "json": _json, "math": _math})
        self._started = True
    def execute(self, code: str, variables: dict[str, Any] | None = None) -> Any:
        if not self._started:
            self.start()
        # Inject variables and tools into the exec namespace.
        if variables:
            self._globals.update(variables)
        self._globals.update(self.tools)

        # Provide SUBMIT for early termination.
        class _SubmitSignal(BaseException):
            def __init__(self, payload: dict[str, Any]):
                super().__init__()
                self.payload = payload

        def SUBMIT(*args: Any, **kwargs: Any) -> None:  # noqa: N802 - matches DSPy contract
            # RLM expects interpreter.execute() to RETURN a FinalOutput instance,
            # not raise it as an exception. We raise a private control-flow signal
            # and convert it into FinalOutput below.
            if not kwargs:
                # Support SUBMIT("...") for single-output signatures.
                if (
                    len(args) == 1
                    and self.output_fields
                    and len(self.output_fields) == 1
                ):
                    name = self.output_fields[0]["name"]
                    kwargs = {name: args[0]}
                # Support SUBMIT() if the user assigned output variables in globals.
                elif len(args) == 0 and self.output_fields:
                    payload: dict[str, Any] = {}
                    for f in self.output_fields:
                        fname = f["name"]
                        if fname in self._globals:
                            payload[fname] = self._globals[fname]
                    if payload:
                        kwargs = payload
                else:
                    raise _SubmitSignal(
                        {
                            "error": "SUBMIT called without outputs; provide kwargs or set output variables."
                        }
                    )
            raise _SubmitSignal(kwargs)

        self._globals["SUBMIT"] = SUBMIT

        buf = io.StringIO()
        old_stdout, old_stderr = sys.stdout, sys.stderr
        sys.stdout, sys.stderr = buf, buf
        try:
            exec(code, self._globals, self._globals)
        except _SubmitSignal as sig:
            return FinalOutput(sig.payload)
        except SyntaxError:
            raise
        except Exception as e:
            tb = traceback.format_exc()
            raise CodeInterpreterError(f"{e}\n\n{tb}")
        finally:
            sys.stdout, sys.stderr = old_stdout, old_stderr

        out = buf.getvalue()
        return out.strip() if out.strip() else None

    def shutdown(self) -> None:
        self._globals.clear()
        self._started = False

memory_fs.py

@@ -0,0 +1,214 @@
from __future__ import annotations

import json
import os
from dataclasses import dataclass
from fnmatch import fnmatch
from pathlib import Path
from typing import Any


def _default_root() -> Path:
    # Keep memory local to this project folder.
    return Path(__file__).resolve().parent / ".memory"


@dataclass(frozen=True)
class MemoryFS:
    """
    A tiny, sandboxed "memory filesystem" for agents.

    This intentionally mirrors the *shape* of common filesystem MCP servers:
    list/read/write/move/search/info/tree, but is implemented locally as Python tools.
    """

    root: Path = _default_root()

    def _ensure_root(self) -> None:
        self.root.mkdir(parents=True, exist_ok=True)

    def _resolve(self, rel_path: str) -> Path:
        """
        Resolve a user-provided path against the memory root, preventing traversal.

        The path is interpreted as relative to `root`. Leading slashes are ignored.
        """
        self._ensure_root()
        rel = rel_path.lstrip("/").strip()
        target = (self.root / rel).resolve()
        root = self.root.resolve()
        if target == root:
            return target
        if root not in target.parents:
            raise ValueError("Path escapes memory root; refusing.")
        return target


_MEM = MemoryFS()
def mem_list_directory(path: str = "") -> str:
    """List directory contents under memory root. Returns lines like: [DIR] foo, [FILE] bar.txt."""
    p = _MEM._resolve(path)
    if not p.exists():
        return f"Not found: {path}"
    if not p.is_dir():
        return f"Not a directory: {path}"
    entries = []
    for child in sorted(p.iterdir(), key=lambda c: (not c.is_dir(), c.name.lower())):
        tag = "[DIR]" if child.is_dir() else "[FILE]"
        entries.append(f"{tag} {child.name}")
    return "\n".join(entries) if entries else "(empty)"


def mem_create_directory(path: str) -> str:
    """Create a directory under memory root (parents created)."""
    p = _MEM._resolve(path)
    p.mkdir(parents=True, exist_ok=True)
    return f"OK: created {path}"


def mem_read_text_file(
    path: str, head: int | None = None, tail: int | None = None
) -> str:
    """Read a UTF-8 text file under memory root. Optionally return first `head` or last `tail` lines."""
    if head is not None and tail is not None:
        return "Error: cannot specify both head and tail."
    p = _MEM._resolve(path)
    if not p.exists():
        return f"Not found: {path}"
    if not p.is_file():
        return f"Not a file: {path}"
    text = p.read_text(encoding="utf-8", errors="replace")
    lines = text.splitlines()
    if head is not None:
        return "\n".join(lines[: max(head, 0)])
    if tail is not None:
        # Guard against tail <= 0: lines[-0:] would return the whole file.
        tail = max(tail, 0)
        return "\n".join(lines[-tail:]) if tail else ""
    return text


def mem_write_file(path: str, content: str) -> str:
    """Write (overwrite) a UTF-8 text file under memory root."""
    p = _MEM._resolve(path)
    p.parent.mkdir(parents=True, exist_ok=True)
    p.write_text(content, encoding="utf-8")
    return f"OK: wrote {path} ({len(content)} chars)"


def mem_append_file(path: str, content: str) -> str:
    """Append UTF-8 text to a file under memory root (creates if missing)."""
    p = _MEM._resolve(path)
    p.parent.mkdir(parents=True, exist_ok=True)
    with p.open("a", encoding="utf-8") as f:
        f.write(content)
    return f"OK: appended {path} ({len(content)} chars)"


def mem_move_file(source: str, destination: str) -> str:
    """Move/rename a file or directory under memory root. Fails if destination exists."""
    src = _MEM._resolve(source)
    dst = _MEM._resolve(destination)
    if not src.exists():
        return f"Not found: {source}"
    if dst.exists():
        return f"Error: destination exists: {destination}"
    dst.parent.mkdir(parents=True, exist_ok=True)
    os.replace(src, dst)
    return f"OK: moved {source} -> {destination}"


def mem_get_file_info(path: str) -> str:
    """Return basic metadata (JSON) for a path under memory root."""
    p = _MEM._resolve(path)
    if not p.exists():
        return json.dumps({"path": path, "exists": False})
    st = p.stat()
    info: dict[str, Any] = {
        "path": path,
        "exists": True,
        "type": "directory" if p.is_dir() else "file",
        "size": st.st_size,
        "mtime": st.st_mtime,
    }
    return json.dumps(info, indent=2, ensure_ascii=False)
def mem_search_files(
    path: str = "", pattern: str = "*", contains: str | None = None, limit: int = 50
) -> str:
    """
    Recursively search for files under memory root.

    - `pattern`: glob-style match on relative path (e.g. "*.md", "profile/*")
    - `contains`: if set, only include text files that contain this substring
    """
    base = _MEM._resolve(path)
    if not base.exists():
        return f"Not found: {path}"
    if not base.is_dir():
        return f"Not a directory: {path}"
    results: list[str] = []
    root = _MEM.root.resolve()
    for p in base.rglob("*"):
        if len(results) >= max(limit, 0):
            break
        if not p.is_file():
            continue
        rel = str(p.resolve().relative_to(root)).replace(os.sep, "/")
        if not fnmatch(rel, pattern):
            continue
        if contains is not None:
            try:
                text = p.read_text(encoding="utf-8", errors="ignore")
            except Exception:
                continue
            if contains not in text:
                continue
        results.append(rel)
    return "\n".join(results) if results else "(no matches)"


def mem_directory_tree(path: str = "", max_depth: int = 6) -> str:
    """Return a JSON directory tree rooted at `path`."""
    base = _MEM._resolve(path)
    if not base.exists():
        return json.dumps({"error": "not_found", "path": path})
    if not base.is_dir():
        return json.dumps({"error": "not_directory", "path": path})
    root = _MEM.root.resolve()

    def node(p: Path, depth: int) -> dict[str, Any]:
        rel = (
            str(p.resolve().relative_to(root)).replace(os.sep, "/")
            if p != _MEM.root
            else ""
        )
        if p.is_dir():
            if depth >= max_depth:
                return {
                    "name": p.name or "/",
                    "path": rel,
                    "type": "directory",
                    # Depth limit reached; children elided.
                    "children": ["..."],
                }
            children = [
                node(c, depth + 1)
                for c in sorted(
                    p.iterdir(), key=lambda c: (not c.is_dir(), c.name.lower())
                )
            ]
            return {
                "name": p.name or "/",
                "path": rel,
                "type": "directory",
                "children": children,
            }
        return {"name": p.name, "path": rel, "type": "file"}

    return json.dumps(node(base, 0), indent=2, ensure_ascii=False)

program.json

@@ -0,0 +1,83 @@
{
  "rlm.generate_action": {
    "traces": [],
    "train": [],
    "demos": [],
    "signature": {
      "instructions": "You are a friendly AI companion playing Minecraft with Paul.\n\nYour ONLY way to talk is by calling MCP tools (especially `sendChat`).\nUse tools like `readChat`, `mineResource`, `lookAround`, etc. when useful.\n\nThe `response` output is only a short internal note about what you did.\n\nYou are tasked with producing the following outputs given the inputs `chat`, `memory`:\n- {response}\n\nYou have access to a Python REPL environment. Write Python code and it will be executed. You will see the output, then write more code based on what you learned. This is an iterative process.\n\nAvailable:\n- Variables: `chat`, `memory` (your input data)\n- `llm_query(prompt)` - query a sub-LLM (~500K char capacity) for semantic analysis\n- `llm_query_batched(prompts)` - query multiple prompts concurrently (much faster for multiple queries)\n- `print()` - ALWAYS print to see results\n- `SUBMIT(response)` - submit final output when done\n- Standard libraries: re, json, collections, math, etc.\n\nIMPORTANT: This is ITERATIVE. Each code block you write will execute, you'll see the output, then you decide what to do next. Do NOT try to solve everything in one step.\n\n1. EXPLORE FIRST - Look at your data before processing it. Print samples, check types/lengths, understand the structure.\n2. ITERATE - Write small code snippets, observe outputs, then decide next steps. State persists between iterations.\n3. VERIFY BEFORE SUBMITTING - If results seem wrong (zeros, empty, unexpected), reconsider your approach.\n4. USE llm_query FOR SEMANTICS - String matching finds WHERE things are; llm_query understands WHAT things mean.\n5. MINIMIZE RETYPING (INPUTS & OUTPUTS) - When values are long, precise, or error-prone (IDs, numbers, code, quotes), re-access them via variables and parse/compute in code instead of retyping. Use small, targeted prints to sanity-check, but avoid manual copying when variables can carry the exact value.\n6. SUBMIT ONLY AFTER SEEING OUTPUTS - SUBMIT ends the current run immediately. If you need to inspect printed output, run it in one step, review the result, then call SUBMIT in a later step.\n\nYou have max 18 sub-LLM calls. When done, call SUBMIT() with your output.",
"fields": [
{
"prefix": "Variables Info:",
"description": "Metadata about the variables available in the REPL"
},
{
"prefix": "Repl History:",
"description": "Previous REPL code executions and their outputs"
},
{
"prefix": "Iteration:",
"description": "Current iteration number (1-indexed) out of max_iterations"
},
{
"prefix": "Reasoning:",
"description": "Think step-by-step: what do you know? What remains? Plan your next action."
},
{
"prefix": "Code:",
"description": "Python code to execute."
}
]
},
"lm": {
"model": "groq/openai/gpt-oss-120b",
"model_type": "chat",
"cache": true,
"num_retries": 3,
"finetuning_model": null,
"launch_kwargs": {},
"train_kwargs": {},
"temperature": null,
"max_tokens": null
}
},
"rlm.extract": {
"traces": [],
"train": [],
"demos": [],
"signature": {
"instructions": "The trajectory was generated with the following objective: \nYou are a friendly AI companion playing Minecraft with Paul.\n\nYour ONLY way to talk is by calling MCP tools (especially `sendChat`).\nUse tools like `readChat`, `mineResource`, `lookAround`, etc. when useful.\n\nThe `response` output is only a short internal note about what you did.\n\n\nBased on the REPL trajectory, extract the final outputs now.\n\n Review your trajectory to see what information you gathered and what values you computed, then provide the final outputs.",
"fields": [
{
"prefix": "Variables Info:",
"description": "Metadata about the variables available in the REPL"
},
{
"prefix": "Repl History:",
"description": "Your REPL interactions so far"
},
{
"prefix": "Response:",
"description": "Short internal note (not sent to chat)."
}
]
},
"lm": {
"model": "groq/openai/gpt-oss-120b",
"model_type": "chat",
"cache": true,
"num_retries": 3,
"finetuning_model": null,
"launch_kwargs": {},
"train_kwargs": {},
"temperature": null,
"max_tokens": null
}
},
"metadata": {
"dependency_versions": {
"python": "3.12",
"dspy": "3.1.2",
"cloudpickle": "3.1"
}
}
}

push.py

@@ -0,0 +1,4 @@
from agent import MinecraftFriendProgram, MinecraftFriendConfig
program = MinecraftFriendProgram(MinecraftFriendConfig())
program.push_to_hub("plockettpl/minecraft-friend-rlm", with_code=True)

pyproject.toml

@@ -0,0 +1,7 @@
[project]
name = "minecraft-friend-rlm"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.12"
dependencies = ["anyio>=4.12.1", "dspy[mcp]==3.1.2", "litellm>=1.80.0", "mcp>=1.26.0", "modaic>=0.12.7", "python-dotenv>=1.2.1", "rich>=14.3.1"]