# llms.txt Generator

Generate [llms.txt](https://llmstxt.org/) documentation for GitHub repositories using DSPy.

## Quickstart

Get started in under 2 minutes:

### 1. Install Dependencies

```bash
# Install modaic
uv add modaic

# Or with pip
pip install modaic
```

### 2. Set Up Environment Variables

Create a `.env` file in your project root:

```bash
cp .env.example .env
```

Edit `.env` and add your API keys:

```bash
# Required
OPENAI_API_KEY=your-openai-api-key-here
MODAIC_TOKEN=your-modaic-token-here

# Optional but recommended
GITHUB_ACCESS_TOKEN=your-github-token-here

# Optional - OpenRouter support coming soon
OPENROUTER_API_KEY=your-openrouter-api-key-here
```

**Get your keys:**

- OpenAI: https://platform.openai.com/api-keys
- Modaic: https://modaic.dev
- GitHub: https://github.com/settings/tokens (requires `public_repo` scope)

### 3. Run the Generator

```python
from modaic import AutoAgent

# Load the precompiled agent
llms_txt_generator = AutoAgent.from_precompiled("johwang/llms-txt-generator")

# Generate llms.txt for any repository
result = llms_txt_generator("https://github.com/pydantic/pydantic")
print(result.llms_txt_content)
```

### Customizing Configuration

You can override the default settings when loading the agent. Be careful: DSPy requires certain reasoning models to use a `max_tokens` of at least 16000 and a `temperature` of 1.0.

```python
llms_txt_generator = AutoAgent.from_precompiled(
    "johwang/llms-txt-generator",
    config_options={
        "repo_analyzer_model": "gpt-4o",       # Default: gpt-5.1-2025-11-13
        "code_analyzer_model": "gpt-4o",       # Default: gpt-5.1-2025-11-13
        "llms_txt_generator_model": "gpt-4o",  # Default: gpt-4o
        "max_tokens": 16000,                   # Default: 16000
        "temperature": 1.0,                    # Default: 1.0
    }
)
```

**Note:** Support for OpenRouter models is coming soon!

## What is llms.txt?

`llms.txt` is a proposed standard for providing structured, LLM-friendly documentation about a project.
It typically includes:

- Project overview and purpose
- Key concepts and terminology
- Architecture and structure
- Usage examples
- Important files and directories

This tool uses DSPy (Declarative Self-improving Python) to automatically analyze GitHub repositories and generate comprehensive `llms.txt` documentation.

## Installation

This project uses `uv` for dependency management. If you don't have it installed:

```bash
# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
```

Then install the project dependencies:

```bash
uv sync
```

## Configuration

You'll need two API keys:

1. **OpenAI API Key** - Required for DSPy to use language models
2. **GitHub Access Token** - Optional but recommended for higher rate limits

### Option 1: Using .env file (Recommended)

Create a `.env` file in the project root:

```bash
cp .env.example .env
```

Then edit `.env` and add your API keys:

```bash
# .env
OPENAI_API_KEY=your-openai-api-key-here
GITHUB_ACCESS_TOKEN=your-github-token-here

# Optional: Set defaults
DEFAULT_MODEL=gpt-4o-mini
DEFAULT_OUTPUT_FILE=llms.txt
```

### Option 2: Using environment variables

```bash
export OPENAI_API_KEY="your-openai-api-key"
export GITHUB_ACCESS_TOKEN="your-github-token"  # Optional
```

### Option 3: Pass keys directly via CLI

```bash
uv run python main.py https://github.com/stanfordnlp/dspy \
  --openai-key "your-key" \
  --github-token "your-token"
```

## Usage

### Command Line Interface

Basic usage:

```bash
uv run python main.py https://github.com/stanfordnlp/dspy
```

With custom options:

```bash
# Specify output file
uv run python main.py https://github.com/stanfordnlp/dspy -o output/dspy-llms.txt

# Use a different model
uv run python main.py https://github.com/stanfordnlp/dspy -m gpt-4o

# Pass API keys directly
uv run python main.py https://github.com/stanfordnlp/dspy \
  --openai-key "your-key" \
  --github-token "your-token"
```

### Python API

You can also use the generator programmatically:

```python
from src.generator import generate_llms_txt

# Generate llms.txt for a repository
content = generate_llms_txt(
    repo_url="https://github.com/stanfordnlp/dspy",
    output_file="llms.txt",
    model="gpt-4o-mini",
    openai_api_key="your-key",  # Optional if env var is set
    github_token="your-token",  # Optional
)

print(content)
```

## Project Structure

```
llmstxt-generator/
├── src/
│   ├── __init__.py       # Package exports
│   ├── signatures.py     # DSPy signatures for analysis
│   ├── analyzer.py       # RepositoryAnalyzer module
│   ├── github_utils.py   # GitHub API utilities
│   └── generator.py      # Main generation logic
├── main.py               # CLI entry point
├── example.py            # Example usage script
├── .env.example          # Environment variables template
├── .gitignore            # Git ignore patterns
├── pyproject.toml        # Project configuration
└── README.md             # This file
```

## How It Works

The generator uses a multi-step DSPy pipeline:

1. **Repository Analysis** - Analyzes the repository structure, README, and configuration files to understand the project's purpose and key concepts
2. **Code Structure Analysis** - Identifies important directories, entry points, and development information
3. **Usage Examples Generation** - Creates relevant usage examples based on the project's purpose
4. **llms.txt Generation** - Combines all analyzed information into a structured llms.txt file following the standard format

## Example Output

Here's what a generated `llms.txt` file looks like:

```
# DSPy: Programming Language Models

## Project Overview
DSPy is a framework for programming, rather than prompting, language models...
## Key Concepts
- **Modules**: Building blocks for LM programs
- **Signatures**: Input/output specifications
- **Teleprompters**: Optimization algorithms
- **Predictors**: Core reasoning components

## Architecture
- `/dspy/`: Main package directory
- `/adapters/`: Input/output format handlers
- `/clients/`: LM client interfaces
- `/predict/`: Core prediction modules
- `/teleprompt/`: Optimization algorithms

## Usage Examples
[Generated examples based on repository analysis]
```

## Development

Install development dependencies:

```bash
uv sync --extra dev
```

Run tests:

```bash
uv run pytest
```

Format code:

```bash
uv run black .
uv run ruff check .
```

## Limitations

- Works with public GitHub repositories; private repositories require an appropriate access token
- API rate limits may affect large repositories
- Output quality depends on the language model used
- Works best with well-documented repositories

## License

MIT

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
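For contributors who want a feel for the control flow before reading the source, the four-step pipeline described under "How It Works" can be sketched in plain Python. This is a hedged illustration only: every function, parameter, and field name below is a placeholder invented for this sketch, not the repository's actual API, and the real steps are DSPy modules backed by a language model rather than the toy logic shown here.

```python
# Minimal sketch of the four-step pipeline from "How It Works".
# All names are illustrative placeholders; the real implementation uses
# DSPy modules that call an LLM at each step.

def analyze_repository(readme: str) -> dict:
    """Step 1: derive a project overview from README content."""
    first_line = readme.strip().splitlines()[0] if readme.strip() else ""
    return {"overview": first_line.lstrip("# ").strip()}

def analyze_code_structure(paths: list[str]) -> dict:
    """Step 2: identify important directories and entry points."""
    return {
        "directories": sorted({p.split("/")[0] for p in paths if "/" in p}),
        "entry_points": [p for p in paths if p.endswith("main.py")],
    }

def generate_usage_examples(overview: dict) -> list[str]:
    """Step 3: produce usage examples (an LLM call in the real pipeline)."""
    return [f"See the README for how to use {overview['overview']}."]

def assemble_llms_txt(overview: dict, structure: dict, examples: list[str]) -> str:
    """Step 4: combine the analyses into the llms.txt section layout."""
    lines = [f"# {overview['overview']}", "", "## Project Overview",
             overview["overview"], "", "## Architecture"]
    lines += [f"- `/{d}/`" for d in structure["directories"]]
    lines += ["", "## Usage Examples"] + examples
    return "\n".join(lines)

# Wire the steps together on toy inputs, mirroring the numbered list above.
readme = "# demo-project\nA small demo."
paths = ["src/generator.py", "src/analyzer.py", "main.py"]
overview = analyze_repository(readme)
structure = analyze_code_structure(paths)
examples = generate_usage_examples(overview)
llms_txt = assemble_llms_txt(overview, structure, examples)
```

Each step only consumes the outputs of the steps before it, which is why the real pipeline can swap models per step (`repo_analyzer_model`, `code_analyzer_model`, `llms_txt_generator_model`) independently.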