llms.txt Generator
Generate llms.txt documentation for GitHub repositories using DSPy.
Quickstart
Get started in under 2 minutes:
1. Install Dependencies
# Install modaic
uv add modaic
# Or with pip
pip install modaic
2. Set Up Environment Variables
Create a .env file in your project root:
cp .env.example .env
Edit .env and add your API keys:
# Required
OPENAI_API_KEY=your-openai-api-key-here
MODAIC_TOKEN=your-modaic-token-here
# Optional but recommended
GITHUB_ACCESS_TOKEN=your-github-token-here
# Optional - OpenRouter support coming soon
OPENROUTER_API_KEY=your-openrouter-api-key-here
Get your keys:
- OpenAI: https://platform.openai.com/api-keys
- Modaic: https://modaic.dev
- GitHub: https://github.com/settings/tokens (requires the public_repo scope)
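If you drive the generator from your own script, it is safest to load the .env file yourself before creating the agent. A minimal sketch using python-dotenv (an assumption, not a documented project dependency) looks like this:
from dotenv import load_dotenv  # assumed extra dependency: python-dotenv
import os

load_dotenv()  # reads OPENAI_API_KEY, MODAIC_TOKEN, etc. from .env into the process environment
assert os.environ.get("OPENAI_API_KEY"), "OPENAI_API_KEY is missing"
assert os.environ.get("MODAIC_TOKEN"), "MODAIC_TOKEN is missing"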
3. Run the Generator
from modaic import AutoAgent
# Load the precompiled agent
llms_txt_generator = AutoAgent.from_precompiled("johwang/llms-txt-generator")
# Generate llms.txt for any repository
result = llms_txt_generator("https://github.com/pydantic/pydantic")
print(result.llms_txt_content)
The generated documentation is written to llms.txt by default (the filename can be overridden by setting the output_file parameter).
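For example, writing the output to a different path might look like this (a sketch only; it assumes output_file is accepted as a keyword argument on the call):
# sketch: override the output location; path is illustrative
result = llms_txt_generator(
    "https://github.com/pydantic/pydantic",
    output_file="docs/llms.txt",  # assumed keyword, per the note above
)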
Customizing Configuration
You can override default settings when loading the agent (be careful: DSPy requires certain reasoning models to use a minimum of 16000 for max_tokens and a temperature of 1.0):
llms_txt_generator = AutoAgent.from_precompiled(
"johwang/llms-txt-generator",
config_options={
"repo_analyzer_model": "gpt-4o", # Default: gpt-5.1-2025-11-13
"code_analyzer_model": "gpt-4o", # Default: gpt-5.1-2025-11-13
"llms_txt_generator_model": "gpt-4o", # Default: gpt-4o
"max_tokens": 16000, # Default: 16000
"temperature": 1.0, # Default: 1.0
}
)
Note: Support for OpenRouter models coming soon!
What is llms.txt?
llms.txt is a proposed standard for providing structured, LLM-friendly documentation about a project. It typically includes:
- Project overview and purpose
- Key concepts and terminology
- Architecture and structure
- Usage examples
- Important files and directories
This tool uses DSPy (Declarative Self-improving Python) to automatically analyze GitHub repositories and generate comprehensive llms.txt documentation.
Installation
This project uses uv for dependency management. If you don't have it installed:
# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
Then install the project dependencies:
uv sync
Configuration
You'll need two API keys:
- OpenAI API Key - Required for DSPy to use language models
- GitHub Access Token - Optional but recommended for higher rate limits
Option 1: Using .env file (Recommended)
Create a .env file in the project root:
cp .env.example .env
Then edit .env and add your API keys:
# .env
OPENAI_API_KEY=your-openai-api-key-here
GITHUB_ACCESS_TOKEN=your-github-token-here
# Optional: Set defaults
DEFAULT_MODEL=gpt-4o-mini
DEFAULT_OUTPUT_FILE=llms.txt
Option 2: Using environment variables
export OPENAI_API_KEY="your-openai-api-key"
export GITHUB_ACCESS_TOKEN="your-github-token" # Optional
Option 3: Pass keys directly via CLI
uv run python main.py https://github.com/stanfordnlp/dspy \
--openai-key "your-key" \
--github-token "your-token"
Usage
Command Line Interface
Basic usage:
uv run python main.py https://github.com/stanfordnlp/dspy
With custom options:
# Specify output file
uv run python main.py https://github.com/stanfordnlp/dspy -o output/dspy-llms.txt
# Use a different model
uv run python main.py https://github.com/stanfordnlp/dspy -m gpt-4o
# Pass API keys directly
uv run python main.py https://github.com/stanfordnlp/dspy \
--openai-key "your-key" \
--github-token "your-token"
Python API
You can also use the generator programmatically:
from src.generator import generate_llms_txt
# Generate llms.txt for a repository
content = generate_llms_txt(
repo_url="https://github.com/stanfordnlp/dspy",
output_file="llms.txt",
model="gpt-4o-mini",
openai_api_key="your-key", # Optional if env var is set
github_token="your-token" # Optional
)
print(content)
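Building on the call above, here is a small sketch for documenting several repositories in one run (the repository list and output paths are illustrative):
# sketch: generate llms.txt for several repositories, one output file per repo
from src.generator import generate_llms_txt

repos = [
    "https://github.com/stanfordnlp/dspy",
    "https://github.com/pydantic/pydantic",
]
for url in repos:
    name = url.rstrip("/").split("/")[-1]  # e.g. "dspy"
    generate_llms_txt(repo_url=url, output_file=f"output/{name}-llms.txt")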
Project Structure
llmstxt-generator/
    src/
        __init__.py       # Package exports
        signatures.py     # DSPy signatures for analysis
        analyzer.py       # RepositoryAnalyzer module
        github_utils.py   # GitHub API utilities
        generator.py      # Main generation logic
    main.py               # CLI entry point
    example.py            # Example usage script
    .env.example          # Environment variables template
    .gitignore            # Git ignore patterns
    pyproject.toml        # Project configuration
    README.md             # This file
How It Works
The generator uses a multi-step DSPy pipeline (see the sketch after this list):
1. Repository Analysis - Analyzes the repository structure, README, and configuration files to understand the project's purpose and key concepts
2. Code Structure Analysis - Identifies important directories, entry points, and development information
3. Usage Examples Generation - Creates relevant usage examples based on the project's purpose
4. llms.txt Generation - Combines all analyzed information into a structured llms.txt file following the standard format
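For orientation, a rough DSPy sketch of this kind of pipeline is shown below. The signature and module names are hypothetical and do not mirror the actual classes in src/signatures.py or src/analyzer.py:
# Illustrative DSPy pipeline sketch; names are hypothetical, not the project's real code
import dspy

class AnalyzeRepository(dspy.Signature):
    """Summarize a repository's purpose and key concepts from its README and file tree."""
    readme: str = dspy.InputField()
    file_tree: str = dspy.InputField()
    overview: str = dspy.OutputField()
    key_concepts: str = dspy.OutputField()

class GenerateLlmsTxt(dspy.Signature):
    """Combine analysis results into a structured llms.txt document."""
    overview: str = dspy.InputField()
    key_concepts: str = dspy.InputField()
    llms_txt: str = dspy.OutputField()

class LlmsTxtPipeline(dspy.Module):
    def __init__(self):
        super().__init__()
        self.analyze = dspy.ChainOfThought(AnalyzeRepository)   # step 1: repository analysis
        self.generate = dspy.Predict(GenerateLlmsTxt)           # final step: llms.txt generation

    def forward(self, readme, file_tree):
        analysis = self.analyze(readme=readme, file_tree=file_tree)
        return self.generate(overview=analysis.overview, key_concepts=analysis.key_concepts)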
Example Output
Here's what a generated llms.txt file looks like:
# DSPy: Programming Language Models
## Project Overview
DSPy is a framework for programming, rather than prompting, language models...
## Key Concepts
- **Modules**: Building blocks for LM programs
- **Signatures**: Input/output specifications
- **Teleprompters**: Optimization algorithms
- **Predictors**: Core reasoning components
## Architecture
- `/dspy/`: Main package directory
- `/adapters/`: Input/output format handlers
- `/clients/`: LM client interfaces
- `/predict/`: Core prediction modules
- `/teleprompt/`: Optimization algorithms
## Usage Examples
[Generated examples based on repository analysis]
Development
Install development dependencies:
uv sync --extra dev
Run tests:
uv run pytest
Format code:
uv run black .
uv run ruff check .
Limitations
- Requires GitHub public repositories or appropriate access tokens for private repos
- API rate limits may affect large repositories
- Quality depends on the language model used
- Works best with well-documented repositories
License
MIT
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.