
llms.txt Generator

Generate llms.txt documentation for GitHub repositories using DSPy.

Quickstart

Get started in under 2 minutes:

1. Install Dependencies

# Install modaic
uv add modaic

# Or with pip
pip install modaic

2. Set Up Environment Variables

Create a .env file in your project root:

cp .env.example .env

Edit .env and add your API keys:

# Required
OPENAI_API_KEY=your-openai-api-key-here
MODAIC_TOKEN=your-modaic-token-here

# Optional but recommended
GITHUB_ACCESS_TOKEN=your-github-token-here

# Optional - OpenRouter support coming soon
OPENROUTER_API_KEY=your-openrouter-api-key-here

Get your keys from the respective providers (OpenAI, Modaic, and optionally GitHub).

3. Run the Generator

from modaic import AutoAgent

# Load the precompiled agent
llms_txt_generator = AutoAgent.from_precompiled("johwang/llms-txt-generator")

# Generate llms.txt for any repository
result = llms_txt_generator("https://github.com/pydantic/pydantic")
print(result.llms_txt_content)
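To persist the output, a minimal sketch (assuming result.llms_txt_content is a plain string, as in the quickstart above):

```python
from pathlib import Path

def save_llms_txt(content: str, path: str = "llms.txt") -> Path:
    """Write generated llms.txt content to disk and return the path."""
    out = Path(path)
    out.write_text(content, encoding="utf-8")
    return out

# e.g. save_llms_txt(result.llms_txt_content)
```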

Customizing Configuration

You can override the default settings when loading the agent. Be careful: DSPy requires certain reasoning models to use a max_tokens of at least 16000 and a temperature of 1.0:

llms_txt_generator = AutoAgent.from_precompiled(
    "johwang/llms-txt-generator",
    config_options={
        "repo_analyzer_model": "gpt-4o",           # Default: gpt-5.1-2025-11-13
        "code_analyzer_model": "gpt-4o",           # Default: gpt-5.1-2025-11-13
        "llms_txt_generator_model": "gpt-4o",      # Default: gpt-4o
        "max_tokens": 16000,                        # Default: 16000
        "temperature": 1.0,                         # Default: 1.0
    }
)

Note: Support for OpenRouter models coming soon!
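To guard against the constraint above, the overrides can be checked before loading the agent. A hedged sketch: the key names mirror the config_options shown, and the thresholds are the ones stated for reasoning models.

```python
def validate_config(opts: dict) -> dict:
    """Raise if overrides violate the reasoning-model constraints."""
    if opts.get("max_tokens", 16000) < 16000:
        raise ValueError("max_tokens must be at least 16000 for reasoning models")
    if opts.get("temperature", 1.0) != 1.0:
        raise ValueError("temperature must be 1.0 for reasoning models")
    return opts
```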

What is llms.txt?

llms.txt is a proposed standard for providing structured, LLM-friendly documentation about a project. It typically includes:

  • Project overview and purpose
  • Key concepts and terminology
  • Architecture and structure
  • Usage examples
  • Important files and directories

This tool uses DSPy (Declarative Self-improving Python) to automatically analyze GitHub repositories and generate comprehensive llms.txt documentation.

Installation

This project uses uv for dependency management. If you don't have it installed:

# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

Then install the project dependencies:

uv sync

Configuration

You'll need two API keys:

  1. OpenAI API Key - Required for DSPy to use language models
  2. GitHub Access Token - Optional but recommended for higher rate limits

Option 1: Using a .env file

Create a .env file in the project root:

cp .env.example .env

Then edit .env and add your API keys:

# .env
OPENAI_API_KEY=your-openai-api-key-here
GITHUB_ACCESS_TOKEN=your-github-token-here

# Optional: Set defaults
DEFAULT_MODEL=gpt-4o-mini
DEFAULT_OUTPUT_FILE=llms.txt

Option 2: Using environment variables

export OPENAI_API_KEY="your-openai-api-key"
export GITHUB_ACCESS_TOKEN="your-github-token"  # Optional

Option 3: Pass keys directly via CLI

uv run python main.py https://github.com/stanfordnlp/dspy \
  --openai-key "your-key" \
  --github-token "your-token"
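Whichever option you use, it can help to verify the environment before running. A minimal check using only the standard library (variable names are taken from the examples above):

```python
import os

REQUIRED_VARS = ("OPENAI_API_KEY",)
OPTIONAL_VARS = ("GITHUB_ACCESS_TOKEN",)

def check_env(environ=None) -> list:
    """Return a list of required variables that are missing or empty."""
    env = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    for name in OPTIONAL_VARS:
        if not env.get(name):
            print(f"warning: {name} not set; GitHub rate limits will be lower")
    return missing
```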

Usage

Command Line Interface

Basic usage:

uv run python main.py https://github.com/stanfordnlp/dspy

With custom options:

# Specify output file
uv run python main.py https://github.com/stanfordnlp/dspy -o output/dspy-llms.txt

# Use a different model
uv run python main.py https://github.com/stanfordnlp/dspy -m gpt-4o

# Pass API keys directly
uv run python main.py https://github.com/stanfordnlp/dspy \
  --openai-key "your-key" \
  --github-token "your-token"

Python API

You can also use the generator programmatically:

from src.generator import generate_llms_txt

# Generate llms.txt for a repository
content = generate_llms_txt(
    repo_url="https://github.com/stanfordnlp/dspy",
    output_file="llms.txt",
    model="gpt-4o-mini",
    openai_api_key="your-key",  # Optional if env var is set
    github_token="your-token"    # Optional
)

print(content)
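For batch runs over several repositories, a small helper can derive an output filename from each repo URL. This is a sketch; the naming scheme is an assumption, and generate_llms_txt is called exactly as documented above:

```python
def output_path_for(repo_url: str, out_dir: str = "output") -> str:
    """Map a repo URL to a per-repo output path, e.g.
    https://github.com/stanfordnlp/dspy -> output/stanfordnlp-dspy-llms.txt"""
    owner, name = repo_url.rstrip("/").split("/")[-2:]
    return f"{out_dir}/{owner}-{name}-llms.txt"

# Usage with the API above (not executed here):
# for url in repo_urls:
#     generate_llms_txt(repo_url=url, output_file=output_path_for(url))
```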

Project Structure

llmstxt-generator/
├── src/
│   ├── __init__.py       # Package exports
│   ├── signatures.py     # DSPy signatures for analysis
│   ├── analyzer.py       # RepositoryAnalyzer module
│   ├── github_utils.py   # GitHub API utilities
│   └── generator.py      # Main generation logic
├── main.py               # CLI entry point
├── example.py            # Example usage script
├── .env.example          # Environment variables template
├── .gitignore            # Git ignore patterns
├── pyproject.toml        # Project configuration
└── README.md             # This file

How It Works

The generator uses a multi-step DSPy pipeline:

  1. Repository Analysis - Analyzes the repository structure, README, and configuration files to understand the project's purpose and key concepts

  2. Code Structure Analysis - Identifies important directories, entry points, and development information

  3. Usage Examples Generation - Creates relevant usage examples based on the project's purpose

  4. llms.txt Generation - Combines all analyzed information into a structured llms.txt file following the standard format
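The final assembly stage can be sketched as a plain-Python skeleton (names here are illustrative, not the project's actual classes; the real pipeline is built from DSPy modules):

```python
from dataclasses import dataclass

@dataclass
class RepoAnalysis:
    """Output of step 1: what the project is and its core ideas."""
    purpose: str
    key_concepts: list

def assemble_llms_txt(title: str, analysis: RepoAnalysis,
                      architecture: str, examples: str) -> str:
    """Step 4: combine the analyzed pieces into the llms.txt layout."""
    concepts = "\n".join(f"- {c}" for c in analysis.key_concepts)
    return (
        f"# {title}\n\n"
        f"## Project Overview\n{analysis.purpose}\n\n"
        f"## Key Concepts\n{concepts}\n\n"
        f"## Architecture\n{architecture}\n\n"
        f"## Usage Examples\n{examples}\n"
    )
```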

Example Output

Here's what a generated llms.txt file looks like:

# DSPy: Programming Language Models

## Project Overview
DSPy is a framework for programming, rather than prompting, language models...

## Key Concepts
- **Modules**: Building blocks for LM programs
- **Signatures**: Input/output specifications
- **Teleprompters**: Optimization algorithms
- **Predictors**: Core reasoning components

## Architecture
- `/dspy/`: Main package directory
  - `/adapters/`: Input/output format handlers
  - `/clients/`: LM client interfaces
  - `/predict/`: Core prediction modules
  - `/teleprompt/`: Optimization algorithms

## Usage Examples
[Generated examples based on repository analysis]

Development

Install development dependencies:

uv sync --extra dev

Run tests:

uv run pytest

Format code:

uv run black .
uv run ruff check .

Limitations

  • Requires public GitHub repositories, or an appropriate access token for private repos
  • API rate limits may affect large repositories
  • Quality depends on the language model used
  • Works best with well-documented repositories

License

MIT

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
