# llms.txt Generator
Generate [llms.txt](https://llmstxt.org/) documentation for GitHub repositories using DSPy.
## Quickstart
Get started in under 2 minutes:
### 1. Install Dependencies
```bash
# Install modaic
uv add modaic
# Or with pip
pip install modaic
```
### 2. Set Up Environment Variables
Create a `.env` file in your project root:
```bash
cp .env.example .env
```
Edit `.env` and add your API keys:
```bash
# Required
OPENAI_API_KEY=your-openai-api-key-here
MODAIC_TOKEN=your-modaic-token-here
# Optional but recommended
GITHUB_ACCESS_TOKEN=your-github-token-here
# Optional - OpenRouter support coming soon
OPENROUTER_API_KEY=your-openrouter-api-key-here
```
**Get your keys:**
- OpenAI: https://platform.openai.com/api-keys
- Modaic: https://modaic.dev
- GitHub: https://github.com/settings/tokens (requires `public_repo` scope)
### 3. Run the Generator
```python
from modaic import AutoAgent
# Load the precompiled agent
llms_txt_generator = AutoAgent.from_precompiled("johwang/llms-txt-generator")
# Generate llms.txt for any repository
result = llms_txt_generator("https://github.com/pydantic/pydantic")
print(result.llms_txt_content)
```
The generated content is written to `llms.txt` by default; override the filename with the `output_file` parameter.
### Customizing Configuration
You can override default settings when loading the agent. Be careful: for certain reasoning models, DSPy requires a `max_tokens` of at least 16000 and a `temperature` of 1.0:
```python
llms_txt_generator = AutoAgent.from_precompiled(
    "johwang/llms-txt-generator",
    config_options={
        "repo_analyzer_model": "gpt-4o",      # Default: gpt-5.1-2025-11-13
        "code_analyzer_model": "gpt-4o",      # Default: gpt-5.1-2025-11-13
        "llms_txt_generator_model": "gpt-4o", # Default: gpt-4o
        "max_tokens": 16000,                  # Default: 16000
        "temperature": 1.0,                   # Default: 1.0
    }
)
```
**Note:** Support for OpenRouter models coming soon!
## What is llms.txt?
`llms.txt` is a proposed standard for providing structured, LLM-friendly documentation about a project. It typically includes:
- Project overview and purpose
- Key concepts and terminology
- Architecture and structure
- Usage examples
- Important files and directories
This tool uses DSPy (Declarative Self-improving Python) to automatically analyze GitHub repositories and generate comprehensive `llms.txt` documentation.
## Installation
This project uses `uv` for dependency management. If you don't have it installed:
```bash
# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
```
Then install the project dependencies:
```bash
uv sync
```
## Configuration
You'll need two API keys:
1. **OpenAI API Key** - Required for DSPy to use language models
2. **GitHub Access Token** - Optional but recommended for higher rate limits
### Option 1: Using .env file (Recommended)
Create a `.env` file in the project root:
```bash
cp .env.example .env
```
Then edit `.env` and add your API keys:
```bash
# .env
OPENAI_API_KEY=your-openai-api-key-here
GITHUB_ACCESS_TOKEN=your-github-token-here
# Optional: Set defaults
DEFAULT_MODEL=gpt-4o-mini
DEFAULT_OUTPUT_FILE=llms.txt
```
### Option 2: Using environment variables
```bash
export OPENAI_API_KEY="your-openai-api-key"
export GITHUB_ACCESS_TOKEN="your-github-token" # Optional
```
### Option 3: Pass keys directly via CLI
```bash
uv run python main.py https://github.com/stanfordnlp/dspy \
  --openai-key "your-key" \
  --github-token "your-token"
```
## Usage
### Command Line Interface
Basic usage:
```bash
uv run python main.py https://github.com/stanfordnlp/dspy
```
With custom options:
```bash
# Specify output file
uv run python main.py https://github.com/stanfordnlp/dspy -o output/dspy-llms.txt
# Use a different model
uv run python main.py https://github.com/stanfordnlp/dspy -m gpt-4o
# Pass API keys directly
uv run python main.py https://github.com/stanfordnlp/dspy \
  --openai-key "your-key" \
  --github-token "your-token"
```
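These flags could be wired up with `argparse` along the following lines. This is a hypothetical sketch of the interface, not the actual contents of `main.py`:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Build a CLI parser mirroring the flags shown above."""
    parser = argparse.ArgumentParser(
        description="Generate llms.txt documentation for a GitHub repository"
    )
    parser.add_argument("repo_url", help="GitHub repository URL")
    parser.add_argument("-o", "--output", default="llms.txt", help="Output file path")
    parser.add_argument("-m", "--model", default="gpt-4o-mini", help="Model name")
    parser.add_argument("--openai-key", help="OpenAI API key (falls back to OPENAI_API_KEY)")
    parser.add_argument("--github-token", help="GitHub token (falls back to GITHUB_ACCESS_TOKEN)")
    return parser
```

Unsupplied flags keep their defaults, so `uv run python main.py <url>` alone is a valid invocation.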
### Python API
You can also use the generator programmatically:
```python
from src.generator import generate_llms_txt
# Generate llms.txt for a repository
content = generate_llms_txt(
    repo_url="https://github.com/stanfordnlp/dspy",
    output_file="llms.txt",
    model="gpt-4o-mini",
    openai_api_key="your-key",  # Optional if env var is set
    github_token="your-token"   # Optional
)
print(content)
```
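When generating files for several repositories in one run, each needs a distinct output name. One hypothetical helper (not part of the package) derives a filename from the repository URL:

```python
def output_name_for(repo_url: str) -> str:
    """Derive an output filename like 'stanfordnlp-dspy-llms.txt' from a repo URL."""
    parts = repo_url.rstrip("/").split("/")
    owner, repo = parts[-2], parts[-1]
    return f"{owner}-{repo}-llms.txt"
```

The result can then be passed as `output_file` to `generate_llms_txt` in a loop over repository URLs.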
## Project Structure
```
llmstxt-generator/
├── src/
│   ├── __init__.py       # Package exports
│   ├── signatures.py     # DSPy signatures for analysis
│   ├── analyzer.py       # RepositoryAnalyzer module
│   ├── github_utils.py   # GitHub API utilities
│   └── generator.py      # Main generation logic
├── main.py               # CLI entry point
├── example.py            # Example usage script
├── .env.example          # Environment variables template
├── .gitignore            # Git ignore patterns
├── pyproject.toml        # Project configuration
└── README.md             # This file
```
## How It Works
The generator uses a multi-step DSPy pipeline:
1. **Repository Analysis** - Analyzes the repository structure, README, and configuration files to understand the project's purpose and key concepts
2. **Code Structure Analysis** - Identifies important directories, entry points, and development information
3. **Usage Examples Generation** - Creates relevant usage examples based on the project's purpose
4. **llms.txt Generation** - Combines all analyzed information into a structured llms.txt file following the standard format
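Conceptually, the stages compose in sequence, each consuming the repository contents and contributing a section of the final document. The toy sketch below stubs out the stages with plain functions; in the real pipeline, DSPy modules backed by LM calls do this work:

```python
def analyze_repository(repo_files: dict[str, str]) -> dict:
    """Stage 1 stub: derive a one-line overview from a README-like file."""
    readme = repo_files.get("README.md", "")
    return {"overview": readme.splitlines()[0] if readme else ""}

def analyze_code_structure(repo_files: dict[str, str]) -> dict:
    """Stage 2 stub: collect top-level directories from file paths."""
    return {"dirs": sorted({path.split("/")[0] for path in repo_files if "/" in path})}

def run_pipeline(repo_files: dict[str, str]) -> str:
    """Stages 3-4 stub: combine the analyses into an llms.txt-shaped string."""
    meta = analyze_repository(repo_files)
    structure = analyze_code_structure(repo_files)
    lines = ["# Project", "", "## Project Overview", meta["overview"], "", "## Architecture"]
    lines += [f"- `/{d}/`" for d in structure["dirs"]]
    return "\n".join(lines)
```

The point of the staged design is that each step's output is a structured intermediate, so later stages condition on distilled analysis rather than raw repository text.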
## Example Output
Here's what a generated `llms.txt` file looks like:
```
# DSPy: Programming Language Models
## Project Overview
DSPy is a framework for programming, rather than prompting, language models...
## Key Concepts
- **Modules**: Building blocks for LM programs
- **Signatures**: Input/output specifications
- **Teleprompters**: Optimization algorithms
- **Predictors**: Core reasoning components
## Architecture
- `/dspy/`: Main package directory
- `/adapters/`: Input/output format handlers
- `/clients/`: LM client interfaces
- `/predict/`: Core prediction modules
- `/teleprompt/`: Optimization algorithms
## Usage Examples
[Generated examples based on repository analysis]
```
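A generated file can be sanity-checked for the expected sections before committing it. The validator below is a minimal sketch; the section list reflects the example above and is an assumption, not a requirement of the llms.txt proposal:

```python
EXPECTED_SECTIONS = [
    "## Project Overview",
    "## Key Concepts",
    "## Architecture",
    "## Usage Examples",
]

def missing_sections(llms_txt: str) -> list[str]:
    """Return the expected H2 headings not present in the generated file."""
    headings = {line.strip() for line in llms_txt.splitlines() if line.startswith("## ")}
    return [section for section in EXPECTED_SECTIONS if section not in headings]
```

An empty return value means every expected heading was found.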
## Development
Install development dependencies:
```bash
uv sync --extra dev
```
Run tests:
```bash
uv run pytest
```
Format code:
```bash
uv run black .
uv run ruff check .
```
## Limitations
- Works with public GitHub repositories; private repositories require an access token with the appropriate scopes
- API rate limits may affect large repositories
- Quality depends on the language model used
- Works best with well-documented repositories
## License
MIT
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.