# llms.txt Generator

Generate [llms.txt](https://llmstxt.org/) documentation for GitHub repositories using DSPy.

## What is llms.txt?

`llms.txt` is a proposed standard for providing structured, LLM-friendly documentation about a project. It typically includes:

- Project overview and purpose
- Key concepts and terminology
- Architecture and structure
- Usage examples
- Important files and directories

This tool uses DSPy (Declarative Self-improving Python) to automatically analyze GitHub repositories and generate comprehensive `llms.txt` documentation.

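The core idea is that DSPy lets the generator describe the task as a typed signature and have a language model fill it in. The snippet below is only an illustrative sketch of that idea; the signature, field names, and descriptions are assumptions, not the actual code in `src/`.

```python
# Illustrative sketch only -- the real pipeline lives in src/, and the
# signature and field names here are assumptions, not the project's code.
import dspy

class RepoToLlmsTxt(dspy.Signature):
    """Summarize a repository's files and README into llms.txt-style documentation."""

    repo_summary: str = dspy.InputField(desc="README text plus a listing of key files")
    llms_txt: str = dspy.OutputField(desc="Markdown covering overview, key concepts, structure, and usage")

# Configure an OpenAI model for DSPy (expects OPENAI_API_KEY in the environment).
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

generate = dspy.ChainOfThought(RepoToLlmsTxt)
result = generate(repo_summary="<fetched repository contents>")
print(result.llms_txt)
```
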
## Installation

This project uses `uv` for dependency management. If you don't have it installed:

```bash
# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
```

Then install the project dependencies:

```bash
uv sync
```

## Configuration

You'll need the following API keys:

1. **OpenAI API Key** - Required for DSPy to use language models
2. **GitHub Access Token** - Optional but recommended for higher rate limits

### Option 1: Using a `.env` file (Recommended)

Create a `.env` file in the project root:

```bash
cp .env.example .env
```

Then edit `.env` and add your API keys:

```bash
# .env
OPENAI_API_KEY=your-openai-api-key-here
GITHUB_ACCESS_TOKEN=your-github-token-here

# Optional: Set defaults
DEFAULT_MODEL=gpt-4o-mini
DEFAULT_OUTPUT_FILE=llms.txt
```

### Option 2: Using environment variables

```bash
export OPENAI_API_KEY="your-openai-api-key"
export GITHUB_ACCESS_TOKEN="your-github-token"  # Optional
```

### Option 3: Pass keys directly via CLI

```bash
uv run python main.py https://github.com/stanfordnlp/dspy \
  --openai-key "your-key" \
  --github-token "your-token"
```
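Whichever option you choose, the keys end up as ordinary settings inside the tool. The sketch below shows one plausible resolution order (an explicit CLI value first, then environment variables, with `python-dotenv` loading `.env`); the helper function and the use of `python-dotenv` are assumptions for illustration, not the project's actual implementation.

```python
# Rough sketch of key resolution; resolve_key() and python-dotenv are
# illustrative assumptions, not the project's actual code.
import os

from dotenv import load_dotenv

load_dotenv()  # pulls OPENAI_API_KEY / GITHUB_ACCESS_TOKEN from .env if present

def resolve_key(explicit: str | None, env_var: str) -> str | None:
    """Prefer a key passed directly (e.g. via --openai-key), else fall back to the environment."""
    return explicit if explicit else os.getenv(env_var)

openai_key = resolve_key(None, "OPENAI_API_KEY")
github_token = resolve_key(None, "GITHUB_ACCESS_TOKEN")  # may be None; the token is optional
```
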
## Usage

### Command Line Interface

Basic usage:

```bash
uv run python main.py https://github.com/stanfordnlp/dspy
```

With custom options:

```bash
# Specify output file
uv run python main.py https://github.com/stanfordnlp/dspy -o output/dspy-llms.txt

# Use a different model
uv run python main.py https://github.com/stanfordnlp/dspy -m gpt-4o

# Pass API keys directly
uv run python main.py https://github.com/stanfordnlp/dspy \
  --openai-key "your-key" \
  --github-token "your-token"
```

### Python API

You can also use the generator programmatically:

```python
from src.generator import generate_llms_txt

# Generate llms.txt for a repository
content = generate_llms_txt(
    repo_url="https://github.com/stanfordnlp/dspy",
    output_file="llms.txt",
    model="gpt-4o-mini",
    openai_api_key="your-key",  # Optional if env var is set
    github_token="your-token"   # Optional
)

print(content)
```
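If you need llms.txt files for several repositories, the same function can be called in a loop. The sketch below assumes, as the parameter name suggests, that `output_file` is where each result is written; the repository list is just an example.

```python
from pathlib import Path

from src.generator import generate_llms_txt

# Example repositories -- substitute your own.
repos = [
    "https://github.com/stanfordnlp/dspy",
    "https://github.com/astral-sh/uv",
]

Path("output").mkdir(exist_ok=True)

for url in repos:
    name = url.rstrip("/").split("/")[-1]
    # Reuses the documented signature above; API keys are picked up
    # from the environment or .env as configured earlier.
    generate_llms_txt(
        repo_url=url,
        output_file=f"output/{name}-llms.txt",
        model="gpt-4o-mini",
    )
```
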
## Project Structure

```
llmstxt-generator/