# llms.txt Generator

Generate llms.txt documentation for GitHub repositories using DSPy.
## What is llms.txt?

llms.txt is a proposed standard for providing structured, LLM-friendly documentation about a project. It typically includes:

- Project overview and purpose
- Key concepts and terminology
- Architecture and structure
- Usage examples
- Important files and directories

This tool uses DSPy (Declarative Self-improving Python) to automatically analyze GitHub repositories and generate comprehensive llms.txt documentation.
## Installation

This project uses uv for dependency management. If you don't have it installed:

```bash
# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
```

Then install the project dependencies:

```bash
uv sync
```
## Configuration

You'll need two API keys:

- **OpenAI API Key** - Required for DSPy to use language models
- **GitHub Access Token** - Optional but recommended for higher rate limits

### Option 1: Using a .env file (Recommended)

Create a `.env` file in the project root:

```bash
cp .env.example .env
```

Then edit `.env` and add your API keys:

```bash
# .env
OPENAI_API_KEY=your-openai-api-key-here
GITHUB_ACCESS_TOKEN=your-github-token-here

# Optional: Set defaults
DEFAULT_MODEL=gpt-4o-mini
DEFAULT_OUTPUT_FILE=llms.txt
```
### Option 2: Using environment variables

```bash
export OPENAI_API_KEY="your-openai-api-key"
export GITHUB_ACCESS_TOKEN="your-github-token"  # Optional
```
### Option 3: Passing keys directly via CLI

```bash
uv run python main.py https://github.com/stanfordnlp/dspy \
  --openai-key "your-key" \
  --github-token "your-token"
```
## Usage

### Command Line Interface

Basic usage:

```bash
uv run python main.py https://github.com/stanfordnlp/dspy
```

With custom options:

```bash
# Specify output file
uv run python main.py https://github.com/stanfordnlp/dspy -o output/dspy-llms.txt

# Use a different model
uv run python main.py https://github.com/stanfordnlp/dspy -m gpt-4o

# Pass API keys directly
uv run python main.py https://github.com/stanfordnlp/dspy \
  --openai-key "your-key" \
  --github-token "your-token"
```
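The flag surface above maps naturally onto `argparse`. A hypothetical reconstruction of the parser (the real one lives in `main.py` and may differ in defaults and help text):

```python
import argparse

def build_parser():
    # Hypothetical sketch of the CLI flags documented above
    parser = argparse.ArgumentParser(
        description="Generate llms.txt for a GitHub repository"
    )
    parser.add_argument("repo_url", help="GitHub repository URL")
    parser.add_argument("-o", "--output", default="llms.txt", help="Output file path")
    parser.add_argument("-m", "--model", default="gpt-4o-mini", help="Model name")
    parser.add_argument("--openai-key", help="OpenAI API key (overrides env)")
    parser.add_argument("--github-token", help="GitHub access token (overrides env)")
    return parser

args = build_parser().parse_args(
    ["https://github.com/stanfordnlp/dspy", "-o", "out/dspy-llms.txt", "-m", "gpt-4o"]
)
print(args.output, args.model)
```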
### Python API

You can also use the generator programmatically:

```python
from src.generator import generate_llms_txt

# Generate llms.txt for a repository
content = generate_llms_txt(
    repo_url="https://github.com/stanfordnlp/dspy",
    output_file="llms.txt",
    model="gpt-4o-mini",
    openai_api_key="your-key",  # Optional if env var is set
    github_token="your-token",  # Optional
)

print(content)
```
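When documenting several repositories in one run, you might derive each `output_file` from the repository URL. A small helper for that (hypothetical, not part of the package):

```python
from urllib.parse import urlparse

def output_name_for(repo_url):
    """Derive an output filename like 'dspy-llms.txt' from a repo URL."""
    repo = urlparse(repo_url).path.rstrip("/").split("/")[-1]
    return f"{repo.removesuffix('.git')}-llms.txt"

print(output_name_for("https://github.com/stanfordnlp/dspy"))  # dspy-llms.txt
```

Each derived name could then be passed as `output_file` in a loop over repository URLs.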
## Project Structure

```
llmstxt-generator/
├── src/
│   ├── __init__.py       # Package exports
│   ├── signatures.py     # DSPy signatures for analysis
│   ├── analyzer.py       # RepositoryAnalyzer module
│   ├── github_utils.py   # GitHub API utilities
│   └── generator.py      # Main generation logic
├── main.py               # CLI entry point
├── example.py            # Example usage script
├── .env.example          # Environment variables template
├── .gitignore            # Git ignore patterns
├── pyproject.toml        # Project configuration
└── README.md             # This file
```
## How It Works

The generator uses a multi-step DSPy pipeline:

1. **Repository Analysis** - Analyzes the repository structure, README, and configuration files to understand the project's purpose and key concepts
2. **Code Structure Analysis** - Identifies important directories, entry points, and development information
3. **Usage Examples Generation** - Creates relevant usage examples based on the project's purpose
4. **llms.txt Generation** - Combines all analyzed information into a structured llms.txt file following the standard format
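Conceptually, the four steps compose into a single function. A plain-Python sketch with stub analyzers standing in for the DSPy-backed modules (all names here are illustrative, not the package's actual API):

```python
# Stub steps standing in for the DSPy modules in the real pipeline
def analyze_repository(readme):
    # Step 1: distill purpose and key concepts from the README
    return f"Overview: {readme.splitlines()[0]}"

def analyze_structure(paths):
    # Step 2: identify important directories and entry points
    return "Key directories: " + ", ".join(p for p in paths if p.endswith("/"))

def generate_examples(overview):
    # Step 3: produce usage examples grounded in the overview
    return f"Usage examples based on: {overview}"

def generate_llms_txt_stub(readme, paths):
    overview = analyze_repository(readme)
    structure = analyze_structure(paths)
    examples = generate_examples(overview)
    # Step 4: combine everything into the llms.txt layout
    return "\n\n".join([
        "# Project",
        "## Overview\n" + overview,
        "## Architecture\n" + structure,
        "## Usage Examples\n" + examples,
    ])

doc = generate_llms_txt_stub("DSPy framework README", ["src/", "tests/", "main.py"])
print(doc)
```

In the real pipeline each stub is a DSPy module backed by a language model, so the intermediate strings are model outputs rather than string formatting.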
## Example Output

Here's what a generated llms.txt file looks like:

```markdown
# DSPy: Programming Language Models

## Project Overview
DSPy is a framework for programming, rather than prompting, language models...

## Key Concepts
- **Modules**: Building blocks for LM programs
- **Signatures**: Input/output specifications
- **Teleprompters**: Optimization algorithms
- **Predictors**: Core reasoning components

## Architecture
- `/dspy/`: Main package directory
- `/adapters/`: Input/output format handlers
- `/clients/`: LM client interfaces
- `/predict/`: Core prediction modules
- `/teleprompt/`: Optimization algorithms

## Usage Examples
[Generated examples based on repository analysis]
```
## Development

Install development dependencies:

```bash
uv sync --extra dev
```

Run tests:

```bash
uv run pytest
```

Format and lint code:

```bash
uv run black .
uv run ruff check .
```
## Limitations

- Requires public GitHub repositories, or an access token with appropriate permissions for private repos
- API rate limits may affect large repositories
- Output quality depends on the language model used
- Works best with well-documented repositories
## License

MIT

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.