# llms.txt Generator

Generate llms.txt documentation for GitHub repositories using DSPy.
## What is llms.txt?

llms.txt is a proposed standard for providing structured, LLM-friendly documentation about a project. It typically includes:

- Project overview and purpose
- Key concepts and terminology
- Architecture and structure
- Usage examples
- Important files and directories

This tool uses DSPy (Declarative Self-improving Python) to automatically analyze GitHub repositories and generate comprehensive llms.txt documentation.
## Installation

This project uses uv for dependency management. If you don't have it installed:

```bash
# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
```

Then install the project dependencies:

```bash
uv sync
```
## Configuration

You'll need two API keys:

- **OpenAI API Key** - Required for DSPy to use language models
- **GitHub Access Token** - Optional but recommended for higher rate limits

### Option 1: Using a .env file (Recommended)

Create a `.env` file in the project root:

```bash
cp .env.example .env
```

Then edit `.env` and add your API keys:

```bash
# .env
OPENAI_API_KEY=your-openai-api-key-here
GITHUB_ACCESS_TOKEN=your-github-token-here

# Optional: Set defaults
DEFAULT_MODEL=gpt-4o-mini
DEFAULT_OUTPUT_FILE=llms.txt
```
### Option 2: Using environment variables

```bash
export OPENAI_API_KEY="your-openai-api-key"
export GITHUB_ACCESS_TOKEN="your-github-token"  # Optional
```
### Option 3: Passing keys directly via CLI

```bash
uv run python main.py https://github.com/stanfordnlp/dspy \
  --openai-key "your-key" \
  --github-token "your-token"
```
## Usage

### Command Line Interface

Basic usage:

```bash
uv run python main.py https://github.com/stanfordnlp/dspy
```

With custom options:

```bash
# Specify output file
uv run python main.py https://github.com/stanfordnlp/dspy -o output/dspy-llms.txt

# Use a different model
uv run python main.py https://github.com/stanfordnlp/dspy -m gpt-4o

# Pass API keys directly
uv run python main.py https://github.com/stanfordnlp/dspy \
  --openai-key "your-key" \
  --github-token "your-token"
```
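The flag surface above maps naturally onto `argparse`. A hypothetical reconstruction of the parser (the real one lives in `main.py` and may differ in defaults and help text):

```python
import argparse

def build_parser():
    # Hypothetical sketch of the CLI flags documented above
    parser = argparse.ArgumentParser(
        description="Generate llms.txt for a GitHub repository"
    )
    parser.add_argument("repo_url", help="GitHub repository URL")
    parser.add_argument("-o", "--output", default="llms.txt", help="Output file path")
    parser.add_argument("-m", "--model", default="gpt-4o-mini", help="Model name")
    parser.add_argument("--openai-key", help="OpenAI API key (overrides env)")
    parser.add_argument("--github-token", help="GitHub access token (overrides env)")
    return parser

args = build_parser().parse_args(
    ["https://github.com/stanfordnlp/dspy", "-o", "out/dspy-llms.txt", "-m", "gpt-4o"]
)
print(args.output, args.model)
```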
### Python API

You can also use the generator programmatically:

```python
from src.generator import generate_llms_txt

# Generate llms.txt for a repository
content = generate_llms_txt(
    repo_url="https://github.com/stanfordnlp/dspy",
    output_file="llms.txt",
    model="gpt-4o-mini",
    openai_api_key="your-key",  # Optional if env var is set
    github_token="your-token",  # Optional
)

print(content)
```
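When documenting several repositories in one run, you might derive each `output_file` from the repository URL. A small helper for that (hypothetical, not part of the package):

```python
from urllib.parse import urlparse

def output_name_for(repo_url):
    """Derive an output filename like 'dspy-llms.txt' from a repo URL."""
    repo = urlparse(repo_url).path.rstrip("/").split("/")[-1]
    return f"{repo.removesuffix('.git')}-llms.txt"

print(output_name_for("https://github.com/stanfordnlp/dspy"))  # dspy-llms.txt
```

Each derived name could then be passed as `output_file` in a loop over repository URLs.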
## Project Structure

```
llmstxt-generator/
├── src/
│   ├── __init__.py       # Package exports
│   ├── signatures.py     # DSPy signatures for analysis
│   ├── analyzer.py       # RepositoryAnalyzer module
│   ├── github_utils.py   # GitHub API utilities
│   └── generator.py      # Main generation logic
├── main.py               # CLI entry point
├── example.py            # Example usage script
├── .env.example          # Environment variables template
├── .gitignore            # Git ignore patterns
├── pyproject.toml        # Project configuration
└── README.md             # This file
```
## How It Works

The generator uses a multi-step DSPy pipeline:

1. **Repository Analysis** - Analyzes the repository structure, README, and configuration files to understand the project's purpose and key concepts
2. **Code Structure Analysis** - Identifies important directories, entry points, and development information
3. **Usage Examples Generation** - Creates relevant usage examples based on the project's purpose
4. **llms.txt Generation** - Combines all analyzed information into a structured llms.txt file following the standard format
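Conceptually, the four steps compose into a single function. A plain-Python sketch with stub analyzers standing in for the DSPy-backed modules (all names here are illustrative, not the package's actual API):

```python
# Stub steps standing in for the DSPy modules in the real pipeline
def analyze_repository(readme):
    # Step 1: distill purpose and key concepts from the README
    return f"Overview: {readme.splitlines()[0]}"

def analyze_structure(paths):
    # Step 2: identify important directories and entry points
    return "Key directories: " + ", ".join(p for p in paths if p.endswith("/"))

def generate_examples(overview):
    # Step 3: produce usage examples grounded in the overview
    return f"Usage examples based on: {overview}"

def generate_llms_txt_stub(readme, paths):
    overview = analyze_repository(readme)
    structure = analyze_structure(paths)
    examples = generate_examples(overview)
    # Step 4: combine everything into the llms.txt layout
    return "\n\n".join([
        "# Project",
        "## Overview\n" + overview,
        "## Architecture\n" + structure,
        "## Usage Examples\n" + examples,
    ])

doc = generate_llms_txt_stub("DSPy framework README", ["src/", "tests/", "main.py"])
print(doc)
```

In the real pipeline each stub is a DSPy module backed by a language model, so the intermediate strings are model outputs rather than string formatting.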
## Example Output

Here's what a generated llms.txt file looks like:

```markdown
# DSPy: Programming Language Models

## Project Overview
DSPy is a framework for programming, rather than prompting, language models...

## Key Concepts
- **Modules**: Building blocks for LM programs
- **Signatures**: Input/output specifications
- **Teleprompters**: Optimization algorithms
- **Predictors**: Core reasoning components

## Architecture
- `/dspy/`: Main package directory
- `/adapters/`: Input/output format handlers
- `/clients/`: LM client interfaces
- `/predict/`: Core prediction modules
- `/teleprompt/`: Optimization algorithms

## Usage Examples
[Generated examples based on repository analysis]
```
## Development

Install development dependencies:

```bash
uv sync --extra dev
```

Run tests:

```bash
uv run pytest
```

Format and lint code:

```bash
uv run black .
uv run ruff check .
```
## Limitations

- Requires public GitHub repositories, or an access token with appropriate permissions for private repos
- API rate limits may affect large repositories
- Output quality depends on the language model used
- Works best with well-documented repositories
## License

MIT

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.