# llms.txt Generator
Generate [llms.txt](https://llmstxt.org/) documentation for GitHub repositories using DSPy.
## Quickstart
Get started in under 2 minutes:
### 1. Install Dependencies
```bash
# Install modaic
uv add modaic
# Or with pip
pip install modaic
```
### 2. Set Up Environment Variables
Create a `.env` file in your project root:
```bash
cp .env.example .env
```
Edit `.env` and add your API keys:
```bash
# Required
OPENAI_API_KEY=your-openai-api-key-here
MODAIC_TOKEN=your-modaic-token-here
# Optional but recommended
GITHUB_ACCESS_TOKEN=your-github-token-here
# Optional - OpenRouter support coming soon
OPENROUTER_API_KEY=your-openrouter-api-key-here
```
**Get your keys:**
- OpenAI: https://platform.openai.com/api-keys
- Modaic: https://modaic.dev
- GitHub: https://github.com/settings/tokens (requires `public_repo` scope)
### 3. Run the Generator
```python
from modaic import AutoAgent
# Load the precompiled agent
llms_txt_generator = AutoAgent.from_precompiled("johwang/llms-txt-generator")
# Generate llms.txt for any repository
result = llms_txt_generator("https://github.com/pydantic/pydantic")
print(result.llms_txt_content)
```
The generated content is written to `llms.txt` by default; override the filename with the `output_file` parameter.
### Customizing Configuration
You can override default settings when loading the agent. Be careful: for certain reasoning models, DSPy requires a `max_tokens` of at least 16000 and a `temperature` of 1.0:
```python
llms_txt_generator = AutoAgent.from_precompiled(
    "johwang/llms-txt-generator",
    config_options={
        "repo_analyzer_model": "gpt-4o",      # Default: gpt-5.1-2025-11-13
        "code_analyzer_model": "gpt-4o",      # Default: gpt-5.1-2025-11-13
        "llms_txt_generator_model": "gpt-4o", # Default: gpt-4o
        "max_tokens": 16000,                  # Default: 16000
        "temperature": 1.0,                   # Default: 1.0
    }
)
```
**Note:** Support for OpenRouter models coming soon!
## What is llms.txt?
`llms.txt` is a proposed standard for providing structured, LLM-friendly documentation about a project. It typically includes:
- Project overview and purpose
- Key concepts and terminology
- Architecture and structure
- Usage examples
- Important files and directories
This tool uses DSPy (Declarative Self-improving Python) to automatically analyze GitHub repositories and generate comprehensive `llms.txt` documentation.
## Installation
This project uses `uv` for dependency management. If you don't have it installed:
```bash
# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
```
Then install the project dependencies:
```bash
uv sync
```
## Configuration
You'll need two API keys:
1. **OpenAI API Key** - Required for DSPy to use language models
2. **GitHub Access Token** - Optional but recommended for higher rate limits
### Option 1: Using .env file (Recommended)
Create a `.env` file in the project root:
```bash
cp .env.example .env
```
Then edit `.env` and add your API keys:
```bash
# .env
OPENAI_API_KEY=your-openai-api-key-here
GITHUB_ACCESS_TOKEN=your-github-token-here
# Optional: Set defaults
DEFAULT_MODEL=gpt-4o-mini
DEFAULT_OUTPUT_FILE=llms.txt
```
### Option 2: Using environment variables
```bash
export OPENAI_API_KEY="your-openai-api-key"
export GITHUB_ACCESS_TOKEN="your-github-token" # Optional
```
### Option 3: Pass keys directly via CLI
```bash
uv run python main.py https://github.com/stanfordnlp/dspy \
  --openai-key "your-key" \
  --github-token "your-token"
```
## Usage
### Command Line Interface
Basic usage:
```bash
uv run python main.py https://github.com/stanfordnlp/dspy
```
With custom options:
```bash
# Specify output file
uv run python main.py https://github.com/stanfordnlp/dspy -o output/dspy-llms.txt
# Use a different model
uv run python main.py https://github.com/stanfordnlp/dspy -m gpt-4o
# Pass API keys directly
uv run python main.py https://github.com/stanfordnlp/dspy \
  --openai-key "your-key" \
  --github-token "your-token"
```
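These flags could be wired up with `argparse` along the following lines. This is a hypothetical sketch of the interface, not the actual contents of `main.py`:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Build a CLI parser mirroring the flags shown above."""
    parser = argparse.ArgumentParser(
        description="Generate llms.txt documentation for a GitHub repository"
    )
    parser.add_argument("repo_url", help="GitHub repository URL")
    parser.add_argument("-o", "--output", default="llms.txt", help="Output file path")
    parser.add_argument("-m", "--model", default="gpt-4o-mini", help="Model name")
    parser.add_argument("--openai-key", help="OpenAI API key (falls back to OPENAI_API_KEY)")
    parser.add_argument("--github-token", help="GitHub token (falls back to GITHUB_ACCESS_TOKEN)")
    return parser
```

Unsupplied flags keep their defaults, so `uv run python main.py <url>` alone is a valid invocation.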
### Python API
You can also use the generator programmatically:
```python
from src.generator import generate_llms_txt
# Generate llms.txt for a repository
content = generate_llms_txt(
    repo_url="https://github.com/stanfordnlp/dspy",
    output_file="llms.txt",
    model="gpt-4o-mini",
    openai_api_key="your-key",  # Optional if env var is set
    github_token="your-token"   # Optional
)
print(content)
```
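When generating files for several repositories in one run, each needs a distinct output name. One hypothetical helper (not part of the package) derives a filename from the repository URL:

```python
def output_name_for(repo_url: str) -> str:
    """Derive an output filename like 'stanfordnlp-dspy-llms.txt' from a repo URL."""
    parts = repo_url.rstrip("/").split("/")
    owner, repo = parts[-2], parts[-1]
    return f"{owner}-{repo}-llms.txt"
```

The result can then be passed as `output_file` to `generate_llms_txt` in a loop over repository URLs.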
## Project Structure
```
llmstxt-generator/
├── src/
│   ├── __init__.py       # Package exports
│   ├── signatures.py     # DSPy signatures for analysis
│   ├── analyzer.py       # RepositoryAnalyzer module
│   ├── github_utils.py   # GitHub API utilities
│   └── generator.py      # Main generation logic
├── main.py               # CLI entry point
├── example.py            # Example usage script
├── .env.example          # Environment variables template
├── .gitignore            # Git ignore patterns
├── pyproject.toml        # Project configuration
└── README.md             # This file
```
## How It Works
The generator uses a multi-step DSPy pipeline:
1. **Repository Analysis** - Analyzes the repository structure, README, and configuration files to understand the project's purpose and key concepts
2. **Code Structure Analysis** - Identifies important directories, entry points, and development information
3. **Usage Examples Generation** - Creates relevant usage examples based on the project's purpose
4. **llms.txt Generation** - Combines all analyzed information into a structured llms.txt file following the standard format
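Conceptually, the stages compose in sequence, each consuming the repository contents and contributing a section of the final document. The toy sketch below stubs out the stages with plain functions; in the real pipeline, DSPy modules backed by LM calls do this work:

```python
def analyze_repository(repo_files: dict[str, str]) -> dict:
    """Stage 1 stub: derive a one-line overview from a README-like file."""
    readme = repo_files.get("README.md", "")
    return {"overview": readme.splitlines()[0] if readme else ""}

def analyze_code_structure(repo_files: dict[str, str]) -> dict:
    """Stage 2 stub: collect top-level directories from file paths."""
    return {"dirs": sorted({path.split("/")[0] for path in repo_files if "/" in path})}

def run_pipeline(repo_files: dict[str, str]) -> str:
    """Stages 3-4 stub: combine the analyses into an llms.txt-shaped string."""
    meta = analyze_repository(repo_files)
    structure = analyze_code_structure(repo_files)
    lines = ["# Project", "", "## Project Overview", meta["overview"], "", "## Architecture"]
    lines += [f"- `/{d}/`" for d in structure["dirs"]]
    return "\n".join(lines)
```

The point of the staged design is that each step's output is a structured intermediate, so later stages condition on distilled analysis rather than raw repository text.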
## Example Output
Here's what a generated `llms.txt` file looks like:
```
# DSPy: Programming Language Models
## Project Overview
DSPy is a framework for programming, rather than prompting, language models...
## Key Concepts
- **Modules**: Building blocks for LM programs
- **Signatures**: Input/output specifications
- **Teleprompters**: Optimization algorithms
- **Predictors**: Core reasoning components
## Architecture
- `/dspy/`: Main package directory
- `/adapters/`: Input/output format handlers
- `/clients/`: LM client interfaces
- `/predict/`: Core prediction modules
- `/teleprompt/`: Optimization algorithms
## Usage Examples
[Generated examples based on repository analysis]
```
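A generated file can be sanity-checked for the expected sections before committing it. The validator below is a minimal sketch; the section list reflects the example above and is an assumption, not a requirement of the llms.txt proposal:

```python
EXPECTED_SECTIONS = [
    "## Project Overview",
    "## Key Concepts",
    "## Architecture",
    "## Usage Examples",
]

def missing_sections(llms_txt: str) -> list[str]:
    """Return the expected H2 headings not present in the generated file."""
    headings = {line.strip() for line in llms_txt.splitlines() if line.startswith("## ")}
    return [section for section in EXPECTED_SECTIONS if section not in headings]
```

An empty return value means every expected heading was found.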
## Development
Install development dependencies:
```bash
uv sync --extra dev
```
Run tests:
```bash
uv run pytest
```
Format code:
```bash
uv run black .
uv run ruff check .
```
## Limitations
- Works with public GitHub repositories; private repositories require an access token with the appropriate scopes
- API rate limits may affect large repositories
- Quality depends on the language model used
- Works best with well-documented repositories
## License
MIT
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.