# llms.txt Generator

Generate [llms.txt](https://llmstxt.org/) documentation for GitHub repositories using DSPy.

## Quickstart

Get started in under 2 minutes:

### 1. Install Dependencies

```bash
# Install modaic
uv add modaic

# Or with pip
pip install modaic
```

### 2. Set Up Environment Variables

Create a `.env` file in your project root:

```bash
cp .env.example .env
```

Edit `.env` and add your API keys:

```bash
# Required
OPENAI_API_KEY=your-openai-api-key-here
MODAIC_TOKEN=your-modaic-token-here

# Optional but recommended
GITHUB_ACCESS_TOKEN=your-github-token-here

# Optional - OpenRouter support coming soon
OPENROUTER_API_KEY=your-openrouter-api-key-here
```

**Get your keys:**

- OpenAI: https://platform.openai.com/api-keys
- Modaic: https://modaic.dev
- GitHub: https://github.com/settings/tokens (requires `public_repo` scope)

### 3. Run the Generator

```python
from modaic import AutoAgent

# Load the precompiled agent
llms_txt_generator = AutoAgent.from_precompiled("johwang/llms-txt-generator")

# Generate llms.txt for any repository
result = llms_txt_generator("https://github.com/pydantic/pydantic")
print(result.llms_txt_content)
```

The generated documentation is written to `llms.txt` by default; you can override the filename by setting the `output_file` parameter.

### Customizing Configuration

You can override the default settings when loading the agent. Be careful: DSPy requires certain reasoning models to use a `max_tokens` of at least 16000 and a `temperature` of 1.0.

```python
llms_txt_generator = AutoAgent.from_precompiled(
    "johwang/llms-txt-generator",
    config_options={
        "repo_analyzer_model": "gpt-4o",       # Default: gpt-5.1-2025-11-13
        "code_analyzer_model": "gpt-4o",       # Default: gpt-5.1-2025-11-13
        "llms_txt_generator_model": "gpt-4o",  # Default: gpt-4o
        "max_tokens": 16000,                   # Default: 16000
        "temperature": 1.0,                    # Default: 1.0
    }
)
```

**Note:** Support for OpenRouter models is coming soon!

## What is llms.txt?

`llms.txt` is a proposed standard for providing structured, LLM-friendly documentation about a project. It typically includes:

- Project overview and purpose
- Key concepts and terminology
- Architecture and structure
- Usage examples
- Important files and directories

This tool uses DSPy (Declarative Self-improving Python) to automatically analyze GitHub repositories and generate comprehensive `llms.txt` documentation.
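
For orientation, the llmstxt.org proposal describes a file with an H1 title, a blockquote summary, and H2 sections of annotated links. A generated file typically follows this shape (an illustrative sketch with placeholder URLs, not actual output of this tool):

```markdown
# Project Name

> One-sentence summary of what the project does and who it is for.

## Docs

- [Quickstart](https://example.com/docs/quickstart.md): Installation and first run
- [API Reference](https://example.com/docs/api.md): Public functions and classes

## Optional

- [Changelog](https://example.com/changelog.md): Release history
```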

## Installation

This project uses `uv` for dependency management. If you don't have it installed:

```bash
# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
```

Then install the project dependencies:

```bash
uv sync
```

## Configuration

You'll need two API keys:

1. **OpenAI API Key** - Required for DSPy to use language models
2. **GitHub Access Token** - Optional but recommended for higher rate limits
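
Before running the generator, you can sanity-check that these keys are visible to your shell. The sketch below uses only the standard library; the variable names follow the `.env` examples in this README:

```python
import os

def check_keys() -> list[str]:
    """Return names of required environment variables that are missing or empty."""
    required = ["OPENAI_API_KEY"]
    optional = ["GITHUB_ACCESS_TOKEN"]
    for name in optional:
        if not os.environ.get(name):
            print(f"warning: {name} is not set (GitHub rate limits will be lower)")
    return [name for name in required if not os.environ.get(name)]

if missing := check_keys():
    print("Missing required keys:", ", ".join(missing))
```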

### Option 1: Using .env file (Recommended)

Create a `.env` file in the project root:

```bash
cp .env.example .env
```

Then edit `.env` and add your API keys:

```bash
# .env
OPENAI_API_KEY=your-openai-api-key-here
GITHUB_ACCESS_TOKEN=your-github-token-here

# Optional: Set defaults
DEFAULT_MODEL=gpt-4o-mini
DEFAULT_OUTPUT_FILE=llms.txt
```
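
If you would rather not depend on a loader like `python-dotenv`, a `.env` file in the simple `KEY=value` form above can be read with a few lines of stdlib Python. This is a sketch that skips comments and blank lines; it does not handle quoting or multi-line values:

```python
import os
from pathlib import Path

def load_env(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=value lines and export them via os.environ."""
    loaded: dict[str, str] = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        loaded[key.strip()] = value.strip()
    os.environ.update(loaded)
    return loaded
```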

### Option 2: Using environment variables

```bash
export OPENAI_API_KEY="your-openai-api-key"
export GITHUB_ACCESS_TOKEN="your-github-token"  # Optional
```

### Option 3: Pass keys directly via CLI

```bash
uv run python main.py https://github.com/stanfordnlp/dspy \
  --openai-key "your-key" \
  --github-token "your-token"
```

## Usage

### Command Line Interface

Basic usage:

```bash
uv run python main.py https://github.com/stanfordnlp/dspy
```

With custom options:

```bash
# Specify output file
uv run python main.py https://github.com/stanfordnlp/dspy -o output/dspy-llms.txt

# Use a different model
uv run python main.py https://github.com/stanfordnlp/dspy -m gpt-4o

# Pass API keys directly
uv run python main.py https://github.com/stanfordnlp/dspy \
  --openai-key "your-key" \
  --github-token "your-token"
```

### Python API

You can also use the generator programmatically:

```python
from src.generator import generate_llms_txt

# Generate llms.txt for a repository
content = generate_llms_txt(
    repo_url="https://github.com/stanfordnlp/dspy",
    output_file="llms.txt",
    model="gpt-4o-mini",
    openai_api_key="your-key",  # Optional if env var is set
    github_token="your-token"   # Optional
)

print(content)
```

## Project Structure

```
llmstxt-generator/