(no commit message)

2025-10-19 14:16:34 -05:00
parent d33cef379c
commit c1847586be
13 changed files with 884 additions and 1 deletions
--- a/README.md
+++ b/README.md
@@ -1,2 +1,264 @@
-# jtbd-agent

+# JTBD Idea Validator
+
+A **Jobs to be Done (JTBD)** analysis agent powered by DSPy that validates business ideas through comprehensive framework-based evaluation.
+
+## What it does
+
+This tool performs systematic business idea validation using JTBD methodology:
+
+- **Assumption Deconstruction**: Extract core business assumptions and classify each at level 1-3
+- **JTBD Analysis**: Generate 5 distinct job statements with Four Forces (push/pull/anxiety/inertia)
+- **Moat Analysis**: Assess competitive advantages using innovation layers
+- **Scoring & Judgment**: Evaluate ideas across 5 criteria with detailed rationales
+- **Validation Planning**: Create actionable plans for assumption testing
+
+## Quick Start
+
+```bash
+# Setup environment
+python -m venv .venv && source .venv/bin/activate  # Windows: .venv\Scripts\activate
+pip install -U pip
+pip install -e .
+
+# Configure LLM (required): set the key for the provider you use
+export OPENAI_API_KEY=...            # for OpenAI models
+export ANTHROPIC_API_KEY=...         # for Claude models
+
+# Run analysis on example
+python run_direct.py examples/rehab_exercise_tracking_rich.json
+
+# Or specify custom output location
+python run_direct.py examples/insurance_photo_ai.json --output custom_reports/
+```
+
+## Output Files
+
+The tool generates organized reports in timestamped directories:
+
+- **Gamma Presentations**: `gamma/presentation.md` (Gamma-ready) + `gamma/presentation.html` (preview)
+- **CSV Exports**: `csv/` - Structured data for spreadsheet analysis
+- **JSON Data**: `json/analysis_data.json` - Raw analysis data
+- **Charts**: `assets/` - Radar charts, waterfall charts, and Four Forces diagrams
+
+## Technical Architecture
+
+This implementation uses **DSPy** (Declarative Self-improving Python) for structured LLM interactions through **Signatures** and **Modules**.
+
+### DSPy Signatures
+
+Signatures define input/output schemas for LLM tasks:
+
+```python
+from typing import List
+
+import dspy
+
+
+class DeconstructSig(dspy.Signature):
+    """Extract assumptions and classify levels.
+    Return JSON list of objects: [{text, level(1..3), confidence, evidence:[]}]"""
+    idea: str = dspy.InputField()
+    hunches: List[str] = dspy.InputField()
+    assumptions_json: str = dspy.OutputField()  # JSON-encoded string, parsed by the caller
+
+class JobsSig(dspy.Signature):
+    """Generate 5 distinct JTBD statements with Four Forces each."""
+    context: str = dspy.InputField()
+    constraints: str = dspy.InputField()
+    jobs_json: str = dspy.OutputField()  # JSON-encoded string, parsed by the caller
+```
+
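Because `assumptions_json` comes back as a plain string, the caller has to parse and check it. The following is a minimal sketch of that step, assuming the field names from the `DeconstructSig` docstring. `parse_assumptions` and its filtering rules are hypothetical and are not taken from this repository.

```python
import json
from typing import Any, Dict, List


def parse_assumptions(assumptions_json: str) -> List[Dict[str, Any]]:
    """Parse the model output and keep only well-formed assumption objects.

    Hypothetical helper: field names follow the DeconstructSig docstring,
    the filtering rules are an assumption made for illustration.
    """
    try:
        items = json.loads(assumptions_json)
    except json.JSONDecodeError:
        return []  # the model returned something that is not JSON
    if not isinstance(items, list):
        return []  # the docstring asks for a JSON list

    valid = []
    for item in items:
        if not isinstance(item, dict):
            continue
        text = item.get("text")
        level = item.get("level")
        # bool is a subclass of int, so reject it explicitly
        if not isinstance(text, str) or isinstance(level, bool) or level not in (1, 2, 3):
            continue
        evidence = item.get("evidence")
        valid.append({
            "text": text,
            "level": level,
            "confidence": item.get("confidence"),
            "evidence": evidence if isinstance(evidence, list) else [],
        })
    return valid


raw = (
    '[{"text": "Patients skip exercises", "level": 2, "confidence": 0.6, "evidence": []},'
    ' {"text": "out of range", "level": 7}]'
)
print(parse_assumptions(raw))  # only the first object survives the level check
```

Dropping malformed objects instead of raising keeps one bad item from discarding the whole response. A stricter pipeline might prefer to raise.
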
+### DSPy Modules
+
+Modules wrap signatures in business logic, and DSPy optimizers can tune their prompts:
+
+- **`Deconstruct`**: Extracts assumptions with confidence scoring
+- **`Jobs`**: Generates JTBD statements with Four Forces analysis
+- **`Moat`**: Applies Doblin innovation framework + strategic triggers
+- **`JudgeScore`**: Evaluates ideas across 5 standardized criteria:
+  - Underserved Opportunity
+  - Strategic Impact
+  - Market Scale
+  - Solution Differentiability
+  - Business Model Innovation
+
+### Dual-Judge Arbitration
+
+The system uses two independent judges with tie-breaking for scoring reliability:
+
+```python
+import os
+
+USE_DOUBLE_JUDGE = os.getenv("JTBD_DOUBLE_JUDGE", "1") == "1"  # default ON
+
+def judge_with_arbitration(summary: str):
+    """Score a summary with one judge, or with two judges merged by a tie-breaker."""
+    if USE_DOUBLE_JUDGE:
+        # Two independent judge calls on the same summary
+        score1 = JudgeScore()(summary=summary)
+        score2 = JudgeScore()(summary=summary)
+        return merge_scores(score1, score2)  # tie-breaker logic
+    return JudgeScore()(summary=summary)
+```
+
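The README does not spell out the tie-breaker behind `merge_scores`. As one possible reading, the sketch below averages the two judges per criterion and falls back to the lower score when they disagree strongly. The function name `merge_scorecards`, the `MAX_GAP` threshold, and the `{"name", "score"}` shape of each criterion are all assumptions made for illustration.

```python
from typing import Any, Dict

MAX_GAP = 2.0  # assumed disagreement threshold, not taken from the repository


def merge_scorecards(card1: Dict[str, Any], card2: Dict[str, Any]) -> Dict[str, Any]:
    """Merge two scorecards criterion by criterion (illustrative only)."""
    second = {c["name"]: c["score"] for c in card2["criteria"]}
    merged = []
    for crit in card1["criteria"]:
        a = crit["score"]
        b = second.get(crit["name"], a)  # criterion missing from judge 2: keep judge 1
        if abs(a - b) > MAX_GAP:
            score = min(a, b)    # strong disagreement: take the cautious score
        else:
            score = (a + b) / 2  # otherwise average the two judges
        merged.append({"name": crit["name"], "score": score})
    total = sum(c["score"] for c in merged) / len(merged) if merged else 0.0
    return {"criteria": merged, "total": round(total, 2)}


judge_a = {"criteria": [{"name": "Market Scale", "score": 7.0},
                        {"name": "Strategic Impact", "score": 8.0}]}
judge_b = {"criteria": [{"name": "Market Scale", "score": 6.0},
                        {"name": "Strategic Impact", "score": 4.0}]}
print(merge_scorecards(judge_a, judge_b))
# Market Scale is averaged to 6.5, Strategic Impact falls back to 4.0, total is 5.25
```
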
+## Configuration
+
+**Model Selection**: Edit `plugins/llm_dspy.py` → `configure_lm()` or set `JTBD_DSPY_MODEL` to a single model name:
+
+```bash
+# Set only one of these; a later export overrides an earlier one
+export JTBD_DSPY_MODEL="gpt-4o-mini"                 # OpenAI
+export JTBD_DSPY_MODEL="claude-3-5-sonnet-20240620"  # Anthropic
+```
+
+**Other Options**:
+
+- `JTBD_LLM_TEMPERATURE=0.2` - Response randomness (0.0-1.0)
+- `JTBD_DOUBLE_JUDGE=1` - Enable dual-judge arbitration (default: enabled)
+
+## Input Format
+
+Ideas are defined in JSON files with the following structure:
+
+```json
+{
+  "idea_id": "urn:idea:example:001",
+  "title": "Your business idea title",
+  "hunches": [
+    "Key assumption about the problem",
+    "Belief about customer behavior",
+    "Market hypothesis"
+  ],
+  "problem_statement": "Clear description of the problem",
+  "solution_overview": "How your idea solves the problem",
+  "target_customer": {
+    "primary": "Main customer segment",
+    "secondary": "Secondary users",
+    "demographics": "Age, profession, context"
+  },
+  "value_propositions": ["Key benefit 1", "Key benefit 2"],
+  "competitive_landscape": ["Competitor 1", "Competitor 2"],
+  "revenue_streams": ["Revenue model 1", "Revenue model 2"]
+}
+```
+
+See `examples/` directory for complete examples.
+
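Before spending LLM calls on a new idea file, it can help to check its shape first. The sketch below is a minimal pre-flight check, assuming every top-level key from the structure above is required. `check_idea` is a hypothetical helper, and the real pipeline may validate through its Pydantic contracts instead.

```python
import json
from typing import Any, Dict, List

# Top-level keys copied from the structure shown above. Treating all of them
# as required is an assumption made for this sketch.
REQUIRED_KEYS = [
    "idea_id", "title", "hunches", "problem_statement", "solution_overview",
    "target_customer", "value_propositions", "competitive_landscape",
    "revenue_streams",
]


def check_idea(idea: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the file looks fine."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in idea]
    hunches = idea.get("hunches")
    if hunches is not None and not (
        isinstance(hunches, list) and all(isinstance(h, str) for h in hunches)
    ):
        problems.append("hunches must be a list of strings")
    return problems


sample = json.loads('{"idea_id": "urn:idea:example:001", "title": "Demo", "hunches": "not a list"}')
for problem in check_idea(sample):
    print(problem)
```
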
+## Alternative Execution Methods
+
+### Direct Python Script
+
+```bash
+python run_direct.py your_idea.json
+```
+
+### FastAPI Service (Optional)
+
+Run as a service with HTTP endpoints:
+
+```bash
+uvicorn service.dspy_sidecar:app --port 8088 --reload
+```
+
+Exposes endpoints: `/deconstruct`, `/jobs`, `/moat`, `/judge`
+
+### Prefect Flow (Advanced)
+
+For complex orchestration scenarios, the pipeline can also run as a flow on the Prefect workflow engine (see `orchestration/`).
+
+## Advanced Features
+
+### Judge Optimization with DSPy
+
+The system supports **compiled judge models** using DSPy's GEPA optimizer (reflective prompt evolution):
+
+```bash
+# 1. Add training data to data/judge_train.jsonl
+# Format: {"summary": "...", "scorecard": {"criteria":[...], "total": 6.7}}
+
+# 2. Train the judge using GEPA (evolutionary optimizer)
+python tools/optimize_judge.py --train data/judge_train.jsonl --out artifacts/judge_compiled.dspy --budget medium
+
+# 3. Use the compiled judge (automatically loaded at runtime)
+export JTBD_JUDGE_COMPILED=artifacts/judge_compiled.dspy
+python run_direct.py your_idea.json
+```
+
+**GEPA** is an evolutionary prompt optimizer that:
+- Captures full execution traces of DSPy modules
+- Uses reflection to evolve text components (prompts/instructions)
+- Allows textual feedback at predictor or system level
+- Can outperform reinforcement learning approaches, according to its authors
+
+An excerpt from `tools/optimize_judge.py`:
+
+```python
+import json
+
+import dspy
+from dspy.teleprompt import GEPA
+
+def non_decreasing_metric(example, pred, trace=None, pred_name=None, pred_trace=None):
+    """Returns 1.0 if predicted total >= gold total, else 0.0 (also 0.0 on unparseable JSON)."""
+    try:
+        p = json.loads(pred.scorecard_json)
+        g = json.loads(example.scorecard_json)
+        return 1.0 if p.get("total", 0) >= g.get("total", 0) else 0.0
+    except Exception:
+        return 0.0
+
+# Budget options: "light", "medium", "heavy"
+# (`budget`, `train`, and `JudgeScoreSig` are defined elsewhere in the script)
+tele = GEPA(metric=non_decreasing_metric, auto=budget)
+compiled = tele.compile(dspy.Predict(JudgeScoreSig), trainset=train)
+```
+
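To see how the metric above behaves, it can be called without any LLM. This sketch repeats the metric so it runs standalone and uses `types.SimpleNamespace` objects as stand-ins for DSPy examples and predictions, which is an assumption made purely for the demonstration.

```python
import json
from types import SimpleNamespace


def non_decreasing_metric(example, pred, trace=None, pred_name=None, pred_trace=None):
    """Same logic as the excerpt above, repeated so this snippet runs on its own."""
    try:
        p = json.loads(pred.scorecard_json)
        g = json.loads(example.scorecard_json)
        return 1.0 if p.get("total", 0) >= g.get("total", 0) else 0.0
    except Exception:
        return 0.0


# Stand-ins for a gold example and three predictions
gold = SimpleNamespace(scorecard_json=json.dumps({"total": 6.7}))
higher = SimpleNamespace(scorecard_json=json.dumps({"total": 9.9}))
lower = SimpleNamespace(scorecard_json=json.dumps({"total": 5.0}))
broken = SimpleNamespace(scorecard_json="not json")

print(non_decreasing_metric(gold, higher))  # 1.0
print(non_decreasing_metric(gold, lower))   # 0.0
print(non_decreasing_metric(gold, broken))  # 0.0
```

As written, the metric checks a lower bound rather than closeness: a prediction that overshoots the gold total still scores 1.0.
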
+The compiled judge replaces the default `dspy.Predict` with an optimized program:
+
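+```python
+import os
+import pickle
+
+import dspy
+
+JUDGE_COMPILED_PATH = os.getenv("JTBD_JUDGE_COMPILED")  # path to the compiled judge, if set
+
+_compiled_judge = None
+if JUDGE_COMPILED_PATH and os.path.exists(JUDGE_COMPILED_PATH):
+    # pickle can execute arbitrary code on load; only load artifacts you trust
+    with open(JUDGE_COMPILED_PATH, "rb") as f:
+        _compiled_judge = pickle.load(f)
+
+class JudgeScore(dspy.Module):
+    def __init__(self):
+        super().__init__()  # initialize dspy.Module state
+        self.p = _compiled_judge or dspy.Predict(JudgeScoreSig)  # fallback
+```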
+
+### Environment Variables
+
+- `OPENAI_API_KEY` / `ANTHROPIC_API_KEY` - API keys for LLM providers
+- `JTBD_DSPY_MODEL` - Model name (default: "gpt-4o-mini")
+- `JTBD_LLM_TEMPERATURE` - Temperature setting (default: 0.2)
+- `JTBD_LLM_SEED` - Random seed for reproducibility (default: 42)
+- `JTBD_DOUBLE_JUDGE` - Enable dual-judge arbitration (default: 1)
+- `JTBD_JUDGE_COMPILED` - Path to compiled judge model
+- `OTEL_SERVICE_NAME` / `DEPLOY_ENV` - Identify the service in OTLP exports (defaults: `jtbd-dspy-sidecar`, `dev`)
+- `OTLP_ENDPOINT` / `OTLP_HEADERS` - Configure OTLP HTTP exporter endpoint and optional headers
+- `MODAIC_AGENT_ID` / `MODAIC_AGENT_REV` - Load a precompiled Modaic agent instead of the local default
+- `MODAIC_TOKEN` - Authentication token for private Modaic repositories
+- `RETRIEVER_KIND` / `RETRIEVER_NOTES` - Retriever selection (e.g., `notes`) and seed data for contextual hints
+- `API_BEARER_TOKEN` - Optional bearer token required by the FastAPI service
+- `STREAM_CHUNK_SIZE` - Chunk size for SSE streaming responses (default: 60)
+
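The documented defaults can be gathered in one place. Below is a minimal sketch of a settings loader covering the `JTBD_*` variables listed above. The `Settings` class and `load_settings` function are hypothetical names, and the repository may read these variables differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    model: str
    temperature: float
    seed: int
    double_judge: bool
    judge_compiled: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the JTBD_* variables, falling back to the defaults documented above."""
    return Settings(
        model=env.get("JTBD_DSPY_MODEL", "gpt-4o-mini"),
        temperature=float(env.get("JTBD_LLM_TEMPERATURE", "0.2")),
        seed=int(env.get("JTBD_LLM_SEED", "42")),
        double_judge=env.get("JTBD_DOUBLE_JUDGE", "1") == "1",
        judge_compiled=env.get("JTBD_JUDGE_COMPILED") or None,  # unset or empty means no compiled judge
    )


# Passing a plain dict instead of os.environ keeps the example deterministic
print(load_settings({}))
print(load_settings({"JTBD_DOUBLE_JUDGE": "0", "JTBD_LLM_TEMPERATURE": "0.7"}))
```
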
+## Project Structure
+
+```
+├── contracts/          # Pydantic models (v1 frozen contracts)
+├── core/              # Main business logic
+│   ├── pipeline.py    # Main analysis pipeline
+│   ├── score.py       # Scoring algorithms
+│   ├── plan.py        # Validation planning
+│   └── export_*.py    # Output formatters
+├── plugins/           # External integrations
+│   ├── llm_dspy.py    # DSPy LLM interface
+│   └── charts_quickchart.py  # Chart generation
+├── service/           # FastAPI service
+├── orchestration/     # Prefect flows
+├── examples/          # Sample business ideas
+├── tools/            # Optimization utilities
+└── run_direct.py     # Main CLI entry point
+```
+
+## Dependencies
+
+- **DSPy**: Language model orchestration framework
+- **Pydantic**: Data validation and serialization
+- **FastAPI/Uvicorn**: Optional HTTP service
+- **Modaic**: Precompiled agent runtime with retriever support
+- **OpenTelemetry**: Request tracing + OTLP exporter (service observability)
+- **sse-starlette**: Server-Sent Events streaming for OpenAI-compatible responses
+- **Prefect**: Optional workflow orchestration
+- **Requests**: HTTP client for external services
+
+## Contract Stability
+
+Data contracts in `contracts/*_v1.py` are frozen. To preserve backward compatibility, create new `v2` versions instead of modifying the existing contracts.