clinical-impact-judge-gepa/agent.json

{
  "assess.predict": {
    "traces": [],
    "train": [],
    "demos": [],
    "signature": {
      "instructions": "Assess the clinical impact of transcription errors in medical conversations by comparing ground truth and transcribed conversations. Classify errors into THREE severity levels based on their potential to affect patient care:\n\n**Classification Levels:**\n- **Class 0 (No Impact)**: Cosmetic differences that don't affect clinical meaning\n- **Class 1 (Moderate Impact)**: Errors that could cause confusion but unlikely to cause harm\n- **Class 2 (Significant Impact)**: Critical errors that could lead to misdiagnosis, wrong treatment, or patient safety risks\n\n**CRITICAL: Use BALANCED SENSITIVITY - Don't be overly lenient, but reserve Class 2 for truly critical errors**\n\n**Classification Guidelines:**\n\n**Class 0 - No clinical impact (for truly cosmetic changes):**\n- Punctuation and capitalization changes (periods, commas, capitalization)\n- Filler words (um, uh, like) being added or removed\n- Minor grammatical variations that preserve exact meaning\n- Patient names or identifying information (these don't affect clinical decisions)\n- Incomplete sentences or missing fragments that don't contain medical information\n- Nonsensical transcriptions of unclear speech that don't relate to medical content\n- Missing or altered non-medical conversational content (activities, TV shows, casual remarks)\n- Minor word reordering that preserves identical clinical meaning (e.g., \"one time, one thing at a time\" vs \"one time thing at the time\")\n\n**Class 1 - Moderate clinical relevance (use when clinical context is somewhat affected):**\n- Minor changes to symptom descriptions that don't fundamentally alter clinical understanding\n- Missing contextual details that provide clinical nuance (e.g., \"than usual\" being omitted from work stress descriptions)\n- Timeline discrepancies that could cause mild confusion but don't affect treatment decisions\n- Non-critical medical history details being altered that don't impact current care\n- Minor medication name variations that are still recognizable to clinicians\n- Partial loss of clinical context that doesn't affect diagnosis but may impact clinical understanding\n\n**Class 2 - ONLY for content critically affecting diagnosis/treatment:**\n- Medication status completely misrepresented (e.g., \"don't take medicine\" vs \"need to get medicine\")\n- Specific medication names being completely changed to different medications\n- Critical contraception methods being misidentified with different clinical implications\n- Dosages, frequencies, or administration instructions being significantly altered\n- Symptom severity being dramatically changed (mild to severe or vice versa)\n- Allergies or contraindications being missed, added, or altered\n- Critical family history being lost or fundamentally changed\n- Treatment plans or medical advice being misrepresented\n\n**Decision Framework:**\n1. First ask: \"Does this difference involve actual medical content (symptoms, medications, treatments, medical history, clinical context)?\"\n   - If NO \u2192 Class 0\n2. If YES, ask: \"Is any clinically relevant information changed, missing, or added, even if not critical?\"\n   - If NO change to clinical information \u2192 Class 0\n   - If minor clinical context affected but diagnosis/treatment unlikely to change \u2192 Class 1\n   - If major impact on diagnosis/treatment decisions \u2192 Class 2\n\n**Key Sensitivity Points:**\n- Be more sensitive to missing clinical context, even if not diagnosis-critical\n- Words like \"usual,\" temporal qualifiers, and descriptive modifiers can carry clinical significance\n- Don't be overly lenient with Class 0 - use Class 1 when clinical nuance is affected\n- Reserve Class 2 only for errors that could directly lead to different clinical decisions\n\n**Key Examples:**\n- Missing \"than usual\" in work stress context = Class 1 (clinical context affected)\n- \"One time, one thing at a time\" vs \"one time thing at the time\" = Class 0 (identical meaning)\n- Complete misrepresentation of medication status = Class 2 (critical error)\n\nProvide your assessment in two parts:\n1. **reasoning**: Explain the key differences and their clinical significance\n2. **clinical_impact**: Assign a number (0, 1, or 2) based on the classification above",
      "fields": [
        {
          "prefix": "Ground Truth Conversation:",
          "description": "${ground_truth_conversation}"
        },
        {
          "prefix": "Transcription Conversation:",
          "description": "${transcription_conversation}"
        },
        {
          "prefix": "Reasoning:",
          "description": "Brief clinical justification for the assessment."
        },
        {
          "prefix": "Clinical Impact:",
          "description": "Clinical impact class (return ONLY the number):\n0 = No impact: cosmetic differences only (punctuation, capitalization, filler words)\n1 = Moderate impact: some information missing/changed but NOT critical to diagnosis or treatment decisions\n2 = Significant impact: missing/incorrect information that COULD affect diagnosis, treatment, or patient safety\nReturn ONLY: 0, 1, or 2"
        }
      ]
    },
    "lm": {
      "model": "openrouter/google/gemini-2.5-pro",
      "model_type": "chat",
      "cache": true,
      "num_retries": 3,
      "finetuning_model": null,
      "launch_kwargs": {},
      "train_kwargs": {},
      "temperature": 0.1,
      "max_tokens": 8000
    }
  },
  "metadata": {
    "dependency_versions": {
      "python": "3.13",
      "dspy": "3.0.4",
      "cloudpickle": "3.1"
    }
  }
}