{ "predictor": { "traces": [], "train": [], "demos": [], "signature": { "instructions": "Task: Evaluate and compare the quality of two responses (Response A and Response B) given a specific question (input). Determine which response better addresses the question by focusing on factual correctness, completeness, clarity, and adherence to any specific requirements mentioned in the question prompt. Select the response that provides the most accurate, comprehensive, and relevant solution or explanation to the problem presented.\n\nDetailed Instructions:\n\n1. **Understand the Question Context:**\n - Grasp the full context and requirements specified by the question or problem statement, including any implicit or explicit instructions such as formats, calculations, algorithms, or methodologies. Identify key concepts and terminologies that are crucial for accurately addressing the question.\n\n2. **Evaluate Each Response:**\n - **Factual Accuracy:** Verify the factual correctness of the content, calculations, or recommendations provided in each response. Use domain-specific knowledge and constants where applicable, such as understanding Susan Wolf\u2019s philosophy, handling Python function edge cases, or calculating hyperfine transition frequencies.\n - **Completeness:** Assess whether the response fully addresses all aspects of the question, including any implicit assumptions or conditions.\n - **Adherence to Requirements:** Ensure each response adheres to specified question requirements, avoiding methods if instructed and utilizing correct formulas or approaches. Consider whether the responses meet the intended outcomes of the problem without imposing unrelated assumptions.\n \n3. **Clarity and Logical Structure:**\n - Evaluate whether the responses are clearly articulated and logically structured. Determine if the steps taken in reasoning or calculations are explicitly and adequately explained. Assess if the explanation enhances understanding or educational value, even when not explicitly required by the question.\n\n4. **Consider Domain-Specific Details:**\n - Consider the specific context of the task:\n - **For Philosophy Questions:** Reference Susan Wolf's published perspectives, ensuring responses accurately represent her philosophy without misinterpretation. Reconfirm specific options align with her stance, particularly for nuanced philosophical debates.\n - **For Python Programming Tasks:** Analyze edge cases such as handling empty strings and consider reporting clarity. Verify compliance with coding instructions, like outputting code in markdown ticks.\n - **For Physics Questions:** Utilize the concepts of quantum numbers, hyperfine splitting, and Planck constants accurately. Understand and correctly apply the necessary formulas. Differentiate between hypothetical scenarios and standard cases to derive correct values.\n\n5. **Decision Making:**\n - Compare the responses systematically, focusing on how differences impact the quality of the solution or explanation. Consider the degree to which each response meets the criteria of accuracy, comprehensiveness, clarity, and adherence to requirements.\n - Choose the response matching the correct answer and also reflectively reasoned according to the given context and methodological needs.\n\n6. 
6. **Output Your Conclusion:**\n - After determining which response is better, output your decision in the required format: \"A>B\" if Response A is superior, or \"B>A\" if Response B is preferable.", "fields": [ { "prefix": "Question:", "description": "The original question or prompt" }, { "prefix": "Response A:", "description": "First response to evaluate" }, { "prefix": "Response B:", "description": "Second response to evaluate" }, { "prefix": "Reasoning:", "description": "Your reasoning for why you chose the better response" }, { "prefix": "Label:", "description": "Which response is better: 'A>B' or 'B>A'" } ] }, "lm": { "model": "huggingface/together/Qwen/Qwen3-VL-32B-Instruct", "model_type": "chat", "cache": true, "num_retries": 3, "finetuning_model": null, "launch_kwargs": {}, "train_kwargs": {}, "temperature": null, "max_tokens": null } }, "metadata": { "dependency_versions": { "python": "3.11", "dspy": "3.1.0", "cloudpickle": "3.1" } } }
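
The serialized state above can, in principle, be reloaded into a DSPy module. The sketch below is a minimal illustration under stated assumptions, not part of the saved program: the signature class, its field names, and the file path "judge_program.json" are hypothetical, and only the model ID and LM settings are taken from the JSON itself.

import dspy

# Hedged sketch of reloading the saved pairwise judge. The signature class,
# field names, and the path "judge_program.json" are illustrative assumptions.
class PairwiseJudge(dspy.Signature):
    """Compare two responses to a question and decide which is better."""
    question: str = dspy.InputField(desc="The original question or prompt")
    response_a: str = dspy.InputField(desc="First response to evaluate")
    response_b: str = dspy.InputField(desc="Second response to evaluate")
    reasoning: str = dspy.OutputField(desc="Your reasoning for why you chose the better response")
    label: str = dspy.OutputField(desc="Which response is better: 'A>B' or 'B>A'")

# LM settings mirror the "lm" block of the saved state.
lm = dspy.LM("huggingface/together/Qwen/Qwen3-VL-32B-Instruct", model_type="chat", cache=True, num_retries=3)
dspy.configure(lm=lm)

judge = dspy.Predict(PairwiseJudge)
judge.load("judge_program.json")  # expected to restore the optimized instructions and any demos

pred = judge(question="...", response_a="...", response_b="...")
print(pred.label)  # e.g. "A>B" or "B>A"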