server-inference push projudgebench
52
program.json
Normal file
@@ -0,0 +1,52 @@
{
  "traces": [],
  "train": [],
  "demos": [],
  "signature": {
    "instructions": "You are given a scientific problem, its correct final answer, and a student\u2019s solution to evaluate.\nYour task is to: first, solve the problem yourself, using the correct final answer as a hint. Ensure your reasoning leads to the\ncorrect answer. Once you have a clear understanding of how the problem could be solved, evaluate the correctness of the given step in the student\u2019s solution.\nFocus exclusively on the scientific, logical, or mathematical correctness of the solution. Ignore differences in formatting,\nexpression style, specific wording, or presentation order, as long as the reasoning and results are valid.\nFor the step, perform:\nBinary scoring: true if step is correct, false if incorrect\nThen in your reasoning, provide:\n1. Error classification (only if the step is incorrect):\na. Numerical Calculation Error. Errors in basic arithmetic operations such as addition, subtraction, division, or square roots.\nb. Symbolic Calculation Error. Errors in manipulating algebraic expressions, such as incorrect expansion, factoring, simplification, or solving equations with variables.\nc. Visual Interpretation Error. Errors in interpreting graphical data, such as misidentifying coordinates, shapes, spatial\nrelationships, or data within figures.\nd. Reasoning Error. Errors in the logical thinking process that lead to incorrect conclusions, such as flawed arguments, invalid\ninferences, or gaps in the logical flow of the solution.\ne. Knowledge Error. Errors caused by insufficient understanding or incorrect application of necessary knowledge (e.g.,\nconcepts, formulas, theorems, methods), or using outdated or incorrect information.\nf. Question Understanding Error. Errors due to misunderstanding or misinterpreting the problems\u2019 conditions or requirements, such as misreading questions or misapplying given conditions.\ng. No solution provided. The model refuses to answer, fails to follow instructions to make a solution, or encounters anomalies\nin generation process such as repetitive responses or incomplete outputs.\n2. Provide a brief explanation for the identified error.\n3. Include any uncertainty or ambiguity in your reasoning.",
    "fields": [
      {
        "prefix": "Question:",
        "description": "Scientific problem statement."
      },
      {
        "prefix": "Correct Answer:",
        "description": "Correct final answer for the problem."
      },
      {
        "prefix": "Steps:",
        "description": "Full student solution steps."
      },
      {
        "prefix": "Step To Evaluate:",
        "description": "Single step to evaluate for correctness."
      },
      {
        "prefix": "Reasoning:",
        "description": "Step-by-step reasoning, including uncertainty, ambiguity, or uncertainty in your thought process when relevant."
      },
      {
        "prefix": "Correct:",
        "description": "Whether the evaluated step is correct."
      }
    ]
  },
  "lm": {
    "model": "together_ai/Qwen/Qwen2.5-7B-Instruct-Turbo",
    "model_type": "chat",
    "cache": true,
    "num_retries": 3,
    "finetuning_model": null,
    "launch_kwargs": {},
    "train_kwargs": {},
    "temperature": null,
    "max_tokens": null
  },
  "metadata": {
    "dependency_versions": {
      "python": "3.11",
      "dspy": "3.1.3",
      "cloudpickle": "3.1"
    }
  }
}
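
For reviewers who want to sanity-check a state file like the one added above, the following is a minimal sketch that uses only Python's standard `json` module. It is not part of this commit. `SAMPLE` is an abbreviated, hypothetical stand-in for `program.json` (same keys, shortened values), and `summarize` is an illustrative helper name, not a DSPy function.

```python
import json

# Abbreviated, hypothetical stand-in for program.json: same top-level keys,
# with the long instructions string and most fields omitted for brevity.
SAMPLE = """
{
  "traces": [],
  "train": [],
  "demos": [],
  "signature": {
    "instructions": "(omitted)",
    "fields": [
      {"prefix": "Question:", "description": "Scientific problem statement."},
      {"prefix": "Correct:", "description": "Whether the evaluated step is correct."}
    ]
  },
  "lm": {
    "model": "together_ai/Qwen/Qwen2.5-7B-Instruct-Turbo",
    "temperature": null,
    "max_tokens": null
  },
  "metadata": {
    "dependency_versions": {"python": "3.11", "dspy": "3.1.3", "cloudpickle": "3.1"}
  }
}
"""


def summarize(state: dict) -> dict:
    """Collect the parts of a saved state that are easy to misread in a diff."""
    # Field prefixes end with a colon in the file; strip it for display.
    fields = [f["prefix"].rstrip(":") for f in state["signature"]["fields"]]
    lm = state["lm"]
    # Options stored as JSON null are not pinned by the file itself.
    unset = sorted(key for key, value in lm.items() if value is None)
    return {
        "model": lm["model"],
        "fields": fields,
        "num_demos": len(state["demos"]),
        "unset_lm_options": unset,
    }


summary = summarize(json.loads(SAMPLE))
print(summary)
```

To run it against the real file, replace `json.loads(SAMPLE)` with `json.load(open("program.json"))`. An empty `demos` list and `null` values for `temperature` and `max_tokens` mean the file stores no few-shot examples and leaves those sampling options unset.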