This agent automatically identifies and repairs optimization model bugs in Gurobi LP/MIP/ILP code using tool execution, static analysis, and test-driven evaluation.
This project builds an end-to-end code repair agent for Gurobi-based optimization models, together with a curated benchmark and a rigorous evaluation pipeline.
I created a dataset of 26 Gurobi use cases covering LP, ILP, MIP, QCP, and classic combinatorial/logistics models. Each use case includes a natural language description, the mathematical formulation, and a reference Gurobi implementation.
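As a rough illustration, a single use case could be represented by a record like the one below; the field names are hypothetical and do not reflect the dataset's actual schema.

```python
# Hypothetical sketch of one benchmark use case; field names are illustrative,
# not the dataset's real schema.
from dataclasses import dataclass

@dataclass
class UseCase:
    name: str             # e.g. "capacitated_facility_location"
    description: str      # natural language problem statement
    formulation: str      # mathematical model (objective, constraints)
    reference_code: str   # path to the oracle gurobipy implementation
    test_suite: str       # path to its 10-test unit-test file
```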
For every use case, I wrote a 10-test unit-test suite (260 tests total) that checks the model's outputs against the reference implementation on a range of inputs.
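A typical check in such a suite can be sketched as follows; `solve_reference` and `solve_candidate` are hypothetical entry points standing in for the oracle and the code under repair.

```python
# Illustrative input-output unit test, assuming hypothetical solve() entry
# points that take problem data and return (objective, solution).
import pytest

from reference_model import solve as solve_reference   # hypothetical oracle module
from candidate_model import solve as solve_candidate   # hypothetical code under repair

CASES = [
    {"capacities": [10, 20], "demands": [5, 5, 8]},
    {"capacities": [3, 3],   "demands": [2, 2, 2]},
]

@pytest.mark.parametrize("data", CASES)
def test_objective_matches_oracle(data):
    ref_obj, _ = solve_reference(data)
    cand_obj, _ = solve_candidate(data)
    # The candidate is accepted only if it reproduces the oracle's optimum.
    assert cand_obj == pytest.approx(ref_obj, rel=1e-6)
```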
On top of this benchmark, I implemented a set of repair tools, reusing ideas from the Lean4 project. The Gurobi domain, however, introduces its own challenges. Executing a Gurobi model through gurobipy exposes only coarse solver signals, unlike Python exceptions, which often pinpoint the exact line or operation causing a failure. Moreover, a model that runs to completion does not prove the code is correct, so hand-written unit tests over different input-output pairs are necessary, and the oracle (reference) implementation matters.
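A minimal sketch of what "coarse solver signals" means in practice, assuming a hypothetical `build_model` callable: after `optimize()`, gurobipy reports a status code such as `GRB.INFEASIBLE`, which says that something is wrong but not which constraint or coefficient caused it.

```python
# Minimal sketch of why solver signals are coarse: a failed optimize() call
# typically surfaces only a status code, not the offending constraint or line.
from gurobipy import GRB

def run_and_report(build_model):
    """build_model is any callable returning an unoptimized gurobipy Model."""
    model = build_model()
    model.optimize()
    if model.Status == GRB.OPTIMAL:
        return {"status": "optimal", "objective": model.ObjVal}
    # GRB.INFEASIBLE / GRB.UNBOUNDED / GRB.INF_OR_UNBD say *that* something
    # is wrong, but not *which* constraint or coefficient caused it.
    return {"status": model.Status, "objective": None}
```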
The agent runs in an iterative loop: it reads the failing script, analyzes the error signal, proposes and applies a fix, and re-runs the unit tests. I evaluate the agent on every use case by measuring compile/execute success and the pass/fail/error breakdown across all unit tests.
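The loop can be summarized with the sketch below; `run_tests` and `propose_fix` are placeholders for the agent's tools, not its real interface.

```python
def repair_loop(script_path, run_tests, propose_fix, max_iters=5):
    """Placeholder sketch of the iterative repair loop; not the agent's real API."""
    for _ in range(max_iters):
        report = run_tests(script_path)          # pass / fail / error per unit test
        if report["failed"] == 0 and report["errored"] == 0:
            return True                          # all tests pass: repair succeeded
        with open(script_path) as f:
            source = f.read()
        patched = propose_fix(source, report)    # analyze the signal, propose a fix
        with open(script_path, "w") as f:
            f.write(patched)                     # apply the fix, then re-run tests
    return False                                 # gave up after max_iters attempts
```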
Coming soon: code snippets showing a broken Gurobi model, the agent’s tool calls, and the repaired solution.
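Until those snippets are published, the toy example below (not taken from the benchmark) illustrates the kind of bug the agent targets: a knapsack model declared with continuous instead of binary variables, so the LP relaxation is solved instead of the intended ILP.

```python
# Illustrative only: a toy knapsack model with a typical bug, plus the repaired line.
import gurobipy as gp
from gurobipy import GRB

values, weights, capacity = [10, 13, 7], [4, 6, 3], 8

m = gp.Model("knapsack")
# BUG: items can be taken fractionally, so the LP relaxation is solved
# instead of the intended ILP.
x = m.addVars(len(values), vtype=GRB.CONTINUOUS, ub=1.0, name="x")
# FIX the agent would propose:
# x = m.addVars(len(values), vtype=GRB.BINARY, name="x")
m.addConstr(gp.quicksum(weights[i] * x[i] for i in range(len(values))) <= capacity)
m.setObjective(gp.quicksum(values[i] * x[i] for i in range(len(values))), GRB.MAXIMIZE)
m.optimize()
print(m.ObjVal)   # the LP relaxation overestimates the true knapsack optimum
```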