This agent automatically identifies and repairs optimization model bugs in Gurobi LP/MIP/ILP code using tool execution, static analysis, and test-driven evaluation.
This project builds an end-to-end code repair agent for Gurobi-based optimization models, together with a curated benchmark and a rigorous evaluation pipeline.
I created a dataset of 26 Gurobi use cases covering LP, ILP, MIP, QCP, and classic combinatorial/logistics models. Each use case includes a natural language description, the mathematical formulation, and a reference Gurobi implementation.
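As a rough illustration, a single use case could be represented by a record like the one below; the field names are hypothetical and do not reflect the dataset's actual schema.

```python
# Hypothetical sketch of one benchmark use case; field names are illustrative,
# not the dataset's real schema.
from dataclasses import dataclass

@dataclass
class UseCase:
    name: str             # e.g. "capacitated_facility_location"
    description: str      # natural language problem statement
    formulation: str      # mathematical model (objective, constraints)
    reference_code: str   # path to the oracle gurobipy implementation
    test_suite: str       # path to its 10-test unit-test file
```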
For every use case, I wrote a 10-test unit-test suite (260 tests total) that checks the model's outputs against the reference implementation on a range of inputs.
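A typical check in such a suite can be sketched as follows; `solve_reference` and `solve_candidate` are hypothetical entry points standing in for the oracle and the code under repair.

```python
# Illustrative input-output unit test, assuming hypothetical solve() entry
# points that take problem data and return (objective, solution).
import pytest

from reference_model import solve as solve_reference   # hypothetical oracle module
from candidate_model import solve as solve_candidate   # hypothetical code under repair

CASES = [
    {"capacities": [10, 20], "demands": [5, 5, 8]},
    {"capacities": [3, 3],   "demands": [2, 2, 2]},
]

@pytest.mark.parametrize("data", CASES)
def test_objective_matches_oracle(data):
    ref_obj, _ = solve_reference(data)
    cand_obj, _ = solve_candidate(data)
    # The candidate is accepted only if it reproduces the oracle's optimum.
    assert cand_obj == pytest.approx(ref_obj, rel=1e-6)
```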
On top of this benchmark, I implemented a set of repair tools, reusing ideas from the Lean4 project. The Gurobi domain, however, introduces its own challenges. Executing a Gurobi model through gurobipy exposes only coarse solver signals, unlike Python exceptions, which often pinpoint the exact line or operation causing a failure. Moreover, a model that runs to completion does not prove the code is correct, so hand-written unit tests over different input-output pairs are necessary, and the oracle (reference) implementation matters.
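A minimal sketch of what "coarse solver signals" means in practice, assuming a hypothetical `build_model` callable: after `optimize()`, gurobipy reports a status code such as `GRB.INFEASIBLE`, which says that something is wrong but not which constraint or coefficient caused it.

```python
# Minimal sketch of why solver signals are coarse: a failed optimize() call
# typically surfaces only a status code, not the offending constraint or line.
from gurobipy import GRB

def run_and_report(build_model):
    """build_model is any callable returning an unoptimized gurobipy Model."""
    model = build_model()
    model.optimize()
    if model.Status == GRB.OPTIMAL:
        return {"status": "optimal", "objective": model.ObjVal}
    # GRB.INFEASIBLE / GRB.UNBOUNDED / GRB.INF_OR_UNBD say *that* something
    # is wrong, but not *which* constraint or coefficient caused it.
    return {"status": model.Status, "objective": None}
```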
The agent runs in an iterative loop: it reads the failing script, analyzes the error signal, proposes and applies a fix, and re-runs the unit tests. I evaluate the agent on every use case by measuring compile/execute success and the pass/fail/error breakdown across all unit tests.
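The loop can be summarized with the sketch below; `run_tests` and `propose_fix` are placeholders for the agent's tools, not its real interface.

```python
def repair_loop(script_path, run_tests, propose_fix, max_iters=5):
    """Placeholder sketch of the iterative repair loop; not the agent's real API."""
    for _ in range(max_iters):
        report = run_tests(script_path)          # pass / fail / error per unit test
        if report["failed"] == 0 and report["errored"] == 0:
            return True                          # all tests pass: repair succeeded
        with open(script_path) as f:
            source = f.read()
        patched = propose_fix(source, report)    # analyze the signal, propose a fix
        with open(script_path, "w") as f:
            f.write(patched)                     # apply the fix, then re-run tests
    return False                                 # gave up after max_iters attempts
```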
Coming soon: code snippets showing a broken Gurobi model, the agent’s tool calls, and the repaired solution.
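Until those snippets are published, the toy example below (not taken from the benchmark) illustrates the kind of bug the agent targets: a knapsack model declared with continuous instead of binary variables, so the LP relaxation is solved instead of the intended ILP.

```python
# Illustrative only: a toy knapsack model with a typical bug, plus the repaired line.
import gurobipy as gp
from gurobipy import GRB

values, weights, capacity = [10, 13, 7], [4, 6, 3], 8

m = gp.Model("knapsack")
# BUG: items can be taken fractionally, so the LP relaxation is solved
# instead of the intended ILP.
x = m.addVars(len(values), vtype=GRB.CONTINUOUS, ub=1.0, name="x")
# FIX the agent would propose:
# x = m.addVars(len(values), vtype=GRB.BINARY, name="x")
m.addConstr(gp.quicksum(weights[i] * x[i] for i in range(len(values))) <= capacity)
m.setObjective(gp.quicksum(values[i] * x[i] for i in range(len(values))), GRB.MAXIMIZE)
m.optimize()
print(m.ObjVal)   # the LP relaxation overestimates the true knapsack optimum
```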