Coding Agent Leaderboard

Compare coding agents across models and harnesses

5 Results · 2 Models · 4 Harnesses · 1 Dataset

A Coding Agent is more than just a model: it is the combination of a Model and a Harness (the tool or framework that drives the model). This leaderboard tracks how the two components work together, because the same model can score very differently depending on the harness it is paired with.
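As a rough illustration of that pairing, the sketch below treats each result as a (model, harness) pair and measures how far one model's score moves across harnesses. `AgentResult` and `harness_spread` are hypothetical names chosen for this example, not part of the leaderboard; the three scores are the Qwen3.6-35B-A3B (nvfp4) runs listed in the results data on this page.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentResult:
    """One leaderboard entry: a model paired with a harness (illustrative type)."""
    model: str
    harness: str
    score: float


def harness_spread(results):
    """Return {model: (best_harness, worst_harness, score_gap)}."""
    by_model = defaultdict(list)
    for r in results:
        by_model[r.model].append(r)

    spread = {}
    for model, rows in by_model.items():
        best = max(rows, key=lambda r: r.score)
        worst = min(rows, key=lambda r: r.score)
        spread[model] = (best.harness, worst.harness, round(best.score - worst.score, 3))
    return spread


# Scores copied from the results data on this page (same model ID, same environment).
results = [
    AgentResult("Qwen3.6-35B-A3B", "Pi", 0.65),
    AgentResult("Qwen3.6-35B-A3B", "Claude Code", 0.632),
    AgentResult("Qwen3.6-35B-A3B", "OpenCode", 0.548),
]
spread = harness_spread(results)
print(spread)
```

Only the three runs that share a model ID and environment are compared here, so the gap reflects the harness rather than a change in precision or setup.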

{
  "headers": [
    " ",
    "Dataset",
    "Harness",
    "Model",
    "Model ID",
    "Precision",
    "Skills",
    "Environment",
    "Score",
    "Avg Cost Per Task (USD)",
    "Avg Seconds Per Task",
    "Avg Input Tokens Per Task",
    "Avg Output Tokens Per Task",
    "Model License",
    "Harness License",
    "Model Num Params (B)"
  ],
  "data": [
    [
      "🔶",
      "[swe-bench-verified](https://huggingface.co/datasets/SWE-bench/SWE-bench_Verified)",
      "[internal](https://www.anthropic.com/news/claude-opus-4-7)<sup>*</sup>",
      "Claude Opus 4.7",
      "[Claude Opus 4.7](https://www.anthropic.com/news/claude-opus-4-7)",
      "bf16",
      "None",
      "[internal](https://www.anthropic.com/news/claude-opus-4-7)<sup>*</sup>",
      0.876,
      null,
      null,
      null,
      null,
      "Proprietary",
      "Proprietary",
      4000
    ],
    [
      "🔶",
      "[swe-bench-verified](https://huggingface.co/datasets/SWE-bench/SWE-bench_Verified)",
      "[internal](https://qwen.ai/blog?id=qwen3.6-35b-a3b)<sup>*</sup>",
      "Qwen3.6-35B-A3B",
      "[Qwen/Qwen3.6-35B-A3B](https://huggingface.co/Qwen/Qwen3.6-35B-A3B)",
      "bf16",
      "None",
      "[internal](https://qwen.ai/blog?id=qwen3.6-35b-a3b)<sup>*</sup>",
      0.734,
      null,
      null,
      null,
      null,
      "FOSS",
      "Proprietary",
      35
    ],
    [
      "🟠",
      "[swe-bench-verified](https://huggingface.co/datasets/SWE-bench/SWE-bench_Verified)",
      "[Pi](https://github.com/earendil-works/pi/tree/main)",
      "Qwen3.6-35B-A3B",
      "[RedHatAI/Qwen3.6-35B-A3B-NVFP4](https://huggingface.co/RedHatAI/Qwen3.6-35B-A3B-NVFP4)",
      "nvfp4",
      "None",
      "[harbor](https://github.com/harbor-framework/harbor)",
      0.65,
      0.08,
      309,
      1582367,
      12667,
      "FOSS",
      "FOSS",
      35
    ],
    [
      "🔶",
      "[swe-bench-verified](https://huggingface.co/datasets/SWE-bench/SWE-bench_Verified)",
      "[Claude Code](https://github.com/anthropics/claude-code)",
      "Qwen3.6-35B-A3B",
      "[RedHatAI/Qwen3.6-35B-A3B-NVFP4](https://huggingface.co/RedHatAI/Qwen3.6-35B-A3B-NVFP4)",
      "nvfp4",
      "None",
      "[harbor](https://github.com/harbor-framework/harbor)",
      0.632,
      0.07,
      245,
      2213237,
      11466,
      "FOSS",
      "Proprietary",
      35
    ],
    [
      "🟠",
      "[swe-bench-verified](https://huggingface.co/datasets/SWE-bench/SWE-bench_Verified)",
      "[OpenCode](https://github.com/anomalyco/opencode)",
      "Qwen3.6-35B-A3B",
      "[RedHatAI/Qwen3.6-35B-A3B-NVFP4](https://huggingface.co/RedHatAI/Qwen3.6-35B-A3B-NVFP4)",
      "nvfp4",
      "None",
      "[harbor](https://github.com/harbor-framework/harbor)",
      0.548,
      0.06,
      240,
      939613,
      9875,
      "FOSS",
      "FOSS",
      35
    ]
  ],
  "metadata": null
}

* "internal" refers to benchmarks run internally by the model provider, where the harness and environment were not made public.
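Because the internal rows carry `null` for cost, time, and token counts, any comparison on those columns has to skip them. The sketch below shows one way to turn a `headers`/`data` payload into per-row records and rank the measured runs. The payload is an abbreviated copy of the results data with only four columns and the link markup stripped; `to_records` and `cost_per_resolved_task` are hypothetical helper names, and "cost per resolved task" (average cost divided by score) is an illustrative derived figure, not a metric the leaderboard reports.

```python
import json

# Abbreviated payload in the same "headers"/"data" shape as the results data.
# Only four columns are kept and harness names are shown without their links.
payload = json.loads("""
{
  "headers": ["Harness", "Model", "Score", "Avg Cost Per Task (USD)"],
  "data": [
    ["internal", "Claude Opus 4.7", 0.876, null],
    ["internal", "Qwen3.6-35B-A3B", 0.734, null],
    ["Pi", "Qwen3.6-35B-A3B", 0.65, 0.08],
    ["Claude Code", "Qwen3.6-35B-A3B", 0.632, 0.07],
    ["OpenCode", "Qwen3.6-35B-A3B", 0.548, 0.06]
  ]
}
""")


def to_records(table):
    """Pair each row with the header names to get one dict per result."""
    return [dict(zip(table["headers"], row)) for row in table["data"]]


def cost_per_resolved_task(record):
    """Average cost divided by score; None when the cost was not published."""
    cost = record["Avg Cost Per Task (USD)"]
    if cost is None or not record["Score"]:
        return None
    return cost / record["Score"]


records = to_records(payload)

# Keep only runs with published cost data, then sort cheapest first.
measured = [r for r in records if cost_per_resolved_task(r) is not None]
ranked = sorted(measured, key=cost_per_resolved_task)

for r in ranked:
    print(f'{r["Harness"]:<12} {cost_per_resolved_task(r):.3f} USD per resolved task')
```

Filtering on `None` rather than treating missing values as zero keeps the internal rows from appearing artificially cheap.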