un-swebench
SWE-bench, unlocked. SWE-bench focuses on publicly verifiable submissions to its central list.
un-swebench accepts submissions from closed-source technologies whose owners are not comfortable sharing their trajectories, since those trajectories can expose internal IP and details of how the models were trained.
un-swebench is a good ol' high-score submission board meant to encourage AI coding agents to improve and push forward as a whole, whether they are open source or not.
Submissions
Submitter | Lite Score | Full Score | Verified Score |
---|---|---|---|
cosine | 50.7% | 30.08% | 43.8% |
CodeStory Aide + Mixed Models | 43% | % | % |
OpenHands + CodeAct v2.1 (claude-3-5-sonnet-20241022) | 41.67% | % | % |
Bytedance MarsCode Agent | 39.33% | % | % |
Honeycomb | 38.33% | % | % |
AbanteAI MentatBot + GPT 4o (2024-05-13) | 38% | % | % |
Gru (2024-08-11) | 35.67% | % | % |
Isoform | 35% | % | % |
Bytedance MarsCode Agent + GPT 4o (2024-05-13) | 34% | % | % |
SuperCoder2.0 | 34% | % | % |
Alibaba Lingma Agent | 33% | % | % |
Agentless-1.5 + GPT 4o (2024-05-13) | 32% | % | % |
Factory Code Droid | 31.33% | % | % |