The enshittification will go unnoticed at first but I'm already finding my favourite frontier models severely nerfed, doing incredibly dumb stuff they weren't in the past.
We need open weight models to have a stable "platform" when we rely on them, which we do more and more.
This is the future though. Open weights models that run on H200s provide far more opportunity to build products and real infrastructure around.
You can always distill this for your little RTX at home. But models shaped for consumer hardware will never win wide adoption or remain competitive with frontier labs.
This is something that _can_ compete. And it will both necessitate and inspire a new generation of open cloud infra to run inference. "Push button, deploy" or "Push button, fine tune" shaped products at the start, then far more advanced products that only open weights not locked behind an API can accomplish.
Now we just need open weights Nano Banana Pro / GPT Image 2, and Seedance 2.0 equivalents.
The battle and focus should be on open weights for the data center.
Awesome to have a open model that can compete, but damn it would be so much better if you could run it locally. Otherwise, it's almost so difficult to run (e.g. self host) that it's just way more convenient to pay OpenAI, Claude, etc
Great to know, but what was the cost both in terms of $$ and tokens used?
Not to invalidate these benchmark results because they are useful, but the real usefulness it what they are capable to do when real people interact with them at scale.
Regardless, these are good news, because now that Microsoft is basically giving up their all-in strategy with Github's Copilot and Anthropic is playing the "I'm too good for you" game, it's about time for them to get pressed into not making this AI world into a divide between the haves and the have-nots.
I have to try Kimi. I was looking for an alternative. If you have any experience, advice, please share. I saw Kimi is at the top of the Open Router ranking.
I’m a little confused as to the setup. It was asking each model to one-shot a script and then the scripts faced off? Were the models given a computer environment? Or a test server to iterate against?
Kimi K2.6 is definitely a frontier-sized model, so on the one hand it's not that surprising it's up there with the closed frontier models.
Being open is nice though, even though it doesn't matter that much for folks like me with a single consumer GPU.
The enshittification will go unnoticed at first but I'm already finding my favourite frontier models severely nerfed, doing incredibly dumb stuff they weren't in the past.
We need open weight models to have a stable "platform" when we rely on them, which we do more and more.
You can always distill this for your little RTX at home. But models shaped for consumer hardware will never win wide adoption or remain competitive with frontier labs.
This is something that _can_ compete. And it will both necessitate and inspire a new generation of open cloud infra to run inference. "Push button, deploy" or "Push button, fine tune" shaped products at the start, then far more advanced products that only open weights not locked behind an API can accomplish.
Now we just need open weights Nano Banana Pro / GPT Image 2, and Seedance 2.0 equivalents.
The battle and focus should be on open weights for the data center.
Awesome to have a open model that can compete, but damn it would be so much better if you could run it locally. Otherwise, it's almost so difficult to run (e.g. self host) that it's just way more convenient to pay OpenAI, Claude, etc
Not to invalidate these benchmark results because they are useful, but the real usefulness it what they are capable to do when real people interact with them at scale.
Regardless, these are good news, because now that Microsoft is basically giving up their all-in strategy with Github's Copilot and Anthropic is playing the "I'm too good for you" game, it's about time for them to get pressed into not making this AI world into a divide between the haves and the have-nots.