Plain inference is actually profitable; that’s why there are a hundred inference providers on OpenRouter competing by undercutting each other. The labs, however, aren’t profitable because training the models is a huge drain.
As others have pointed out, the compute cost of inference is only one small part of the puzzle. All the frontier model providers - which OpenRouter gives you access to - are massively raising their prices in desperate bids to recoup the cost of model generation in the first place.
There’s really no hiding from the token apocalypse unless you’re running a model on your own hardware.
But they can only do that because others are doing the training, no? There’s no point at which you can go “okay, it’s all inference from here”. The model needs to be updated with new information/guardrails/context to continue being useful for most use cases.