To be clear, we don’t really know the architecture or model size of Claude or ChatGPT. If Opus is a 1-trillion-parameter dense model, then yes, of course it’s going to be way more expensive to run than DeepSeek, which is a 671-billion-parameter MoE model (V3/R1) that only activates about 37 billion of those parameters per token.
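For a rough sense of scale, here is a back-of-the-envelope sketch in Python. It assumes the common rule of thumb that a forward pass costs about 2 FLOPs per active parameter per token, takes DeepSeek-V3's published figures (671B total, roughly 37B active per token), and treats the 1-trillion dense model as purely hypothetical, since nobody outside those labs knows the real numbers.

```python
# Back-of-the-envelope compute comparison: dense vs. MoE.
# Assumption: forward pass ~ 2 FLOPs per *active* parameter per token.

def forward_flops_per_token(active_params: float) -> float:
    """Approximate forward-pass FLOPs for one token."""
    return 2.0 * active_params

# Hypothetical dense model: every parameter is used for every token.
dense_active = 1.0e12

# DeepSeek-V3 published figures: 671B total, about 37B active per token.
moe_total = 671e9
moe_active = 37e9

dense_flops = forward_flops_per_token(dense_active)
moe_flops = forward_flops_per_token(moe_active)
ratio = dense_flops / moe_flops

print(f"dense: {dense_flops:.2e} FLOPs/token")
print(f"MoE:   {moe_flops:.2e} FLOPs/token")
print(f"dense / MoE compute ratio: {ratio:.1f}x")
print(f"share of MoE weights active per token: {moe_active / moe_total:.1%}")
```

This only counts compute. All 671B weights still have to sit in memory to serve the MoE model, so the real gap in running cost is smaller than the raw FLOPs ratio suggests.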
American model makers have focused on being the best regardless of cost, whereas Chinese model makers were forced to focus on efficiency because of limited access to chips. That will probably give them an advantage as the free investor money runs out for the American companies.
The constant training is simply how AI works and will continue to work, Chinese or American. That cost will never go away. Any time you need your model to learn new skills or gain new knowledge, it needs to be trained.