DeepSeek ditches Nvidia for Huawei chips in V4 launch

inari@piefed.zip · 2 days ago

KingRandomGuy@lemmy.world · 13 hours ago

Yeah I can believe their interconnect is better, given their extensive history in networking.

W.r.t. TFLOPs, let me clarify what I meant. Even on traditionally compute-bound workloads (attention, etc.), it's surprisingly difficult on an H200 to make full use of the card's throughput before hitting VRAM bandwidth limits. Tensor core throughput has grown a lot faster than memory bandwidth has.

I've never written a kernel for Huawei chips, so I have no idea whether they have the same problem. But it shows up on many datacenter-class NVIDIA chips, which is why NVIDIA keeps introducing features (TMA, TMEM, etc.) to try to reduce the time wasted waiting for memory.
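
To put rough numbers on the bandwidth limit described above, here is a minimal roofline sketch in Python. The hardware figures are assumptions: roughly 989 TFLOP/s dense BF16 and 4.8 TB/s of HBM bandwidth, which is approximately what is publicly quoted for an H200-class card, not something I measured. The function names are made up for illustration, and the matmul model optimistically assumes each matrix crosses HBM exactly once.

```python
# Roofline sketch. The hardware numbers are approximate, assumed figures
# for an H200-class GPU, not measurements.
PEAK_FLOPS = 989e12      # assumed dense BF16 tensor core peak, FLOP/s
PEAK_BANDWIDTH = 4.8e12  # assumed HBM bandwidth, bytes/s


def ridge_point(peak_flops=PEAK_FLOPS, bandwidth=PEAK_BANDWIDTH):
    """Arithmetic intensity (FLOP/byte) above which a kernel is compute-bound."""
    return peak_flops / bandwidth


def attainable_flops(intensity, peak_flops=PEAK_FLOPS, bandwidth=PEAK_BANDWIDTH):
    """Roofline bound: the lower of the compute roof and the memory roof."""
    return min(peak_flops, intensity * bandwidth)


def matmul_intensity(m, n, k, bytes_per_element=2):
    """FLOPs per byte for C = A @ B, assuming each matrix crosses HBM once."""
    flops = 2 * m * n * k
    bytes_moved = bytes_per_element * (m * k + k * n + m * n)
    return flops / bytes_moved


if __name__ == "__main__":
    print(f"ridge point: {ridge_point():.0f} FLOP/byte")
    for shape in [(1, 4096, 4096), (128, 4096, 4096), (4096, 4096, 4096)]:
        ai = matmul_intensity(*shape)
        frac = attainable_flops(ai) / PEAK_FLOPS
        print(f"{shape}: {ai:.1f} FLOP/byte, at most {frac:.0%} of peak")
```

Under these assumed numbers a kernel needs on the order of 200 FLOPs per byte of HBM traffic before the tensor cores, rather than memory, become the limit. A real kernel that re-reads tiles from HBM lands lower on the curve than this idealized model suggests.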
