It is! 143GB last I checked. I’m on 128GB RAM + 3090, 1 NUMA node, so I think it’s juuust barely too tight. But it should be perfect with a few of the “sparsest” MoEs quantized.
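For anyone curious about the arithmetic, here's the back-of-envelope (the 143GB and hardware figures are the real ones from above; the overhead numbers are just my rough guesses):

    # Rough fit check: 143 GB of weights vs. 128 GB RAM + a 24 GB 3090.
    # Overhead figures are guesses; the rest is from the comment above.
    weights_gb = 143            # model size, last I checked
    ram_gb, vram_gb = 128, 24   # system RAM + RTX 3090 VRAM

    os_overhead_gb = 6          # guess: OS + runtime resident in RAM
    kv_cache_gb = 4             # guess: KV cache + activations on GPU

    usable = (ram_gb - os_overhead_gb) + (vram_gb - kv_cache_gb)
    print(f"usable {usable} GB vs needed {weights_gb} GB "
          f"-> headroom {usable - weights_gb} GB")
    # usable 142 GB vs needed 143 GB -> headroom -1 GB: juuust too tight.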
If KTransformers supports something like that, I may have to finally check it out, since v4 won’t need many esoteric features.