• humanspiral@lemmy.ca
    22 hours ago

    3.6 27b is probably the most powerful/efficient (for its size) model out there. Qwen has a history of leveraging DeepSeek's strengths as well (DeepSeek has created small models with Qwen as the base), and Alibaba is the main hosting service for DeepSeek. Alibaba/Qwen are in talks to invest in DeepSeek at the moment.

    • Avid Amoeba@lemmy.ca
      21 hours ago

      Yeah. The 80b Coder-Next runs at about the same speed on my hardware too. I don’t know if it’s any better than 3.6 27b.