• 1 Post
  • 4 Comments
Joined 3 years ago
Cake day: June 17th, 2023



  • I suggest using llama.cpp instead of ollama; you can easily squeeze out ~10% more inference speed, plus other memory optimizations, with llama.cpp. With hardware prices nowadays I think every % saved on resources matters. Here is a simple ansible role to set up llama.cpp, it should give you a good idea of how to deploy it.

    A dedicated inference rig is not gonna be cheap. What I did, since I need a gaming rig anyway, was get 32GB DDR5 (this was before the current RAMpocalypse; if I had known, I would have bought 64) and an AMD 9070 (16GB VRAM - again, if I had known how crazy prices would get, I'd probably have bought a 24GB VRAM card). The home server runs the usual/non-AI stuff, and llama.cpp runs on the gaming desktop (the home server just has a proxy to it). Yeah, the gaming desktop has to be powered up when I want to run inference, but it's my main desktop so it's powered on most of the time anyway - no big deal.
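
    For reference, a llama.cpp deployment on a setup like this boils down to one llama-server invocation. This is a hypothetical sketch - the model path, context size, and port are illustrative, adjust them to your hardware:

    ```shell
    #!/bin/sh
    # Hypothetical llama-server launch for a 16GB VRAM card.
    # -ngl 99 offloads all layers to the GPU (pick a model/quant that fits);
    # bind to localhost and let the home server's reverse proxy reach it.
    llama-server \
      -m /models/your-model.gguf \
      -ngl 99 \
      -c 8192 \
      --host 127.0.0.1 \
      --port 8080
    ```

    llama-server exposes an OpenAI-compatible API on that port, so the proxy on the home server can just forward to it.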


  • Email

    Most applications/services offer mail as a notification channel. Even old school unix utilities such as cron support sending mail (through the system MTA). I use msmtp. Then configure K-9 Mail or any decent mail client on your phone, and set up filters so that mail from your services ends up in a high-priority folder in your mailbox with notifications enabled.

    I want to be able to receive notifications both on mobile and desktop; this is the only reasonable option I found, and I've been running with it for >10 years.
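
    As a sketch of what that looks like: msmtp is configured once as the system sendmail, and cron then mails job output automatically via MAILTO. The account details below are hypothetical placeholders - substitute your own provider:

    ```
    # ~/.msmtprc -- hypothetical account, adjust host/user to your provider
    defaults
    auth           on
    tls            on
    tls_trust_file /etc/ssl/certs/ca-certificates.crt
    logfile        ~/.msmtp.log

    account        homelab
    host           smtp.example.com
    port           587
    from           alerts@example.com
    user           alerts@example.com
    passwordeval   "cat ~/.msmtp-password"

    account default : homelab

    # crontab -- any output from the job gets mailed through the MTA
    MAILTO=alerts@example.com
    0 3 * * * /usr/local/bin/backup.sh
    ```

    The msmtp-mta package provides the sendmail-compatible interface cron expects, so no full MTA like postfix is needed.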