I’ve been thinking about adding this to my “Fuck it, I’ll do it myself” / SHTF pile. I have a spare 10-15GB for a good selection of basic articles (across sciences, history, pop culture trivia etc).
https://get.kiwix.org/en/solutions/hotspots/content-bundles/
https://get.kiwix.org/en/solutions/hotspots/imager-service/
There’s something inherently cool about having Wikipedia in a box (yes, you’d likely need to refresh it once a year), but I’ve never heard of anyone actually self-hosting a Kiwix instance.
I’ve got Kiwix at home and on my iPhone. It rocks.
Yes, and I actually use it to train a local LLM so I’m not hammering the internet. I have a ton of storage, and like to keep my kids in the sandbox, so we have Wikipedia, Project Gutenberg, Khan Academy, and a bunch of others all hosted behind an Apache reverse proxy which is using mellon so there’s LDAP auth.
That was actually my immediate thought. I already have Wikipedia as a trusted source for my LLM, but I would prefer to self-host and not hammer them.
130GB to fit the entirety of Wikipedia is basically nothing and I’m mildly embarrassed not to have done it already.
I also try to participate in some of the farms, running zimit and mwoffliner to help make more archives. Feels like I’m helping.
Yes. This is my Ansible role that deploys it
Yes, it’s helpful
I switched to an N150 some time ago, but I previously had it running perfectly on a Pi 4 with only 2GB of RAM. There’s actually a lot more content available than just Wikipedia! You can even archive your own websites using https://zimit.kiwix.org/
It’s fun, and Kiwix is impressively lightweight: it uses less than 50 MB of RAM, even with an article loaded.
I do, on my TrueNAS in a Docker container. I have about 1TB of ZIM files hosted, including pre-LLM copies of German, English and French Wikipedia as well as the last two current versions in these languages.
Additionally I have Project Gutenberg books in German and English as well as lots of random technical, medical, survival, etc. stuff that I came across. A lot of that is trash, but sorting is too time-consuming and my NAS has 48TB, so who cares…
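For anyone who wants the "keep the last two versions plus a pinned pre-LLM copy" approach without sorting by hand, here is a rough Python sketch. It assumes the files follow a `<series>_<YYYY-MM>.zim` naming pattern (for example `wikipedia_en_all_maxi_2024-01.zim`); the function name, the `pinned` parameter and the example filenames are made up for illustration. It only reports candidates and never deletes anything.

```python
import re
from collections import defaultdict

# Assumed naming scheme: <series>_<YYYY-MM>.zim
ZIM_NAME = re.compile(r"^(?P<series>.+)_(?P<year>\d{4})-(?P<month>\d{2})\.zim$")


def split_keep_and_prune(filenames, keep=2, pinned=frozenset()):
    """Group ZIM filenames by series and keep the newest `keep` dumps of each.

    Files listed in `pinned` (e.g. a pre-LLM snapshot) are always kept and
    do not count toward `keep`. Names that do not match the assumed pattern
    are ignored and appear in neither result list.
    """
    by_series = defaultdict(list)
    kept = []
    for name in filenames:
        if name in pinned:
            kept.append(name)
            continue
        match = ZIM_NAME.match(name)
        if match is None:
            continue  # unknown naming scheme: leave the file alone
        stamp = (int(match["year"]), int(match["month"]))
        by_series[match["series"]].append((stamp, name))

    prunable = []
    for dumps in by_series.values():
        dumps.sort(reverse=True)  # newest dump first
        kept.extend(name for _, name in dumps[:keep])
        prunable.extend(name for _, name in dumps[keep:])
    return sorted(kept), sorted(prunable)
```

Feed it something like `os.listdir()` of the ZIM directory and review the second list before removing anything.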
Humorously, you could use an agent to help you sort things. If there’s anything it’s good at, it’s sorting.
How do you like TrueNAS? I’m too locked in to Synology at this point, with almost 800TB (in physical drives, less in practice because of redundancy) and several devices.
Pretty happy with TrueNAS. I actually came from Synology and bought a UGreen DXP4800Plus, didn’t like the UGOS on it and pretty much immediately switched to TrueNAS. It’s been absolutely flawless for about 15 months now. Docker integration in the OS is a bit limited, but I run my compose stacks managed through dockge anyway.
I won’t let LLMs crawl my data, it’s mine and mine alone :)
That’s awesome. If I understand correctly, the Kiwix server creates a local site you can access from anything on your WLAN, like an ordinary website? I take it it auto-populates with your ZIM files, and that you can add to it (e.g. Project Gutenberg).
If so, that’s a hell of a thing.
Yep exactly. Also you can have other people (friends/family) have access via VPN, Tailscale, etc.
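To make the "auto-populates" part concrete, here is a minimal Python sketch of one way to do it, assuming the `kiwix-serve` binary is on your PATH and is started with a port plus a list of ZIM files. The helper name, the directory and the default port are placeholders, not anyone’s actual setup from this thread.

```python
from pathlib import Path


def kiwix_serve_command(zim_dir, port=8080):
    """Return an argv list that serves every .zim file found in zim_dir.

    Hypothetical helper: it only builds the command line and does not
    start the server itself.
    """
    zim_files = sorted(str(path) for path in Path(zim_dir).glob("*.zim"))
    if not zim_files:
        raise FileNotFoundError(f"no .zim files found in {zim_dir}")
    return ["kiwix-serve", f"--port={port}", *zim_files]
```

You would then run it with something like `subprocess.run(kiwix_serve_command("/data/zim"), check=True)` and browse to the host on that port from any device on the network. Adding content means dropping another ZIM file into the directory and restarting.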
Is there an actual download link? They want $20 for the Raspberry Pi image
Damn their website has become a mess. Anyway
Yes, I self-host the English Wikipedia dump, as well as a few cooking sites and topic-specific Stack Exchange dumps available in ZIM format.
My goals are:
- reduce dependence on public internet. In the event of an outage or restriction I’d like some books and other content I can use to entertain myself
- locally preserve a snapshot of information before it is possibly diluted by LLM edits
Yup! Here’s my setup:
https://github.com/shadybraden/compose/blob/main/kiwix/compose.yaml