Hi all, I am quite an old fart, so I only recently got excited about self-hosting an AI, some LLM…
What I want to do is:
- chat with it
- eventually integrate it into other services, where needed
I read about Ollama, but it’s all unclear to me.
Where do I start, preferably with containers (but “bare metal” is also fine)?
(I already have a Linux server rig with all the good stuff on it, from Immich to Forgejo to the arrs and more, plus a reverse proxy, WireGuard and the works. I’m looking for input on AI/LLMs, what to self-host and such, not general self-hosting hints.)
One of these projects might be of interest to you:
https://github.com/Mintplex-Labs/anything-llm
https://github.com/mudler/LocalAI
Do note that CPU inference is quite a lot slower than GPU inference or the well-known SaaS providers. I currently like the quantized DeepSeek models as the best balance between reply quality and inference time when not using a GPU.
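Once any of these (or plain Ollama) is up, the “integrate it into other services” part is mostly just HTTP. A minimal sketch, assuming Ollama on its default port 11434 and a model you’ve already pulled (the model name below is just an example, swap in whatever you actually use):

```python
import requests

# Assumes Ollama is running locally on its default port (11434) and that
# the model has already been pulled, e.g. `ollama pull deepseek-r1:7b`.
OLLAMA_URL = "http://localhost:11434/api/chat"
MODEL = "deepseek-r1:7b"  # example name, not a recommendation

def chat(prompt: str) -> str:
    resp = requests.post(
        OLLAMA_URL,
        json={
            "model": MODEL,
            "messages": [{"role": "user", "content": prompt}],
            "stream": False,  # one complete JSON reply instead of a token stream
        },
        timeout=300,  # CPU inference can take a while
    )
    resp.raise_for_status()
    return resp.json()["message"]["content"]

if __name__ == "__main__":
    print(chat("Give me a one-line summary of what a reverse proxy does."))
```

LocalAI (and recent Ollama versions) also expose an OpenAI-compatible /v1/chat/completions endpoint, so most existing OpenAI client libraries should work by just pointing their base URL at your own box.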
Indeed, beyond just getting a model running at all, decent hardware is the next most important part.
A 3060 12GB is probably the cheapest card worth getting; a 3090 or another 24GB card if you can swing it.
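For a rough sense of what fits in those cards, a back-of-envelope sketch (the bits-per-weight and overhead figures are assumptions for a typical 4-bit quant, not exact numbers):

```python
# Very rough VRAM estimate for a quantized model: weights at ~4.5 bits each
# (typical Q4-style quant) plus a flat allowance for KV cache and buffers.
def estimate_vram_gb(params_billion: float,
                     bits_per_weight: float = 4.5,
                     overhead_gb: float = 1.5) -> float:
    weights_gb = params_billion * 1e9 * bits_per_weight / 8 / 1024**3
    return weights_gb + overhead_gb

for size in (7, 13, 32, 70):
    print(f"{size}B model: ~{estimate_vram_gb(size):.1f} GB VRAM")
```

By that rough math, 7B-14B quants fit comfortably in 12GB, while the ~30B-class models are where a 24GB card starts to matter, and 70B won’t fit on a single consumer card even quantized.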