Hi all, I am quite an old fart, so I only recently got excited about self-hosting an AI, some LLM…
What I want to do is:
- chat with it
- eventually integrate it into other services, where needed
I read about Ollama, but it’s all unclear to me.
Where do I start, preferably with containers (but “bare metal” is also fine)?
(I already have a Linux server rig with all the good stuff on it, from Immich to Forgejo to the arrs and more, plus a reverse proxy, WireGuard and the works. I’m looking for input on AI/LLMs, what to self-host and such, not general self-hosting hints.)
One of these projects might be of interest to you:
https://github.com/Mintplex-Labs/anything-llm
https://github.com/mudler/LocalAI
Do note that CPU inference is quite a lot slower than GPU inference or the well-known SaaS providers. I currently like the quantized DeepSeek models as the best balance between reply quality and inference time when not using a GPU.
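Once any of these (or plain Ollama) is up, the “integrate it into other services” part is mostly just HTTP. A minimal sketch, assuming Ollama on its default port 11434 and a model you’ve already pulled (the model name below is just an example, swap in whatever you actually use):

```python
import requests

# Assumes Ollama is running locally on its default port (11434) and that
# the model has already been pulled, e.g. `ollama pull deepseek-r1:7b`.
OLLAMA_URL = "http://localhost:11434/api/chat"
MODEL = "deepseek-r1:7b"  # example name, not a recommendation

def chat(prompt: str) -> str:
    resp = requests.post(
        OLLAMA_URL,
        json={
            "model": MODEL,
            "messages": [{"role": "user", "content": prompt}],
            "stream": False,  # one complete JSON reply instead of a token stream
        },
        timeout=300,  # CPU inference can take a while
    )
    resp.raise_for_status()
    return resp.json()["message"]["content"]

if __name__ == "__main__":
    print(chat("Give me a one-line summary of what a reverse proxy does."))
```

LocalAI (and recent Ollama versions) also expose an OpenAI-compatible /v1/chat/completions endpoint, so most existing OpenAI client libraries should work by just pointing their base URL at your own box.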
Indeed, beyond just getting a model running at all, decent hardware is the next most important part.
A 3060 12GB is probably the cheapest card worth getting; a 3090 or another 24GB card if you can swing it.
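For a rough sense of what fits in those cards, a back-of-envelope sketch (the bits-per-weight and overhead figures are assumptions for a typical 4-bit quant, not exact numbers):

```python
# Very rough VRAM estimate for a quantized model: weights at ~4.5 bits each
# (typical Q4-style quant) plus a flat allowance for KV cache and buffers.
def estimate_vram_gb(params_billion: float,
                     bits_per_weight: float = 4.5,
                     overhead_gb: float = 1.5) -> float:
    weights_gb = params_billion * 1e9 * bits_per_weight / 8 / 1024**3
    return weights_gb + overhead_gb

for size in (7, 13, 32, 70):
    print(f"{size}B model: ~{estimate_vram_gb(size):.1f} GB VRAM")
```

By that rough math, 7B-14B quants fit comfortably in 12GB, while the ~30B-class models are where a 24GB card starts to matter, and 70B won’t fit on a single consumer card even quantized.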