tl;dr:

Can someone give me step-by-step instructions (ELI5) on how to get access to the LLMs on my rig from my phone?

Jan seems the easiest, but I’ve also tried Ollama, LibreChat, etc.

I’ve taken steps to secure my data and now I’m going the self-hosting route. I don’t care to become a savant with the technical aspects of this stuff, but even the basics are hard to grasp! I’ve been able to install an LLM provider on my rig (Ollama, LibreChat, Jan, all of ’em) and I can successfully get models running on them. BUT what I would LOVE to do is access the LLMs on my rig from my phone while I’m within proximity. I’ve read that I can do that via WiFi or LAN or something like that, but I have had absolutely no luck. Jan seems the easiest because all you have to do is something with an API key, but I can’t even figure that out.

Any help?

  • dontblink@feddit.it · 13 hours ago

    Self-hosting IS hard; don’t beat yourself up too much over it… After all, you’re trying to run services for yourself that are usually provided by companies with thousands of employees.

    A server requires knowledge, maintenance, and time; it’s okay to feel frustrated sometimes.

  • DrDystopia@lemy.lol · 15 hours ago

    Just do like me: install Ollama and OpenWebUI, install Termux on Android, and connect through Termux with SSH port forwarding.

    ssh -L 0.0.0.0:3000:ServerIP_OnLAN:3000 user@ServerIP_OnLAN

    Then access OpenWebUI at http://127.0.0.1:3000/ in your phone’s browser. Or SSH-forward the Ollama port to use the Ollama Android app. This requires you to be on the same LAN as the server. If you port forward SSH through your router, you can access it remotely through your public IP (if so, I’d recommend only allowing login through certs or adding a rate limiter for SSH login attempts).

    The shell command will then be ssh -L 0.0.0.0:3000:127.0.0.1:3000 user@YourPublicIP
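
    (On the certs point: a minimal sketch of what key-only login looks like in the server’s /etc/ssh/sshd_config, assuming OpenSSH; restart sshd after editing.)

    PasswordAuthentication no
    PubkeyAuthentication yes
    # MaxStartups throttles unauthenticated connection attempts (this is the stock default)
    MaxStartups 10:30:60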

    But what are the chances that you run the LLM on a Linux machine and use an Android phone to connect, like me, and not a Windows machine and an iPhone? You tell me. No specs posted…

    • BlackSnack@lemmy.zip (OP) · 14 hours ago

      Oh! Also, I’m using Windows on my PC. And my phone is an iPhone.

      I’m not using Linux yet, but that is on my to-do list for the future! After I get more comfortable with some more basics of self-hosting.

      • tal@lemmy.today · 13 hours ago

        Oh! Also, I’m using Windows on my PC. And my phone is an iPhone.

        Okay, that’s a starting place. So if this is Windows, and if you only care about access on the wireless network, then I suppose that it’s probably easiest to just expose the stuff directly to other machines on the wireless network, rather than tunneling through SSH.

        You said that you have ollama running on the Windows PC. I’m not familiar with LibreChat, but it has a Web-based interface? Are you wanting to access that from a web browser on the phone?

        • BlackSnack@lemmy.zip (OP) · 13 hours ago

          Yes, exactly! I would love to keep it on my network for now. I’ve read that “exposing a port” is something I may have to do in my Windows Firewall options.

          Yes, I have Ollama on my Windows rig. But I’m down to try out a different one if you suggest so. TBH, I’m not sure if LibreChat has a web UI. I think accessing the LLM on my phone via web browser would be easiest. But there are apps out there like Reins and Enchanted that I could take advantage of.

          For right now I just want to do whatever is easiest so I can get a better understanding of what I’m doing wrong.

          • tal@lemmy.today · 13 hours ago

            Yes, I have Ollama on my Windows rig.

            TBH, I’m not sure if LibreChat has a web UI.

            Okay, gotcha. I don’t know if Ollama has a native Web UI itself; if it does, I haven’t used it. I know that it can act as a backend for various front-end chat applications. I do know that kobold.cpp can operate both as an LLM backend and run a limited Web UI, so at least some backends do have Web UIs built in. You said that you’ve already used Ollama successfully. Was this via some Web-based UI that you would like to use on your phone, or just some other program (LibreChat?) running natively on the Windows machine?

            • BlackSnack@lemmy.zip (OP) · 13 hours ago

              Backend/front end. I see those a lot, but I never got an explanation for them. In my case, the backend would be Ollama on my rig, and the front end would be me using it on my phone, whether that’s with an app or a web UI. Is that correct?

              I will add kobold to my list of AIs to check out in the future. Thanks!

              Ollama has an app (or maybe interface is a better term for it) on Windows that I download models to. Then I can use said app to talk to the models. I believe Reins: Chat for Ollama is the iPhone app that would let me use my phone to chat with the models on the Windows rig.

              • tal@lemmy.today · 12 hours ago

                Backend/front end. I see those a lot, but I never got an explanation for them. In my case, the backend would be Ollama on my rig, and the front end would be me using it on my phone, whether that’s with an app or a web UI. Is that correct?

                For Web-based LLM setups, it’s common to have two different software packages. One loads the LLM into video memory and executes queries on the hardware. That’s the backend. It doesn’t need to have a user interface at all. Ollama or llama.cpp (though I know that llama.cpp also ships a minimal frontend) are examples of this.

                Then there’s a frontend component. It runs a small Web server that displays a webpage that a Web browser can access, provides some helpful features, and can talk to various backends (e.g. ollama or llama.cpp or some of the cloud-based LLM services). Something like SillyTavern would be an example of this.

                Normally the terms are used in the context of Web-based stuff; it’s common for Web services, even outside of LLM stuff, to have a “front end” and a “back end” and to have different people working on those different aspects. If Reins is a native iOS app, I guess it could technically be called a frontend.
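
                If it helps make the distinction concrete: the backend is just an HTTP API that a frontend calls. For example, ollama listens on port 11434, and you (or a frontend) can talk to it by hand with something like the line below; "llama3" is just a placeholder for whatever model you’ve pulled.

                curl http://localhost:11434/api/generate -d "{\"model\": \"llama3\", \"prompt\": \"Why is the sky blue?\"}"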

                But, okay, it sounds like probably the most-reasonable thing to do, if you like the idea of using Reins, is to run Ollama on the Windows machine, expose ollama’s port to the network, and then install Reins on iOS.

                So, yeah, probably need to open a port on Windows Firewall (or Windows Defender…not sure what the correct terminology is these days, long out of date on Windows). It sounds like having said firewall active has been the default on Windows for some years. I’m pretty out-of-date on Windows, but I should be able to stumble through this.
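
                (When we get to the hole-punching step, it will most likely be a single inbound rule for TCP port 11434. As a sketch, from an elevated PowerShell prompt; the rule name is arbitrary, and restricting it to the Private profile keeps it closed on public networks:)

                New-NetFirewallRule -DisplayName "Ollama LAN access" -Direction Inbound -Protocol TCP -LocalPort 11434 -Action Allow -Profile Private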

                While it’s very likely that you aren’t directly exposing your computer to the Internet — that is, nobody from the outside world can connect to an open port on your desktop — it is possible to configure consumer routers to do that. Might be called “putting a machine in the DMZ”, forwarding a port, or forwarding a range of ports. I don’t want to have you open a port on your home computer and have it inadvertently exposed to the Internet as a whole. I’d like to make sure that there’s no port forwarding to your Windows machine from the Internet.

                Okay, first step. You probably have a public IP address. I don’t need or want to know that — that’d give some indication to your location. If you go somewhere like https://whatismyipaddress.com/ in a web browser from your computer, then it will show that – don’t post that here.

                That IP address is most-likely handed by your ISP to your consumer broadband router.

                There will then be a set of “private” IP addresses that your consumer broadband router hands out to all the devices on your WiFi network, like your Windows machine and your phone. These will very probably be 192.168.something.something, though they could also be 172.something.something.something or 10.something.something.something. It’s okay to mention those in comments here — they won’t expose any meaningful information about where you are or your setup. This may be old hat to you, or new, but I’m going to mention it in case you’re not familiar with it; I don’t know what your level of familiarity is.

                What you’re going to want is your “private” IP address from the Windows machine. On your Windows machine, if you hit Windows Key-R and then enter “cmd” into the resulting dialog, you should get a command-line prompt. If you type “ipconfig” there, it should have a line listing your private IPv4 address. Probably be something like that “192.168.something.something”. You’re going to want to grab that address. It may also be possible to use the name of your Windows machine to reach it from your phone, if you’ve named it — there’s a network protocol, mDNS, that may let you do that — but I don’t know whether it’s active out-of-box on Windows or not, and would rather confirm that the thing is working via IP before adding more twists to this.
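
                (Purely as an illustration, the relevant chunk of ipconfig output looks roughly like this; the adapter name and numbers will differ on your machine:)

                Wireless LAN adapter Wi-Fi:
                   IPv4 Address. . . . . . . . . . . : 192.168.1.42
                   Subnet Mask . . . . . . . . . . . : 255.255.255.0
                   Default Gateway . . . . . . . . . : 192.168.1.1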

                Go ahead and fire up ollama, if you need to start it — I don’t know if, on Windows, it’s installed as a Windows service (once installed, always runs) or as a regular application that you need to launch, but it sounds like you’re already familiar with that bit, so I’ll let you handle that.

                Back in the console window that you opened, go ahead and run netstat -a -b -n.

                Will look kinda like this:

                https://i.sstatic.net/mJali.jpg

                That should list all of the programs listening on any ports on the computer. If ollama is up and running on that Windows machine and doing so on the port that I believe it is, then you should have a line that looks like:

                TCP     0.0.0.0:11434    0.0.0.0:0    LISTENING
                

                “11434” is the port that I expect ollama to be listening on.

                If the address you see before “11434” is 0.0.0.0, then it means that ollama is listening on all addresses, which means that any program that can reach it over the network can talk to it (as long as it can get past Windows Firewall). We’re good, then.

                Might also be "127.0.0.1". In that case, it’ll only be listening for connections originating from the local computer, and it’ll have to be configured to use 0.0.0.0 instead.

                I’m gonna stop here until you’ve confirmed that much. If that all works, and you have ollama already listening on the “0.0.0.0” address, then next step is gonna be to check that the firewall is active on the Windows machine, punch a hole in it, and then confirm that ollama is not accessible from the Internet, as you don’t want people using your hardware to do LLM computation; I’ll try and step-by-step that.

                • BlackSnack@lemmy.zip (OP) · 10 hours ago

                  Dope! This is exactly what I needed! I would say that this is a very “hand holding” explanation which is perfect because I’m starting with 0% knowledge in this field! And I learned so much already from this post and your comment!

                  So here’s where I’m at:

                  • A backend is where all the weird C++ language stuff happens to generate a response from an AI.

                  • A front end is a pretty app or webpage that takes that response and makes it more digestible for the user.

                  • Agreed. I’ve seen in other posts that exposing a port in Windows Defender Firewall is the easiest (and safest?) way to go for specifically what I’m looking for. I don’t think I need to forward a port, as that would be more for remote access.

                  • I went to the whatismyipaddress website. The IPv6 was identical to one of the ones I have. The IPv4 was not. (But I don’t think that matters moving forward.)

                  • I did ipconfig in the command prompt to find the info, and my IPv4 is 10.blahblahblah.

                  • I ran netstat -abn (this is what worked to display the necessary info). I’m able to see 0.0.0.0 before the 11434! I had to go into the settings in the ollama backend app to enable “expose Ollama to the network”.

                  I’m ready for the next steps!

    • BlackSnack@lemmy.zip (OP) · 14 hours ago

      Bet, I’ll try that when I get home tonight. If I don’t have success, can I message you directly?

    • tal@lemmy.today · 14 hours ago

      ssh -L 0.0.0.0:3000:127.0.0.1:3000 user@YourPublicIP

      If you can SSH to the LLM machine, I’d probably recommend ssh -L 127.0.0.1:11434:127.0.0.1:11434 <remote hostname>. If for some reason you don’t have a firewall on your portable device, or you inadvertently bring it down, you don’t want to be punching a tunnel from whatever can talk to your portable device through to the LLM machine.

      (Using 11434 instead of 3000, as it looks like that’s ollama’s port.)
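
      (Once that tunnel is up, a quick sanity check from Termux would be something like the line below; you may need pkg install curl first. If it prints "Ollama is running", the tunnel is working.)

      curl http://127.0.0.1:11434/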

      EDIT: OP, it’s going to be hard to give a reliable step-by-step, because I have no idea what your network looks like. For example, it’s possible to have your wireless access point set up so that devices can’t talk to each other at all. You might have some kind of firewall on your LLM machine, so that even if they can talk to each other as far as the WAP is concerned, the firewall blocks traffic from your phone; you’d need to punch a hole in that. At least something (sshd for the example here, or ollama itself exposed to the network) needs to be listening on a routable address. As DrDystopia points out, we don’t even know what OS the LLM machine is running (Linux?), so giving any kind of step-by-step is going to be hard.

      I have had absolutely no luck.

      Problem is, that doesn’t say much. Like, doesn’t say what you’ve seen.

      Do you know what the LAN IP address of your LLM machine is? Can you ping that IP address from Termux on your phone when both are on the same WiFi network ($ ping <ip-address>)? What OS is the LLM machine? If Linux, do you have sshd installed? It sounds like you do have ollama on it and that it’s working if you use it from the LLM machine? When you said that it didn’t work, what did you try, and what errors or behavior did you see?

      • DrDystopia@lemy.lol · 15 hours ago

        3000 is the OpenWebUI port; I never got it to work using either 127.0.0.1 or localhost, only 0.0.0.0. Ollama’s port 11434 on 127.x worked fine, though.

        you don’t want to be punching a tunnel from whatever can talk to your portable device to the LLM machine.

        Fair point.

  • hedgehog@ttrpg.network · 13 hours ago

    What OS is your server running? Do you have an Android phone or an iPhone?

    In either case all you likely need to do is expose the port and then access your server by IP on that port with an appropriate client.

    In Ollama you can expose the port to your local network by changing the bind address from 127.0.0.1 to 0.0.0.0

    Regarding clients: on iOS you can use Enchanted or Apollo to connect to Ollama.

    On Android there are likely comparable apps.

    • BlackSnack@lemmy.zip (OP) · 13 hours ago

      Server is my rig, which is running Windows. Phone is an iPhone.

      Exposing the port is something I’ve tried to do in the past with no success! When you say change the bind address, do I do that in Windows Defender Firewall, in the inbound rules section?

      • hedgehog@ttrpg.network · 11 hours ago

        I believe you just need to set the env var OLLAMA_HOST to 0.0.0.0:11434 and then restart Ollama.
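
        (For example, from a regular Command Prompt; setx stores the variable for future sessions, so quit and restart Ollama afterwards:)

        setx OLLAMA_HOST "0.0.0.0:11434"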

      • mierdabird@lemmy.dbzer0.com · 12 hours ago

        When on your WiFi, try navigating in your phone’s browser to your Windows computer’s address with a colon and the port 11434 at the end. It would look something like this:

        http://192.168.xx.xx:11434/

        If it works, your browser will just load the text: Ollama is running
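
        (The same check works from the Windows machine itself if you want to rule the phone out first; recent Windows versions ship curl.exe:)

        curl.exe http://localhost:11434/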

        From there you just need to figure out how you want to interact with it. I personally pair it with OpenWebUI for the web interface.

  • brucethemoose@lemmy.world · 12 hours ago

    At risk of getting more technical, ik_llama.cpp has a good built-in web UI:

    https://github.com/ikawrakow/ik_llama.cpp/

    And it’s also way better than ollama: you can run way smarter models on the same hardware.

    For reference, I’m running GLM-4 (667 GB of raw weights) on a single RTX 3090/Ryzen gaming rig, at reading speed, with pretty low quantization distortion.

    And if you want a ‘look this up on the internet for me’ assistant (which you need for them to be truly useful), you need another docker project as well.

    …That’s just how LLM self-hosting is now. It’s simply too hardware-intensive and ad hoc to be easy and smart and cheap. You can indeed host a small ‘default’ LLM without much tinkering, but it’s going to be pretty dumb, and pretty slow on ollama defaults.

    • tal@lemmy.today · 12 hours ago

      Ollama does have some features that make it easier to use for a first-time user, including:

      • Automatically calculating how many layers can fit in VRAM, loading that many layers, and splitting the rest between main memory/CPU and VRAM/GPU. llama.cpp can’t do that automatically yet.

      • Automatically unloading the model from VRAM after a period of inactivity.

      I had an easier time setting up ollama than other stuff, and OP does apparently already have it set up.

      • brucethemoose@lemmy.world · 12 hours ago

        Yeah. But it also messes stuff up from the llama.cpp baseline, hides or doesn’t support some features/optimizations, and definitely doesn’t support the more efficient iq_k quants of ik_llama.cpp and its specialized MoE offloading.

        And that’s not even getting into the various controversies around ollama (like broken GGUFs or indications they’re going closed source in some form).

        …It just depends on how much performance you want to squeeze out, and how much time you want to spend on the endeavor. Small LLMs are kinda marginal though, so IMO it’s important if you really want to try; otherwise one is probably better off spending a few bucks on an API that doesn’t log requests.

    • BlackSnack@lemmy.zip (OP) · 12 hours ago

      Bet. Looking into that now. Thanks!

      I believe I have 11 GB of VRAM, so I should be good to run decent models, from what I’ve been told by the other AIs.

      • brucethemoose@lemmy.world · 12 hours ago

        In case I miss your reply, assuming a 3080 + 64 GB of RAM, you want the IQ4_KSS (or IQ3_KS, for more RAM for tabs and stuff) version of this:

        https://huggingface.co/ubergarm/GLM-4.5-Air-GGUF

        Part of it will run on your GPU, part will live in system RAM, but ik_llama.cpp does the quantization split and GPU offloading in a particularly efficient way for these kinds of ‘MoE’ models. Follow the instructions on that page.

        If you ‘only’ have 32 GB of RAM or less, that’s trickier, and the next question is what kind of speeds you want. But it’s probably best to wait a few days and see how Qwen3 80B looks when it comes out. Or just go with the IQ4_K version of this: https://huggingface.co/ubergarm/Qwen3-30B-A3B-Thinking-2507-GGUF

        And you don’t strictly need the hyper-optimization of ik_llama.cpp for a small model like Qwen3 30B. Something easier like LM Studio or the llama.cpp Docker image would be fine.

        Alternatively, you could try to squeeze Gemma 27B into that 11GB VRAM, but it would be tight.

      • brucethemoose@lemmy.world · 12 hours ago

        How much system RAM, and what kind? DDR5?

        ik doesn’t have great documentation, so it’d be a lot easier for me to just point you places, heh.

  • MTK@lemmy.world · 15 hours ago

    Ollama + Open WebUI + Tailscale/NetBird

    Open WebUI provides a fully functional Docker image bundled with Ollama, so just find the section that applies to you (AMD, Nvidia, etc.): https://github.com/open-webui/open-webui?tab=readme-ov-file#quick-start-with-docker-
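
    (For reference, the bundled image is roughly this shape at the time of writing; check the linked README for the current command and the Nvidia/AMD variants:)

    docker run -d -p 3000:8080 -v ollama:/root/.ollama -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:ollama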

    Then on that host install NetBird or Tailscale, and install the same on your phone. In Tailscale you need to enable MagicDNS; NetBird, I think, provides DNS by default.

    Once the container is running and both your server and phone are connected to the VPN (NetBird or Tailscale), you just type your server’s DNS name, plus the port you mapped (3000 in the quick-start), into your phone’s browser (in NetBird it would be something like yourserver.netbird.cloud:3000 and in Tailscale yourserver.yourtsnet.ts.net:3000).

    Check out NetworkChuck on YouTube; he has a lot of simple tutorials.

    • BlackSnack@lemmy.zip (OP) · 14 hours ago

      Bet. I believe what you mentioned is best for accessing my LLM no matter where I am in the world, correct? If so I will try this one after I try what the other person suggested.

      Thank you!