So… with all this OpenClaw stuff, I was wondering: what’s the FOSS status for something to run locally? Can I get my own locally run agent that I can ask to perform simple tasks (go and find this, download that, summarize an article)? I’m just kinda curious about all of this.

Thanks!

  • fizzle@quokk.au · 48 minutes ago

    No one has mentioned Open Web UI, which is part of this landscape.

    Open Web UI is the chat interface you use to interact with a model. I haven’t really dug into much of the functionality beyond simple chat, but there are thousands of community plugins for web search and the like. You can also create knowledge bases and attach them to queries. For example, if I have a bunch of policy and procedure documents from my work, I can create a knowledge base and ask the LLM to draft new policies in that context.
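For perspective on what a knowledge base does under the hood, this is essentially retrieval-augmented generation: documents get chunked and embedded, and the chunks most similar to your question get prepended to the prompt. A toy Python sketch, with bag-of-words vectors standing in for real learned embeddings (the documents and function names are invented for illustration):

```python
from collections import Counter
import math

# Toy retrieval: bag-of-words vectors stand in for real learned embeddings,
# and the "policy documents" are invented examples.
def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = [
    "remote work policy: employees may work remotely two days per week",
    "expense policy: submit receipts within thirty days",
]

def retrieve(query: str) -> str:
    # Prepend the most similar document to the question before it reaches the LLM.
    best = max(docs, key=lambda d: cosine(embed(query), embed(d)))
    return f"Context: {best}\nQuestion: {query}"

prompt = retrieve("how many days can I work remotely")
```

Production setups do the same thing with a vector database and a proper embedding model, but the shape of the trick is just this.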

    You can configure it to work with Ollama, which lets you run LLMs from huggingface.co and similar sources on your own hardware.
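As a sketch of what that wiring looks like underneath: Ollama serves an HTTP API on localhost:11434 by default, and its /api/chat endpoint takes a JSON body like the one below ("llama3" is just a placeholder for whatever model you’ve actually pulled):

```python
import json

# Sketch of a chat request to a local Ollama server. "llama3" is a
# placeholder for whatever model you've pulled with `ollama pull`.
payload = {
    "model": "llama3",
    "messages": [{"role": "user", "content": "Summarize this article: ..."}],
    "stream": False,  # ask for one complete JSON response instead of a stream
}
body = json.dumps(payload)

# Sending it requires a running Ollama instance (default port 11434):
#   import urllib.request
#   req = urllib.request.Request(
#       "http://localhost:11434/api/chat", data=body.encode(),
#       headers={"Content-Type": "application/json"})
#   reply = json.load(urllib.request.urlopen(req))["message"]["content"]
```

Open Web UI speaks this same API for you; the point is only that there’s nothing exotic between the chat window and the model.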

    However, in my own case I just don’t have anything resembling a modern, powerful GPU, so I don’t run Ollama locally. Instead, you can use a paid account at huggingface.co and call their API to do the inference (i.e., run the models). Not all LLMs are available this way, but certainly many are.

    More recently I’ve discovered that OVH (a French bare-metal host I’ve used for years) provides an inference API for a half dozen models, and I’ve found it blisteringly fast compared to huggingface.

  • hendrik@palaver.p3x.de · 6 hours ago (edited)

    We’ve got open-source agents like OpenCode. OpenClaw is weird, and not really recommended by any sane person, but to my knowledge it’s open source as well. We’ve got a silly(?) “clean-room rewrite” of the Claude Agent, after that leaked…

    Regarding the models, I don’t think there are any strictly-speaking “FLOSS” models out there with modern tool-calling etc. You’d be looking at “open-weights” models instead, where the weights are released under some permissive license. The training dataset and all the tuning remain a trade secret with pretty much all models, so there’s no real FLOSS in the sense of the four freedoms.

    Google dropped a set of Gemma models a few days ago and they seem pretty good. You could have a look at Qwen 3.5, or GLM, DeepSeek… There’s a plethora of open-weights models out there. The newer ones pretty much all do tool-calling and can be used for agentic tasks.

  • ThePowerOfGeek@lemmy.world · 10 hours ago

    I’m curious about this too. I know that on the latest version of Ollama it’s possible to install OpenClaw. But I assumed you needed to point it to a paid API (Claude, ChatGPT, Grok, etc.) for it to really work. But yeah, maybe it works with Qwen 3 or similar models?

    I guess a major factor in this is what your system resources look like, especially how much RAM you have, and therefore which model you can host locally.

  • cecilkorik@lemmy.ca · 9 hours ago

    Absolutely. There are tons of open-licensed, open-weight (the equivalent of open source for AI models) models capable of what is called “tool usage”. The key thing to understand is that they’re never quite perfect, and they don’t all “use tools” equally effectively or in the same way as each other. This is common to LLMs, and it is critical to understand that at the end of the day they are just text generators: they do not “use tools” themselves. They generate specific structured text that triggers some other piece of software, typically called a harness (but sometimes a client or frontend), to call those tools on your system. OpenClaw is an example of such a harness (and not a great or particularly safe one, in my opinion, but if you want to be a lunatic and give an AI model free rein, it seems to be the best choice).

    You can use commercial harnesses too, by configuring or tricking them into connecting to a local model instead of their commercial one, although I don’t recommend this for a variety of reasons. If you really want to use Claude Code itself, people have done it, but I don’t find it works very well, since all its prompts and tool calling are optimized for Claude models. Besides OpenClaw, other popular harnesses for local models include OpenCode (as close as you’re going to get to Claude for local models) and Cursor; even Ollama has its own CLI harness now. Personally I use OpenCode a lot, but I’m starting to lean towards pi-mono (it’s just called pi, but that’s ungoogleable). It’s very minimal and modular, intentionally easy to customize with plugins and skills you can install automatically, making it exactly as safe, capable, or visual as you wish it to be.
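To make the “structured text, not actual tool use” point concrete: a harness is, at heart, a loop that spots a tool-call block in the model’s output and runs the matching function on your machine. A minimal Python sketch (the JSON format and tool names here are invented; every real harness defines its own):

```python
import json

# Toy "tools" the harness exposes. A real harness wires these to the
# filesystem, a shell, a browser, and so on.
def read_file(path: str) -> str:
    return f"<contents of {path}>"  # stub for illustration

TOOLS = {"read_file": read_file}

def handle_model_output(text: str) -> str:
    """If the model emitted a structured tool call, run it; otherwise pass the text through."""
    try:
        call = json.loads(text)
    except json.JSONDecodeError:
        return text  # plain prose, nothing to dispatch
    if not isinstance(call, dict) or "tool" not in call:
        return text
    fn = TOOLS.get(call["tool"])
    if fn is None:
        return f"error: unknown tool {call['tool']!r}"
    return fn(**call.get("arguments", {}))

# The model "uses" a tool only by generating text in the agreed format:
result = handle_model_output('{"tool": "read_file", "arguments": {"path": "notes.txt"}}')
```

Everything dangerous happens in that dispatch step, which is exactly why the choice of harness (and how much it sandboxes) matters more than the model.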

    As a minor diversion, we should also discuss what a “tool” is. In this context, there are some common basic tools that most tool-use models will understand some variation of out of the box. Things like editing files, running command-line tools, opening documents, and searching the web are common built-in skills that pretty much any model advertising “tool use” or “tool calling” will support, although some agents can use these skills more capably and effectively than others. Just like some people know the Linux command line fluently and can operate their entire system with it, while others only know basic commands like ls or cat and need a GUI or guidance for anything more complex, AI models vary too: some (the latest models in particular) are incredibly capable with just their basic built-in tools.

    However, they’re not limited to what’s built in; as I said, they can accept guidance on what to use and how to use it. You can guide them explicitly if you happen to be fluent in their tools, but there are roughly two competing models for providing that guidance automatically. The first is MCP (Model Context Protocol): a separate server the model can access that provides structured listings of different kinds of tools and how they work, basically letting it connect to a huge variety of APIs in almost any software or service. Some harnesses have MCP support built in. The other approach is called “skills”, and it seems to me a more sensible and flexible way of giving the model enough understanding to become more capable and expand the tools it can use. Again, providing skills is usually handled by the harness you’re using.
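At its simplest, a “skill” is just a description the harness injects into the model’s context so the model knows a tool exists and how to invoke it. A hypothetical sketch (the listing format and tool names are made up for illustration; real harnesses each use their own conventions):

```python
# Sketch: "skills" as descriptions injected into the model's context.
# The format and tool names are invented for illustration only.
SKILLS = [
    {"name": "web_search", "usage": 'emit {"tool": "web_search", "arguments": {"query": "..."}}'},
    {"name": "read_file", "usage": 'emit {"tool": "read_file", "arguments": {"path": "..."}}'},
]

def build_system_prompt(task: str, skills=SKILLS) -> str:
    # The harness, not the model, assembles this before the conversation starts.
    lines = ["You can use these tools when helpful:"]
    for skill in skills:
        lines.append(f"- {skill['name']}: {skill['usage']}")
    lines.append(f"Task: {task}")
    return "\n".join(lines)

prompt = build_system_prompt("summarize https://example.com")
```

MCP accomplishes roughly the same thing, except the tool listings come from a separate server over a standard protocol instead of being baked into the prompt by the harness.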

    To make this a little less abstract, you can put it in the perspective of Claude: Anthropic provides several different Claude models, like Haiku, Sonnet, and Opus. These are the text-generation models, and they have been trained to produce a particular tool-usage format, though Opus tends to have more built-in capability than something like Haiku. Regardless of which model you choose (and you can switch at any time), you’ll be using a harness, typically Claude Code, the CLI tool most people use to interact with Claude in an agentic, tool-calling capacity.

    On the open and local side of the landscape, we don’t have anything quite as fast or capable as Claude Code, unfortunately. But we can do surprisingly well, considering we’re running small local models on consumer hardware, not massive data-center farms being enticingly given away or rented out for pennies on the dollar of what they actually cost these companies, in the hope that marketshare capture and vendor lock-in will lead to future profits.

    Here are some pretty capable tool-use models I would recommend (most should be available for download through Ollama and other sources like huggingface.co):

    • gemma4 (the latest and greatest hotness, MIT licensed using TurboQuant to deliver pretty incredible capability, performance and results even with limited VRAM)
    • qwen3.5 (from Alibaba, a consistent and traditional leader in open models so far with good capability and modest performance)
    • qwen3-coder-next (a pretty huge coding-focused model you might struggle to run unless you have a very beefy system and GPU)
    • glm4.7-flash (a modestly capable and reasonably fast option)
    • devstral-small-2 (an older, not-so-small variant from Mistral, the French open-weight AI lab; worth a look if you want a model that’s neither Chinese nor US-based, which are few and far between)
    • PetteriPano@lemmy.world · 3 hours ago (edited)

      Gemma4 doesn’t do TurboQuant, but it is leaner on the KV cache.

      edit: looks like there are forks that do TurboQuant already

  • Mike Wooskey@lemmy.thewooskeys.com · 9 hours ago

    In my experience, running Ollama locally works great. I do have a beefy GPU, but even on affordable consumer-grade GPUs you can get good results with smaller models.

    So it technically works to run an AI agent locally, but my experience has been that coding agents don’t work well. I haven’t tried using general AI agents.

    I think the amount of VRAM affordable/available to consumers is nowhere near enough to support the context length necessary for a coding agent to remain coherent. There are tools like Get Shit Done that are supposed to help with this, but I didn’t have much luck.
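The VRAM pressure comes largely from the KV cache, which grows linearly with context length. A back-of-envelope estimate, using assumed architecture numbers for a generic 8B-class model with grouped-query attention (not any specific model’s real configuration):

```python
# Back-of-envelope KV-cache size. The architecture numbers below are
# assumptions for a generic 8B-class model with grouped-query attention,
# not any specific model's real configuration.
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_value: int = 2) -> int:
    # 2x for the separate key and value tensors; fp16 = 2 bytes per value.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value

gib = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, seq_len=32768) / 2**30
# About 4 GiB for the cache alone at a 32k context, on top of the
# quantized weights themselves (often another 4-6 GiB).
```

That’s most of a consumer GPU gone before the model generates a single token, which lines up with coding agents losing coherence at the context lengths local setups can afford.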

    So I’m using OpenCode via OpenRouter to use LLMs in the cloud. Sad that I can’t get local-only to work well enough to use for coding agents, but this arrangement works for me (for now).

  • HelloRoot@lemy.lol · 10 hours ago (edited)

    If you are on Linux and want AI-assisted stuff like you mentioned, there has been this for a while: https://github.com/qwersyk/Newelle

    ( or the weeb version if you prefer: https://wiki.nyarchlinux.moe/nyarchassistant/ )

    and it can use locally run models. But have realistic expectations: if you want it to work well, you need a beefy GPU, a lot of RAM, and swap. The “intelligence” is quite limited if you run low-spec models, to the point of maybe being utterly useless.