• 1 Post
  • 91 Comments
Joined 1 year ago
cake
Cake day: March 22nd, 2024

help-circle




  • Yeah, just paying for LLM APIs is dirt cheap, and they (supposedly) don’t scrape data. Again I’d recommend Openrouter and Cerebras! And you get your pick of models to try from them.

    Even a framework 16 is not good for LLMs TBH. The Framework desktop is (as it uses a special AMD chip), but it’s very expensive. Honestly the whole hardware market is so screwed up, hence most ‘local LLM enthusiasts’ buy a used RTX 3090 and stick them in desktops or servers, as no one wants to produce something affordable apparently :/






  • I don’t understand.

    Ollama is not actually docker, right? It’s running the same llama.cpp engine, it’s just embedded inside the wrapper app, not containerized. It has a docker preset you can use, yeah.

    And basically every LLM project ships a docker container. I know for a fact llama.cpp, TabbyAPI, Aphrodite, Lemonade, vllm and sglang do. It’s basically standard. There’s all sorts of wrappers around them too.

    You are 100% right about security though, in fact there’s a huge concern with compromised Python packages. This one almost got me: https://pytorch.org/blog/compromised-nightly-dependency/

    This is actually a huge advantage for llama.cpp, as it’s free of python and external dependencies by design. This is very unlike ComfyUI which pulls in a gazillian external repos. Theoretically the main llama.cpp git could be compromised, but it’s a single, very well monitored point of failure there, and literally every “outside” architecture and feature is implemented from scratch, making it harder to sneak stuff in.


  • OK.

    Then LM Studio. With Qwen3 30B IQ4_XS, low temperature MinP sampling.

    That’s what I’m trying to say though, there is no one click solution, that’s kind of a lie. LLMs work a bajillion times better with just a little personal configuration. They are not magic boxes, they are specialized tools.

    Random example: on a Mac? Grab an MLX distillation, it’ll be way faster and better.

    Nvidia gaming PC? TabbyAPI with an exl3. Small GPU laptop? ik_llama.cpp APU? Lemonade. Raspberry Pi? That’s important to know!

    What do you ask it to do? Set timers? Look at pictures? Cooking recipes? Search the web? Look at documents? Do you need stuff faster or accurate?

    This is one reason why ollama is so suboptimal, with the other being just bad defaults (Q4_0 quants, 2048 context, no imatrix or anything outside GGUF, bad sampling last I checked, chat template errors, bugs with certain models, I can go on). A lot of people just try “ollama run” I guess, then assume local LLMs are bad when it doesn’t work right.



  • brucethemoose@lemmy.worldtoSelfhosted@lemmy.worldI've just created c/Ollama!
    link
    fedilink
    English
    arrow-up
    57
    arrow-down
    1
    ·
    edit-2
    7 days ago

    TBH you should fold this into localllama? Or open source AI?

    I have very mixed (mostly bad) feelings on ollama. In a nutshell, they’re kinda Twitter attention grabbers that give zero credit/contribution to the underlying framework (llama.cpp). And that’s just the tip of the iceberg, they’ve made lots of controversial moves, and it seems like they’re headed for commercial enshittification.

    They’re… slimy.

    They like to pretend they’re the only way to run local LLMs and blot out any other discussion, which is why I feel kinda bad about a dedicated ollama community.

    It’s also a highly suboptimal way for most people to run LLMs, especially if you’re willing to tweak.

    I would always recommend Kobold.cpp, tabbyAPI, ik_llama.cpp, Aphrodite, LM Studio, the llama.cpp server, sglang, the AMD lemonade server, any number of backends over them. Literally anything but ollama.


    …TL;DR I don’t the the idea of focusing on ollama at the expense of other backends. Running LLMs locally should be the community, not ollama specifically.


  • brucethemoose@lemmy.worldtoLinux@lemmy.mlAre my DVD/VOB files broken?
    link
    fedilink
    arrow-up
    14
    arrow-down
    2
    ·
    edit-2
    1 month ago

    You need software (like MakeMKV) to read the metadata from the DVD and properly chop up or combine the video files. It should be able to export without any re-encoding.

    On a separate note, if you want to shrink the files, I’d recommend av1an if you are comfortable with a little CLI and want the best possible encoding efficiency. In a nutshell it chunks videos and encodes them in parallel, hence its great for really long files like movies/TV on DVDs.




  • Good practice is putting anything important on an encrypted USB drive (as that stuff usually isn’t very big), and just treating the machine as “kinda insecure”

    If you set up a BIOS password, someone at least needs to unscrew your computer to get stuff. But this is generally not setup because people, well, forget their passwords…


  • A problem is volunteers and critical mass.

    Open source “hacks” need a big pool of people who want something to seed a few brilliant souls to develop it in their free time. It has to be at least proportional to the problem.

    This kinda makes sense for robot vacuums: a lot of people have them, and the cloud service is annoying, simpler, and not life critical.

    Teslas are a whole different deal. They are very expensive, and fewer people own them. Replicating even part of the cloud API calls is a completely different scope. The pool of Tesla owners willing to dedicate their time to that is just… smaller.

    Also, I think buying a Tesla, for many, was a vote of implicit trust in the company and its software. It’s harder for someone cynical of its cloud dependence to end up with an entire luxury automobile.


  • 1548 RPM should be slow for a small GPU fan, no? My Nvidia 3090 behaves exactly the same, switching the fan on and off as it hovers around 60C or so.

    Looks like it’s working fine to me.

    Also, take linux GPU monitors with a grain of salt. It’s possible the GPU fan RPM measurement is totally borked, and it basically represents “on” or “off.” Check it with your eyes and ears instead, see if the fan is screaming or not. It shouldn’t be below 76C (as modern GPUs are configured to operate above 80C or so).


  • I mean, you are right, and way more people should be using openSUSE :P

    I will say Arch-derived distros are a good experience if you want to learn how the terminal and other systems work. They’re engineered to be configurable; the documentation is great. But if you just want to use your computer without opening too many hoods, it’s fundamentally not an optimal system.

    Another thing is that many people just want their new laptop to work, and for it to game on linux. Sometimes it does not just work. If you start pulling in fixes and packages that are not supported on your distro, you can screw up any distro very quickly (and this includes the AUR, unofficial Fedora repos and such). If the community packages these, stages them, tests them against all official packages, and they work out-of-the-box… that’s one less hazard.