• 1 Post
  • 73 Comments
Joined 1 year ago
Cake day: March 22nd, 2024

  • 1548 RPM should be slow for a small GPU fan, no? My Nvidia 3090 behaves exactly the same, switching the fan on and off as it hovers around 60C or so.

    Looks like it’s working fine to me.

    Also, take Linux GPU monitors with a grain of salt. It’s possible the GPU fan RPM measurement is totally borked and basically just represents “on” or “off.” Check it with your eyes and ears instead: see whether the fan is screaming or not. It shouldn’t be screaming below 76C or so, since modern GPUs are configured to operate at 80C and above.
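
    If you want a second opinion on what the monitor reports, here’s a minimal sketch that polls the driver directly via nvidia-smi (assuming an Nvidia card with drivers installed; the query fields are standard nvidia-smi properties):

    ```python
    import subprocess
    import time

    # Ask the driver directly for temperature and fan duty cycle,
    # bypassing whatever the GUI monitor thinks it is reading.
    while True:
        out = subprocess.check_output([
            "nvidia-smi",
            "--query-gpu=temperature.gpu,fan.speed",
            "--format=csv,noheader",
        ], text=True).strip()
        print(out)  # e.g. "61, 0 %" -- note it reports a percentage, not RPM
        time.sleep(2)
    ```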


  • I mean, you are right, and way more people should be using openSUSE :P

    I will say Arch-derived distros are a good experience if you want to learn how the terminal and other systems work. They’re engineered to be configurable, and the documentation is great. But if you just want to use your computer without popping the hood too often, they’re fundamentally not optimal.

    Another thing is that many people just want their new laptop to work, and to game on Linux. Sometimes it does not just work. If you start pulling in fixes and packages that are not supported on your distro, you can screw up any distro very quickly (and this includes the AUR, unofficial Fedora repos and such). If the community packages these, stages them, tests them against all the official packages, and they work out of the box… that’s one less hazard.


  • Actually I had this one!

    Something about their swap config makes it very fragile unless you use the RAM swap that’s enabled by default; I kept hitting this when I disabled it for reasons. It was much better once I re-enabled it, though I occasionally still have severe issues when going way, way over my RAM pool.

    I don’t mention that much because swapping to like 64GB on a 32GB system seems like an uncommon use case.
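
    If you want to see what your swap actually looks like before and after toggling it, a quick sketch (assuming the standard util-linux tools swapon and zramctl are installed):

    ```python
    import subprocess

    # List every active swap device with its size and usage,
    # then the zram devices specifically (the default RAM swap).
    print(subprocess.check_output(["swapon", "--show"], text=True))
    print(subprocess.check_output(["zramctl"], text=True))
    ```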


  • brucethemoose@lemmy.world to Linux@lemmy.ml · CachyOs vs PopOs vs others? (edited · 24 days ago)

    I see people saying CachyOS is finicky, but I’ve had almost no issues in two years of extensive use.

    And anything that pops up gets fixed extremely quickly.

    Better yet, everything you need for gaming is in the repos by default and pre-tweaked; no need to fuss with it like on other distros. This is my nitpick with Fedora or the Arch AUR: once you go outside the curated, officially supported packages, you’re asking for trouble.


  • Honestly, most LLMs suck at the full 128K. Look up benchmarks like RULER.

    In my personal tests over API, Llama 70B is bad out there. Qwen (and any fine-tune based on Qwen Instruct, with maybe an exception or two) not only sucks but is impractical past 32K once its internal RoPE scaling kicks in. Even GPT-4 is bad out there, with Gemini and some other very large models being the only usable ones I found.

    So, ask yourself… Do you really need 128K? Because 32K-64K is a boatload of code with modern tokenizers, and that is perfectly doable on a single 24GB GPU like a 3090 or 7900 XTX, which is where models actually perform well.
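
    To put a number on “a boatload of code,” you can count tokens yourself. A minimal sketch using the Hugging Face transformers tokenizer (the model name and file path are just illustrative examples):

    ```python
    from transformers import AutoTokenizer

    # Any recent tokenizer gives a reasonable ballpark; Qwen's is one example.
    tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-32B-Instruct")

    with open("my_project.py") as f:  # hypothetical source file
        text = f.read()

    # Modern code tokenizers average roughly 3-4 characters per token,
    # so 32K tokens is on the order of 100KB of source.
    print(len(tok.encode(text)), "tokens")
    ```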


    Late to this post, but shoot for an AMD Strix Halo or Nvidia Digits mini PC.

    Prompt processing is just too slow on Apple, and the Nvidia/AMD backends are so much faster with long context.

    Otherwise, your only sane option for 128K context is a server with a bunch of big GPUs.

    Also… what model are you trying to use? You can fit Qwen coder 32B with like 70K context on a single 3090, but honestly it’s not good above 32K tokens anyway.
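
    For a rough sanity check on that 70K figure, some back-of-the-envelope VRAM math (the layer/head numbers are Qwen2.5-32B’s published config; the bit-widths assume exllama-style quantization of both weights and KV cache, so treat this as an estimate):

    ```python
    # Rough VRAM budget for Qwen2.5 Coder 32B on a 24GB card.
    params = 32e9
    weight_bpw = 4.5                           # ~4-5 bits per weight after quantization
    weights_gb = params * weight_bpw / 8 / 1e9

    layers, kv_heads, head_dim = 64, 8, 128    # Qwen2.5-32B config
    ctx = 70_000
    kv_bits = 4                                # quantized KV cache
    kv_gb = 2 * layers * kv_heads * head_dim * ctx * kv_bits / 8 / 1e9

    print(f"weights ~{weights_gb:.0f} GB, KV cache ~{kv_gb:.1f} GB")
    # ~18 GB + ~4.6 GB: tight but plausible on a 24GB 3090
    ```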


    Unfortunately Nvidia is, by far, the best choice for local LLM coder hosting, and there are basically two tiers:

    • Buy a used 3090, limit the clocks to like 1400 MHz, and then host Qwen 2.5 coder 32B.

    • Buy a used 3060, host Arcee Medius 14B.

    Both of these will expose an OpenAI-compatible endpoint (see the sketch below).

    Run tabbyAPI instead of ollama, as it’s far faster and more VRAM-efficient.

    You can use AMD, but the setup is more involved: the kernel has to be compatible with the ROCm package, and you need a 7000-series card plus some extra hoops for TabbyAPI compatibility.

    Aside from that, an Arc B570 is not a terrible option for 14B coder models.
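
    Whichever tier you pick, the OpenAI-compatible endpoint means any standard client works against the local server. A minimal sketch with the openai Python package (the port, key, and model name are placeholders; check your server config):

    ```python
    from openai import OpenAI

    # Point the standard client at the local server instead of api.openai.com.
    client = OpenAI(
        base_url="http://localhost:5000/v1",  # placeholder port; check your tabbyAPI config
        api_key="dummy-key",                  # local servers typically accept any string
    )

    resp = client.chat.completions.create(
        model="Qwen2.5-Coder-32B",            # whatever model the server has loaded
        messages=[{"role": "user", "content": "Write a Python fizzbuzz."}],
    )
    print(resp.choices[0].message.content)
    ```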


  • brucethemoose@lemmy.world to Selfhosted@lemmy.world · Can't relate at all. (edited · 2 months ago)

    No, all the weights, all the “data,” essentially have to be in RAM. If you “talk to” an LLM on your GPU, it is not making any calls to the internet; it is making a pass through all the weights every time a word is generated.

    There are systems that augment the prompt with external data (RAG is one term for this), but fundamentally the system is closed.
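
    To make the RAG idea concrete, a toy sketch of prompt augmentation (naive word-overlap retrieval, purely illustrative; real systems use embedding search):

    ```python
    # Toy RAG: retrieve the most relevant snippet and prepend it to the prompt.
    # The model still only ever sees this text -- nothing external is fetched.
    docs = [
        "The backup job runs nightly at 02:00 via a systemd timer.",
        "GPU passthrough requires IOMMU enabled in the BIOS.",
        "The reverse proxy terminates TLS with a Let's Encrypt cert.",
    ]

    def retrieve(query: str) -> str:
        # Score each doc by word overlap with the query (embeddings in practice).
        q = set(query.lower().split())
        return max(docs, key=lambda d: len(q & set(d.lower().split())))

    question = "When does the backup run?"
    prompt = f"Context: {retrieve(question)}\n\nQuestion: {question}"
    print(prompt)
    ```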