If possible, convert those files to compressed Parquet, and apply sorting and partitioning to them.
I've gotten 10-100 GB CSV files down to 300 MB-5 GB just by doing that.
That makes searching and scanning so much faster, and you can do it all with free, open-source software like polars and ibis.
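Roughly something like this with polars (just a sketch; the file and column names "big_export.csv", "region", and "event_time" are placeholders for whatever your data has):

```python
import polars as pl
from pathlib import Path

# Lazy scan, so the CSV never has to fit in RAM.
lf = pl.scan_csv("big_export.csv")

out_dir = Path("events_parquet")
out_dir.mkdir(exist_ok=True)

# Write one sorted, zstd-compressed parquet file per partition key.
keys = lf.select(pl.col("region").unique()).collect()["region"].to_list()
for key in keys:
    (
        lf.filter(pl.col("region") == key)
          .sort("event_time")               # sorted row groups prune better on later scans
          .collect(streaming=True)          # streaming keeps memory bounded
          .write_parquet(out_dir / f"region={key}.parquet", compression="zstd")
    )
```

After that, `pl.scan_parquet("events_parquet/*.parquet")` with a filter on the partition or sort column only has to touch the files and row groups it actually needs.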
That is cool! I've wanted to use a model like this but haven't really looked into it.
Are you self hosting the long context llm, or what are you using?
Context lengths are what kill a lot of my local llm experiments.