Hello! Since indexing DHTs is complicated, I figured it would be more efficient to build an index of fingerprints from real data just once.
So I’ve been collecting release hashes for this index. It can be used for various purposes:
- check the integrity of your own files (bit rot is a real thing)
- identify BTv2 torrent files that contain specific files (a database of torrent files is required)
- locate live IPFS swarms to join more easily (no need to read all your data multiple times to recompute the various CIDs yourself)
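For the first use case (checking your own files for bit rot), a minimal sketch could look like the following. It assumes a hypothetical index entry format, a dict mapping each file's path relative to the release folder to its hex SHA-256 digest. The real index may store other hash types or a different layout, so treat `verify_release` and its input format as illustrative only.

```python
import hashlib
from pathlib import Path

CHUNK_SIZE = 1 << 20  # read 1 MiB at a time so large rips are never loaded fully into memory


def sha256_file(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, streamed in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_release(root: Path, expected: dict) -> dict:
    """Compare files under `root` against a (hypothetical) index entry.

    `expected` maps relative paths to hex SHA-256 digests (assumed format).
    Returns {relative_path: status} where status is "ok", "corrupt" or "missing".
    """
    report = {}
    for rel_path, digest in expected.items():
        path = root / rel_path
        if not path.is_file():
            report[rel_path] = "missing"
        elif sha256_file(path) == digest.lower():
            report[rel_path] = "ok"
        else:
            report[rel_path] = "corrupt"
    return report
```

Anything reported as "corrupt" differs from the indexed copy, which is exactly the silent damage bit rot produces.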
The collection contains around 1K releases and weighs 40 MB. I’ve prioritized scene Blu-ray rips of movies (1080p / 2160p). No infohashes will be included, as they are not reproducible enough (the same files can be packaged into torrents with different infohashes).
I use a basic script to add a new release (it requires the release to be named after its official release name). I use other scripts to discover scene releases in a filesystem, retrieve official release names from files using the srrDB API (CRC32 search), and collect torrents from Prowlarr and hit-and-run (H&R) them (but I’d rather crowd-source directly from the community!).
The index is stored in git to allow collaboration. It is hosted on Radicle to avoid centralization and reduce hosting pressure.
If you are interested, join and add your own hashes to the collection via Radicle patches! (See the instructions in the README.)
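The CRC32 lookup step starts from a checksum of the release file. A minimal sketch of computing it in a streaming way is below; the function name `crc32_file` is hypothetical, the 8-digit uppercase hex output is an assumed format (adjust the case if the lookup expects lowercase), and the actual srrDB query is deliberately not shown since its endpoint is outside this thread.

```python
import zlib
from pathlib import Path


def crc32_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the CRC32 of a file as 8 uppercase hex digits."""
    crc = 0
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)  # carry the running CRC across chunks
    return f"{crc & 0xFFFFFFFF:08X}"
```

Because `zlib.crc32` accepts the previous value as its second argument, the result is the same whatever `chunk_size` is, so multi-GB rips never need to fit in memory.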
Let me know what you think, suggest improvements or discuss similar projects you know about!
This is definitely up my alley. I gave up on keeping all my media in my torrent client indefinitely for seeding because of the performance cost, so I’ve long dreamed of a way to reconnect loose files back to torrents so I can seed them.
Seems like I could build something on top of this? I tried running magnetico for a while (going so far as to add Postgres support to help it scale), but it quickly grows far larger than I want to manage.
My next idea is to make a file scanner that maintains a list of file paths and several common hashes, then do a DHT crawl and only save what matches. Then I can hopefully add and remove torrents automatically, as needed, in a client that has read-only access to the files (remove a torrent if it has plenty of seeders, keep it for a while if it has few or no seeders, and rotate through, prioritizing whatever needs seeds).
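The rotation rule described above could be sketched like this. Everything here is hypothetical: the `Candidate` record, the `PLENTY_OF_SEEDERS` threshold and the `MAX_ACTIVE` cap are illustrative, and the "keep it for a while" part (a minimum seeding time) is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A torrent whose files were matched locally (hypothetical record)."""
    name: str
    seeders: int
    active: bool  # currently loaded in the client


PLENTY_OF_SEEDERS = 10  # hypothetical: above this, the swarm no longer needs us
MAX_ACTIVE = 2          # hypothetical cap on torrents seeded at once


def plan_rotation(candidates: list) -> tuple:
    """Return (to_add, to_remove) torrent names for the next rotation round."""
    # Well-seeded torrents that are currently active get removed.
    to_remove = [c.name for c in candidates
                 if c.active and c.seeders > PLENTY_OF_SEEDERS]
    # The rest, neediest (fewest seeders) first; sorted() keeps ties stable.
    needy = sorted((c for c in candidates if c.seeders <= PLENTY_OF_SEEDERS),
                   key=lambda c: c.seeders)
    keep = needy[:MAX_ACTIVE]
    keep_names = {c.name for c in keep}
    # Active torrents pushed out of the top slots are rotated out.
    to_remove += [c.name for c in needy if c.active and c.name not in keep_names]
    to_add = [c.name for c in keep if not c.active]
    return to_add, to_remove
```

Running it periodically with fresh seeder counts gives the rotation: torrents drop out as their swarms recover and the neediest ones take their slots.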
I’m wondering if there’s some useful overlap between what you’re doing and my goals, but I think I need to dig into it more.
Hi, yes, it definitely sounds similar on the media-file database side. Using a DHT crawler, you can identify new torrents matching specific file tree roots (this only works for BitTorrent v2, which is not widely used yet) and update swarm statistics (seeders/leechers).
Sweet, you’re using Radicle!
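Matching on file tree roots works because a v2 torrent stores a per-file "pieces root" in its file tree, which depends only on the file's content. Below is a minimal sketch of computing that root locally following BEP 52 (SHA-256 over 16 KiB blocks, leaf layer padded with zero hashes up to a power of two). The function name is hypothetical, and fetching torrent metadata from the DHT to compare against is not shown.

```python
import hashlib
from pathlib import Path
from typing import Optional

BLOCK_SIZE = 16 * 1024  # BEP 52 leaf size: 16 KiB
ZERO_HASH = bytes(32)   # padding leaf used to complete the tree


def pieces_root(path: Path) -> Optional[bytes]:
    """Compute the BitTorrent v2 (BEP 52) merkle root of a single file."""
    leaves = []
    with path.open("rb") as f:
        for block in iter(lambda: f.read(BLOCK_SIZE), b""):
            leaves.append(hashlib.sha256(block).digest())  # last block may be short
    if not leaves:
        return None  # empty files have no pieces root
    # Pad the leaf layer with zero hashes up to the next power of two.
    width = 1
    while width < len(leaves):
        width *= 2
    layer = leaves + [ZERO_HASH] * (width - len(leaves))
    # Hash sibling pairs layer by layer until a single root remains.
    while len(layer) > 1:
        layer = [hashlib.sha256(layer[i] + layer[i + 1]).digest()
                 for i in range(0, len(layer), 2)]
    return layer[0]
```

Since the root ignores file names, piece size choices and trackers, two differently packaged v2 torrents of the same file still share it, unlike the infohash.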



