jajalayer

joined 1 month ago
[–] jajalayer@lemmy.dbzer0.com 1 points 2 weeks ago

For now it's only scene releases, so there are no duplicates (not sure if this is what you mean).

[–] jajalayer@lemmy.dbzer0.com 1 points 3 weeks ago* (last edited 3 weeks ago) (2 children)

Hi, not sure I get your point. A release has a fixed name and resolution. As for genre classification, I judged it too complex to determine reliably, and I wanted to avoid storing metadata.

Date folders are there to keep directory sizes down, similar to a Merkle tree with only one level. Most filesystems, and Git too, perform poorly with very large directories. This layout also still allows easy manual search (only the date is required).
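To illustrate the date-folder idea, here is a minimal sketch. The `YYYY-MM-DD` folder naming, the `index` root and the `release_path` helper are assumptions made for the example, not necessarily the layout the repository uses.

```python
from datetime import date
from pathlib import PurePosixPath


def release_path(root: str, release_name: str, release_date: date) -> PurePosixPath:
    """Place a release under a per-date folder so no single directory grows too large.

    The ISO date folder name is a hypothetical choice for illustration.
    """
    return PurePosixPath(root) / release_date.isoformat() / release_name


# Hypothetical release name and date, for illustration only.
print(release_path("index", "Some.Movie.2020.1080p.BluRay.x264-GROUP", date(2020, 5, 17)))
```

With this kind of layout, a manual lookup only needs the date to find the right folder.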

Note: there can still be several releases for a single movie (several resolutions and sources, LIMITED/REPACK etc).

[–] jajalayer@lemmy.dbzer0.com 1 points 3 weeks ago

I don't think it would scale for many millions of files.

 

cross-posted from: https://lemmy.dbzer0.com/post/67080379

Hello, since it's complicated to index DHTs, I figured it'd be more efficient to build an index of fingerprints from real data once.

So I've been collecting release hashes for this index. It can be used for various purposes:

  • check the integrity of your own files (bit rot is a real thing)
  • identify BTv2 torrent files that contain specific files (a database of torrent files is required)
  • locate live IPFS swarms to join more easily (no need to read all your data multiple times to recompute various CIDs yourself)
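The integrity check in the first bullet could look roughly like the sketch below. It assumes the index gives you a hex SHA-256 digest for the file. The `sha256_file` and `verify` helpers are hypothetical names, and the actual hash types stored in the index may differ.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large media files never need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_hex: str) -> bool:
    """Return True if the file still matches the digest recorded in the index."""
    return sha256_file(path) == expected_hex.lower()
```

A mismatch means the local copy differs from the indexed release, whether from bit rot or from a modified file.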

The collection contains around 1K releases and weighs 40 MB. I've prioritized scene Blu-ray rips of movies (1080p / 2160p). No infohashes will be included, as these are not reproducible enough.

I'm using a basic script to add a new release (the filename must match the official release name). I use other scripts to discover scene releases in a filesystem, retrieve official release names from files using the srrDB API (CRC32 search), and collect torrents from Prowlarr and H&R them (although I'd prefer to crowd-source directly from the community!).
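For the CRC32 search step, the value to look up can be computed locally first. This is only a sketch: `crc32_file` is a hypothetical helper, the 8-digit uppercase hex format is an assumption about what the lookup expects, and the srrDB request itself is left out.

```python
import zlib
from pathlib import Path


def crc32_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute a file's CRC32 incrementally and return it as 8 uppercase hex digits."""
    crc = 0
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            # zlib.crc32 accepts the running value, so chunks can be chained.
            crc = zlib.crc32(chunk, crc)
    return f"{crc & 0xFFFFFFFF:08X}"
```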

The index is stored in Git to allow collaboration. It is hosted on Radicle to avoid centralization and reduce hosting pressure.

If you are interested, join and add your own hashes to the collection via Radicle patches! (see instructions in the README)

Let me know what you think, suggest improvements or discuss similar projects you know about!

 


[–] jajalayer@lemmy.dbzer0.com 1 points 1 month ago

Hi, yes it definitely sounds similar on the media files database side. Using a DHT crawler, you can identify new torrents matching specific file tree roots (so this only works for BitTorrent v2, which isn't widely used yet), and update swarm statistics (seeders/leechers).
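For context on those per-file roots, here is a rough sketch of how a BitTorrent v2 "pieces root" is derived, based on my reading of BEP 52: SHA-256 over 16 KiB blocks, combined into a binary Merkle tree whose missing leaves are zero-filled. The `pieces_root` helper is a hypothetical name, and it takes the whole file in memory for brevity, so a real tool would stream the blocks instead.

```python
import hashlib

BLOCK_SIZE = 16 * 1024   # BTv2 leaf block size
ZERO_HASH = bytes(32)    # padding leaf used beyond the end of the file


def pieces_root(data: bytes) -> bytes:
    """Merkle root over the 16 KiB blocks of one file (sketch, in-memory only)."""
    if not data:
        raise ValueError("empty files have no pieces root")
    # Leaf layer: one SHA-256 digest per block; the last block may be short.
    layer = [
        hashlib.sha256(data[i:i + BLOCK_SIZE]).digest()
        for i in range(0, len(data), BLOCK_SIZE)
    ]
    # Pad the leaf layer with zero hashes up to the next power of two.
    width = 1
    while width < len(layer):
        width *= 2
    layer += [ZERO_HASH] * (width - len(layer))
    # Hash pairs upward until a single root remains.
    while len(layer) > 1:
        layer = [
            hashlib.sha256(layer[i] + layer[i + 1]).digest()
            for i in range(0, len(layer), 2)
        ]
    return layer[0]
```

Because the root depends only on the file's content, the same file gets the same root in any v2 torrent, which is what makes matching against a fingerprint index possible.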

 
