this post was submitted on 29 Dec 2025


How to Track and Prevent Duplicate Downloads on Linux (reddthat.com)

submitted 2 weeks ago by PumpkinDrama@reddthat.com to c/sharegpt@reddthat.com


You can solve this problem on Linux, but a “placeholder file” trick is not the best approach. A more robust one is to keep a consistent record of what you’ve already downloaded and to compare each new download against that record before accepting it.

Below are practical solutions with increasing levels of automation.

1) Track Files by Hash (Best General Solution)

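Instead of tracking filenames or paths (which change when you rename or move files), track file content using a hash such as SHA-256 (MD5 or SHA-1 also work for this purpose). You build a database of hashes for everything you’ve downloaded; before accepting a new download, compare its hash to the database. Keep in mind that a hash only matches byte-identical files: a different encode or a retagged copy of the same song produces a different hash.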

How it works

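Every time you add a downloaded file to your archive, compute its hash and append it to the index. Store only the hash, one per line, so that whole-line matching with grep -x works later:
```
sha256sum /path/to/song.mp3 | cut -d' ' -f1 >> ~/download_hashes.txt
```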
When a new file arrives, compute its hash:
```
sha256sum newfile.mp3
```

Check if the hash exists in your index:

```
grep -F "$(sha256sum newfile.mp3 | cut -d' ' -f1)" ~/download_hashes.txt
```

If there's a match, skip/ignore it.
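If you already have a library, the index has to be seeded once before the check is useful. The following is a minimal sketch of that step, not a definitive implementation: the function name `seed_hash_index` and the example paths are assumptions made for illustration. It writes one bare hash per line, which suits whole-line matching with `grep -x`.

```shell
#!/bin/bash
# Sketch: seed a hash index from an existing library.
# seed_hash_index and its two arguments are hypothetical names.
seed_hash_index() {
    local library=$1 hashfile=$2
    touch "$hashfile"
    # Hash each regular file from stdin so the output never contains the
    # filename; cut then keeps only the hex digest.
    find "$library" -type f -exec sh -c '
        for f in "$@"; do
            sha256sum < "$f"
        done
    ' sh {} + | cut -d ' ' -f 1 >> "$hashfile"
    # Drop repeated hashes so the index stays small.
    sort -u -o "$hashfile" "$hashfile"
}

# Example call (paths are assumptions):
# seed_hash_index ~/Music/Library ~/download_hashes.txt
```

Running it again later is harmless, because `sort -u` removes hashes that are already present.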

Automating this

You can write a simple Bash script to do the check before you move a file:

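```
#!/bin/bash
# Move new downloads into the library unless their content is already indexed.

HASHFILE=~/download_hashes.txt
LIBRARY=~/Music/Library

touch "$HASHFILE"   # make sure the index exists before grep reads it

for f in "$@"; do
    h=$(sha256sum "$f" | cut -d ' ' -f 1)
    # -x: match the whole line, -F: treat the hash as a fixed string
    if grep -qxF "$h" "$HASHFILE"; then
        echo "Already downloaded: $f"
    elif mv -- "$f" "$LIBRARY/"; then
        # Record the hash only after the move has succeeded
        echo "$h" >> "$HASHFILE"
        echo "Added: $f"
    fi
done
```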

Usage (assuming the script is saved as download-check, made executable, and placed on your PATH):

```
download-check file1.mp3 file2.mp3 …
```

2) Use a File Indexing Tool

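If you prefer a tool-based approach, use a desktop search indexer such as recoll or Tracker (Beagle is sometimes suggested too, but it has been unmaintained for many years):

- These tools index metadata and file contents.
- You can query the index for song titles or artist names.
- You can script queries against the index to see if a new download already exists.

Searching an index by title or artist is less reliable than comparing hashes, because tags and filenames can differ between copies of the same file, but it helps catch likely duplicates.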

3) Soulseek-Specific Strategies

Soulseek clients don’t typically track moved files automatically because they only see the download directory.

Workarounds:

A) Keep your archive and download directory separate

Configure Soulseek to download into a dedicated “incoming” folder.
Run your hash script on the files in that folder; it moves new files into your archive and records their hashes.
Soulseek won’t know where the file went, but your hash index does.

B) Use symlinks/placeholder files

In theory you could leave small “marker” files where the original download was:

song.mp3     -> real file in /Music/Library/song.mp3

But this is fragile:

You must keep the filename the same.
Soulseek may overwrite or delete the placeholder.
You’d need Soulseek settings to never replace existing files.

In practice, the hash-based index approach is safer.
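For completeness, here is roughly what the symlink variant could look like. This is only a sketch under assumptions: `move_and_link` is a hypothetical helper name, and whether a given Soulseek client treats a symlink as an existing file is not something this code can guarantee.

```shell
#!/bin/bash
# Sketch: move a download into the library and leave a symlink behind.
# move_and_link is a hypothetical helper name.
move_and_link() {
    local src=$1 library=$2
    local name abs_library dest
    name=$(basename -- "$src")
    # Resolve the library to an absolute path so the link does not depend
    # on the directory the script was started from.
    abs_library=$(cd -- "$library" && pwd) || return 1
    dest=$abs_library/$name
    # Refuse to overwrite a different file that already has this name.
    if [ -e "$dest" ]; then
        echo "Skipping, already exists: $dest" >&2
        return 1
    fi
    # Create the marker only if the move itself succeeded.
    mv -- "$src" "$dest" && ln -s -- "$dest" "$src"
}

# Example call (paths are assumptions):
# move_and_link ~/Downloads/soulseek-incoming/song.mp3 ~/Music/Library
```

The link breaks as soon as the file is renamed or moved again inside the library, which is the fragility described above.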

4) Automatic Deduplication Tools

If you want periodic cleanup rather than manual checking:

fdupes / rdfind / duff

These tools scan directories and identify duplicates by content.

Example, scanning the library recursively:

```
fdupes -r ~/Music/Library
```

To find duplicates across both the download folder and the library, pass both directories:

```
fdupes -r ~/Downloads ~/Music/Library
```
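If none of these tools is installed, a similar listing can be approximated with GNU coreutils alone. This is a swapped-in technique, not how fdupes works internally, and `list_duplicates` is a hypothetical function name. It relies on the GNU `uniq` options `-w` and `--all-repeated`.

```shell
#!/bin/bash
# Sketch: list files with identical content using find, sha256sum, sort, uniq.
# list_duplicates is a hypothetical name; pass one or more directories.
list_duplicates() {
    # Each output line of sha256sum starts with a 64-character hex digest.
    # After sorting, equal digests are adjacent, and uniq -w 64 compares
    # only the digest, printing every line that belongs to a repeated group.
    # Caveat: for filenames containing backslashes or newlines, sha256sum
    # prefixes the line with a backslash, which this sketch does not handle.
    find "$@" -type f -exec sha256sum {} + |
        sort |
        uniq -w 64 --all-repeated=separate
}

# Example call (paths are assumptions):
# list_duplicates ~/Downloads ~/Music/Library
```

Groups of duplicates are separated by blank lines; files with unique content are not printed at all.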

5) A Full Workflow Example

Soulseek downloads to /home/user/Downloads/soulseek-incoming
Run a script:
- Compute hashes for new files
- Check against ~/download_hashes.txt
- Move unique files to /home/user/Music/Library
- Remove duplicates automatically
Optionally index your library with Tracker/recoll to support search.
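The workflow above could be sketched as a single function. Treat it as an illustration under assumptions rather than a finished tool: `process_incoming` is a hypothetical name, only files directly inside the incoming folder are handled, and it assumes that folder holds completed downloads only.

```shell
#!/bin/bash
# Sketch: process everything in an incoming folder against a hash index.
# process_incoming and its three arguments are hypothetical names.
process_incoming() {
    local incoming=$1 library=$2 hashfile=$3
    local f h
    mkdir -p "$library"
    touch "$hashfile"
    for f in "$incoming"/*; do
        # Skip subdirectories and the literal pattern left by an empty folder.
        [ -f "$f" ] || continue
        h=$(sha256sum < "$f" | cut -d ' ' -f 1)
        if grep -qxF -- "$h" "$hashfile"; then
            # Same content is already indexed: remove the new copy.
            echo "Duplicate, removing: $f"
            rm -- "$f"
        elif [ -e "$library/$(basename -- "$f")" ]; then
            # Different content but the same name: leave it for manual review.
            echo "Name clash, left in place: $f" >&2
        elif mv -- "$f" "$library/"; then
            # Record the hash only after the move has succeeded.
            echo "$h" >> "$hashfile"
            echo "Added: $f"
        fi
    done
}

# Example call (paths taken from the workflow above):
# process_incoming /home/user/Downloads/soulseek-incoming \
#     /home/user/Music/Library ~/download_hashes.txt
```

Deleting duplicates is irreversible, so a cautious variant might move them to a separate folder instead of calling `rm`.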

Summary of Options

| Method | Tracks moves? | Automatable | Soulseek-aware |
|---|---|---|---|
| Placeholder files | No (fragile) | Low | Limited |
| Hash database | Yes | High | Independent |
| Indexing tool | Yes | Moderate | Independent |
| Deduplication scan | Yes (after the fact) | High | Independent |

Recommendation

Use a hash index. It reliably identifies byte-identical duplicates regardless of filename changes or directory moves, and it can be fully automated with scripts.
