PumpkinDrama

joined 2 years ago

According to Education Week’s tracker, there were 17 school shootings in 2025 that resulted in injuries or deaths — the lowest number since 2020.

Different databases that count any gunfire on school property report much higher totals (100–230+ incidents) depending on definitions used, showing how statistics can vary widely based on criteria.

School shooting counts are therefore highly dependent on how a shooting is defined, but serious incidents with injuries or deaths were significantly fewer in 2025 compared with recent years.

[–] PumpkinDrama@reddthat.com 3 points 7 hours ago

Yeah, I think you are right, the whole internet is going to end up behind a login if bot activity keeps increasing.

[–] PumpkinDrama@reddthat.com 3 points 8 hours ago* (last edited 8 hours ago) (1 children)

Most of the forums I know are completely dead, killed by Reddit or Discord. But I know of some programming forums that are still active; they're dedicated to specific programming languages like Python or Rust. That's what I mean by thematic. You can discuss a wide range of topics, but the primary focus is on a specific subject. These are the ones I've used:

 

Now there's NodeBB. Although it doesn't yet have an Android app, I'm excited about the potential return of old-school forum software that it represents.

[–] PumpkinDrama@reddthat.com 1 points 1 day ago* (last edited 1 day ago) (10 children)

You are perfectly demonstrating how authoritarian habits survive by being trivialized at the stage where they’re still “just a timeout,” and if you genuinely believe people wouldn’t kill each other over disagreement the moment consequences disappear, then you must have an extraordinarily optimistic view of human nature.

 

I've read that they enforce child modes to promote mental health and reduce stress. I want to see how the content differs from Western sites.

 

I’m trying to find a way to archive all my saved Lemmy posts from multiple instances in one place, and be able to search through them by tags, title, body text, date, etc.

Does anyone know of a free online service or a local Linux tool that can?

  • fetch and store saved posts from all instances I’m active on,
  • let me query/search effectively (tags, title, content, date, etc.).

Thanks in advance!

 

I just got banned from SpaceBattles for an introduction post. Not for doxxing, not for threats, not for spam—just for expressing opinions and interests that didn’t fit the approved worldview. Apparently that’s enough now.

What’s wild is that this isn’t an isolated case. I looked back at an older thread out of curiosity, and roughly 80% of the people who were active when it started are now banned. Read that again. That’s not “keeping the peace.” That’s systematic purging. When most of the original voices disappear, you’re not moderating a community—you’re curating an echo chamber, enforcing ideological cleansing. Strip away the forum UI and polite language, and what you’re left with is authoritarian repression: a system where dissenting voices are removed until only the sanctioned narrative remains.

In the real world, this kind of logic doesn’t just stop at bans and deletions—it ends with people being permanently silenced. History is full of examples. Look at Jeju Island, where American-backed South Korean forces massacred huge portions of the population for protesting, all under the excuse of “restoring order.” That’s where this mindset leads when it’s given real power.

SpaceBattles used to market itself as a place for creative freedom. Now it feels like a NATO-flavored circlejerk where deviation from consensus is treated as a personal offense. There’s an orthodoxy, and if you don’t kneel to it, you’re out. No warning, no real engagement—just deletion and exile.

The irony is hard to miss. People can openly believe in a supernatural entity watching everyone’s thoughts and actions, and that’s treated as normal, untouchable, beyond critique. But have an opinion that challenges mainstream political narratives, media framing, or power structures? Suddenly it’s an inquisition. Labels come out, motives are assigned, and the ban hammer drops.

We’re constantly lectured about “authoritarianism” abroad, yet this kind of censorship is defended as virtuous because it’s done by the “right” people. At least some countries are honest about controlling speech. Here, it’s wrapped in the language of safety and community standards while dissenting voices quietly vanish. Call me cynical, but when a forum erases most of its long-term users for wrongthink, that’s not a healthy community—it’s ideological hygiene.

What really gets me is the smugness. The absolute certainty that silencing is the same as being right. That deleting posts is the same as winning arguments. These are the modern book burners: not torching paper, but scrubbing perspectives, rewriting history, and pretending the absence of disagreement proves consensus.

If your ideas are so fragile they can’t survive contact with uncomfortable opinions, maybe they’re not as solid as you think. And if a forum can’t tolerate a blunt introduction post, it’s probably not about “rules”—it’s about control.

 

Here are practical ways you can discover similar GitHub repositories based on the repositories you’ve starred — both using built-in GitHub features and external tools/services:


🚀 1. Use GitHub Explore & Personal Recommendations

🔹 GitHub Explore

GitHub’s Explore page gives you trending and suggested repositories based on stars and popular content. It’s not personalized to only your starred list, but GitHub uses overall star signals to highlight related projects.

🔹 Your GitHub Stars page

Your “⭐ Your stars” view shows repositories you’ve starred, and GitHub sometimes shows related content (e.g., trending or popular repos in similar topics) in your feed based on what you’ve starred.


🧠 2. Tools That Generate Similar Repo Recommendations

🧩 SimRepo — Similar repositories in GitHub sidebar

SimRepo is a browser extension that automatically displays similar repositories in the GitHub sidebar whenever you view a repository page. It uses machine learning and star-based embeddings to find related projects.

Works as an extension for Chrome/Chromium-based browsers (and planned on other browsers)

Shows similar repos for popular projects (e.g., >150 stars)

Provides suggestions automatically in the UI

Good for: discovering related projects while browsing repos.


🔍 GitRec — Recommendations based on your stars

GitRec is another Chrome extension / tool that analyzes repositories you’ve starred and then suggests similar ones. It uses a recommendation engine (often Gorse) to calculate relevance from your activity.

Adds recommendations directly into GitHub browsing

Suggests repos you may like based on starred history

Compatible with GitHub.com

Good for: seamless discovery without leaving GitHub.


🛠 3. CLI & Search Tools

📦 github-stars-search

A command-line tool that lets you semantically search within your starred repositories and can help you cluster or search for similar ones based on README content and embeddings.

You fetch all starred repos

The tool generates vector representations of README/text and lets you search semantically

Useful if you want offline or scriptable similarity querying

🐍 starred-repo-finder

Another CLI utility that finds other repositories by analyzing stargazers of a given repo — essentially looking at users who starred a repo and what else they’ve starred.

Helps discover repos favored by similar users

Filter results by popularity or forks

Outputs results in JSON/CSV


📌 4. DIY Data & Scripts

📊 Build your own similarity model

If you’re into data science or custom workflows, you can:

Fetch your starred repo list via the GitHub API

Extract metadata (stars/topics/languages/README text)

Compute similarity using text embeddings, graph analysis, or collaborative filtering

Tools like StarDose (a GitHub recommender system) demonstrate this approach.

Good for: fully customized recommendations or research.
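As a minimal sketch of the text-similarity route: the snippet below uses plain TF-IDF over repo descriptions instead of a real embedding model, and hard-coded toy metadata stands in for an actual GitHub API fetch (the repo names and descriptions here are illustrative, not real API output).

```python
import math
from collections import Counter

# Toy stand-ins for metadata you would fetch via the GitHub API.
starred = {
    "fastapi": "modern fast web framework for building apis with python",
    "flask": "lightweight wsgi web application framework in python",
    "pytorch": "tensors and dynamic neural networks with gpu acceleration",
}
candidates = {
    "django": "high level python web framework for rapid development",
    "jax": "composable transformations of python numpy programs for machine learning",
}

def tf_idf_vectors(docs):
    """Build simple TF-IDF vectors (term -> weight) for each document."""
    n = len(docs)
    tokenized = {name: text.split() for name, text in docs.items()}
    df = Counter()
    for toks in tokenized.values():
        df.update(set(toks))
    vecs = {}
    for name, toks in tokenized.items():
        tf = Counter(toks)
        vecs[name] = {t: tf[t] * math.log((1 + n) / (1 + df[t])) for t in tf}
    return vecs

def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

vecs = tf_idf_vectors({**starred, **candidates})
for cand in candidates:
    score = max(cosine(vecs[cand], vecs[s]) for s in starred)
    print(cand, round(score, 3))
```

In a real pipeline you would replace the dicts with data pulled from the API and likely swap TF-IDF for proper text embeddings, but the ranking logic stays the same.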


🧠 5. Practical GitHub Search Tricks

While GitHub search doesn’t directly provide similarity scoring, you can use advanced filters to find related projects manually, for example:

topic:machine-learning stars:>50 language:python

This helps you target repositories in the same domain as your starred set.


🧩 Summary: Which to Use?

| Goal | Best Option |
|---|---|
| Automatic similar repo suggestions while browsing | SimRepo, GitRec |
| Terminal-based search & custom queries | github-stars-search, starred-repo-finder |
| Manual discovery within GitHub | Explore, advanced search |
| Custom data-science recommendations | Build your own embedding/graph model |

 

Below is a Python recommendation engine example that you can improve through usage and that supports sharing anonymized training data via onion‐routed communication (e.g., using Tor onion services). Onion routing (used by Tor) ensures that the data you share with the server is routed through multiple encrypted layers so no single node knows both where it came from and where it’s going, improving privacy.


📌 System Architecture Overview

Clients

Each user runs a local Python instance.

They compute recommendation model updates (e.g., collaborative filtering gradients).

Updates are sent through a Tor onion service to the central server.

Tor Onion Service

A server instance runs a hidden Tor service that only accepts connections routed through the Tor network.

This protects both client identities and server location.

Server

Aggregates anonymized model updates.

Sends back an updated global model.


🧠 Step 1 — Setup Tor Onion Service

First, set up Tor on the server and configure your hidden service in torrc:

Example torrc

HiddenServiceDir /var/lib/tor/recommendation_service/
HiddenServicePort 5000 127.0.0.1:5000

After restarting Tor, this will generate an .onion address that clients will use to send requests through the Tor network.
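If Tor runs under systemd, restarting it and reading the generated address looks like this (paths match the torrc above; Tor writes the hostname file on first start of the service):

```shell
# Restart Tor so it picks up the new hidden service
sudo systemctl restart tor

# Tor writes the generated .onion address into the service directory
sudo cat /var/lib/tor/recommendation_service/hostname
```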


🔧 Step 2 — Basic Recommendation Engine (Python)

This example uses matrix factorization for collaborative filtering; replace with your preferred algorithm.

  1. Install libraries:

pip install numpy pandas scikit-learn flask requests[socks]

  2. Shared codebase:

📌 server.py

from flask import Flask, request, jsonify
import numpy as np
import threading

app = Flask(__name__)

# Global model (random initialization)
n_users, n_items, n_factors = 1000, 1500, 50
user_factors = np.random.rand(n_users, n_factors)
item_factors = np.random.rand(n_items, n_factors)

lock = threading.Lock()

@app.route("/update_model", methods=["POST"])
def update_model():
    global user_factors, item_factors
    data = request.get_json()
    uf = np.array(data["user_factors"])
    ifac = np.array(data["item_factors"])

    with lock:
        # Simple addition aggregation (for demo)
        user_factors[: uf.shape[0]] += uf
        item_factors[: ifac.shape[0]] += ifac

    return jsonify({"status": "ok"})

@app.route("/get_model", methods=["GET"])
def get_model():
    return jsonify({
        "user_factors": user_factors.tolist(),
        "item_factors": item_factors.tolist(),
    })

if __name__ == "__main__":
    app.run(port=5000)


📡 Step 3 — Client Code Sending Updates via Tor

client.py

This uses requests with a SOCKS proxy pointed at Tor’s local SOCKS port (typically 9050).

import numpy as np
import requests

# Train or load your local data
local_user_factors = np.random.rand(10, 50)
local_item_factors = np.random.rand(20, 50)

onion_url = "http://.onion"  # fill in your service's .onion address

session = requests.Session()
session.proxies = {
    "http": "socks5h://localhost:9050",
    "https": "socks5h://localhost:9050",
}

# Send updates through Tor
resp = session.post(
    f"{onion_url}/update_model",
    json={
        "user_factors": local_user_factors.tolist(),
        "item_factors": local_item_factors.tolist(),
    },
    timeout=30,
)
print("Server response:", resp.json())

# Optionally fetch updated model
updated = session.get(f"{onion_url}/get_model").json()
print("Updated global model fetched.")


🛡️ Key Concepts & Extensions

🔹 Onion Routing for Privacy

Onion routing wraps messages in layers of encryption through multiple relays so no single point knows both endpoints, preserving privacy. This model ensures data sharing between clients and server stays anonymous.

📈 Improving With Federated Techniques

To avoid sending raw model data and further improve privacy, integrate federated learning:

Each client trains locally.

Only model updates (not raw data) are sent to the server.

A server aggregates updates to improve a global model. Research in this area includes FedGNN, LightFR, and other privacy-preserving federated recommenders.
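The aggregation step can be sketched as plain federated averaging (a simplification, not the FedGNN/LightFR methods named above): each client's update is averaged, optionally weighted by local dataset size.

```python
import numpy as np

def fed_avg(updates, weights=None):
    """Average client model updates (FedAvg-style).

    updates: list of same-shape np.ndarray, one per client.
    weights: optional per-client weights (e.g. local dataset sizes).
    """
    stacked = np.stack(updates)
    if weights is None:
        return stacked.mean(axis=0)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize so weights sum to 1
    return np.tensordot(w, stacked, axes=1)

# Three clients send factor-matrix updates of the same shape
clients = [np.full((2, 3), v) for v in (1.0, 2.0, 3.0)]
global_update = fed_avg(clients)
print(global_update)  # every entry is 2.0
```

In the architecture above, the server would apply `fed_avg` to the batches received at `/update_model` instead of raw addition.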

🧠 Privacy Enhancements

Use differential privacy on gradients before sending them.

Use secure multi-party computation (SMPC) or encryption layers.
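For the differential-privacy point, a minimal sketch of what a client would do before an update leaves the machine: clip the update to a bounded L2 norm, then add Gaussian noise (the noise scale here is illustrative, not a calibrated privacy budget).

```python
import numpy as np

def privatize(update, clip_norm=1.0, noise_std=0.1, rng=None):
    """Clip an update to bounded L2 norm, then add Gaussian noise."""
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(update)
    if norm > clip_norm:
        update = update * (clip_norm / norm)  # scale down to the clip norm
    return update + rng.normal(0.0, noise_std, size=update.shape)

raw = np.array([3.0, 4.0])  # L2 norm 5.0, will be clipped
noisy = privatize(raw, clip_norm=1.0, noise_std=0.01,
                  rng=np.random.default_rng(0))
print(noisy)  # close to [0.6, 0.8] (the clipped update) plus small noise
```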


🧪 Testing & Deployment

  1. Local tests — Run server locally and point client via Tor SOCKS.
  2. Dockerize components for reproducible deployment.
  3. Security audits — Ensure the onion service and proxy configuration don’t leak identifying info.
[–] PumpkinDrama@reddthat.com 18 points 3 days ago (2 children)

Mind elaborating?

[–] PumpkinDrama@reddthat.com 3 points 3 days ago* (last edited 3 days ago) (1 children)

As it becomes more popular, bots will pivot here, and they will be harder to block than on centralized platforms. Even after being banned from one instance, they can move to another and continue. The Reddit-style moderation system puts too much strain on a handful of users, making it necessary to rely on automods. The automod on lemmy.world banned me instance-wide for reasons unknown, and as the number of bots increases, the automods will become looser. There will be many false positives and genuine users being banned.

Related

 

I’ve noticed that on pkgstats.archlinux.de most packages’ popularity graphs seem to show a big drop recently — even popular ones. Is the site broken, has data collection changed, or is something else going on? Anyone know why this is happening?

[–] PumpkinDrama@reddthat.com -1 points 3 days ago* (last edited 3 days ago) (11 children)

Why would you post a YouTube video here with only 130 views? There are probably AI-generated music videos with millions of views, so whatever this says is likely at odds with reality.

[–] PumpkinDrama@reddthat.com 4 points 3 days ago* (last edited 3 days ago) (1 children)

I believe that because Reddit is generally left-leaning and the majority of those users are opposed to AI, we may see a disproportionate rise in AI-generated right-wing content, which could influence public opinion. And the Pentagon also showed interest in using LLMs to gaslight people.

 


[–] PumpkinDrama@reddthat.com 5 points 4 days ago

I have tried many of them out of curiosity, but this is the only one I use regularly. I still prefer Reddit, though. I think Lemmy fragments communities and has too many memes and too much U.S.-centric news. I expected a federated platform to offer more diversity of thought, but it feels like the same kind of groupthink you see on Reddit.

[–] PumpkinDrama@reddthat.com 1 points 4 days ago

Yeah, it has both a download-to and a download option.

[–] PumpkinDrama@reddthat.com 2 points 4 days ago (2 children)

Nah I like different folders for the different media types

[–] PumpkinDrama@reddthat.com 3 points 6 days ago (1 children)

The music was just an example, what about movies, books, etc.

 

You can solve this problem on Linux, but the best approach isn’t a “placeholder file” trick — it’s using a consistent way of tracking what you’ve already downloaded, and then comparing new downloads against that record before accepting them.

Below are practical solutions with increasing levels of automation.


1) Track Files by Hash (Best General Solution)

Instead of tracking filenames (which change when you move files), track file content using a cryptographic hash (MD5/SHA1/SHA256). You build a database of hashes for everything you’ve downloaded; before accepting a new download, compare its hash to the database.

How it works

  1. Every time you add a downloaded file to your archive, compute a hash:

    sha256sum /path/to/song.mp3 >> ~/download_hashes.txt
    
  2. When a new file arrives, compute its hash:

    sha256sum newfile.mp3
    
  3. Check if the hash exists in your index:

    grep -F "$(sha256sum newfile.mp3 | cut -d' ' -f1)" ~/download_hashes.txt
    
  4. If there's a match, skip/ignore it.

Automating this

You can write a simple Bash script to do the check before you move a file:

#!/bin/bash

HASHFILE=~/download_hashes.txt
touch "$HASHFILE"

for f in "$@"; do
    h=$(sha256sum "$f" | cut -d ' ' -f 1)
    # Match lines starting with the hash, so entries appended via
    # "sha256sum file >> hashfile" (hash followed by filename) also match
    if grep -q "^$h" "$HASHFILE"; then
        echo "Already downloaded: $f"
    else
        echo "$h" >> "$HASHFILE"
        mv "$f" ~/Music/Library/
        echo "Added: $f"
    fi
done

Usage:

download-check file1.mp3 file2.mp3 …

2) Use a File Indexing Tool

If you prefer a tool-based approach, use a filesystem indexer like recoll, Tracker, or Beagle:

  • These tools index metadata and file contents.
  • You can query the index for song titles/artist names.
  • You can script queries against the index to see if a new download already exists.

Grepping an indexed database isn't as reliable as content hashes, but it can help.


3) Soulseek-Specific Strategies

Soulseek clients don’t typically track moved files automatically because they only see the download directory.

Workarounds:

A) Keep your archive and download directory separate

  • Configure Soulseek to download into a dedicated “incoming” folder.
  • After moving to your archive, immediately run your hash script.
  • Soulseek won’t know where the file went, but your system does.

B) Use symlinks/placeholder files

In theory you could leave small “marker” files where the original download was:

song.mp3     -> real file in /Music/Library/song.mp3

But this is fragile:

  • You must keep the filename the same.
  • Soulseek may overwrite or delete the placeholder.
  • You’d need Soulseek settings to never replace existing files.

In practice, the hash-based index approach is safer.


4) Automatic Deduplication Tools

If you want periodic cleanup rather than manual checking:

fdupes / rdfind / duff

These tools scan directories and identify duplicates by content.

Example:

fdupes -r /Music/Library

To find duplicates across both the download folder and your library:

fdupes -r ~/Downloads ~/Music/Library


5) A Full Workflow Example

  1. Soulseek downloads to /home/user/Downloads/soulseek-incoming

  2. Run a script:

    • Compute hashes for new files
    • Check against ~/download_hashes.txt
    • Move unique files to /home/user/Music/Library
    • Remove duplicates automatically
  3. Optionally index your library with Tracker/recoll to support search.
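The workflow above can also be sketched in Python (the directory paths are the ones from step 1 and are assumptions; adjust to your setup):

```python
import hashlib
import shutil
from pathlib import Path

# Assumed locations from step 1; adjust to taste.
INCOMING = Path.home() / "Downloads" / "soulseek-incoming"
LIBRARY = Path.home() / "Music" / "Library"
HASHFILE = Path.home() / "download_hashes.txt"

def sha256(path, bufsize=1 << 20):
    """Stream a file and return its SHA-256 hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(bufsize):
            h.update(chunk)
    return h.hexdigest()

def ingest(incoming=INCOMING, library=LIBRARY, hashfile=HASHFILE):
    """Move files with unseen content into the library; skip duplicates."""
    hashfile.touch()
    seen = set(hashfile.read_text().split())
    library.mkdir(parents=True, exist_ok=True)
    for f in sorted(incoming.iterdir()):
        if not f.is_file():
            continue
        digest = sha256(f)
        if digest in seen:
            print(f"duplicate, skipping: {f.name}")
        else:
            seen.add(digest)
            with open(hashfile, "a") as out:
                out.write(digest + "\n")
            shutil.move(str(f), str(library / f.name))
            print(f"added: {f.name}")
```

Call `ingest()` after each download batch, or run it from a cron job or a filesystem watcher on the incoming folder.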


Summary of Options

| Method | Tracks Moves? | Automatable | Soulseek-aware |
|---|---|---|---|
| Placeholder files | No (fragile) | Low | Limited |
| Hash database | Yes | High | Independent |
| Indexing tool | Yes | Moderate | Independent |
| Deduplication scan | Yes (after the fact) | High | Independent |

Recommendation

Use a hash index. It reliably identifies duplicates regardless of filename changes or directory moves, and it can be fully automated with scripts.

 

I’m using Linux and I’m trying to avoid accidentally downloading the same files multiple times.

For example, I use Soulseek to download music. After a song finishes downloading, I usually move it to another folder (my main music library). Later on, when I’m searching again, I don’t always remember whether I already have a particular song, and I end up re-downloading it.

Is there a good way on Linux to keep track of what I’ve already downloaded, even after files are moved to different folders, so I can avoid downloading duplicates? Ideally, I’d like something that doesn’t require manually searching my entire music library every time.

One idea I had was leaving a placeholder file behind in the original download directory and configuring Soulseek not to overwrite it, but I’m not sure if Soulseek even has that option.

What tools or workflows would you recommend for this?

[–] PumpkinDrama@reddthat.com 1 points 1 week ago* (last edited 6 days ago)

Yes and that is a pretty bad attitude, tbh. People who do that should have a low attitude score, that’s how we know it’s working. ;-)

I thought it measured votes received rather than votes given. A tooltip explaining what it means would be nice, and CSS to hide my own attitude as well.

 

I prefer platforms that hide downvotes on my content because I don’t want to see negative vote counts on my own posts for peace of mind, yet I still want to see vote totals on other people’s content. Piefed (and Lemmy) should let users hide downvotes for their own posts while keeping voting visible elsewhere.

Also, both Lemmy and Piefed lack a good outlier filter. The Top filter floods me with memes and entertaining posts, while Lemmy’s scaled feed often shows too little of actual interest. An outlier filter that detects per-community outliers would surface more relevant content without drowning feeds in content from the popular communities.

A practical implementation would maintain a rolling baseline for each community by tracking the moving average and standard deviation of post scores at a fixed time offset (for example, one hour after submission). When a new post reaches that same offset, compute its z-score by subtracting the community’s average and dividing by the standard deviation. The filter algorithm then uses this z-score as the cross-community ranking signal: instead of sorting posts by raw score, it sorts by z-score, so posts that outperform their community’s usual activity (high positive z) rise to the top regardless of absolute score. This lets the algorithm compare posts from different communities on a common scale, surfacing outliers that are unusually engaging for their community and reducing the dominance of high-traffic communities.
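A sketch of that z-score ranking, simplified to raw running sums per community rather than a true rolling window, and with made-up sample scores:

```python
import math
from collections import defaultdict

class OutlierRanker:
    """Rank posts by z-score relative to their community's baseline.

    Baselines are per-community running mean/std of post scores measured
    at a fixed offset after submission (e.g. one hour).
    """

    def __init__(self):
        # community -> [count, sum of scores, sum of squared scores]
        self.stats = defaultdict(lambda: [0, 0.0, 0.0])

    def observe(self, community, score):
        s = self.stats[community]
        s[0] += 1
        s[1] += score
        s[2] += score * score

    def z(self, community, score):
        n, total, sq = self.stats[community]
        if n < 2:
            return 0.0  # not enough history for a baseline
        mean = total / n
        var = max(sq / n - mean * mean, 1e-12)
        return (score - mean) / math.sqrt(var)

    def rank(self, posts):
        # posts: list of (community, score); best outliers first
        return sorted(posts, key=lambda p: self.z(*p), reverse=True)

r = OutlierRanker()
for score in (100, 120, 80, 110):   # big meme community baseline
    r.observe("memes", score)
for score in (5, 7, 6, 4):          # small niche community baseline
    r.observe("selfhosted", score)

# A score of 20 is a huge outlier for the niche community, so it
# outranks an average-looking 115 from the big one.
print(r.rank([("memes", 115), ("selfhosted", 20)]))
```

The community names and scores are invented for illustration; a real feed would feed observed post scores in at the fixed time offset and re-rank candidates with `rank`.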
