AlteredStateBlob

joined 2 years ago

It is not enough, no. The LLM might reveal training data by reproducing the original text, and that text is a simple Google search with site:reddit.com away from identifying the user. Re-identification is trivial, so the data is not anonymized.

[–] AlteredStateBlob@kbin.social 1 points 1 year ago (4 children)

It doesn't matter. As long as the text is supplied as is, a simple Google search for the text with site:reddit.com will reveal the author, so it stays identifiable. True anonymization under GDPR almost does not exist, as it would destroy the dataset and make it unusable.

[–] AlteredStateBlob@kbin.social 2 points 1 year ago (2 children)

That is not quite correct. As long as it is possible to identify the user, it is personal data. True anonymization under GDPR is nearly impossible without destroying the data set.

Reddit would have to fully delete it. Otherwise, simply searching Google for the exact text of any comment with site:reddit.com immediately reveals who the author is.

It doesn't matter whether the dataset in use allows for identification by itself. What matters is that identification remains possible.

The DPAs have discretion in how they interpret the laws and what guidance they give. Beyond the reply you're getting from your DPA, this is something you could only really pursue through litigation. Personally, I don't trust reddit to actually, truly delete anything. But there would need to be proof of that, beyond my suspicions.

If deleted truly meant deleted, I'd say they're right in an individual case.

The issue I'm outlining is, however, of a different nature, so I am somewhat hopeful that at least some DPA will take this issue on.

[–] AlteredStateBlob@kbin.social 22 points 1 year ago (12 children)

Every post is tied to a username and email address, making it personal information, since each poster can be identified. I'm sure they're also tracking further metrics such as IP addresses, browser fingerprints, etc. It is immaterial whether we on the outside are able to identify users; it only matters whether it's possible given the data available to the processor. In this case, it is. Not to mention, there is a good chance the texts and posts themselves contain plenty of personal information, such as links to other user profiles, mentions and discussions of people, etc.

DPOs in Europe don't always work with lawyers. I mainly deal with mid-sized companies, and with larger corporations I absolutely do work with lawyers. I was simply clarifying that I am not a lawyer and don't claim to be one.

[–] AlteredStateBlob@kbin.social 17 points 1 year ago

The requests don't go to reddit, but to the supervisory authorities. Reddit can try to ignore those requests, but since it has offices in the EU, those can and will be slapped around - if any DPA takes action, that is.

Awesome, thank you!

[–] AlteredStateBlob@kbin.social 5 points 1 year ago (8 children)

Nope, your username and email are required and linked to your data, so it's entirely personal information. True anonymization is impossible with open text fields, as it's always possible that people reference other users within their posts, etc.

Of course, what the DPAs do with it is another matter. Doesn't hurt to try.

[–] AlteredStateBlob@kbin.social 9 points 1 year ago (2 children)

I'm not a lawyer, but a data protection officer with certification in Germany.

[–] AlteredStateBlob@kbin.social 76 points 1 year ago (1 children)

I've made a write-up for you to follow along and reference: https://kbin.social/m/reddit@lemmy.world/t/854162/Any-EU-based-users-of-reddit-should-immediately-file-a

tl;dr instructions towards the end.

[–] AlteredStateBlob@kbin.social 30 points 1 year ago (1 children)

I posted an extensive write-up over here: https://kbin.social/m/reddit@lemmy.world/t/854162/Any-EU-based-users-of-reddit-should-immediately-file-a

Scroll down to the last section for tl;dr instructions :)
