this post was submitted on 07 Feb 2026
83 points (100.0% liked)

Technology


AI helpers can now rummage through multiple documents

top 28 comments
[–] recursive_recursion@piefed.ca 23 points 7 hours ago (1 children)

The Register asked Microsoft about the privacy implications and what happens to user data while an agent does its thing, but other than acknowledging our question, the company did not respond.

[–] diabetic_porcupine@lemmy.world 12 points 5 hours ago

We recognize your willingness to have rights, and wish you the very best!

[–] sad_detective_man@sopuli.xyz 22 points 7 hours ago (4 children)

Hey, mine is empty. Can anyone recommend something I could put in there to poison it?

[–] Jesus_666@lemmy.world 3 points 3 hours ago

You could have a really simple Markov chain generator fill a gigabyte's worth of .txt files with nonsense sentences. At least that's "content" they have to parse.
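A minimal sketch of what such a generator could look like, assuming a word-level, first-order Markov chain built from any seed text with at least two words. The function names, the `notes_NNNN.txt` file pattern, and the default sizes are all made up for illustration, and nothing here is known to affect how any AI agent actually reads or indexes files.

```python
import random
from collections import defaultdict
from pathlib import Path


def build_chain(text: str) -> dict[str, list[str]]:
    """Map each word to the list of words that follow it in the seed text."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return dict(chain)


def make_sentence(chain: dict[str, list[str]], rng: random.Random,
                  max_words: int = 12) -> str:
    """Walk the chain from a random start word until a dead end or max_words."""
    word = rng.choice(sorted(chain))
    out = [word]
    while len(out) < max_words and word in chain:
        word = rng.choice(chain[word])
        out.append(word)
    return " ".join(out).capitalize() + "."


def fill_directory(target: Path, seed_text: str, n_files: int = 3,
                   sentences_per_file: int = 50, seed: int = 0) -> list[Path]:
    """Write n_files .txt files of generated sentences into target."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    chain = build_chain(seed_text)
    target.mkdir(parents=True, exist_ok=True)
    paths = []
    for i in range(n_files):
        path = target / f"notes_{i:04d}.txt"  # hypothetical naming scheme
        lines = [make_sentence(chain, rng) for _ in range(sentences_per_file)]
        path.write_text("\n".join(lines) + "\n", encoding="utf-8")
        paths.append(path)
    return paths
```

Calling `fill_directory(Path("nonsense"), some_long_text, n_files=1000)` would write a thousand small files; getting to a gigabyte is just a matter of raising `n_files` and `sentences_per_file`.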

[–] ch00f@lemmy.world 5 points 4 hours ago
[–] nexguy@lemmy.world 9 points 6 hours ago (1 children)
[–] sad_detective_man@sopuli.xyz 4 points 5 hours ago

Not a bad idea

[–] TropicalDingdong@lemmy.world 9 points 7 hours ago (2 children)

Hey, mine is empty. Can anyone recommend something I could put in there to poison it?

A couple hundred million 0kb files?

[–] SendMePhotos@lemmy.world 6 points 6 hours ago

A ton of folders

[–] AliasAKA@lemmy.world 4 points 6 hours ago (2 children)

That won’t poison an LLM exactly.

https://www.anthropic.com/research/small-samples-poison#%3A%7E%3Atext=For+example%2C+LLMs+can+be%2Cwidespread+adoption+in+sensitive+applications.

Theoretically this is a place to start. They probably have mitigations for many of these.

[–] halcyoncmdr@piefed.social 2 points 1 hour ago

They probably have mitigations for many of these.

Have you seen the state of testing for Microsoft products nowadays? Or rather the apparently complete lack of testing.

[–] sad_detective_man@sopuli.xyz 1 points 4 hours ago

I found this study. It looked promising, but I think it only works on the one LLM they were targeting. Also, they seem to be working to protect AI models, so any results they find will probably be implemented as ways to protect against poisoning. I guess intentional dataset poisoning hasn't come as far as I hoped.

[–] nexguy@lemmy.world 15 points 7 hours ago

All of my zero OneDrive files? heckin shoot

[–] 4am@lemmy.zip 5 points 6 hours ago (1 children)

Yup, there it is. I’ve been saying it.

Everyone from individuals to Fortune 500s is storing their data on OneDrive and SharePoint in the cloud.

ML models (not necessarily LLMs) are incredible at finding patterns and targeted data points in huge data sets.

I wonder why they need all those data centers for all the AI workloads that barely anyone is using, or wants.

Hmm. Hmmmmmm. 🤔

[–] crunchy@lemmy.dbzer0.com 4 points 5 hours ago

The government, including the military, uses OneDrive and SharePoint too.

[–] BagOfHeavyStones@piefed.social 9 points 7 hours ago (1 children)

So you just need to prompt with a person's name to get the contents of every file that has that name in it?

[–] 6nk06@sh.itjust.works 7 points 6 hours ago (1 children)

It's like grep but it's killing us as a bonus.

[–] otacon239@lemmy.world 5 points 5 hours ago

It’s also less accurate and will make up results. So it has that going for it.

[–] Dadifer@lemmy.world 5 points 7 hours ago
[–] plinky@hexbear.net 3 points 7 hours ago

As a somewhat AI-curious person: in my experience, at least with small local LLMs (40-80B), they are absolutely shit at working with large texts. At best they can make a passable summary. Contradictory information is ignored (if it's split across two documents), and document citations are pulled from the ass a third of the time (and the pulled info is equally dodgy, if you're grading pass/fail on exactness). Is Copilot better than this?

[–] AlecSadler@lemmy.dbzer0.com -4 points 5 hours ago (4 children)

Rufus let me install Windows with a local account, and debloat scripts let me nuke/hide OneDrive.

And, INB4 anyone says "switch to linux", I'm sorry but it just isn't viable while maintaining the same seamless experience for what I need. I hate Windows, but I also value my time.

[–] Orcocracy@hexbear.net 2 points 2 hours ago

If you value your time you should switch to Linux. Yes there’s a brief adjustment period, but especially now, that period is very short and well worth the initial effort. It’s not 2005 anymore, Linux has less jank than Windows or Mac OS now.

[–] justlemmyin@lemmy.world 6 points 4 hours ago (1 children)

All good mate, we will be here for you when you are ready to take the plunge.

[–] AlecSadler@lemmy.dbzer0.com 3 points 4 hours ago* (last edited 4 hours ago) (1 children)

I'm 8 distro attempts in so far, and every one of them has cost me substantial time on driver and general usability issues.

In another life I'd be happy to sacrifice productivity to grind on fixing the issues, but presently it has real costs.

For now, I run Proxmox and a slew of Linux containers and VMs, but unfortunately I still use Windows as my primary :/

[–] z3rOR0ne@lemmy.ml 1 points 1 hour ago

My use case for Linux is very minimal, so I can't say I know your particular use case, but what drivers do you have issues with? Purely out of curiosity.

[–] 0xtero@beehaw.org 1 points 2 hours ago

You might also be interested in running this: https://github.com/zoicware/RemoveWindowsAI

[–] buckykat@hexbear.net 2 points 4 hours ago (1 children)

I just really need the seamless experience of running a new debloat script every month to stop whatever new spyware Microsoft is pushing. I value my time so much that I want to waste it fighting my operating system at every turn.

[–] RisingSwell@lemmy.dbzer0.com 1 points 2 hours ago

It's not that hard to stop Windows. There are programs that block internet access by process, and Windows only has so many of them. OneDrive doesn't start on my PC, and if Edge tries to open, it auto-fails because Edge is offline.