this post was submitted on 05 Feb 2026
23 points (92.6% liked)

Ask Lemmy


A Fediverse community for open-ended, thought-provoking questions



For most use cases, web search engines are fine, but I'm wondering whether there are alternative ways of finding information. There's also the enshittification of Google, and tbh most (free) search engines just give you Google's search results.

Obviously, the straight answer is just asking other people, in person or online, in general forums or specialised communities.

Libraries are a good source too, but for those of us who don't have access to physical libraries, there are free online public libraries (I'll post links to the ones I found below).

Books in general: a lot of them have references to outside material.

So, I've been experimenting with an AI chatbot (Le Chat), partly as a life coach of sorts and partly as a fine-tuned web search engine. To cut to the chase, it's bad. When it's not just listing Google's top results, it lists tools that are long gone or just makes shit up. I was hoping it would work as a fine-tuned search engine, because with Google, if what you want isn't in the top 10 websites, you're on your own.

So yeah, those are all the routes for finding information I can think of, and probably all there is, but maybe I missed some.

CorrectAlias@piefed.blahaj.zone 3 points 20 hours ago

Even local models are trained on stolen art and content. That's the immoral part.

kata1yst@sh.itjust.works 2 points 19 hours ago

There are many models that use open-source training sets and weights. You can choose them.

bridgeenjoyer@sh.itjust.works 2 points 19 hours ago

No one seems to get this part.

riskable@programming.dev 0 points 19 hours ago (last edited 19 hours ago)

AI models aren't trained on anything "stolen". When you steal something, the original owner doesn't have it anymore. That's not being pedantic; it's the truth.

Also, if you actually understood how AI training works, you wouldn't even use this sort of analogy in the first place. It's so wrong that it's like describing the Flintstones' car and saying that's how automobiles work.

Let's say you wrote a book and I used it as part of my AI model's (LLM's) training set. As my code processes your novel token by token (not word by word!), it'll nudge floating-point values (the model's weights) up or down by something like 0.001. That's it. That's all that's happening.
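A toy sketch of the "nudge a floating-point value" idea, in Python. It assumes a drastically simplified model: a bigram table of logits (one row per previous token) trained with plain stochastic gradient descent on cross-entropy, not a transformer trained with an optimizer such as Adam. The token IDs, vocabulary size and learning rate of 0.001 are made up for illustration; because every gradient entry here lies between -1 and 1, no single update can move a weight by more than the learning rate.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_bigram(token_ids, vocab_size, lr=0.001):
    """Toy bigram 'model': a vocab_size x vocab_size table of logits.

    For each (previous, next) token pair, take one SGD step on the
    cross-entropy loss of predicting `next` from `previous`.
    Returns the weights and the largest single-weight change seen.
    """
    weights = [[0.0] * vocab_size for _ in range(vocab_size)]
    largest_step = 0.0
    for prev, nxt in zip(token_ids, token_ids[1:]):
        probs = softmax(weights[prev])  # prediction before this update
        for j in range(vocab_size):
            # d(cross-entropy)/d(logit_j) = p_j - 1[j == next]
            grad = probs[j] - (1.0 if j == nxt else 0.0)
            delta = -lr * grad
            weights[prev][j] += delta
            largest_step = max(largest_step, abs(delta))
    return weights, largest_step


# Hypothetical token IDs, as if a tokenizer had already split some text
token_ids = [0, 1, 2, 0, 1, 3]
weights, largest_step = train_bigram(token_ids, vocab_size=4)
```

Each pair only shifts one row of numbers slightly toward predicting the next token, and that row's changes sum to zero because the softmax cross-entropy gradient does.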

To a layperson that makes no sense whatsoever, but it's the truth. How can a huge list of floating-point values be used to generate semi-intelligent text? That's the really fucking complicated part.

Before you can even use a model, you need to tokenize the prompt and then perform an inference step, in which those tokens are run through the floating-point values stored in that .safetensors file (which is the AI model), and the result gets processed a zillion more ways before you see any text.
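To make "that .safetensors file (which is the AI model)" concrete: a safetensors file is an 8-byte little-endian header length, a JSON header mapping tensor names to dtype, shape and byte offsets, then the raw numbers. The sketch below writes and reads that layout with only the Python standard library. It is an illustration under stated assumptions, not a replacement for the official `safetensors` package: it only handles `F32` tensors, does not pad the header, and the tensor name `layer0.weight` and its values are hypothetical.

```python
import json
import struct


def write_safetensors_bytes(tensors):
    """Serialize {name: (shape, list_of_floats)} as F32 tensors in the
    safetensors layout: u64 header length, JSON header, raw data."""
    header = {}
    buffer = b""
    for name, (shape, values) in tensors.items():
        data = struct.pack(f"<{len(values)}f", *values)  # little-endian float32
        header[name] = {
            "dtype": "F32",
            "shape": list(shape),
            # offsets are relative to the start of the data section
            "data_offsets": [len(buffer), len(buffer) + len(data)],
        }
        buffer += data
    header_bytes = json.dumps(header).encode("utf-8")
    return struct.pack("<Q", len(header_bytes)) + header_bytes + buffer


def read_safetensors_bytes(blob):
    """Parse the layout above back into {name: (shape, list_of_floats)}."""
    (header_len,) = struct.unpack("<Q", blob[:8])
    header = json.loads(blob[8:8 + header_len].decode("utf-8"))
    data_start = 8 + header_len
    tensors = {}
    for name, info in header.items():
        if name == "__metadata__":
            continue  # optional string metadata, not a tensor
        if info["dtype"] != "F32":
            raise ValueError(f"{name}: this sketch only handles F32, got {info['dtype']}")
        begin, end = info["data_offsets"]
        raw = blob[data_start + begin:data_start + end]
        values = list(struct.unpack(f"<{len(raw) // 4}f", raw))
        tensors[name] = (info["shape"], values)
    return tensors


# Hypothetical 2x2 weight matrix; the name and numbers are made up
blob = write_safetensors_bytes({"layer0.weight": ([2, 2], [0.5, -1.25, 2.0, 0.0])})
model = read_safetensors_bytes(blob)
```

Real model files hold thousands of such named tensors, often in `F16` or `BF16`, but the structure is the same: a description of shapes plus a long run of numbers.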

When an AI model is outputting text, it's using a random number generator in conjunction with a next-word (strictly, next-token) prediction algorithm that's based on the floating-point values inside the model. It doesn't even "copy" anything. It's literally built on the back of an RNG!
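A minimal sketch of the RNG part, assuming the common temperature-sampling scheme: the model produces one score (logit) per vocabulary token, the scores are turned into probabilities, and a random draw picks the next token. Real systems usually layer extra tricks (top-k, top-p, repetition penalties) on top; the three logits below are made up.

```python
import math
import random


def sample_next_token(logits, temperature, rng):
    """Pick the next token ID from raw scores (logits).

    temperature == 0 means greedy decoding: always the top-scoring token,
    with no randomness. Otherwise logits are divided by the temperature,
    exponentiated (softmax, unnormalized) and a weighted random draw
    decides which token is emitted.
    """
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    weights = [math.exp(x - m) for x in scaled]
    # random.Random.choices normalizes the weights itself
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


logits = [2.0, 1.0, 0.1]   # made-up scores for a 3-token vocabulary
rng = random.Random(0)     # seeded so the run is reproducible
greedy = sample_next_token(logits, temperature=0, rng=rng)
draws = [sample_next_token(logits, temperature=1.0, rng=rng) for _ in range(1000)]
```

Lower temperatures concentrate probability on the top-scoring token; at temperature 0 this sketch never consults the RNG, which is why greedy decoding returns the same output every time.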

If an LLM successfully copies something via its model, that's just random chance. The more copies of something that went into its training, the higher the chance of it happening (and that's considered a bug, not a feature).

There's also a problem that can occur at the opposite end: when a single set of tokens gets associated with just one tiny bit of the training set. That's how you can get it to output the same thing relatively consistently when given the same prompt (associated with that set of tokens). This is also considered a bug, and AI researchers are always trying to find ways to prevent it from happening.
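Both effects described in the last two paragraphs can be illustrated with a count-based bigram model. This is a deliberately crude stand-in for an LLM (no neural network, words instead of subword tokens, only one word of context), and the sentences are invented, so treat it as an illustration of the idea rather than of how real models behave: duplicated training text raises the probability of reproducing it verbatim, and a context seen in only one training example gets all its probability on a single continuation, so no random draw can change the output.

```python
from collections import Counter, defaultdict


def bigram_counts(sentences):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def next_word_probs(counts, prev):
    """Maximum-likelihood next-word distribution after `prev`."""
    total = sum(counts[prev].values())
    return {w: c / total for w, c in counts[prev].items()}


def sequence_prob(counts, sentence):
    """Probability of generating `sentence` word by word, given its first word."""
    words = sentence.split()
    p = 1.0
    for prev, nxt in zip(words, words[1:]):
        p *= next_word_probs(counts, prev).get(nxt, 0.0)
    return p


other = ["the cat sat on the mat"]
quote = "the dog sat on the rug"
few = bigram_counts(other + [quote] * 1)    # quote seen once
many = bigram_counts(other + [quote] * 10)  # quote duplicated 10 times
```

Here `sequence_prob(few, quote)` is 0.0625 while `sequence_prob(many, quote)` is roughly 0.21, and `next_word_probs(many, "cat")` puts all its probability on `"sat"` because "cat" only ever appeared in one sentence.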