Reddit's licensing deal means Google's AI can soon be trained on the best humanity has to offer — completely unhinged posts
(www.businessinsider.com)
It's not really. There is a potential issue of model collapse when training only on synthetic data, but the same research on model collapse found that a mix of organic and synthetic data performed better than either alone. Additionally, for cost reasons, that research used weaker models than the ones typically used today, and separate research has found that models can be enhanced significantly using synthetic data from state-of-the-art (SotA) models.
The actual impact on future models will be minimal, and given the research to date, at least some mixture is probably even a good thing for future training.
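For anyone curious why "synthetic only" collapses while a mix doesn't, here's a toy sketch, not the setup from the research mentioned above. The assumptions are mine: a 1-D Gaussian stands in for "the model", N(0, 1) stands in for organic data, and the function name `simulate`, the 50-sample training sets, and the 1000 generations are arbitrary illustrative choices. Each generation fits a mean and standard deviation, then builds the next training set from fresh organic samples plus samples drawn from the fitted model.

```python
import random
import statistics


def simulate(generations: int, n: int, real_fraction: float, seed: int = 0) -> float:
    """Toy model-collapse experiment (hypothetical helper, illustration only).

    The "model" is just a Gaussian fitted by maximum likelihood. Each
    generation's training set mixes fresh organic samples from the true
    distribution N(0, 1) with synthetic samples from the previous fit.
    Returns the standard deviation fitted to the final training set.
    """
    rng = random.Random(seed)
    n_real = round(n * real_fraction)
    data = [rng.gauss(0.0, 1.0) for _ in range(n)]  # generation 0: organic only
    for _ in range(generations):
        mu = statistics.fmean(data)
        sigma = statistics.pstdev(data, mu)  # maximum-likelihood fit
        real = [rng.gauss(0.0, 1.0) for _ in range(n_real)]
        synthetic = [rng.gauss(mu, sigma) for _ in range(n - n_real)]
        data = real + synthetic
    return statistics.pstdev(data)


for label, frac in [("synthetic only", 0.0), ("50/50 mix", 0.5)]:
    print(label, simulate(1000, 50, frac))
```

With synthetic-only data the fitted spread shrinks generation after generation (the model "forgets" the tails), while anchoring each generation with organic samples keeps it near the true value of 1. This toy says nothing about the separate finding that synthetic data from strong models can improve training.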