Ehhhh... I don't disagree that this is what OpenAI is trying to do, but I don't really buy the rest of the message. You fundamentally need real training data to get anywhere - synthetic training data might be an option and it might be better than nothing, but it's the same problem as using LLM output to train LLMs (so-called "model collapse"). All the weird deviations from the underlying dataset get captured and amplified, and the result is performance degradation. If Sora were good enough to get around this limitation, that would imply that OpenAI already had enough real training data to build it, in which case they wouldn't need Sora to generate more.
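You can see the degradation loop in a toy simulation (a hypothetical sketch, nothing like a real training pipeline): fit a Gaussian to some data, sample a new "training set" from the fit, refit, and repeat. The small estimation errors at each step compound, and the fitted distribution drifts away from the original:

```python
import random
import statistics

# Toy illustration of "model collapse": each generation's "model" is a
# Gaussian fitted to the previous generation's synthetic samples.
# Hypothetical sketch only - not anything resembling an actual ML pipeline.

random.seed(1)
N = 20  # a small sample size exaggerates the effect
data = [random.gauss(0, 1) for _ in range(N)]  # the "real" data: N(0, 1)

for gen in range(101):
    mu = statistics.fmean(data)
    sigma = statistics.stdev(data)
    if gen % 20 == 0:
        print(f"gen {gen:3d}: mu={mu:+.3f}, sigma={sigma:.3f}")
    # The next generation trains only on samples from the current fit.
    data = [random.gauss(mu, sigma) for _ in range(N)]
```

Each refit loses a little information (especially about the tails), and since later generations never see real data again, those losses accumulate instead of washing out.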
Sam Altman knows that his company is screwed if it doesn't get major government support, and he's clearly chosen to abandon any pretense of scruples to chase those dollars, but that doesn't mean he can deliver. We're more likely to see drones bombing someone because they wore the same shirt as the target than an actually competent automated surveillance state.