Top Medical Journals Are Now Printing AI-Generated Fake Research
By 813 Staff

“The outputs look great until you try to reproduce them,” a senior machine learning engineer at a contract research organization told me this week. That sentiment is now echoing across labs and regulatory offices after a whistleblower-style thread from researcher Elias Al, who posts as @iam_elias1, pulled back the curtain on a practice that has deeply unsettled industry insiders. According to internal documents circulating among three major AI-driven drug discovery firms, synthetic data pipelines, in which algorithms generate fake patient records, lab results, and clinical trial outcomes, are being used to manufacture the foundational datasets that medical research relies on.
The story broke when Al published a detailed analysis on May 4 showing that at least two publicly traded AI health companies have been training models on generated, rather than real-world, clinical data. Engineers familiar with the effort say the motivation was twofold: speed and cost. Real patient data is slow to collect, expensive to clean, and increasingly locked behind privacy regulations. Synthetic data, by contrast, can be produced at scale overnight. The problem, as Al’s analysis and leaked internal memos suggest, is that these synthetic datasets are being presented as authentic in preprint studies and even in early-stage regulatory submissions, without clear disclosure.
The rollout of these synthetic-data workflows has been anything but smooth. Internal audits from one company show that models trained exclusively on generated data predicted adverse drug reactions in live animal models with a 34 percent higher error rate than the control models. Worse, when researchers tried to validate the findings by repeating the experiments, results derived from the generated datasets could not be replicated, a red flag that regulators are now reportedly investigating. The U.S. Food and Drug Administration has not commented publicly, but sources say the agency’s digital health unit has begun informal inquiries into two of the implicated firms.
The stakes extend far beyond boardroom battles. If unlabeled synthetic data becomes standard practice, every meta-analysis, clinical guideline, and treatment protocol derived from this research would carry an invisible asterisk. Al’s thread, which has been viewed more than 1.2 million times, has already prompted three university ethics committees to announce reviews of studies that relied on the affected datasets. What happens next depends on whether companies can prove provenance, showing exactly which data was real and which was generated. Without that transparency, the true casualty may be trust in an entire generation of AI-assisted medical research.

