Academic publishing has long operated on the assumption that the humans submitting research papers are, at minimum, the authors of their own words. That assumption is now under siege. The preprint server arXiv, the sprawling open-access repository that hosts millions of scientific papers before formal peer review, announced in mid-May 2026 that it will begin issuing year-long submission bans to researchers caught flooding the platform with AI-generated content riddled with fabricated citations and invented data. The move marks one of the sharpest institutional responses yet to what researchers have taken to calling “AI slop” in scientific publishing.
To understand why arXiv felt compelled to act, one must appreciate the scale of the problem. Over the past eighteen months, platform moderators have flagged a surge in submissions that pass a surface-level plausibility test — coherent prose, correctly formatted references — but disintegrate under scrutiny. Citations point to journals that do not exist. Author names are scrambled. Statistical results are internally inconsistent. In the worst cases, entire datasets appear to have been confabulated by language models instructed to produce something that looks like experimental output. One member of arXiv’s volunteer moderation team, speaking on background, described the situation plainly: the prose is polished, but the science is fiction.
The phenomenon is not unique to arXiv. Journals across disciplines have reported similar patterns since large language models became widely accessible in late 2022. But arXiv occupies a peculiar position in the research ecosystem: it is openly accessible, lightly moderated by design, and serves as the first point of public record for fields including physics, mathematics, computer science, and quantitative biology. A hallucinated preprint on arXiv can be cited by other papers, amplified on social media, and embedded in secondary literature long before any formal retraction process catches up. The reputational damage to genuine scientists working in adjacent areas can be severe.
The new enforcement policy, as described in arXiv’s internal communications reviewed for this article, will operate on a tiered basis. A first confirmed violation triggers a formal warning and a mandatory review of any recently posted submissions. A second violation within a rolling twelve-month window results in the year-long submission ban. Repeat offenders face permanent removal from the platform. Critically, no penalty is issued until a flagged submission has been investigated by a small team of human reviewers, in which arXiv is investing, supported by automated detection tools. The irony of using AI to catch AI-generated content is not lost on administrators.
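For readers who think in code, the tiered escalation described above can be sketched roughly as follows. This is an illustration only, not arXiv’s actual system: every name in it (Penalty, SubmitterRecord, apply_confirmed_violation) is hypothetical, the twelve-month window is approximated as 365 days, and “repeat offender” is assumed to mean anyone with a confirmed violation after a prior ban. The sketch also assumes that a second violation falling outside the window draws another warning, a case the reported policy does not address.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from enum import Enum


class Penalty(Enum):
    WARNING = "formal warning and review of recent submissions"
    ONE_YEAR_BAN = "year-long submission ban"
    PERMANENT_REMOVAL = "permanent removal from the platform"


# Assumption: the "rolling twelve-month window" is modeled as 365 days.
WINDOW = timedelta(days=365)


@dataclass
class SubmitterRecord:
    """Hypothetical per-researcher history of confirmed violations."""
    violations: list[date] = field(default_factory=list)
    has_been_banned: bool = False


def apply_confirmed_violation(record: SubmitterRecord, when: date) -> Penalty:
    """Record a violation already confirmed by human review; return the penalty."""
    # Assumption: any confirmed violation after a prior ban counts as "repeat".
    if record.has_been_banned:
        record.violations.append(when)
        return Penalty.PERMANENT_REMOVAL

    # Earlier violations that fall inside the window ending at `when`.
    recent = [d for d in record.violations if timedelta(0) <= when - d <= WINDOW]
    record.violations.append(when)

    if recent:
        record.has_been_banned = True
        return Penalty.ONE_YEAR_BAN

    # First violation, or (by assumption) one outside the window.
    return Penalty.WARNING
```

Under these assumptions, a researcher with confirmed violations in June 2026 and December 2026 would receive a warning and then the year-long ban, and any later confirmed violation would mean permanent removal. The function deliberately takes only violations that human reviewers have already confirmed, mirroring the reported rule that no penalty is issued on an automated flag alone.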
Critics of the policy worry about false positives. Researchers who use AI tools legitimately (for grammar checking, translation assistance, or literature summarization) could theoretically be caught in the same net as outright fabricators. Dr. Farida Okonkwo, a computational biologist at a Gulf-region university who studies research integrity, noted that the line between AI-assisted and AI-generated work is genuinely blurry. A paper drafted by a human but heavily edited by a language model occupies a grey zone that no automated detector handles well. arXiv’s administrators acknowledge this concern and have indicated that human judgment will be the final arbiter in contested cases.
The broader implications extend well beyond one preprint server. If arXiv’s policy proves effective — and the research community perceives it as fair — it could set a precedent for formalized AI-content governance across the academic publishing stack. Major journals have experimented with disclosure requirements, asking authors to declare whether AI tools were used in manuscript preparation. Fewer have moved toward active enforcement with meaningful consequences. A year-long ban from a platform as central as arXiv is a genuine professional penalty, one that would be felt acutely by early-career researchers for whom a continuous publication record is a prerequisite for grant applications and tenure-track positions.
What arXiv’s action signals most clearly is a growing institutional consensus that voluntary disclosure is insufficient. The incentive structure of academic publishing — where volume of output remains a proxy for productivity in many evaluation systems — creates pressure to publish faster than careful science permits. AI tools lower the cost of producing plausible-looking papers dramatically. Without enforcement mechanisms that impose real costs on bad actors, the quality signal that preprint servers provide degrades for everyone. A repository full of hallucinated physics papers is not a minor inconvenience; it is an epistemic hazard. arXiv’s year-long suspension policy, whatever its imperfections, is at least an acknowledgment that the platform has a duty to the integrity of the scientific record it hosts.