In July, Deloitte delivered a 237-page assurance report to Australia's Department of Employment and Workplace Relations under a contract worth AU$440,000 (roughly US$290,000). The report examined the quality of the department's IT systems and the welfare system's automated penalty framework.
The report's credibility was short-lived: Chris Rudge, a health and welfare law researcher at the University of Sydney, reported publicly and on social media that it contained fabricated references. He identified numerous problems, including citations attributed to sources that did not exist, misattributed academic references, and a fabricated quotation attributed to a Federal Court judge that appeared in several places in the report.
Corrections, Removal, and a Partial Refund
Following an internal review, the department and Deloitte confirmed that the report contained inaccurate footnotes and references. Deloitte agreed to repay the final installment owed under the contract, effectively a partial refund, though the exact amount remains undisclosed. A revised version was later uploaded to the department's website with the fictitious court quote and the fabricated academic references removed.
The revised report also disclosed that portions of it had been drafted using a generative AI tool, Azure OpenAI. The department maintained that while some references were inaccurate, the report's findings and recommendations were unchanged. When contacted, Deloitte said the matter had been resolved directly with the client and declined to confirm whether the AI tool was responsible for the fabrications.
What Was Fabricated and How It Was Discovered
According to Rudge's review, the first report contained around twenty errors. Perhaps the most alarming attributed to Professor Lisa Burton Crawford a book that was outside her area of expertise and did not exist.
Rudge also noted that the report lent itself an air of authority by citing the work of various academics, without making clear whether its authors had actually read the materials cited. He stressed how serious it was to misquote a Federal Court judge in a report intended to assess the department's legal compliance.
Experts point out that a well-known risk of generative AI systems is that they "hallucinate," fabricating content to fill in gaps. The Deloitte incident is a clear example of AI hallucinations surfacing in a professional report, raising concerns about the reliability of AI in research and consulting work.
Political and Institutional Backlash
The episode drew sharp political responses. Greens Senator Barbara Pocock said Deloitte should return the full AU$440,000 for the "ineptitude" of using AI on such a sensitive task, adding that misquoting a judge would earn a first-year law student a failing grade.
Labor Senator Deborah O'Neill said the episode revealed a "human intelligence problem," not just a software failure. She argued that consulting firms should be able to say "this is who produced this report" or "this is what the AI software produced," highlighting the need for clarity about how AI is used in official or analytical work.
She added that the case should prompt a broader review of corporate accountability when AI hallucinations lead to misinformation or public embarrassment. The partial refund should serve as a warning to other consulting firms about transparency and due diligence.
What This Means Going Forward
This episode illustrates the dangers of using generative AI for formal, high-stakes consulting work. Fabricating a quote or an academic source undermines both the integrity of the report and the reputation of the firm that produced it.
Organizations using AI-generated material should establish rigorous review and fact-checking processes before disseminating it. Responsibility cannot be delegated to algorithms.
According to Deloitte, the revised report still reflects accurate analysis and valid recommendations. But public trust may now depend on how open the firm, and consulting firms generally, are about their use of AI and their processes for human review and approval.
What this all adds up to is that while AI can boost productivity, it is not a substitute for human expertise, verification, and accountability.