AI at Work: The Cost of Pretending We Have Answers
Every day we face an uncomfortable reality that both amazes and unsettles us: intelligent professionals are making foolish mistakes with AI, nobody really knows what to do about it, and most of the proposed "solutions" are security theater without sufficient empirical support.
1. What We Know: The Facts
I begin my reflection with a documented and verifiable case: Mata v. Avianca (S.D.N.Y., June 2023), in which attorneys Steven Schwartz and Peter LoDuca submitted a brief containing six citations to non-existent cases generated by ChatGPT.[1] This is documented in public court records. Judge Kevin Castel imposed a joint USD 5,000 fine and required the attorneys to notify the judges falsely identified as authors of the fabricated opinions.
2. Verifiable Data on the Extent of the Problem
In the United States, after Mata v. Avianca (S.D.N.Y., June 22, 2023; joint fine of USD 5,000), new episodes followed with sanctions and more serious measures, such as the disqualification of three attorneys from the Butler Snow firm in an Alabama prison-litigation case (July 24, 2025) and sanctions in Utah (May-June 2025) for filing an appeal with non-existent citations. In the United Kingdom, meanwhile, the High Court issued public warnings after detecting 18 fictitious citations out of 45 in one case, and five in another, with referrals to professional regulators (Hodge, 2025; Tobin, 2025; Booth, 2025). What we don't know is the actual number of undetected cases: those reported are only the ones that ended in judicial sanctions.
Table 1. Recent Court Cases for Improper AI Use
| Date | Jurisdiction | Case | Conduct | Outcome | Source |
|---|---|---|---|---|---|
| June 22, 2023 | U.S. (S.D.N.Y.) | Mata v. Avianca, Inc., 1:22-cv-01461 | Invented citations generated with AI in brief | Joint fine of USD 5,000 + corrective orders | S.D.N.Y., June 22, 2023 |
| May 27 / June 4, 2025 | U.S. (Utah Court of Appeals) | Garner v. Kadince, 2025 UT App 80 | Appeal petition with non-existent citations (AI) | Sanctions: costs, reimbursements, and USD 1,000 donation | Utah Ct. App., June 4, 2025 |
| July 24, 2025 | U.S. (N.D. Ala.) | Prison litigation against Jeff Dunn | Filing with false citations generated with AI | Disqualification of three attorneys (Butler Snow) + referral to bar | N.D. Ala., July 24, 2025 |
| June 6, 2025 | UK (High Court) | QNB v. Al-Haroun; Forey (housing) | 18/45 fictitious citations in one case and 5 in another | Formal warning; referral to regulators; possible sanctions | High Court, June 6-7, 2025 |
Source: Author's compilation
3. Recognized Pattern: The Categorization Problem
Professionals are using generative AI as if it were a specialized search tool, when it is a tool for generating plausible text. These systems (ChatGPT, Claude, Gemini, and the like) are trained to optimize the probability of the next linguistic unit given the previous ones. In other words, plausibility is an effect produced in the reader, not an objective property of the output (Bender et al., 2021). The so-called "hallucinations" are a structural consequence of the training objective (next-token prediction) and of the absence of grounding by default (Bender et al., 2021; Tonmoy et al., 2024). Furthermore, the developers of these systems explicitly warn against their use for tasks requiring factual precision. What we don't know is why this warning doesn't translate into professional practice. Hypotheses abound (time pressure, technological misunderstanding, perverse incentives), but systematic studies are lacking.
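To make the categorization error concrete, here is a toy sketch (with invented probabilities, not a real model) of what a next-token objective actually optimizes: the relative plausibility of continuations given a prefix. Nothing in the sampling step distinguishes a real citation from a fabricated one.

```python
# Toy sketch only: not a real language model. The probabilities are invented
# to illustrate the training objective (plausibility of the next fragment),
# which contains no notion of factual truth.
import random

prefix = "The leading case on this point is"
next_fragment_probs = {
    "Mata v. Avianca": 0.30,             # a real case
    "Varghese v. China Southern": 0.25,  # a fabricated citation from the Mata brief
    "Smith v. United Airlines": 0.25,    # plausible-sounding; may not exist
    "the Montreal Convention": 0.20,
}

# Sampling is proportional to plausibility; truth never enters the objective.
choice = random.choices(
    population=list(next_fragment_probs),
    weights=list(next_fragment_probs.values()),
)[0]
print(f"{prefix} {choice} ...")
```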
4. What We Don't Know: The Evidence Gaps
We lack evidence on the actual size of this problem. We have no data on the real incidence of undetected errors, on variation by industry or sector, by type of organization, or on professional demographics. Nor do we have temporal trends: is the problem getting worse, improving, or stabilizing?
4.1 Effectiveness of Interventions
Most proposed "best practices" are not supported by adequate empirical evidence. The double-verification proposal, for example, sounds sensible, but does it reduce errors, or does it create false confidence? How much additional time does it require? Is it cost-effective? As for organizational policies: do workers follow them under pressure? Which kinds of policies are effective and which are dead letters? And training, so often invoked: does improved knowledge about AI translate into real behavioral change? For how long? What type of training works?
Faced with this landscape, some evidence can already serve as an anchor for future analyses. I outline three: organizational adoption, documented incidents, and legal tools with RAG (Retrieval-Augmented Generation). First, organizational adoption: global surveys from 2025 report high adoption, with sectoral variation and governance bottlenecks (McKinsey, 2025). Second, documented incidents: the AI Incident Database and the AI Index 2025 show an increase in reports and greater severity in regulated domains (HAI/Stanford, 2025). Third, legal tools with RAG: technical evaluations and vendor documentation (Lexis+, Westlaw, Thomson Reuters) describe citation verifiers and grounding in primary sources, with error rates lower than those of generic chatbots.
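As a rough illustration of the verification pattern these RAG-based tools describe, the sketch below extracts case citations from a draft and checks them against a primary-source index. The regex and the `search_primary_sources` function are simplified stand-ins I am assuming for illustration; real products rely on citator APIs and retrieval over primary-source databases.

```python
# Simplified sketch of the verification loop: extract citations, try to ground
# each one in a primary-source index, and flag what cannot be grounded.
import re

CITATION_PATTERN = re.compile(
    r"\b[A-Z][A-Za-z]+ v\. [A-Z][A-Za-z]+(?: [A-Z][A-Za-z]+)*(?:, Inc\.)?"
)

def search_primary_sources(citation: str) -> bool:
    """Hypothetical lookup against an authoritative index; True if the case exists."""
    known_cases = {"Mata v. Avianca, Inc."}  # placeholder corpus for this sketch
    return citation in known_cases

def unverified_citations(draft: str) -> list[str]:
    """Return every citation in `draft` that could not be grounded."""
    return [c for c in CITATION_PATTERN.findall(draft)
            if not search_primary_sources(c)]

draft = "As held in Mata v. Avianca, Inc. and in Varghese v. China Southern, the claim is timely."
flagged = unverified_citations(draft)
if flagged:
    print("Do not file. Unverified citations:", flagged)
```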
4.2 Opportunity Costs
We don't know the real trade-off between speed/efficiency vs. precision, between verification cost vs. error cost, and finally, between benefits derived from AI use vs. risks of misuse.
5. Why Are We Flying Blind? The Structural Forces
5.1 Perverse Incentives
In academia, researchers have incentives to publish "solutions" and theoretical frameworks, not to humbly document that we don't know what works. In the industry, AI companies have incentives to minimize problems and exaggerate the ease of solutions. User companies have incentives not to report errors. In consulting activities, consultants sell certainty, not uncertainty. In summary, admitting we don't know what to do doesn't generate contracts.
5.2 The Time Horizon Problem
The effects of massive AI adoption in professional work unfold over years, but pressures to "respond" are immediate. This generates a market for premature solutions.
5.3 Absence of Metrics Infrastructure
For AI adoption in professional contexts, there are no comparable, accepted, and auditable sectoral metrics like those that have existed in other areas for years, such as pharmacovigilance (post-market monitoring of medications), crash testing (systematic evaluation of vehicle safety), or clinical trials (rigorous evaluation before adoption), among others. However, the good news is that the first partial pieces are beginning to appear, including the notable NIST AI RMF 1.0 (U.S. Department of Commerce, 2023) and its profile for generative AI (AI 600-1, 2024), in addition to MLCommons/AILuminate efforts for safety testing.
6. The Stark Reality: Are We Improvising in Real Time?
6.1 What's Happening
In law firms, some completely prohibit AI (losing potential efficiency), others adopt it without protocols (assuming unknown risks), and most improvise ad hoc policies without evidence of effectiveness. In hospitals, there's informal adoption by residents and nurses, as well as nervous administrators, but without clear protocols, leading to massive variation in practices between institutions. In companies generally, policies range from total prohibition to uncritical adoption, compliance theater manuals (policies nobody can follow under real pressure), or delegation of responsibility to individual workers without systemic support.[2]
6.2 The Real Costs of Uncertainty
Defensive over-regulation is evident in organizations that prohibit potentially beneficial uses out of fear of the unknown. Negligent under-regulation is no less common: many organizations adopt AI without precautions because "everyone is doing it." All of this unfolds in a climate of professional anxiety that traps professionals between the pressure to adopt new tools and the fear of damaging their careers. It also exposes another problem: resource inequality. Professionals with more resources can afford additional verification; the rest are exposed to greater risk.
6.3 The Real Trade-offs (Without Easy Answers)
- Speed vs. Precision. In many professional contexts, speed is itself a fundamental value. A late legal brief, for example, can be more harmful than an imperfect but timely one. The challenge lies in optimizing this balance without concrete data on the frequency and severity of errors generated under different work protocols.
- Democratization vs. Control. AI allows young professionals to produce work that previously required years of experience. But it also amplifies the judgment errors that experience prevents. The dilemma: restrict access (preserving hierarchies) or accept initial errors as the cost of democratization?
- Individual vs. Systemic. At the individual level, exhaustive verification always seems sensible. At the systemic level, if everyone verifies everything, the aggregate cost can be prohibitive and AI's competitive advantage disappears.
- Tragedy of the Commons. Individual optimization can lead to socially suboptimal equilibria: if each professional skips verification to save time, trusting that someone else will catch the errors, confidence in professional work products erodes for everyone.
7. Why Current "Solutions" Are Inadequate
7.1 The Theater of "Best Practices"
The recommendation of "best practices" (create AI policies, train staff, implement checklists, monitor compliance, and so on) suffers from several problems: it is usually based on intuition rather than evidence; it ignores implementation and adherence costs; it assumes the problem is one of knowledge rather than structure; and it rarely considers contextual variation. In this scenario, it is worth admitting that a policy is "theater" when compliance is impossible under real SLAs (Service Level Agreements), when adherence is never audited, when there are no outcome metrics (e.g., rate of false citations per 1,000 pages, weighted by severity), and when nothing happens organizationally when it fails.
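To show that such outcome metrics are cheap to define once the data exist, here is a minimal sketch of the "false citations per 1,000 pages" rate with an optional severity weighting. The record structure, severity scale, and sample numbers are assumptions for illustration, not a proposed standard.

```python
# Hedged sketch of an outcome metric: false citations per 1,000 pages,
# optionally weighted by severity. The data below are illustrative only.
from dataclasses import dataclass

@dataclass
class Filing:
    pages: int
    false_citations: int
    severity: float  # e.g., 1.0 = caught internally, 3.0 = reached a court

def false_citation_rate(filings: list[Filing], weighted: bool = False) -> float:
    """False citations per 1,000 pages, optionally severity-weighted."""
    total_pages = sum(f.pages for f in filings)
    if total_pages == 0:
        return 0.0
    errors = sum(
        f.false_citations * (f.severity if weighted else 1.0) for f in filings
    )
    return 1000 * errors / total_pages

sample = [Filing(40, 0, 1.0), Filing(25, 2, 3.0), Filing(60, 1, 1.0)]
print(round(false_citation_rate(sample), 2))                 # raw rate: 24.0
print(round(false_citation_rate(sample, weighted=True), 2))  # severity-weighted: 56.0
```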
7.2 The False Promise of Regulation
Regulation is staggered, mostly reactive, and still lacks homogeneous operational metrics.
Table 2. Implementation Timeline of the AI Act (EU)
| Milestone | Date | What Applies |
|---|---|---|
| Entry into force | August 1, 2024 | Start of deadline counting |
| Prohibitions + AI literacy | February 2, 2025 | Prohibited practices and basic AI literacy obligations |
| GPAI (general-purpose models) | August 2, 2025 | Rules and code of practice for GPAI providers |
| High risk (regulated products) | August 2, 2027 | Transition window for integrated high-risk systems |
Source: Author's compilation based on EU implementation calendar
Furthermore, professional self-regulation (bar associations, medical associations, etc.) moves even more slowly than governmental regulation because it must contend with, among other things, its own guild interests.
7.3 The Mirage of the "Technological Solution"
We shouldn't be seduced by the phrase "the next version of AI will be more reliable," because even if it were true, the uncomfortable question would persist: how do we evaluate reliability? What level is "sufficient" for what uses? Arguments like "specialized tools will solve the problem" also resemble mirages, since specialization adds cost and may reduce versatility.
8. What Could We Do? (Without Illusions of Certainty)
8.1 Evidence Infrastructure
What we need, but don't have, are systematic records of professional AI incidents, longitudinal studies of different adoption protocols, standardized cost-benefit metrics, and platforms for sharing "near misses" without penalties.
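As a sketch of what a shared incident record might minimally contain, consider the data structure below. Every field name here is an assumption for illustration; an actual registry would need an agreed taxonomy (the AI Incident Database's fields are one reference point).

```python
# Illustrative schema for a professional AI incident / near-miss record.
from dataclasses import dataclass, field
from datetime import date

@dataclass
class AIIncidentRecord:
    reported_on: date
    sector: str                  # "legal", "health", "finance", ...
    task: str                    # e.g., "appellate brief drafting"
    tool_category: str           # "generic chatbot", "RAG tool", ...
    error_type: str              # e.g., "fabricated citation"
    detected_before_harm: bool   # near miss vs. realized incident
    estimated_cost: float | None = None
    notes: str = ""
    tags: list[str] = field(default_factory=list)

near_miss = AIIncidentRecord(
    reported_on=date(2025, 3, 10),
    sector="legal",
    task="motion to dismiss",
    tool_category="generic chatbot",
    error_type="fabricated citation",
    detected_before_harm=True,
    tags=["caught-in-review"],
)
```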
8.2 Responsible Experimentation
Instead of universal solutions, it may be more useful, and perhaps even necessary, to bet on small pilots with rigorous measurement: intentional variation in protocols to generate comparative evidence, reinforced with careful documentation of what works in which contexts (and why). For example: a cluster design (by team), an A/B test with three arms (no AI / AI without RAG / AI + RAG), pre-specified outcomes (time, citation accuracy, retractions, near misses), defined testing horizons, internal pre-registration, and publication of negative results. These pilots can be anchored in the NIST GAI Profile (U.S. Department of Commerce, 2024).
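A minimal sketch of such a three-arm cluster pilot might look like the following; team identifiers, outcome numbers, and the balancing scheme are all assumptions for illustration, not a prescribed protocol.

```python
# Sketch of the three-arm cluster pilot: teams (clusters) are randomized to an
# arm, and outcomes are then summarized per arm. Illustrative data only.
import random
from statistics import mean

ARMS = ["no_ai", "ai_without_rag", "ai_with_rag"]

def assign_clusters(teams: list[str], seed: int = 7) -> dict[str, str]:
    """Randomly assign each team (cluster) to one arm, roughly balanced."""
    rng = random.Random(seed)
    shuffled = teams[:]
    rng.shuffle(shuffled)
    return {team: ARMS[i % len(ARMS)] for i, team in enumerate(shuffled)}

def summarize(outcomes: dict[str, dict[str, float]],
              assignment: dict[str, str]) -> dict[str, float]:
    """Mean citation accuracy per arm (one of the pre-registered outcomes)."""
    per_arm: dict[str, list[float]] = {arm: [] for arm in ARMS}
    for team, metrics in outcomes.items():
        per_arm[assignment[team]].append(metrics["citation_accuracy"])
    return {arm: mean(vals) if vals else float("nan") for arm, vals in per_arm.items()}

teams = [f"team_{i}" for i in range(9)]
assignment = assign_clusters(teams)
# A real pilot would also track time, retractions, and near misses per the plan.
outcomes = {t: {"citation_accuracy": random.uniform(0.85, 1.0)} for t in teams}
print(summarize(outcomes, assignment))
```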
8.3 Honesty About Limitations
Explicitly acknowledge what we don't know, what trade-offs we're making, what we're betting will turn out to be true, and when and how we'll review those bets. We need to acknowledge uncertainty instead of pretending we have answers, recognize that we're in uncharted territory, and act accordingly.
8.4 Experiment, Don't Theorize, and Share Failures
Instead of creating elaborate theoretical frameworks, perhaps we should try small interventions and measure results systematically. Instead of only publishing "successes," we should also document what doesn't work, and why. This strategy would help prevent the repetition of costly errors.
8.5 Accept Trade-offs
Instead of seeking perfect solutions, it would be appropriate to recognize that every option has costs and we should be more explicit about our choices—that is, what we prioritize, why we do it, and to what extent. Organizations and professionals can begin to apply, for example, a net value decision rule[3] and design verification thresholds by context: "in documents with legal citations, require 100% automatic verification against Westlaw/Lexis; in internal memos, use 20% sampling."
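As an illustration only, the sketch below encodes the net-value rule from footnote [3] and the context-specific verification thresholds mentioned above. Every number and context label is an assumption; the point is simply that the trade-off can be made explicit and auditable.

```python
# Hedged sketch: the net-value rule plus context-specific verification rates.
# All figures are invented for illustration.

def net_value(time_saved_hours: float, hourly_cost: float,
              error_probability: float, error_severity_cost: float) -> float:
    """Value of time saved minus expected error cost (positive favors relying on AI output)."""
    return time_saved_hours * hourly_cost - error_probability * error_severity_cost

# Context-specific verification sampling rates, as in the example in the text.
VERIFICATION_POLICY = {
    "filing_with_legal_citations": 1.00,  # verify 100% against primary sources
    "internal_memo": 0.20,                # 20% sampling
}

# Illustrative numbers: 3 hours saved at $200/h, 5% chance of an error costing
# $50,000 (sanctions, rework, reputational damage).
print(net_value(3, 200, 0.05, 50_000))   # -1900.0 -> negative: verify fully
print(net_value(3, 200, 0.01, 10_000))   # 500.0 -> positive: sampling may suffice
```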
Conclusion: Honesty as a Starting Point
The fundamental problem isn't that competent professionals are making stupid decisions with AI. It's that we're pretending to know what to do when we're improvising. The consequences of this dishonesty are: premature solutions that may be counterproductive, the transfer of responsibility to individual workers without systemic support, and the loss of opportunities to learn systematically.
The alternative is to admit that we're in a massive real-time experiment, design that experiment responsibly, and commit to changing course when evidence indicates it. This is less satisfying than proclaiming "best practices" or creating elegant theoretical frameworks. But it's more honest, and probably more useful in the long term, since the first rule for navigating unknown territory isn't to pretend you know the way—it's to admit you're exploring.
Bibliography
Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the dangers of stochastic parrots: Can language models be too big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (pp. 610-623). ACM. https://doi.org/10.1145/3442188.3445922
Booth, Robert. (June 6, 2025). "High court tells UK lawyers to stop misuse of AI after fake case-law citations". The Guardian. https://www.theguardian.com/technology/2025/jun/06/high-court-tells-uk-lawyers-to-urgently-stop-misuse-of-ai-in-legal-work
Hodge, Neil. (July 28, 2025). "Technology: UK judge warns lawyers about risks of AI use in court". International Bar Association. https://www.ibanet.org/Technology-UK-judge-warns-lawyers-about-risks-of-AI-use-in-court
McKinsey & Company. (2025, March 5). The state of AI: How organizations are rewiring to capture value (Global Survey report). https://www.mckinsey.com
Stanford Human-Centered Artificial Intelligence. (2025). Artificial Intelligence Index Report 2025. https://hai.stanford.edu
Tobin, Sam. (June 6, 2025). "Lawyers face sanctions for citing fake cases with AI, warns UK judge". Reuters. https://www.reuters.com/world/uk/lawyers-face-sanctions-citing-fake-cases-with-ai-warns-uk-judge-2025-06-06/
Tonmoy, S. M., Zaman, S. M., Jain, V., Rani, A., Rawte, V., Chadha, A., & Das, A. (2024). A comprehensive survey of hallucination mitigation techniques in large language models. arXiv preprint arXiv:2401.01313.
U.S. Department of Commerce. (2023). Artificial intelligence risk management framework (AI RMF 1.0) (NIST AI 100-1). National Institute of Standards and Technology. https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.100-1.pdf
U.S. Department of Commerce. (2024). Artificial intelligence risk management framework: Generative artificial intelligence profile (NIST AI 600-1). National Institute of Standards and Technology, Gaithersburg, MD.
Main Legal Cases
Mata v. Avianca:
Castel, P. K. (2023, June 22). Mata v. Avianca, Inc., No. 1:2022cv01461. U.S. District Court for the Southern District of New York. https://law.justia.com/cases/federal/district-courts/new-york/nysdce/1:2022cv01461/575368/54/
Cerullo, M. (2023, May 27). Lawyer apologizes for fake court citations from ChatGPT. CNN Business. https://www.cnn.com/2023/05/27/business/chat-gpt-avianca-mata-lawyers/index.html
Palazzolo, J. (2023, June 22). Judge sanctions lawyers for brief written by A.I. with fake citations. CNBC. https://www.cnbc.com/2023/06/22/judge-sanctions-lawyers-whose-ai-written-filing-contained-fake-citations.html
Seyfarth Shaw LLP. (2023, June 22). Update on the ChatGPT case: Counsel who submitted fake cases are sanctioned. Seyfarth Shaw Legal Updates. https://www.seyfarth.com/news-insights/update-on-the-chatgpt-case-counsel-who-submitted-fake-cases-are-sanctioned.html
Additional U.S. Cases
Other federal cases:
McLane Middleton. (2024, April 18). Fake news in court: Attorney sanctioned for citing fictitious case law generated by AI. McLane Middleton Insights. https://www.mclane.com/insights/fake-news-in-court-attorney-sanctioned-for-citing-fictitious-case-law-generated-by-ai/
Bloomberg Law. (2024, November 26). Lawyer sanctioned over AI-hallucinated case cites, quotations. https://news.bloomberglaw.com/litigation/lawyer-sanctioned-over-ai-hallucinated-case-cites-quotations
Tarm, M. (2023, December 30). Michael Cohen sent AI-generated fake legal cases to his lawyer. NPR. https://www.npr.org/2023/12/30/1222273745/michael-cohen-ai-fake-legal-cases
ABA Journal. (2024, February 26). No. 42 law firm by head count sanctioned over fake case citations generated by AI. ABA Journal. https://www.abajournal.com/news/article/no-42-law-firm-by-headcount-could-face-sanctions-over-fake-case-citations-generated-by-chatgpt
National Law Review. (2025, February 24). Lawyers sanctioned for citing AI-generated fake cases. National Law Review. https://natlawreview.com/article/lawyers-sanctioned-citing-ai-generated-fake-cases
Legal.io. (2025). Fake case citations land two attorneys in hot water over AI misuse. Legal.io. https://www.legal.io/articles/5609086/Fake-Case-Citations-Land-Two-Attorneys-in-Hot-Water-Over-AI-Misuse
International Cases
Canada:
MacLean Family Law. (2024, April 19). Stopping fake BC AI legal cases. MacLean Family Law. https://macleanfamilylaw.ca/2024/03/21/stopping-fake-bc-ai-legal-cases/
Canadian Lawyer. (2025, June 6). Why banning AI in court is the wrong fix for fake case citations. Canadian Lawyer. https://www.canadianlawyermag.com/news/opinion/why-banning-ai-in-court-is-the-wrong-fix-for-fake-case-citations/392576
LawNext. (2025, May 14). AI hallucinations strike again: Two more cases where lawyers face judicial wrath for fake citations. LawSites. https://www.lawnext.com/2025/05/ai-hallucinations-strike-again-two-more-cases-where-lawyers-face-judicial-wrath-for-fake-citations.html
United Kingdom:
Ha, A. (2025, June 9). Lawyers could face 'severe' penalties for fake AI-generated citations, UK court warns. TechCrunch. https://techcrunch.com/2025/06/07/lawyers-could-face-severe-penalties-for-fake-ai-generated-citations-uk-court-warns/
JURIST. (2025, June 10). UK judge warns lawyers of consequences for misusing AI in court filings. JURIST News. https://www.jurist.org/news/2025/06/uk-judge-warns-lawyers-of-consequences-for-misusing-ai-in-court-filings/
Legal Futures. (2025, July 3). Court issues stark warning to lawyers over AI-generated fake cases. Legal Futures. https://www.legalfutures.co.uk/latest-news/court-issues-stark-warning-to-lawyers-over-ai-generated-fake-cases
International Bar Association. (2025). Technology: UK judge warns lawyers about risks of AI use in court. International Bar Association. https://www.ibanet.org/Technology-UK-judge-warns-lawyers-about-risks-of-AI-use-in-court
Computing. (2025). UK judge warns of justice risks as lawyers cite fake AI-generated cases in court. Computing. https://www.computing.co.uk/news/2025/ai/judge-warns-of-justice-risks-from-ai
Technical Sources on AI Design
MIT Sloan Teaching & Learning Technologies. (2023, August 30). When AI gets it wrong: Addressing AI hallucinations and bias. MIT Sloan EdTech. https://mitsloanedtech.mit.edu/ai/basics/addressing-ai-hallucinations-and-bias/
OpenAI. (n.d.). Does ChatGPT tell the truth? OpenAI Help Center. https://help.openai.com/en/articles/8313428-does-chatgpt-tell-the-truth
Smith, C. S. (2023, March 29). Hallucinations could blunt ChatGPT's success. IEEE Spectrum. https://spectrum.ieee.org/ai-hallucination
Techdirt. (2025, April 29). The hallucinating ChatGPT presidency. Techdirt. https://www.techdirt.com/2025/04/29/the-hallucinating-chatgpt-presidency/
Axios. (2025, June 4). Why hallucinations in ChatGPT, Claude, Gemini still plague AI. Axios. https://www.axios.com/2025/06/04/fixing-ai-hallucinations
Zapier. (2024, July 10). What are AI hallucinations---and how do you prevent them? Zapier Blog. https://zapier.com/blog/ai-hallucinations/
Studies on AI Error Rates
Emsley, R. (2023). ChatGPT: These are not hallucinations -- they're fabrications and falsifications. Schizophrenia, 9, 52. https://doi.org/10.1038/s41537-023-00379-4
Legal and Professional Analysis
Association of Corporate Counsel. (n.d.). Practical lessons from the attorney AI missteps in Mata v. Avianca. ACC Resource Library. https://www.acc.com/resource-library/practical-lessons-attorney-ai-missteps-mata-v-avianca
American Bar Association. (2024, September). Common issues that arise in AI sanction jurisprudence and how the federal judiciary has responded to prevent them. Business Law Today. https://www.americanbar.org/groups/business_law/resources/business-law-today/2024-september/common-issues-arise-ai-sanction-jurisprudence/
National Law Review. (2024). Court slams lawyers for AI-generated fake citations. National Law Review. https://natlawreview.com/article/court-slams-lawyers-ai-generated-fake-citations
Attorneys Schwartz and LoDuca had decades of experience. They knew that false citations are career-ending. They knew verification tools existed (Shepard's, KeyCite). So why did they entrust something so verifiable and critical to a tool they clearly didn't understand? One can only speculate, and we should: Did they think ChatGPT was "Google but better"? Were they under extreme time pressure? Had they never used AI tools before and not understood how they work? Was it a minor case where they thought they could "get away with it"? Perhaps in the future we'll learn more about their real motivations, their thought process, and the alternatives they considered. ↩︎
For example, organizations could implement risk-based controls immediately ("Monday-ready" measures). For tasks with high error costs (health, legal, finance), prohibit the use of generalist chatbots for documents delivered to third parties and allow only tools with automatic verification and links to primary sources. ↩︎
Where Net Value = (Time Saved × Hourly Cost of Review) − (Error Probability × Error Severity); relying on AI output without additional verification only pays off when this value is positive. ↩︎