I Genuinely Don't Understand Why People Tolerate Hallucinations in AI

It’s 2026, and the AI hype cycle is reaching desperate levels. There are clear cracks in the foundation, and one of those cracks is hallucination. I can’t fathom how people are still putting up with it.

What Happens When You Search for Yourself With AI
Some Models Are Bad, Some Are Okay
Hallucinations in Code Reviews
Hallucinations Are Innate

What Happens When You Search for Yourself With AI

One of the things I criticize generative AI for most heavily is hallucination. If you’re an expert in a topic, you see it all the time: the AI gives you a surface-level description, often with mistakes. In contrast, if you’re not an expert, every response looks great.

I recently heard this phenomenon described in a video (unfortunately, I couldn’t find the link, but I think it was Mo Bitar). It talked about how people will say things like, “AI is really good at back-end code but terrible at front-end code.” If you happen to be bad at back-end code but good at front-end code, you might come to exactly that conclusion. It’s not the ability of the model but your own ability that dictates how good the responses seem.

So, my thought was: why don’t I look myself up with the popular LLMs? I’m an expert in my own life, so I’ll be able to easily spot the mistakes. Here’s what happens when I drop my last name into ChatGPT:

A screenshot of ChatGPT failing to identify me.

You might not know this, but my last name is (for the time being) entirely unique. Only three people have it: my wife, my daughter, and me. If you search for the word “grifski” in a search engine, you will only find information about us. Frankly, it’s the biggest downside of making up our own last name.

Yet, if I ask ChatGPT about me, it makes up a completely fictional person. Sure, I just dumped my last name in, but I can do the same with DuckDuckGo and get significantly more relevant information.

What I find so strange about this is that the response presents “Adam Grifski” as a link. When I click it, ChatGPT generates a completely fictional profile of a person named “Adam Grifski.” We’re in 2026, by the way. I’m not making fun of models from the early days circa 2022. Today’s models are supposed to be significantly improved and more polished than the ones we used to dunk on for telling people to eat glue.

Some Models Are Bad, Some Are Okay

Frankly, I’m fascinated by this. It’s especially interesting because all of the free models do this (I feel obligated to point out that these are the free models because there’s always some guy who’s like, “you’re not using the $200/month models.” Do y’all hear yourselves?). I opened Gemini, and I was greeted with yet another fictional character: Dennis Grifski (Griffith?).

A screenshot of Gemini failing to identify me.

This one at least describes a person much closer to me, albeit an evil version of me. It then went on to make up a GitHub URL and a personality for this nonexistent Dennis character.

Somehow, Copilot was able to find me, but even then, the description is insane:

A screenshot of Copilot correctly identifying me with very strange descriptions.

Like, all of this info is more or less accurate, but it’s presented in such an odd way. For example, it identifies my wife and me but only describes us vaguely. It claims I’m the most prominent Grifski because I wrote an article about the name change, not because I’m the only Grifski with an online presence. It also draws strange comparisons between me and brands like Grishko.

Unfortunately, if you want to test Claude, you have to create an account with your phone number. Naturally, I took one for the team to get the following slop:

A screenshot of Claude correctly aggregating data about myself.

Like Copilot, it successfully identified me. In this case, it just straight up ripped my profile from my job. It then stole my profile from my website, basically bar for bar, and did the same with my ACM profile. Of all the models, Claude was the most accurate, but it basically just did search engine aggregation. I don’t find this kind of stuff impressive.

Hallucinations in Code Reviews

While it’s fun to watch these models hallucinate information about me, I think it’s a lot more sinister when you aren’t an expert. For example, GitHub “recently” added Copilot to its pull request workflow. Now, you can request a code review from Copilot in lieu of an actual person. While I’m not sure I fully hate this yet, it’s led to some really absurd experiences for me.

See, I give students a “portfolio project.” I think I’ve written about it elsewhere on the site, but basically, students start building out their GitHub profile by developing a software component. As part of the assessment, students are required to submit pull requests, which I use as an opportunity for code review.

This semester is the first time students have requested AI reviews of their pull requests before I’ve had a chance to grade them. To be honest, I actually enjoy it because I get a chance to dunk on the bot’s comments. For example, here’s some feedback one of my students got from Copilot on their pull request:

Prefer using assertTrue(…)/assertFalse(…) instead of assertEquals(true/false, …) so failures produce clearer intent and better messages in JUnit output.

I found this feedback kind of absurd. In our classes, we generally discourage students from using assertTrue() and assertFalse() precisely because their default failure messages are bad. Here’s how the JUnit 4 documentation describes assertFalse():

Asserts that a condition is false. If it isn’t it throws an AssertionError without a message.

If you use assertEquals(), you will at least get an error message that reads something like “expected true but was false.” That said, perhaps this has improved in more recent versions of JUnit, since we’re still using JUnit 4. Also, if anything, the critique should be: “use custom messages in your test cases, so it’s clear what went wrong.” Or, just change your workflow to match what folks discussed here.

But, in any case, Copilot shouldn’t be giving advice on things it lacks context for (e.g., assuming JUnit 5 when we’re using JUnit 4). This is like when you ask for advice on how to do something on Stack Overflow, and half the comments are like “don’t.” Like, thanks man. That’s really helpful stuff.

What I find particularly baffling, though, is that the bot is giving what I would call “soft” advice. There isn’t an error in the code. There are no mistakes or potential bugs. It’s just straight up giving a style suggestion, which makes it seem like the bot has a personal preference or something.

If this doesn’t seem odd to you, what the hell does this look like when the code is also AI-generated? “Hey, Claude. I love what you did here, but you can really show clearer intent with this other method.” What are we doing here?

Hallucinations Are Innate

I think the thing that drives me nuts is that hallucination, as far as I can tell, is an unsolvable problem. If you build “intelligence” around the idea of statistical modeling and neural nets, you’re going to get a very fancy regression machine. That means for every input there is some predicted output, even if the predicted output is a complete guess.

It’s not like we’re talking about a massive lookup table. There is no mapping of questions to answers. There is only the prediction of outputs from inputs. While there may be ways to mitigate this issue, the underlying problem is unavoidable.
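To make the lookup-table contrast concrete, here’s a deliberately tiny Java toy. To be clear, this is not how an LLM works. The hypothetical `LookupVsRegression` class just pairs a `Map`, which can simply have no entry for a question, with a straight line fit by least squares to three made-up points. That line is a stand-in for the “fancy regression machine.” The only point is that the fitted model hands back a number for any input, whether or not anything in its data supports that answer.

```java
import java.util.Map;
import java.util.Optional;

// Toy contrast between a lookup table and a regression model.
// Hypothetical illustration only; this is NOT how an LLM is implemented.
public class LookupVsRegression {

    // A lookup table maps known questions to stored answers and nothing else.
    static final Map<String, String> TABLE = Map.of(
            "capital of France", "Paris",
            "2 + 2", "4");

    // Returns an empty Optional when there is no stored answer.
    static Optional<String> lookup(String question) {
        return Optional.ofNullable(TABLE.get(question));
    }

    // Fits y = slope * x + intercept by ordinary least squares.
    // Assumes xs and ys have the same length and at least two distinct x values.
    static double[] fit(double[] xs, double[] ys) {
        int n = xs.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += xs[i];
            meanY += ys[i];
        }
        meanX /= n;
        meanY /= n;

        // Unnormalized covariance and variance sums; their ratio is the slope.
        double sxy = 0.0;
        double sxx = 0.0;
        for (int i = 0; i < n; i++) {
            sxy += (xs[i] - meanX) * (ys[i] - meanY);
            sxx += (xs[i] - meanX) * (xs[i] - meanX);
        }
        double slope = sxy / sxx;
        double intercept = meanY - slope * meanX;
        return new double[] {slope, intercept};
    }

    // The fitted line produces a prediction for ANY input, even far outside the data.
    static double predict(double[] model, double x) {
        return model[0] * x + model[1];
    }

    public static void main(String[] args) {
        // The table has no entry, so it has no answer to give.
        System.out.println(lookup("Grifski").isPresent());

        // Three made-up training points that lie on the line y = 2x.
        double[] model = fit(new double[] {1, 2, 3}, new double[] {2, 4, 6});

        // x = 1000 is nowhere near the training data, but we still get a confident number.
        System.out.println(predict(model, 1000));
    }
}
```

Running `main` prints `false` for the missing lookup and `2000.0` for the extrapolated prediction. The table can come up empty; the regression never says “I don’t know.” (`Map.of` assumes Java 9 or later.)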

My favorite description I’ve heard is that the models are essentially lossy compression algorithms. They store a massive amount of information in a tiny footprint, but the cost of that compression is a loss of information. At best, these models can only partially reconstruct their training data, and most folks would prefer they didn’t do that anyway (i.e., overfitting is bad).

Yet, they’re just good enough to trick a layman into trusting their outputs. It’s why the average person is so fascinated by them: they give very convincing answers, even when those answers are wrong. But it’s precisely because they hallucinate so often that I just refuse to tolerate them. Like, who’s to say that the entirety of their output isn’t a hallucination that happens to get some things right?

And ironically, the solution everyone gives is, “well, you just have to verify the output.” My brother in Christ: why wouldn’t I just search for that information in the first place? Besides, I don’t even trust people to do the manual verification. You know people are just taking whatever the chatbot says as gospel. I wouldn’t be surprised if I lost out on future job prospects because HR has an AI tool that looks up applicants, and mine just says “most prominent person with last name; makes puzzles; loves neovim.”

What I find really funny is that the solution to this problem is apparently agentic AI. In other words, spin up like 10 models simultaneously, have them play roles, and autonomously deliberate. Perhaps after 30 iterations of the query “Grifski,” the models can come to some consensus. Talk about an expensive way to do something that my toddler could do. Insert “look what they need to mimic a fraction of our power” meme.

Anyway, that’s enough shitposting for a minute. I’ll call it here. As usual, here are some related pieces if you found this one enjoyable:

In addition, I’d love it if you ran over to my list of ways to grow the site. There, you’ll find links to the newsletter, Patreon, and more. Otherwise, take care!

The Hater's Guide to Generative AI (24 Articles)—Series Navigation

As a self-described hater of generative AI, I figured I might as well group all my related articles into one series. In the earlier articles, I share why I’m skeptical of generative AI as a technology. Later, I share more direct critiques. Feel free to follow along for the ride.
