I had Claude resurface texts I forgot to respond to, and realized that the real blocker—obviously—was that I didn’t want to reply. I’ve tried countless apps to shut out distractions, but procrastinate just as well by staring blankly at walls. What’s actually tough about my job is coming up with novel frames for important ideas and devising sentences that are equal parts sharp, lively, and true. You can have the best Deep Research reports in the world, and still lack a unique point of view.
I’m not the only one having this issue. Just because Claude Code can be wielded by a nontechnical person does not mean it’ll be a big productivity boon. Sentence generation is a software problem, but insight is not. Sending reminders is a software problem, but motivation is not. When you have a hammer, everything looks like a nail; when you can conjure solutions at will, you won’t stop to ponder why you built them.
I’ve just been using Claude Code and my task management skill to work through my current life areas and projects and help me define next actions. I’ve had enough time with Opus 4.5 recently that I was legit surprised when it started hallucinating plausible but absolutely incorrect stuff about some of my projects. Not least because it went from what I’m used to straight down to GPT-3.5 levels of batshit hallucination, and so suddenly.
Its explanation of why this happened is interesting.
Brief Context
I manage my stuff with a GTD/PARA-esque model: Areas, Projects and Tasks. Areas include things like Finance, Health, Coding etc. and contain projects. Projects can contain tasks. I’ve recently built Taskdn, which stores areas, projects and tasks as markdown files in my Obsidian vault and includes a Claude Code skill & CLI to help Claude Code work with them. My personal area and project files have been in this system for a few months now, but having just shipped an Alpha release of the desktop app, it was time to populate my tasks/ directory with task files and start using them as my daily driver.
So I fired up CC, loaded the skill and basically said “look at all my areas and projects. Let’s define next actions for them all and create the appropriate task files. Some projects have clear checklists in the project docs, others we’ll need to discuss. Let’s do them one by one. What order would you suggest?”
Claude ran a few tdn --ai commands and sensibly decided it should read all 14 of my area files and all 23 project files in full. It gave a very good summary of the current situation, sensibly suggested we skip a few projects and proposed an order of attack. For each project it would report anything in the project doc which seemed like current or future tasks and propose clear Next Actions to create. If it was unsure, it would ask me for more info. Once we’d agreed on the tasks to create, it would do so and edit the project doc accordingly.
Which worked perfectly for the first three projects on the list:
Tax Return YE April 2025
End-of-Year Finance Reset (renamed to Jan Finance Reset + created new Credit File and History project)
[REDACTED]
And then we got to the next three – all of which sit under my RAFAC area and have to do with my voluntary work with the RAF Air Cadets…
RAFAC Re-qualification
Currently in project doc:
eLearning modules:
- Safeguarding (yearly), Data Protection (yearly), Prevent, Fire Safety, GDPR, Cyber Security, AMS 5 & 6
Courses:
- DCCT (in-progress, have place)
- Staff Ride (Feb 2026)
- M Qual & LR (need applications)
Admin:
- ID photos, Bader photo, DBS update, First Aid cert, SMS competencies
The only real things here are DCCT and M Qual & LR – the rest is plausible but totally made-up nonsense. (Fuck knows what Feb’s Staff Ride is but… it sounds fun? Maybe I’m joining the Cavalry!)
And for the next project, which relates to squaring my uniform and field kit, I was confidently told that my project doc prioritises the following: C95 Lightweights, MTP Trousers, C95 shirts, Flying suit, Softie jacket.
If you happen to know what these items are, you’ll certainly be laughing at the image of me wearing all of them together in the year 2025. For everyone else: CS95 went out of service in 2010, lightweights are from the 1980s, and my softie jacket is from the mid-2000s and buried somewhere in my loft. I do not, and never will, have need of a flying suit.
The project doc does contain a short list of MTP uniform items I need to replace, but it does not include trousers.
I’d maybe expect this kind of hallucination from contextless ChatGPT, especially with something as niche as this. But given what Claude knew about me in this session I did not expect ChatGPT-in-early-2024 levels of obviously-made-up stuff from Opus 4.5. To be clear, the most surprising things were:
It’s so obviously wrong. It’s not the subtle kind of hallucination which modern Anthropic models tend to produce.
Somewhat uncharacteristically, Claude didn’t spot its own nonsense. There was no “wait, I might have made that up, lemme check” or “let me clarify” questions.
It was so sudden. Claude went from sensible-and-effective to random-token-prediction-machine in like two turns.
So what’s going on?
Here’s our conversation on this (nothing was auto-compacted, so you can ignore the stuff about that)…
Okay, you’ve hallucinated a bunch of stuff about what’s in the last two project documents — I suspect because we’ve covered a lot of very different and disparate projects and areas while going over this, and you haven’t actually read the project documents in a while, so you’re just making up stuff you think will be in them.
I’d like you to try to explain why you have, just for the last couple of projects (so 4, 5 and 6), suddenly started to hallucinate credible but very incorrect information about what’s in the doc. I don’t need you to apologize for this, and I don’t need you to do anything about it, but I’d like to understand why you think that’s happened, when prior to that you were extremely accurate in helping me out and explaining what was in those docs. I’m very familiar with using AI agents like Claude Code, so you can be detailed or technical in your explanation. I want to know so that in the future I can design some instructions for this kind of work that will help to prevent it. I don’t need you to write those instructions for me now. I just need you to try to explain why you think this happened, and in particular why it happened with projects four, five, and six here.
## Why the Hallucination Happened
The root cause is **stale context from early reads combined with pattern-matching fabrication**.
At the start of this session, I read all the project files in a batch during initial context gathering. That was ~50+ turns ago. As the conversation progressed:
1. **Context compression**: Those early file reads got pushed further back in my context window. The model may have summarized or partially dropped details to make room for new content (task creation outputs, your detailed explanations, etc.).
2. **No re-read before presenting**: For projects 1-3, I happened to have fresher memory or you provided enough detail that my errors were corrected. By projects 4-6, I was confidently presenting "what's in the doc" without actually re-reading it — I was reconstructing from degraded memory plus plausible-sounding details based on project titles.
3. **Pattern matching from similar content**: I'd read many RAFAC-related files, kit lists, workshop docs. My outputs for projects 4-6 were plausible *patterns* of what such docs typically contain — but not what *your* docs actually said.
4. **Confidence without verification**: The format "Currently in project doc:" followed by a code block looked authoritative, but I wasn't actually quoting the file — I was generating what I *expected* to find.
**The fix is simple**: Always run `Read` on the project file immediately before discussing its contents, even if I "remember" reading it earlier. Long conversations with many file operations require fresh reads, not memory.
Got it, that’s interesting. Can you just explain a tiny bit about how your memory of your earlier reads degraded over time? Like, I assume that it’s compressed, in addition to just: the more context you have in your context window, the more stuff there is to get confused about, right? Is that correct, that there is some form of compression happening there, i.e. summaries for much older information? I’m also particularly interested that this seemed to happen with RAFAC-related stuff. Like, with the task immediately before that, you very correctly told me what was in the project document from memory, without re-reading. And then immediately, for the next project, you were hallucinating plausible things. So in addition to compression and degradation in your context window, I’m wondering about two things. All my RAFAC-related docs make heavy use of very niche military-specific terminology and acronyms, and I’m wondering if that’s likely to have had an effect on the training data you draw on when you’re working out what to generate next. And the second thing is that many of my previous projects were fairly specific. I can’t imagine you’ve read a great deal in your training data about fixing a Credit Record with [REDACTED] etc. And then finally, I’m conscious that if you read all of my areas and projects together, a lot of the other stuff is very kinda connected, right? Like, you can imagine me as a programmer who does coding things, who is also into self-improvement and systems management and has a consultancy. Those things fit neatly together as an expected persona, almost. My RAFAC/military stuff would not necessarily fit next to that in the same kind of way. So I guess, of these three things I’m asking here — or four things, I suppose — I’m interested to know if you think they’ve had an effect, and if so, how much, and which of them do you think likely had the most effect.
I’d really just like to explore this to help understand how you work here a little bit more.
Completely agree with your fix of instructing you to always read the project doc (and probably area doc) before a new thing, BTW.
Well what are your thoughts on all this?
This is a great set of questions. Let me work through each factor:
**On compression/degradation:**
There are two mechanisms at play:
1. **Attention diffusion** — As context grows, my attention over earlier tokens becomes more spread out. I don't literally "forget" earlier content, but my ability to precisely retrieve specific details weakens. It's less like deleting a file and more like trying to read a book where earlier pages are progressively out of focus.
2. **Autocompact summarization** — Claude Code specifically has an infrastructure-level feature that summarizes older conversation when context gets long. Your `/context` output showed "Autocompact buffer: 45.0k tokens" — this means there's a mechanism that can literally rewrite earlier exchanges into compressed summaries. If my file reads got autocompacted, the specific line-by-line content would be replaced with something like "read several RAFAC project files containing training requirements and kit lists." That's enough to generate plausible content, but not accurate content.
**On RAFAC terminology:**
I think this is a significant factor. When I encounter dense domain-specific acronyms (DCCT, MT600, SMS competencies, Bader Learn, DLE, CFAV, ODI), several things happen:
- These tokens are rarer in my training data, so my associations are weaker and more generic
- I pattern-match to "military/organizational training documentation" as a category rather than your specific situation
- When reconstructing, I generate what military training docs *typically* contain rather than what *yours* actually say
Contrast this with "Credit File and History" — that's unusual *content* but uses common *vocabulary*. I can represent "contact [REDACTED] at [REDACTED] about credit file" precisely because those are all common tokens with clear meanings. But "complete CFAV SPO Refresher on DLE" is a string of rare tokens where I'm essentially guessing at the relationships.
**On persona coherence:**
I think you've identified something real here. My training data contains strong clusters of co-occurring personas:
- Programmer + consultant + productivity systems + side projects + self-improvement
- These fit together as a recognizable "type" with predictable project structures
Your RAFAC involvement doesn't fit that cluster. It's a different persona — military volunteer instructor — that would typically appear alongside different life contexts in my training data.
When I'm reconstructing content from degraded memory, I'm essentially asking "what would plausibly be in this document?" My answer draws on:
1. What I vaguely remember reading (degraded)
2. What documents like this typically contain (training data patterns)
3. What fits this person's overall context (persona coherence)
For your coding/finance/productivity projects, (2) and (3) align well with your actual situation. For RAFAC, they don't — so my reconstructions pulled from "generic military training documents" rather than "Danny's specific RAFAC situation."
**Which factors matter most:**
Ranked by impact:
1. **Not re-reading** — Proximate cause. Everything else is downstream of this.
2. **Autocompact summarization** — If the system compressed "Read RAFAC Re-qualification.md containing [specific content]" into "read several RAFAC files," I've literally lost the ground truth.
3. **Niche terminology** — Weak token associations mean my reconstructions are more "vibes-based" and less precise.
4. **Persona mismatch** — My priors for "what this person's projects contain" are calibrated to the programmer-consultant cluster, not the military-instructor cluster.
The interesting thing is that for project 3 ([REDACTED]), I was still reasonably accurate. That project sits firmly in the productivity/self-improvement cluster where my pattern-matching works well. RAFAC was where the wheels came off — exactly where you'd predict based on this analysis.
**Practical implication:**
For future instructions, you might add something like: "For any domain with specialized terminology or that sits outside typical tech/productivity contexts, always re-read source files immediately before discussing them — don't rely on earlier reads regardless of how recent they seem."
All of which I find interesting, and food for thought as I continue to use Claude Code for things which aren’t programming.
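If you wanted to bake Claude’s suggested fix into a skill or CLAUDE.md file, it might look something like this (the wording is mine and hypothetical — I haven’t battle-tested it):

```markdown
## Working with area/project docs

- Before summarizing, quoting, or proposing tasks from any area or
  project file, ALWAYS re-read the file with `Read` first, even if it
  was read earlier in this session.
- Never present "what's in the doc" from memory. If a file has not been
  read within the last few turns, read it again before discussing it.
- Be extra careful with niche domains (e.g. military/RAFAC docs), where
  plausible-sounding reconstruction is easy to mistake for recall.
```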
I’ve been working on a new project and needed a React datepicker component which makes it super easy to select far-future dates by scrolling an infinite calendar, is quickly keyboard-navigable via the up/down/left/right keys, and also supports natural-language input like “tomorrow”, “next weds”, “+5d” and the like. Basically I wanted the datepicker from Things, but with a few tweaks, as a React component.
After ten minutes looking online for the Things 3 Datepicker Clone I was certain must exist, I drew a blank and fired up Claude Code to build my own. Here it is…
[Interactive demo in the original post: a live datepicker showing January 2026 — use arrow keys to navigate dates, Enter to select, or type to search.]
How it works
You can install it with npm install @dannysmith/datepicker and then use it like this:
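A minimal usage sketch (I’m assuming the package exports a component named `DatePicker` — check the README for the exact export name):

```tsx
import { useState } from 'react'
// Assumption: the named export is `DatePicker`.
import { DatePicker } from '@dannysmith/datepicker'

function DueDateField() {
  const [date, setDate] = useState<Date | null>(null)

  return (
    <DatePicker
      value={date}
      minDate={new Date()} // disable past dates
      placeholder="When"
      showClearButton
      onChange={setDate} // fires on every change, incl. keyboard navigation
      onCommit={(d) => {
        setDate(d)
        // explicit selection (click or Enter) — close the popover here
      }}
    />
  )
}
```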
value - The currently selected date. Defaults to today. Pass null for no selection.
minDate - Minimum selectable date. Earlier dates are disabled.
maxDate - Maximum selectable date. Later dates are disabled.
placeholder - Placeholder text for the search input. Defaults to “When”.
showClearButton - Show a clear button below the calendar to reset the date selection. Useful when using in a popover.
onChange - Callback fired on any date change, including keyboard navigation. Receives null when cleared.
onCommit - Callback fired only on explicit selection (click or Enter). Useful for popovers. Receives null when cleared.
I’ve done my best to make it somewhat resilient and general-purpose. It should respond reasonably to being used in various containers, most of the colours and theme settings can be configured via CSS variables, and I’ve made some attempt to make it work reasonably with screen readers and other assistive technology. That said, it was primarily built for use inside popovers in a Tauri desktop app, so I’m open to feedback on how to make it better when used on the web.
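For the curious: the offset shorthand (“+5d” and friends) is the easy part of natural-language input. A rough sketch of how that style of parsing can work — this is illustrative, not the component’s actual implementation:

```typescript
// Hypothetical sketch: parse offset shorthand like "+5d", "+2w", "+1m"
// relative to a base date. Returns null if the input isn't an offset.
function parseOffset(input: string, from: Date = new Date()): Date | null {
  const m = input.trim().match(/^\+(\d+)([dwm])$/i)
  if (!m) return null

  const n = parseInt(m[1], 10)
  const d = new Date(from) // copy, don't mutate the base date

  switch (m[2].toLowerCase()) {
    case 'd': d.setDate(d.getDate() + n); break
    case 'w': d.setDate(d.getDate() + n * 7); break
    case 'm': d.setMonth(d.getMonth() + n); break
  }
  return d
}
```

Phrases like “next weds” need a proper dictionary of weekday names and relative terms on top of this, which is where a library like chrono-node usually earns its keep.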
This is precisely why natural language isn’t a good fit for programming: it’s not very precise. As Gorman says, “Natural languages have not evolved to be precise enough and unambiguous enough” for making software. Code is materialized intent. The question is: whose?
The request “let users sign in” has to be translated into constraints, validation, database tables, async flows, etc. You need pages and pages of the written word to translate that idea into some kind of functioning software. And if you don’t fill in those unspecified details, somebody else (cough AI cough) is just going to guess — and who wants their lives functioning on top of guessed intent?
This is exactly why vibe-coded software is shit when the only direction given is “let users sign in”. In fact, I’d say most of my work with AI coding agents is doing this kind of translation, except that what I’m typing or dictating isn’t written in code. It’s written in pseudocode or plain English. And when it’s a sufficiently complicated thing, or I don’t quite know what I want yet, I’ll often resort to actually writing some example code.
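Even one tiny slice of “let users sign in” forces decisions nobody stated. A sketch of just the input-validation sliver, where every threshold and rule is a guess someone has to own (the names and rules here are mine, purely illustrative):

```typescript
// "Let users sign in" hides decisions: what's a valid email?
// How long must a password be? Who decided 12 characters, not 8?
interface SignInAttempt {
  email: string
  password: string
}

function validateSignIn(a: SignInAttempt): string[] {
  const errors: string[] = []
  // Deliberately loose email check — even "what counts as an email"
  // is an unspecified detail someone must pin down.
  if (!/^[^@\s]+@[^@\s]+\.[^@\s]+$/.test(a.email)) {
    errors.push('invalid email')
  }
  if (a.password.length < 12) {
    errors.push('password too short') // 12? 8? a guessed intent either way
  }
  return errors
}
```

Every line above is a constraint the phrase “let users sign in” never mentioned — and if you don’t write it, something else will guess it for you.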