In Praise of Memory
Offloading must follow internalization
Everything we see, hear, and think about is critically dependent on and influenced by our long-term memory.
Kirschner, Sweller, & Clark (2006)
When is it appropriate for a student to use an LLM? It’s a hotly debated question, and not one I expect to resolve here. It does, however, force open quite a few issues and assumptions, and we’ll take up one of them today: AI as a tool for cognitive offloading (a means of displacing thinking tasks onto external systems) reveals something about our assumptions regarding the role of memory in education.
For decades, dominant pedagogical models in American public education have downplayed the role of memorization. In particular, Constructivist approaches1, however accurately they describe learning as a process, do not yield effective pedagogical strategies, because they ignore human cognitive architecture (Kirschner et al., 2006; Kirschner et al., 2004). Constructivist models presume that novices can learn like experts through inquiry and discovery, overlooking the limited capacity of working memory and the essential role of prior knowledge (this is the expertise reversal effect: minimally guided methods benefit only those who already possess substantial prior knowledge). Still, the aims were always well-intentioned. We de-emphasized factual knowledge only in order to elevate skills (even though they are two sides of the same coin). We had lofty goals to teach students to “learn how to learn,” to “think critically,” and to become researchers of their own understanding. The teacher in these models was, to use an old adage, a guide on the side rather than a sage on the stage.
The common refrain goes: why commit something to memory when a search bar, chatbot, or AI tutor can retrieve it in seconds? The logic is seductive: free up working memory for “higher-order” thinking. Seductive, yes, but built on a misunderstanding of how the brain works and how humans learn.
Let’s state it up front: you may not agree with what follows. In fact, I hope some of you don’t. Even so, I’ll build the argument piece by piece, and along the way I invite you to note where you diverge; that divergence can open up an interesting conversation.
Here’s my thesis: Declarative knowledge is the substrate for higher-order thinking. You cannot evaluate what you do not understand. You cannot synthesize ideas that were never explained. Moreover, we cannot effortlessly assimilate declarative academic knowledge. Knowledge and competence do not spread via osmosis. We require deliberate practice and repeated retrieval.
Constantly looking things up instead of internalizing them results in shallow fluency and fragile understanding2. This implies that, despite an abundance of external data and information (constantly available via Google, Wikipedia, and ChatGPT), insight and wisdom still emerge from internal knowledge. Underuse of the brain’s declarative3 and procedural4 memory systems undermines reasoning, impedes learning, and diminishes productivity in novel problem-solving. When learners chronically offload information (definitions, examples, methods), they fail to form the durable memory traces that underlie comprehension and transfer. The issue, to be clear, is not offloading per se, but premature offloading, before memory traces are formed. Domain expertise necessarily means command of a set of facts core to the discipline; once committed to long-term memory, those facts support problem-solving within the domain by freeing up working memory capacity (Deans for Impact, 2015). Moreover, looking something up is not a free lunch: it incurs a cognitive cost, involves task switching, and requires reconciling what we just looked up with the argument or task at hand. Knowledge committed to memory carries a contextual awareness that no Google search or zero-shot prompt provides.
I am, therefore, not advocating for taking a bunch of disconnected facts and drilling them in by rote. I understand the need for simple frameworks and mental models as scaffolding on which to hang information and knowledge. Yet it is through the very processes of retrieval, spacing, and interleaving that we build the factual bedrock necessary for constructing mental models. We can’t simply short-circuit the effortful process of learning and jump to higher-order thinking. Experiences leave traces that shape perception, guide attention, and influence judgment. The twinge we feel when we see an incorrect instance of something we know to be true tells us as much. We do not always know that we know something, but our ability to think well often depends on knowledge that has quietly become automatic. This is a transfer from declarative to procedural knowledge through practice, retrieval, and application.
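To make the mechanics of spacing and retrieval concrete, here is a minimal sketch of a Leitner-style review scheduler. This is my own illustration, not a tool or method referenced above; the box count and intervals are assumptions chosen for clarity.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

# Review intervals (in days) per Leitner box: frequent at first,
# roughly doubling as recall becomes more reliable.
INTERVALS = [1, 2, 4, 8, 16]

@dataclass
class Card:
    prompt: str
    answer: str
    box: int = 0
    due: date = field(default_factory=date.today)

def review(card: Card, recalled: bool, today: date) -> None:
    """Promote the card one box on successful recall; send it back
    to daily review on a lapse, then reschedule it."""
    card.box = min(card.box + 1, len(INTERVALS) - 1) if recalled else 0
    card.due = today + timedelta(days=INTERVALS[card.box])

def due_cards(deck: list[Card], today: date) -> list[Card]:
    """Retrieval practice only touches cards whose interval has elapsed."""
    return [c for c in deck if c.due <= today]
```

Note that interleaving falls out almost for free: because different cards come due on different days, each session naturally mixes topics rather than blocking them.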
When language models serve as permanent proxies for memory, they erode the very capacities they aim to support. The student who never memorizes anything becomes fluent only in searching. But search fluency is not mastery. It’s a simulation of knowing, not knowing itself. In this sense, internal knowledge enables us to filter out noise, detect patterns, ask meaningful questions, and identify contradictions. Without it, there is no ground on which to build higher thought.
Offloading, again, is not the enemy. Used judiciously, it extends our capabilities. But used indiscriminately, it displaces them. The paradox is not that we use tools to help us think. It’s that we begin to mistake the tools for the thinking itself.
Even if you can retrieve a fact, that retrieval is not the same as integration. Knowing that 12 × 5 = 60 is different from seeing that fact as part of a larger schema of multiplication, number sense, and problem solving. A language model might provide an elegant answer. However, the human learner must still possess the cognitive structure to make sense of it, remember it, and apply it.
That said, not all uses of AI displace memory. There are important and growing exceptions. When used to generate low-stakes quizzes, summarize key ideas for later review, or support retrieval practice, LLMs can enhance memory formation rather than hinder it. For example, a student using GPT to self-test on key terms from a biology unit—then correcting and explaining their errors—is engaging in exactly the kind of elaborative retrieval that supports durable learning5.
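As a rough sketch of what that self-testing loop might look like in practice: the ask_llm function below is a hypothetical stand-in for whichever model API you use, not any particular product’s interface, and the quiz flow is my own illustration.

```python
def ask_llm(prompt: str) -> str:
    # Hypothetical stand-in: wire this up to your chat model of choice.
    raise NotImplementedError

def quiz_on(terms: list[str]) -> None:
    for term in terms:
        question = ask_llm(
            f"Write one short, low-stakes quiz question testing recall "
            f"of the term '{term}'. Return only the question."
        )
        # The learner answers from memory first...
        attempt = input(f"{question}\nYour answer: ")
        # ...and only then sees feedback, which they should restate in
        # their own words to get the elaborative benefit.
        print(ask_llm(
            f"Question: {question}\nStudent answer: {attempt}\n"
            f"Say whether this is correct and briefly correct any errors."
        ))

# Example: quiz_on(["osmosis", "mitosis", "active transport"])
```

The ordering is the point: generation precedes feedback, so the model checks memory rather than replacing it.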
Likewise, students with learning disabilities or executive functioning challenges may benefit from structured offloading. AI tools can reduce extraneous cognitive load by organizing information, chunking tasks, or modeling step-by-step thinking.
Even expert thinkers offload routinely. Mathematicians use symbolic tools like WolframAlpha not because they lack understanding, but because they’ve built the internal schemas necessary to interpret, verify, and apply the results. LLMs can serve similar purposes, augmenting rather than substituting for human reasoning, provided the foundational knowledge is already in place.
The key distinction, then, is not whether we offload, but when and why we do so. Offloading must follow, not precede, internalization. It must support reflection, not shortcut it. In this sense, responsible use of AI in education is not about refusing offloading altogether, but about building the internal infrastructure that makes offloading useful in the first place.
References
Deans for Impact. (2015). The science of learning. Austin, TX: Deans for Impact.
Kirschner, P. A., Sweller, J., & Clark, R. E. (2006). Why minimal guidance during instruction does not work: An analysis of the failure of constructivist, discovery, problem-based, experiential, and inquiry-based teaching. Educational Psychologist, 41(2), 75–86. https://doi.org/10.1207/s15326985ep4102_1
Kirschner, P. A., Martens, R. L., & Strijbos, J.-W. (2004). CSCL in higher education? A framework for designing multiple collaborative environments. In P. Dillenbourg (Series Ed.) & J.-W. Strijbos, P. A. Kirschner, & R. L. Martens (Vol. Eds.), Computer-supported collaborative learning: Vol. 3. What we know about CSCL … and implementing it in higher education (pp. 3–30). Boston, MA: Kluwer Academic.
Notes
1. I am painting with a broad brush here, and to steelman the case for Constructivist approaches, we should note the range of ideas before us. Vygotskian models emphasize scaffolding. Montessori methods combine memorization with autonomy. Problem-based learning in medicine can work with strong guidance.
2. This could be correlation, or there might be confounding variables (like the quality of instruction or the nature of what’s being looked up). More work needs to be done in this area.
3. Declarative memory = factual knowledge (what something is).
4. Procedural memory = how to perform tasks (how to do it).
5. This does raise a timing problem that I don’t have a great answer for. If my central claim is that offloading should follow internalization, I create a chicken-and-egg problem: if students need foundational knowledge to use tools effectively, but tools could help them acquire that knowledge more efficiently, when exactly should the transition occur? If I were to attempt an answer, it might look like guided judgment and gradual release. Early learning demands structure, modeling, and retrieval. As fluency develops, AI can be introduced as a supplement, first for reinforcement, then for extension. The transition is not a binary switch but a tapering scaffold. What matters most is that students don’t mistake access for understanding, or fluency in search for fluency in thought.
Cover Image: The Disintegration of the Persistence of Memory (Dalí, 1954). It felt fitting.

