Large language models, explained: how AI chatbots actually work

If you have used an AI chatbot recently, you have interacted with a large language model, or LLM. These systems can draft an email, explain a tax form or write code, but few users know what is actually happening under the hood. Here is a plain-English explanation.

Prediction, not understanding

At its core, a large language model is a prediction machine. Given some text, its only real task is to guess the next chunk of text — a word or piece of a word known as a “token.” Ask it “The capital of France is” and it will predict “Paris,” not because it knows geography the way you do, but because in the vast amount of text it was trained on, “Paris” almost always follows that phrase.
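To make the idea of prediction concrete, here is a minimal sketch in Python. It is not how a real LLM is built: it simply counts which word follows a phrase in a tiny corpus. The corpus and the function names are invented for illustration, and it works on whole words, not tokens.

```python
from collections import Counter

# Hypothetical toy corpus; a real model is trained on vastly more text.
CORPUS = [
    "the capital of france is paris",
    "the capital of france is paris",
    "the capital of france is lyon",
    "the capital of italy is rome",
]

def next_word_counts(corpus, prompt):
    """Count which word follows `prompt` in each corpus sentence."""
    prompt_words = prompt.lower().split()
    n = len(prompt_words)
    counts = Counter()
    for sentence in corpus:
        words = sentence.split()
        for i in range(len(words) - n):
            if words[i:i + n] == prompt_words:
                counts[words[i + n]] += 1
    return counts

def predict_next(corpus, prompt):
    """Return the most frequent next word and the full probability table."""
    counts = next_word_counts(corpus, prompt)
    total = sum(counts.values())
    if total == 0:
        return None, {}  # the phrase never appears in the corpus
    probs = {word: count / total for word, count in counts.items()}
    best = max(probs, key=probs.get)
    return best, probs

best, probs = predict_next(CORPUS, "The capital of France is")
print(best)  # paris
```

The sketch picks "paris" only because that word follows the phrase most often in its corpus, including one sentence that gets the answer wrong. A real model replaces the counting with a neural network, but the output has the same shape: a probability for each possible next token.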

String millions of these predictions together and something remarkable emerges: fluent, often useful language. But it is worth remembering that the model is pattern-matching at enormous scale, not reasoning from facts it “believes.”
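Stringing predictions together can be sketched as a simple loop: predict one token, append it to the text, and predict again. In the Python sketch below, the lookup table is a hypothetical stand-in for a trained model, and the toy predictor looks only at the last word, whereas a real LLM takes all of the text so far into account.

```python
# Hypothetical lookup table standing in for a trained model.
MOST_LIKELY_NEXT = {
    "<start>": "the",
    "the": "capital",
    "capital": "of",
    "of": "france",
    "france": "is",
    "is": "paris",
    "paris": "<end>",
}

def predict_next(tokens):
    """Toy predictor: uses only the last token to choose the next one."""
    last = tokens[-1] if tokens else "<start>"
    return MOST_LIKELY_NEXT.get(last, "<end>")

def generate(prompt_tokens=(), max_tokens=10):
    """Repeatedly predict a token and feed the growing text back in."""
    tokens = list(prompt_tokens)
    for _ in range(max_tokens):
        next_token = predict_next(tokens)
        if next_token == "<end>":
            break
        tokens.append(next_token)
    return " ".join(tokens)

print(generate())  # the capital of france is paris
```

Each pass through the loop adds one token, and the model's own output becomes part of its next input. That feedback is what lets a single-step predictor produce whole sentences.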

Training in two stages

Modern chatbots are built in roughly two phases. First comes pre-training, where the model reads a huge volume of text and learns statistical patterns of language. This is expensive and slow, and it produces a model that can complete text but is not especially helpful or safe.

The second phase is fine-tuning, including a technique called reinforcement learning from human feedback. Here, people rate or compare the model’s answers, and those judgments are used to nudge it toward responses that are helpful, honest and harmless. This second stage is what turns a raw text-predictor into something that feels like an assistant.

Why chatbots “hallucinate”

Because the model generates the most plausible-sounding continuation rather than retrieving verified facts, it can state falsehoods with total confidence. This is called hallucination, and it is the single most important limitation to understand. An LLM has no built-in, reliable way to signal that it does not know something. When accuracy matters, its output should always be checked against a reliable source.
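A rough way to see why this happens: the final step of generation turns a score for each candidate into a probability and then picks one, so an answer always comes out. The Python sketch below uses softmax, a standard way to convert scores into probabilities. The scores and candidate words are made up for illustration and do not come from any real model.

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(scores.values())
    exps = {word: math.exp(s - highest) for word, s in scores.items()}
    total = sum(exps.values())
    return {word: e / total for word, e in exps.items()}

def answer(scores):
    """Pick the most probable candidate; something is always returned."""
    probs = softmax(scores)
    best = max(probs, key=probs.get)
    return best, probs[best]

# Hypothetical scores: one strongly supported answer, one near-tie.
well_supported = {"paris": 9.0, "lyon": 2.0, "rome": 1.0}
barely_supported = {"1887": 1.1, "1889": 1.0, "1891": 0.9}

print(answer(well_supported)[0])    # paris
print(answer(barely_supported)[0])  # 1887
```

In the second case the winning candidate holds only a little over a third of the probability, yet the text the reader sees is just "1887", with no sign of how close the call was. The fluent output does not reflect the uncertainty behind it.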

Context windows and memory

An LLM has no memory between conversations unless a system is built around it to provide one. Within a single conversation, it can only “see” a limited amount of text at once, known as its context window. Text that falls outside that window is simply not available to the model when it writes its next reply. Recent models have dramatically larger context windows, which is why they can now summarize long documents that older models could not.
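One common way a chat system copes with this limit is to keep only the most recent messages that fit. The Python sketch below illustrates that idea under simplifying assumptions: the function name is hypothetical, and it counts words as a crude stand-in for the tokens a real tokenizer would produce.

```python
def fit_to_context(messages, max_tokens):
    """Keep the most recent messages whose combined size fits the window."""
    kept = []
    used = 0
    # Walk backward from the newest message and stop when the budget runs out.
    for message in reversed(messages):
        cost = len(message.split())  # assumption: one word counts as one token
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    return list(reversed(kept))

conversation = ["my name is Ada", "I like chess", "what is my name"]
print(fit_to_context(conversation, max_tokens=8))
# ['I like chess', 'what is my name']
```

With a budget of eight words, the oldest message no longer fits, so the model would be asked "what is my name" without ever seeing the answer. A larger window simply raises the budget, and more of the conversation survives.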

What this means for you

Treat a chatbot as a fast, fluent, occasionally unreliable assistant. It is excellent for drafting, brainstorming, summarizing and explaining. It is weakest at anything requiring guaranteed factual precision, knowledge of events that happened after its training data was collected, or genuine reasoning about numbers. Used with that mental model (powerful helper, not oracle), LLMs become genuinely useful tools rather than a source of confident-sounding mistakes.