DevFlow
Lesson

What Are Large Language Models?

Part of LLM Fundamentals in the AI Engineering Foundations learning path.

Abstract visualization of tokens flowing through a layered language model and into structured outputs.

Why LLMs Matter

Large language models are prediction engines trained on enormous text corpora. Their job at inference time is simple to state: given a sequence of tokens, estimate how likely each possible next token is, then pick one.

That sounds narrow, but the emergent behavior is powerful. Summarization, code generation, question answering, extraction, translation, and reasoning-like output all come from repeating that next-token prediction loop with different prompts and context.
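The prediction loop can be sketched with a toy model. In the example below, `nextTokenTable` is a hypothetical hand-written lookup table standing in for a real network, the probabilities are made up, and the loop always takes the single most likely token (greedy decoding). A real model scores a large vocabulary against the whole context and usually samples rather than always taking the top choice.

```typescript
// Toy stand-in for a model: maps the previous token to candidate next tokens
// with made-up probabilities. A real LLM conditions on the entire context.
type Distribution = Record<string, number>;

const nextTokenTable: Record<string, Distribution> = {
  "<start>": { "LLMs": 0.9, "Tokens": 0.1 },
  "LLMs": { " predict": 0.8, " sample": 0.2 },
  " predict": { " tokens": 0.7, " text": 0.3 },
  " tokens": { ".": 0.9, " again": 0.1 },
  ".": { "<end>": 1.0 },
};

// Pick the highest-probability token (greedy decoding).
function mostLikely(dist: Distribution): string {
  let best = "<end>";
  let bestProbability = -Infinity;
  for (const [token, probability] of Object.entries(dist)) {
    if (probability > bestProbability) {
      best = token;
      bestProbability = probability;
    }
  }
  return best;
}

// Repeat: look at the last token, predict the next one, append it,
// and stop at "<end>" or when the token limit is reached.
function generate(maxTokens: number): string[] {
  const output: string[] = [];
  let previous = "<start>";
  for (let i = 0; i < maxTokens; i++) {
    const dist = nextTokenTable[previous];
    if (dist === undefined) break; // no known continuation
    const next = mostLikely(dist);
    if (next === "<end>") break;
    output.push(next);
    previous = next;
  }
  return output;
}

const generatedText = generate(10).join(""); // "LLMs predict tokens."
```

Everything a product does with an LLM is some variation on this loop: the prompt sets the starting context, and the stopping rule and token limit bound the output.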

The Three Core Ideas

1. Tokens, Not Words

LLMs do not read text the way humans do. They operate on tokens, which are chunks of text created by a tokenizer.

typescript
const input = "Playwright makes browser automation reliable."

// Illustrative split only: real token boundaries depend on the tokenizer.
const tokens = [
  "Play",
  "wright",
  " makes",
  " browser",
  " automation",
  " reliable",
  ".",
]

Tokenization matters because cost, latency, and context limits are usually measured in tokens rather than characters or sentences.
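To make the cost point concrete, here is a minimal budgeting sketch. It assumes a rough rule of thumb of about four characters per token for English text, which is only an approximation; accurate counts come from the tokenizer of the model you actually use. The `Pricing` shape and the prices are hypothetical and not taken from any provider.

```typescript
// Rough heuristic only: about 4 characters per token for English text.
// Real counts come from the model's own tokenizer and can differ noticeably.
const ASSUMED_CHARS_PER_TOKEN = 4;

function estimateTokens(text: string): number {
  return Math.ceil(text.length / ASSUMED_CHARS_PER_TOKEN);
}

// Hypothetical pricing, expressed per 1,000,000 tokens.
interface Pricing {
  inputPerMillion: number;
  outputPerMillion: number;
}

// Input and output tokens are usually billed at different rates,
// so they are kept as separate parameters.
function estimateCost(inputTokens: number, outputTokens: number, pricing: Pricing): number {
  const total =
    inputTokens * pricing.inputPerMillion + outputTokens * pricing.outputPerMillion;
  return total / 1_000_000;
}

const examplePrompt = "Playwright makes browser automation reliable.";
const promptTokens = estimateTokens(examplePrompt);

// Made-up prices: 1 unit per million input tokens, 2 per million output tokens.
const estimatedCost = estimateCost(promptTokens, 200, {
  inputPerMillion: 1,
  outputPerMillion: 2,
});
```

An estimate like this is good enough for early capacity planning. For billing or hard limits, count tokens with the real tokenizer.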

2. Context Windows

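At inference time, the model can draw on only two things: the patterns stored in its weights during training, and whatever fits inside its context window. The context window holds the current prompt, system instructions, chat history, retrieved documents, and the output generated so far.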

If important information does not fit in context, the model cannot reliably use it. That is why chunking, retrieval, and prompt structure become architectural concerns in real systems.
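One way to treat context as an architectural concern is to give it an explicit token budget. The sketch below greedily keeps the most relevant chunks that still fit. The `Chunk` shape, the `relevance` score (assumed to come from a retriever), and the precomputed `tokens` count (assumed to come from a tokenizer) are all hypothetical. Greedy packing is one simple strategy, not the only one.

```typescript
interface Chunk {
  id: string;
  tokens: number;    // assumed to be precomputed with the model's tokenizer
  relevance: number; // higher means more relevant; assumed to come from a retriever
}

// Greedily keep the most relevant chunks that still fit in the token budget.
function packContext(chunks: Chunk[], budgetTokens: number): Chunk[] {
  // Copy before sorting so the caller's array is not mutated.
  const ranked = [...chunks].sort((a, b) => b.relevance - a.relevance);
  const selected: Chunk[] = [];
  let used = 0;
  for (const chunk of ranked) {
    if (used + chunk.tokens > budgetTokens) continue; // would overflow: skip it
    selected.push(chunk);
    used += chunk.tokens;
  }
  return selected;
}

const candidates: Chunk[] = [
  { id: "faq", tokens: 50, relevance: 0.5 },
  { id: "pricing", tokens: 100, relevance: 0.9 },
  { id: "changelog", tokens: 200, relevance: 0.7 },
];

// With a 160-token budget, "changelog" does not fit after "pricing",
// so the smaller "faq" chunk is used instead.
const packedIds = packContext(candidates, 160).map((c) => c.id); // ["pricing", "faq"]
```

In a real system the budget would be the model's context limit minus the space reserved for instructions, chat history, and the expected output.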

3. Probability Over Certainty

An LLM does not fetch a hard-coded answer from a database. It samples from a probability distribution shaped by its training and your prompt.

The model is not "searching its memory" in the human sense. It is producing a statistically plausible continuation of the tokens it sees, which is why the same prompt can yield different outputs across runs.
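Sampling from a distribution can be illustrated in a few lines. The sketch below turns raw scores (logits) into probabilities with a temperature-scaled softmax and then draws one token. The vocabulary and scores are made up for illustration, and real decoders typically add further controls that this example leaves out.

```typescript
// Convert raw scores (logits) into probabilities, scaled by temperature.
// Temperature must be greater than 0.
function softmax(logits: number[], temperature: number): number[] {
  const scaled = logits.map((x) => x / temperature);
  const maxScore = Math.max(...scaled); // subtract the max for numerical stability
  const exps = scaled.map((x) => Math.exp(x - maxScore));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Draw one index from a probability distribution.
// `random` is injectable so the behavior can be tested deterministically.
function sampleIndex(probs: number[], random: () => number = Math.random): number {
  const r = random();
  let cumulative = 0;
  for (let i = 0; i < probs.length; i++) {
    cumulative += probs[i];
    if (r < cumulative) return i;
  }
  return probs.length - 1; // guard against floating-point rounding
}

const vocabulary = [" reliable", " fast", " painful"];
const logits = [2.0, 1.0, 0.1]; // made-up scores for illustration

const sharp = softmax(logits, 0.5); // low temperature: mass concentrates on the top token
const flat = softmax(logits, 2.0);  // high temperature: the distribution flattens
const chosenToken = vocabulary[sampleIndex(softmax(logits, 1.0))];
```

Lower temperature makes the output more repeatable, and higher temperature makes it more varied. In both cases the result is a draw from a distribution, not a lookup.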

Training vs. Inference

Training is the expensive phase where the model learns patterns from large datasets. Inference is the runtime phase where your application sends prompts and receives outputs.

For most engineers building products, inference is the operational concern:

  • prompt design controls behavior
  • context design controls relevance
  • model choice controls cost, latency, and quality
  • evaluation controls trust
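The last point, evaluation, is the easiest to skip and the cheapest to start. Below is a minimal sketch of an evaluation harness that runs a fixed set of labeled cases and reports accuracy. `ModelFn` and `stubModel` are hypothetical: the stub stands in for a ticket-routing prompt, and a real implementation would make an asynchronous call to whichever provider the application uses.

```typescript
// A model is abstracted as a plain function here; a real call would be an
// asynchronous request to a model provider.
type ModelFn = (input: string) => string;

interface EvalCase {
  input: string;
  expected: string; // the label a human reviewer considers correct
}

interface EvalReport {
  total: number;
  passed: number;
  accuracy: number;
  failures: EvalCase[];
}

function evaluate(model: ModelFn, cases: EvalCase[]): EvalReport {
  const failures: EvalCase[] = [];
  for (const testCase of cases) {
    // Normalize whitespace and casing so formatting noise does not count as failure.
    const output = model(testCase.input).trim().toLowerCase();
    if (output !== testCase.expected) failures.push(testCase);
  }
  const passed = cases.length - failures.length;
  return {
    total: cases.length,
    passed,
    accuracy: cases.length === 0 ? 0 : passed / cases.length,
    failures,
  };
}

// Hypothetical stub: routes anything mentioning "refund" to billing.
const stubModel: ModelFn = (input) =>
  input.toLowerCase().includes("refund") ? "Billing" : "Support";

const evalReport = evaluate(stubModel, [
  { input: "I want a refund for last month", expected: "billing" },
  { input: "The app crashes on login", expected: "support" },
  { input: "Why was my card charged twice?", expected: "billing" },
]);
// The third case fails: the stub never sees the word "refund".
```

Exact-match scoring only works for tasks with a small set of valid answers, such as classification. Open-ended generation needs a different scoring method, but the structure of the harness stays the same.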

What Makes LLMs Useful in Products

| Capability | Typical Product Use |
| --- | --- |
| Generation | Drafting documentation, email, UI copy |
| Transformation | Summarization, rewriting, translation |
| Extraction | Pulling entities, dates, actions, and metadata |
| Classification | Routing tickets, tagging content, moderation |

Practical Engineering Implications

When you design with LLMs, think in systems terms instead of prompt-only terms.

  1. The prompt is just one layer of control.
  2. Context assembly is part of product architecture.
  3. Evaluation is mandatory because outputs are probabilistic.
  4. Guardrails matter when the system can take action or affect users.
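As an illustration of the guardrail point, the sketch below validates model output before the application acts on it. The action names, the `Decision` shape, and the expected JSON format are assumptions made for this example. The idea is that the model's text is treated as untrusted input and checked against an allow-list.

```typescript
// Hypothetical allow-list of actions the application is willing to perform.
const ALLOWED_ACTIONS = ["reply", "escalate", "close_ticket"] as const;
type Action = (typeof ALLOWED_ACTIONS)[number];

interface Decision {
  action: Action;
  reason: string;
}

type ParseResult =
  | { ok: true; decision: Decision }
  | { ok: false; error: string };

// Parse and validate raw model text before anything acts on it.
function parseDecision(raw: string): ParseResult {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return { ok: false, error: "output is not valid JSON" };
  }
  if (typeof data !== "object" || data === null) {
    return { ok: false, error: "output is not a JSON object" };
  }
  const { action, reason } = data as Record<string, unknown>;
  if (typeof action !== "string" || !(ALLOWED_ACTIONS as readonly string[]).includes(action)) {
    return { ok: false, error: "action is not in the allow-list" };
  }
  if (typeof reason !== "string" || reason.length === 0) {
    return { ok: false, error: "reason is missing" };
  }
  return { ok: true, decision: { action: action as Action, reason } };
}

const accepted = parseDecision('{"action":"escalate","reason":"customer reports data loss"}');
const rejected = parseDecision('{"action":"delete_account","reason":"user asked"}');
// `accepted.ok` is true; `rejected.ok` is false because the action is not allowed.
```

A rejected result is a signal to retry, fall back to a safe default, or hand off to a human, depending on how costly a wrong action would be.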

Key Takeaways

  • LLMs generate output token by token.
  • Context window limits directly shape product behavior.
  • Prompting works best when paired with retrieval, evaluation, and system design.
  • Understanding the token-level model helps you make better engineering decisions upstream.