AI Glossary: Training, Fine-Tuning & Data

Understanding how models are built and adapted helps designers recognize where model capabilities come from, weigh customization options, and anticipate training-related behaviors and limitations.

Pre-Training Concepts

Pre-training

The initial, computationally intensive phase of training on massive amounts of unlabeled text. During pre-training, models learn language patterns, grammar, factual knowledge, and reasoning capabilities by processing billions to trillions of tokens. The result is a "foundation model" that is later adapted through fine-tuning.

Reference: Devlin, J. et al., "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding", NAACL 2019

Self-Supervised Learning

Learning in which training labels are derived from the structure of the data itself rather than from human annotation. For LLMs, the model predicts masked or next tokens and is scored against the actual text, so no manual labeling is needed. This is what makes training on vast amounts of unlabeled internet text possible.
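To make the idea concrete, here is a minimal Python sketch showing how next-token training pairs fall out of raw text with no human labeling: each target is simply the token that follows its context. Whitespace splitting stands in for a real subword tokenizer, and the `next_token_pairs` helper and its `context_size` parameter are illustrative assumptions, not part of any library.

```python
def next_token_pairs(tokens, context_size):
    """Slide over a token sequence; each label is just the token after its context."""
    pairs = []
    for i in range(1, len(tokens)):
        context = tokens[max(0, i - context_size):i]  # up to context_size preceding tokens
        pairs.append((context, tokens[i]))            # the "label" comes from the text itself
    return pairs

# Whitespace splitting stands in for a real subword tokenizer.
tokens = "the cat sat on the mat".split()
for context, target in next_token_pairs(tokens, context_size=3):
    print(context, "->", target)
```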

Reference: Devlin, J. et al., "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding", NAACL 2019

Next-Token Prediction

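The core training objective for decoder models: given all preceding tokens, predict a probability distribution over the next token, with training pushing up the probability of the token that actually follows. Despite its simplicity, this objective underlies pre-training, fine-tuning, and inference (where text is generated by repeatedly choosing from that distribution) for all causal language models.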
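A minimal sketch of the training signal at a single position: the model's raw scores (logits) over a toy four-token vocabulary become probabilities via softmax, and the loss is the negative log-probability of the token that actually came next (cross-entropy). The vocabulary and logit values are made up for illustration.

```python
import math

def softmax(logits):
    """Turn raw scores into a probability distribution (max subtracted for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def next_token_loss(logits, target_id):
    """Cross-entropy for one position: -log P(actual next token | preceding tokens)."""
    return -math.log(softmax(logits)[target_id])

# Hypothetical scores after the context "the cat", over a toy 4-token vocabulary.
vocab = ["sat", "ran", "mat", "the"]
logits = [2.0, 1.0, 0.1, -1.0]

for token, p in zip(vocab, softmax(logits)):
    print(f"P({token!r}) = {p:.3f}")
# A likely continuation gets a low loss; an unlikely one gets a high loss.
print("loss if next token is 'sat':", round(next_token_loss(logits, 0), 3))
print("loss if next token is 'mat':", round(next_token_loss(logits, 2), 3))
```

Summed over every position in every training sequence, this one loss is the whole pre-training objective.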

Reference: Radford, A. et al., "Language Models are Unsupervised Multitask Learners" (GPT-2), OpenAI 2019

Masked Language Modeling (MLM)

Self-supervised technique used by encoder models like BERT: randomly select ~15% of tokens for masking, then predict the original tokens from the surrounding bidirectional context. Better suited to understanding tasks (classification, extraction) than to text generation.
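The masking step can be sketched in a few lines of Python. This follows the scheme described in the BERT paper (of the selected ~15% of positions, 80% become `[MASK]`, 10% a random token, 10% stay unchanged); the word-level tokens, toy vocabulary, and `mask_tokens` helper are simplifying assumptions.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, vocab, mask_prob=0.15, seed=0):
    """BERT-style masking. Each position is selected with probability mask_prob;
    a selected token becomes [MASK] 80% of the time, a random vocab token 10%,
    and stays unchanged 10%. Labels hold the original token at selected
    positions and None elsewhere (no loss is computed there)."""
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)  # the model must recover this token
            r = rng.random()
            if r < 0.8:
                inputs.append(MASK)
            elif r < 0.9:
                inputs.append(rng.choice(vocab))
            else:
                inputs.append(tok)
        else:
            labels.append(None)
            inputs.append(tok)
    return inputs, labels

tokens = "the quick brown fox jumps over the lazy dog".split()
inputs, labels = mask_tokens(tokens, vocab=tokens, seed=3)
print(inputs)
print(labels)
```

Leaving some selected tokens unchanged or randomized keeps the model from relying on `[MASK]` always being present, since that token never appears at fine-tuning time.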

Reference: Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K., "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding", NAACL 2019

Training Corpus

The complete dataset used for training, typically containing billions to trillions of tokens from diverse sources. Corpus quality, diversity, and size directly impact model capabilities. Modern corpora undergo extensive filtering for quality, deduplication, and safety.

Reference: Brown, T. et al., "Language Models are Few-Shot Learners" (GPT-3), NeurIPS 2020

Data Curation

Filtering, cleaning, and organizing training data for quality and diversity—removing duplicates, toxic content, OCR errors, and PII while balancing domain representation. Well-curated smaller datasets can outperform larger unfiltered ones.
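A toy curation pass might look like the sketch below: exact deduplication after normalizing case and whitespace, a crude quality heuristic, and regex-based PII redaction. The thresholds (`min_words`, `max_symbol_ratio`), the email and US-style phone patterns, and all function names are illustrative assumptions; production pipelines use trained quality classifiers and far more thorough PII detectors.

```python
import re

# Illustrative PII patterns (assumptions): simple emails and US-style phone numbers.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")

def redact_pii(text):
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)

def passes_quality(text, min_words=5, max_symbol_ratio=0.1):
    """Reject very short documents and ones dominated by punctuation/symbols."""
    if len(text.split()) < min_words:
        return False
    symbols = sum(1 for ch in text if not (ch.isalnum() or ch.isspace()))
    return symbols / len(text) <= max_symbol_ratio

def curate(docs):
    seen, kept = set(), []
    for doc in docs:
        key = " ".join(doc.lower().split())  # normalize case/whitespace for exact dedup
        if key in seen or not passes_quality(doc):
            continue
        seen.add(key)
        kept.append(redact_pii(doc))
    return kept

docs = [
    "Contact alice@example.com or 555-123-4567 for the full report today.",
    "contact alice@example.com or 555-123-4567 for the full report today.",  # duplicate up to case
    "$$$ ### !!! @@@ %%%",  # symbol noise
    "too short",
]
print(curate(docs))
```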

Reference: Gao, L. et al., "The Pile: An 800GB Dataset of Diverse Text for Language Modeling", 2020

Scaling Laws

Empirical power-law relationships describing how model loss falls predictably as parameters, data, and compute increase. The "Chinchilla" study (Hoffmann et al., 2022) found that compute-optimal training uses roughly 20 tokens per parameter. Many recent models are deliberately "overtrained" on 10-75x more tokens than that, accepting higher training cost in exchange for a smaller model that is cheaper to run at inference.
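The ~20 tokens-per-parameter rule turns into a quick sizing calculation. This sketch assumes the common approximation that training compute is C ≈ 6·N·D FLOPs (N parameters, D tokens); combining it with D = 20·N gives N = sqrt(C / 120). The function name and budgets are illustrative, and the real Chinchilla fits are more nuanced than a fixed ratio.

```python
import math

def chinchilla_allocation(compute_flops, tokens_per_param=20.0):
    """Split a compute budget into parameters N and tokens D, assuming
    C ≈ 6 * N * D and D ≈ tokens_per_param * N (rule of thumb, not exact)."""
    n_params = math.sqrt(compute_flops / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

for c in (1e21, 1e23, 1e25):
    n, d = chinchilla_allocation(c)
    print(f"C={c:.0e} FLOPs -> ~{n / 1e9:.1f}B params, ~{d / 1e9:.0f}B tokens")
```

Passing a larger `tokens_per_param` (for example 200) models an "overtrained" run: the same budget buys a smaller model trained on more data.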

Reference: Kaplan, J., McCandlish, S., Henighan, T., Brown, T.B., et al., "Scaling Laws for Neural Language Models", OpenAI 2020

Compute Budget

Total computational resources for training, measured in FLOPs or GPU-hours. Training frontier models costs millions of dollars, so teams carefully balance model size, dataset size, and training duration.
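A rough way to translate model and dataset size into GPU-hours, again assuming training FLOPs ≈ 6·N·D. The accelerator peak (1e15 FLOP/s) and the 40% utilization figure are hypothetical placeholders; real throughput depends on hardware, precision, and parallelism strategy.

```python
def training_gpu_hours(n_params, n_tokens, peak_flops_per_gpu, utilization):
    """Rough GPU-hours: total FLOPs ≈ 6 * N * D, divided by sustained throughput per GPU."""
    total_flops = 6.0 * n_params * n_tokens
    sustained = peak_flops_per_gpu * utilization  # FLOP/s actually achieved per GPU
    return total_flops / sustained / 3600.0

# Hypothetical run: 7B params, 2T tokens, ~1e15 FLOP/s peak per GPU, 40% utilization.
hours = training_gpu_hours(7e9, 2e12, peak_flops_per_gpu=1e15, utilization=0.4)
print(f"~{hours:,.0f} GPU-hours")
```

Multiplying GPU-hours by an hourly price gives a first-order cost estimate, which is why model size, token count, and training duration are traded off against each other.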

Reference: Kaplan, J. et al., "Scaling Laws for Neural Language Models", OpenAI 2020

Fine-Tuning Methods

Fine-tuning

Adapting pre-trained models to specific tasks by continuing training on targeted datasets. Far more efficient than training from scratch—requires orders of magnitude less data and compute while leveraging general pre-trained knowledge.

Reference: Howard, J. & Ruder, S., "Universal Language Model Fine-tuning for Text Classification" (ULMFiT), ACL 2018

Supervised Fine-Tuning (SFT)

Training on labeled input-output pairs, typically human-written demonstrations. For chat models, these are example conversations showing the desired behavior. SFT mainly teaches format and style; it is typically the first step after pre-training, before preference-based alignment techniques.
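A common (though not universal) SFT practice is to compute the loss only on the response, so the model learns to answer rather than to echo the prompt. The sketch below builds one example that way. The token ids are hypothetical, and `-100` is an assumption chosen because it matches PyTorch's `CrossEntropyLoss` default `ignore_index`.

```python
IGNORE_INDEX = -100  # labels with this value are skipped by the loss (PyTorch's default ignore_index)

def build_sft_example(prompt_ids, response_ids):
    """Concatenate prompt and response token ids; label only the response so the
    model is trained to produce the answer, not to reproduce the prompt."""
    input_ids = list(prompt_ids) + list(response_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return input_ids, labels

# Hypothetical token ids; a real pipeline would get these from the model's tokenizer.
prompt = [101, 7592, 2129]
response = [2003, 1018, 102]
input_ids, labels = build_sft_example(prompt, response)
print(input_ids)
print(labels)
```

Many training frameworks shift labels by one position internally for next-token prediction, so the labels here stay aligned with `input_ids`.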

Reference: Ouyang, L. et al., "Training language models to follow instructions with human feedback" (InstructGPT), NeurIPS 2022

Instruction Tuning

Fine-tuning to follow natural language instructions across diverse tasks. Training data consists of (instruction, response) pairs covering tasks such as Q&A, summarization, and coding. Models such as FLAN and FLAN-T5 show that instruction tuning substantially improves zero-shot generalization to unseen tasks.

Reference: Wei, J. et al., "Finetuned Language Models Are Zero-Shot Learners" (FLAN), ICLR 2022

RLHF (Reinforcement Learning from Human Feedback)

Three-step alignment technique: (1) collect human preference data comparing outputs, (2) train a reward model on preferences, (3) use reinforcement learning (PPO) to optimize against the reward model. RLHF enables training on subjective qualities like helpfulness and safety.

Reference: Christiano, P., Leike, J., Brown, T., Martic, M., Legg, S., & Amodei, D., "Deep Reinforcement Learning from Human Preferences", NeurIPS 2017

RLAIF (RL from AI Feedback)

RLHF variant where an AI model generates the preference labels instead of humans, enabling scalable alignment without extensive human annotation. Lee et al. report performance comparable to RLHF on tasks such as summarization and dialogue, at much lower annotation cost.

Reference: Lee, H. et al., "RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback", 2023

DPO (Direct Preference Optimization)

Simpler alternative to RL-based RLHF that skips training a separate reward model and instead optimizes the model directly on preference data with a classification-style loss. Introduced in 2023, DPO needs only two model copies (the policy being trained and a frozen reference) versus four for PPO, is more stable to train, and has been widely adopted in open-source training.
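The DPO loss from Rafailov et al. fits in a few lines for a single preference pair: −log σ(β·[(log π(chosen) − log π_ref(chosen)) − (log π(rejected) − log π_ref(rejected))]). This sketch takes the four summed response log-probabilities as plain numbers; in practice they come from forward passes of the policy and frozen reference model, and the values and β = 0.1 below are illustrative.

```python
import math

def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is the summed log-probability of a full response (chosen =
    preferred, rejected = dispreferred) under the policy being trained or the
    frozen reference model. beta controls how far the policy may drift."""
    chosen_logratio = policy_chosen - ref_chosen
    rejected_logratio = policy_rejected - ref_rejected
    z = beta * (chosen_logratio - rejected_logratio)
    return softplus(-z)  # equals -log(sigmoid(z))

# Policy raised the chosen response's log-prob and lowered the rejected one's,
# relative to the reference, so the loss falls below log(2).
print(dpo_loss(-40.0, -60.0, -45.0, -55.0))
```

No reward model or sampling loop is needed: the reference-relative log-ratio plays the role of an implicit reward.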

Reference: Rafailov, R., Sharma, A., Mitchell, E., Ermon, S., Manning, C.D., & Finn, C., "Direct Preference Optimization: Your Language Model is Secretly a Reward Model", NeurIPS 2023

PPO (Proximal Policy Optimization)

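Reinforcement learning algorithm used in RLHF. It clips the ratio between new and old response probabilities so that each update stays close to the previous policy; RLHF setups typically also penalize divergence (KL) from a reference model. It requires significant resources (four model copies: policy, reference, reward, and value models) and has been used by leading labs, for example in InstructGPT.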
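The "stay close to previous behavior" constraint is PPO's clipped surrogate objective from Schulman et al., sketched here for one sample. The advantage estimate and log-probabilities are taken as given inputs (an assumption; computing them needs the value model and rollouts), and ε = 0.2 is a commonly used default rather than a requirement. The KL penalty used in RLHF is not shown.

```python
import math

def ppo_clipped_objective(logp_new, logp_old, advantage, epsilon=0.2):
    """Per-sample PPO clipped surrogate, to be maximized.

    ratio = pi_new(a|s) / pi_old(a|s). Clipping the ratio to [1-eps, 1+eps]
    removes any incentive to move the policy further than that in one update."""
    ratio = math.exp(logp_new - logp_old)
    clipped_ratio = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
    return min(ratio * advantage, clipped_ratio * advantage)

# A good action (advantage > 0) whose probability already rose 50%: gain is capped at 1.2x.
print(ppo_clipped_objective(math.log(1.5), 0.0, advantage=1.0))
```

Taking the minimum makes the objective pessimistic: improvements are capped, but penalties for harmful changes are not.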

Reference: Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O., "Proximal Policy Optimization Algorithms", OpenAI 2017

Reward Model

Model trained to predict human preferences, assigning scores that reflect helpfulness, accuracy, and safety. In RLHF, reward models provide the feedback signal during RL training. Because the reward model is only an approximation, the policy can exploit its flaws ("reward hacking"), earning high scores without any genuine quality improvement.
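Reward models are commonly trained with a pairwise (Bradley-Terry style) loss, as in InstructGPT: −log σ(r_chosen − r_rejected), which is small when the human-preferred response scores higher. This sketch takes the two scalar scores as inputs; producing them from text requires the reward model itself, which is out of scope here.

```python
import math

def pairwise_reward_loss(score_chosen, score_rejected):
    """-log sigmoid(r_chosen - r_rejected), computed stably as softplus(r_rejected - r_chosen)."""
    x = score_rejected - score_chosen
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

print(pairwise_reward_loss(2.0, 0.0))  # preferred response ranked higher: low loss
print(pairwise_reward_loss(0.0, 2.0))  # ranking inverted: high loss
```

Only score differences matter, so the reward model learns a ranking, not an absolute quality scale.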

Reference: Ouyang, L. et al., "Training language models to follow instructions with human feedback" (InstructGPT), NeurIPS 2022

Constitutional AI (CAI)

Anthropic's alignment technique using written principles (a "constitution") to train systems to be helpful, harmless, and honest. CAI combines self-critique, revision, and RLAIF—enabling scalable oversight with transparent, adjustable values.

Reference: Bai, Y. et al., "Constitutional AI: Harmlessness from AI Feedback", Anthropic 2022

Efficient Training

LoRA (Low-Rank Adaptation)

Parameter-efficient fine-tuning injecting small trainable matrices into transformer layers while freezing pre-trained weights. Trains only 0.1-1% of parameters while achieving comparable performance to full fine-tuning. Makes fine-tuning accessible on consumer hardware.
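The core LoRA idea, following Hu et al.: keep the weight W frozen and learn a low-rank update so the layer computes W·x + (α/r)·B·A·x, with A initialized randomly and B initialized to zero so training starts from the unchanged pre-trained model. This pure-Python sketch uses lists instead of tensors for self-containment; `LoRALinear`, the init scale 0.01, and the layer sizes in the parameter count are illustrative assumptions.

```python
import random

def matvec(m, v):
    """Matrix (list of rows) times vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

class LoRALinear:
    """Frozen weight W (d_out x d_in) plus a trainable low-rank update B @ A."""

    def __init__(self, weight, rank, alpha, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen pre-trained weights, never updated
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]  # r x d_in
        self.B = [[0.0] * rank for _ in range(d_out)]  # d_out x r, zero so the update starts at 0
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.weight, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_params(self):
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])

# Parameter count for one hypothetical 4096 x 4096 projection with rank 8.
d, r = 4096, 8
full, lora = d * d, r * d + d * r
print(f"full: {full:,} params, LoRA r={r}: {lora:,} ({100 * lora / full:.2f}%)")
```

Only A and B receive gradients, which is why checkpoints and optimizer state shrink so dramatically.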

Reference: Hu, E.J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., & Chen, W., "LoRA: Low-Rank Adaptation of Large Language Models", ICLR 2022

QLoRA (Quantized LoRA)

Combines 4-bit quantization with LoRA, enabling fine-tuning of very large models on a single GPU (the paper fine-tunes a 65B-parameter model on one 48GB GPU). The base model is stored in 4-bit precision while the LoRA adapters are trained in higher precision. In the paper's experiments it matches 16-bit full fine-tuning performance while using far less memory.
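Back-of-the-envelope arithmetic shows why 4-bit storage matters for a 65B-parameter model. This counts weight storage only; it deliberately ignores activations, LoRA adapter weights and their optimizer states, and quantization constants, so real memory use is higher.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

n = 65e9  # 65B parameters
for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gb(n, bits):.1f} GB of weights")
```

At 16 bits the weights alone exceed any single GPU's memory; at 4 bits they fit on one 48GB card with room left for adapters and activations.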

Reference: Dettmers, T., Pagnoni, A., Holtzman, A., & Zettlemoyer, L., "QLoRA: Efficient Finetuning of Quantized LLMs", NeurIPS 2023

PEFT (Parameter-Efficient Fine-Tuning)

Umbrella term for techniques fine-tuning only small parameter subsets—LoRA, adapters, prefix tuning, prompt tuning. Benefits include reduced costs (10-100x), smaller checkpoints (MBs vs GBs), faster training, and lower overfitting risk.

Reference: Houlsby, N. et al., "Parameter-Efficient Transfer Learning for NLP", ICML 2019

Quantization

Reducing the numerical precision of weights (and sometimes activations) from 32/16-bit floating point to 8-bit or 4-bit formats, commonly integers. Substantially reduces memory and can speed up inference, often with minimal accuracy loss. Widely used for deploying LLMs at scale and on edge devices.
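The simplest scheme, symmetric "absmax" quantization, is sketched below: one scale per tensor maps the largest absolute weight to the largest representable integer. Per-tensor scaling and round-to-nearest are simplifying assumptions; production methods typically use per-channel or per-block scales and more careful rounding.

```python
def quantize_absmax(weights, bits=8):
    """Symmetric per-tensor quantization onto signed integers in [-qmax, qmax]."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8-bit, 7 for 4-bit
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0:
        return [0] * len(weights), 1.0
    scale = max_abs / qmax
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

w = [0.5, -1.27, 0.02, 1.0]
q, s = quantize_absmax(w)
print(q, s)
print([round(v, 4) for v in dequantize(q, s)])
```

Each stored value now needs 8 bits instead of 16 or 32, at the cost of a rounding error of at most half the scale per weight.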

Reference: Jacob, B. et al., "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference", CVPR 2018

Knowledge Distillation

Compression technique where a smaller "student" model learns to mimic a larger "teacher" model by learning from its probability distributions (soft labels) rather than hard labels. NVIDIA's Minitron work, which combines pruning with distillation, reports needing up to 40x fewer training tokens than training comparable models from scratch.
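The soft-label loss from Hinton et al. compares temperature-softened teacher and student distributions with KL divergence and multiplies by T² to keep gradient magnitudes comparable across temperatures. This sketch works on a single position's logits; the logit values and T = 2 are illustrative, and in practice this term is usually mixed with an ordinary hard-label loss.

```python
import math

def softmax(logits, temperature=1.0):
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)
    kl = sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

teacher = [1.0, 2.0, 3.5]
print(distillation_loss([1.0, 2.0, 3.0], teacher))  # student close to teacher: small loss
print(distillation_loss([3.0, 2.0, 1.0], teacher))  # student disagrees: larger loss
```

Higher temperatures expose the teacher's relative preferences among wrong answers, which is information a hard label cannot carry.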

Reference: Hinton, G., Vinyals, O., & Dean, J., "Distilling the Knowledge in a Neural Network", NeurIPS Deep Learning Workshop 2015

Pruning

Removing unnecessary parameters to reduce model size and inference cost. Structured pruning removes entire components such as layers or attention heads; unstructured pruning zeroes individual weights. LLMs can often be compressed 2-4x with little loss in performance.
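Unstructured magnitude pruning, the simplest baseline, zeroes the weights with the smallest absolute values. This one-shot sketch operates on a flat list; the Lottery Ticket work prunes iteratively with retraining, and LLM-specific methods use more sophisticated importance scores, so treat this as an illustration of the idea only.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest |w|."""
    n_prune = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))  # smallest magnitude first
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned

w = [0.9, -0.05, 0.3, -0.7, 0.01, 0.2]
print(magnitude_prune(w, sparsity=0.5))
```

Zeroed weights only save memory or time when stored in a sparse format or run on hardware that exploits sparsity; structured pruning avoids this by deleting whole components.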

Reference: Frankle, J. & Carbin, M., "The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks", ICLR 2019

Data Concepts

Training Data

Examples teaching models patterns and behaviors. Quality, diversity, and representativeness fundamentally determine capabilities and biases. Frontier models train on hundreds of billions to trillions of tokens.

Reference: Brown, T. et al., "Language Models are Few-Shot Learners" (GPT-3), NeurIPS 2020

Synthetic Data

Training data generated by AI rather than collected from real sources. Addresses data scarcity, privacy, and annotation costs. Used for fine-tuning, alignment, and pre-training augmentation. Concerns include "model collapse" and bias amplification.

Reference: Wang, Y. et al., "Self-Instruct: Aligning Language Models with Self-Generated Instructions", ACL 2023

Data Contamination

When evaluation benchmark data appears in training sets, inflating performance scores. The effect can be large: one reported analysis found models scoring 4.9x higher on leaked samples. Detection methods include n-gram overlap checks, membership inference, and perplexity analysis.
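An n-gram overlap check, the most direct detection method, flags any benchmark item that shares a long word sequence with the training data. This sketch keeps all training n-grams in a Python set, which only works for toy corpora; the default n = 13 is an assumption (a window size used in some published contamination analyses) and should be treated as tunable.

```python
def ngrams(text, n):
    """Set of lowercase word n-grams in a text."""
    toks = text.lower().split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def flag_contaminated(benchmark_items, training_docs, n=13):
    """Return benchmark items sharing at least one word n-gram with the training docs."""
    train_grams = set()
    for doc in training_docs:
        train_grams |= ngrams(doc, n)
    return [item for item in benchmark_items if ngrams(item, n) & train_grams]

training = ["the quick brown fox jumps over the lazy dog"]
benchmark = ["a quick brown fox jumps high", "completely unrelated question text here"]
print(flag_contaminated(benchmark, training, n=4))  # small n only because the texts are tiny
```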

Reference: Sainz, O. et al., "NLP Evaluation in trouble: On the Need to Measure LLM Data Contamination for each Benchmark", EMNLP 2023

Memorization

When LLMs reproduce verbatim training sequences rather than generating novel text. Raises privacy concerns (personal information, contact details) and copyright issues. Some studies report that over 1% of generated outputs can contain text copied verbatim from training data.

Reference: Carlini, N. et al., "Extracting Training Data from Large Language Models", USENIX Security 2021

Deduplication

Removing duplicate content from training datasets—critical because duplicated data increases memorization, reduces efficiency, and inflates test scores. Modern pipelines remove 20-30% of raw web data as redundant.
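Near-duplicate detection is often based on Jaccard similarity between sets of character shingles. The sketch below compares every pair directly, which is quadratic; it is a simplified stand-in for the MinHash-based approaches used at scale, and the shingle length `k=5` and 0.8 threshold are illustrative assumptions.

```python
def shingles(text, k=5):
    """Character k-grams of lowercased, whitespace-normalized text."""
    t = " ".join(text.lower().split())
    return {t[i:i + k] for i in range(max(1, len(t) - k + 1))}

def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def near_dedup(docs, threshold=0.8, k=5):
    """Keep a document only if it is not too similar to any document already kept."""
    kept, kept_shingles = [], []
    for doc in docs:
        s = shingles(doc, k)
        if all(jaccard(s, other) < threshold for other in kept_shingles):
            kept.append(doc)
            kept_shingles.append(s)
    return kept

docs = ["The cat sat on the mat.", "The cat sat on the mat!", "Stock prices rose sharply on Tuesday."]
print(near_dedup(docs))
```

Exact hashing catches only byte-identical copies; shingle overlap also catches boilerplate variants and lightly edited reposts, which are common in web data.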

Reference: Lee, K. et al., "Deduplicating Training Data Makes Language Models Better", ACL 2022

Data Poisoning

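Security attack that inserts malicious data into training sets to compromise model behavior. Anthropic research (2025) showed that as few as 250 poisoned documents could implant a backdoor in LLMs, with the number needed staying roughly constant across the model sizes tested rather than growing with model or dataset size.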

Reference: Wallace, E. et al., "Concealed Data Poisoning Attacks on NLP Models", NAACL 2021

Dataset Bias

Systematic training data patterns leading to unfair model behaviors—demographic underrepresentation, historical prejudices, selection bias. Manifests as stereotyping, discriminatory recommendations, or disparate performance across groups.

Reference: Bolukbasi, T. et al., "Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings", NeurIPS 2016


This glossary is part of a series covering AI and LLM concepts for product designers. Terms without authoritative references are noted for tracking.
