Glossary of AI Terms
A comprehensive glossary of artificial intelligence terms and definitions.
A
A/B testing
Compare two variants (A and B) to measure which performs better for a defined goal (e.g., click‑through rate).
Accuracy
The proportion of correct predictions out of all predictions; most informative when classes are balanced, and potentially misleading on imbalanced data.
Activation function
Nonlinear function in a neuron that enables networks to model complex relationships (e.g., ReLU, sigmoid, GELU).
Active learning
Machine learning approach where the model selects the most informative data points for labeling to improve performance efficiently.
Actor–critic
Reinforcement learning architecture combining a policy model (actor) and a value model (critic).
Adagrad
Optimizer that adapts the learning rate for each parameter based on historical gradients.
Adam Optimizer
Adaptive Moment Estimation optimizer combining momentum and RMSProp techniques.
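A minimal Python sketch of a single Adam update, assuming numpy; the names m and v (first and second moment estimates) follow the original paper's notation:

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    # Momentum-style first moment (m) plus RMSProp-style second moment (v),
    # each bias-corrected by the step count t (starting at 1).
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```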
Adaptive Boosting (AdaBoost)
Ensemble method that combines weak learners into a strong classifier by focusing on misclassified examples.
Adaptive learning rate
Optimization technique where the learning rate changes during training based on performance or gradient history.
Adversarial example
Input crafted with subtle perturbations to mislead a model into incorrect predictions.
Agent
An autonomous system that perceives an environment and takes actions to maximize cumulative reward or achieve goals.
Agent-based Modeling
Simulation of actions and interactions of autonomous agents to assess their effects on a system.
AI (Artificial Intelligence)
Broad field focused on building systems that perform tasks requiring human‑like intelligence such as learning and reasoning.
AI Accelerator
Specialized hardware designed to speed up AI computations, such as TPUs or GPUs.
AI ethics
Principles and practices to ensure fairness, accountability, transparency, and safety in AI systems.
AIOps
Application of AI/ML to automate and enhance IT operations, including anomaly detection and incident response.
Alignment
Ensuring model objectives and behaviors are consistent with human values, intent, and safety constraints.
Alpha-Beta Pruning
Search algorithm optimization that reduces the number of nodes evaluated in minimax.
AlphaGo
DeepMind’s AI system that defeated human champions in the game of Go using deep reinforcement learning.
Analytical Engine
Charles Babbage’s proposed mechanical general-purpose computer, a precursor to modern computing.
Anchor box
Predefined bounding box shape used in object detection models to predict object locations.
Anchor Text Embedding
Vector representation of hyperlink anchor text for semantic search.
Anisotropic Filtering
Image processing technique to enhance texture quality at oblique viewing angles.
Anomaly detection
Identifying rare items, events, or observations that deviate significantly from the majority of data.
Anonymization
Process of removing personally identifiable information from data to protect privacy.
API
Defined interface that allows software to communicate, commonly used to deploy and consume AI services.
Approximate nearest neighbor search
Fast method for finding points in high‑dimensional space that are close to a query point.
Approximation Error
Difference between the true function and the best possible model within a hypothesis space.
Argmax
Function that returns the index of the maximum value in a list or array.
Artificial Life (ALife)
Study of life-like behaviors in artificial systems.
ASR (Automatic Speech Recognition)
Technology that converts spoken language into text.
Attention head
One of multiple parallel attention mechanisms in a transformer layer, each learning different relationships.
Attention Score
Weight assigned to a token or feature in an attention mechanism.
Autoregressive Integrated Moving Average (ARIMA)
Statistical model for time series forecasting that combines autoregression, differencing, and moving-average components.
Autoencoder
Neural network trained to compress input into a latent representation and reconstruct it, useful for denoising and anomaly detection.
Automatic Differentiation
Technique to compute derivatives efficiently for optimization.
AutoML
Automated machine learning — tools and methods to automate model selection, training, and tuning.
Autoregressive model
Generates the next token conditioned on previous tokens; common in language modeling.
B
Backpropagation
Training algorithm that computes gradients of the loss with respect to the weights by propagating error backward through the network.
Bag-of-Words (BoW)
Text representation counting word occurrences without order.
Bagging
Ensemble method that trains multiple models on bootstrapped datasets and averages their predictions.
Balanced Accuracy
Average of recall obtained on each class, useful for imbalanced datasets.
Batch Gradient Descent
Gradient descent variant that uses the entire dataset for each update.
Batch normalization
Technique to stabilize and speed up training by normalizing layer inputs within a batch.
Bayes' Theorem
Formula for updating beliefs given evidence: P(A|B) = P(B|A)·P(A) / P(B).
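A worked example may help; the prevalence and test characteristics below are illustrative, not from any real study:

```python
# P(disease | positive) = P(positive | disease) * P(disease) / P(positive)
p_disease = 0.01            # prior: 1% prevalence
p_pos_given_disease = 0.90  # sensitivity
p_pos_given_healthy = 0.05  # false positive rate (1 - specificity)

p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))
posterior = p_pos_given_disease * p_disease / p_pos
print(f"P(disease | positive) = {posterior:.3f}")  # ~0.154
```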
Bayesian inference
Probabilistic approach updating beliefs (priors) with data (likelihood) to obtain posteriors.
Beam search
Search algorithm that explores a subset of possible sequences to find the most likely output in sequence generation.
Beam Width
Number of sequences kept at each step in beam search.
Bellman Equation
Recursive equation expressing a state's value as the immediate reward plus the discounted value of successor states; foundational to reinforcement learning.
Bias (in AI)
Systematic error caused by skewed data or model design that leads to unfair or inaccurate outcomes.
Bias-Variance Tradeoff
Tradeoff between error from overly simple models (bias, underfitting) and error from models overly sensitive to training data (variance, overfitting).
Big data
Extremely large datasets requiring specialized storage and processing to extract value.
Binary Cross-Entropy
Loss function for binary classification tasks.
Binary Search Tree
Data structure for efficient searching, insertion, and deletion.
BLEU
Metric for machine translation quality based on n‑gram overlap with reference translations.
BLEURT
Learned metric for evaluating text generation quality.
Bloom Filter
Probabilistic data structure for set membership testing.
Bootstrap (resampling)
Statistical method that estimates uncertainty by sampling with replacement from observed data.
Bottleneck Layer
Layer with fewer neurons to force feature compression.
Bounded rationality
Decision making under constraints like limited information, time, or computational resources.
Bounding Box
Rectangle defining the location of an object in an image.
Box-Cox Transformation
Statistical transformation to stabilize variance.
Branch and Bound
Optimization algorithm that systematically explores solution space.
Byte Pair Encoding (BPE)
Subword tokenization method for NLP.
C
Calibration
The alignment between predicted probabilities and observed frequencies (well‑calibrated models output reliable probabilities).
CapsNet Routing
Dynamic routing mechanism in capsule networks.
Capsule network
Neural network architecture that groups neurons into capsules to preserve spatial hierarchies.
Cascade Classifier
Series of classifiers applied sequentially to improve detection speed.
CatBoost
Gradient boosting library optimized for categorical features.
Categorical Cross-Entropy
Loss function for multi-class classification.
Causal inference
Methods to estimate cause‑and‑effect relationships beyond correlation, often using counterfactual reasoning.
Centroid
Mean vector of points in a cluster; used in algorithms like k‑means.
Centroidal Voronoi Tessellation
Voronoi partition of space in which each region's generating point coincides with the centroid of that region.
Chain Rule
Calculus rule for computing derivatives of composite functions.
Character-level Embedding
Vector representation of individual characters.
Checkpoint
Saved state of a model during training, allowing resumption or rollback.
Chi-Square Test
Statistical test for independence between categorical variables.
Class Activation Map (CAM)
Visualization highlighting image regions important for classification.
Class imbalance
When some classes are underrepresented; can degrade metrics like accuracy and require rebalancing strategies.
Classification
Task of assigning labels to inputs (binary or multi‑class), e.g., spam vs. not spam.
Click-through Rate (CTR)
Ratio of clicks to impressions in online systems.
Clustering
Unsupervised grouping of similar items without predefined labels.
Cold Start Problem
Difficulty in recommendations for new users or items.
Collaborative Filtering
Recommendation method based on user-item interactions.
Color Histogram
Representation of image colors and their distribution.
Combinatorial Optimization
Optimization over discrete structures.
Common Crawl
Open repository of web crawl data for NLP training.
Complexity Penalty
Regularization term discouraging overly complex models.
Computational Graph
Graph representation of mathematical operations in a model.
Concept drift
Change over time in the relationship between inputs and the target variable, which degrades a deployed model's performance.
Conditional GAN (cGAN)
GAN variant conditioned on additional information.
Conditional Probability
Probability of an event given another event has occurred.
Confounding Variable
Variable that influences both dependent and independent variables.
Confusion matrix
Table of true/false positives/negatives summarizing classification performance.
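A minimal sketch showing how the common classification metrics fall out of the four confusion-matrix counts (the counts below are hypothetical):

```python
# Hypothetical binary-classification counts
tp, fp, fn, tn = 40, 10, 5, 45

accuracy  = (tp + tn) / (tp + fp + fn + tn)         # 0.85
precision = tp / (tp + fp)                          # 0.80
recall    = tp / (tp + fn)                          # ~0.889
f1 = 2 * precision * recall / (precision + recall)  # ~0.842
print(accuracy, precision, recall, f1)
```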
Connectionist Approach
AI approach using neural networks to model cognition.
Consensus Clustering
Combining multiple clustering results into one.
Constant Learning Rate
Fixed step size in optimization.
Constraint Satisfaction Problem (CSP)
Problem defined by variables, domains, and constraints.
Content-based Filtering
Recommendation method using item features.
Context window
Maximum number of tokens a language model can attend to at once.
Continuous Variable
Variable that can take any value within a range.
Contrast Ratio
Measure of luminance difference between colors.
Contrastive learning
Learning method that brings similar pairs closer and pushes dissimilar pairs apart in embedding space.
Convergence Criterion
Condition to stop iterative optimization.
Convex Hull
Smallest convex set containing all points in a dataset.
Convolutional neural network (CNN)
Neural architecture using convolutions for grid‑like data (images, audio spectrograms).
Cosine Similarity
Measure of similarity between two vectors based on cosine of angle.
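One way to compute it in Python with numpy (a sketch, not a library API):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # cos(theta) = (a . b) / (||a|| * ||b||)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(np.array([1.0, 0.0]), np.array([1.0, 1.0])))  # ~0.707
```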
Count Vectorizer
Tool to convert text into a matrix of token counts.
Covariance Matrix
Matrix capturing pairwise covariances between variables.
Cramér’s V
Measure of association between two nominal variables.
Cross-Entropy Loss
Loss function measuring difference between predicted and true probability distributions.
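A minimal numpy sketch for one-hot targets; the clipping constant is an arbitrary guard against log(0):

```python
import numpy as np

def cross_entropy(y_true: np.ndarray, y_pred: np.ndarray, eps: float = 1e-12) -> float:
    # Mean negative log-probability of the true class under the prediction.
    y_pred = np.clip(y_pred, eps, 1.0)
    return float(-(y_true * np.log(y_pred)).sum(axis=1).mean())

y_true = np.array([[0, 1, 0]])        # one-hot label
y_pred = np.array([[0.1, 0.7, 0.2]])  # predicted probabilities
print(cross_entropy(y_true, y_pred))  # ~0.357 (= -log 0.7)
```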
Cross‑validation
Technique to estimate generalization by training and testing on multiple folds of the data.
Curriculum learning
Training strategy that presents easier examples before harder ones to improve learning efficiency.
Curse of Dimensionality
Phenomenon where data becomes sparse and distances less informative as the number of dimensions grows, making learning harder.
CycleGAN
GAN architecture for unpaired image-to-image translation.
D
Data augmentation
Techniques that expand training data via transformations (e.g., flips, noise) to improve robustness.
Data Cleaning
Process of correcting or removing inaccurate records from a dataset.
Data drift
Change in data distribution over time that can degrade model performance.
Data Imbalance
Unequal representation of classes in a dataset.
Data Labeling
Assigning meaningful tags to raw data.
Data lake
Centralized repository for storing raw structured and unstructured data at scale.
Data lineage
Tracking the origin, movement, and transformation of data through a system.
Data Pipeline
Series of steps to process and transform data.
Data Preprocessing
Preparing raw data for modeling.
Data Sampling
Selecting a subset of data for analysis.
Data Sharding
Splitting data across multiple storage systems.
Data Warehouse
Central repository for structured data.
Dataset shift
Mismatch between training and deployment data distributions (e.g., covariate shift).
Decision Boundary
Surface separating different predicted classes.
Decision Stump
One-level decision tree.
Decision tree
Tree‑structured model splitting data based on features to make predictions.
Decoder
Model component that generates outputs from latent representations or encoded inputs.
Deep Belief Network (DBN)
Stack of restricted Boltzmann machines for deep learning.
Deep Q-Network (DQN)
RL algorithm combining Q-learning with deep neural networks.
Deepfake
Synthetic media generated using AI to convincingly replace one person’s likeness with another.
Degree of Freedom
Number of independent values in a statistical calculation.
Dense Layer
Fully connected neural network layer.
Depthwise Convolution
Convolution applied separately to each input channel.
Deterministic Model
Model that produces the same output for a given input.
Dice Coefficient
Overlap measure between two samples, used in segmentation.
Differential privacy
Technique to ensure statistical analyses do not reveal information about any individual in the dataset.
Diffusion model
Generative model that learns to reverse a gradual noising process to create new samples.
Dimensionality reduction
Compressing features while preserving structure (e.g., PCA, t‑SNE, UMAP).
Discriminative Model
Model that learns decision boundaries between classes.
Distance Metric Learning
Learning a distance function tailored to a task.
Distributed Computing
Computing across multiple machines to handle large-scale tasks.
Domain adaptation
Transferring a model trained on one domain to perform well on a related domain.
Domain Generalization
Training models to perform well on unseen domains.
Dropout
Regularization method that randomly deactivates neurons during training to prevent overfitting.
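A sketch of "inverted" dropout, the variant most frameworks use; survivors are rescaled during training so no adjustment is needed at inference:

```python
import numpy as np

def inverted_dropout(x: np.ndarray, p: float, training: bool = True) -> np.ndarray:
    # Zero each activation with probability p; rescale survivors by 1/(1-p)
    # so the expected activation is unchanged. Pass through at inference.
    if not training or p == 0.0:
        return x
    mask = (np.random.rand(*x.shape) >= p).astype(x.dtype)
    return x * mask / (1.0 - p)
```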
Dynamic Time Warping (DTW)
Algorithm for measuring similarity between temporal sequences.
E
Early Fusion
Combining multiple data modalities at the input level.
Early stopping
Halting training when validation performance stops improving to prevent overfitting.
Edge AI
Running AI models locally on devices rather than in the cloud to reduce latency and bandwidth use.
Elastic Net
Regularization combining L1 and L2 penalties.
Elastic weight consolidation
Regularization technique to prevent catastrophic forgetting in continual learning.
Embedding
Dense vector representation that captures semantic similarity among items like words or images.
Embedding Layer
Neural network layer that maps discrete items to dense vectors.
Empirical Risk Minimization
Minimizing average loss over training data.
Encoder
Component that converts inputs into compact latent representations.
Ensemble
Combination of multiple models to improve accuracy and robustness (e.g., bagging, boosting).
Ensemble Averaging
Combining predictions by averaging outputs of multiple models.
Ensemble Pruning
Reducing the size of an ensemble by removing models that contribute least to performance.
Entropy (information)
Measure of uncertainty or information content in a probability distribution.
Entropy Regularization
Adding entropy to the loss function to encourage exploration in reinforcement learning.
Evaluation metrics
Quantitative measures to assess model performance (e.g., precision, recall, F1, AUC).
Evolutionary Algorithm
Optimization algorithm inspired by natural selection processes.
Exact Match Score
Metric for evaluating NLP tasks like question answering by checking exact string matches.
Explainability
The degree to which the internal mechanics of a machine learning system can be explained in human terms.
Explainable AI (XAI)
Methods that make AI model decisions understandable to humans.
Exponential Moving Average (EMA)
Weighted moving average giving more importance to recent data points.
F
F1 score
Harmonic mean of precision and recall; balances false positives and false negatives.
Factorization Machine
Model that captures interactions between variables using factorized parameters.
Fairlearn
Open-source toolkit for assessing and improving fairness in AI systems.
Fairness
Absence of unjust bias in model outcomes across demographic groups; measured by metrics like demographic parity.
FastText
Efficient text classification and representation learning library by Facebook AI.
Feature Drift
Change in the statistical properties of features over time.
Feature engineering
Process of creating, transforming, or selecting variables to improve model performance.
Feature Map
Output of a convolutional layer representing detected features.
Feature Scaling
Normalizing or standardizing features to improve model training.
Feature store
Centralized repository for storing and serving machine learning features.
Federated Averaging
Algorithm for aggregating model updates in federated learning.
Federated learning
Training models across multiple devices or servers holding local data samples without exchanging them.
Few‑shot learning
Learning to perform tasks from only a handful of examples.
Few‑shot prompting
Providing a small number of examples in a prompt to guide a language model’s output.
Fine‑tuning
Adapting a pretrained model on a smaller, task‑specific dataset to improve performance.
Fisher Score
Feature selection method based on class separability.
Forward Propagation
Process of passing inputs through a network to obtain outputs.
Foundation model
Large pretrained model adaptable to many downstream tasks via prompting or fine‑tuning.
Fourier Transform
Mathematical transform decomposing signals into frequency components.
FP16/BF16
Reduced‑precision number formats for faster training/inference with minimal accuracy loss.
Frame Rate
Number of frames displayed per second in video processing.
G
Gated Recurrent Unit (GRU)
RNN variant that uses gating mechanisms to control information flow.
Gaussian Mixture Model (GMM)
Probabilistic model representing data as a mixture of Gaussian distributions.
General Adversarial Training
Training on adversarial examples to improve a model's robustness to adversarial perturbations; more commonly called adversarial training.
Generalization
Model’s ability to perform well on unseen data rather than just training data.
Generalized Linear Model (GLM)
Statistical model generalizing linear regression for various distributions.
Generative adversarial network (GAN)
Two‑network setup (generator vs. discriminator) trained in opposition to produce realistic data.
Geometric Deep Learning
Deep learning methods for non-Euclidean data like graphs and manifolds.
Gibbs sampling
MCMC algorithm sampling each variable from its conditional distribution to approximate joint distributions.
Global Average Pooling
Pooling operation that averages each feature map into a single value.
Gradient Boosting
Ensemble method that builds models sequentially to correct previous errors.
Gradient clipping
Technique to prevent exploding gradients by capping their magnitude during training.
Gradient descent
Optimization method that iteratively updates parameters to minimize loss using gradients.
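A minimal sketch on a one-dimensional quadratic, where the gradient can be written by hand:

```python
# Minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w, lr = 0.0, 0.1
for _ in range(100):
    grad = 2 * (w - 3)
    w -= lr * grad   # step against the gradient
print(w)  # converges toward the minimum at w = 3
```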
Gradient vanishing
Problem where gradients become too small for effective learning in deep networks.
Graph neural network (GNN)
Neural architecture operating on graphs via message passing between nodes and edges.
Greedy Algorithm
Algorithm that makes the locally optimal choice at each step.
Grid Search
Exhaustive search over specified hyperparameter values.
H
Hallucination
Confident but incorrect or fabricated output produced by a model, especially in generative tasks.
Hamming Distance
Number of positions at which two strings of equal length differ.
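A one-function sketch (the classic "karolin"/"kathrin" pair differs in three positions):

```python
def hamming(a: str, b: str) -> int:
    # Count of positions where two equal-length strings differ.
    assert len(a) == len(b), "Hamming distance requires equal lengths"
    return sum(c1 != c2 for c1, c2 in zip(a, b))

print(hamming("karolin", "kathrin"))  # 3
```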
Hard Attention
Attention mechanism that selects discrete parts of the input.
Hashing Trick
Method to map features to indices in a fixed-size vector.
Hebbian Learning
Learning rule stating that neurons that fire together wire together.
Heuristic
Practical rule‑of‑thumb method to solve problems efficiently, not guaranteed to be optimal.
Hidden state
Internal representation maintained by models (e.g., RNNs) across sequences.
Hierarchical Clustering
Clustering method building a hierarchy of clusters.
Hierarchical Softmax
Efficient softmax computation for large vocabularies.
Histogram of Oriented Gradients (HOG)
Feature descriptor for object detection in images.
Homomorphic Encryption
Encryption allowing computation on ciphertexts without decryption.
Hubness
Phenomenon in high-dimensional spaces where some points appear frequently as nearest neighbors.
Human‑in‑the‑loop
Approach where human feedback is integrated into the AI training or decision‑making process.
Hybrid Model
Model combining different AI techniques, e.g., symbolic and neural.
Hyperparameter
Configurable parameter set before training (e.g., learning rate, batch size).
Hyperparameter optimization
Systematic search for best hyperparameters (grid/random search, Bayesian optimization).
I
Image Augmentation
Techniques to increase dataset size by transforming images.
Image Captioning
Generating textual descriptions for images using AI.
Imbalanced Learning
Techniques for handling datasets with unequal class distribution.
Imputation
Filling in missing data using strategies like mean, KNN, or model‑based estimates.
Incremental Learning
Training models continuously with new data without retraining from scratch.
Indexing
Organizing data for fast retrieval in databases or search engines.
Inductive Bias
Assumptions a model makes to generalize beyond training data.
Inference
Using a trained model to make predictions or generate outputs on new inputs.
Instance Segmentation
Detecting and delineating each object instance in an image.
Integrated Gradients
Explainability method attributing model predictions to input features.
Interpretability
Degree to which a human can understand a model’s internal mechanics or reasons for outputs.
Intersection over Union (IoU)
Overlap metric for object detection/segmentation comparing predicted and true regions.
In‑context learning
Model adapts behavior from examples provided in the prompt without updating weights.
J
Jaccard index
Similarity metric equal to intersection over union of sets.
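A minimal sketch over Python sets; the empty-union case is defined as 0 here by convention:

```python
def jaccard(a: set, b: set) -> float:
    # |A ∩ B| / |A ∪ B|
    union = a | b
    return len(a & b) / len(union) if union else 0.0

print(jaccard({"cat", "dog"}, {"dog", "fish"}))  # 1/3 ≈ 0.333
```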
Jacobian Matrix
Matrix of all first-order partial derivatives of a vector-valued function.
Jensen–Shannon divergence
Symmetric measure of similarity between probability distributions derived from KL divergence.
Jittering
Data augmentation technique adding noise to inputs.
Joint Embedding
Mapping multiple modalities into a shared representation space.
Joint Probability
Probability of two events occurring together.
JSON-LD
Lightweight Linked Data format used for semantic web and SEO.
K
K-Nearest Neighbors (KNN)
Instance-based learning algorithm classifying based on nearest neighbors.
Kalman Filter
Algorithm for estimating the state of a system from noisy observations.
Kernel Function
Function that computes inner products in an implicit higher-dimensional feature space, letting SVMs learn nonlinear boundaries without explicit transformation (the kernel trick).
Keyphrase Extraction
Identifying important phrases from text.
Knowledge Base
Structured repository of facts and relationships.
Knowledge distillation
Training a smaller student model to mimic a larger teacher model’s behavior.
Knowledge graph
Structured representation of entities and their relationships, often used for reasoning and search.
Knowledge Tracing
Modeling a learner’s knowledge state over time.
K‑fold cross‑validation
Splits data into k folds; trains on k‑1 folds and validates on the remaining fold, rotating across folds.
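A sketch using scikit-learn's KFold, assuming scikit-learn is installed; the toy data and model are placeholders:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X, y = np.random.rand(100, 4), np.random.randint(0, 2, 100)  # toy data
scores = []
for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = LogisticRegression().fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[val_idx], y[val_idx]))
print(np.mean(scores))  # average validation accuracy across the 5 folds
```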
k‑means
Clustering algorithm assigning points to the nearest centroid and updating centroids iteratively.
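A bare-bones numpy sketch of Lloyd's algorithm; for brevity it ignores the empty-cluster edge case and uses a fixed iteration count rather than a convergence test:

```python
import numpy as np

def kmeans(X: np.ndarray, k: int, iters: int = 100, seed: int = 0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]  # random initial centroids
    for _ in range(iters):
        # Assign each point to its nearest centroid ...
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # ... then move each centroid to the mean of its assigned points.
        centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centroids
```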
L
Label Encoding
Converting categorical labels into numeric form.
Label leakage
When training data includes information unavailable at prediction time, inflating performance estimates.
Label smoothing
Regularization that softens hard labels to reduce overconfidence.
Lagrange Multiplier
Optimization method for constrained problems.
Language Model
Model predicting the probability of word sequences.
Large language model (LLM)
Model trained on massive corpora to understand and generate human‑like text.
Latent space
Compressed representation space where similar inputs are close together.
Layer Normalization
Normalization applied across features in a layer.
Leaky ReLU
Activation function allowing a small gradient for negative inputs.
Learning Curve
Graph showing model performance over training iterations.
Learning rate
Step size for updating parameters during optimization.
Lexical Semantics
Study of word meanings and relationships.
LightGBM
Gradient boosting framework optimized for speed and memory.
Linear Discriminant Analysis (LDA)
Technique for dimensionality reduction and classification.
Local Outlier Factor (LOF)
Algorithm for detecting density-based anomalies.
Log loss (cross‑entropy)
Loss function measuring the difference between predicted probabilities and true labels.
Logistic Regression
Statistical model for binary classification.
Long Short-Term Memory (LSTM)
RNN variant designed to capture long-term dependencies.
LoRA (Low‑Rank Adaptation)
Parameter‑efficient fine‑tuning method for large language models.
Low-Rank Approximation
Matrix approximation using fewer dimensions.
M
Machine learning (ML)
Field where models learn patterns from data to make predictions or decisions.
Manifold Learning
Techniques for nonlinear dimensionality reduction.
MapReduce
Programming model for processing large datasets in parallel.
Markov Chain
Stochastic process with memoryless transitions.
Markov decision process (MDP)
Framework modeling decision‑making with states, actions, transitions, and rewards.
Masked Language Model
Model predicting masked tokens in a sequence.
Max Pooling
Pooling operation selecting the maximum value in a region.
Mean Average Precision (mAP)
Metric summarizing precision across recall levels; popular in detection tasks.
Mean Squared Error (MSE)
Loss function measuring average squared difference between predictions and targets.
Median Absolute Error
Robust metric measuring median of absolute errors.
Memory Network
Neural network with an explicit memory component.
Meta Reinforcement Learning
RL approach where agents learn to adapt quickly to new tasks.
Meta‑learning
Learning to learn — models that improve their learning process over multiple tasks.
Min-Max Scaling
Scaling features to a fixed range, usually [0,1].
Mini-Batch Gradient Descent
Gradient descent variant using small random subsets of data.
Mixture of experts
Architecture where different sub‑models specialize in different parts of the input space.
MLOps
Practices for reliable model development, deployment, monitoring, and governance in production.
Mode Collapse
GAN failure mode where generator produces limited variety.
Model card
Documentation summarizing a model’s intended use, performance, data, risks, and limitations.
Model Compression
Techniques to reduce model size while preserving accuracy.
Model Ensemble
Combining multiple models to improve performance.
Model Interpretability
Understanding how a model makes predictions.
Momentum
Optimization technique accelerating gradient descent in relevant directions.
Monte Carlo
Methods that rely on repeated random sampling to estimate numeric results.
Monte Carlo Tree Search
Search algorithm combining random sampling and tree search.
Multi-Head Attention
Transformer mechanism that runs several attention heads in parallel, letting each head attend to different relationships in the sequence.
Multilayer Perceptron (MLP)
Feedforward neural network with multiple layers.
Multimodal Fusion
Combining information from multiple data modalities.
Multi‑modal learning
Training models that process and relate information from multiple data types (e.g., text, image, audio).
N
Naive Bayes
Probabilistic classifier assuming feature independence.
Named Entity Linking
Connecting named entities in text to knowledge base entries.
Named entity recognition (NER)
NLP task to identify and classify entities like people, places, and organizations in text.
Natural language processing (NLP)
Subfield focused on enabling machines to understand and generate human language.
Negative Sampling
Training technique that approximates the full softmax by contrasting observed pairs against a few sampled negative examples; used in word embedding models such as Word2Vec.
NeRF (Neural Radiance Fields)
Technique for synthesizing novel views of complex 3D scenes from 2D images.
Neural architecture search (NAS)
Automated search over network designs to optimize performance under constraints.
Neural Machine Translation (NMT)
Using neural networks for language translation.
Neural network
Composed of layers of neurons with learnable weights and nonlinear activations.
Neuroevolution
Evolving neural network architectures using genetic algorithms.
Noise Contrastive Estimation
Training method using noise samples to estimate probabilities.
Non-Maximum Suppression
Algorithm to remove redundant bounding boxes in object detection.
Normalization
Scaling inputs or features to stabilize training (e.g., z‑score, min‑max).
Normalization Layer
Layer that normalizes inputs to stabilize training.
Numerical Stability
Avoiding numerical errors in computations.
N‑shot learning
Learning from exactly N examples per class; includes few‑shot and one‑shot settings.
O
Object Tracking
Following objects across frames in a video.
Objective function
Quantity optimized during training (loss to minimize or reward to maximize).
One-Class SVM
SVM variant for anomaly detection.
One‑hot encoding
Representing categorical variables as binary vectors.
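A compact numpy idiom for producing the binary vectors (the integer category codes here are illustrative):

```python
import numpy as np

labels = np.array([0, 2, 1, 2])   # three categories encoded as 0, 1, 2
one_hot = np.eye(3)[labels]       # each row is a binary indicator vector
print(one_hot)
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]
#  [0. 0. 1.]]
```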
Online Learning
Updating models incrementally as new data arrives.
Ontology
Structured representation of concepts and relationships within a domain.
Optimization
Algorithms and methods to find parameters that minimize loss (SGD, Adam).
Ordinal Encoding
Encoding categorical variables with integer values respecting order.
Out-of-Distribution Detection
Identifying inputs that differ from training data distribution.
Outlier detection
Identifying data points that differ significantly from the majority.
Over-parameterization
Using more parameters than necessary to fit data.
Overfitting
When a model memorizes training data patterns and performs poorly on new data.
P
PageRank
Algorithm for ranking web pages based on link structure.
Pairwise Ranking
Learning to rank items based on pairwise comparisons.
Parameter
Learnable weight or bias updated during training.
Parameter‑efficient tuning
Fine‑tuning methods that adjust only a small subset of model parameters.
Parzen Window
Non-parametric technique for probability density estimation.
Pattern Recognition
Identifying patterns and regularities in data.
Perceptron
Early type of artificial neuron used in simple linear classifiers.
Perceptron Learning Rule
Algorithm for training a perceptron.
Permutation Importance
Feature importance measure based on shuffling values.
Perplexity
Language modeling metric equal to the exponentiated average negative log-likelihood of a sample; lower perplexity indicates better prediction.
Pipeline
End‑to‑end workflow from data ingestion to deployment and monitoring.
Pixel RNN
RNN architecture for modeling images pixel by pixel.
Poisoning attack
Manipulating training data to cause a model to learn harmful behavior.
Poisson Regression
Regression model for count data.
Policy Gradient
RL method optimizing policy directly via gradient ascent.
Polynomial Regression
Regression model using polynomial features.
Pose Estimation
Detecting positions of key points in objects or humans.
Precision
Proportion of predicted positives that are true positives; controls false positives.
Precision-Recall Curve
Graph showing trade-off between precision and recall across different thresholds.
Principal Component Analysis (PCA)
Dimensionality reduction technique projecting data onto orthogonal components capturing maximum variance.
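A minimal sketch of PCA via singular value decomposition, assuming numpy; a production implementation would also report explained variance:

```python
import numpy as np

def pca(X: np.ndarray, n_components: int) -> np.ndarray:
    # Center the data, then project onto the top right-singular vectors,
    # which are the directions of maximum variance.
    X_centered = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ vt[:n_components].T
```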
Probabilistic Graphical Model
Model representing variables and their conditional dependencies via a graph.
Prompt engineering
Designing inputs and instructions to steer model behavior toward desired outputs.
Prompt injection
Adversarial technique where malicious instructions are embedded in prompts to manipulate model output.
Pruning
Removing parameters or connections to reduce model size and latency with minimal accuracy loss.
Q
Quantization
Reducing the precision of model weights/activations (e.g., from 32‑bit to 8‑bit) to shrink size and speed up inference.
Quantum Machine Learning
Applying quantum computing to enhance machine learning algorithms.
Qubit
Quantum computing unit that can exist in a superposition of the states 0 and 1.
Query Expansion
Adding related terms to a search query to improve retrieval results.
Queueing Theory
Mathematical study of waiting lines, applicable to system performance modeling.
Q‑learning
RL algorithm that learns action‑value function estimating expected future reward for state‑action pairs.
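A single tabular Q-learning update as a sketch; the state/action indices, learning rate, and discount factor below are illustrative:

```python
import numpy as np

Q = np.zeros((5, 2))        # Q-table: 5 states x 2 actions
alpha, gamma = 0.1, 0.99    # learning rate, discount factor
s, a, r, s2 = 0, 1, 1.0, 3  # observed transition (state, action, reward, next state)

# Move Q(s, a) toward the bootstrapped target r + gamma * max_a' Q(s', a').
Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
```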
R
Random Forest
Ensemble of decision trees trained on random subsets of data and features.
Random Initialization
Starting model parameters with random values before training.
Ranking Loss
Loss function for learning to rank tasks.
RANSAC
Robust estimation algorithm that fits a model to data containing outliers.
Recall
Proportion of actual positives correctly identified; controls false negatives.
Recall@K
Proportion of relevant items found in the top K results.
Rectified Linear Unit (ReLU)
Activation function outputting zero for negative inputs and the input itself for positive inputs.
Recurrent neural network (RNN)
Neural architecture with loops to process sequential data by maintaining hidden states.
Regularization
Techniques like L1/L2 penalties or dropout to prevent overfitting.
Reinforcement learning (RL)
Learning paradigm where an agent interacts with an environment to maximize cumulative reward.
Reinforcement Learning from Human Feedback (RLHF)
Training method aligning model outputs with human preferences.
Representation learning
Learning useful feature representations automatically from raw data.
Residual connection
Shortcut connection in neural networks that adds input to output to ease training of deep models.
Residual Network (ResNet)
Deep network architecture with skip connections to ease training.
Restricted Boltzmann Machine (RBM)
Stochastic neural network for unsupervised learning.
Reward Function
Function defining the goal in reinforcement learning by assigning rewards to actions.
Re‑ranking
Reordering a list of results based on additional scoring or context.
Ridge Regression
Linear regression with L2 regularization.
ROC Curve
Graph showing true positive rate vs. false positive rate across thresholds.
Root Mean Squared Error (RMSE)
Square root of the average squared differences between predictions and targets.
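A quick numeric sketch with made-up values:

```python
import numpy as np

y_true = np.array([3.0, 5.0, 2.0])
y_pred = np.array([2.5, 5.0, 4.0])
mse = np.mean((y_true - y_pred) ** 2)  # ~1.417
rmse = np.sqrt(mse)                    # ~1.190
print(mse, rmse)
```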
S
Sample Weighting
Assigning different importance to samples during training.
Sampling Bias
Bias introduced when the sample is not representative of the population.
Satisfiability Problem (SAT)
Problem of determining if there exists an interpretation that satisfies a given Boolean formula.
Scalability
Ability of a system to handle increasing workloads without performance loss.
Scaler
Tool or function to normalize or standardize data.
Self-Organizing Map (SOM)
Unsupervised neural network projecting high-dimensional data to a lower-dimensional grid.
Self‑attention
Mechanism allowing a model to weigh the importance of different parts of the input sequence.
Self‑supervised learning
Learning from unlabeled data by generating labels from the data itself.
Semantic Search
Search technique using meaning rather than exact keyword matching.
Semi-Parametric Model
Model with both parametric and non-parametric components.
Semi‑supervised learning
Training with a small labeled set and a large unlabeled set to improve performance.
Sensitivity
Same as recall; proportion of actual positives correctly identified.
Sentiment analysis
NLP task of determining the emotional tone of text.
Sequence-to-Sequence Model
Model mapping input sequences to output sequences, common in translation.
Shapley values
Game‑theoretic approach to explain the contribution of each feature to a prediction.
Sharding
Splitting a dataset or database into smaller, faster, more easily managed parts.
Sigmoid Function
Activation function mapping inputs to the range (0,1).
Silhouette Score
Metric for evaluating clustering quality.
Simulated Annealing
Probabilistic optimization algorithm inspired by metallurgy.
Skip-Gram
Word2Vec model predicting context words from a target word.
Softmax Function
Function converting logits into probabilities that sum to 1.
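A numerically stable sketch; subtracting the maximum logit leaves the result unchanged because softmax is shift-invariant:

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    z = logits - logits.max()  # stability trick; does not change the output
    exp = np.exp(z)
    return exp / exp.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # probabilities summing to 1
```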
Sparse Matrix
Matrix in which most elements are zero.
Spectral Clustering
Clustering method using eigenvalues of similarity matrices.
Speech Synthesis
Generating spoken language from text.
Spike-and-Slab Prior
Bayesian prior combining a point mass at zero and a continuous distribution.
SQL Injection
Security vulnerability allowing execution of arbitrary SQL code.
Stacked Autoencoder
Autoencoder with multiple hidden layers.
State Space Model
Mathematical model describing a system with inputs, outputs, and state variables.
Stemming
Reducing words to their root form.
Stochastic gradient descent (SGD)
Optimization method updating parameters using a random subset (mini‑batch) of data.
Stop Words
Common words often removed in text preprocessing.
Streaming Data
Data generated continuously and processed in real time.
Structural Causal Model
Model representing causal relationships between variables.
Subword Tokenization
Breaking words into smaller units for NLP.
Support Vector Machine (SVM)
Supervised learning model for classification and regression.
Survival Analysis
Statistical analysis of time-to-event data.
Synthetic data
Artificially generated data used to augment or replace real datasets.
Synthetic Minority Oversampling Technique (SMOTE)
Method to address class imbalance by generating synthetic samples.
T
Tabular Data
Data organized in rows and columns.
Target Encoding
Encoding categorical variables using target statistics.
Temperature (in sampling)
Parameter controlling randomness in model output generation.
Temporal Difference Learning
RL method updating value estimates based on other learned estimates.
Tensor
Multidimensional array used in deep learning frameworks.
Tensor Decomposition
Breaking a tensor into simpler components.
Term Frequency-Inverse Document Frequency (TF-IDF)
Statistic reflecting how important a word is to a document in a corpus.
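A sketch using scikit-learn's TfidfVectorizer, assuming a recent scikit-learn is installed; the corpus is a toy example:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = ["the cat sat", "the dog sat", "the cat ran"]
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(corpus)       # sparse document-term matrix
print(vectorizer.get_feature_names_out())  # learned vocabulary
print(X.toarray().round(2))                # TF-IDF weight per word per document
```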
Test Set
Data used to evaluate the final model performance.
Text Classification
Assigning categories to text documents.
Text Mining
Extracting useful information from text data.
Time Series Analysis
Analyzing data points collected or recorded at specific time intervals.
Token Embedding
Vector representation of tokens in NLP.
Tokenization
Splitting text into smaller units (tokens) for processing by NLP models.
Topological Data Analysis
Analyzing the shape of data using topology.
Top‑k sampling
Text generation method that samples from the top k most probable next tokens.
Top‑p (nucleus) sampling
Text generation method that samples from the smallest set of tokens whose cumulative probability exceeds p.
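A sketch combining temperature and nucleus sampling, assuming numpy; real decoders operate on full vocabularies, but the logic is the same:

```python
import numpy as np

def nucleus_sample(logits: np.ndarray, p: float = 0.9, temperature: float = 1.0) -> int:
    z = logits / temperature
    probs = np.exp(z - z.max())
    probs /= probs.sum()
    order = probs.argsort()[::-1]              # token indices, most probable first
    cum = probs[order].cumsum()
    cutoff = int(np.searchsorted(cum, p)) + 1  # smallest prefix with cum. prob > p
    keep = order[:cutoff]
    kept = probs[keep] / probs[keep].sum()     # renormalize over the nucleus
    return int(np.random.choice(keep, p=kept))
```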
Training Set
Data used to train a model.
Transfer Entropy
Measure of information transfer between variables.
Transfer learning
Reusing a pretrained model on a new but related task to save time and resources.
Transformer
Neural architecture using self‑attention to process sequences in parallel.
Transformer-XL
Transformer variant handling longer context via recurrence.
Transliteration
Converting text from one script to another.
Tree-based Model
Model using decision tree structures for predictions.
True positive rate (TPR)
Same as recall; proportion of actual positives correctly predicted.
Turing Test
Test of a machine’s ability to exhibit intelligent behavior indistinguishable from a human.
U
Under-sampling
Reducing the number of samples in the majority class to balance data.
Underfitting
When a model is too simple to capture underlying patterns in the data.
Uniform Distribution
Probability distribution where all outcomes are equally likely.
Unit Testing
Testing individual components of software for correctness.
Univariate Analysis
Analysis involving a single variable.
Unsupervised learning
Learning patterns from unlabeled data without explicit target outputs.
Unsupervised Pretraining
Training a model on unlabeled data before fine-tuning on labeled data.
Upsampling
Increasing the number of samples in the minority class to balance data.
User-based Collaborative Filtering
Recommendation method using similarities between users.
Utility function
Function that assigns a value to outcomes, guiding decision‑making in AI agents.
V
Validation Curve
Graph showing model performance for different hyperparameter values.
Validation set
Subset of data used to tune model hyperparameters and prevent overfitting.
Value Function
Function estimating expected return in reinforcement learning.
Vanishing Gradient Problem
Issue where gradients become too small for effective learning in deep networks.
Variational autoencoder (VAE)
Generative model that learns a probabilistic latent space for data.
Variational Inference
Approximate Bayesian inference method.
Vector database
Database optimized for storing and querying vector embeddings.
Vector Quantization
Quantizing vectors to a finite set of representative points.
Vectorization
Converting data into numerical vectors for model input.
Version Control
System for tracking changes in code or data.
Video Captioning
Generating textual descriptions for video content.
Virtualization
Creating virtual versions of computing resources.
Vision transformer (ViT)
Transformer architecture adapted for image classification.
Visual Question Answering (VQA)
Answering questions about images using AI.
Voting Classifier
Ensemble method combining predictions via majority vote.
W
Weak Supervision
Training with noisy, limited, or imprecise labels.
Weight decay
Regularization technique adding a penalty proportional to weights’ magnitude.
Weight Initialization
Setting initial values for model parameters before training.
Weighted Average Precision
Precision metric weighted by class importance.
Whisper
OpenAI’s automatic speech recognition (ASR) system.
Whitening Transformation
Transforming data to have zero mean and unit variance with uncorrelated features.
Wide & Deep Model
Model combining wide linear models and deep neural networks.
Window Function
Function applied to a subset of data points in signal processing.
Word embedding
Dense vector representation of words capturing semantic relationships.
Word Sense Disambiguation
Determining which meaning of a word is used in context.
Word2Vec
Neural embedding model that learns vector representations of words from context.
WordPiece
Subword tokenization algorithm used in BERT.
X
X-shape Validation
Cross-validation variant with specific fold arrangements.
Xavier Initialization
Weight initialization method that keeps the variance of activations (and gradients) roughly constant across layers.
XGBoost
Optimized gradient boosting library for supervised learning tasks.
XLM-R
Cross-lingual language model based on RoBERTa.
XML Parsing
Reading and processing XML data.
XOR Problem
Classic problem showing limitations of simple perceptrons.
Y
YAML
Human-readable data serialization format.
Yield Curve Modeling
Modeling the relationship between interest rates and bond maturities.
YOLO (You Only Look Once)
Real‑time object detection algorithm processing images in a single pass.
YoloX
Advanced YOLO-based object detection model.
Yottabyte
Unit of digital information equal to 10^24 bytes.
YouTube-8M
Large-scale labeled video dataset for machine learning.
Z
Z-Algorithm
String matching algorithm for pattern searching.
Zero Crossing Rate
Rate at which a signal changes sign, used in audio analysis.
Zero Inflation Model
Statistical model for count data with excess zeros.
Zero‑shot learning
Model’s ability to perform tasks without having seen labeled examples during training.
Zipf's Law
Empirical law stating that word frequency is inversely proportional to its rank.
Zonal Statistics
Calculating statistics on values within zones of a raster dataset.
Zoom Augmentation
Image augmentation technique involving zooming in or out.
Z‑score normalization
Scaling method that transforms data to have mean 0 and standard deviation 1.
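A one-liner sketch with numpy:

```python
import numpy as np

def z_score(x: np.ndarray) -> np.ndarray:
    # Subtract the mean and divide by the standard deviation.
    return (x - x.mean()) / x.std()

print(z_score(np.array([1.0, 2.0, 3.0, 4.0])))  # mean 0, std 1
```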