Glossary of AI Terms

A comprehensive glossary of artificial intelligence terms and definitions.

A

A/B testing

Experimental method comparing two variants (A and B) to measure which performs better for a defined goal (e.g., click‑through rate).

Accuracy

The proportion of correct predictions out of all predictions; most informative when classes are balanced and misleading under class imbalance.

Activation function

Nonlinear function in a neuron that enables networks to model complex relationships (e.g., ReLU, sigmoid, GELU).
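
For illustration, a minimal NumPy sketch of these three activations (GELU in its common tanh approximation):

```python
import numpy as np

def relu(x):
    # Zero for negative inputs, identity for positive inputs
    return np.maximum(0, x)

def sigmoid(x):
    # Squashes inputs into (0, 1)
    return 1 / (1 + np.exp(-x))

def gelu(x):
    # Tanh approximation of the Gaussian Error Linear Unit
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x), sigmoid(x), gelu(x))
```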

Active learning

Machine learning approach where the model selects the most informative data points for labeling to improve performance efficiently.

Actor–critic

Reinforcement learning architecture combining a policy model (actor) and a value model (critic).

Adagrad

Optimizer that adapts the learning rate for each parameter based on historical gradients.

Adam Optimizer

Adaptive Moment Estimation optimizer combining momentum and RMSProp techniques.

Adaptive Boosting (AdaBoost)

Ensemble method that combines weak learners into a strong classifier by focusing on misclassified examples.

Adaptive learning rate

Optimization technique where the learning rate changes during training based on performance or gradient history.

Adversarial example

Input crafted with subtle perturbations to mislead a model into incorrect predictions.

Agent

An autonomous system that perceives an environment and takes actions to maximize cumulative reward or achieve goals.

Agent-based Modeling

Simulation of actions and interactions of autonomous agents to assess their effects on a system.

AI (Artificial Intelligence)

Broad field focused on building systems that perform tasks requiring human‑like intelligence such as learning and reasoning.

AI Accelerator

Specialized hardware designed to speed up AI computations, such as TPUs or GPUs.

AI ethics

Principles and practices to ensure fairness, accountability, transparency, and safety in AI systems.

AIOps

Application of AI/ML to automate and enhance IT operations, including anomaly detection and incident response.

Alignment

Ensuring model objectives and behaviors are consistent with human values, intent, and safety constraints.

Alpha-Beta Pruning

Search algorithm optimization that reduces the number of nodes evaluated in minimax.

AlphaGo

DeepMind’s AI system that defeated human champions in the game of Go using deep reinforcement learning.

Analytical Engine

Charles Babbage’s proposed mechanical general-purpose computer, a precursor to modern computing.

Anchor box

Predefined bounding box shape used in object detection models to predict object locations.

Anchor Text Embedding

Vector representation of hyperlink anchor text for semantic search.

Anisotropic Filtering

Image processing technique to enhance texture quality at oblique viewing angles.

Anomaly detection

Identifying rare items, events, or observations that deviate significantly from the majority of data.

Anonymization

Process of removing personally identifiable information from data to protect privacy.

API

Defined interface that allows software to communicate, commonly used to deploy and consume AI services.

Approximate nearest neighbor search

Fast method for finding points in high‑dimensional space that are close to a query point.

Approximation Error

Difference between the true function and the best possible model within a hypothesis space.

Argmax

Function that returns the index of the maximum value in a list or array.

Artificial Life (ALife)

Study of life-like behaviors in artificial systems.

ASR (Automatic Speech Recognition)

Technology that converts spoken language into text.

Attention head

One of multiple parallel attention mechanisms in a transformer layer, each learning different relationships.

Attention Score

Weight assigned to a token or feature in an attention mechanism.

Autoregressive Integrated Moving Average (ARIMA)

Statistical model for time series forecasting combining autoregression, differencing, and moving‑average terms.

Autoencoder

Neural network trained to compress input into a latent representation and reconstruct it, useful for denoising and anomaly detection.

Automatic Differentiation

Technique to compute derivatives efficiently for optimization.

AutoML

Automated machine learning — tools and methods to automate model selection, training, and tuning.

Autoregressive model

Generates the next token conditioned on previous tokens; common in language modeling.

B

Backpropagation

Training algorithm that computes gradients of loss w.r.t. weights by propagating error backward through the network.

Bag-of-Words (BoW)

Text representation counting word occurrences without order.
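
A minimal sketch using only the standard library; the shared vocabulary and per‑document count vectors are the illustrative pieces here:

```python
from collections import Counter

docs = ["the cat sat", "the cat saw the dog"]

# Build a shared vocabulary, then count word occurrences per document
vocab = sorted({w for d in docs for w in d.split()})
vectors = [[Counter(d.split())[w] for w in vocab] for d in docs]

print(vocab)    # ['cat', 'dog', 'sat', 'saw', 'the']
print(vectors)  # [[1, 0, 1, 0, 1], [1, 1, 0, 1, 2]]
```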

Bagging

Ensemble method that trains multiple models on bootstrapped datasets and averages their predictions.

Balanced Accuracy

Average of recall obtained on each class, useful for imbalanced datasets.

Batch Gradient Descent

Gradient descent variant that uses the entire dataset for each update.

Batch normalization

Technique to stabilize and speed up training by normalizing layer inputs within a batch.

Bayes' Theorem

Formula for computing conditional probability: P(A|B) = P(B|A) · P(A) / P(B).
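
A worked example with illustrative numbers (a 99%‑sensitive, 95%‑specific test for a condition with 1% prevalence):

```python
p_disease = 0.01
p_pos_given_disease = 0.99     # sensitivity
p_pos_given_healthy = 0.05     # false positive rate (1 - specificity)

# Total probability of a positive test (law of total probability)
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)

# Bayes' theorem: P(disease | positive test)
posterior = p_pos_given_disease * p_disease / p_pos
print(round(posterior, 3))  # ~0.167 — most positives are false positives
```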

Bayesian inference

Probabilistic approach updating beliefs (priors) with data (likelihood) to obtain posteriors.

Beam search

Search algorithm that explores a subset of possible sequences to find the most likely output in sequence generation.

Beam Width

Number of sequences kept at each step in beam search.

Bellman Equation

Recursive equation expressing the value of a state as the immediate reward plus the discounted value of successor states; foundational in reinforcement learning.

Bias (in AI)

Systematic error caused by skewed data or model design that leads to unfair or inaccurate outcomes.

Bias-Variance Tradeoff

Balance between underfitting and overfitting in model performance.

Big data

Extremely large datasets requiring specialized storage and processing to extract value.

Binary Cross-Entropy

Loss function for binary classification tasks.

Binary Search Tree

Data structure for efficient searching, insertion, and deletion.

BLEU

Metric for machine translation quality based on n‑gram overlap with reference translations.

BLEURT

Learned metric for evaluating text generation quality.

Bloom Filter

Probabilistic data structure for set membership testing.

Bootstrap (resampling)

Statistical method that estimates uncertainty by sampling with replacement from observed data.

Bottleneck Layer

Layer with fewer neurons to force feature compression.

Bounded rationality

Decision making under constraints like limited information, time, or computational resources.

Bounding Box

Rectangle defining the location of an object in an image.

Box-Cox Transformation

Statistical transformation to stabilize variance and make data more normally distributed.

Branch and Bound

Optimization algorithm that systematically explores solution space.

Byte Pair Encoding (BPE)

Subword tokenization method for NLP.

C

Calibration

The alignment between predicted probabilities and observed frequencies (well‑calibrated models output reliable probabilities).

CapsNet Routing

Dynamic routing mechanism in capsule networks.

Capsule network

Neural network architecture that groups neurons into capsules to preserve spatial hierarchies.

Cascade Classifier

Series of classifiers applied sequentially to improve detection speed.

CatBoost

Gradient boosting library optimized for categorical features.

Categorical Cross-Entropy

Loss function for multi-class classification.

Causal inference

Methods to estimate cause‑and‑effect relationships beyond correlation, often using counterfactual reasoning.

Centroid

Mean vector of points in a cluster; used in algorithms like k‑means.

Centroidal Voronoi Tessellation

Voronoi partition of space in which each region's generating point is also its centroid.

Chain Rule

Calculus rule for computing derivatives of composite functions.

Character-level Embedding

Vector representation of individual characters.

Checkpoint

Saved state of a model during training, allowing resumption or rollback.

Chi-Square Test

Statistical test for independence between categorical variables.

Class Activation Map (CAM)

Visualization highlighting image regions important for classification.

Class imbalance

When some classes are underrepresented; can degrade metrics like accuracy and require rebalancing strategies.

Classification

Task of assigning labels to inputs (binary or multi‑class), e.g., spam vs. not spam.

Click-through Rate (CTR)

Ratio of clicks to impressions in online systems.

Clustering

Unsupervised grouping of similar items without predefined labels.

Cold Start Problem

Difficulty in recommendations for new users or items.

Collaborative Filtering

Recommendation method based on user-item interactions.

Color Histogram

Representation of image colors and their distribution.

Combinatorial Optimization

Optimization over discrete structures.

Common Crawl

Open repository of web crawl data for NLP training.

Complexity Penalty

Regularization term discouraging overly complex models.

Computational Graph

Graph representation of mathematical operations in a model.

Concept drift

Gradual change in the statistical properties of the target variable over time.

Conditional GAN (cGAN)

GAN variant conditioned on additional information.

Conditional Probability

Probability of an event given that another event has occurred: P(A|B) = P(A ∩ B) / P(B).

Confounding Variable

Variable that influences both dependent and independent variables.

Confusion matrix

Table of true/false positives/negatives summarizing classification performance.
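
A minimal sketch counting the four cells for a binary task:

```python
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Count each cell by comparing true labels with predictions
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))

print(f"TP={tp} FP={fp}\nFN={fn} TN={tn}")  # TP=3 FP=1 / FN=1 TN=3
```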

Connectionist Approach

AI approach using neural networks to model cognition.

Consensus Clustering

Combining multiple clustering results into one.

Constant Learning Rate

Fixed step size in optimization.

Constraint Satisfaction Problem (CSP)

Problem defined by variables, domains, and constraints.

Content-based Filtering

Recommendation method using item features.

Context window

Maximum number of tokens a language model can attend to at once.

Continuous Variable

Variable that can take any value within a range.

Contrast Ratio

Measure of luminance difference between colors.

Contrastive learning

Learning method that brings similar pairs closer and pushes dissimilar pairs apart in embedding space.

Convergence Criterion

Condition to stop iterative optimization.

Convex Hull

Smallest convex set containing all points in a dataset.

Convolutional neural network (CNN)

Neural architecture using convolutions for grid‑like data (images, audio spectrograms).

Cosine Similarity

Measure of similarity between two vectors based on cosine of angle.
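
A minimal NumPy sketch:

```python
import numpy as np

def cosine_similarity(a, b):
    # Dot product divided by the product of the vectors' norms
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine_similarity(np.array([1.0, 0.0]), np.array([1.0, 1.0])))  # ~0.707
```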

Count Vectorizer

Tool to convert text into a matrix of token counts.

Covariance Matrix

Matrix capturing pairwise covariances between variables.

Cramér’s V

Measure of association between two nominal variables.

Cross-Entropy Loss

Loss function measuring difference between predicted and true probability distributions.
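
A minimal NumPy sketch for one‑hot targets (the clipping guards against log(0)):

```python
import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    # Mean negative log-probability assigned to the true class
    y_pred = np.clip(y_pred, eps, 1.0)
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))

# One-hot targets vs. predicted distributions for two examples
y_true = np.array([[1, 0, 0], [0, 1, 0]])
y_pred = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
print(cross_entropy(y_true, y_pred))  # ~0.290
```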

Cross‑validation

Technique to estimate generalization by training and testing on multiple folds of the data.

Curriculum learning

Training strategy that presents easier examples before harder ones to improve learning efficiency.

Curse of Dimensionality

Phenomenon where data becomes sparse as dimensionality grows, making learning and distance‑based methods harder.

CycleGAN

GAN architecture for unpaired image-to-image translation.

D

Data augmentation

Techniques that expand training data via transformations (e.g., flips, noise) to improve robustness.

Data Cleaning

Process of correcting or removing inaccurate records from a dataset.

Data drift

Change in data distribution over time that can degrade model performance.

Data Imbalance

Unequal representation of classes in a dataset.

Data Labeling

Assigning meaningful tags to raw data.

Data lake

Centralized repository for storing raw structured and unstructured data at scale.

Data lineage

Tracking the origin, movement, and transformation of data through a system.

Data Pipeline

Series of steps to process and transform data.

Data Preprocessing

Preparing raw data for modeling.

Data Sampling

Selecting a subset of data for analysis.

Data Sharding

Splitting data across multiple storage systems.

Data Warehouse

Central repository for structured data.

Dataset shift

Mismatch between training and deployment data distributions (e.g., covariate shift).

Decision Boundary

Surface separating different predicted classes.

Decision Stump

One-level decision tree.

Decision tree

Tree‑structured model splitting data based on features to make predictions.

Decoder

Model component that generates outputs from latent representations or encoded inputs.

Deep Belief Network (DBN)

Stack of restricted Boltzmann machines for deep learning.

Deep Q-Network (DQN)

RL algorithm combining Q-learning with deep neural networks.

Deepfake

Synthetic media generated using AI to convincingly replace one person’s likeness with another.

Degree of Freedom

Number of independent values in a statistical calculation.

Dense Layer

Fully connected neural network layer.

Depthwise Convolution

Convolution applied separately to each input channel.

Deterministic Model

Model that produces the same output for a given input.

Dice Coefficient

Overlap measure between two samples, used in segmentation.

Differential privacy

Technique to ensure statistical analyses do not reveal information about any individual in the dataset.

Diffusion model

Generative model that learns to reverse a gradual noising process to create new samples.

Dimensionality reduction

Compressing features while preserving structure (e.g., PCA, t‑SNE, UMAP).

Discriminative Model

Model that learns decision boundaries between classes.

Distance Metric Learning

Learning a distance function tailored to a task.

Distributed Computing

Computing across multiple machines to handle large-scale tasks.

Domain adaptation

Transferring a model trained on one domain to perform well on a related domain.

Domain Generalization

Training models to perform well on unseen domains.

Dropout

Regularization method that randomly deactivates neurons during training to prevent overfitting.
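
A minimal sketch of inverted dropout, the common variant that scales surviving activations at training time so no rescaling is needed at inference:

```python
import numpy as np

def dropout(x, rate=0.5, training=True):
    # Zero out units with probability `rate`; scale survivors by
    # 1/(1-rate) so expected activations match inference behavior
    if not training or rate == 0.0:
        return x
    mask = np.random.rand(*x.shape) >= rate
    return x * mask / (1.0 - rate)

x = np.ones((2, 4))
print(dropout(x, rate=0.5))
```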

Dynamic Time Warping (DTW)

Algorithm for measuring similarity between temporal sequences.

E

Early Fusion

Combining multiple data modalities at the input level.

Early stopping

Halting training when validation performance stops improving to prevent overfitting.

Edge AI

Running AI models locally on devices rather than in the cloud to reduce latency and bandwidth use.

Elastic Net

Regularization combining L1 and L2 penalties.

Elastic weight consolidation

Regularization technique to prevent catastrophic forgetting in continual learning.

Embedding

Dense vector representation that captures semantic similarity among items like words or images.

Embedding Layer

Neural network layer that maps discrete items to dense vectors.

Empirical Risk Minimization

Minimizing average loss over training data.

Encoder

Component that converts inputs into compact latent representations.

Ensemble

Combination of multiple models to improve accuracy and robustness (e.g., bagging, boosting).

Ensemble Averaging

Combining predictions by averaging outputs of multiple models.

Ensemble Pruning

Reducing the size of an ensemble by removing models that contribute least to performance.

Entropy (information)

Measure of uncertainty or information content in a probability distribution.

Entropy Regularization

Adding entropy to the loss function to encourage exploration in reinforcement learning.

Evaluation metrics

Quantitative measures to assess model performance (e.g., precision, recall, F1, AUC).

Evolutionary Algorithm

Optimization algorithm inspired by natural selection processes.

Exact Match Score

Metric for evaluating NLP tasks like question answering by checking exact string matches.

Explainability

The degree to which the internal mechanics of a machine learning system can be explained in human terms.

Explainable AI (XAI)

Methods that make AI model decisions understandable to humans.

Exponential Moving Average (EMA)

Weighted moving average giving more importance to recent data points.

F

F1 score

Harmonic mean of precision and recall; balances false positives and false negatives.
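
A minimal sketch from raw true/false positive/negative counts:

```python
def f1_score(tp, fp, fn):
    # Harmonic mean of precision and recall
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

print(f1_score(tp=3, fp=1, fn=1))  # 0.75
```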

Factorization Machine

Model that captures interactions between variables using factorized parameters.

Fairlearn

Open-source toolkit for assessing and improving fairness in AI systems.

Fairness

Absence of unjust bias in model outcomes across demographic groups; measured by metrics like demographic parity.

FastText

Efficient text classification and representation learning library by Facebook AI.

Feature Drift

Change in the statistical properties of features over time.

Feature engineering

Process of creating, transforming, or selecting variables to improve model performance.

Feature Map

Output of a convolutional layer representing detected features.

Feature Scaling

Normalizing or standardizing features to improve model training.

Feature store

Centralized repository for storing and serving machine learning features.

Federated Averaging

Algorithm for aggregating model updates in federated learning.

Federated learning

Training models across multiple devices or servers holding local data samples without exchanging them.

Few‑shot learning

Learning to perform tasks from only a handful of examples.

Few‑shot prompting

Providing a small number of examples in a prompt to guide a language model’s output.

Fine‑tuning

Adapting a pretrained model on a smaller, task‑specific dataset to improve performance.

Fisher Score

Feature selection method based on class separability.

Forward Propagation

Process of passing inputs through a network to obtain outputs.

Foundation model

Large pretrained model adaptable to many downstream tasks via prompting or fine‑tuning.

Fourier Transform

Mathematical transform decomposing signals into frequency components.

FP16/BF16

Reduced‑precision number formats for faster training/inference with minimal accuracy loss.

Frame Rate

Number of frames displayed per second in video processing.

G

Gated Recurrent Unit (GRU)

RNN variant that uses gating mechanisms to control information flow.

Gaussian Mixture Model (GMM)

Probabilistic model representing data as a mixture of Gaussian distributions.

General Adversarial Training

Training method to improve robustness against adversarial examples.

Generalization

Model’s ability to perform well on unseen data rather than just training data.

Generalized Linear Model (GLM)

Statistical model generalizing linear regression for various distributions.

Generative adversarial network (GAN)

Two‑network setup (generator vs. discriminator) trained in opposition to produce realistic data.

Geometric Deep Learning

Deep learning methods for non-Euclidean data like graphs and manifolds.

Gibbs sampling

MCMC algorithm sampling each variable from its conditional distribution to approximate joint distributions.

Global Average Pooling

Pooling operation that averages each feature map into a single value.

Gradient Boosting

Ensemble method that builds models sequentially to correct previous errors.

Gradient clipping

Technique to prevent exploding gradients by capping their magnitude during training.

Gradient descent

Optimization method that iteratively updates parameters to minimize loss using gradients.
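
A minimal sketch minimizing a one‑dimensional quadratic:

```python
# Minimize f(x) = (x - 3)^2, whose gradient is f'(x) = 2(x - 3)
x, lr = 0.0, 0.1
for _ in range(100):
    grad = 2 * (x - 3)
    x -= lr * grad  # step opposite the gradient direction
print(round(x, 4))  # ~3.0, the minimizer
```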

Gradient vanishing

Problem where gradients become too small for effective learning in deep networks.

Graph neural network (GNN)

Neural architecture operating on graphs via message passing between nodes and edges.

Greedy Algorithm

Algorithm that makes the locally optimal choice at each step.

Grid Search

Exhaustive search over specified hyperparameter values.

H

Hallucination

Confident but incorrect or fabricated output produced by a model, especially in generative tasks.

Hamming Distance

Number of positions at which two strings of equal length differ.

Hard Attention

Attention mechanism that selects discrete parts of the input.

Hashing Trick

Method to map features to indices in a fixed-size vector.

Hebbian Learning

Learning rule stating that neurons that fire together wire together.

Heuristic

Practical rule‑of‑thumb method to solve problems efficiently, not guaranteed to be optimal.

Hidden state

Internal representation maintained by models (e.g., RNNs) across sequences.

Hierarchical Clustering

Clustering method building a hierarchy of clusters.

Hierarchical Softmax

Efficient softmax computation for large vocabularies.

Histogram of Oriented Gradients (HOG)

Feature descriptor for object detection in images.

Homomorphic Encryption

Encryption allowing computation on ciphertexts without decryption.

Hubness

Phenomenon in high-dimensional spaces where some points appear frequently as nearest neighbors.

Human‑in‑the‑loop

Approach where human feedback is integrated into the AI training or decision‑making process.

Hybrid Model

Model combining different AI techniques, e.g., symbolic and neural.

Hyperparameter

Configurable parameter set before training (e.g., learning rate, batch size).

Hyperparameter optimization

Systematic search for best hyperparameters (grid/random search, Bayesian optimization).

I

Image Augmentation

Techniques to increase dataset size by transforming images.

Image Captioning

Generating textual descriptions for images using AI.

Imbalanced Learning

Techniques for handling datasets with unequal class distribution.

Imputation

Filling in missing data using strategies like mean, KNN, or model‑based estimates.

Incremental Learning

Training models continuously with new data without retraining from scratch.

Indexing

Organizing data for fast retrieval in databases or search engines.

Inductive Bias

Assumptions a model makes to generalize beyond training data.

Inference

Using a trained model to make predictions or generate outputs on new inputs.

Instance Segmentation

Detecting and delineating each object instance in an image.

Integrated Gradients

Explainability method attributing model predictions to input features.

Interpretability

Degree to which a human can understand a model’s internal mechanics or reasons for outputs.

Intersection over Union (IoU)

Overlap metric for object detection/segmentation comparing predicted and true regions.
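
A minimal sketch for axis‑aligned boxes given as (x1, y1, x2, y2):

```python
def iou(box_a, box_b):
    # Intersection rectangle, clamped to zero if boxes don't overlap
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7 ≈ 0.143
```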

In‑context learning

Model adapts behavior from examples provided in the prompt without updating weights.

J

Jaccard index

Similarity metric equal to intersection over union of sets.

Jacobian Matrix

Matrix of all first-order partial derivatives of a vector-valued function.

Jensen–Shannon divergence

Symmetric measure of similarity between probability distributions derived from KL divergence.

Jittering

Data augmentation technique adding noise to inputs.

Joint Embedding

Mapping multiple modalities into a shared representation space.

Joint Probability

Probability of two events occurring together.

JSON-LD

Lightweight Linked Data format used for semantic web and SEO.

K

K-Nearest Neighbors (KNN)

Instance-based learning algorithm classifying based on nearest neighbors.

Kalman Filter

Algorithm for estimating the state of a system from noisy observations.

Kernel Function

Function that computes inner products in an implicit higher‑dimensional feature space, letting methods like SVMs learn nonlinear boundaries without explicit transformation.

Keyphrase Extraction

Identifying important phrases from text.

Knowledge Base

Structured repository of facts and relationships.

Knowledge distillation

Training a smaller student model to mimic a larger teacher model’s behavior.

Knowledge graph

Structured representation of entities and their relationships, often used for reasoning and search.

Knowledge Tracing

Modeling a learner’s knowledge state over time.

K‑fold cross‑validation

Splits data into k folds; trains on k‑1 folds and validates on the remaining fold, rotating across folds.

k‑means

Clustering algorithm assigning points to the nearest centroid and updating centroids iteratively.
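
A minimal NumPy sketch of Lloyd's algorithm (a production version would also handle empty clusters and check for convergence):

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Initialize centroids from k distinct random data points
    centroids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest centroid
        labels = np.argmin(np.linalg.norm(X[:, None] - centroids, axis=2), axis=1)
        # Move each centroid to the mean of its assigned points
        centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centroids

X = np.array([[0, 0], [0, 1], [10, 10], [10, 11.0]])
labels, cents = kmeans(X, k=2)
print(labels, cents)
```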

L

Label Encoding

Converting categorical labels into numeric form.

Label leakage

When training data includes information unavailable at prediction time, inflating performance estimates.

Label smoothing

Regularization that softens hard labels to reduce overconfidence.

Lagrange Multiplier

Optimization method for constrained problems.

Language Model

Model predicting the probability of word sequences.

Large language model (LLM)

Model trained on massive corpora to understand and generate human‑like text.

Latent space

Compressed representation space where similar inputs are close together.

Layer Normalization

Normalization applied across features in a layer.

Leaky ReLU

Activation function allowing a small gradient for negative inputs.

Learning Curve

Graph showing model performance over training iterations.

Learning rate

Step size for updating parameters during optimization.

Lexical Semantics

Study of word meanings and relationships.

LightGBM

Gradient boosting framework optimized for speed and memory.

Linear Discriminant Analysis (LDA)

Technique for dimensionality reduction and classification.

Local Outlier Factor (LOF)

Algorithm for detecting density-based anomalies.

Log loss (cross‑entropy)

Loss function measuring the difference between predicted probabilities and true labels.

Logistic Regression

Statistical model for binary classification.

Long Short-Term Memory (LSTM)

RNN variant designed to capture long-term dependencies.

LoRA (Low‑Rank Adaptation)

Parameter‑efficient fine‑tuning method for large language models.

Low-Rank Approximation

Matrix approximation using fewer dimensions.

M

Machine learning (ML)

Field where models learn patterns from data to make predictions or decisions.

Manifold Learning

Techniques for nonlinear dimensionality reduction.

MapReduce

Programming model for processing large datasets in parallel.

Markov Chain

Stochastic process in which the next state depends only on the current state (memoryless transitions).

Markov decision process (MDP)

Framework modeling decision‑making with states, actions, transitions, and rewards.

Masked Language Model

Model predicting masked tokens in a sequence.

Max Pooling

Pooling operation selecting the maximum value in a region.

Mean Average Precision (mAP)

Metric summarizing precision across recall levels; popular in detection tasks.

Mean Squared Error (MSE)

Loss function measuring average squared difference between predictions and targets.

Median Absolute Error

Robust metric measuring median of absolute errors.

Memory Network

Neural network with an explicit memory component.

Meta Reinforcement Learning

RL approach where agents learn to adapt quickly to new tasks.

Meta‑learning

Learning to learn — models that improve their learning process over multiple tasks.

Min-Max Scaling

Scaling features to a fixed range, usually [0,1].

Mini-Batch Gradient Descent

Gradient descent variant using small random subsets of data.

Mixture of experts

Architecture where different sub‑models specialize in different parts of the input space.

MLOps

Practices for reliable model development, deployment, monitoring, and governance in production.

Mode Collapse

GAN failure mode where generator produces limited variety.

Model card

Documentation summarizing a model’s intended use, performance, data, risks, and limitations.

Model Compression

Techniques to reduce model size while preserving accuracy.

Model Ensemble

Combining multiple models to improve performance.

Model Interpretability

Understanding how a model makes predictions.

Momentum

Optimization technique accelerating gradient descent in relevant directions.

Monte Carlo

Methods that rely on repeated random sampling to estimate numeric results.

Monte Carlo Tree Search

Search algorithm combining random sampling and tree search.

Multi-Head Attention

Transformer mechanism running multiple attention heads in parallel, each attending to different relationships.

Multilayer Perceptron (MLP)

Feedforward neural network with multiple layers.

Multimodal Fusion

Combining information from multiple data modalities.

Multi‑modal learning

Training models that process and relate information from multiple data types (e.g., text, image, audio).

N

Naive Bayes

Probabilistic classifier assuming feature independence.

Named Entity Linking

Connecting named entities in text to knowledge base entries.

Named entity recognition (NER)

NLP task to identify and classify entities like people, places, and organizations in text.

Natural language processing (NLP)

Subfield focused on enabling machines to understand and generate human language.

Negative Sampling

Training technique for word embeddings.

NeRF (Neural Radiance Fields)

Technique for synthesizing novel views of complex 3D scenes from 2D images.

Neural architecture search (NAS)

Automated search over network designs to optimize performance under constraints.

Neural Machine Translation (NMT)

Using neural networks for language translation.

Neural network

Composed of layers of neurons with learnable weights and nonlinear activations.

Neuroevolution

Evolving neural network architectures using genetic algorithms.

Noise Contrastive Estimation

Training method using noise samples to estimate probabilities.

Non-Maximum Suppression

Algorithm to remove redundant bounding boxes in object detection.

Normalization

Scaling inputs or features to stabilize training (e.g., z‑score, min‑max).

Normalization Layer

Layer that normalizes inputs to stabilize training.

Numerical Stability

Avoiding numerical errors in computations.

N‑shot learning

Learning from exactly N examples per class; includes few‑shot and one‑shot settings.

O

Object Tracking

Following objects across frames in a video.

Objective function

Quantity optimized during training (loss to minimize or reward to maximize).

One-Class SVM

SVM variant for anomaly detection.

One‑hot encoding

Representing categorical variables as binary vectors.
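
A minimal sketch with a fixed category list:

```python
categories = ["red", "green", "blue"]
index = {c: i for i, c in enumerate(categories)}

def one_hot(value):
    # Binary vector with a single 1 at the category's index
    vec = [0] * len(categories)
    vec[index[value]] = 1
    return vec

print(one_hot("green"))  # [0, 1, 0]
```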

Online Learning

Updating models incrementally as new data arrives.

Ontology

Structured representation of concepts and relationships within a domain.

Optimization

Algorithms and methods to find parameters that minimize loss (SGD, Adam).

Ordinal Encoding

Encoding categorical variables with integer values respecting order.

Out-of-Distribution Detection

Identifying inputs that differ from training data distribution.

Outlier detection

Identifying data points that differ significantly from the majority.

Over-parameterization

Using more parameters than necessary to fit data.

Overfitting

When a model memorizes training data patterns and performs poorly on new data.

P

PageRank

Algorithm for ranking web pages based on link structure.

Pairwise Ranking

Learning to rank items based on pairwise comparisons.

Parameter

Learnable weight or bias updated during training.

Parameter‑efficient tuning

Fine‑tuning methods that adjust only a small subset of model parameters.

Parzen Window

Non-parametric technique for probability density estimation.

Pattern Recognition

Identifying patterns and regularities in data.

Perceptron

Early type of artificial neuron used in simple linear classifiers.

Perceptron Learning Rule

Algorithm for training a perceptron.

Permutation Importance

Feature importance measure based on shuffling values.

Perplexity

Language modeling metric indicating how well a probability model predicts a sample; the exponential of cross‑entropy, where lower is better.

Pipeline

End‑to‑end workflow from data ingestion to deployment and monitoring.

Pixel RNN

RNN architecture for modeling images pixel by pixel.

Poisoning attack

Manipulating training data to cause a model to learn harmful behavior.

Poisson Regression

Regression model for count data.

Policy Gradient

RL method optimizing policy directly via gradient ascent.

Polynomial Regression

Regression model using polynomial features.

Pose Estimation

Detecting positions of key points in objects or humans.

Precision

Proportion of predicted positives that are true positives; controls false positives.

Precision-Recall Curve

Graph showing trade-off between precision and recall across different thresholds.

Principal Component Analysis (PCA)

Dimensionality reduction technique projecting data onto orthogonal components capturing maximum variance.
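
A minimal NumPy sketch via SVD of the centered data:

```python
import numpy as np

def pca(X, n_components):
    # Center the data, then project onto the top right-singular vectors
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T  # coordinates in component space

X = np.random.default_rng(0).normal(size=(100, 5))
print(pca(X, 2).shape)  # (100, 2)
```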

Probabilistic Graphical Model

Model representing variables and their conditional dependencies via a graph.

Prompt engineering

Designing inputs and instructions to steer model behavior toward desired outputs.

Prompt injection

Adversarial technique where malicious instructions are embedded in prompts to manipulate model output.

Pruning

Removing parameters or connections to reduce model size and latency with minimal accuracy loss.

Q

Quantization

Reducing the precision of model weights/activations (e.g., from 32‑bit to 8‑bit) to shrink size and speed up inference.

Quantum Machine Learning

Applying quantum computing to enhance machine learning algorithms.

Qubit

Quantum computing unit that can exist in a superposition of the 0 and 1 states.

Query Expansion

Adding related terms to a search query to improve retrieval results.

Queueing Theory

Mathematical study of waiting lines, applicable to system performance modeling.

Q‑learning

RL algorithm that learns action‑value function estimating expected future reward for state‑action pairs.
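
A minimal sketch of the tabular update rule on a hypothetical 5‑state chain environment (reward 1 for being in the rightmost state); since Q‑learning is off‑policy, a purely random behavior policy suffices here:

```python
import numpy as np

n_states, n_actions = 5, 2       # toy chain: action 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.9
rng = np.random.default_rng(0)

def step(s, a):
    # Deterministic chain; reward 1 for ending in the rightmost state
    s2 = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
    return s2, float(s2 == n_states - 1)

for _ in range(2000):            # episodes with a random behavior policy
    s = 0
    for _ in range(20):
        a = int(rng.integers(n_actions))   # off-policy: act randomly
        s2, r = step(s, a)
        # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        s = s2

print(Q.argmax(axis=1))          # learned greedy policy: [1 1 1 1 1]
```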

R

Random Forest

Ensemble of decision trees trained on random subsets of data and features.

Random Initialization

Starting model parameters with random values before training.

Ranking Loss

Loss function for learning to rank tasks.

RANSAC

Robust estimation algorithm that fits a model to data containing outliers.

Recall

Proportion of actual positives correctly identified; controls false negatives.

Recall@K

Proportion of relevant items found in the top K results.

Rectified Linear Unit (ReLU)

Activation function outputting zero for negative inputs and the input itself for positive inputs.

Recurrent neural network (RNN)

Neural architecture with loops to process sequential data by maintaining hidden states.

Regularization

Techniques like L1/L2 penalties or dropout to prevent overfitting.

Reinforcement learning (RL)

Learning paradigm where an agent interacts with an environment to maximize cumulative reward.

Reinforcement Learning from Human Feedback (RLHF)

Training method aligning model outputs with human preferences.

Representation learning

Learning useful feature representations automatically from raw data.

Residual connection

Shortcut connection in neural networks that adds input to output to ease training of deep models.

Residual Network (ResNet)

Deep network architecture with skip connections to ease training.

Restricted Boltzmann Machine (RBM)

Stochastic neural network for unsupervised learning.

Reward Function

Function defining the goal in reinforcement learning by assigning rewards to actions.

Re‑ranking

Reordering a list of results based on additional scoring or context.

Ridge Regression

Linear regression with L2 regularization.

ROC Curve

Graph showing true positive rate vs. false positive rate across thresholds.

Root Mean Squared Error (RMSE)

Square root of the average squared differences between predictions and targets.

S

Sample Weighting

Assigning different importance to samples during training.

Sampling Bias

Bias introduced when the sample is not representative of the population.

Satisfiability Problem (SAT)

Problem of determining if there exists an interpretation that satisfies a given Boolean formula.

Scalability

Ability of a system to handle increasing workloads without performance loss.

Scaler

Tool or function to normalize or standardize data.

Self-Organizing Map (SOM)

Unsupervised neural network projecting high-dimensional data to a lower-dimensional grid.

Self‑attention

Mechanism allowing a model to weigh the importance of different parts of the input sequence.

Self‑supervised learning

Learning from unlabeled data by generating labels from the data itself.

Semantic Search

Search technique using meaning rather than exact keyword matching.

Semi-Parametric Model

Model with both parametric and non-parametric components.

Semi‑supervised learning

Training with a small labeled set and a large unlabeled set to improve performance.

Sensitivity

Same as recall; proportion of actual positives correctly identified.

Sentiment analysis

NLP task of determining the emotional tone of text.

Sequence-to-Sequence Model

Model mapping input sequences to output sequences, common in translation.

Shapley values

Game‑theoretic approach to explain the contribution of each feature to a prediction.

Sharding

Splitting a dataset or database into smaller, faster, more easily managed parts.

Sigmoid Function

Activation function mapping inputs to the range (0,1).

Silhouette Score

Metric for evaluating clustering quality.

Simulated Annealing

Probabilistic optimization algorithm inspired by annealing in metallurgy, accepting occasional worse moves to escape local optima.

Skip-Gram

Word2Vec model predicting context words from a target word.

Softmax Function

Function converting logits into probabilities that sum to 1.
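
A minimal NumPy sketch; subtracting the max before exponentiating is the standard numerical‑stability trick:

```python
import numpy as np

def softmax(logits):
    # Shift by the max so exponentials can't overflow
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # ~[0.659, 0.242, 0.099]
```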

Sparse Matrix

Matrix in which most elements are zero.

Spectral Clustering

Clustering method using eigenvalues of similarity matrices.

Speech Synthesis

Generating spoken language from text.

Spike-and-Slab Prior

Bayesian prior combining a point mass at zero and a continuous distribution.

SQL Injection

Security vulnerability allowing execution of arbitrary SQL code.

Stacked Autoencoder

Autoencoder with multiple hidden layers.

State Space Model

Mathematical model describing a system with inputs, outputs, and state variables.

Stemming

Reducing words to their root form.

Stochastic gradient descent (SGD)

Optimization method updating parameters using a random subset (mini‑batch) of data.

Stop Words

Common words often removed in text preprocessing.

Streaming Data

Data generated continuously and processed in real time.

Structural Causal Model

Model representing causal relationships between variables.

Subword Tokenization

Breaking words into smaller units for NLP.

Support Vector Machine (SVM)

Supervised learning model for classification and regression.

Survival Analysis

Statistical analysis of time-to-event data.

Synthetic data

Artificially generated data used to augment or replace real datasets.

Synthetic Minority Oversampling Technique (SMOTE)

Method to address class imbalance by generating synthetic samples.

T

Tabular Data

Data organized in rows and columns.

Target Encoding

Encoding categorical variables using target statistics.

Temperature (in sampling)

Parameter that scales logits before softmax to control randomness in generation; lower values make outputs more deterministic, higher values more diverse.
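
A minimal sketch: logits are divided by the temperature before softmax, so values below 1 sharpen the distribution and values above 1 flatten it:

```python
import numpy as np

def sample_with_temperature(logits, temperature=1.0, seed=None):
    # Divide logits by T before softmax; T -> 0 approaches greedy argmax
    z = logits / temperature
    p = np.exp(z - z.max())
    p /= p.sum()
    return np.random.default_rng(seed).choice(len(logits), p=p)

logits = np.array([2.0, 1.0, 0.1])
print(sample_with_temperature(logits, temperature=0.5, seed=0))
```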

Temporal Difference Learning

RL method updating value estimates based on other learned estimates.

Tensor

Multidimensional array used in deep learning frameworks.

Tensor Decomposition

Breaking a tensor into simpler components.

Term Frequency-Inverse Document Frequency (TF-IDF)

Statistic reflecting how important a word is to a document in a corpus.
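
A minimal sketch using one common smoothed formulation (exact conventions vary across libraries):

```python
import math
from collections import Counter

docs = [d.split() for d in ["the cat sat", "the dog sat", "the cat saw the dog"]]
N = len(docs)

def tf_idf(term, doc):
    tf = Counter(doc)[term] / len(doc)          # term frequency in this doc
    df = sum(term in d for d in docs)           # documents containing the term
    idf = math.log(N / df) + 1                  # one common smoothing; conventions vary
    return tf * idf

print(round(tf_idf("cat", docs[0]), 3))  # ~0.468
```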

Test Set

Data used to evaluate the final model performance.

Text Classification

Assigning categories to text documents.

Text Mining

Extracting useful information from text data.

Time Series Analysis

Analyzing data points collected or recorded at specific time intervals.

Token Embedding

Vector representation of tokens in NLP.

Tokenization

Splitting text into smaller units (tokens) for processing by NLP models.

Topological Data Analysis

Analyzing the shape of data using topology.

Top‑k sampling

Text generation method that samples from the top k most probable next tokens; a sketch covering both this and top‑p sampling follows the next entry.

Top‑p (nucleus) sampling

Text generation method that samples from the smallest set of tokens whose cumulative probability exceeds p.
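
A minimal NumPy sketch of both strategies over a toy next‑token distribution:

```python
import numpy as np

probs = np.array([0.5, 0.25, 0.15, 0.07, 0.03])  # toy next-token distribution
rng = np.random.default_rng(0)

def top_k_sample(probs, k):
    idx = np.argsort(probs)[-k:]          # keep the k most probable tokens
    p = probs[idx] / probs[idx].sum()     # renormalize over the kept set
    return rng.choice(idx, p=p)

def top_p_sample(probs, p_threshold):
    order = np.argsort(probs)[::-1]       # tokens by descending probability
    cum = np.cumsum(probs[order])
    # Keep the smallest prefix whose cumulative probability reaches the threshold
    keep = order[: int(np.searchsorted(cum, p_threshold)) + 1]
    p = probs[keep] / probs[keep].sum()
    return rng.choice(keep, p=p)

print(top_k_sample(probs, k=2), top_p_sample(probs, p_threshold=0.8))
```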

Training Set

Data used to train a model.

Transfer Entropy

Measure of information transfer between variables.

Transfer learning

Reusing a pretrained model on a new but related task to save time and resources.

Transformer

Neural architecture using self‑attention to process sequences in parallel.

Transformer-XL

Transformer variant handling longer context via recurrence.

Transliteration

Converting text from one script to another.

Tree-based Model

Model using decision tree structures for predictions.

True positive rate (TPR)

Same as recall; proportion of actual positives correctly predicted.

Turing Test

Test of a machine’s ability to exhibit intelligent behavior indistinguishable from a human.

U

Under-sampling

Reducing the number of samples in the majority class to balance data.

Underfitting

When a model is too simple to capture underlying patterns in the data.

Uniform Distribution

Probability distribution where all outcomes are equally likely.

Unit Testing

Testing individual components of software for correctness.

Univariate Analysis

Analysis involving a single variable.

Unsupervised learning

Learning patterns from unlabeled data without explicit target outputs.

Unsupervised Pretraining

Training a model on unlabeled data before fine-tuning on labeled data.

Upsampling

Increasing the number of samples in the minority class to balance data.

User-based Collaborative Filtering

Recommendation method using similarities between users.

Utility function

Function that assigns a value to outcomes, guiding decision‑making in AI agents.

V

Validation Curve

Graph showing model performance for different hyperparameter values.

Validation set

Subset of data used to tune model hyperparameters and prevent overfitting.

Value Function

Function estimating expected return in reinforcement learning.

Vanishing Gradient Problem

Issue where gradients become too small for effective learning in deep networks.

Variational autoencoder (VAE)

Generative model that learns a probabilistic latent space for data.

Variational Inference

Approximate Bayesian inference method.

Vector database

Database optimized for storing and querying vector embeddings.

Vector Quantization

Quantizing vectors to a finite set of representative points.

Vectorization

Converting data into numerical vectors for model input.

Version Control

System for tracking changes in code or data.

Video Captioning

Generating textual descriptions for video content.

Virtualization

Creating virtual versions of computing resources.

Vision transformer (ViT)

Transformer architecture adapted for image classification.

Visual Question Answering (VQA)

Answering questions about images using AI.

Voting Classifier

Ensemble method combining predictions via majority vote.

W

Weak Supervision

Training with noisy, limited, or imprecise labels.

Weight decay

Regularization technique adding a penalty proportional to weights’ magnitude.

Weight Initialization

Setting initial values for model parameters before training.

Weighted Average Precision

Precision metric weighted by class importance.

Whisper

OpenAI’s automatic speech recognition (ASR) system.

Whitening Transformation

Transforming data to have zero mean and unit variance with uncorrelated features.

Wide & Deep Model

Model combining wide linear models and deep neural networks.

Window Function

Function applied to a subset of data points in signal processing.

Word embedding

Dense vector representation of words capturing semantic relationships.

Word Sense Disambiguation

Determining which meaning of a word is used in context.

Word2Vec

Neural embedding model that learns vector representations of words from context.

WordPiece

Subword tokenization algorithm used in BERT.

X

X-shape Validation

Cross-validation variant with specific fold arrangements.

Xavier Initialization

Weight initialization method keeping variance constant across layers.

XGBoost

Optimized gradient boosting library for supervised learning tasks.

XLM-R

Cross-lingual language model based on RoBERTa.

XML Parsing

Reading and processing XML data.

XOR Problem

Classic problem showing limitations of simple perceptrons.

Y

YAML

Human-readable data serialization format.

Yield Curve Modeling

Modeling the relationship between interest rates and bond maturities.

YOLO (You Only Look Once)

Real‑time object detection algorithm processing images in a single pass.

YOLOX

Advanced YOLO-based object detection model.

Yottabyte

Unit of digital information equal to 10^24 bytes.

YouTube-8M

Large-scale labeled video dataset for machine learning.

Z

Z-Algorithm

String matching algorithm for pattern searching.

Zero Crossing Rate

Rate at which a signal changes sign, used in audio analysis.

Zero-Inflated Model

Statistical model for count data with excess zeros.

Zero‑shot learning

Model’s ability to perform tasks without having seen labeled examples during training.

Zipf's Law

Empirical law stating that word frequency is inversely proportional to its rank.

Zonal Statistics

Calculating statistics on values within zones of a raster dataset.

Zoom Augmentation

Image augmentation technique involving zooming in or out.

Z‑score normalization

Scaling method that transforms data to have mean 0 and standard deviation 1.
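
A minimal NumPy sketch:

```python
import numpy as np

def z_score(x):
    # Subtract the mean, divide by the standard deviation
    return (x - x.mean()) / x.std()

x = np.array([10.0, 20.0, 30.0])
print(z_score(x))  # ~[-1.225, 0.0, 1.225]
```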