
CUDA

Platforms & Tools

NVIDIA's parallel computing platform and API that allows developers to use NVIDIA GPUs for general-purpose processing, forming the backbone of most AI training and inference workflows.

CUDA (Compute Unified Device Architecture) is a parallel computing platform and programming model created by NVIDIA that lets software developers use NVIDIA GPUs for general-purpose computation beyond graphics rendering. First released in 2007, CUDA extends C and C++ with a small set of keywords and runtime APIs for writing code that runs across the GPU's thousands of parallel cores.
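To make the programming model concrete, here is a minimal sketch of a CUDA program: a kernel (marked `__global__`) that adds two vectors, with each GPU thread handling one element. It is illustrative rather than production code, and requires `nvcc` and an NVIDIA GPU to compile and run.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Kernel: each thread computes one element of c = a + b.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;                 // one million elements
    size_t bytes = n * sizeof(float);

    // Host (CPU) buffers.
    float *h_a = (float *)malloc(bytes);
    float *h_b = (float *)malloc(bytes);
    float *h_c = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    // Device (GPU) buffers.
    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover all n elements.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(d_a, d_b, d_c, n);

    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", h_c[0]);  // expected 3.0 if the kernel ran

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}
```

The `<<<blocks, threads>>>` launch syntax is the CUDA extension to C++: it tells the runtime how many parallel threads to spawn, and the kernel body describes the work of a single thread.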

CUDA became the dominant platform for AI and machine learning because the major deep learning frameworks (PyTorch, TensorFlow, and JAX) all use CUDA under the hood to accelerate tensor operations on NVIDIA hardware. When a model is trained or run on a GPU, CUDA handles the low-level work of distributing matrix multiplications, convolutions, and other operations across the GPU's cores. Libraries like cuDNN (for deep learning primitives) and cuBLAS (for linear algebra) build on CUDA to provide optimized implementations of the operations AI models use most.
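As a rough sketch of what a framework does when it offloads a matrix multiplication, the snippet below calls cuBLAS's `cublasSgemm` directly on two small matrices. It assumes a working CUDA toolkit (link with `-lcublas`) and an NVIDIA GPU; real frameworks add batching, autotuning, and memory pooling on top of calls like this.

```cuda
#include <cstdio>
#include <cublas_v2.h>
#include <cuda_runtime.h>

// Compute C = A * B for 2x2 matrices via cuBLAS (column-major layout).
int main() {
    const int n = 2;
    float h_a[] = {1, 2, 3, 4};    // A, stored column-major
    float h_b[] = {5, 6, 7, 8};    // B, stored column-major
    float h_c[4] = {0};

    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, sizeof(h_a));
    cudaMalloc(&d_b, sizeof(h_b));
    cudaMalloc(&d_c, sizeof(h_c));
    cudaMemcpy(d_a, h_a, sizeof(h_a), cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, sizeof(h_b), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);

    // Single-precision GEMM: C = alpha * A * B + beta * C.
    const float alpha = 1.0f, beta = 0.0f;
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                &alpha, d_a, n, d_b, n, &beta, d_c, n);

    cudaMemcpy(h_c, d_c, sizeof(h_c), cudaMemcpyDeviceToHost);
    printf("C[0][0] = %f\n", h_c[0]);

    cublasDestroy(handle);
    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    return 0;
}
```

When PyTorch or TensorFlow executes a dense layer on a GPU, a call very much like this `cublasSgemm` is what ultimately runs.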

NVIDIA's dominance in AI hardware is largely due to the CUDA ecosystem rather than raw hardware superiority alone. Nearly two decades of tooling, libraries, documentation, and community knowledge create significant switching costs. Competitors like AMD (with ROCm) and Intel (with oneAPI) offer alternative platforms, but most AI code, tutorials, and production deployments assume CUDA availability. This lock-in effect is why CUDA compatibility is often the first question asked when evaluating AI hardware.

Last updated: February 26, 2026