Convolutional Neural Network

Computer Vision

A neural network architecture that uses convolutional layers to automatically learn spatial hierarchies of features, primarily used for image and video analysis.

A convolutional neural network (CNN) is a specialized type of neural network designed to process data with a grid-like topology, such as images. CNNs use convolutional layers that slide small learnable filters (kernels) across the input, detecting local patterns like edges, textures, and shapes. Because the same filter weights are reused at every spatial position (parameter sharing), a convolutional layer needs far fewer parameters than a fully connected layer applied to the same input.
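As a rough illustration of how a filter detects a local pattern, the sketch below slides a 3x3 kernel over a small image in plain Python. The function name `conv2d_valid`, the 6x6 test image, and the hand-picked vertical-edge kernel are assumptions made for this example. In a real CNN the kernel values are learned and a library routine does the work.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the patch under the kernel at position (i, j).
            acc = 0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        feature_map.append(row)
    return feature_map


# Hypothetical 6x6 image: dark left half (0), bright right half (1).
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]

# Hand-picked kernel that responds to dark-to-bright vertical edges.
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = conv2d_valid(image, kernel)
for row in feature_map:
    print(row)  # each row is [0, 3, 3, 0]: the response peaks at the edge

# Parameter sharing: the same 9 weights are reused at all 16 output positions.
conv_params = len(kernel) * len(kernel[0])   # 9
# A fully connected layer mapping the 36 pixels to 16 outputs needs one weight per pair.
dense_params = (6 * 6) * (4 * 4)             # 576
print(conv_params, dense_params)
```

The feature map is nonzero only where the kernel window straddles the boundary between the dark and bright halves. The two parameter counts (biases ignored) show the saving that comes from reusing one small kernel.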

A typical CNN architecture consists of alternating convolutional and pooling layers followed by one or more fully connected layers. Convolutional layers extract features at increasing levels of abstraction, pooling layers reduce spatial dimensions and provide a degree of translation invariance (small shifts in the input leave the pooled output unchanged), and the fully connected layers at the end perform classification or regression based on the extracted features.
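The pooling and fully connected stages can be sketched in a few lines of plain Python. The helper names `max_pool2d` and `dense`, the 4x4 feature maps, and the weight values are all hypothetical and chosen only to keep the arithmetic easy to follow. This is a toy illustration, not a trained model.

```python
def max_pool2d(fmap, size=2):
    """Non-overlapping max pooling: keep the largest value in each size x size window."""
    h, w = len(fmap), len(fmap[0])
    return [
        [
            max(fmap[i + di][j + dj] for di in range(size) for dj in range(size))
            for j in range(0, w - size + 1, size)
        ]
        for i in range(0, h - size + 1, size)
    ]


def dense(features, weights, biases):
    """Fully connected layer: one weighted sum of all features per output unit."""
    return [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]


# Two hypothetical 4x4 feature maps: the same activation, shifted by one pixel.
fmap_a = [[0, 0, 0, 0], [0, 9, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
fmap_b = [[9, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
# A shift that crosses a pooling-window boundary.
fmap_c = [[0, 0, 0, 0], [0, 0, 9, 0], [0, 0, 0, 0], [0, 0, 0, 0]]

pooled_a = max_pool2d(fmap_a)  # 4x4 -> 2x2
pooled_b = max_pool2d(fmap_b)
pooled_c = max_pool2d(fmap_c)
print(pooled_a, pooled_b, pooled_c)

# Flatten the pooled map and feed it to a tiny 2-class fully connected layer.
flat = [value for row in pooled_a for value in row]
weights = [[1, 0, 0, 0], [0, 1, 1, 1]]  # illustrative weights, not learned
biases = [0, 0]
scores = dense(flat, weights, biases)
print(scores)
```

`pooled_a` and `pooled_b` are identical because the shift stays inside one pooling window, while `pooled_c` differs because the activation moved into a neighboring window. That is why pooling gives only a degree of invariance rather than full invariance.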

CNNs have driven major advances in computer vision, including image classification (AlexNet, ResNet, EfficientNet), object detection (YOLO, Faster R-CNN), semantic segmentation (U-Net), and image generation. While vision transformers have challenged CNN dominance on some benchmarks, CNNs remain widely used because of their efficiency, well-understood behavior, and strong performance, especially when training data is limited.

Last updated: February 20, 2026