CuPy: NumPy & SciPy for GPU-Accelerated Computing in Python

Summary

CuPy is a Python array library that provides NumPy- and SciPy-compatible interfaces for GPU-accelerated computing. Much existing numerical code can run on NVIDIA CUDA or AMD ROCm platforms with minimal changes. CuPy also exposes low-level CUDA features for advanced performance tuning.

Repository Info

Updated on December 29, 2025

Introduction

CuPy is an open-source Python array library for GPU-accelerated computing. Its interface is compatible with NumPy and SciPy, so existing numerical code can often be ported by replacing numpy and scipy imports with cupy and cupyx.scipy. In that sense CuPy acts as a drop-in replacement for running NumPy/SciPy code on NVIDIA CUDA or AMD ROCm platforms. CuPy also provides direct access to low-level CUDA features, so advanced users can tune performance and integrate with existing CUDA C/C++ programs.

Installation

CuPy can be installed using pip or conda. Choose the appropriate package based on your GPU platform and CUDA/ROCm version.

Pip

Binary packages (wheels) are available on PyPI for Linux and Windows.

  • For CUDA 12.x:
    pip install cupy-cuda12x
    
  • For CUDA 13.x:
    pip install cupy-cuda13x
    
  • For ROCm 7.0 (experimental):
    pip install cupy-rocm-7-0
    

Conda

Binary packages are also available on Conda-Forge.

  • General CUDA installation:
    conda install -c conda-forge cupy
    
  • To specify a CUDA version (e.g., 12.0):
    conda install -c conda-forge cupy cuda-version=12.0
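
After installing, a quick check can confirm that the package imports and that a GPU is visible. The following is a minimal sketch: the helper name check_cupy is hypothetical, and it returns a message instead of raising so that it also runs on machines without CuPy or a GPU.

```python
def check_cupy():
    """Return a short status string describing the local CuPy installation."""
    try:
        import cupy as cp
    except ImportError:
        return "CuPy is not installed"
    try:
        # Number of GPUs visible to the CUDA/ROCm runtime.
        n_devices = cp.cuda.runtime.getDeviceCount()
    except Exception as exc:  # e.g. missing driver or no usable device
        return f"CuPy {cp.__version__} is installed, but no GPU is usable: {exc}"
    return f"CuPy {cp.__version__} is installed with {n_devices} GPU(s) visible"


print(check_cupy())
```

If the import fails even though a package was installed, the usual suspect is a mismatch between the chosen wheel and the local CUDA or ROCm version.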
    

Examples

Here's a quick example demonstrating CuPy's NumPy-like syntax for GPU operations:

import cupy as cp

# Create a CuPy array on the GPU
x = cp.arange(6).reshape(2, 3).astype('f')
print("CuPy array x:")
print(x)

# Perform a sum operation on the GPU
sum_result = x.sum(axis=1)
print("\nSum along axis 1:")
print(sum_result)

Output:

CuPy array x:
[[0. 1. 2.]
 [3. 4. 5.]]

Sum along axis 1:
[ 3. 12.]

This example shows how CuPy arrays behave similarly to NumPy arrays, but computations are executed on the GPU.
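
Because the two APIs mirror each other, the same function can often run on either library. The sketch below assumes a fallback to NumPy when CuPy or a GPU is unavailable; xp, to_host, and normalize_rows are illustrative names, not part of CuPy. It also shows the explicit copies between host and device memory that a CuPy program needs at its boundaries.

```python
import numpy as np

try:
    import cupy as cp
    cp.cuda.runtime.getDeviceCount()  # raises if no GPU is usable
    xp = cp
except Exception:
    # Assumption for this sketch: fall back to NumPy on CPU-only machines.
    cp = None
    xp = np


def to_host(a):
    """Return a NumPy array, copying from GPU memory if needed."""
    if cp is not None and isinstance(a, cp.ndarray):
        return cp.asnumpy(a)
    return np.asarray(a)


def normalize_rows(host_data):
    """Scale each row to sum to 1, using the GPU when one is available."""
    a = xp.asarray(host_data, dtype=xp.float32)  # host -> device copy under CuPy
    out = a / a.sum(axis=1, keepdims=True)
    return to_host(out)  # device -> host copy under CuPy


result = normalize_rows(np.arange(1, 7).reshape(2, 3))
print(result)
```

Keeping transfers at the edges of a computation, as normalize_rows does, tends to matter for performance, since copies between host and device memory are comparatively slow.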

Why Use CuPy?

CuPy offers several compelling advantages for developers working with numerical computations:

  • GPU Acceleration: Use the parallel processing power of GPUs to speed up computationally intensive tasks. For large arrays this often outperforms CPU-only code, although small workloads may not benefit because of kernel-launch and data-transfer overhead.
  • NumPy/SciPy Compatibility: Enjoy a familiar API that mirrors NumPy and SciPy, minimizing the learning curve and facilitating the migration of existing codebases to GPU environments.
  • Low-Level CUDA Access: For advanced users, CuPy provides direct interfaces to CUDA features such as RawKernel, streams, and the CUDA Runtime API, allowing fine-grained control and optimization of GPU operations.
  • Broad Platform Support: CuPy supports both NVIDIA CUDA and AMD ROCm platforms, offering flexibility across different hardware environments.
  • Active Community and Development: Backed by Preferred Networks and a vibrant community, CuPy is continuously evolving with new features and improvements.
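
The low-level access mentioned above can be sketched with cupy.RawKernel, which compiles a CUDA C source string and launches it on CuPy arrays. This is a minimal illustration, not a tuned implementation: the kernel name scaled_add and the helpers grid_size and scaled_add are made up for this example, and CuPy is imported inside the function so the module still loads on a machine without a GPU.

```python
import numpy as np

# CUDA C source, compiled by CuPy the first time the kernel is used.
KERNEL_SOURCE = r'''
extern "C" __global__
void scaled_add(const float* x, const float* y, float* out,
                const float alpha, const int n) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) {  // guard threads past the end of the arrays
        out[i] = alpha * x[i] + y[i];
    }
}
'''


def grid_size(n, threads_per_block):
    """Number of blocks needed to cover n elements (ceiling division)."""
    return (n + threads_per_block - 1) // threads_per_block


def scaled_add(x, y, alpha, threads_per_block=256):
    """Compute alpha * x + y on the GPU for 1-D float32 CuPy arrays."""
    import cupy as cp  # imported lazily so this module loads without a GPU

    kernel = cp.RawKernel(KERNEL_SOURCE, "scaled_add")
    out = cp.empty_like(x)
    n = x.size
    # Scalars are wrapped in NumPy types so they match the C parameter types.
    kernel((grid_size(n, threads_per_block),), (threads_per_block,),
           (x, y, out, np.float32(alpha), np.int32(n)))
    return out
```

On a machine with a GPU, calling scaled_add(x, y, 2.0) with two float32 CuPy arrays of equal length should give the same values as 2 * x + y. For simple elementwise operations like this one the array expression is usually the better choice; a raw kernel is worth the effort mainly when the computation cannot be expressed efficiently with array operations.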
