top of page
-
We take the advantage of pyCuda, which gives us access to CUDA API.
Implementation
- Since Most of Nvidia devices only support single-precision.
- In the first step, Rewrite all the base function with C-z Kernels.
-For example
- Executing kernels
- Also, We utilized scikit-cuda to implement functions of Gpu-array
--for example: safe_sparse_do
""" Dot product that handle the sparse matrix case correctly
Uses BLAS GEMM as replacement for numpy.dot where possible to avoid unnecessary copies."""
bottom of page