top of page
-
We take the advantage of pyCuda, which gives us access to CUDA API.
Implementation
- Since Most of Nvidia devices only support single-precision.

- In the first step, Rewrite all the base function with C-z Kernels.
-For example

- Executing kernels


- Also, We utilized scikit-cuda to implement functions of Gpu-array
--for example: safe_sparse_do
""" Dot product that handle the sparse matrix case correctly
Uses BLAS GEMM as replacement for numpy.dot where possible to avoid unnecessary copies."""

bottom of page