top of page
  • We take the advantage of pyCuda, which gives us access to CUDA API.

Implementation

  • Since Most of Nvidia devices only support single-precision.
  • In the first step, Rewrite all the base function with C-z Kernels.   

       -For example 

  • Executing kernels
  • Also, We utilized scikit-cuda to implement functions of Gpu-array

      --for example: safe_sparse_do

""" Dot product that handle the sparse matrix case correctly

Uses BLAS GEMM as replacement for numpy.dot where possible to avoid unnecessary copies."""

bottom of page