We take advantage of PyCUDA, which gives us access to the CUDA API from Python.
Implementation
- Since most Nvidia devices only support single precision, we use single-precision (float32) data.
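A minimal sketch of the single-precision conversion, assuming the host data are NumPy arrays. The helper name `to_device_ready` is hypothetical and does not come from the original code. The commented lines show how the result could then be uploaded with PyCUDA's `gpuarray.to_gpu` on a machine with a CUDA device.

```python
import numpy as np


def to_device_ready(x):
    """Return x as a C-contiguous float32 array, ready to upload to the GPU.

    Hypothetical helper: the name is an assumption made for this sketch.
    """
    return np.ascontiguousarray(x, dtype=np.float32)


# A float64, non-contiguous array (the transpose of a C-ordered array).
x = np.arange(6, dtype=np.float64).reshape(2, 3).T
x32 = to_device_ready(x)

# With PyCUDA installed and a CUDA device present, the upload would be:
#   import pycuda.autoinit
#   import pycuda.gpuarray as gpuarray
#   x_gpu = gpuarray.to_gpu(x32)
```

Converting once on the host keeps every kernel signature in plain `float`, at the cost of the reduced precision of float32.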
- In the first step, we rewrote all the base functions as CUDA C kernels.
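As an illustration of what such a kernel can look like, the sketch below assumes that one of the base functions is an element-wise logistic function. The kernel name `sigmoid`, the helper names and the choice of function are all hypothetical; the original kernels are not reproduced here. The PyCUDA imports sit inside `compile_sigmoid` so that the module can still be loaded on a machine without a GPU.

```python
import numpy as np

# Hypothetical kernel: element-wise logistic function in single precision.
SIGMOID_SRC = r"""
__global__ void sigmoid(float *out, const float *x, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)  /* guard the tail of the last block */
        out[i] = 1.0f / (1.0f + expf(-x[i]));
}
"""


def compile_sigmoid():
    """Compile the kernel and return a callable. Needs PyCUDA and a CUDA device."""
    import pycuda.autoinit  # noqa: F401  (creates a CUDA context on import)
    from pycuda.compiler import SourceModule

    return SourceModule(SIGMOID_SRC).get_function("sigmoid")


def sigmoid_reference(x):
    """NumPy version of the same computation, useful for checking GPU results."""
    x = np.asarray(x, dtype=np.float32)
    return (1.0 / (1.0 + np.exp(-x))).astype(np.float32)
```

Keeping a NumPy reference next to each kernel makes it straightforward to compare GPU output against a known-good CPU result.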
- Executing kernels
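Executing a kernel from PyCUDA can be sketched as follows, assuming a one-dimensional launch over a flat array. The kernel `scale` and the helpers `launch_config` and `run_scale` are hypothetical names chosen for this example. `pycuda.driver.In` and `pycuda.driver.Out` copy the host arrays to and from the device around the call, and scalar arguments are passed as NumPy-typed values.

```python
import numpy as np

# Hypothetical kernel: out[i] = alpha * x[i].
SCALE_SRC = r"""
__global__ void scale(float *out, const float *x, float alpha, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = alpha * x[i];
}
"""


def launch_config(n, threads_per_block=256):
    """Return (block, grid) tuples that cover n elements with a 1-D launch."""
    blocks = max(1, (n + threads_per_block - 1) // threads_per_block)
    return (threads_per_block, 1, 1), (blocks, 1)


def run_scale(x, alpha):
    """Compute alpha * x on the GPU. Needs PyCUDA and a CUDA device."""
    import pycuda.autoinit  # noqa: F401  (creates a CUDA context on import)
    import pycuda.driver as cuda
    from pycuda.compiler import SourceModule

    x = np.ascontiguousarray(x, dtype=np.float32)
    out = np.empty_like(x)
    kernel = SourceModule(SCALE_SRC).get_function("scale")
    block, grid = launch_config(x.size)
    # In/Out handle the host-to-device and device-to-host copies.
    kernel(cuda.Out(out), cuda.In(x), np.float32(alpha), np.int32(x.size),
           block=block, grid=grid)
    return out
```

The grid size is rounded up so that the last block may be only partly used, which is why the kernel checks `i < n` before writing.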
- We also used scikit-cuda to implement functions that operate on GPUArray objects.
-- For example, safe_sparse_dot:
""" Dot product that handles the sparse matrix case correctly.
Uses BLAS GEMM as replacement for numpy.dot where possible to avoid unnecessary copies."""
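One way such a function could be structured is sketched below; this is not the original implementation. It assumes 2-D inputs, detects sparse operands by duck typing (SciPy sparse matrices expose `toarray`), and routes the dense case through `skcuda.linalg.dot`, which calls cuBLAS GEMM. The names `gpu_dot`, `safe_sparse_dot_sketch` and the `use_gpu` flag are hypothetical.

```python
import numpy as np


def gpu_dot(a, b):
    """Dense 2-D matrix product on the GPU. Needs scikit-cuda and a CUDA device."""
    import pycuda.autoinit  # noqa: F401  (creates a CUDA context on import)
    import pycuda.gpuarray as gpuarray
    import skcuda.linalg as linalg

    linalg.init()
    a_gpu = gpuarray.to_gpu(np.ascontiguousarray(a, dtype=np.float32))
    b_gpu = gpuarray.to_gpu(np.ascontiguousarray(b, dtype=np.float32))
    return linalg.dot(a_gpu, b_gpu).get()  # copy the result back to the host


def safe_sparse_dot_sketch(a, b, dense_output=False, use_gpu=False):
    """Dot product that falls back to the operands' own product when sparse."""
    # Assumption: anything with a .toarray method is treated as sparse.
    if hasattr(a, "toarray") or hasattr(b, "toarray"):
        ret = a @ b
        if dense_output and hasattr(ret, "toarray"):
            ret = ret.toarray()
        return ret
    if use_gpu:
        return gpu_dot(a, b)
    return np.dot(a, b)
```

With `use_gpu=False` the function behaves like a plain `numpy.dot` for dense inputs, so the same call site can run on machines without a CUDA device.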