cublasDgemv
cublasDgemv is a routine in NVIDIA’s cuBLAS library that performs a general matrix-vector multiply on double-precision real data. It computes y = alpha * op(A) * x + beta * y, where A is an m-by-n matrix, x and y are vectors, alpha and beta are scalars, and op(A) is either A or its transpose, depending on the trans parameter. The matrix and vectors reside in device memory, and the call is issued through a cuBLAS handle and executes on the CUDA stream associated with that handle.
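The arithmetic behind the formula can be sketched on the CPU. The following is only a reference sketch for checking results, not how cuBLAS implements the routine. The name ref_dgemv, the boolean transpose flag, and the use of std::vector are hypothetical choices made for illustration. It assumes column-major storage with a leading dimension lda (element (i, j) of A at index i + j * lda) and unit strides for x and y.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for y = alpha * op(A) * x + beta * y.
// Assumptions: A is m x n in column-major order with leading dimension lda,
// so A(i, j) lives at A[i + j * lda]; x and y are contiguous (unit stride).
void ref_dgemv(bool transpose, int m, int n, double alpha,
               const std::vector<double>& A, int lda,
               const std::vector<double>& x, double beta,
               std::vector<double>& y) {
    const int rows = transpose ? n : m;  // length of y
    const int cols = transpose ? m : n;  // length of x
    for (int i = 0; i < rows; ++i) {
        double acc = 0.0;
        for (int j = 0; j < cols; ++j) {
            // op(A)(i, j) is A(i, j) without transpose and A(j, i) with it.
            const double a =
                transpose ? A[j + static_cast<std::size_t>(i) * lda]
                          : A[i + static_cast<std::size_t>(j) * lda];
            acc += a * x[j];
        }
        // Scale the product by alpha and accumulate into beta * y.
        y[i] = alpha * acc + beta * y[i];
    }
}
```

Note that y has length m and x has length n when op(A) is A, and the lengths swap when op(A) is the transpose. A sketch like this can serve as a host-side check against values copied back from the device.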
The function signature includes: a cuBLAS handle, a trans option, the dimensions m and n, pointers to
cuBLAS uses column-major storage by default, consistent with traditional BLAS. If your data uses row-major layout,