Requirement: an NVIDIA graphics card
Environment: NVIDIA CUDA Toolkit 10.1
(Download link : https://developer.nvidia.com/cuda-10.1-download-archive-update2?target_os=Windows&target_arch=x86_64&target_version=10&target_type=exelocal)
These files speed up the costly loops involved in multiplying two very large matrices. The key CUDA concept to understand is that a GPU runs many threads in parallel, which suits computations that are very large but made up of simple operations. Compile and run the code to compare the two approaches and see which one is faster (CPU vs GPU).
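
The CPU vs GPU comparison described above can be sketched roughly as follows. This is a minimal illustration, not the code shipped in these files: the names matmul_cpu and matmul_gpu, the matrix size n = 512, the 16x16 thread block, and the tolerance are all assumptions chosen for the example. It shows how the two outer loops of the CPU version turn into one GPU thread per output element.

```cuda
// Illustrative sketch only: function names, sizes and tolerance are assumptions,
// not taken from the files in this repository.
#include <chrono>
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// CPU reference: three nested loops, one output element at a time.
void matmul_cpu(const float* a, const float* b, float* c, int n) {
    for (int row = 0; row < n; ++row) {
        for (int col = 0; col < n; ++col) {
            float sum = 0.0f;
            for (int k = 0; k < n; ++k) sum += a[row * n + k] * b[k * n + col];
            c[row * n + col] = sum;
        }
    }
}

// GPU kernel: each thread computes one output element,
// so the row and column loops are replaced by the thread grid.
__global__ void matmul_gpu(const float* a, const float* b, float* c, int n) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n && col < n) {
        float sum = 0.0f;
        for (int k = 0; k < n; ++k) sum += a[row * n + k] * b[k * n + col];
        c[row * n + col] = sum;
    }
}

int main() {
    const int n = 512;  // assumed size; increase it to make the gap more visible
    const size_t count = static_cast<size_t>(n) * n;
    const size_t bytes = count * sizeof(float);
    std::vector<float> a(count), b(count), c_cpu(count), c_gpu(count);
    for (size_t i = 0; i < count; ++i) {
        a[i] = rand() / static_cast<float>(RAND_MAX);
        b[i] = rand() / static_cast<float>(RAND_MAX);
    }

    // Time the CPU version.
    auto t0 = std::chrono::steady_clock::now();
    matmul_cpu(a.data(), b.data(), c_cpu.data(), n);
    auto t1 = std::chrono::steady_clock::now();

    // Allocate device memory and copy the inputs to the GPU.
    float *d_a = nullptr, *d_b = nullptr, *d_c = nullptr;
    cudaMalloc(reinterpret_cast<void**>(&d_a), bytes);
    cudaMalloc(reinterpret_cast<void**>(&d_b), bytes);
    cudaMalloc(reinterpret_cast<void**>(&d_c), bytes);
    cudaMemcpy(d_a, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b.data(), bytes, cudaMemcpyHostToDevice);

    // Launch enough 16x16 blocks to cover the whole n x n output.
    dim3 block(16, 16);
    dim3 grid((n + block.x - 1) / block.x, (n + block.y - 1) / block.y);

    // Time the kernel only (memory transfers are excluded from this measurement).
    auto t2 = std::chrono::steady_clock::now();
    matmul_gpu<<<grid, block>>>(d_a, d_b, d_c, n);
    cudaError_t err = cudaDeviceSynchronize();  // wait for the kernel to finish
    auto t3 = std::chrono::steady_clock::now();
    if (err != cudaSuccess) {
        std::printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    cudaMemcpy(c_gpu.data(), d_c, bytes, cudaMemcpyDeviceToHost);

    // Compare results; small differences are expected from floating-point rounding.
    float max_diff = 0.0f;
    for (size_t i = 0; i < count; ++i)
        max_diff = std::fmax(max_diff, std::fabs(c_cpu[i] - c_gpu[i]));

    std::printf("CPU: %.3f ms\n", std::chrono::duration<double, std::milli>(t1 - t0).count());
    std::printf("GPU: %.3f ms\n", std::chrono::duration<double, std::milli>(t3 - t2).count());
    std::printf("max abs difference: %g (%s)\n", max_diff, max_diff < 1e-2f ? "match" : "mismatch");

    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
    return 0;
}
```

Assuming the sketch is saved as matmul.cu (a hypothetical file name), it should build with something like "nvcc matmul.cu -o matmul". The measured times depend on the hardware, and the first kernel launch usually includes some one-time startup cost, so treat the printed numbers as a rough comparison only.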
