GPU Device 0: "Hopper" with compute capability 9.0

Running ........................................................

Overall Time For matrixMultiplyPerf 

Printing Average of 20 measurements in (ms)
Size_KB	 UMhint	UMhntAs	 UMeasy	  0Copy	MemCopy	CpAsync	CpHpglk	CpPglAs
4	  0.210	  0.264	  0.332	  0.014	  0.033	  0.026	  0.037	  0.024
16	  0.201	  0.307	  0.489	  0.025	  0.043	  0.035	  0.046	  0.045
64	  0.311	  0.381	  0.758	  0.067	  0.084	  0.075	  0.074	  0.063
256	  0.545	  0.604	  1.429	  0.323	  0.228	  0.212	  0.197	  0.187
1024	  1.551	  1.444	  2.436	  1.902	  0.831	  0.784	  0.714	  0.728
4096	  4.960	  4.375	  7.863	 11.966	  3.239	  3.179	  2.908	  2.919
16384	 18.911	 17.022	 29.696	 77.375	 13.796	 13.757	 12.874	 12.862

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
