[[histogram]] - Starting...
GPU Device 0: "Hopper" with compute capability 9.0

CUDA device [NVIDIA H100 PCIe] has 114 Multi-Processors, Compute 9.0
Initializing data...
...allocating CPU memory.
...generating input data
...allocating GPU memory and copying input data

Starting up 64-bin histogram...

Running 64-bin GPU histogram for 67108864 bytes (16 runs)...

histogram64() time (average) : 0.00007 sec, 916944.3386 MB/sec

histogram64, Throughput = 916944.3386 MB/s, Time = 0.00007 s, Size = 67108864 Bytes, NumDevsUsed = 1, Workgroup = 64

Validating GPU results...
 ...reading back GPU results
 ...histogram64CPU()
 ...comparing the results...
 ...64-bin histograms match

Shutting down 64-bin histogram...


Initializing 256-bin histogram...
Running 256-bin GPU histogram for 67108864 bytes (16 runs)...

histogram256() time (average) : 0.00018 sec, 379951.1088 MB/sec

histogram256, Throughput = 379951.1088 MB/s, Time = 0.00018 s, Size = 67108864 Bytes, NumDevsUsed = 1, Workgroup = 192

Validating GPU results...
 ...reading back GPU results
 ...histogram256CPU()
 ...comparing the results
 ...256-bin histograms match

Shutting down 256-bin histogram...


Shutting down...

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

[histogram] - Test Summary
Test passed
