./dct8x8 Starting...

GPU Device 0: "Hopper" with compute capability 9.0

CUDA sample DCT/IDCT implementation
===================================
Loading test image: teapot512.bmp... [512 x 512]... Success
Running Gold 1 (CPU) version... Success
Running Gold 2 (CPU) version... Success
Running CUDA 1 (GPU) version... Success
Running CUDA 2 (GPU) version... 82435.220134 MPix/s //0.003180 ms
Success
Running CUDA short (GPU) version... Success
Dumping result to teapot512_gold1.bmp... Success
Dumping result to teapot512_gold2.bmp... Success
Dumping result to teapot512_cuda1.bmp... Success
Dumping result to teapot512_cuda2.bmp... Success
Dumping result to teapot512_cuda_short.bmp... Success
Processing time (CUDA 1)    : 0.021800 ms 
Processing time (CUDA 2)    : 0.003180 ms 
Processing time (CUDA short): 0.033000 ms 
PSNR Original    <---> CPU(Gold 1)    : 32.527462
PSNR Original    <---> CPU(Gold 2)    : 32.527309
PSNR Original    <---> GPU(CUDA 1)    : 32.527184
PSNR Original    <---> GPU(CUDA 2)    : 32.527054
PSNR Original    <---> GPU(CUDA short): 32.501888
PSNR CPU(Gold 1) <---> GPU(CUDA 1)    : 62.845787
PSNR CPU(Gold 2) <---> GPU(CUDA 2)    : 66.982300
PSNR CPU(Gold 2) <---> GPU(CUDA short): 40.958466

Test Summary...
Test passed
