Platform: NVIDIA CUDA
Device: NVIDIA GeForce RTX 4090
Driver version : 550.127.05 (Linux x64)
Compute units : 128
Clock frequency : 2520 MHz
Global memory bandwidth (GBPS)
float : 873.20
float2 : 901.24
float4 : 917.89
float8 : 928.70
float16 : 938.94
Single-precision compute (GFLOPS)
float : 84761.26
float2 : 80760.14
float4 : 80512.55
float8 : 79900.18
float16 : 79513.42
No half precision support! Skipped
Double-precision compute (GFLOPS)
double : 1398.84
double2 : 1397.85
double4 : 1394.48
double8 : 1387.83
double16 : 1374.64
Integer compute (GIOPS)
int : 44124.49
int2 : 44080.14
int4 : 43970.14
int8 : 44089.10
int16 : 44104.19
Integer compute Fast 24bit (GIOPS)
int : 44067.89
int2 : 44081.56
int4 : 44038.71
int8 : 43851.83
int16 : 43369.82
Integer char (8bit) compute (GIOPS)
char : 38655.31
char2 : 38334.73
char4 : 37103.88
char8 : 30839.88
char16 : 28388.27
Integer short (16bit) compute (GIOPS)
short : 36869.31
short2 : 35287.81
short4 : 36894.71
short8 : 32896.40
short16 : 28145.07
Transfer bandwidth (GBPS)
enqueueWriteBuffer : 10.68
enqueueReadBuffer : 15.51
enqueueWriteBuffer non-blocking : 10.08
enqueueReadBuffer non-blocking : 13.46
enqueueMapBuffer(for read) : 19.79
memcpy from mapped ptr : 11.54
enqueueUnmap(after write) : 25.13
memcpy to mapped ptr : 11.41
Kernel launch latency : 4.06 us