OS: Ubuntu 22.04
Platform: NVIDIA CUDA
Device: Tesla T4
Driver version : 550.54.14 (Linux x64)
Compute units : 40
Clock frequency : 1590 MHz
Global memory bandwidth (GBPS)
float : 233.86
float2 : 246.88
float4 : 252.89
float8 : 263.44
float16 : 252.01
Single-precision compute (GFLOPS)
float : 8013.07
float2 : 8140.24
float4 : 7994.40
float8 : 7880.84
float16 : 7770.41
No half precision support! Skipped
Double-precision compute (GFLOPS)
double : 254.21
double2 : 253.94
double4 : 253.34
double8 : 252.16
double16 : 249.72
Integer compute (GIOPS)
int : 6330.65
int2 : 6220.86
int4 : 6230.67
int8 : 6310.22
int16 : 6196.06
Integer compute Fast 24bit (GIOPS)
int : 6230.43
int2 : 6212.05
int4 : 6231.91
int8 : 6148.55
int16 : 6054.79
Integer char (8bit) compute (GIOPS)
char : 4870.88
char2 : 4821.29
char4 : 4841.00
char8 : 4881.14
char16 : 4226.42
Integer short (16bit) compute (GIOPS)
short : 4885.94
short2 : 4650.36
short4 : 4744.20
short8 : 4296.34
short16 : 4133.46
Transfer bandwidth (GBPS)
enqueueWriteBuffer : 3.83
enqueueReadBuffer : 4.26
enqueueWriteBuffer non-blocking : 3.81
enqueueReadBuffer non-blocking : 4.14
enqueueMapBuffer(for read) : 6.47
memcpy from mapped ptr : 8.20
enqueueUnmap(after write) : 6.08
memcpy to mapped ptr : 8.14
Kernel launch latency : 4.86 us
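
As a rough sanity check on the compute figures above, the measured throughput can be compared with a back-of-the-envelope theoretical peak derived from the reported compute units and clock frequency. This is a minimal sketch, not part of the benchmark itself: the per-unit lane counts (64 FP32 and 2 FP64 lanes per compute unit, as on Turing-class SMs) and the convention of counting one fused multiply-add as 2 FLOPs are assumptions, and the helper name `peak_gflops` is made up for illustration.

```python
# Rough theoretical peak vs. the measured figures in the log above.
# From the log: 40 compute units at 1590 MHz.
# ASSUMPTIONS (not in the log): 64 FP32 lanes and 2 FP64 lanes per compute
# unit, and one fused multiply-add counted as 2 FLOPs per lane per clock.
COMPUTE_UNITS = 40
CLOCK_MHZ = 1590
FP32_LANES_PER_CU = 64   # assumed
FP64_LANES_PER_CU = 2    # assumed
FLOPS_PER_FMA = 2        # assumed


def peak_gflops(compute_units, lanes_per_cu, clock_mhz, flops_per_op=FLOPS_PER_FMA):
    """Hypothetical helper: units * lanes * FLOPs per op * clock, in GFLOPS."""
    return compute_units * lanes_per_cu * flops_per_op * clock_mhz / 1000.0


# Best measured values copied from the log (float2 and double).
measured = {
    "single": 8140.24,
    "double": 254.21,
}

theoretical = {
    "single": peak_gflops(COMPUTE_UNITS, FP32_LANES_PER_CU, CLOCK_MHZ),
    "double": peak_gflops(COMPUTE_UNITS, FP64_LANES_PER_CU, CLOCK_MHZ),
}

for name, value in measured.items():
    ratio = 100.0 * value / theoretical[name]
    print(f"{name:6s} measured {value:8.2f}  peak {theoretical[name]:8.2f}  ({ratio:.1f}% of peak)")
```

Under these assumptions the measured single- and double-precision results sit very close to the estimated peaks, which suggests the device was running near its reported clock during the test.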