Scout

OpenCL performance on NVIDIA Tesla T4, measured with clpeak

OS: Ubuntu 22.04

Platform: NVIDIA CUDA
  Device: Tesla T4
    Driver version  : 550.54.14 (Linux x64)
    Compute units   : 40
    Clock frequency : 1590 MHz

    Global memory bandwidth (GBPS)
      float   : 233.86
      float2  : 246.88
      float4  : 252.89
      float8  : 263.44
      float16 : 252.01

    Single-precision compute (GFLOPS)
      float   : 8013.07
      float2  : 8140.24
      float4  : 7994.40
      float8  : 7880.84
      float16 : 7770.41

    No half precision support! Skipped

    Double-precision compute (GFLOPS)
      double   : 254.21
      double2  : 253.94
      double4  : 253.34
      double8  : 252.16
      double16 : 249.72

    Integer compute (GIOPS)
      int   : 6330.65
      int2  : 6220.86
      int4  : 6230.67
      int8  : 6310.22
      int16 : 6196.06

    Integer compute Fast 24bit (GIOPS)
      int   : 6230.43
      int2  : 6212.05
      int4  : 6231.91
      int8  : 6148.55
      int16 : 6054.79

    Integer char (8bit) compute (GIOPS)
      char   : 4870.88
      char2  : 4821.29
      char4  : 4841.00
      char8  : 4881.14
      char16 : 4226.42

    Integer short (16bit) compute (GIOPS)
      short   : 4885.94
      short2  : 4650.36
      short4  : 4744.20
      short8  : 4296.34
      short16 : 4133.46

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer              : 3.83
      enqueueReadBuffer               : 4.26
      enqueueWriteBuffer non-blocking : 3.81
      enqueueReadBuffer non-blocking  : 4.14
      enqueueMapBuffer(for read)      : 6.47
        memcpy from mapped ptr        : 8.20
      enqueueUnmap(after write)       : 6.08
        memcpy to mapped ptr          : 8.14

    Kernel launch latency : 4.86 us
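
To put the single-precision numbers in context, here is a minimal sketch that estimates the theoretical FP32 peak from the compute unit count and clock reported above, then compares it with the measured values. The 64 FP32 lanes per compute unit and 2 FLOP per fused multiply-add are assumptions taken from the Turing architecture, not from the clpeak output, and the function names are only illustrative.

```python
# Values copied from the clpeak output above.
COMPUTE_UNITS = 40
CLOCK_MHZ = 1590
MEASURED_FP32_GFLOPS = {
    "float": 8013.07,
    "float2": 8140.24,
    "float4": 7994.40,
    "float8": 7880.84,
    "float16": 7770.41,
}

# Assumptions (not part of the clpeak output):
# a Turing compute unit (SM) has 64 FP32 lanes,
# and each lane retires one fused multiply-add (2 FLOP) per cycle.
FP32_LANES_PER_CU = 64
FLOP_PER_LANE_PER_CYCLE = 2


def peak_gflops(compute_units, clock_mhz, lanes_per_cu, flop_per_lane_per_cycle):
    """Theoretical peak in GFLOPS: units * lanes * FLOP/cycle * clock (GHz)."""
    return compute_units * lanes_per_cu * flop_per_lane_per_cycle * clock_mhz / 1000.0


def efficiency(measured, peak):
    """Fraction of the theoretical peak reached by each measured result."""
    return {name: value / peak for name, value in measured.items()}


if __name__ == "__main__":
    peak = peak_gflops(
        COMPUTE_UNITS, CLOCK_MHZ, FP32_LANES_PER_CU, FLOP_PER_LANE_PER_CYCLE
    )
    print(f"Estimated FP32 peak: {peak:.1f} GFLOPS")
    for name, ratio in efficiency(MEASURED_FP32_GFLOPS, peak).items():
        print(f"  {name:8s}: {ratio:.1%} of peak")
```

Under these assumptions the estimated peak is about 8140.8 GFLOPS, so the measured single-precision results sit close to what the reported clock allows.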