Finally I got around to taking a more extensive look at pgstrom (a module that makes use of GPUs). The entire GPU topic fascinates me, and I was excited to see the first real performance data.
Here is some simple test data:
test=# CREATE TABLE t_test AS
           SELECT x, 'a'::char(100) AS y, 'b'::char(100) AS z
           FROM generate_series(1, 5000000) AS x
           ORDER BY random();
SELECT 5000000
5 million rows should be enough to get a first impression of what is going on.
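Since the decisive factor later on will be how much data has to be shipped to the GPU, it is worth checking how big the table actually is. pg_total_relation_size and pg_size_pretty are stock PostgreSQL functions, so this works with or without pgstrom; with two char(100) columns per row, 5 million rows end up on the order of a gigabyte:

-- check how much data the benchmark has to push around
SELECT pg_size_pretty(pg_total_relation_size('t_test'));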
To make sure that a real difference can actually be observed, I decided to use no indexes at all. In real life this is not realistic, because performance would suffer horribly; however, pgstrom has not been made to speed up index lookups anyway, so this should not be an issue.
The first thing I tried was to filter and group some data on the CPU:
test=# explain analyze SELECT count(*) FROM t_test WHERE sqrt(x) > 0 GROUP BY y;
                                 QUERY PLAN
--------------------------------------------------------------------------------------
 HashAggregate  (cost=242892.24..242892.25 rows=1 width=101) (actual time=3965.362..3965.362 rows=1 loops=1)
   Group Key: y
   ->  Seq Scan on t_test  (cost=0.00..234558.91 rows=1666665 width=101) (actual time=0.033..1417.593 rows=5000000 loops=1)
         Filter: (sqrt((x)::double precision) > '0'::double precision)
 Planning time: 0.459 ms
 Execution time: 3965.495 ms
My box (4 GHz AMD) can do that in just under 4 seconds. Note that I am using the standard PostgreSQL storage manager here (no column store or the like).
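For the GPU runs I simply flipped pgstrom on in my session. pg_strom.enabled is the GUC used by the version I tested (it also shows up in the comments below), so take this as a sketch of how I switched between the two plan types:

-- enable GPU execution for this session
SET pg_strom.enabled = on;

-- and back to plain CPU plans for the comparison runs
SET pg_strom.enabled = off;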
Let us try the same thing on the GPU:
test=# explain analyze SELECT count(*) FROM t_test WHERE sqrt(x) > 0 GROUP BY y;
                                 QUERY PLAN
------------------------------------------------------------------------------------------
 HashAggregate  (cost=176230.88..176230.89 rows=1 width=101) (actual time=2004.355..2004.356 rows=1 loops=1)
   Group Key: y
   ->  Custom Scan (GpuPreAgg)  (cost=11929.24..171148.30 rows=75 width=108) (actual time=1151.310..2003.868 rows=76 loops=1)
         Bulkload: On (density: 100.00%)
         Reduction: Local + Global
         Device Filter: (sqrt((x)::double precision) > '0'::double precision)
         ->  Custom Scan (BulkScan) on t_test  (cost=8929.24..167897.55 rows=5000001 width=101) (actual time=12.956..1152.273 rows=5000000 loops=1)
 Planning time: 0.550 ms
 Execution time: 2299.633 ms
(9 rows)
We see a nice improvement here. The speedup is substantial (roughly 1.7x), especially when taking into consideration that merely reading the data already takes more than a second. Moving work out to the GPU definitely pays off in this case.
The interesting thing to notice is that the real improvement comes from the GROUP BY clause. A plain filter does not show a benefit:
test=# explain analyze SELECT count(*) FROM t_test WHERE sqrt(x) > 0;
                                 QUERY PLAN
--------------------------------------------------------------------------------------------
 Aggregate  (cost=238725.58..238725.59 rows=1 width=0) (actual time=1762.829..1762.829 rows=1 loops=1)
   ->  Seq Scan on t_test  (cost=0.00..234558.91 rows=1666665 width=0) (actual time=0.055..1311.220 rows=5000000 loops=1)
         Filter: (sqrt((x)::double precision) > '0'::double precision)
 Planning time: 0.074 ms
 Execution time: 1762.875 ms
(5 rows)

test=# explain analyze SELECT count(*) FROM t_test WHERE sqrt(x) > 0;
                                 QUERY PLAN
-------------------------------------------------------------------------------------------------
 Aggregate  (cost=172064.21..172064.22 rows=1 width=0) (actual time=1411.036..1411.036 rows=1 loops=1)
   ->  Custom Scan (GpuPreAgg)  (cost=11929.24..171148.30 rows=75 width=4) (actual time=650.590..1410.837 rows=76 loops=1)
         Bulkload: On (density: 100.00%)
         Reduction: NoGroup
         Device Filter: (sqrt((x)::double precision) > '0'::double precision)
         ->  Custom Scan (BulkScan) on t_test  (cost=8929.24..167897.55 rows=5000001 width=0) (actual time=16.034..1160.592 rows=5000000 loops=1)
 Planning time: 1.634 ms
 Execution time: 1745.605 ms
(8 rows)
It certainly makes sense that there is no improvement in this case, because moving the data around is simply too expensive to make a difference. Remember: GPUs only make sense if things can be done in parallel and if the data arrives fast enough. sqrt alone is not complicated enough to justify the effort of shipping the data to the device, and PostgreSQL cannot provide the data fast enough.
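If you want to probe the break-even point on your own hardware, making the device-side expression more expensive is an easy experiment. The query below is only an illustration of the idea, not part of my benchmark; the outcome will depend entirely on your GPU:

-- a deliberately more expensive filter to give the GPU more to chew on
explain analyze
SELECT count(*)
FROM t_test
WHERE sin(sqrt(x)) + cos(sqrt(x)) > 0.5
GROUP BY y;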
It is important to mention that many queries won't benefit from the GPU at all. In fact, I would expect that the majority of queries in a typical system will not behave differently.
Here is an example of a query which is actually slower with pgstrom:
test=# explain analyze SELECT *, sum(x) OVER () FROM t_test WHERE sqrt(x) > 0;
                                 QUERY PLAN
------------------------------------------------------------------------------------------------
 WindowAgg  (cost=8929.24..188730.88 rows=1666667 width=206) (actual time=4449.716..5848.233 rows=5000000 loops=1)
   ->  Custom Scan (GpuScan) on t_test  (cost=8929.24..167897.55 rows=1666667 width=206) (actual time=616.879..1899.651 rows=5000000 loops=1)
         Device Filter: (sqrt((x)::double precision) > '0'::double precision)
 Planning time: 0.142 ms
 Execution time: 6260.461 ms
(5 rows)

test=# explain analyze SELECT *, sum(x) OVER () FROM t_test WHERE sqrt(x) > 0;
                                 QUERY PLAN
--------------------------------------------------------------------------------------------------
 WindowAgg  (cost=0.00..255392.23 rows=1666665 width=206) (actual time=3610.914..4895.518 rows=5000000 loops=1)
   ->  Seq Scan on t_test  (cost=0.00..234558.91 rows=1666665 width=206) (actual time=0.038..1389.645 rows=5000000 loops=1)
         Filter: (sqrt((x)::double precision) > '0'::double precision)
 Planning time: 0.324 ms
 Execution time: 5187.048 ms
(5 rows)
In this case the GPU is a net loss; at least no benefit is to be expected at this stage.
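If a workload is dominated by such queries, GPU execution can be switched off per transaction instead of globally. SET LOCAL is stock PostgreSQL; pg_strom.enabled is the GUC mentioned above:

BEGIN;
-- fall back to a CPU plan just for this statement
SET LOCAL pg_strom.enabled = off;
SELECT *, sum(x) OVER () FROM t_test WHERE sqrt(x) > 0;
COMMIT;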
According to the main developer of pgstrom, sorting is not yet as good as he wants it to be, so I skipped the sort part for now. As sorts are key to many queries, this is pgstrom functionality I am really looking forward to.
I assume that sorts can greatly benefit from a GPU, because there is a lot of intrinsic parallelism in sorting algorithms, so sorting on the GPU could be highly beneficial. The speedup we can expect is hard to predict, but I firmly believe it can be quite substantial.
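Once GPU-based sorting is ready, a query along the following lines would be my first test candidate; this is just a sketch of the experiment, not something I measured here:

-- a sort-heavy query with no index to help:
-- exactly the kind of plan a GPU sort node should target
explain analyze
SELECT x, y
FROM t_test
ORDER BY y, x;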
What stunned me is that I did not encounter a single segmentation fault during my tests. I definitely did not expect that. My assumption was that there would be more loose ends, but things actually worked as expected most of the time. Given the stage of the project, I am pretty excited. pgstrom certainly feels like the future ...
> The interesting thing to notice is that the real improvement can be seen because of the GROUP BY clause.
Having done a bit of programming in OpenCL, I'd say the improvement comes from the GROUP BY implemented as parallel reduction, a pattern where GPGPUs really shine.
Unfortunately, the choice of CUDA limits the range of supported devices. As you mentioned, transferring data between host and device memory can easily kill performance, so integrated GPGPUs like Intel's IRIS/Pro often show comparatively good performance: they use the same slow main memory as the CPU, but in return they do not need those transfers. In addition, this allows the work to be spread over both the CPU and the GPGPU.
Since recent GPGPUs allow multiple concurrent kernels, combining GPGPU parallel processing with columnar data stores also seems very promising. Ah, so many options, so little time...
this seems to be the issue here. i am planning to run some more tests with more grouping and so on. it seems grouping is where pgstrom really excels. in addition, i am really looking forward to seeing what sorts can do once they are implemented the way they are planned. we have interesting times ahead.
> Unfortunately, the choice of CUDA limits the range of supported devices.
The previous version of PG-Strom used OpenCL; however, I went back to CUDA because of driver quality and debugging support.
CUDA has a widespread user base and thus gives us a stable run-time environment. On the other hand, I faced some strange behavior in *ntel's driver when PG-Strom used the OpenCL implementation. It was a hard time for me...
what you have achieved is definitely beyond incredible. i was stunned when i tested things. not a single segfault ... despite the size of the code.
If you give me the backtrace of a crash, it may help me fix it.
Also, I merged cumulative bugfixes around the GpuJoin code. If you can, please retry with the latest master branch.
Oh, this was no criticism. GPGPU acceleration for PostgreSQL is way cool, no matter what API. 🙂
I just tried pg_strom:
LOG: CUDA Runtime version: 6.5.0
LOG: NVIDIA driver version: 340.76
LOG: GPU0 Quadro K1100M (384 CUDA cores, 705MHz), L2 256KB, RAM 2047MB (128bits, 1400MHz), capability 3.0
LOG: NVRTC - CUDA Runtime Compilation vertion 7.5
But when I try to run a query with pg_strom.enabled = on:
ERROR: failed on cuModuleLoadData (CUDA_ERROR_NO_BINARY_FOR_GPU - no kernel image is available for execution on the device)
Any hints what causes this?
Seems to be a driver conflict. But on Ubuntu 14.04 the latest official driver is 346, and with that pg_strom does not compile:
src/cuda_control.c:2522:4: error: ‘CU_DEVICE_ATTRIBUTE_GLOBAL_L1_CACHE_SUPPORTED’ undeclared (first use in this function)
{CU_DEVICE_ATTRIBUTE_GLOBAL_L1_CACHE_SUPPORTED,
^
src/cuda_control.c:2522:4: note: each undeclared identifier is reported only once for each function it appears in
src/cuda_control.c:2524:4: error: ‘CU_DEVICE_ATTRIBUTE_LOCAL_L1_CACHE_SUPPORTED’ undeclared (first use in this function)
{CU_DEVICE_ATTRIBUTE_LOCAL_L1_CACHE_SUPPORTED,
^
and so on...
Ok, works now
I guess you are using an unsupported CUDA version (6.5); please ensure CUDA 7.0 or later is installed.
LOG: CUDA Runtime version: 6.5.0
LOG: NVIDIA driver version: 340.76
Also, PG-Strom should have a version check here.
Thanks for your feedback. If you can, please file your troubles in the project's GitHub issues:
https://github.com/pg-strom/devel/issues
Could you revisit the OpenCL implementation in the current landscape (2016)?
Maybe things have improved.
Worst case, ignore *ntel and focus on AMD/Nvidia.
No, what I have to do "first" is provide working, valuable and stable software for users.
The overall landscape might change; however, I have already built many things on CUDA.
Switching the platform would delay v1.0. Sorry.
HIP: C++ Heterogeneous-Compute Interface for Portability
http://gpuopen.com/compute-product/hip-convert-cuda-to-portable-c-code/
OpenCL for AMD
CUDA for NVIDIA
WIN / WIN
From the link: "To further reduce the learning curve when moving from Cuda to HIP, we developed the hipify tool to automate your application’s core conversion."
Maybe for future consideration?
For future reference: https://github.com/pg-strom/devel/issues/245
GPUOpen is being pushed a lot:
http://gpuopen.com/
Hi! Very interesting article. Is it possible to use PG-Strom with PostgreSQL on Windows 10 Pro? Alternatively, would it be straightforward to rebuild it for Windows, or does it depend on Linux-specific libraries?
we did not dare to run that on Windows.
Why not, is it not possible? I've downloaded PG-Strom and am trying to figure out if/how to compile/build it using Visual Studio or any other appropriate toolchain.
Any help/advice would be kindly accepted! 🙂
Hi Hans-Jürgen, can you please provide an update to this article with testing of PG-Strom 2.0? I'm interested to know if the performance has improved.
PG-Strom v2.0 features highlight (from the PG-Strom v2.0 Release Technical Brief, 17-Apr-2018):
▌Storage Enhancement
SSD-to-GPU Direct SQL Execution
In-memory Columnar Cache
GPU memory store (gstore_fdw)
▌Advanced SQL Infrastructure
PostgreSQL v9.6/v10 support, CPU + GPU Hybrid Parallel
SCAN + JOIN + GROUP BY combined GPU kernel
Utilization of demand paging of GPU device memory
▌Miscellaneous
PL/CUDA related enhancement
New data type support
Documentation and Packaging