
pgstrom: Checking GPU performance

09.2015

I finally got around to taking a more extensive look at pgstrom (a module that lets PostgreSQL make use of GPUs). The entire GPU thing fascinates me, and I was excited to see the first real performance data.

Here is some simple test data:

5 million rows should be enough to get a first impression of what is going on.

Queries can benefit

To make sure that a real difference can actually be observed, I decided to use no indexes. In real life this is not realistic, because performance would suffer badly. pgstrom was not made to speed up index lookups anyway, so this should not be an issue.

The first thing I tried was to filter and group some data on the CPU:

My box (4 GHz AMD) can do that in just under 4 seconds. Note that I am using the standard PostgreSQL storage manager here (no column store or anything similar).

Let us try the same thing on the GPU:

We see a nice improvement here. The speedup is incredible - especially when taking into consideration that getting the data already takes more than a second. It seems moving stuff out to the GPU definitely pays off in this case.
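One way to picture why a filtered, grouped count suits parallel hardware: the rows can be split into chunks, each chunk can be filtered and counted independently, and the partial results can then be merged pairwise. The following is a minimal Python sketch of that pattern, not pgstrom's actual implementation. The function names, the chunk size, the sample rows and the `sqrt(x) > 0` predicate are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def partial_counts(chunk):
    """Filter and count one chunk; every chunk is independent of the others."""
    counts = Counter()
    for x, y in chunk:
        if sqrt(x) > 0:  # hypothetical filter predicate
            counts[y] += 1
    return counts


def tree_merge(partials):
    """Merge partial counts pairwise, halving the list in each round."""
    partials = list(partials)
    if not partials:
        return Counter()
    while len(partials) > 1:
        merged = []
        for i in range(0, len(partials) - 1, 2):
            merged.append(partials[i] + partials[i + 1])
        if len(partials) % 2:  # odd one out moves on to the next round
            merged.append(partials[-1])
        partials = merged
    return partials[0]


def grouped_count(rows, chunk_size=4):
    """Rough equivalent of: SELECT y, count(*) ... WHERE sqrt(x) > 0 GROUP BY y."""
    chunks = [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]
    return tree_merge(partial_counts(c) for c in chunks)


# Made-up sample rows: (x, y) with two groups.
rows = [(x, "even" if x % 2 == 0 else "odd") for x in range(1, 11)]
result = grouped_count(rows)
```

Here the chunks are processed one after another, but nothing in `partial_counts` depends on any other chunk, which is the property a GPU can exploit.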

The interesting thing to notice is that the real improvement can be seen because of the GROUP BY clause. A normal filter does not show a benefit:

It certainly makes sense that there is no improvement in this case because moving data around is simply too expensive to make a difference. Remember: GPUs only make sense if things can be done in parallel and if data is coming fast enough. sqrt is not complicated enough to justify the effort of moving data around and PostgreSQL cannot provide data fast enough.
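The trade-off can be sketched as a back-of-the-envelope model: offloading only wins if the time saved on computation exceeds the time spent moving rows to the device. The Python snippet below is such a sketch. Every number in it (row width, per-row costs, bus bandwidth) is a made-up assumption and not a measurement from my tests, and the model ignores how fast PostgreSQL can read the rows in the first place.

```python
def offload_estimate(rows, bytes_per_row, cpu_ns_per_row, gpu_ns_per_row,
                     bus_bytes_per_s, launch_overhead_s=0.0):
    """Return (cpu_seconds, gpu_seconds) under a simple linear cost model."""
    cpu_s = rows * cpu_ns_per_row * 1e-9
    transfer_s = rows * bytes_per_row / bus_bytes_per_s
    gpu_s = launch_overhead_s + transfer_s + rows * gpu_ns_per_row * 1e-9
    return cpu_s, gpu_s


def gpu_pays_off(*args, **kwargs):
    """True if the modelled GPU time is lower than the modelled CPU time."""
    cpu_s, gpu_s = offload_estimate(*args, **kwargs)
    return gpu_s < cpu_s


ROWS = 5_000_000  # same row count as the test data; all other numbers are invented

# Cheap per-row work (think of a simple filter): the transfer dominates.
cheap = gpu_pays_off(ROWS, bytes_per_row=200, cpu_ns_per_row=20,
                     gpu_ns_per_row=1, bus_bytes_per_s=5e9)

# Expensive per-row work: the transfer cost is amortized.
expensive = gpu_pays_off(ROWS, bytes_per_row=200, cpu_ns_per_row=500,
                         gpu_ns_per_row=10, bus_bytes_per_s=5e9)
```

With these invented numbers the cheap case stays on the CPU and the expensive case moves to the GPU. The exact break-even point depends entirely on the hardware and the query.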

Or queries can be slower

It is important to mention that many queries won't benefit from the GPU at all. In fact, I would expect that the majority of queries in a typical system will not behave differently.

Here is an example of a query that is actually slower with pgstrom:

In this case the GPU seems like a loss - at least there is no benefit to be expected at this stage.

One word about sorts

According to the main developer of pgstrom, sorting is not yet as good as he wants it to be, so I skipped the sort part for now. As sorts are key to many queries, this is pgstrom functionality I am really looking forward to.

I assume that sorts can greatly benefit from a GPU because there is a lot of intrinsic parallelism in a sort algorithm. Therefore sorting on the GPU could be highly beneficial. The speedup we can expect is hard to predict but I firmly believe that it can be quite substantial.
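To illustrate what "intrinsic parallelism" can mean for sorting, here is a minimal Python sketch of a bitonic sorting network, a classic GPU-friendly algorithm. I am not claiming this is what pgstrom uses or plans to use. It is only an example of a sort in which every compare-exchange within one pass touches a disjoint pair of elements and could therefore run at the same time.

```python
def bitonic_sort(values):
    """Sort a list whose length is a power of two with a bitonic network."""
    n = len(values)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    a = list(values)
    k = 2
    while k <= n:          # size of the bitonic sequences being merged
        j = k // 2
        while j > 0:       # distance between compared elements
            # Each index pair (i, i ^ j) below is disjoint from all others,
            # so on a GPU this inner loop could be one thread per element.
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if ascending and a[i] > a[partner]:
                        a[i], a[partner] = a[partner], a[i]
                    elif not ascending and a[i] < a[partner]:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a


sorted_values = bitonic_sort([7, 3, 9, 1, 4, 8, 2, 6])
```

The network does more comparisons than a good sequential sort, so any real benefit depends on how many of them the hardware can execute concurrently.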

Stability

What stunned me is that I have not encountered a single segmentation fault during my tests. I definitely did not expect that. My assumption was that there would be more loose ends but actually things worked as expected most of the time - given the stage of the project I am pretty excited. pgstrom certainly feels like the future ...


19 responses to “pgstrom: Checking GPU performance”

  1. > The interesting thing to notice is that the real improvement can be seen because of the GROUP BY clause.

    Having done a bit of programming in OpenCL, I'd say the improvement comes from the GROUP BY implemented as parallel reduction, a pattern where GPGPUs really shine.

    Unfortunately, the choice of CUDA limits the range of supported devices. As you mentioned, transferring data between host and device memory can easily kill performance, so integrated GPGPUs like Intel's IRIS/Pro often show comparatively good performance: they use the same slow main memory as the CPU, but in return they do not need those transfers. In addition, this allows the work to be spread across both the CPU and the GPGPU.

    Since recent GPGPUs allow multiple concurrent kernels, combining GPGPU parallel processing with columnar data stores seems also very promising. Ah, so many options, so little time...

    • this seems to be the issue here. i am planning to give it some more tests with more grouping and so on. it seems grouping is where pgstrom really excels. in addition to that i am really looking forward to seeing, what sorts can do once they are done the way they are planned. we got interesting times ahead.

    • > Unfortunately, the choice of CUDA limits the range of supported devices.

      The previous version of PG-Strom used OpenCL; however, I went back to CUDA because of driver quality and debugging support.
      CUDA has a widespread user base and thus gives us a stable run-time environment. On the other hand, I faced some strange behavior with *ntel's driver when PG-Strom used the OpenCL implementation. It was a hard time for me...

      • what you have achieved is definitely beyond incredible. i was stunned when i tested things. not a single segfault ... despite the size of the code.

        • If you give me the back trace of the crash, it may help to fix.
          Also, I merged cumulative bugfixes around GpuJoin code. If you can, please retry with the latest master branch.

      • I just tried pg_strom:

        LOG: CUDA Runtime version: 6.5.0
        LOG: NVIDIA driver version: 340.76
        LOG: GPU0 Quadro K1100M (384 CUDA cores, 705MHz), L2 256KB, RAM 2047MB (128bits, 1400MHz), capability 3.0
        LOG: NVRTC - CUDA Runtime Compilation vertion 7.5

        But when I try to run a query with pg_strom.enabled = on:

        ERROR: failed on cuModuleLoadData (CUDA_ERROR_NO_BINARY_FOR_GPU - no kernel image is available for execution on the device)

        Any hints what causes this?

        • Seems to be a driver conflict. But on Ubuntu 14.04 the latest official driver is 346 and then pg_strom does not compile:

          src/cuda_control.c:2522:4: error: ‘CU_DEVICE_ATTRIBUTE_GLOBAL_L1_CACHE_SUPPORTED’ undeclared (first use in this function)
          {CU_DEVICE_ATTRIBUTE_GLOBAL_L1_CACHE_SUPPORTED,
          ^

          src/cuda_control.c:2522:4: note: each undeclared identifier is reported only once for each function it appears in
          src/cuda_control.c:2524:4: error: ‘CU_DEVICE_ATTRIBUTE_LOCAL_L1_CACHE_SUPPORTED’ undeclared (first use in this function)
          {CU_DEVICE_ATTRIBUTE_LOCAL_L1_CACHE_SUPPORTED,
          ^

          and so on...

        • I guess you are using an unsupported CUDA version (6.5); please ensure that CUDA 7.0 or later is installed.

          LOG: CUDA Runtime version: 6.5.0
          LOG: NVIDIA driver version: 340.76

          Also, PG-Strom should have a version check here.
          Thanks for your feedback. If you can, please report your problems on the project's GitHub:
          https://github.com/pg-strom/devel/issues

      • Could you revisit the OpenCL implementation in the current landscape (2016)?

        Maybe things have improved.

        Worst case, ignore *ntel and focus on AMD/Nvidia.

  2. Hi! Very interesting article. Is it possible to use PG-Strom with PostgreSQL on Windows 10 Pro? Alternatively, would it be straightforward to rebuild it for Windows, or does it depend on Linux-specific libraries?

      • Why not? Is it not possible? I've downloaded PG-Strom and am trying to figure out if/how to compile/build it using Visual Studio or any other appropriate toolchain.

        Any help/advice would be gratefully accepted! 🙂

  3. Hi Hans-Jürgen, can you please provide an update to this article with tests of PG-Strom 2.0? I'm interested to know whether the performance has improved.

      • PG-Strom v2.0 features highlight (from "PG-Strom v2.0 Release Technical Brief", 17-Apr-2018):

        Storage Enhancement
          • SSD-to-GPU Direct SQL Execution
          • In-memory Columnar Cache
          • GPU memory store (gstore_fdw)

        Advanced SQL Infrastructure
          • PostgreSQL v9.6/v10 support – CPU GPU Hybrid Parallel
          • SCAN JOIN GROUP BY combined GPU kernel
          • Utilization of demand paging of GPU device memory

        Miscellaneous
          • PL/CUDA related enhancement
          • New data type support
          • Documentation and Packaging
