CYBERTEC PostgreSQL Logo

Beating Uber with a PostgreSQL prototype

05.2016 / Category: / Tags:

During the last days I've read an interesting post, published by Uber. It has caught our attention here at CYBERTEC. Here you can read it by yourself.
The idea behind geo-fencing is to provide information about an area to users. Somebody might want to find a taxi near a certain location, or somebody might simply want to order a Pizza from a nearby restaurant.

According to the blog post Uber has solved this problem using some hand-made GO code. Uber's implementation: 95% <5ms. Another Blog also had an eye on it: https://medium.com/@buckhx/unwinding-uber-s-most-efficient-service-406413c5871d#.vdsg0fhoi

Of course, to a team of PostgreSQL professionals, 5 ms is quite a lot so we tried to do better than that.

Beating Uber's geo-fencing query? Here is how it works!

The first thing we need to compete is some test data. Some nice data can be found here:
https://github.com/buckhx/gofence-profiling/tree/master/nyc

Then the data can be loaded with psql nicely:

The data can already be queried. The following query does what Uber tries to achieve:

However, to be fair:
The sample set is not big enough yet. To increase the amount of data roughly to the size of Uber's database, the following SQL can do the job:

Checking latency

After applying some nice indexing we can already see, how well our query behaves:

The results are very promising. Our version of the geo-fencing query is around 40 times faster than the Uber one. Clearly, Uber should consider using PostgreSQL instead of custom code. Given the fact that we invested around 30 minutes to get this done, even developing the business logic is faster with PostgreSQL.

Foto Copyright: Uber

4 responses to “Beating Uber with a PostgreSQL prototype”

  1. Pretty cool!

    I think one key difference is that not only does Uber have a large amount of data, but the locations of the vehicles update at high frequency (every few seconds) which means that the index will constantly have to be rebuilt. Would this solution still perform as well if the indexes were less static?

    • Another key difference is that space partitioning optimizations (such as what is needed for this kind of search) is much more efficient when data is scattered evenly in space instead of being clustered in small areas, like cities (and even inside cities, where it's probably clustered in a few main streets as well). The search tree depth may very well be 40 times deeper than your random data set.

  2. I've been doing a little work in this area using geohash-36 algorithm in non-spatially aware data stores. Results are looking good.

Leave a Reply

Your email address will not be published. Required fields are marked *

CYBERTEC Logo white
Get the newest PostgreSQL Info & Tools


    This site is protected by reCAPTCHA and the Google Privacy Policy & Terms of Service apply.

    ©
    2024
    CYBERTEC PostgreSQL International GmbH
    phone-handsetmagnifiercrosscross-circle
    linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram