Ever wondered how to import OSM (OpenStreetMap) data into PostGIS [1] for the purpose of visualization and further analytics? Here are the basic steps to do so.
There are a bunch of tools on the market— osm2pgsql; imposm; ogr2org; just to mention some of those. In this article I will focus on osm2pgsql [2] .
Table of Contents
Let’s start with the software prerequisites. PostGIS comes as a PostgreSQL database extension, which must be installed in addition to the core database. Up till now, the latest PostGIS version is 3, which was released some days ago. For the current tasks I utilized PostGIS 2.5 on top of PostgreSQL 11.
This brings me to the basic requirements for the import – PostgreSQL >= 9.4 and PostGIS 2.2 are required, even though I recommend installing PostGIS >=2.5 on your database; it’s supported from 9.4 upwards. Please consult PostGIS’ overall compatibility and support matrix [3] to find a matching pair of components.
Let’s start by setting up osm2pgsql on the OS of your choice – I stick to Ubuntu 18.04.04 Bionic Beaver and compiled osm2gsql from source to get the latest updates.
sudo apt-get install make cmake g++ libboost-dev libboost-system-dev
libboost-filesystem-dev libexpat1-dev zlib1g-dev
libbz2-dev libpq-dev libproj-dev lua5.2 liblua5.2-dev
git clone https://github.com/openstreetmap/osm2pgsql.git
mkdir build && cd build
cmake ..
make
sudo make install
If everything went fine, I suggest checking the resulting binary and its release by executing
./osm2pgsql-version
1 2 3 4 5 6 |
florian@fnubuntu:~$ osm2pgsql –version osm2pgsql version 1.0.0 (64 bit id space) Compiled using the following library versions: Libosmium 2.15.2 Lua 5.2.4 |
In the world of OSM, data acquisition is a topic of its own, and worth writing a separate post discussing different acquisition strategies depending on business needs, spatial extent and update frequency. I won’t get into details here, instead, I’ll just grab my osm data for my preferred area directly from Geofabrik, a company offering data extracts and related daily updates for various regions of the world. This can be very handy when you are just interested in a subregion and therefore don’t want to deal with splitting the whole planet osm depending on your area of interest - even though osm2pgsql offers the possibility to hand over a bounding box as a spatial mask.
As a side note – osm data’s features are delivered as lon/lat by default.
So let’s get your hands dirty and fetch a pbf of your preferred area from Geofabrik’s download servers [4] [5]. For a quick start, I recommend downloading a dataset covering a small area:
wget https://download.geofabrik.de/europe/iceland-latest.osm.pbf
Optionally utilize osmium to check the pbf file by reading its metadata afterwards.
./osm2pgsql-version
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
florian@fn-gis-00:~$ osmium fileinfo ~/osmdata/iceland-latest.osm.pbf file: Name: /home/florian/osmdata/iceland-latest.osm.pbf Format: PBF Compression: none Size: 40133729 Header: Bouncing boxes: (-25.7408,62.8455,-12.4171,67.5008) With history: no Options: generator=osmium/1.8.0 osmosis_replication_base_url=http://download.geofabrik.de/europe/iceland-updates osmosis_replication_sequence_number=2401 osmosis_replication_timestamp=2019-10-14T20:19:02Z pbf_dense_nodes=true timestamp=2019-10-14T20:19:02Z |
Before finally importing the osm into PostGIS, we have to set up a database enabling the PostGIS extension. As easy as it sounds – connect to your database with your preferred database client or pgsql, and enable the extension by executing
create extension postgis
Subsequently, execute
select POSTGIS_VERSION()
to validate the PostGIS installation within your database.
osm2pgsql is a powerful tool to import osmdata into PostGIS offering various parameters to tune. It’s worthwhile to mention the existence of the default-style parameter [6], which defines how osm data is parsed and finally represented in the database. The diagram below shows the common database model generated by osm2pgsql using the default style.
It’s hard to give a recommendation on how this style should be adopted, as this heavily depends on its application. As a rule of thumb, the default style is a good starting point for spatial analysis, visualizations and can even be fed with spatial services, since this layout is supported by various solutions (e.g. Mapnik rendering engine).
I will start off with some basic import routines and then move to more advanced ones. To speed up the process in general, I advise you to define the number of processes and cache in MB to use. Even if this blogpost is not intended as a performance report, I attached some execution times and further numbers for the given parametrized commands to better understand the impact the mentioned parameters have.
The imports were performed on a virtualized Ubuntu 18.04 (KVM) machine equipped with 24 cores (out of 32 logical cores provided by an AMD Ryzen Threadripper 2950X), 24GB RAM, and a dedicated 2TB NVMe SSD (Samsung 970 EVO).
Before going into details, internalize the following main parameters:
-U for database user, -W to prompt for the database password, -d refers to the database and finally -H defines the host. The database schema is not exposed as a parameter and therefore must be adjusted via the search path.
Default import of pbf: existing tables will be overwritten. Features are projected to WebMercator (SRID 3857) by default.
osm2pgsql -U postgres -W -d osmDatabase -H 192.168.0.195 --number-processes 24 -C 20480 iceland-latest.osm.pbf
The command was executed in ~11 seconds. The table below shows generated objects and cardinalities.
table_schema | table_name | row_estimate | total | index | table |
public | planet_osm_line | 142681 | 97 MB | 13 MB | 67 MB |
public | planet_osm_point | 145017 | 19 MB | 7240 kB | 12 MB |
public | planet_osm_polygon | 176204 | 110 MB | 17 MB | 66 MB |
public | planet_osm_roads | 8824 | 14 MB | 696 kB | 5776 kB |
Parameter s (“slim”) forces the tool to store temporary node information in the database instead of in the memory. It is an optional parameter intended to enable huge imports and avoid out-of-memory exceptions. The parameter is mandatory if you want to enable incremental updates instead of full ones. The parameter can be complemented with --flat-nodes
to store this information outside the database as a file.
The command was executed in ~37 seconds. The table below shows generated objects and cardinalities.
table_schema | table_name | row_estimate | total | index | table |
public | planet_osm_line | 142681 | 102 MB | 18 MB | 67 MB |
public | planet_osm_nodes | 6100392 | 388 MB | 131 MB | 258 MB |
public | planet_osm_point | 145017 | 23 MB | 11 MB | 12 MB |
public | planet_osm_polygon | 176204 | 117 MB | 24 MB | 66 MB |
public | planet_osm_rels | 9141 | 9824 kB | 4872 kB | 4736 kB |
public | planet_osm_roads | 8824 | 15 MB | 1000 kB | 5776 kB |
public | planet_osm_ways | 325545 | 399 MB | 306 MB | 91 MB |
By default, tags referenced by a column are exposed as separate columns. Parameter hstore forces the tool to store unreferenced tags in a separate hstore column.
Note: Database extension hstore must be installed beforehand.
osm2pgsql -U postgres -W -d osmDatabase -H 192.168.0.195 -s --hstore --number-processes 24 -C 20480 iceland-latest.osm.pbf
For the sake of completeness
The command was executed in ~35 seconds. The table below shows generated objects and cardinalities.
table_schema | table_name | row_estimate | total | index | table |
public | planet_osm_line | 142875 | 106 MB | 18 MB | 71 MB |
public | planet_osm_nodes | 6100560 | 388 MB | 131 MB | 258 MB |
public | planet_osm_point | 151041 | 27 MB | 12 MB | 15 MB |
public | planet_osm_polygon | 176342 | 122 MB | 24 MB | 72 MB |
public | planet_osm_rels | 9141 | 9824 kB | 4872 kB | 4736 kB |
public | planet_osm_roads | 8824 | 16 MB | 1000 kB | 6240 kB |
public | planet_osm_ways | 325545 | 399 MB | 306 MB | 91 MB |
By default, objects containing multiple disjoint geometries are stored as separate features within the database. Think of Vienna and its districts, which could be represented as 23 individual polygons or one multi-polygon. Parameter -G forces the tool to store geometries belonging to the same object as multi-polygon.
osm2pgsql -U postgres -W -d osmDatabase -H 192.168.0.195 -s -G --number-processes 24 -C 20480 iceland-latest.osm.pbf
The impact emerges most clearly during spatial operations, since the spatial index utilizes the feature to its fullextent, in order to decide which features must be considered.
The command was executed in ~36 seconds. The table below shows generated objects and cardinalities.
table_schema | table_name | row_estimate | total | index | table |
public | planet_osm_line | 142681 | 102 MB | 18 MB | 67 MB |
public | planet_osm_nodes | 6100392 | 388 MB | 131 MB | 258 MB |
public | planet_osm_point | 145017 | 23 MB | 11 MB | 12 MB |
public | planet_osm_polygon | 174459 | 117 MB | 24 MB | 65 MB |
public | planet_osm_rels | 9141 | 9824 kB | 4872 kB | 4736 kB |
public | planet_osm_roads | 8824 | 15 MB | 1000 kB | 5776 kB |
public | planet_osm_ways | 325545 | 399 MB | 306 MB | 91 MB |
The table highlights the influence of osm2pgsql parameters on execution time, generated objects, cardinalities and subsequently sizing. In addition, it’s worth it to understand the impact of parameters like multi-geometry, which forces the tool to create multi-geometry features instead of single-geometry features. Preferring one over the other might lead to performance issues, especially when executing spatial operators (as those normally take advantage of the extent of the features).
The next posts will complement this post by inspecting and visualizing our import results and subsequently dealing with osm updates to stay up to date.
[1] | „PostGIS Reference,“ [Online]. Available: https://postgis.net/. |
[2] | „osm2pgsql GitHub Repository,“ [Online]. Available: https://github.com/openstreetmap/osm2pgsql. |
[3] | „PostGIS Compatiblity and Support,“ [Online]. Available: https://trac.osgeo.org/postgis/wiki/UsersWikiPostgreSQLPostGIS. |
[4] | „Geofabrik Download Server,“ [Online]. Available: https://download.geofabrik.de/. |
[5] | „PBF Format,“ [Online]. Available: https://wiki.openstreetmap.org/wiki/PBF_Format. |
[6] | „Default Style - osm2pgsql,“ [Online]. Available: https://wiki.openstreetmap.org/wiki/Osm2pgsql/schema. |
[7] | „QGIS Osm styles,“ [Online]. Available: https://github.com/yannos/Beautiful_OSM_in_QGIS. |
[8] | „osm2pgsql Database schema,“ [Online]. Available: https://wiki.openstreetmap.org/w/images/b/bf/UMLclassOf-osm2pgsql-schema.png. |
Generated objects and cardinalities from statistics
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
SELECT *, pg_size_pretty(total_bytes) AS total , pg_size_pretty(index_bytes) AS INDEX , pg_size_pretty(table_bytes) AS TABLE FROM ( SELECT *, total_bytes-index_bytes-COALESCE(toast_bytes,0) AS table_bytes FROM ( SELECT c.oid,nspname AS table_schema, relname AS TABLE_NAME , c.reltuples AS row_estimate , pg_total_relation_size(c.oid) AS total_bytes , pg_indexes_size(c.oid) AS index_bytes , pg_total_relation_size(reltoastrelid) AS toast_bytes FROM pg_class c LEFT JOIN pg_namespace n ON n.oid = c.relnamespace WHERE relkind = 'r' ) a ) a where a.table_schema ='public'; |
Generated objects and cardinalities with count
+43 (0) 2622 93022-0
office@cybertec.at
You are currently viewing a placeholder content from Facebook. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from X. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More Information
Leave a Reply