In times of COVID-19, governments contemplate tough measures to identify and trace infected people. These measures include the utilization of mobile phone data to trace down infected individuals and subsequently contacts to curb the epidemic. This article shows how PostGIS’ functions can be used to identify “overlapping” sections of infected and healthy individuals by analysing tracks spatio-temporally.
This time we don’t focus on performance and tuning, rather strive boosting your creativity on PostgreSQL’s spatial extension and its functionalities.
Table of Contents
The article is structured as follows:
Let’s start by defining tables representing tracks and their points.
Table mobile_tracks acts as a helper table storing artificial tracks of individuals, which have been drawn via QGIS.
Table mobile_points stores points, which result from segmenting tracks. Finally, they have been enriched by timestamps, which are used to identify temporal intersections on top of spatial intersections.
The test area is located in Vienna/Austria; therefore I chose MGI/Austria 34 (EPSG as 31286) as appropriate projection.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
create table if not exists mobile_tracks ( gid serial not null constraint mobile_tracks_ok primary key, customer_id integer, geom geometry(LineString,31286), infected boolean default false ); create index if not exists mobile_tracks_geom_idx on mobile_tracks using gist (geom); create unique index if not exists mobile_tracks_customer_id_uindex on mobile_tracks (customer_id); |
1 2 3 4 5 6 7 8 9 10 11 12 13 |
create table if not exists mobile_points ( gid serial not null constraint mobile_points_ok primary key, customer_id integer, geom geometry(Point,31286), infected boolean, recorded timestamp ); create index if not exists mobile_points_geom_idx on mobile_points using gist (geom); |
As mentioned in the beginning, for points I decided to generate artificial ones from tracks, I previously digitized via QGIS. Figure 1 shows tracks of infected people in red, healthy individuals are represented as green line strings as foundation.
For our simple example, I made the following assumptions:
To extract individual points for tracks, I utilized PostGIS’s ST_Segmentize function as follows:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
with dumpedPoints as ( select (st_dumppoints(st_segmentize(geom, 1))).geom, ((st_dumppoints(st_segmentize(geom, 1))).path[1]) as path, customer_id, infected, gid from mobile_tracks), aggreg as ( select *, now() + interval '1 second' * row_number() over (partition by customer_id order by path) as tstamp from dumpedPoints) insert into mobile_points(geom, customer_id, infected, recorded) select geom, customer_id, infected, tstamp from aggreg; |
The query creates points every meter, timestamps are subsequently increased by 1000 milliseconds.
Now it’s time to start with our analysis. Did our infected individual meet somebody?
Let’s start with an easy approach and select points of healthy individuals, which have been within 2 meters to infected individuals while respecting a time interval of 2 seconds.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
SELECT distinct on (m1.gid) m1.customer_id infectionSourceCust, m2.customer_id infectionTargetCust, m1.gid, m1.recorded, m2.gid, m2.recorded FROM mobile_points m1 inner join mobile_points m2 on st_dwithin(m1.geom, m2.geom, 2) where m1.infected = true and m2.infected = false and m1.gid < m2.gid AND (m2.recorded >= m1.recorded - interval '1seconds' and m2.recorded <= m1.recorded + interval '1seconds') order by m1.gid, st_dwithin(m1.geom, m2.geom, 2) asc |
Figure 3 and 4 show results for the given query by highlighting points of contact for our individuals in blue. As mentioned before, this query covers the most basic solution to identify people who met.
Alternatively, PostGIS’ ST_CPAWithin and ST_ClosestPointOfApproach functions could be used here as well to solve this in a similar manner. To do so, our points must get modelled as trajectories first.
To refine our solution, let’s identify temporal coherent segments of spatial proximity and respective passage times. The goal is to understand, how long people have been close enough together to inherit possible infections more realistically. The full query is attached to the end of the post.
Based on our first query results, for each point we calculate the time interval to its predecessor. Considering our initial assumptions, a non-coherent temporal segment will lead to gaps >1 second.
With our new column indicating the temporal gap between current and previous point, we cluster segments
For each cluster, we subtract min from max timestamp to extract a passage interval.
Finally, for each combination of infected/healthy individual, the segment with the maximum passage time is being extracted and a linestring is generated by utilizing st_makeline (see figure 7).
Even though results can serve as foundation for further analysis, our approach still remains constrained by our initially taken assumptions.
Recently, a very promising PostgreSQL extension named MobilityDB has been published, which offers a lot of functionalities to solve all kinds of spatio-temporal related questions. Take a look at this blogpost to learn more.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 |
with points as ( SELECT distinct on (m1.gid) m1.customer_id infectionSourceCust, m2.customer_id infectionTargetCust, m1.gid, m1.geom, m1.recorded m1rec, m2.gid, m2.recorded FROM mobile_points m1 inner join mobile_points m2 on st_dwithin(m1.geom, m2.geom, 2) where m1.infected = true and m2.infected = false and m1.gid < m2.gid AND (m2.recorded >= m1.recorded - interval '1seconds' and m2.recorded <= m1.recorded + interval '1seconds') order by m1.gid, m1.recorded), aggregStep1 as ( SELECT *, (m1rec - lag(m1rec, 1) OVER (partition by infectionSourceCust,infectionTargetCust ORDER by m1rec ASC)) as lag from points), aggregStep2 as (SELECT *, SUM(CASE WHEN extract('epoch' from lag) > 1 THEN 1 ELSE 0 END) OVER (partition by infectionSourceCust,infectionTargetCust ORDER BY m1rec ASC) AS legSegment from aggregStep1), aggregStep3 as ( select *, min(m1rec) OVER w minRec, max(m1rec) OVER w maxRec, (max(m1rec) over w) - (min(m1rec) OVER w) recDiff from aggregStep2 window w as (partition by infectionSourceCust,infectionTargetCust,legSegment) ) select distinct on (infectionsourcecust, infectiontargetcust) infectionsourcecust, infectiontargetcust, (extract('epoch' from (recdiff))) passageTime, st_makeline(geom) OVER (partition by infectionSourceCust,infectionTargetCust,legSegment) from aggregStep3 order by infectionsourcecust, infectiontargetcust, passageTime desc |
You need to load content from reCAPTCHA to submit the form. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from Facebook. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from X. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More Information
Leave a Reply