Annex: Reconstruction of flows from episodic mobility data

General approach


Mobility data are records containing geographic positions of people at different times. A sequence of chronologically ordered position records of the same person is called the trajectory of this person. Trajectories can be constructed from any data containing time stamps, location specifications (such as geographic coordinates), and identifiers of individuals. Examples of such data include geographically referenced Twitter posts and mobile phone use records.

Episodic trajectories [1] are such trajectories where the time steps between consecutive records are mostly large, so that the intermediate positions of the moving objects during these steps cannot be estimated. Trajectories constructed from georeferenced tweets or mobile phone use data are episodic.

Flows are aggregate movements of multiple people between some locations, which are typically geographic areas.

Flow computation

Given a set of trajectories, a division of the underlying territory into places, and a division of the time span of the data into steps, flows are constructed in the following way.

For each pair of places pi and pj and each time step tk, the number of moves from pi to pj is counted. An individual is assumed to have moved from pi to pj at the time tk if there exists such a pair of consecutive position records in the trajectory of this individual where the position of the first (i.e., the earlier) record is within the place pi, the position of the second record is within the place pj, the time stamp of the second record is within the step tk, and the time stamp of the first record is within either the step tk or the step tk-1.

In a case of an episodic trajectory, when the position record in pi has a time stamp in tk-1, it is uncertain whether the move from pi to pj occurred fully during tk-1, or fully during tk, or started in tk-1 and ended in tk, because it cannot be estimated when the individual actually left pi and when he/she arrived to pj. The move from pi to pj may be, in principle, assigned to step tk-1, to step tk, or to both of them. Our rationale for choosing tk is that the individual definitely arrived to place pj by time tk, i.e., not later than tk (more precisely, not later than the end of tk). Hence, the flow from pi to pj for time step tk is the number of moves from pi to pj that occurred by time tk.

The results of the aggregation are time series of flows (i.e., counts of moves) between places for time steps t1, t2, , tk-1, tk, , tN. Each time series refers to some ordered pair of places (pi, pj). Two places pi and pj are said to be linked if there was at least one move from pi to pj during at least one time step. An ordered pair of linked places may be called a link. The set of places and the set of links make together a mobility graph. The places are the nodes of this graph. Unlike general graphs, the nodes of mobility graphs have fixed spatial positions.

Division of a territory

When there is no predefined division of the territory into areas, the territory may be partitioned into Voronoi polygons based on the spatial distribution of characteristic points from the trajectories. For episodic trajectories, all available points are taken. The method is described in detail in paper [2]. The main idea is that the points are organised into spatial clusters that fit into circles with a user-specified maximal radius rmax, and the centres of these clusters are taken as generating seeds for the Voronoi tessellation. Depending on rmax, the territory can be divided into smaller or larger areas. There is an extension of the method that varies the cluster radius between rmax and a smaller value rmin depending on the spatial density of the points in different parts of the territory [1]. The method divides the parts of the territory where the points are dense into smaller areas and the parts where the points are sparse into larger areas.

Time series based on cyclic time

When the time period covered by available mobility data included multiple recurring time cycles (daily, weekly, or yearly), it may be interesting to study the variation of the flows over the time cycles. In particular, when data cover a period of several weeks, it is possible to study the flow variation with regard to the daily and weekly cycles. To do such a study, the time cycle of interest is divided into suitable steps c1, , cZ. For example, the weekly cycle may be divided into hourly steps. The time period of the data is also divided into steps t1, t2, , tN of the same length as c1, , cZ., so that for each data time step tk, there is a corresponding cycle step cm. For example, for the time step [16/06/2015 10:00, 16/06/2015 11:00), the corresponding cycle step is [Tuesday 10:00, Tuesday 11:00).

The data are aggregated into flows by steps t1, t2, , tN as described before. Then, for each link (pi, pj) and each cycle step ci, the mean of the flow values (i.e., move counts) from all corresponding data time steps is computed and taken as the flow value of this link in the cycle step ci.

Preparation of data for the London case study

For the London case study, we used georeferenced Twitter data collected over the period 05/11/2012 till 24/10/2013. The total number of tweets collected for the area of Greater London during this period is more than 21 million, the total number of distinct users is 482,623. Among the users, we identified the likely residents of the Greater London as those users who twitted within this territory in 30 or more different days. We found 40,246 such users, who posted in total more than 15 million tweets. From these tweets, we constructed episodic trajectories.

The territory of the Greater London has been divided into Voronoi polygons of varying sizes using the parameter settings rmax = 2500m and rmin = 150m. As a result, the division is fine in the centre, where the density of the tweets is very high, and coarse on the periphery, where the density is low. In total, we obtained 606 Voronoi polygons (cells).

The time series of the flows between the polygons have been computed for hourly steps of the weekly time cycle; hence, we obtained time series of the length 168 = 7 days of the week * 24 hours of the day.

Preparation of data for the Abidjan case study

The original data used for flow construction are a subset of the data provided by the telecommunication company Orange for the so called D4D (Data for Development) Challenge announced in 2013. The full data set consists of the mobile phone use records of 50,000 customers in Ivory Coast over the time period from December 1, 2011 to April 28, 2012. For the privacy preservation purposes, the user identifiers in the records were changed every two weeks. Therefore, for reconstructing trajectories and flows, only data from one of the bi-weekly steps are suitable. Moreover, the data set has many time gaps, i.e., large time steps for which no data records are available.

We limited the study to the area around Abidjan, the largest city of Ivory Coast, since there was not much cross-country movement. We have chosen the be-weekly step from April 09 to April 22, 2012 because more data are available for this step than for the others.

The positions in the data are specified by identifiers of the antennas of the mobile phone network, the coordinates of which have been given. There are 386 antennas over the chosen territory. We represented the areas covered by the antennas by Voronoi polygons built using the antenna positions as the generating seeds. We have computed the flows between the places for hourly steps within the period 09/04/2012 – 22/04/2012. The lengths of the resulting time series is 332 = 14 days * 24 hours of a day. Unlike in the London case study, we did not transform the data to the weekly cycle. The mobility graph consists of the 386 nodes corresponding to the antenna areas (cells) and 55,832 links between them.


[1]   Andrienko G, Andrienko N, Bak P, Keim D, Wrobel S (2013) Visual analytics of movement. Springer, Berlin

[2]   Andrienko N., Andrienko G. (2011) Spatial generalization and aggregation of massive movement data, IEEE Trans. Visualization and Computer Graphics, 17(2): 205-219


Technische Universität Darmstadt

Interactive Graphics Systems Group

Fraunhoferstr. 5
64283 Darmstadt

Tel. +49 6151 155 679

icon email office@gris.tu-

A A A | Print Print | Legal note Legal Notes | Contact Contact
to topto top