Counting Days in Canada: Using ggmap to track time spent in Canada

TL;DR

I used my timeline from Google Maps along with the ggmap package to count the number of days I have spent in Canada in the last 4 years in order to facilitate my Canadian citizenship application.

⚠️ Disclaimer: n00b: This is the first time I have dabbled with wrangling map data in R (outside of R-Ladies workshops). I am not a geocoder and this in no way relates to what I do for research (aside from wrangling data). In fact, anyone who has spent time with me travelling between any two points can probably attest to the fact that I have absolutely zero geospatial intuition whatsoever. Luckily, that doesn’t matter here. What matters more is that there is a whole industry that works with geographical information systems, and I know virtually nothing about it. Therefore, this information, while functional for this task, is really devoid of any deep knowledge of GIS and the field at large. This is not meant to provide a foray into that arena, merely to demonstrate how I wrangled my own location data to achieve a small goal.

The problem


I recently became eligible to apply for Canadian citizenship. I am a US citizen, but I have been living in Canada since 2008. I became a permanent resident of Canada in November 2015. I moved back to the US in August 2019. One of the eligibility requirements for applying for Canadian citizenship is, in addition to being a permanent resident for at least 3 years, that you “must have been physically present in Canada for at least 1095 days during the five years before you apply” (Canada Immigration). You are required to list all the trips you took that brought you out of the country. The Canadian immigration website suggests you use a “travel journal” to document your trips out of Canada.

My main challenge is that I travel in and out of Canada all the time. I am the only member of my family in Canada. When I lived in Montreal, I visited my parents and friends in MA regularly. We took frequent trips to Vermont (mainly to score Heady Topper beer). When I moved to London, ON, I travelled even more, sometimes heading down to Detroit for just a day or two. My partner moved overseas in 2015, prompting several international trips during my PhD, not to mention conferences, family meet ups, and vacations.

Did I keep a travel journal in careful preparation of my eventual Canadian citizenship application? No, no I did not.

Having been a PR for just over 4 years, and now having no immediate plans to return to Canada, my window of eligibility for citizenship is closing. I need to determine 1) how many days I have spent in Canada so far to ensure I know when my eligibility will likely expire and 2) document my international trips, which I will need to report on my application.

My cries for help on social media were met with dire responses - there is no easy way around this step. Friends described combing through old emails, Google calendars, and passport pages to determine exactly when they were in and out of Canada. You can request an official document of your travels, but this is country specific, takes a while to process, and, from what I’ve heard, may not even be complete. Given my frequent back and forths, I needed to figure out a better, faster, more reliable way to document my travels.

The solution: Location tracking


Every year, Google Maps sends me a cute and slightly creepy summary of all the places I’ve been - a haunting reminder that, since I never turn my location services off on my phone, which is always with me, Google knows exactly where I am at all times, how long I stay there, and how I travel there (estimated based on speed of transport between locations).

While that is a bit jarring and worthy of its own ethical debate, in my case for the above problem, it turned out to be EXTREMELY handy.

Google Maps allows you to download your own timeline data in the form of a JSON file. The ggmap package allows you to parse and plot location data [1]. In the end, I was able to read in all of my timeline data since before I became a permanent resident [2], and use it to 1) figure out how many days I was in Canada, and 2) have a fairly accurate record of the trips I took outside of Canada without having to rely on other sketchy, unreliable forms of documentation (or, god forbid, my MEMORY).

Steps

While looking into possibilities for wrangling my Maps data, I came across this blog post, which gives a nice step-by-step of how to download your own location data and parse it to look for patterns. I’ve summarized some of those initial steps here, and then used the tidied data to further wrangle for my trip counter.

1. Load libraries

library(jsonlite)
library(ggmap)
library(dplyr)
library(stringr)
library(tidyr)

2. Get a Google API console key

You need this to be able to use the geocoding functions in ggmap, as well as to be able to plot. I learned how to do this from Rebecca Henderson’s R-Ladies workshop, in which she pointed us to this Medium article explaining how and why you need an API key. Note that you need to enable billing (i.e., with a credit card), but don’t need to pay to use the service.

Once you have successfully obtained an API console key, you need to enable it in your R session:

ggmap::register_google("your-secret-api-key")
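If you would rather not paste the key into a script, one option is to keep it in an environment variable. This is only a minimal sketch: the variable name GOOGLE_MAPS_API_KEY and the helper get_api_key() are made up for illustration and are not part of ggmap.

```r
# Hypothetical helper: read the key from an environment variable.
# The variable name GOOGLE_MAPS_API_KEY is an assumption; pick your own
# (for example, set it in ~/.Renviron).
get_api_key <- function(var = "GOOGLE_MAPS_API_KEY") {
  key <- Sys.getenv(var, unset = "")
  if (!nzchar(key)) {
    stop("Environment variable ", var, " is not set")
  }
  key
}

# Register the key only if one is available and ggmap is installed.
if (nzchar(Sys.getenv("GOOGLE_MAPS_API_KEY")) &&
    requireNamespace("ggmap", quietly = TRUE)) {
  ggmap::register_google(key = get_api_key())
}
```

This keeps the key out of any code you share or commit.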

2. Download your location history from your Google Account

You can download your own Google data, including from Maps, via Google Takeout. Be sure to only select “Location History.” This can take a little while (it took about 30 minutes for me to download the last 4 years, which is my entire Maps timeline). The final output is a JSON file called Location History.json. You can also do this from within Google Maps by going to “Your Data in Maps” >> “Download your Maps Data.”

3. Parse the location history file

The jsonlite R package allows you to read JSON files into R for further manipulation. I moved my .json output file to my R project folder and ran the following. This took a couple of minutes.

file <- "Location History.json"
lh <- read_json(file, simplifyVector = TRUE)
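To get a feel for what the parser hands back, here is a small sketch that reads a made-up JSON string shaped like the export. The field names timestampMs, latitudeE7, and longitudeE7 are the columns shown in this post; the accuracy field name and all values are illustrative assumptions.

```r
library(jsonlite)

# Tiny stand-in for "Location History.json" (values are illustrative).
json_txt <- '{"locations": [
  {"timestampMs": "1437674692250", "latitudeE7": 455027169,
   "longitudeE7": -735712951, "accuracy": 20},
  {"timestampMs": "1437674752999", "latitudeE7": 455028758,
   "longitudeE7": -735716985, "accuracy": 25}
]}'

# simplifyVector = TRUE turns the array of records into a data frame.
lh_toy <- fromJSON(json_txt, simplifyVector = TRUE)
str(lh_toy$locations)  # one row per logged point
```

Note that the timestamps arrive as text, which is why they need as.numeric() before any date conversion.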

Next you need to extract the location data and convert the time and date into a human-readable format. The output contains more columns of information (including velocity, activity, and estimated accuracy), which I’ve excluded in the preview.

locations <- lh$locations
locations$time = as.POSIXct(as.numeric(locations$timestampMs)/1000, 
                            origin = "1970-01-01")

locations %>% select(1:3,time) %>% head()
##     timestampMs latitudeE7 longitudeE7                time
## 1 1437674692250  455027169  -735712951 2015-07-23 14:04:52
## 2 1437674752999  455028758  -735716985 2015-07-23 14:05:52
## 3 1437674813904  455028758  -735716985 2015-07-23 14:06:53
## 4 1437675197645  455026308  -735713548 2015-07-23 14:13:17
## 5 1437675258697  455026081  -735713646 2015-07-23 14:14:18
## 6 1437675326946  455026395  -735714586 2015-07-23 14:15:26
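One thing worth knowing about this conversion: without a tz argument, as.POSIXct() prints in the time zone of the machine running the code, so the calendar date of a late-night point can shift. A minimal sketch with the time zone pinned explicitly ("America/Toronto" is my choice for illustration, not something the export specifies):

```r
# timestampMs is milliseconds since 1970-01-01 UTC, stored as text.
ts_ms <- c("1437674692250", "1437674752999")

# Divide by 1000 to get seconds, and set tz explicitly so the printed time
# (and any date derived from it) does not depend on the machine's locale.
secs    <- as.numeric(ts_ms) / 1000
t_utc   <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")
t_local <- as.POSIXct(secs, origin = "1970-01-01", tz = "America/Toronto")

format(t_utc,   "%Y-%m-%d %H:%M:%S %Z")
format(t_local, "%Y-%m-%d %H:%M:%S %Z")
```

Both vectors hold the same instants; only the displayed clock time differs.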

You can check your earliest and most recent maps data time stamps using min() and max() functions.

min(locations$time)
## [1] "2015-07-23 14:04:52 EDT"
max(locations$time)
## [1] "2020-01-02 10:32:05 EST"

My location tracking has been on since July 2015. The most recent point reflects the last time I downloaded my Maps data.

4. Tidy the location history

Latitude and longitude are recorded in Google in E7 format. You need to divide the original longitude/latitude values by 10^7 to get standard coordinate formatting.

I also found it helpful to round my final coordinates. The more decimal places you include, the more precise your location identification will be. In my case, all I really cared about was whether I could reliably capture border crossings. Some light Wikipediaing informed me that rounding coordinates to one decimal place (tenths of a degree) gives you a resolution of 11.1 km, roughly allowing you to distinguish major cities. Zero decimal places, i.e., rounding to the nearest integer, allows for 111.1 km precision, which can unambiguously capture large regions and countries (see also here). Rounding greatly reduces the amount of data you have to work with, which speeds processing up tremendously later on. It’s also simply not necessary, IMHO, at least for this project, to know your location down to the meter.
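To make the trade-off concrete, here is a small sketch that converts one E7 pair from the preview above to decimal degrees and rounds it to 0, 1, and 2 decimal places. The kilometre figures are the rough per-degree-of-latitude values quoted above, not exact distances.

```r
lat_e7 <- 455027169
lon_e7 <- -735712951

lat <- lat_e7 / 1e7   # 45.5027169
lon <- lon_e7 / 1e7   # -73.5712951

# One degree of latitude is roughly 111 km, so each extra decimal place
# shrinks the grid cell by a factor of ten.
digits <- 0:2
res <- data.frame(
  digits      = digits,
  lat_rounded = sapply(digits, function(d) round(lat, d)),
  lon_rounded = sapply(digits, function(d) round(lon, d)),
  approx_km   = 111.1 / 10^digits
)
res
```

Degrees of longitude get narrower as you move away from the equator (by the cosine of the latitude), so the east-west cell size in southern Ontario is smaller than these figures suggest.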

To prepare for further reduction, I also added a coordinates column (coord).

loc <- locations %>% mutate(
     # Derive date first, before the time column is overwritten
     date = as.Date(format(time, "%Y-%m-%d")),
     time = format(time, "%H:%M:%S"),
     # E7 format -> decimal degrees, rounded to whole degrees
     lon = round(longitudeE7 / 1e7, 0),
     lat = round(latitudeE7 / 1e7, 0),
     coord = paste(lon,lat,sep="_")
) 

I became a PR on November 22, 2015. I’ve filtered out dates prior to this (days spent in Canada within the five-year period before obtaining PR status do count as half days, but for now I’m going to omit them).

loc <- loc %>% filter(date >= as.Date("2015-11-22"))

I grouped the location data by date and coordinates - that is, one row for each unique date-coordinate combination. Since I had rounded my coordinates to 0 decimals earlier, this now gave me one row for approximately every 111 km. Importantly, I was able to collapse over days that I drove around a single city or over shorter distances. This allowed me to speed up further parsing below.

loc <- loc %>% 
  group_by(date, lonlat = coord) %>%   # one group per date-coordinate pair
  summarize(n=n()) %>%                 # n = points logged at that coordinate that day
  separate(lonlat, c("lon","lat"), sep="_", remove=FALSE) %>%
  mutate(lon = as.numeric(lon),
         lat = as.numeric(lat)) %>%
  ungroup()

To test whether rounding to the nearest integer was sufficient, I made a temporary data frame of 1 day’s worth of location history on a date that I know I crossed from Ontario to New York. This allowed me to confirm that rounding to 0 decimal places was more or less enough to capture what I wanted, knowing that this may mean I don’t capture every single day with 100% accuracy (for example, this runs the risk of days that I spent near a border being counted as crossing a border - a risk I felt was worth the time saved by the data reduction). You could just as easily use a month or other specified time span. (The country column in the output below comes from the reverse geocoding in step 5.)

tmp_day <- loc %>% 
  filter(date == "2019-10-20") # Look just at Oct 20 2019
  
tmp_day %>% select(1:6)
## # A tibble: 3 x 6
##   date       lonlat   lon   lat     n country
##   <date>     <chr>  <dbl> <dbl> <int> <chr>  
## 1 2019-10-20 -79_43   -79    43   423 USA    
## 2 2019-10-20 -80_43   -80    43   325 Canada 
## 3 2019-10-20 -81_43   -81    43   294 Canada

The n column reflects how many data points were logged for that coordinate on that day.

5. Coordinates 👉 Country: Reverse geocoding

Geocoding allows you to convert addresses into geographic coordinates. Reverse geocoding allows you to do the opposite: convert coordinates to addresses. The ggmap package provides a function revgeocode() to do just this.

I created a small function to use revgeocode() to get the address from my location data. I used the stringr package to extract the country I was in for each observation - the country is always(?) the last comma-separated element of the address returned by revgeocode(). This function returns either just the country, or the full address retrieved from Google.

get_country <- function(lon,lat, out=c("country","address")){
  # Resolve out to a single value ("country" if not specified)
  out <- match.arg(out)
  address <- suppressMessages(
    revgeocode(
      as.vector(
        c(as.numeric(lon),
          as.numeric(lat)))))
  
  # Split address by commas
  address_split <- str_split(address, pattern = ", ")
  # Country is always the last element
  l <- length(address_split[[1]]) 
  country <- address_split[[1]][l]
  
  # Convert NAs to characters to make life easier later
  if(is.na(country)){country <- "NA"}
  if(is.na(address)){address <- "NA"}
    
  if(out=="country"){
    return(country)
  }else if(out=="address"){
    return(address)
  }
}
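The country-extraction step can be tried without calling the API at all. A minimal sketch in base R, using a hypothetical helper extract_country() and a made-up address:

```r
# Pull the last comma-separated element out of a formatted address.
# extract_country() is a hypothetical helper, written with base R only.
extract_country <- function(address) {
  if (is.na(address)) return("NA")   # mirror the "NA" string convention above
  parts <- strsplit(address, ", ", fixed = TRUE)[[1]]
  parts[length(parts)]
}

extract_country("123 Example St, Sometown, ON A1A 1A1, Canada")
extract_country(NA_character_)
```

This relies on the same assumption as get_country(): that the country is the last element of the address string.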

I then created another function to determine, for a given data point, whether I was or was not in a given country (e.g., Canada). This function returns 1 if “country” equals the string “Canada”, and 0 if not. This allows me to eventually sum the days spent in Canada.

visited_country <- function(x, country = "Canada"){
  # 1 if the observation's country matches, 0 otherwise
  if(x==country){
    visit <- 1
  }else{visit <- 0}
  return(visit)
}

Next I used mapply() to iterate over each observation in my location history data, applying my get_country() and visited_country() functions to each row.

# Get country
loc$country <- mapply(get_country, 
                      loc$lon,loc$lat, 
                      out = "country")
# Get address
loc$address <- mapply(get_country,
                      loc$lon,loc$lat,
                      out = "address")

# Log whether I visited Canada
loc$visited <- mapply(visited_country, 
                      loc$country, 
                      "Canada",
                      USE.NAMES = FALSE)

head(loc) # roads were renamed for privacy
## # A tibble: 6 x 8
##   date       lonlat   lon   lat     n country address               visited
##   <date>     <chr>  <dbl> <dbl> <int> <chr>   <chr>                   <dbl>
## 1 2015-12-01 -81_43   -81    43   442 Canada  2111 Swan Rd, Dorche…       1
## 2 2015-12-02 -81_43   -81    43   403 Canada  2111 Swan Rd, Dorche…       1
## 3 2015-12-03 -81_43   -81    43   446 Canada  2111 Swan Rd, Dorche…       1
## 4 2015-12-04 -79_44   -79    44     2 Canada  815 Martin Rd W, Whi…       1
## 5 2015-12-04 -81_43   -81    43   330 Canada  2111 Swan Rd, Dorche…       1
## 6 2015-12-04 -83_42   -83    42    35 Canada  717 McCormick Rd, Ha…       1

Note that I modified the street names so as not to show actual addresses in this post. Because of the coordinate rounding, the addresses also likely do not reflect my exact location (just a location within 111 km).
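Because many rows share the same rounded coordinates, one possible way to cut down on API calls is to look up each unique coordinate pair once and copy the answer back to every row. The sketch below uses made-up rows and a stand-in function, fake_lookup(), in place of get_country(), so it runs without a key.

```r
# Toy rows: several days share the same rounded coordinates.
loc_toy <- data.frame(
  date = as.Date("2015-12-01") + 0:4,
  lon  = c(-81, -81, -81, -80, -79),
  lat  = c( 43,  43,  43,  43,  43)
)

# Stand-in for get_country(); counts calls so the saving is visible.
n_calls <- 0
fake_lookup <- function(lon, lat) {
  n_calls <<- n_calls + 1
  if (lon == -79) "USA" else "Canada"   # made-up answers for illustration
}

# Look up each unique (lon, lat) pair once...
pairs <- unique(loc_toy[, c("lon", "lat")])
pairs$country <- mapply(fake_lookup, pairs$lon, pairs$lat)

# ...then copy the result back onto every row.
loc_toy <- merge(loc_toy, pairs, by = c("lon", "lat"), sort = FALSE)
loc_toy <- loc_toy[order(loc_toy$date), ]
n_calls   # number of lookups actually made
```

Five rows need only three lookups here; with years of daily data the saving would be much larger.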

Finally, I computed how many days I was in Canada by grouping my data by date and summing my visits.

days_in_country <- loc %>%
  group_by(date) %>%
  summarize(n_in_canada = sum(visited),
            countries = paste(country,collapse=", ")) %>%
  ungroup() %>%
  mutate(in_canada = ifelse(n_in_canada > 0,1,0))
sum(days_in_country$in_canada)
## [1] 1101

Looks like, since November 22, 2015, I’ve spent 1101 days in Canada, just over the required 1095! Since some of my pre-PR days also count and can serve as padding (and since I do still go back and forth to Canada periodically), this is good news. I’ll remain eligible to apply for another year.
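To see how the eligibility window closes over time, here is a rough sketch that counts days in Canada within the five years before a given application date. The helper days_in_window() and the toy dates are made up for illustration, and treating the window as exactly five calendar years back from the application date is my simplification of the rule.

```r
# Count qualifying days in the five years before a given application date.
# days_in_window() is a hypothetical helper; canada_days is a vector of
# dates on which at least one location point fell in Canada.
days_in_window <- function(canada_days, apply_date, required = 1095) {
  apply_date   <- as.Date(apply_date)
  window_start <- seq(apply_date, by = "-5 years", length.out = 2)[2]
  n <- sum(canada_days >= window_start & canada_days < apply_date)
  list(window_start = window_start, days = n, eligible = n >= required)
}

# Toy data: every day from 2015-11-22 through 2019-07-31.
canada_days <- seq(as.Date("2015-11-22"), as.Date("2019-07-31"), by = "day")

early <- days_in_window(canada_days, "2020-01-02")
late  <- days_in_window(canada_days, "2021-12-01")
early$eligible
late$eligible
```

With real data you would pass the dates where in_canada equals 1 and try a few candidate application dates.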

6. Logging Trips

I could also now collapse over countries and days for a log of the trips that I took. I only care about trips outside of Canada, so I’ll filter Canada out first, then set up columns for trip “start” and “end” dates.

trips <- loc %>%
  filter(country!="Canada")%>%
  group_by(date, country) %>%
  summarize(start = min(date),
            end = max(date)) %>%
  ungroup()

head(trips)
## # A tibble: 6 x 4
##   date       country start      end       
##   <date>     <chr>   <date>     <date>    
## 1 2015-12-04 USA     2015-12-04 2015-12-04
## 2 2015-12-05 USA     2015-12-05 2015-12-05
## 3 2015-12-06 USA     2015-12-06 2015-12-06
## 4 2015-12-09 USA     2015-12-09 2015-12-09
## 5 2015-12-10 USA     2015-12-10 2015-12-10
## 6 2015-12-11 USA     2015-12-11 2015-12-11

Next, I “rolled up” the dates of each trip by using the lag() function (thank you to this helpful StackOverflow post because I had no clue what I was doing here). I imagine the lubridate package would also be super handy here, but I didn’t go down that route.

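trips <- trips %>%
  # Start a new group whenever a row does not begin exactly 1 day after the
  # previous row ended; the Date default makes the first row start group 1
  mutate(gr = cumsum(as.numeric(start - lag(end, default = as.Date("1970-01-01"))) != 1)) %>%
  group_by(gr, country) %>%
  summarise(FromDate = min(start), 
            ToDate   = max(end),
            dur = as.Date(ToDate) - as.Date(FromDate)) %>%
  ungroup()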

This glimpse of the earliest part of my trip log captures a couple short trips to the US, a trip to Israel, a layover in Belgium, and a return to the US.

head(trips,5)
## # A tibble: 5 x 5
##      gr country FromDate   ToDate     dur   
##   <int> <chr>   <date>     <date>     <drtn>
## 1     1 USA     2015-12-04 2015-12-06 2 days
## 2     2 USA     2015-12-09 2015-12-13 4 days
## 3     3 Israel  2015-12-18 2015-12-22 4 days
## 4     4 Belgium 2015-12-31 2015-12-31 0 days
## 5     4 USA     2015-12-24 2015-12-30 6 days
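The roll-up idea is easier to see on a handful of dates. This is a base R sketch of the same consecutive-day grouping, using made-up dates rather than my actual log:

```r
# Toy dates spent outside Canada (made up for illustration), sorted.
away <- as.Date(c("2015-12-04", "2015-12-05", "2015-12-06",
                  "2015-12-09", "2015-12-10",
                  "2015-12-18"))

# A new trip starts whenever the gap to the previous date is not 1 day.
gap     <- c(NA, as.numeric(diff(away)))
trip_id <- cumsum(is.na(gap) | gap != 1)

# One row per trip: first and last day (valid because dates are sorted).
trips_toy <- data.frame(
  trip = unique(trip_id),
  from = away[!duplicated(trip_id)],
  to   = away[!duplicated(trip_id, fromLast = TRUE)]
)
trips_toy$dur <- trips_toy$to - trips_toy$from
trips_toy
```

As in the table above, dur is the difference between the last and first day, so a single-day trip shows up as 0 days; add 1 if you want an inclusive day count.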

Some odd findings

While for the most part, these points appear to be confirmed by memory and calendar notes, a few oddities arose. According to my maps data, I visited the following countries:

xtabs(~country, data=loc)
## country
##     Belgium      Canada      France     Iceland   Indonesia      Israel 
##           2        1697          75           4           1           5 
##       Italy      Jordan          NA      Sweden Switzerland    Thailand 
##           1           4          82           1           1           1 
##      Turkey          UK         USA 
##           1           9         780

This breakdown isn’t 100% accurate. I have never actually visited Thailand or Indonesia. The high number of NAs is a bit worrisome too.

Closer inspection reveals that most, but not all, of these blips occurred on days I was travelling to or around Israel (where my partner was a post-doc for 2 years). I recall Maps having a hard time localizing us sometimes, so perhaps there is something to do with Google’s satellite signal or access in this region. To be fair, the Google Maps timeline interface doesn’t show these blips, so whatever’s happening is getting parsed out by Google Maps’ final algorithms.

At the end of the day, all I care about is whether it accurately logs whether I was in Canada (even for just a single time point) on a given day. This does seem to be accurate. Even if it misses a few days (e.g., sometimes my phone registers me as being in the US when I am at a Canadian border city, and vice versa), that’s still accurate enough for me at this stage.
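For anyone wanting to do a similar closer inspection, one simple approach is to pull out every day that contains a suspicious country and look at what else was logged that day. This is only a sketch on made-up rows; the never_visited list and all values are illustrative.

```r
# Toy version of the day-by-coordinate table (values are made up).
loc_toy <- data.frame(
  date    = as.Date(c("2016-03-01", "2016-03-01", "2016-03-02", "2016-03-02")),
  country = c("Israel", "Thailand", "Israel", "NA"),
  n       = c(310, 1, 287, 2)
)

# Countries (or the "NA" string) that should not appear in the log.
never_visited <- c("Thailand", "Indonesia", "NA")

# Days containing a blip, shown alongside everything else logged that day.
blip_days <- unique(loc_toy$date[loc_toy$country %in% never_visited])
blips     <- subset(loc_toy, date %in% blip_days)
blips
```

Seeing the blip next to the plausible rows for the same day makes it easier to judge whether it affects the Canada count.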

Plotting (optional)

As much as I wish I’d get bonus points from the Canadian government for being able to provide a dataviz of my time in the country, this is sadly just not how official governmental procedures work, apparently. Practically speaking, then, plotting my maps is just a little additional dopamine rush for me. I did want to be able to visually confirm that my coordinates made sense while getting this set up, though, so I’ve included some basic code for how to do that here. Be it known that there are many other blogs that have much more interesting dataviz tutorials that harness ggmap and Google Map timelines, however.

Since I was mostly interested in my presence in Canada in the last five years, during which time I’ve mostly been in Ontario, we’ll plot just that region for now. The get_map() function in ggmap allows you to download maps of various locations. Deciding the appropriate “zoom” level takes a bit of trial and error and depends on your goals. In this case, 5 works well for seeing the province of Ontario and some surrounding area.

The following maps clearly show the difference between different degrees of coordinate rounding precision.

on <- ggmap::get_map("Ontario", zoom=5)

library(ggplot2)
# Get map for southwestern Ontario
swon <- get_map("Toronto", zoom=5)

ggmap(swon) + 
  geom_point(data = loc, 
             aes(x = lon, y = lat), 
             alpha = 0.5, color = "red")

Here’s the same map but with coordinates rounded to 2 decimal places instead, for a resolution of 1.11 km (i.e., enough to unambiguously distinguish towns/villages). Big difference!

Guess now I should finally get to the actual application…


  1. Rebecca Henderson gave a workshop last year for our R-Ladies group on ggmap and plotting local bike collision data in R. Her presentation and materials were super helpful. See her materials here.

  2. Technically if you have been a Canadian PR for less than five years but were living in Canada, you are also allowed to count those days as well. Days spent in Canada before you became a PR within the 5 year time frame count as 0.5 days apiece, up to 365 days (Canadian Immigration). In my case, I’m not worrying too much about these days at this point since I have enough days since becoming a PR.
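  The half-day rule in this note can be written as a small calculation. This is a sketch of my reading of the rule (half credit per pre-PR day, with the credit capped at 365), using a hypothetical helper credited_days(); apart from 1101, the day counts are made up.

```r
# Credited days = full days as a PR + half credit for pre-PR days,
# with the pre-PR credit capped at 365 (an assumption about how the cap applies).
credited_days <- function(pr_days, pre_pr_days, cap = 365) {
  pre_pr_credit <- min(0.5 * pre_pr_days, cap)
  pr_days + pre_pr_credit
}

credited_days(pr_days = 1101, pre_pr_days = 0)     # 1101
credited_days(pr_days = 1101, pre_pr_days = 100)   # 1151
credited_days(pr_days = 1000, pre_pr_days = 1000)  # 1365 (credit capped at 365)
```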
