District zone IDs in municipal overnight stays data
Status: ⚠️ active
Importance: 2 - medium
Note: According to the official methodology this is by design. However, it still needs to be addressed in workflows within the {spanishoddata} R package.
Summary: The zone IDs in the municipal overnight stays data are not consistent with the zone IDs in the municipal boundaries data. The zone IDs in the residential location column zona_residencia of the municipal overnight stays data are a combination of district and municipal IDs.
Expected Results: The zone IDs in the municipal overnight stays data should be consistent with the zone IDs in the municipal boundaries data (unless the use of district IDs in the residential location column zona_residencia is intentional).
Steps to Reproduce
- Load Data
Load libraries and define data files.
library(tidyverse)
library(sf)
library(here)
library(DT)
overnight_stays_file <- here("data/raw_data/v2/estudios_basicos/por-municipios/pernoctaciones/ficheros-diarios/2022-01/20220101_Pernoctaciones_municipios.csv.gz")
municipal_boundaries_data_file <- here("data/raw_data/v2/zonificacion/zonificacion_municipios/zonificacion_municipios.shp")
district_boundaries_data_file <- here("data/raw_data/v2/zonificacion/zonificacion_distritos/zonificacion_distritos.shp")
Load the data.
overnight_stays <- readr::read_delim(overnight_stays_file, delim = "|", show_col_types = FALSE, name_repair = "unique_quiet")
# municipal_boundaries <- read_sf(municipal_boundaries_data_file) |> filter(!grepl("PT|FR|externo", ID))
# district_boundaries <- read_sf(district_boundaries_data_file) |> filter(!grepl("PT|FR|externo", ID))
municipal_boundaries <- read_sf(municipal_boundaries_data_file)
district_boundaries <- read_sf(district_boundaries_data_file)
glimpse(overnight_stays)
Rows: 515,748
Columns: 4
$ fecha <dbl> 20220101, 20220101, 20220101, 20220101, 20220101, 20…
$ zona_residencia <chr> "01001", "01001", "01001", "01001", "01001", "01001"…
$ zona_pernoctacion <chr> "01001", "01017_AM", "01047_AM", "01051", "01059", "…
$ personas <dbl> 2447.613, 9.000, 2.514, 5.780, 181.970, 3.266, 3.266…
glimpse(municipal_boundaries)
Rows: 2,735
Columns: 2
$ ID <chr> "01001", "01002", "01004_AM", "01009_AM", "01010", "01017_AM"…
$ geometry <MULTIPOLYGON [m]> MULTIPOLYGON (((537856.7 47..., MULTIPOLYGON (((…
glimpse(district_boundaries)
Rows: 3,909
Columns: 2
$ ID <chr> "01001", "01002", "01004_AM", "01009_AM", "01010", "01017_AM"…
$ geometry <MULTIPOLYGON [m]> MULTIPOLYGON (((538090.2 47..., MULTIPOLYGON (((…
Results
- Not all residence location IDs in the municipal level overnight stays dataset can be found in the municipal boundaries dataset.
Not all residence location (zona_residencia) IDs in the municipal level overnight stays dataset can be found in the municipal boundaries dataset. Residence locations in the municipal level overnight stays dataset use district IDs in addition to municipal IDs. That is, the municipal boundaries dataset alone is not enough to match all the residence locations in the overnight stays dataset.
sum(!unique(overnight_stays$zona_residencia) %in% unique(municipal_boundaries$ID))
[1] 1596
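The count above says how many IDs are unmatched, but not which ones. A minimal sketch with made-up vectors shows how base R's setdiff() lists them and that its length agrees with the sum(!... %in% ...) check. The IDs below, including the seven-character district-style codes, are illustrative assumptions rather than values taken from the data.

```r
# Toy stand-ins for overnight_stays$zona_residencia and municipal_boundaries$ID.
# All IDs here are hypothetical and only illustrate the mechanics.
residence_ids <- c("01001", "01002", "0105901", "0105902", "01001")
municipal_ids <- c("01001", "01002", "01059")

# Same check as in the report: number of unique residence IDs without a match
n_unmatched <- sum(!unique(residence_ids) %in% unique(municipal_ids))

# setdiff() returns the unmatched IDs themselves, de-duplicated
unmatched_ids <- setdiff(residence_ids, municipal_ids)

print(n_unmatched)
print(unmatched_ids)
```

On the real data, the same call would be setdiff(overnight_stays$zona_residencia, municipal_boundaries$ID).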
Meanwhile, all residence location (zona_residencia) IDs in the municipal level overnight stays dataset can be found in the district boundaries dataset.
sum(!unique(overnight_stays$zona_residencia) %in% unique(district_boundaries$ID))
[1] 0
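One practical consequence is that a workflow may need two lookups: the district table for zona_residencia and the municipal table for zona_pernoctacion. The following is only a sketch of that idea on plain data frames, not the approach used in {spanishoddata}. The toy IDs and the district_name and municipality_name columns are hypothetical placeholders for whatever attributes or geometry the boundary files carry.

```r
# Hypothetical miniature versions of the three datasets
overnight_toy <- data.frame(
  zona_residencia   = c("01001", "0105901", "0105902"),
  zona_pernoctacion = c("01001", "01059", "01001"),
  personas          = c(10, 2.5, 4),
  stringsAsFactors  = FALSE
)
district_toy <- data.frame(
  ID = c("01001", "0105901", "0105902"),
  district_name = c("zone a", "zone b", "zone c"),
  stringsAsFactors = FALSE
)
municipal_toy <- data.frame(
  ID = c("01001", "01059"),
  municipality_name = c("town a", "town b"),
  stringsAsFactors = FALSE
)

# Residence side is matched against the district table ...
with_residence <- merge(overnight_toy, district_toy,
                        by.x = "zona_residencia", by.y = "ID", all.x = TRUE)
# ... and the overnight stay side against the municipal table
with_both <- merge(with_residence, municipal_toy,
                   by.x = "zona_pernoctacion", by.y = "ID", all.x = TRUE)

# Rows that failed either lookup would show up as NA here
unmatched_rows <- sum(is.na(with_both$district_name) |
                        is.na(with_both$municipality_name))
print(unmatched_rows)
```

With all.x = TRUE, every overnight stays row is kept, so any ID missing from a lookup table surfaces as NA instead of silently dropping the row.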
Only municipal IDs are used in the zona_pernoctacion column of the overnight stays dataset.
sum(!unique(overnight_stays$zona_pernoctacion) %in% unique(municipal_boundaries$ID))
[1] 0
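The three checks above can be folded into one helper that labels each ID by the boundary set in which it is found. This is a sketch under stated assumptions: classify_zone_ids() is a hypothetical function, not part of {spanishoddata}, and the IDs are invented for illustration.

```r
# Hypothetical helper: label each unique ID by where it can be matched
classify_zone_ids <- function(ids, municipal_ids, district_ids) {
  ids <- unique(ids)
  level <- ifelse(
    ids %in% municipal_ids, "municipal",
    ifelse(ids %in% district_ids, "district_only", "unknown")
  )
  data.frame(id = ids, level = level, stringsAsFactors = FALSE)
}

# Toy ID sets (assumed values, not taken from the real files)
municipal_ids <- c("01001", "01059")
district_ids  <- c("01001", "0105901", "0105902")

residence_levels <- classify_zone_ids(
  c("01001", "0105901", "0105902", "01001"), municipal_ids, district_ids
)
stay_levels <- classify_zone_ids(c("01001", "01059"), municipal_ids, district_ids)

print(table(residence_levels$level))
print(table(stay_levels$level))
```

Applied to the real columns, any "unknown" label would flag an ID that appears in neither boundary file, which the checks above indicate does not occur in this daily file.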
- Sample of rows in the municipal overnight stays data whose residence location IDs come from the district breakdown
DT::datatable(overnight_stays |> filter(!zona_residencia %in% unique(municipal_boundaries$ID)) |> sample_n(100))
Links to the original files
source(here("R/901-download-helpers.R"))
files <- load_latest_v2_xml()
# Filter relevant files
relevant_files <- files |>
  filter(basename(local_path) %in% basename(c(
    overnight_stays_file,
    municipal_boundaries_data_file,
    district_boundaries_data_file
  )))
# Create HTML links
relevant_files <- relevant_files |>
  mutate(target_url = paste0("<a href='", target_url, "' target='_blank'>", target_url, "</a>"))
# Render the DT table with links
datatable(relevant_files, escape = FALSE, options = list(pageLength = 5))