Municipal population CSV missing population counts

tabular data
missing data
importance: low
Published

July 16, 2024

Modified

July 16, 2024


Status: ⚠️ active

Importance: 1 - low

Summary: Population counts are missing for 49 municipalities in the CSV file poblacion_municipios.csv in the zonificacion_municipios folder, as well as in the poblacion.csv file in the zonificacion folder.

Expected Results: The population data for all municipalities should be available in the CSV file poblacion_municipios.csv in the zonificacion_municipios folder and/or in the poblacion.csv file in the zonificacion folder.

Steps to Reproduce

  1. Load Data

Load libraries and define data files.

library(tidyverse)
library(sf)
library(here)
library(DT)


municipal_boundaries_data_file <- here("data/raw_data/v2/zonificacion/zonificacion_municipios/zonificacion_municipios.shp")
municipality_names_file <- here("data/raw_data/v2/zonificacion/zonificacion_municipios/nombres_municipios.csv")
municipality_population_file <- here("data/raw_data/v2/zonificacion/zonificacion_municipios/poblacion_municipios.csv")
all_population_file <- here("data/raw_data/v2/zonificacion/poblacion.csv")

Load the data, keep only the Spanish municipalities, and join three things to their boundaries: the municipality names, the population from poblacion_municipios.csv, and the per-municipality population totals computed from poblacion.csv.

municipality_boundaries <- read_sf(municipal_boundaries_data_file)
# Keep Spanish municipalities only: drop zones whose ID contains "FR", "PT" or "externo"
municipality_boundaries_spain_only <- municipality_boundaries |>
    filter(!grepl("FR|PT|externo", ID))

municipality_names <- read_delim(municipality_names_file,
    delim = "|", show_col_types = FALSE, name_repair = "unique_quiet")

municipality_population <- read_delim(municipality_population_file, col_names = c("ID", "population"),
    delim = "|", show_col_types = FALSE, name_repair = "unique_quiet")

all_population <- read_delim(all_population_file,
    delim = "|", show_col_types = FALSE, name_repair = "unique_quiet")

municipality_boundaries_spain_only <- municipality_boundaries_spain_only |>
    left_join(municipality_names |> select(ID, name), by = c("ID")) |> 
    left_join(municipality_population, by = c("ID")) |> 
    left_join(all_population |>
                  group_by(municipio) |>
                  summarise(population_all = sum(poblacion, na.rm = TRUE), .groups = "drop") |> 
                  rename(ID = municipio),
              by = c("ID"))
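
One detail of the join above matters for the checks in the Results section: `sum(..., na.rm = TRUE)` returns 0, not NA, when every value in a group is missing, whereas a municipality that has no rows in poblacion.csv at all ends up as NA after the left join. The sketch below illustrates both cases on a toy table; the IDs and values are made up for illustration and do not come from the dataset.

```r
library(dplyr)

# Hypothetical stand-in for poblacion.csv: "A" has counts, "B" has only
# missing counts, and "C" does not appear at all.
toy_population <- data.frame(
    municipio = c("A", "A", "B", "B"),
    poblacion = c(10, 20, NA, NA)
)
toy_boundaries <- data.frame(ID = c("A", "B", "C"))

# Same aggregation and join pattern as above, applied to the toy tables.
toy_joined <- toy_boundaries |>
    left_join(toy_population |>
                  group_by(municipio) |>
                  summarise(population_all = sum(poblacion, na.rm = TRUE), .groups = "drop") |>
                  rename(ID = municipio),
              by = "ID")

toy_joined
# "A" gets 30, "B" gets 0 (all counts missing), "C" gets NA (no rows to sum)
```

This is why a zero total, rather than NA, is the signal that poblacion.csv holds no usable counts for a municipality that is present in the file.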

Results

  1. Missing population (loaded from poblacion_municipios.csv in the zonificacion_municipios folder)
municipality_boundaries_spain_only |>
    filter(is.na(population)) |>
    nrow()
[1] 49
  2. Population data for municipalities is also unavailable in poblacion.csv in the zonificacion folder

There are no population counts for these municipalities in the poblacion.csv file either: their summed population is 0.

municipality_boundaries_spain_only |>
    filter(!is.na(population_all)) |>
    filter(population_all == 0) |> 
    nrow()
[1] 49
  3. Names of municipalities with missing population data
municipality_boundaries_spain_only |>
    st_drop_geometry() |>
    filter(is.na(population)) |>
    DT::datatable()
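
The two counts in the results are computed separately, so matching totals of 49 do not by themselves show that the same municipalities are affected in both files. Below is a minimal sketch of a cross-check, run on a made-up table that uses the same column names as `municipality_boundaries_spain_only`. The helper name `same_missing_ids` is hypothetical and the toy values are not taken from the dataset.

```r
library(dplyr)

# Hypothetical helper: TRUE when the IDs with NA in `population` are exactly
# the IDs with a zero total in `population_all`.
same_missing_ids <- function(df) {
    missing_ids <- df |> filter(is.na(population)) |> pull(ID)
    zero_ids <- df |> filter(!is.na(population_all), population_all == 0) |> pull(ID)
    base::setequal(missing_ids, zero_ids)
}

# Toy data for illustration only.
toy <- data.frame(
    ID = c("A", "B", "C", "D"),
    population = c(100, NA, NA, 250),
    population_all = c(100, 0, 0, 250)
)

same_missing_ids(toy)
```

On the real data, the same check could be run as `same_missing_ids(st_drop_geometry(municipality_boundaries_spain_only))`, assuming the joined table from the steps above is in the session.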