To assess the integrity of the panel data, we measure the attrition that characterizes it. That is, for each survey round, we compute the percentage of households identified in the previous round that are no longer present in the subsequent round. Some adjustment is needed because the household identification numbering system changed between 1995 and 1996. We also have to remove the 2005 survey in Marovoay from the analysis, as it was a specific tracking survey aimed at identifying individuals from households that could not be re-interviewed in previous years in one of the observatory sites (Vaillant 2013). This produces the following result:
library(tidyverse) # A series of packages for data manipulation
library(haven)     # Required for reading STATA files (.dta)
library(labelled)  # To work with labelled data from STATA
library(readxl)    # Read Excel files

# Obtain all years from the directory structure
ros_data_loc <- "data/dta_format/"
years <- list.dirs(ros_data_loc, recursive = FALSE, full.names = FALSE)

# Add observatory approximate location
locations <- tibble(
  code = c(1, 2, 3, 4, 12, 13, 15, 16, 21, 22, 23, 24, 31, 25, 41, 42, 43, 51,
           44, 45, 61, 17, 18, 19, 71, 52),
  name = c("Antalaha", "Antsirabe", "Marovoay", "Toliara coastal", "Antsohihy",
           "Tsiroanomandidy", "Farafangana", "Ambovombe", "Alaotra",
           "Manjakandriana", "Toliara North", "Fenerive East", "Bekily",
           "Mahanoro", "Itasy", "Menabe-Belo", "Fianarantsoa", "Tsivory",
           "Morondava", "Manandriana", "Tanandava", "Ihosy", "Ambohimahasoa",
           "Manakara", "Tolanaro", "Menabe North-East"),
  latitude = c(-14.8833, -19.8659, -16.1000, -23.7574, -14.8796, -18.7713,
               -22.8167, -25.1667, -17.8319, -18.9167, -23.2941, -17.3500,
               -24.6900, -19.9000, -19.1686, -19.6975, -21.4527, -24.4667,
               -20.2833, -20.2333, -22.5711, -22.4000, -20.7145, -22.1333,
               -25.0381, -20.5486),
  longitude = c(50.2833, 47.0333, 46.6333, 43.6770, 47.9875, 46.0546, 47.8333,
                46.0833, 48.4167, 47.8000, 43.7761, 49.4167, 45.1700, 48.8000,
                46.7354, 44.5419, 47.0857, 45.4667, 44.2833, 47.3833, 45.0439,
                46.1167, 47.0389, 48.0167, 46.9562, 47.1597)
)

# Sort locations by latitude (north to south) to generate sequence numbers
locations <- locations %>%
  arrange(desc(latitude)) %>%
  mutate(seq_num = 1:n())

# Function to read one year's household file and keep the observatory (j0)
# and household (j5) identifiers
read_and_process <- function(year) {
  file_path <- file.path(ros_data_loc, as.character(year), "res_deb.dta")
  data <- read_dta(file_path) %>%
    select(j0, j5) %>%
    mutate(year = year)
  return(data)
}

# Use map to read and process files, then combine with bind_rows
consolidated_data <- map_dfr(years, read_and_process) %>%
  mutate(year = as.numeric(year))

# NB: j5 codes were modified in 1996, so we need to replace the ones from 1995
hh_96 <- read_dta(paste0(ros_data_loc, "1996/res_deb.dta")) %>%
  select(j0, year, j5_96 = j5, j_1995, j12b) %>%
  filter(j_1995 == 1) %>%
  select(-j_1995) %>%
  mutate(year = 1995) %>%
  distinct(j12b, .keep_all = TRUE)

# The test inside ifelse() was lost in the source; !is.na(j5_96) is assumed,
# i.e. the 1996 code is used wherever a match exists
consolidated_data <- consolidated_data %>%
  left_join(hh_96, by = c("j0", "year", "j5" = "j12b")) %>%
  mutate(j5 = ifelse(year == 1995 & !is.na(j5_96), j5_96, j5)) %>%
  select(j0, j5, year)

# We also need to discard the 2005 survey in Marovoay, which is very particular
# cf. Vaillant 2013.
consolidated_data <- consolidated_data %>%
  filter(!(j0 == 3 & year == 2005))

# Remove duplicates and create the hh_all table
hh_all <- consolidated_data %>%
  distinct(j0, j5, year, .keep_all = TRUE) %>%
  arrange(j0, j5)

hh_grouped <- hh_all %>%
  group_by(j0, year) %>%
  summarise(j5_list = list(j5), .groups = "drop") %>%
  # Count the number of j5 in j5_list
  mutate(j5_count = map_int(j5_list, length)) %>%
  # Identify the most recent previous year with data for the same observatory
  group_by(j0) %>%
  mutate(previous_year = lag(year)) %>%
  ungroup()

# Self-join to create previous_year_j5_list
attrition_rates_detail <- hh_grouped %>%
  left_join(
    hh_grouped %>%
      select(j0, year,
             previous_year_j5_list = j5_list,
             j5_count_previous_year = j5_count),
    by = c("j0", "previous_year" = "year")
  ) %>%
  mutate(
    # Households present in both the current and the previous round
    repeated_j5 = map_int(
      seq_along(j5_list),
      ~ length(intersect(j5_list[[.]], previous_year_j5_list[[.]]))
    ),
    attrition_rate = (j5_count_previous_year - repeated_j5) /
      j5_count_previous_year * 100
  )

# Attach observatory names, numbered and ordered by latitude
attrition_rates <- attrition_rates_detail %>%
  select(j0, year, attrition_rate) %>%
  left_join(
    locations %>%
      mutate(observatory_with_num = paste0(seq_num, ". ", name),
             observatory_with_num = fct_reorder(observatory_with_num, latitude)) %>%
      select(code, name, observatory_with_num),
    by = c("j0" = "code")
  ) %>%
  drop_na(name)

# Average annual attrition, excluding outliers (rates of 75% or more)
average_wo_outliers <- attrition_rates %>%
  filter(attrition_rate < 75) %>%
  summarise(mean = mean(attrition_rate))
average_wo_outliers <- round(average_wo_outliers$mean, 1)

# Compound attrition over 10 years at the average annual rate
compound_avg_10y <- round((1 - (1 - (average_wo_outliers / 100))^10) * 100)

# The test inside ifelse() was lost in the source; is.na(attrition_rate) is
# assumed, so that tiles without a rate carry no label
attrition_plot <- ggplot(attrition_rates,
                         aes(x = year, y = observatory_with_num,
                             fill = attrition_rate)) +
  geom_tile() + # Create the heatmap tiles
  geom_text(aes(label = ifelse(is.na(attrition_rate), "", round(attrition_rate))),
            color = "black", size = 2.5) +
  scale_fill_gradient2(low = "darkgreen", mid = "yellow", high = "red",
                       midpoint = 30, na.value = "grey",
                       name = "Attrition Rate (%)") +
  theme_minimal() +
  labs(y = NULL, x = NULL) +
  theme(axis.text.y = element_text(size = 8))

ggsave("output/figure_3.pdf", plot = attrition_plot, width = 7, height = 4, dpi = 300)
ggsave("output/figure_3.png", plot = attrition_plot, width = 7, height = 4, dpi = 300)
attrition_plot
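The core of the computation is a set intersection between the household identifiers of two consecutive rounds. The sketch below illustrates it on made-up identifiers; the vectors `round_prev` and `round_next` and the helper `attrition_pct()` are hypothetical and not part of the ROS data or the code above.

```r
# Hypothetical helper: share (in %) of previous-round households
# that are not found again in the next round
attrition_pct <- function(prev_ids, next_ids) {
  retained <- intersect(prev_ids, next_ids)
  (length(prev_ids) - length(retained)) / length(prev_ids) * 100
}

# Made-up household identifiers for one observatory in two consecutive rounds
round_prev <- c(101, 102, 103, 104, 105)
round_next <- c(101, 103, 105, 106)

# Households 102 and 104 drop out; the new household 106 does not affect the rate
attrition_rate <- attrition_pct(round_prev, round_next)
attrition_rate
```

Note that households entering the panel in the later round leave the rate unchanged, since the denominator is the previous round only. This is also why a renumbering of identifiers shows up as near-total attrition: almost no previous-round code is found again.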
Annual attrition rates above 75% for a specific observatory are likely caused by further reshuffles of the household identification codes, an issue we hope to resolve later on. If we discard these outliers (attrition rates over 75%), we have an average annual attrition rate of {r} print(average_wo_outliers)%, which is very high, leading to a compound attrition rate of {r} print(compound_avg_10y)% over 10 years. Attrition on ROS data has been further studied in focused publications (Gubert and Robilliard 2008; Vaillant 2013).
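The compound rate follows from applying the average annual rate repeatedly. As a sketch, assume that the annual attrition rate is constant and that households that leave the panel do not return. The symbols $a$ (annual attrition rate, as a fraction) and $n$ (number of years) are introduced here for illustration only.

```latex
\begin{align*}
  \text{share remaining after } n \text{ years} &= (1 - a)^{n} \\
  \text{compound attrition after } n \text{ years} &= 1 - (1 - a)^{n}
\end{align*}
```

For instance, an illustrative annual rate of $a = 0.10$ (not the figure estimated here) gives $1 - 0.9^{10} \approx 0.65$, so about 65% of the initial households would be lost after 10 years.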
Gubert, Flore, and Anne-Sophie Robilliard. 2008. “Risk and Schooling Decisions in Rural Madagascar: A Panel Data-Analysis.” Journal of African Economies 17 (2): 207–38.
Vaillant, Julia. 2013. “Attrition and Follow-Up Rules in Panel Surveys: Insights from a Tracking Experience in Madagascar.” Review of Income and Wealth 59 (3): 509–38.