BY-COVID - WP5 - Baseline Use Case: SARS-CoV-2 vaccine effectiveness assessment

Validation

Compliance with the Common Data Model specification

We check whether the imported dataset complies with the data model specification (https://doi.org/10.5281/zenodo.6913045).

To comply with the data model, the dataset must pass a set of validation rules. Each record is tested against every rule, and the results are summarized per rule: the number of records evaluated, the number that pass or fail, and the number for which the rule evaluates to NA.

Rule name  Items  Passes  Fails  Percentage of fails  Number of NAs  Percentage of NAs  Error  Warning
V01        10000  10000   0      0%                   0              0%                 FALSE  FALSE
V02        10000  10000   0      0%                   0              0%                 FALSE  FALSE
V03        10000  10000   0      0%                   0              0%                 FALSE  FALSE
V04        10000  10000   0      0%                   0              0%                 FALSE  FALSE
V05        10000  10000   0      0%                   0              0%                 FALSE  FALSE
V06        10000  10000   0      0%                   0              0%                 FALSE  FALSE
V07        10000  10000   0      0%                   0              0%                 FALSE  FALSE
V08        10000  10000   0      0%                   0              0%                 FALSE  FALSE
V09        10000  9269    731    7.31%                0              0%                 FALSE  FALSE
V10        10000  10000   0      0%                   0              0%                 FALSE  FALSE
V11        10000  10000   0      0%                   0              0%                 FALSE  FALSE
V12        10000  10000   0      0%                   0              0%                 FALSE  FALSE
V13        10000  10000   0      0%                   0              0%                 FALSE  FALSE
V14        10000  10000   0      0%                   0              0%                 FALSE  FALSE
V15        10000  10000   0      0%                   0              0%                 FALSE  FALSE
V16        10000  10000   0      0%                   0              0%                 FALSE  FALSE
V17        10000  9827    173    1.73%                0              0%                 FALSE  FALSE
V18        10000  10000   0      0%                   0              0%                 FALSE  FALSE
V19        10000  10000   0      0%                   0              0%                 FALSE  FALSE
V20        10000  10000   0      0%                   0              0%                 FALSE  FALSE
V21        10000  10000   0      0%                   0              0%                 FALSE  FALSE

Rule name  Validation rule
V01        is.na(age_nm) | age_nm - 5 >= -1e-08 & age_nm - 115 <= 1e-08
V02        is.na(sex_cd) | sex_cd %vin% c("0", "1", "2", "9")
V03        is.na(dose_1_brand_cd) | dose_1_brand_cd %vin% c("BP", "MD", "JJ", "AZ", "NV")
V04        is.na(dose_2_brand_cd) | dose_2_brand_cd %vin% c("BP", "MD", "JJ", "AZ", "NV")
V05        is.na(doses_nm) | doses_nm - 0 >= -1e-08 & doses_nm - 10 <= 1e-08
V06        fully_vaccinated_bl == FALSE | fully_vaccinated_bl == TRUE & !is.na(vaccination_schedule_cd)
V07        is.na(test_type_cd) | test_type_cd %vin% c("PCR", "AG", "other")
V08        is.na(variant_cd) | variant_cd %vin% c("alpha", "beta", "gamma", "delta", "omicron", "epsilon", "zeta", "eta", "theta", "iota", "kappa", "lambda", "mu")
V09        is.na(pregnancy_bl) | pregnancy_bl == FALSE | (pregnancy_bl == TRUE & sex_cd == "2" & age_nm - 12 >= -1e-08 & age_nm - 55 <= 1e-08)
V10        is.na(essential_worker_bl) | essential_worker_bl == FALSE | (essential_worker_bl == TRUE & age_nm - 16 >= -1e-08 & age_nm - 70 <= 1e-08)
V11        (is.na(dose_1_dt) & is.na(dose_2_dt)) | is.na(dose_2_dt) | !is.na(dose_1_dt) & !is.na(dose_2_dt) & (dose_1_dt < dose_2_dt)
V12        (is.na(dose_2_dt) & is.na(dose_3_dt)) | is.na(dose_3_dt) | !is.na(dose_2_dt) & !is.na(dose_3_dt) & (dose_2_dt < dose_3_dt)
V13        is.na(previous_infection_dt) | is.na(confirmed_case_dt) | !is.na(previous_infection_dt) & !is.na(confirmed_case_dt) & (previous_infection_dt < confirmed_case_dt)
V14        is.na(confirmed_case_dt) | is.na(exitus_dt) | !is.na(confirmed_case_dt) & !is.na(exitus_dt) & (confirmed_case_dt <= exitus_dt)
V15        is.na(previous_infection_dt) | is.na(exitus_dt) | !is.na(previous_infection_dt) & !is.na(exitus_dt) & (previous_infection_dt <= exitus_dt)
V16        is.na(fully_vaccinated_dt) | is.na(exitus_dt) | !is.na(fully_vaccinated_dt) & !is.na(exitus_dt) & fully_vaccinated_dt <= exitus_dt
V17        (!is.na(dose_1_dt) & !is.na(dose_2_dt) & !is.na(dose_3_dt) & doses_nm - 3 >= -1e-08) | (!is.na(dose_1_dt) & !is.na(dose_2_dt) & is.na(dose_3_dt) & abs(doses_nm - 2) <= 1e-08) | (!is.na(dose_1_dt) & is.na(dose_2_dt) & is.na(dose_3_dt) & abs(doses_nm - 1) <= 1e-08) | (is.na(dose_1_dt) & is.na(dose_2_dt) & is.na(dose_3_dt) & abs(doses_nm - 0) <= 1e-08)
V18        is.na(dose_1_dt) | (!is.na(dose_1_dt) & !is.na(dose_1_brand_cd))
V19        is.na(dose_2_dt) | (!is.na(dose_2_dt) & !is.na(dose_2_brand_cd) & !is.na(dose_1_dt) & !is.na(dose_1_brand_cd))
V20        is.na(dose_3_dt) | (!is.na(dose_3_dt) & !is.na(dose_3_brand_cd) & !is.na(dose_2_dt) & !is.na(dose_2_brand_cd) & !is.na(dose_1_dt) & !is.na(dose_1_brand_cd))
V21        (dose_1_brand_cd == "JJ" & !is.na(dose_1_dt) & !is.na(fully_vaccinated_dt) & fully_vaccinated_bl == TRUE) | (dose_1_brand_cd != "JJ" & !is.na(dose_2_dt) & !is.na(fully_vaccinated_dt) & fully_vaccinated_bl == TRUE) | (is.na(dose_1_brand_cd) & is.na(fully_vaccinated_dt) & fully_vaccinated_bl == FALSE) | (dose_1_brand_cd != "JJ" & is.na(dose_2_dt) & is.na(fully_vaccinated_dt) & fully_vaccinated_bl == FALSE)
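
To illustrate how a rule in this table turns into pass, fail and NA counts, the following is a minimal base-R sketch that evaluates rules V01 and V09 on four made-up records. The data frame `toy`, the helper `summarise_rule` and the constant `tol` are hypothetical names introduced for the example only; the values are not taken from the report's dataset, and this is not necessarily how the report itself computes its summary.

```r
# Toy records (hypothetical values, not taken from the report's dataset)
toy <- data.frame(
  age_nm       = c(34, 60, 3, NA),
  sex_cd       = c("2", "1", "2", "2"),
  pregnancy_bl = c(TRUE, TRUE, FALSE, NA)
)

tol <- 1e-08  # numeric tolerance that appears in the rule expressions

# V01: age is missing, or lies within [5, 115] up to the tolerance
v01 <- with(toy, is.na(age_nm) | (age_nm - 5 >= -tol & age_nm - 115 <= tol))

# V09: pregnancy flag is missing or FALSE, or TRUE only for sex_cd "2" aged 12 to 55
v09 <- with(toy, is.na(pregnancy_bl) | pregnancy_bl == FALSE |
  (pregnancy_bl == TRUE & sex_cd == "2" &
     age_nm - 12 >= -tol & age_nm - 55 <= tol))

# Count records that pass, fail, or cannot be evaluated (NA) for one rule
summarise_rule <- function(name, result) {
  data.frame(
    name   = name,
    items  = length(result),
    passes = sum(result, na.rm = TRUE),
    fails  = sum(!result, na.rm = TRUE),
    nNA    = sum(is.na(result))
  )
}

rule_summary <- rbind(summarise_rule("V01", v01), summarise_rule("V09", v09))
print(rule_summary)
```

Because every rule starts with an `is.na(...) |` clause, a missing value counts as a pass rather than a fail or an NA: in the toy data the record with a missing age passes V01, while the three-year-old fails it. In R, `&` binds more tightly than `|`, so the unparenthesised form of V01 in the table is evaluated the same way as the parenthesised form used here.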

The vertical bars in the validation plot indicate the percentage of records ‘Passing’, ‘Failing’ and ‘Missing’.

Non-compliance with the Common Data Model specification

Every rule in this set is considered ‘essential’: a record that violates any of them is not considered for the subsequent analysis. A logical variable flag_violating_val is created in the cohort_data table of the BY-COVID-WP5-BaselineUseCase-VE.duckdb database. It is set to TRUE when a record violates at least one rule in the pre-specified set and to FALSE otherwise.
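
The "at least one rule violated" logic can be sketched in base R as follows. The matrix `rule_results` and its four records are hypothetical. Treating an NA outcome as "not violated" (via `na.rm = TRUE`) is an assumption of this sketch, not something the report states.

```r
# Hypothetical rule outcomes for four records (TRUE = pass, FALSE = fail)
rule_results <- cbind(
  V09 = c(TRUE, FALSE, TRUE, FALSE),
  V17 = c(TRUE, TRUE, FALSE, FALSE)
)

# A record is flagged when at least one rule evaluates to FALSE;
# NA outcomes are ignored here (assumption of this sketch)
flag_violating_val <- rowSums(!rule_results, na.rm = TRUE) > 0

table(flag_violating_val)
```

A record that fails several rules is still flagged only once, so the number of flagged records can be smaller than the sum of the per-rule failure counts.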

flag_violating_val==TRUE  flag_violating_val==FALSE
890                       9110
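
As a consistency check, these counts can be related to the per-rule failures reported above. The sketch below assumes that the flag is derived from exactly the listed rules V01 to V21, of which only V09 and V17 report failures. Under that assumption, inclusion-exclusion gives the number of records that fail both rules. The variable names are illustrative.

```r
# Counts copied from the validation summary and the flag table
items     <- 10000
fails_v09 <- 731
fails_v17 <- 173
flagged   <- 890

# Inclusion-exclusion: |V09 or V17| = |V09| + |V17| - |V09 and V17|
fails_both <- fails_v09 + fails_v17 - flagged

# Share of records excluded from the subsequent analysis
pct_flagged <- 100 * flagged / items

cat("Records failing both V09 and V17:", fails_both, "\n")
cat("Percentage of records flagged:", pct_flagged, "\n")
```

Under the stated assumption, 14 records violate both rules, and 8.9% of the 10000 records are flagged.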