Compliance with the Common Data Model specification
We check whether the imported dataset complies with the data model specification (https://doi.org/10.5281/zenodo.6913045).
To comply with the data model, the dataset must pass a set of validation rules. The data are tested against these rules, and the results of the validation process are summarized below, one row per rule.
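The rules below appear to be written for the R validate package: the `%vin%` operator and the summary columns (items, passes, fails, NAs) match its output. The Python sketch here is only a language-neutral illustration of how such a confrontation could produce the per-rule counts; it is not the authors' implementation. Records are plain dicts, and each rule is a function returning True (pass), False (fail) or None (not evaluable). The rule functions `v02` and `v02_unguarded` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# A record maps column names to values; None stands for a missing value (NA).
Record = Dict[str, object]
# A rule returns True (pass), False (fail) or None (could not be evaluated).
Rule = Callable[[Record], Optional[bool]]


@dataclass
class RuleSummary:
    name: str
    items: int
    passes: int
    fails: int
    n_na: int

    @property
    def pct_fails(self) -> float:
        return 100.0 * self.fails / self.items if self.items else 0.0

    @property
    def pct_na(self) -> float:
        return 100.0 * self.n_na / self.items if self.items else 0.0


def confront(records: List[Record], rules: Dict[str, Rule]) -> List[RuleSummary]:
    """Evaluate every rule on every record and tally passes, fails and NAs."""
    summaries = []
    for name, rule in rules.items():
        results = [rule(r) for r in records]
        summaries.append(RuleSummary(
            name=name,
            items=len(results),
            passes=sum(1 for x in results if x is True),
            fails=sum(1 for x in results if x is False),
            n_na=sum(1 for x in results if x is None),
        ))
    return summaries


ALLOWED_SEX = {"0", "1", "2", "9"}


def v02(r: Record) -> Optional[bool]:
    # Hypothetical transcription of V02: is.na(sex_cd) | sex_cd %vin% c("0", "1", "2", "9")
    return r.get("sex_cd") is None or r["sex_cd"] in ALLOWED_SEX


def v02_unguarded(r: Record) -> Optional[bool]:
    # Hypothetical variant without the is.na() guard: a missing code cannot be evaluated.
    if r.get("sex_cd") is None:
        return None
    return r["sex_cd"] in ALLOWED_SEX


records = [{"sex_cd": "1"}, {"sex_cd": "2"}, {"sex_cd": None}, {"sex_cd": "X"}]
for s in confront(records, {"V02": v02, "V02_unguarded": v02_unguarded}):
    print(s.name, s.items, s.passes, s.fails, s.n_na, f"{s.pct_fails:.2f}%")
# V02 4 3 1 0 25.00%
# V02_unguarded 4 2 1 1 25.00%
```

Most rules use the `is.na(...) |` guard. With that guard, a missing value counts as a pass rather than as an NA, which is why the two variants differ only in their Passes and NA counts.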
Validation rule | Rule name | Items | Passes | Fails | Percentage of fails | Number of NAs | Percentage of NAs | Error | Warning |
---|---|---|---|---|---|---|---|---|---|
`is.na(age_nm) \| age_nm - 5 >= -1e-08 & age_nm - 115 <= 1e-08` | V01 | 10000 | 10000 | 0 | 0% | 0 | 0% |  |  |
`is.na(sex_cd) \| sex_cd %vin% c("0", "1", "2", "9")` | V02 | 10000 | 10000 | 0 | 0% | 0 | 0% |  |  |
`is.na(dose_1_brand_cd) \| dose_1_brand_cd %vin% c("BP", "MD", "JJ", "AZ", "NV")` | V03 | 10000 | 10000 | 0 | 0% | 0 | 0% |  |  |
`is.na(dose_2_brand_cd) \| dose_2_brand_cd %vin% c("BP", "MD", "JJ", "AZ", "NV")` | V04 | 10000 | 10000 | 0 | 0% | 0 | 0% |  |  |
`is.na(doses_nm) \| doses_nm - 0 >= -1e-08 & doses_nm - 10 <= 1e-08` | V05 | 10000 | 10000 | 0 | 0% | 0 | 0% |  |  |
`fully_vaccinated_bl == FALSE \| fully_vaccinated_bl == TRUE & !is.na(vaccination_schedule_cd)` | V06 | 10000 | 10000 | 0 | 0% | 0 | 0% |  |  |
`is.na(test_type_cd) \| test_type_cd %vin% c("PCR", "AG", "other")` | V07 | 10000 | 10000 | 0 | 0% | 0 | 0% |  |  |
`is.na(variant_cd) \| variant_cd %vin% c("alpha", "beta", "gamma", "delta", "omicron", "epsilon", "zeta", "eta", "theta", "iota", "kappa", "lambda", "mu")` | V08 | 10000 | 10000 | 0 | 0% | 0 | 0% |  |  |
`is.na(pregnancy_bl) \| pregnancy_bl == FALSE \| (pregnancy_bl == TRUE & sex_cd == "2" & age_nm - 12 >= -1e-08 & age_nm - 55 <= 1e-08)` | V09 | 10000 | 9269 | 731 | 7.31% | 0 | 0% |  |  |
`is.na(essential_worker_bl) \| essential_worker_bl == FALSE \| (essential_worker_bl == TRUE & age_nm - 16 >= -1e-08 & age_nm - 70 <= 1e-08)` | V10 | 10000 | 10000 | 0 | 0% | 0 | 0% |  |  |
`(is.na(dose_1_dt) & is.na(dose_2_dt)) \| is.na(dose_2_dt) \| !is.na(dose_1_dt) & !is.na(dose_2_dt) & (dose_1_dt < dose_2_dt)` | V11 | 10000 | 10000 | 0 | 0% | 0 | 0% |  |  |
`(is.na(dose_2_dt) & is.na(dose_3_dt)) \| is.na(dose_3_dt) \| !is.na(dose_2_dt) & !is.na(dose_3_dt) & (dose_2_dt < dose_3_dt)` | V12 | 10000 | 10000 | 0 | 0% | 0 | 0% |  |  |
`is.na(previous_infection_dt) \| is.na(confirmed_case_dt) \| !is.na(previous_infection_dt) & !is.na(confirmed_case_dt) & (previous_infection_dt < confirmed_case_dt)` | V13 | 10000 | 10000 | 0 | 0% | 0 | 0% |  |  |
`is.na(confirmed_case_dt) \| is.na(exitus_dt) \| !is.na(confirmed_case_dt) & !is.na(exitus_dt) & (confirmed_case_dt <= exitus_dt)` | V14 | 10000 | 10000 | 0 | 0% | 0 | 0% |  |  |
`is.na(previous_infection_dt) \| is.na(exitus_dt) \| !is.na(previous_infection_dt) & !is.na(exitus_dt) & (previous_infection_dt <= exitus_dt)` | V15 | 10000 | 10000 | 0 | 0% | 0 | 0% |  |  |
`is.na(fully_vaccinated_dt) \| is.na(exitus_dt) \| !is.na(fully_vaccinated_dt) & !is.na(exitus_dt) & fully_vaccinated_dt <= exitus_dt` | V16 | 10000 | 10000 | 0 | 0% | 0 | 0% |  |  |
`(!is.na(dose_1_dt) & !is.na(dose_2_dt) & !is.na(dose_3_dt) & doses_nm - 3 >= -1e-08) \| (!is.na(dose_1_dt) & !is.na(dose_2_dt) & is.na(dose_3_dt) & abs(doses_nm - 2) <= 1e-08) \| (!is.na(dose_1_dt) & is.na(dose_2_dt) & is.na(dose_3_dt) & abs(doses_nm - 1) <= 1e-08) \| (is.na(dose_1_dt) & is.na(dose_2_dt) & is.na(dose_3_dt) & abs(doses_nm - 0) <= 1e-08)` | V17 | 10000 | 9827 | 173 | 1.73% | 0 | 0% |  |  |
`is.na(dose_1_dt) \| (!is.na(dose_1_dt) & !is.na(dose_1_brand_cd))` | V18 | 10000 | 10000 | 0 | 0% | 0 | 0% |  |  |
`is.na(dose_2_dt) \| (!is.na(dose_2_dt) & !is.na(dose_2_brand_cd) & !is.na(dose_1_dt) & !is.na(dose_1_brand_cd))` | V19 | 10000 | 10000 | 0 | 0% | 0 | 0% |  |  |
`is.na(dose_3_dt) \| (!is.na(dose_3_dt) & !is.na(dose_3_brand_cd) & !is.na(dose_2_dt) & !is.na(dose_2_brand_cd) & !is.na(dose_1_dt) & !is.na(dose_1_brand_cd))` | V20 | 10000 | 10000 | 0 | 0% | 0 | 0% |  |  |
`(dose_1_brand_cd == "JJ" & !is.na(dose_1_dt) & !is.na(fully_vaccinated_dt) & fully_vaccinated_bl == TRUE) \| (dose_1_brand_cd != "JJ" & !is.na(dose_2_dt) & !is.na(fully_vaccinated_dt) & fully_vaccinated_bl == TRUE) \| (is.na(dose_1_brand_cd) & is.na(fully_vaccinated_dt) & fully_vaccinated_bl == FALSE) \| (dose_1_brand_cd != "JJ" & is.na(dose_2_dt) & is.na(fully_vaccinated_dt) & fully_vaccinated_bl == FALSE)` | V21 | 10000 | 10000 | 0 | 0% | 0 | 0% |  |  |
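Range rules such as V01 (`age_nm - 5 >= -1e-08 & age_nm - 115 <= 1e-08`) look like the form the R validate package produces when it rewrites an inequality such as `age_nm >= 5`. The bound is moved to one side and compared against a small tolerance, as far as we can tell governed by validate's `lin.ineq.eps` option, which defaults to 1e-8. As a result, values that miss a bound only through floating-point rounding still pass.

The Python sketch below mirrors that form and the `is.na(...) | ... %vin% ...` set-membership rules. The helpers `is_missing`, `in_range` and `in_set` are hypothetical and are not part of validate.

```python
import math
from typing import Iterable, Optional

EPS = 1e-8  # assumed to match the 1e-08 tolerance that appears in the rules


def is_missing(x: object) -> bool:
    """Treat None and float NaN as missing, loosely mirroring R's is.na()."""
    return x is None or (isinstance(x, float) and math.isnan(x))


def in_range(x: Optional[float], lo: float, hi: float, eps: float = EPS) -> bool:
    """Sketch of `is.na(x) | x - lo >= -eps & x - hi <= eps`."""
    if is_missing(x):
        return True
    return x - lo >= -eps and x - hi <= eps


def in_set(x: Optional[str], allowed: Iterable[str]) -> bool:
    """Sketch of `is.na(x) | x %vin% allowed` for a single value."""
    return is_missing(x) or x in set(allowed)


BRANDS = ["BP", "MD", "JJ", "AZ", "NV"]  # codes listed in rules V03 and V04

print(0.1 + 0.2 <= 0.3)               # False: exact comparison trips on rounding
print(in_range(0.1 + 0.2, 0.0, 0.3))  # True: within the 1e-8 tolerance
print(in_range(5, 5, 115))            # True: the V01 lower bound is inclusive
print(in_range(4.5, 5, 115))          # False: genuinely out of range
print(in_range(None, 5, 115))         # True: a missing age passes the guarded rule
print(in_set("AZ", BRANDS), in_set("XX", BRANDS))  # True False
```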
The vertical bars in the validation plot indicate the percentage of records ‘Passing’, ‘Failing’ and ‘Missing’.
Non-compliance with the Common Data Model specification
The validation rules are considered ‘essential’: a record must not violate any of them to be considered for the subsequent analysis. A logical variable flag_violating_val is created in the cohort_data table of the BY-COVID-WP5-BaselineUseCase-VE.duckdb database. It is set to TRUE when the record violates at least one of the rules in the pre-specified set, and to FALSE otherwise.
Records with flag_violating_val == TRUE | Records with flag_violating_val == FALSE |
---|---|
890 | 9110 |
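The per-record flag described above can be sketched as follows. The sketch assumes each rule is a function returning True, False or None. Only V09 and V17 are transcribed, since they are the only rules with failures in the summary table, and writing the flag back to the cohort_data table in DuckDB is left out. The names `v09`, `v17` and `flag_violating`, and the toy records, are hypothetical. The source does not say how rules that evaluate to NA are handled, so treating them as non-violations is an assumption.

```python
from datetime import date
from typing import Callable, Dict, List, Optional

EPS = 1e-8  # tolerance as in the rule expressions
Record = Dict[str, object]
Rule = Callable[[Record], Optional[bool]]


def v09(r: Record) -> Optional[bool]:
    """Pregnancy only for sex_cd "2" aged 12 to 55 (hypothetical transcription of V09)."""
    preg = r.get("pregnancy_bl")
    if preg is None or preg is False:
        return True
    # Simplification: assumes sex_cd and age_nm are present when pregnancy_bl is TRUE.
    age = r["age_nm"]
    return r["sex_cd"] == "2" and age - 12 >= -EPS and age - 55 <= EPS


def v17(r: Record) -> Optional[bool]:
    """doses_nm consistent with the recorded dose dates (hypothetical transcription of V17)."""
    d1, d2, d3 = (r.get(k) is not None for k in ("dose_1_dt", "dose_2_dt", "dose_3_dt"))
    n = r["doses_nm"]
    return ((d1 and d2 and d3 and n - 3 >= -EPS)
            or (d1 and d2 and not d3 and abs(n - 2) <= EPS)
            or (d1 and not d2 and not d3 and abs(n - 1) <= EPS)
            or (not d1 and not d2 and not d3 and abs(n - 0) <= EPS))


def flag_violating(r: Record, rules: Dict[str, Rule]) -> bool:
    # Assumption: only an explicit False counts as a violation; None (not evaluable) does not.
    return any(rule(r) is False for rule in rules.values())


RULES: Dict[str, Rule] = {"V09": v09, "V17": v17}

# Hypothetical toy records, not taken from the real cohort.
cohort: List[Record] = [
    {"sex_cd": "2", "age_nm": 30, "pregnancy_bl": True, "doses_nm": 2,
     "dose_1_dt": date(2021, 3, 1), "dose_2_dt": date(2021, 4, 1), "dose_3_dt": None},
    {"sex_cd": "1", "age_nm": 40, "pregnancy_bl": True, "doses_nm": 0,  # fails V09
     "dose_1_dt": None, "dose_2_dt": None, "dose_3_dt": None},
    {"sex_cd": "1", "age_nm": 50, "pregnancy_bl": False, "doses_nm": 2,  # fails V17
     "dose_1_dt": date(2021, 5, 1), "dose_2_dt": None, "dose_3_dt": None},
]

flags = [flag_violating(r, RULES) for r in cohort]
print(flags)                                  # [False, True, True]
print(flags.count(True), flags.count(False))  # 2 1
```

The reported counts can be checked the same way, assuming the flag covers all 21 rules and no rule evaluates to NA. Under that assumption, the 890 flagged records are the union of the records failing V09 (7.31% of 10000, i.e. 731) and V17 (1.73% of 10000, i.e. 173). By inclusion-exclusion, 731 + 173 - 890 = 14 records would fail both rules, and 890 + 9110 = 10000 matches the number of items per rule.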