Overview

Dataset statistics

Number of variables: 42
Number of observations: 650000
Missing cells: 3825516
Missing cells (%): 14.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.3 GiB
Average record size in memory: 2.1 KiB
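
The derived figures in this block follow from the raw counts. A minimal sketch of that arithmetic, using only the numbers reported above; the helper names are illustrative and are not part of pandas-profiling:

```python
def missing_cell_pct(missing_cells: int, n_rows: int, n_cols: int) -> float:
    """Share of missing cells over all cells (rows x columns), in percent."""
    return 100.0 * missing_cells / (n_rows * n_cols)


def avg_record_size_kib(total_gib: float, n_rows: int) -> float:
    """Average in-memory size of one record in KiB (1 GiB = 1024 * 1024 KiB)."""
    return total_gib * 1024 * 1024 / n_rows


# Figures copied from the dataset statistics above.
pct = missing_cell_pct(3_825_516, 650_000, 42)
size = avg_record_size_kib(1.3, 650_000)
print(f"Missing cells: {pct:.1f}%")
print(f"Average record size: {size:.1f} KiB")
```

Both results round to the values shown in the report (14.0% and 2.1 KiB).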

Variable types

Categorical: 10
Numeric: 2
Boolean: 22
DateTime: 7
Unsupported: 1

Alerts

country_cd has constant value "ESP" Constant
person_id has a high cardinality: 650000 distinct values High cardinality
fully_vaccinated_bl is highly correlated with vaccination_schedule_cd and 1 other fields High correlation
pregnancy_bl is highly correlated with country_cd High correlation
dose_2_brand_cd is highly correlated with vaccination_schedule_cd and 2 other fields High correlation
hiv_infection_bl is highly correlated with country_cd High correlation
hypertension_bl is highly correlated with country_cd High correlation
residence_area_cd is highly correlated with country_cd High correlation
vaccination_schedule_cd is highly correlated with fully_vaccinated_bl and 3 other fields High correlation
copd_bl is highly correlated with country_cd High correlation
immunosuppression_bl is highly correlated with country_cd High correlation
solid_tumor_without_metastasis_bl is highly correlated with country_cd High correlation
chronic_kidney_disease_bl is highly correlated with country_cd High correlation
transplanted_bl is highly correlated with country_cd High correlation
dose_3_brand_cd is highly correlated with country_cd High correlation
dose_1_brand_cd is highly correlated with dose_2_brand_cd and 2 other fields High correlation
sickle_cell_disease_bl is highly correlated with country_cd High correlation
foreign_bl is highly correlated with country_cd High correlation
previous_infection_bl is highly correlated with country_cd High correlation
institutionalized_bl is highly correlated with country_cd High correlation
confirmed_case_bl is highly correlated with test_type_cd and 1 other fields High correlation
sex_cd is highly correlated with country_cd High correlation
exitus_bl is highly correlated with country_cd High correlation
heart_failure_bl is highly correlated with country_cd High correlation
socecon_lvl_cd is highly correlated with country_cd High correlation
chronic_liver_disease_bl is highly correlated with country_cd High correlation
blood_cancer_bl is highly correlated with country_cd High correlation
test_type_cd is highly correlated with confirmed_case_bl and 1 other fields High correlation
essential_worker_bl is highly correlated with country_cd High correlation
obesity_bl is highly correlated with country_cd High correlation
primary_immunodeficiency_bl is highly correlated with country_cd High correlation
country_cd is highly correlated with fully_vaccinated_bl and 29 other fields High correlation
diabetes_bl is highly correlated with country_cd High correlation
dose_1_brand_cd is highly correlated with dose_2_brand_cd and 1 other fields High correlation
dose_2_brand_cd is highly correlated with dose_1_brand_cd and 1 other fields High correlation
number_doses is highly correlated with fully_vaccinated_bl High correlation
fully_vaccinated_bl is highly correlated with number_doses High correlation
vaccination_schedule_cd is highly correlated with dose_1_brand_cd and 1 other fields High correlation
residence_area_cd has 12931 (2.0%) missing values Missing
country_cd has 19498 (3.0%) missing values Missing
dose_1_brand_cd has 61842 (9.5%) missing values Missing
dose_1_dt has 61842 (9.5%) missing values Missing
dose_2_brand_cd has 76725 (11.8%) missing values Missing
dose_2_dt has 77001 (11.8%) missing values Missing
dose_3_brand_cd has 316951 (48.8%) missing values Missing
dose_3_dt has 317613 (48.9%) missing values Missing
fully_vaccinated_dt has 65276 (10.0%) missing values Missing
vaccination_schedule_cd has 65276 (10.0%) missing values Missing
confirmed_case_dt has 409896 (63.1%) missing values Missing
previous_infection_dt has 633977 (97.5%) missing values Missing
test_type_cd has 412321 (63.4%) missing values Missing
variant_cd has 650000 (100.0%) missing values Missing
exitus_dt has 644367 (99.1%) missing values Missing
person_id is uniformly distributed Uniform
person_id has unique values Unique
variant_cd is an unsupported type, check if it needs cleaning or further analysis Unsupported
number_doses has 61842 (9.5%) zeros Zeros
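
Several of the alert types listed above (Constant, Unique, Missing, Zeros) can be approximated with simple per-column checks. The sketch below is a simplified illustration only: the function `column_alerts` and its rules are assumptions made for this example and do not reproduce the thresholds or internals of pandas-profiling.

```python
def column_alerts(values):
    """Return simplified alert labels for one column (None marks a missing cell)."""
    alerts = []
    present = [v for v in values if v is not None]
    distinct = set(present)
    n_missing = len(values) - len(present)

    # A single distinct non-missing value means the column carries no information.
    if len(distinct) == 1:
        alerts.append("Constant")
    # Every non-missing value occurs exactly once (as for an identifier column).
    if len(present) > 1 and len(distinct) == len(present):
        alerts.append("Unique")
    if n_missing > 0:
        alerts.append("Missing")
    # Booleans are excluded because False == 0 in Python.
    if any(v == 0 and not isinstance(v, bool) for v in present):
        alerts.append("Zeros")
    return alerts


print(column_alerts(["ESP", "ESP", None, "ESP"]))
print(column_alerts(["fGlUXqtSAe", "mTAiOjWlCE", "SqCZDQQsye"]))
print(column_alerts([0, 2, 3, 2]))
```

On these toy inputs the first column is flagged as both constant and missing, which mirrors the combination reported for country_cd.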

Reproduction

Analysis started: 2023-01-25 16:52:05.611139
Analysis finished: 2023-01-25 16:53:56.481764
Duration: 1 minute and 50.87 seconds
Software version: pandas-profiling v3.1.0
Download configuration: config.json

Variables

person_id
Categorical

HIGH CARDINALITY
UNIFORM
UNIQUE

Distinct: 650000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 41.5 MiB

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 650000
Unique (%): 100.0%

Sample

1st row: fGlUXqtSAe
2nd row: mTAiOjWlCE
3rd row: SqCZDQQsye
4th row: ZBYnvWCNxj
5th row: NxljjAbkxT

Common Values

Value                  Count   Frequency (%)
fGlUXqtSAe             1       < 0.1%
tCQpJfodiZ             1       < 0.1%
XebUQnIPfS             1       < 0.1%
SBAdHNBcQl             1       < 0.1%
CDQrGDjijP             1       < 0.1%
XReSfpbUAs             1       < 0.1%
aOSCSoBUUR             1       < 0.1%
fJANMilZpP             1       < 0.1%
nspMwvNNSW             1       < 0.1%
HgfgDxGVVw             1       < 0.1%
Other values (649990)  649990  > 99.9%

Length

Histogram of lengths of the category

Value                  Count   Frequency (%)
fgluxqtsae             1       < 0.1%
bucdemoqgq             1       < 0.1%
jyfykavxif             1       < 0.1%
dcpffnddie             1       < 0.1%
cckxwrvsym             1       < 0.1%
sqczdqqsye             1       < 0.1%
zbynvwcnxj             1       < 0.1%
nxljjabkxt             1       < 0.1%
ltdrvqgqiu             1       < 0.1%
ubtfwplnsm             1       < 0.1%
Other values (649990)  649990  > 99.9%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

age_nm
Real number (ℝ≥0)

Distinct: 111
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 54.97360769
Minimum: 5
Maximum: 115
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.0 MiB

Quantile statistics

Minimum: 5
5-th percentile: 30
Q1: 45
median: 55
Q3: 65
95-th percentile: 80
Maximum: 115
Range: 110
Interquartile range (IQR): 20

Descriptive statistics

Standard deviation: 14.99833228
Coefficient of variation (CV): 0.2728278697
Kurtosis: -0.01803060068
Mean: 54.97360769
Median Absolute Deviation (MAD): 10
Skewness: 0.005977280489
Sum: 35732845
Variance: 224.9499711
Monotonicity: Not monotonic
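
Several of these statistics are linked by simple identities (mean = sum / count, CV = standard deviation / mean, variance = standard deviation squared, IQR = Q3 - Q1, range = maximum - minimum). A short sketch that recomputes them from the reported age_nm figures; the variable names are illustrative:

```python
from math import isclose

# Values copied from the age_nm statistics above.
n_obs = 650_000
total = 35_732_845
std = 14.99833228
q1, q3 = 45, 65
minimum, maximum = 5, 115

mean = total / n_obs             # Mean = Sum / number of observations
cv = std / mean                  # Coefficient of variation = std / mean
variance = std ** 2              # Variance = standard deviation squared
iqr = q3 - q1                    # Interquartile range
value_range = maximum - minimum  # Range

# Each recomputed value should agree with the report up to rounding.
print(isclose(mean, 54.97360769, abs_tol=1e-6))
print(isclose(cv, 0.2728278697, abs_tol=1e-6))
print(isclose(variance, 224.9499711, abs_tol=1e-5))
print(iqr, value_range)
```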
Histogram with fixed size bins (bins=50)

Value               Count   Frequency (%)
54                  17391   2.7%
55                  17371   2.7%
53                  17128   2.6%
57                  17070   2.6%
56                  17024   2.6%
58                  17015   2.6%
52                  16939   2.6%
51                  16756   2.6%
59                  16591   2.6%
60                  16436   2.5%
Other values (101)  480279  73.9%

Value  Count  Frequency (%)
5      311    < 0.1%
6      73     < 0.1%
7      97     < 0.1%
8      115    < 0.1%
9      159    < 0.1%
10     206    < 0.1%
11     253    < 0.1%
12     259    < 0.1%
13     344    0.1%
14     412    0.1%

Value  Count  Frequency (%)
115    34     < 0.1%
114    6      < 0.1%
113    9      < 0.1%
112    14     < 0.1%
111    18     < 0.1%
110    27     < 0.1%
109    19     < 0.1%
108    40     < 0.1%
107    40     < 0.1%
106    52     < 0.1%

sex_cd
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 36.0 MiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 2
3rd row: 2
4th row: 2
5th row: 1

Common Values

Value  Count   Frequency (%)
1      318999  49.1%
2      317949  48.9%
0      13052   2.0%

Length

Histogram of lengths of the category

Pie chart

Value  Count   Frequency (%)
1      318999  49.1%
2      317949  48.9%
0      13052   2.0%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

socecon_lvl_cd
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 36.0 MiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 4
2nd row: 5
3rd row: 3
4th row: 5
5th row: 2

Common Values

Value  Count   Frequency (%)
3      236724  36.4%
4      222080  34.2%
2      89107   13.7%
5      88152   13.6%
1      13937   2.1%

Length

Histogram of lengths of the category

Pie chart

Value  Count   Frequency (%)
3      236724  36.4%
4      222080  34.2%
2      89107   13.7%
5      88152   13.6%
1      13937   2.1%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

residence_area_cd
Categorical

HIGH CORRELATION
MISSING

Distinct: 3
Distinct (%): < 0.1%
Missing: 12931
Missing (%): 2.0%
Memory size: 36.8 MiB

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 731
2nd row: 733
3rd row: 733
4th row: 733
5th row: 732

Common Values

Value      Count   Frequency (%)
733        454865  70.0%
732        91194   14.0%
731        91010   14.0%
(Missing)  12931   2.0%

Length

Histogram of lengths of the category

Pie chart

Value  Count   Frequency (%)
733    454865  71.4%
732    91194   14.3%
731    91010   14.3%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

country_cd
Categorical

CONSTANT
HIGH CORRELATION
MISSING
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 19498
Missing (%): 3.0%
Memory size: 36.7 MiB

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: ESP
2nd row: ESP
3rd row: ESP
4th row: ESP
5th row: ESP

Common Values

Value      Count   Frequency (%)
ESP        630502  97.0%
(Missing)  19498   3.0%

Length

Histogram of lengths of the category

Pie chart

Value  Count   Frequency (%)
esp    630502  100.0%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

foreign_bl
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 634.9 KiB

Value  Count   Frequency (%)
False  552955  85.1%
True   97045   14.9%

essential_worker_bl
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 634.9 KiB

Value  Count   Frequency (%)
False  630624  97.0%
True   19376   3.0%

institutionalized_bl
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 634.9 KiB

Value  Count   Frequency (%)
False  630569  97.0%
True   19431   3.0%

dose_1_brand_cd
Categorical

HIGH CORRELATION
HIGH CORRELATION
MISSING

Distinct: 4
Distinct (%): < 0.1%
Missing: 61842
Missing (%): 9.5%
Memory size: 35.0 MiB

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: BP
2nd row: BP
3rd row: BP
4th row: BP
5th row: BP

Common Values

Value      Count   Frequency (%)
BP         409711  63.0%
MD         89065   13.7%
AZ         60006   9.2%
JJ         29376   4.5%
(Missing)  61842   9.5%

Length

Histogram of lengths of the category

Pie chart

Value  Count   Frequency (%)
bp     409711  69.7%
md     89065   15.1%
az     60006   10.2%
jj     29376   5.0%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

dose_1_dt
Date

MISSING

Distinct: 356
Distinct (%): 0.1%
Missing: 61842
Missing (%): 9.5%
Memory size: 5.0 MiB
Minimum: 2020-12-01 00:00:00
Maximum: 2021-11-30 00:00:00

Histogram with fixed size bins (bins=50)

dose_2_brand_cd
Categorical

HIGH CORRELATION
HIGH CORRELATION
MISSING

Distinct: 3
Distinct (%): < 0.1%
Missing: 76725
Missing (%): 11.8%
Memory size: 34.6 MiB

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: BP
2nd row: BP
3rd row: BP
4th row: BP
5th row: BP

Common Values

Value      Count   Frequency (%)
BP         425326  65.4%
MD         89567   13.8%
AZ         58382   9.0%
(Missing)  76725   11.8%

Length

Histogram of lengths of the category

Pie chart

Value  Count   Frequency (%)
bp     425326  74.2%
md     89567   15.6%
az     58382   10.2%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

dose_2_dt
Date

MISSING

Distinct: 476
Distinct (%): 0.1%
Missing: 77001
Missing (%): 11.8%
Memory size: 5.0 MiB
Minimum: 2020-12-22 00:00:00
Maximum: 2022-06-05 00:00:00

Histogram with fixed size bins (bins=50)

dose_3_brand_cd
Categorical

HIGH CORRELATION
MISSING

Distinct: 2
Distinct (%): < 0.1%
Missing: 316951
Missing (%): 48.8%
Memory size: 28.4 MiB

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: BP
2nd row: BP
3rd row: BP
4th row: BP
5th row: MD

Common Values

Value      Count   Frequency (%)
BP         299916  46.1%
MD         33133   5.1%
(Missing)  316951  48.8%

Length

Histogram of lengths of the category

Pie chart

Value  Count   Frequency (%)
bp     299916  90.1%
md     33133   9.9%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

dose_3_dt
Date

MISSING

Distinct: 406
Distinct (%): 0.1%
Missing: 317613
Missing (%): 48.9%
Memory size: 5.0 MiB
Minimum: 2021-05-21 00:00:00
Maximum: 2022-07-30 00:00:00

Histogram with fixed size bins (bins=50)

number_doses
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.849112308
Minimum: 0
Maximum: 5
Zeros: 61842
Zeros (%): 9.5%
Negative: 0
Negative (%): 0.0%
Memory size: 5.0 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 2
median: 3
Q3: 4
95-th percentile: 5
Maximum: 5
Range: 5
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.470351371
Coefficient of variation (CV): 0.5160735036
Kurtosis: -0.6839744431
Mean: 2.849112308
Median Absolute Deviation (MAD): 1
Skewness: -0.1665344134
Sum: 1851923
Variance: 2.161933153
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=6)

Value  Count   Frequency (%)
2      227402  35.0%
3      115562  17.8%
5      115134  17.7%
4      114901  17.7%
0      61842   9.5%
1      15159   2.3%

Value  Count   Frequency (%)
0      61842   9.5%
1      15159   2.3%
2      227402  35.0%
3      115562  17.8%
4      114901  17.7%
5      115134  17.7%

Value  Count   Frequency (%)
5      115134  17.7%
4      114901  17.7%
3      115562  17.8%
2      227402  35.0%
1      15159   2.3%
0      61842   9.5%

fully_vaccinated_dt
Date

MISSING

Distinct: 378
Distinct (%): 0.1%
Missing: 65276
Missing (%): 10.0%
Memory size: 5.0 MiB
Minimum: 2020-12-10 00:00:00
Maximum: 2022-01-11 00:00:00

Histogram with fixed size bins (bins=50)

fully_vaccinated_bl
Boolean

HIGH CORRELATION
HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 634.9 KiB

Value  Count   Frequency (%)
True   584724  90.0%
False  65276   10.0%

vaccination_schedule_cd
Categorical

HIGH CORRELATION
HIGH CORRELATION
MISSING

Distinct: 4
Distinct (%): < 0.1%
Missing: 65276
Missing (%): 10.0%
Memory size: 36.5 MiB

Length

Max length: 5
Median length: 5
Mean length: 4.849482833
Min length: 2

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: BP-BP
2nd row: BP-BP
3rd row: BP-BP
4th row: BP-BP
5th row: BP-BP

Common Values

Value      Count   Frequency (%)
BP-BP      409276  63.0%
MD-MD      87777   13.5%
AZ-AZ      58334   9.0%
JJ         29337   4.5%
(Missing)  65276   10.0%

Length

Histogram of lengths of the category

Pie chart

Value  Count   Frequency (%)
bp-bp  409276  70.0%
md-md  87777   15.0%
az-az  58334   10.0%
jj     29337   5.0%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

confirmed_case_dt
Date

MISSING

Distinct: 816
Distinct (%): 0.3%
Missing: 409896
Missing (%): 63.1%
Memory size: 5.0 MiB
Minimum: 2020-02-29 00:00:00
Maximum: 2022-05-24 00:00:00

Histogram with fixed size bins (bins=50)

confirmed_case_bl
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 634.9 KiB

Value  Count   Frequency (%)
False  409896  63.1%
True   240104  36.9%

previous_infection_dt
Date

MISSING

Distinct: 364
Distinct (%): 2.3%
Missing: 633977
Missing (%): 97.5%
Memory size: 5.0 MiB
Minimum: 2021-02-07 00:00:00
Maximum: 2022-02-18 00:00:00

Histogram with fixed size bins (bins=50)

previous_infection_bl
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 634.9 KiB

Value  Count   Frequency (%)
False  633977  97.5%
True   16023   2.5%

test_type_cd
Categorical

HIGH CORRELATION
MISSING

Distinct: 3
Distinct (%): < 0.1%
Missing: 412321
Missing (%): 63.4%
Memory size: 26.1 MiB

Length

Max length: 5
Median length: 3
Mean length: 2.72624422
Min length: 2

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: AG
2nd row: PCR
3rd row: PCR
4th row: AG
5th row: PCR

Common Values

Value      Count   Frequency (%)
PCR        143984  22.2%
AG         84152   12.9%
other      9543    1.5%
(Missing)  412321  63.4%

Length

Histogram of lengths of the category

Pie chart

Value  Count   Frequency (%)
pcr    143984  60.6%
ag     84152   35.4%
other  9543    4.0%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

variant_cd
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing: 650000
Missing (%): 100.0%
Memory size: 5.0 MiB

diabetes_bl
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 634.9 KiB

Value  Count   Frequency (%)
False  617788  95.0%
True   32212   5.0%

obesity_bl
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 634.9 KiB

Value  Count   Frequency (%)
False  584838  90.0%
True   65162   10.0%

heart_failure_bl
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 634.9 KiB

Value  Count   Frequency (%)
False  639256  98.3%
True   10744   1.7%

copd_bl
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 634.9 KiB

Value  Count   Frequency (%)
False  622988  95.8%
True   27012   4.2%

solid_tumor_without_metastasis_bl
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 634.9 KiB

Value  Count   Frequency (%)
False  643445  99.0%
True   6555    1.0%

chronic_kidney_disease_bl
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 634.9 KiB

Value  Count   Frequency (%)
False  595959  91.7%
True   54041   8.3%

sickle_cell_disease_bl
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 634.9 KiB

Value  Count   Frequency (%)
False  584833  90.0%
True   65167   10.0%

hypertension_bl
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 634.9 KiB

Value  Count   Frequency (%)
False  574256  88.3%
True   75744   11.7%

chronic_liver_disease_bl
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 634.9 KiB

Value  Count   Frequency (%)
False  596059  91.7%
True   53941   8.3%

blood_cancer_bl
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 634.9 KiB

Value  Count   Frequency (%)
False  643571  99.0%
True   6429    1.0%

transplanted_bl
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 634.9 KiB

Value  Count   Frequency (%)
False  585408  90.1%
True   64592   9.9%

hiv_infection_bl
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 634.9 KiB

Value  Count   Frequency (%)
False  596297  91.7%
True   53703   8.3%

primary_immunodeficiency_bl
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 634.9 KiB

Value  Count   Frequency (%)
False  585184  90.0%
True   64816   10.0%

immunosuppression_bl
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 634.9 KiB

Value  Count   Frequency (%)
False  643472  99.0%
True   6528    1.0%

pregnancy_bl
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 634.9 KiB

Value  Count   Frequency (%)
False  641955  98.8%
True   8045    1.2%

exitus_dt
Date

MISSING

Distinct: 806
Distinct (%): 14.3%
Missing: 644367
Missing (%): 99.1%
Memory size: 5.0 MiB
Minimum: 2020-03-10 00:00:00
Maximum: 2022-05-24 00:00:00

Histogram with fixed size bins (bins=50)

exitus_bl
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 634.9 KiB

Value  Count   Frequency (%)
False  644367  99.1%
True   5633    0.9%

Interactions


Correlations


Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. This report uses the bias-corrected measure proposed by Bergsma in 2013.
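
As an illustration of the bias-corrected measure described above, here is a minimal pure-Python sketch of Bergsma's correction. It is not the code pandas-profiling runs; the function name `cramers_v_corrected` and the choice to return 0.0 for degenerate inputs are assumptions made for this example.

```python
from collections import Counter
from math import sqrt


def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V (Bergsma, 2013) for two equal-length sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row, col = Counter(xs), Counter(ys)
    r, k = len(row), len(col)

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for a, row_total in row.items():
        for b, col_total in col.items():
            expected = row_total * col_total / n
            chi2 += (joint.get((a, b), 0) - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias corrections for phi squared and for the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        # A constant variable leaves the measure undefined; this sketch returns 0.
        return 0.0
    return sqrt(phi2_corr / denom)


identical = ["BP", "MD"] * 50
print(cramers_v_corrected(identical, identical))
print(cramers_v_corrected(["a", "a", "b", "b"] * 25, ["x", "y", "x", "y"] * 25))
```

The first call uses two identical columns (perfect association) and the second uses two columns whose categories are balanced across each other (independence). Note that a constant column, such as country_cd in this dataset, gives a zero denominator, so any value reported for it deserves a careful look.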

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution.

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
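
The nullity correlation mentioned for the heatmap can be read as a Pearson correlation between presence indicators of two columns. The following sketch shows that idea in plain Python; `nullity_correlation` is a hypothetical helper written for this example, with `None` standing in for a missing cell.

```python
from math import sqrt


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the presence indicators of two columns.

    Returns None when either column is fully present or fully missing,
    because the correlation is undefined in that case.
    """
    a = [0.0 if v is None else 1.0 for v in col_a]
    b = [0.0 if v is None else 1.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return None
    return cov / sqrt(var_a * var_b)


# Toy columns: the brand and date of a dose are missing on the same rows.
brand = ["BP", None, "MD", None]
date = ["2021-04-19", None, "2021-05-17", None]
print(nullity_correlation(brand, date))
```

In this report, dose_1_brand_cd and dose_1_dt both have 61842 missing values, which would be consistent with a nullity correlation close to 1, although the counts alone do not prove that the same rows are affected.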

Sample

First rows

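index | person_id | age_nm | sex_cd | socecon_lvl_cd | residence_area_cd | country_cd | foreign_bl | essential_worker_bl | institutionalized_bl | dose_1_brand_cd | dose_1_dt | dose_2_brand_cd | dose_2_dt | dose_3_brand_cd | dose_3_dt | number_doses | fully_vaccinated_dt | fully_vaccinated_bl | vaccination_schedule_cd | confirmed_case_dt | confirmed_case_bl | previous_infection_dt | previous_infection_bl | test_type_cd | variant_cd | diabetes_bl | obesity_bl | heart_failure_bl | copd_bl | solid_tumor_without_metastasis_bl | chronic_kidney_disease_bl | sickle_cell_disease_bl | hypertension_bl | chronic_liver_disease_bl | blood_cancer_bl | transplanted_bl | hiv_infection_bl | primary_immunodeficiency_bl | immunosuppression_bl | pregnancy_bl | exitus_dt | exitus_bl
0 | fGlUXqtSAe | 48 | 0 | 4 | 731 | ESP | True | False | False | BP | 2021-04-19 | BP | 2021-05-09 | BP | 2021-10-16 | 4.0 | 2021-05-09 | True | BP-BP | NaT | False | NaT | False | NaN | NaN | False | False | False | False | False | True | False | False | False | False | False | False | False | False | False | NaT | False
1 | mTAiOjWlCE | 28 | 2 | 5 | 733 | ESP | False | False | False | BP | 2021-05-17 | BP | 2021-06-07 | BP | 2021-12-31 | 5.0 | 2021-06-07 | True | BP-BP | NaT | False | NaT | False | NaN | NaN | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | NaT | False
2 | SqCZDQQsye | 49 | 2 | 3 | 733 | ESP | False | False | False | BP | 2021-04-13 | BP | 2021-05-04 | BP | 2021-10-29 | 4.0 | 2021-05-04 | True | BP-BP | 2022-01-01 | True | NaT | False | AG | NaN | False | False | False | False | False | False | False | True | False | False | True | False | False | False | False | NaT | False
3 | ZBYnvWCNxj | 61 | 2 | 5 | 733 | ESP | False | False | False | BP | 2021-02-06 | BP | 2021-02-28 | BP | 2021-08-14 | 3.0 | 2021-02-28 | True | BP-BP | NaT | False | NaT | False | NaN | NaN | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | NaT | False
4 | NxljjAbkxT | 63 | 1 | 2 | 732 | ESP | False | False | False | BP | 2021-04-30 | BP | 2021-05-21 | MD | 2021-12-09 | 4.0 | 2021-05-21 | True | BP-BP | 2022-01-31 | True | NaT | False | PCR | NaN | False | True | False | False | False | False | False | True | True | False | False | False | False | False | False | NaT | False
5 | ltdRvqgqIU | 55 | 1 | 3 | 731 | NaN | False | False | False | BP | 2021-06-13 | BP | 2021-07-03 | BP | 2021-12-16 | 3.0 | 2021-07-03 | True | BP-BP | NaT | False | NaT | False | NaN | NaN | False | False | False | True | False | False | False | False | False | False | False | False | False | False | False | NaT | False
6 | BUCDemoQgq | 49 | 1 | 3 | 733 | ESP | False | False | False | BP | 2021-04-04 | BP | 2021-04-25 | BP | 2021-09-28 | 3.0 | 2021-04-25 | True | BP-BP | NaT | False | NaT | False | NaN | NaN | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | NaT | False
7 | ubTFwpLnsm | 45 | 2 | 4 | 732 | ESP | False | True | False | BP | 2021-05-13 | BP | 2021-06-04 | BP | 2021-12-21 | 3.0 | 2021-06-04 | True | BP-BP | NaT | False | NaT | False | NaN | NaN | False | False | False | False | False | False | False | False | True | False | False | False | True | False | False | NaT | False
8 | YdCCUKjYLR | 72 | 2 | 4 | 733 | ESP | False | False | False | BP | 2021-02-16 | BP | 2021-03-07 | BP | 2021-09-05 | 3.0 | 2021-03-07 | True | BP-BP | NaT | False | NaT | False | NaN | NaN | False | False | False | False | False | False | False | True | False | False | True | False | False | False | False | NaT | False
9 | cCKxWrvsym | 49 | 2 | 4 | 733 | ESP | False | False | False | BP | 2021-03-06 | BP | 2021-03-27 | BP | 2021-08-25 | 5.0 | 2021-03-27 | True | BP-BP | 2021-12-14 | True | NaT | False | PCR | NaN | False | True | False | False | False | False | False | True | False | False | False | True | False | True | False | NaT | False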

Last rows

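index | person_id | age_nm | sex_cd | socecon_lvl_cd | residence_area_cd | country_cd | foreign_bl | essential_worker_bl | institutionalized_bl | dose_1_brand_cd | dose_1_dt | dose_2_brand_cd | dose_2_dt | dose_3_brand_cd | dose_3_dt | number_doses | fully_vaccinated_dt | fully_vaccinated_bl | vaccination_schedule_cd | confirmed_case_dt | confirmed_case_bl | previous_infection_dt | previous_infection_bl | test_type_cd | variant_cd | diabetes_bl | obesity_bl | heart_failure_bl | copd_bl | solid_tumor_without_metastasis_bl | chronic_kidney_disease_bl | sickle_cell_disease_bl | hypertension_bl | chronic_liver_disease_bl | blood_cancer_bl | transplanted_bl | hiv_infection_bl | primary_immunodeficiency_bl | immunosuppression_bl | pregnancy_bl | exitus_dt | exitus_bl
649990 | PBkHoSLhWm | 59 | 2 | 3 | 733 | ESP | False | False | False | NaN | NaT | NaN | NaT | NaN | NaT | 0.0 | NaT | False | NaN | NaT | False | NaT | False | NaN | NaN | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | NaT | False
649991 | yeyBlYEyat | 66 | 1 | 4 | 733 | ESP | True | False | True | NaN | NaT | NaN | NaT | NaN | NaT | 0.0 | NaT | False | NaN | NaT | False | NaT | False | NaN | NaN | False | False | False | True | False | False | False | False | True | False | False | True | True | False | False | NaT | False
649992 | uuwvWaKrgy | 55 | 2 | 3 | 733 | ESP | False | False | False | NaN | NaT | NaN | NaT | NaN | NaT | 0.0 | NaT | False | NaN | 2021-12-04 | True | NaT | False | PCR | NaN | False | False | False | False | False | True | True | False | False | False | False | False | False | False | False | NaT | False
649993 | JwMmeuicLD | 69 | 2 | 2 | 732 | ESP | True | False | False | NaN | NaT | NaN | NaT | NaN | NaT | 0.0 | NaT | False | NaN | NaT | False | NaT | False | NaN | NaN | False | False | False | True | False | False | False | False | False | False | False | False | False | False | False | NaT | False
649994 | qTWunusBds | 59 | 2 | 2 | 733 | ESP | False | False | False | NaN | NaT | NaN | NaT | NaN | NaT | 0.0 | NaT | False | NaN | 2022-02-13 | True | NaT | False | PCR | NaN | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | NaT | False
649995 | EkqloIPHyk | 38 | 1 | 4 | 732 | ESP | False | False | False | NaN | NaT | NaN | NaT | NaN | NaT | 0.0 | NaT | False | NaN | NaT | False | NaT | False | NaN | NaN | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | NaT | False
649996 | KCITuUpkyV | 59 | 1 | 4 | 733 | ESP | False | False | False | NaN | NaT | NaN | NaT | NaN | NaT | 0.0 | NaT | False | NaN | 2020-10-07 | True | NaT | False | AG | NaN | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | NaT | False
649997 | CHZApTPduO | 68 | 2 | 3 | 732 | ESP | False | False | False | NaN | NaT | NaN | NaT | NaN | NaT | 0.0 | NaT | False | NaN | NaT | False | NaT | False | NaN | NaN | False | False | False | False | False | False | False | True | True | False | False | False | False | False | False | NaT | False
649998 | ehMdaYLUZw | 37 | 2 | 5 | 733 | ESP | False | False | False | NaN | NaT | NaN | NaT | NaN | NaT | 0.0 | NaT | False | NaN | NaT | False | NaT | False | NaN | NaN | False | False | False | False | False | False | False | False | False | False | True | False | False | False | False | NaT | False
649999 | KmReUzIdWC | 62 | 1 | 3 | 733 | ESP | False | False | False | NaN | NaT | NaN | NaT | NaN | NaT | 0.0 | NaT | False | NaN | 2021-07-03 | True | NaT | False | PCR | NaN | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | NaT | False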