Dataset statistics
Number of variables | 42 |
---|---|
Number of observations | 650000 |
Missing cells | 3825516 |
Missing cells (%) | 14.0% |
Duplicate rows | 0 |
Duplicate rows (%) | 0.0% |
Total size in memory | 1.3 GiB |
Average record size in memory | 2.1 KiB |
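The missing-cell percentage and the average record size can be cross-checked against other figures in this report. The sketch below is plain Python. It assumes that only the columns named in the Missing alerts have missing values, which is consistent because their counts add up to the reported 3825516. It also assumes that the 1.3 GiB total is itself rounded, so the per-record figure is only approximate.

```python
# Figures copied from this report.
N_ROWS = 650_000
N_COLUMNS = 42

# Per-column missing counts, as listed in the "Missing" alerts.
MISSING_PER_COLUMN = {
    "residence_area_cd": 12931,
    "country_cd": 19498,
    "dose_1_brand_cd": 61842,
    "dose_1_dt": 61842,
    "dose_2_brand_cd": 76725,
    "dose_2_dt": 77001,
    "dose_3_brand_cd": 316951,
    "dose_3_dt": 317613,
    "fully_vaccinated_dt": 65276,
    "vaccination_schedule_cd": 65276,
    "confirmed_case_dt": 409896,
    "previous_infection_dt": 633977,
    "test_type_cd": 412321,
    "variant_cd": 650000,
    "exitus_dt": 644367,
}

# Missing cells as a share of all 650000 x 42 cells.
missing_cells = sum(MISSING_PER_COLUMN.values())
missing_pct = 100 * missing_cells / (N_ROWS * N_COLUMNS)

# "Total size in memory" is reported rounded to 1.3 GiB, so this is approximate.
total_bytes = 1.3 * 1024**3
avg_record_kib = total_bytes / N_ROWS / 1024

print(missing_cells)                # 3825516
print(f"{missing_pct:.1f}%")        # 14.0%
print(f"{avg_record_kib:.1f} KiB")  # 2.1 KiB
```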
Variable types
Categorical | 10 |
---|---|
Numeric | 2 |
Boolean | 22 |
DateTime | 7 |
Unsupported | 1 |
country_cd has constant value "ESP" | Constant |
person_id has a high cardinality: 650000 distinct values | High cardinality |
fully_vaccinated_bl is highly correlated with vaccination_schedule_cd and 1 other field | High correlation |
pregnancy_bl is highly correlated with country_cd | High correlation |
dose_2_brand_cd is highly correlated with vaccination_schedule_cd and 2 other fields | High correlation |
hiv_infection_bl is highly correlated with country_cd | High correlation |
hypertension_bl is highly correlated with country_cd | High correlation |
residence_area_cd is highly correlated with country_cd | High correlation |
vaccination_schedule_cd is highly correlated with fully_vaccinated_bl and 3 other fields | High correlation |
copd_bl is highly correlated with country_cd | High correlation |
immunosuppression_bl is highly correlated with country_cd | High correlation |
solid_tumor_without_metastasis_bl is highly correlated with country_cd | High correlation |
chronic_kidney_disease_bl is highly correlated with country_cd | High correlation |
transplanted_bl is highly correlated with country_cd | High correlation |
dose_3_brand_cd is highly correlated with country_cd | High correlation |
dose_1_brand_cd is highly correlated with dose_2_brand_cd and 2 other fields | High correlation |
sickle_cell_disease_bl is highly correlated with country_cd | High correlation |
foreign_bl is highly correlated with country_cd | High correlation |
previous_infection_bl is highly correlated with country_cd | High correlation |
institutionalized_bl is highly correlated with country_cd | High correlation |
confirmed_case_bl is highly correlated with test_type_cd and 1 other field | High correlation |
sex_cd is highly correlated with country_cd | High correlation |
exitus_bl is highly correlated with country_cd | High correlation |
heart_failure_bl is highly correlated with country_cd | High correlation |
socecon_lvl_cd is highly correlated with country_cd | High correlation |
chronic_liver_disease_bl is highly correlated with country_cd | High correlation |
blood_cancer_bl is highly correlated with country_cd | High correlation |
test_type_cd is highly correlated with confirmed_case_bl and 1 other field | High correlation |
essential_worker_bl is highly correlated with country_cd | High correlation |
obesity_bl is highly correlated with country_cd | High correlation |
primary_immunodeficiency_bl is highly correlated with country_cd | High correlation |
country_cd is highly correlated with fully_vaccinated_bl and 29 other fields | High correlation |
diabetes_bl is highly correlated with country_cd | High correlation |
dose_1_brand_cd is highly correlated with dose_2_brand_cd and 1 other field | High correlation |
dose_2_brand_cd is highly correlated with dose_1_brand_cd and 1 other field | High correlation |
number_doses is highly correlated with fully_vaccinated_bl | High correlation |
fully_vaccinated_bl is highly correlated with number_doses | High correlation |
vaccination_schedule_cd is highly correlated with dose_1_brand_cd and 1 other field | High correlation |
residence_area_cd has 12931 (2.0%) missing values | Missing |
country_cd has 19498 (3.0%) missing values | Missing |
dose_1_brand_cd has 61842 (9.5%) missing values | Missing |
dose_1_dt has 61842 (9.5%) missing values | Missing |
dose_2_brand_cd has 76725 (11.8%) missing values | Missing |
dose_2_dt has 77001 (11.8%) missing values | Missing |
dose_3_brand_cd has 316951 (48.8%) missing values | Missing |
dose_3_dt has 317613 (48.9%) missing values | Missing |
fully_vaccinated_dt has 65276 (10.0%) missing values | Missing |
vaccination_schedule_cd has 65276 (10.0%) missing values | Missing |
confirmed_case_dt has 409896 (63.1%) missing values | Missing |
previous_infection_dt has 633977 (97.5%) missing values | Missing |
test_type_cd has 412321 (63.4%) missing values | Missing |
variant_cd has 650000 (100.0%) missing values | Missing |
exitus_dt has 644367 (99.1%) missing values | Missing |
person_id is uniformly distributed | Uniform |
person_id has unique values | Unique |
variant_cd has an unsupported type; check whether it needs cleaning or further analysis | Unsupported |
number_doses has 61842 (9.5%) zeros | Zeros |
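Many of the High correlation alerts above are based on Cramér's V. This report describes it as a bias-corrected measure following Bergsma (2013). Below is a minimal plain-Python sketch of that correction as it is usually stated: φ² is reduced by (k-1)(r-1)/(n-1), and the table dimensions are adjusted accordingly. pandas-profiling's own implementation may differ in details such as continuity correction. The function name and the toy data are hypothetical and do not come from this dataset.

```python
import math
from collections import Counter


def bias_corrected_cramers_v(x, y):
    """Bias-corrected Cramér's V for two equal-length sequences of labels.

    Sketch of the Bergsma (2013) correction; not pandas-profiling's exact code.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")

    row_totals = Counter(x)
    col_totals = Counter(y)
    r, k = len(row_totals), len(col_totals)
    if r < 2 or k < 2:
        return 0.0  # a constant column carries no association

    joint = Counter(zip(x, y))

    # Pearson chi-squared statistic, without continuity correction.
    chi2 = 0.0
    for a, row_total in row_totals.items():
        for b, col_total in col_totals.items():
            expected = row_total * col_total / n
            chi2 += (joint[(a, b)] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bergsma's correction: shrink phi^2 and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0


# Toy columns in which each brand determines the schedule code and vice versa.
brands = ["BP", "BP", "MD", "AZ"] * 50
schedules = ["BP-BP", "BP-BP", "MD-MD", "AZ-AZ"] * 50
print(bias_corrected_cramers_v(brands, schedules))  # close to 1.0
```

When two columns determine each other, the result is 1. For independent columns, the corrected φ² is clipped to 0, so the result is 0.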
Reproduction
Analysis started | 2023-01-25 16:52:05.611139 |
---|---|
Analysis finished | 2023-01-25 16:53:56.481764 |
Duration | 1 minute and 50.87 seconds |
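The reported duration is the difference between the two timestamps above. Here is a short check with Python's standard `datetime` module:

```python
from datetime import datetime

# Timestamps copied from the "Reproduction" table.
started = datetime.fromisoformat("2023-01-25 16:52:05.611139")
finished = datetime.fromisoformat("2023-01-25 16:53:56.481764")

elapsed = (finished - started).total_seconds()  # seconds between start and finish
minutes, seconds = divmod(elapsed, 60)
print(f"{int(minutes)} minute and {seconds:.2f} seconds")  # 1 minute and 50.87 seconds
```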
Software version | pandas-profiling v3.1.0 |
person_id
Categorical
Distinct | 650000 |
---|---|
Distinct (%) | 100.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 41.5 MiB |
fGlUXqtSAe | 1 |
---|---|
tCQpJfodiZ | 1 |
XebUQnIPfS | 1 |
SBAdHNBcQl | 1 |
CDQrGDjijP | 1 |
Other values (649995) | 649995 |
Length
Max length | 10 |
---|---|
Median length | 10 |
Mean length | 10 |
Min length | 10 |
Characters and Unicode
Total characters | 0 |
---|---|
Distinct characters | 0 |
Distinct categories | 0 |
Distinct scripts | 0 |
Distinct blocks | 0 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
Unique | 650000 |
---|---|
Unique (%) | 100.0% |
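Distinct counts how many different values occur, while Unique counts the values that occur exactly once. For a key column such as person_id, both equal the number of rows. For a low-cardinality column such as sex_cd, Unique is 0. The minimal sketch below assumes that missing values are represented as None and are left out of both counts. The sample values are the first rows of person_id and sex_cd shown in this report, and the helper name is hypothetical.

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (distinct, unique) counts, ignoring missing values (None)."""
    counts = Counter(v for v in values if v is not None)
    distinct = len(counts)                                      # values seen at least once
    unique = sum(1 for count in counts.values() if count == 1)  # values seen exactly once
    return distinct, unique


# First sample rows of person_id and sex_cd from this report.
person_ids = ["fGlUXqtSAe", "mTAiOjWlCE", "SqCZDQQsye", "ZBYnvWCNxj", "NxljjAbkxT"]
sex_codes = [0, 2, 2, 2, 1]

print(distinct_and_unique(person_ids))  # (5, 5)
print(distinct_and_unique(sex_codes))   # (3, 2)
```

In the full sex_cd column, every value repeats many thousands of times, so its Unique count is 0.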
Sample
1st row | fGlUXqtSAe |
---|---|
2nd row | mTAiOjWlCE |
3rd row | SqCZDQQsye |
4th row | ZBYnvWCNxj |
5th row | NxljjAbkxT |
Common Values
Value | Count | Frequency (%) |
fGlUXqtSAe | 1 | < 0.1% |
tCQpJfodiZ | 1 | < 0.1% |
XebUQnIPfS | 1 | < 0.1% |
SBAdHNBcQl | 1 | < 0.1% |
CDQrGDjijP | 1 | < 0.1% |
XReSfpbUAs | 1 | < 0.1% |
aOSCSoBUUR | 1 | < 0.1% |
fJANMilZpP | 1 | < 0.1% |
nspMwvNNSW | 1 | < 0.1% |
HgfgDxGVVw | 1 | < 0.1% |
Other values (649990) | 649990 | > 99.9% |
Length
Histogram of lengths of the category
Value | Count | Frequency (%) |
fgluxqtsae | 1 | < 0.1% |
bucdemoqgq | 1 | < 0.1% |
jyfykavxif | 1 | < 0.1% |
dcpffnddie | 1 | < 0.1% |
cckxwrvsym | 1 | < 0.1% |
sqczdqqsye | 1 | < 0.1% |
zbynvwcnxj | 1 | < 0.1% |
nxljjabkxt | 1 | < 0.1% |
ltdrvqgqiu | 1 | < 0.1% |
ubtfwplnsm | 1 | < 0.1% |
Other values (649990) | 649990 | > 99.9% |
Most occurring characters
Value | Count | Frequency (%) |
No values found. |
Most occurring categories
Value | Count | Frequency (%) |
No values found. |
Most occurring scripts
Value | Count | Frequency (%) |
No values found. |
Most occurring blocks
Value | Count | Frequency (%) |
No values found. |
age_nm
Real number (ℝ≥0)
Distinct | 111 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 54.97360769 |
Minimum | 5 |
---|---|
Maximum | 115 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 5.0 MiB |
Quantile statistics
Minimum | 5 |
---|---|
5-th percentile | 30 |
Q1 | 45 |
median | 55 |
Q3 | 65 |
95-th percentile | 80 |
Maximum | 115 |
Range | 110 |
Interquartile range (IQR) | 20 |
Descriptive statistics
Standard deviation | 14.99833228 |
---|---|
Coefficient of variation (CV) | 0.2728278697 |
Kurtosis | -0.01803060068 |
Mean | 54.97360769 |
Median Absolute Deviation (MAD) | 10 |
Skewness | 0.005977280489 |
Sum | 35732845 |
Variance | 224.9499711 |
Monotonicity | Not monotonic |
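The descriptive statistics for age_nm are internally consistent. The mean is Sum divided by the number of observations. The coefficient of variation is the standard deviation divided by the mean. The variance is the squared standard deviation, and the IQR is Q3 minus Q1. Here is a quick check in plain Python using the reported (rounded) values:

```python
# Figures copied from the age_nm statistics above.
n = 650_000              # observations; age_nm has no missing values
total = 35_732_845       # "Sum"
std = 14.99833228        # "Standard deviation"
q1, q3 = 45, 65          # "Q1" and "Q3"
minimum, maximum = 5, 115

mean = total / n                 # arithmetic mean
cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # "Range"

print(f"{mean:.8f}")      # 54.97360769
print(f"{cv:.4f}")        # 0.2728
print(f"{variance:.4f}")  # 224.9500
print(iqr, value_range)   # 20 110
```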
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%) |
54 | 17391 | 2.7% |
55 | 17371 | 2.7% |
53 | 17128 | 2.6% |
57 | 17070 | 2.6% |
56 | 17024 | 2.6% |
58 | 17015 | 2.6% |
52 | 16939 | 2.6% |
51 | 16756 | 2.6% |
59 | 16591 | 2.6% |
60 | 16436 | 2.5% |
Other values (101) | 480279 | 73.9% |
Minimum 10 values
Value | Count | Frequency (%) |
5 | 311 | < 0.1% |
6 | 73 | < 0.1% |
7 | 97 | < 0.1% |
8 | 115 | < 0.1% |
9 | 159 | < 0.1% |
10 | 206 | < 0.1% |
11 | 253 | < 0.1% |
12 | 259 | < 0.1% |
13 | 344 | < 0.1% |
14 | 412 | < 0.1% |
Maximum 10 values
Value | Count | Frequency (%) |
115 | 34 | < 0.1% |
114 | 6 | < 0.1% |
113 | 9 | < 0.1% |
112 | 14 | < 0.1% |
111 | 18 | < 0.1% |
110 | 27 | < 0.1% |
109 | 19 | < 0.1% |
108 | 40 | < 0.1% |
107 | 40 | < 0.1% |
106 | 52 | < 0.1% |
sex_cd
Categorical
Distinct | 3 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 36.0 MiB |
1 | 318999 |
---|---|
2 | 317949 |
0 | 13052 |
Length
Max length | 1 |
---|---|
Median length | 1 |
Mean length | 1 |
Min length | 1 |
Characters and Unicode
Total characters | 0 |
---|---|
Distinct characters | 0 |
Distinct categories | 0 |
Distinct scripts | 0 |
Distinct blocks | 0 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 0 |
---|---|
2nd row | 2 |
3rd row | 2 |
4th row | 2 |
5th row | 1 |
Common Values
Value | Count | Frequency (%) |
1 | 318999 | 49.1% |
2 | 317949 | 48.9% |
0 | 13052 | 2.0% |
Length
Histogram of lengths of the category
Pie chart
Value | Count | Frequency (%) |
1 | 318999 | 49.1% |
2 | 317949 | 48.9% |
0 | 13052 | 2.0% |
Most occurring characters
Value | Count | Frequency (%) |
No values found. |
Most occurring categories
Value | Count | Frequency (%) |
No values found. |
Most occurring scripts
Value | Count | Frequency (%) |
No values found. |
Most occurring blocks
Value | Count | Frequency (%) |
No values found. |
socecon_lvl_cd
Categorical
Distinct | 5 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 36.0 MiB |
3 | 236724 |
---|---|
4 | 222080 |
2 | 89107 |
5 | 88152 |
1 | 13937 |
Length
Max length | 1 |
---|---|
Median length | 1 |
Mean length | 1 |
Min length | 1 |
Characters and Unicode
Total characters | 0 |
---|---|
Distinct characters | 0 |
Distinct categories | 0 |
Distinct scripts | 0 |
Distinct blocks | 0 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 4 |
---|---|
2nd row | 5 |
3rd row | 3 |
4th row | 5 |
5th row | 2 |
Common Values
Value | Count | Frequency (%) |
3 | 236724 | 36.4% |
4 | 222080 | 34.2% |
2 | 89107 | 13.7% |
5 | 88152 | 13.6% |
1 | 13937 | 2.1% |
Length
Histogram of lengths of the category
Pie chart
Value | Count | Frequency (%) |
3 | 236724 | 36.4% |
4 | 222080 | 34.2% |
2 | 89107 | 13.7% |
5 | 88152 | 13.6% |
1 | 13937 | 2.1% |
Most occurring characters
Value | Count | Frequency (%) |
No values found. |
Most occurring categories
Value | Count | Frequency (%) |
No values found. |
Most occurring scripts
Value | Count | Frequency (%) |
No values found. |
Most occurring blocks
Value | Count | Frequency (%) |
No values found. |
residence_area_cd
Categorical
Distinct | 3 |
---|---|
Distinct (%) | < 0.1% |
Missing | 12931 |
Missing (%) | 2.0% |
Memory size | 36.8 MiB |
733 | 454865 |
---|---|
732 | 91194 |
731 | 91010 |
Length
Max length | 3 |
---|---|
Median length | 3 |
Mean length | 3 |
Min length | 3 |
Characters and Unicode
Total characters | 0 |
---|---|
Distinct characters | 0 |
Distinct categories | 0 |
Distinct scripts | 0 |
Distinct blocks | 0 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 731 |
---|---|
2nd row | 733 |
3rd row | 733 |
4th row | 733 |
5th row | 732 |
Common Values
Value | Count | Frequency (%) |
733 | 454865 | 70.0% |
732 | 91194 | 14.0% |
731 | 91010 | 14.0% |
(Missing) | 12931 | 2.0% |
Length
Histogram of lengths of the category
Pie chart
Value | Count | Frequency (%) |
733 | 454865 | 71.4% |
732 | 91194 | 14.3% |
731 | 91010 | 14.3% |
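The two frequency tables for a categorical column use different denominators. The Common Values table divides by all 650000 rows and lists (Missing) as its own row. The lower-case table that follows appears to divide by non-missing rows only. That is why 732 shows 14.0% in one table and 14.3% in the other for residence_area_cd. The sketch below reproduces both sets of percentages under that assumption. The helper name is hypothetical.

```python
# Counts for residence_area_cd, copied from the tables above.
COUNTS = {"733": 454865, "732": 91194, "731": 91010}
N_MISSING = 12931


def frequency_tables(counts, n_missing):
    """Map each value to (% of all rows, % of non-missing rows), rounded to 0.1."""
    n_present = sum(counts.values())
    n_total = n_present + n_missing  # 650000 rows
    rows = {}
    for value, count in counts.items():
        pct_all = round(100 * count / n_total, 1)        # "Common Values" denominator
        pct_present = round(100 * count / n_present, 1)  # lower-case table denominator
        rows[value] = (pct_all, pct_present)
    return rows


for value, (pct_all, pct_present) in frequency_tables(COUNTS, N_MISSING).items():
    print(f"{value}: {pct_all}% of all rows, {pct_present}% of non-missing rows")
```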
Most occurring characters
Value | Count | Frequency (%) |
No values found. |
Most occurring categories
Value | Count | Frequency (%) |
No values found. |
Most occurring scripts
Value | Count | Frequency (%) |
No values found. |
Most occurring blocks
Value | Count | Frequency (%) |
No values found. |
country_cd
Categorical
Distinct | 1 |
---|---|
Distinct (%) | < 0.1% |
Missing | 19498 |
Missing (%) | 3.0% |
Memory size | 36.7 MiB |
ESP | 630502 |
---|---|
Length
Max length | 3 |
---|---|
Median length | 3 |
Mean length | 3 |
Min length | 3 |
Characters and Unicode
Total characters | 0 |
---|---|
Distinct characters | 0 |
Distinct categories | 0 |
Distinct scripts | 0 |
Distinct blocks | 0 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | ESP |
---|---|
2nd row | ESP |
3rd row | ESP |
4th row | ESP |
5th row | ESP |
Common Values
Value | Count | Frequency (%) |
ESP | 630502 | 97.0% |
(Missing) | 19498 | 3.0% |
Length
Histogram of lengths of the category
Pie chart
Value | Count | Frequency (%) |
esp | 630502 | 100.0% |
Most occurring characters
Value | Count | Frequency (%) |
No values found. |
Most occurring categories
Value | Count | Frequency (%) |
No values found. |
Most occurring scripts
Value | Count | Frequency (%) |
No values found. |
Most occurring blocks
Value | Count | Frequency (%) |
No values found. |
foreign_bl
Boolean
Distinct | 2 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 634.9 KiB |
False | 552955 |
---|---|
True | 97045 |
Value | Count | Frequency (%) |
False | 552955 | 85.1% |
True | 97045 | 14.9% |
essential_worker_bl
Boolean
Distinct | 2 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 634.9 KiB |
False | 630624 |
---|---|
True | 19376 |
Value | Count | Frequency (%) |
False | 630624 | 97.0% |
True | 19376 | 3.0% |
institutionalized_bl
Boolean
Distinct | 2 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 634.9 KiB |
False | 630569 |
---|---|
True | 19431 |
Value | Count | Frequency (%) |
False | 630569 | 97.0% |
True | 19431 | 3.0% |
dose_1_brand_cd
Categorical
Distinct | 4 |
---|---|
Distinct (%) | < 0.1% |
Missing | 61842 |
Missing (%) | 9.5% |
Memory size | 35.0 MiB |
BP | 409711 |
---|---|
MD | 89065 |
AZ | 60006 |
JJ | 29376 |
Length
Max length | 2 |
---|---|
Median length | 2 |
Mean length | 2 |
Min length | 2 |
Characters and Unicode
Total characters | 0 |
---|---|
Distinct characters | 0 |
Distinct categories | 0 |
Distinct scripts | 0 |
Distinct blocks | 0 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | BP |
---|---|
2nd row | BP |
3rd row | BP |
4th row | BP |
5th row | BP |
Common Values
Value | Count | Frequency (%) |
BP | 409711 | 63.0% |
MD | 89065 | 13.7% |
AZ | 60006 | 9.2% |
JJ | 29376 | 4.5% |
(Missing) | 61842 | 9.5% |
Length
Histogram of lengths of the category
Pie chart
Value | Count | Frequency (%) |
bp | 409711 | 69.7% |
md | 89065 | 15.1% |
az | 60006 | 10.2% |
jj | 29376 | 5.0% |
Most occurring characters
Value | Count | Frequency (%) |
No values found. |
Most occurring categories
Value | Count | Frequency (%) |
No values found. |
Most occurring scripts
Value | Count | Frequency (%) |
No values found. |
Most occurring blocks
Value | Count | Frequency (%) |
No values found. |
dose_1_dt
DateTime
Distinct | 356 |
---|---|
Distinct (%) | 0.1% |
Missing | 61842 |
Missing (%) | 9.5% |
Memory size | 5.0 MiB |
Minimum | 2020-12-01 00:00:00 |
---|---|
Maximum | 2021-11-30 00:00:00 |
Histogram with fixed size bins (bins=50)
dose_2_brand_cd
Categorical
Distinct | 3 |
---|---|
Distinct (%) | < 0.1% |
Missing | 76725 |
Missing (%) | 11.8% |
Memory size | 34.6 MiB |
BP | 425326 |
---|---|
MD | 89567 |
AZ | 58382 |
Length
Max length | 2 |
---|---|
Median length | 2 |
Mean length | 2 |
Min length | 2 |
Characters and Unicode
Total characters | 0 |
---|---|
Distinct characters | 0 |
Distinct categories | 0 |
Distinct scripts | 0 |
Distinct blocks | 0 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | BP |
---|---|
2nd row | BP |
3rd row | BP |
4th row | BP |
5th row | BP |
Common Values
Value | Count | Frequency (%) |
BP | 425326 | 65.4% |
MD | 89567 | 13.8% |
AZ | 58382 | 9.0% |
(Missing) | 76725 | 11.8% |
Length
Histogram of lengths of the category
Pie chart
Value | Count | Frequency (%) |
bp | 425326 | 74.2% |
md | 89567 | 15.6% |
az | 58382 | 10.2% |
Most occurring characters
Value | Count | Frequency (%) |
No values found. |
Most occurring categories
Value | Count | Frequency (%) |
No values found. |
Most occurring scripts
Value | Count | Frequency (%) |
No values found. |
Most occurring blocks
Value | Count | Frequency (%) |
No values found. |
dose_2_dt
DateTime
Distinct | 476 |
---|---|
Distinct (%) | 0.1% |
Missing | 77001 |
Missing (%) | 11.8% |
Memory size | 5.0 MiB |
Minimum | 2020-12-22 00:00:00 |
---|---|
Maximum | 2022-06-05 00:00:00 |
Histogram with fixed size bins (bins=50)
dose_3_brand_cd
Categorical
Distinct | 2 |
---|---|
Distinct (%) | < 0.1% |
Missing | 316951 |
Missing (%) | 48.8% |
Memory size | 28.4 MiB |
BP | 299916 |
---|---|
MD | 33133 |
Length
Max length | 2 |
---|---|
Median length | 2 |
Mean length | 2 |
Min length | 2 |
Characters and Unicode
Total characters | 0 |
---|---|
Distinct characters | 0 |
Distinct categories | 0 |
Distinct scripts | 0 |
Distinct blocks | 0 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | BP |
---|---|
2nd row | BP |
3rd row | BP |
4th row | BP |
5th row | MD |
Common Values
Value | Count | Frequency (%) |
BP | 299916 | 46.1% |
MD | 33133 | 5.1% |
(Missing) | 316951 | 48.8% |
Length
Histogram of lengths of the category
Pie chart
Value | Count | Frequency (%) |
bp | 299916 | 90.1% |
md | 33133 | 9.9% |
Most occurring characters
Value | Count | Frequency (%) |
No values found. |
Most occurring categories
Value | Count | Frequency (%) |
No values found. |
Most occurring scripts
Value | Count | Frequency (%) |
No values found. |
Most occurring blocks
Value | Count | Frequency (%) |
No values found. |
dose_3_dt
DateTime
Distinct | 406 |
---|---|
Distinct (%) | 0.1% |
Missing | 317613 |
Missing (%) | 48.9% |
Memory size | 5.0 MiB |
Minimum | 2021-05-21 00:00:00 |
---|---|
Maximum | 2022-07-30 00:00:00 |
Histogram with fixed size bins (bins=50)
number_doses
Real number (ℝ≥0)
Distinct | 6 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 2.849112308 |
Minimum | 0 |
---|---|
Maximum | 5 |
Zeros | 61842 |
Zeros (%) | 9.5% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 5.0 MiB |
Quantile statistics
Minimum | 0 |
---|---|
5-th percentile | 0 |
Q1 | 2 |
median | 3 |
Q3 | 4 |
95-th percentile | 5 |
Maximum | 5 |
Range | 5 |
Interquartile range (IQR) | 2 |
Descriptive statistics
Standard deviation | 1.470351371 |
---|---|
Coefficient of variation (CV) | 0.5160735036 |
Kurtosis | -0.6839744431 |
Mean | 2.849112308 |
Median Absolute Deviation (MAD) | 1 |
Skewness | -0.1665344134 |
Sum | 1851923 |
Variance | 2.161933153 |
Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=6)
Common values
Value | Count | Frequency (%) |
2 | 227402 | 35.0% |
3 | 115562 | 17.8% |
5 | 115134 | 17.7% |
4 | 114901 | 17.7% |
0 | 61842 | 9.5% |
1 | 15159 | 2.3% |
Minimum 10 values
Value | Count | Frequency (%) |
0 | 61842 | 9.5% |
1 | 15159 | 2.3% |
2 | 227402 | 35.0% |
3 | 115562 | 17.8% |
4 | 114901 | 17.7% |
5 | 115134 | 17.7% |
Maximum 10 values
Value | Count | Frequency (%) |
5 | 115134 | 17.7% |
4 | 114901 | 17.7% |
3 | 115562 | 17.8% |
2 | 227402 | 35.0% |
1 | 15159 | 2.3% |
0 | 61842 | 9.5% |
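number_doses takes only six values, so its mean, standard deviation, and quantiles can be recomputed from the value counts without expanding 650000 rows. The plain-Python sketch below assumes that the quantiles use linear interpolation between order statistics, which is the default of `pandas.Series.quantile`. The helper name is hypothetical.

```python
import math

# Value counts of number_doses, copied from the frequency tables above.
COUNTS = {0: 61842, 1: 15159, 2: 227402, 3: 115562, 4: 114901, 5: 115134}


def summary_from_counts(counts):
    """Mean, sample standard deviation and quantiles from a value -> count table."""
    values = sorted(counts)
    n = sum(counts.values())
    total = sum(v * c for v, c in counts.items())
    mean = total / n
    # Sample variance, with n - 1 in the denominator.
    variance = sum(c * (v - mean) ** 2 for v, c in counts.items()) / (n - 1)

    def value_at(position):
        """Value at a 0-based position in the sorted column, without expanding it."""
        seen = 0
        for v in values:
            seen += counts[v]
            if position < seen:
                return v
        raise IndexError(position)

    def quantile(q):
        # Linear interpolation between neighbouring order statistics.
        pos = (n - 1) * q
        lower = math.floor(pos)
        upper = min(lower + 1, n - 1)
        frac = pos - lower
        return value_at(lower) + (value_at(upper) - value_at(lower)) * frac

    return {
        "n": n,
        "sum": total,
        "mean": mean,
        "std": math.sqrt(variance),
        "quantiles": {q: quantile(q) for q in (0.05, 0.25, 0.5, 0.75, 0.95)},
    }


print(summary_from_counts(COUNTS))
```

The resulting sum, mean, standard deviation, and 5th/25th/50th/75th/95th percentiles should match the figures reported for number_doses.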
fully_vaccinated_dt
DateTime
Distinct | 378 |
---|---|
Distinct (%) | 0.1% |
Missing | 65276 |
Missing (%) | 10.0% |
Memory size | 5.0 MiB |
Minimum | 2020-12-10 00:00:00 |
---|---|
Maximum | 2022-01-11 00:00:00 |
Histogram with fixed size bins (bins=50)
fully_vaccinated_bl
Boolean
Distinct | 2 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 634.9 KiB |
True | 584724 |
---|---|
False | 65276 |
Value | Count | Frequency (%) |
True | 584724 | 90.0% |
False | 65276 | 10.0% |
vaccination_schedule_cd
Categorical
Distinct | 4 |
---|---|
Distinct (%) | < 0.1% |
Missing | 65276 |
Missing (%) | 10.0% |
Memory size | 36.5 MiB |
BP-BP | 409276 |
---|---|
MD-MD | 87777 |
AZ-AZ | 58334 |
JJ | 29337 |
Length
Max length | 5 |
---|---|
Median length | 5 |
Mean length | 4.849482833 |
Min length | 2 |
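The mean length of vaccination_schedule_cd is a count-weighted average over its four codes. It is computed on the 584724 non-missing values, assuming (as the Distinct and Missing figures suggest) that missing values are excluded. Here is a short check:

```python
# Category counts for vaccination_schedule_cd, copied from this section.
COUNTS = {"BP-BP": 409276, "MD-MD": 87777, "AZ-AZ": 58334, "JJ": 29337}

# Missing values are not part of the length statistics.
n_present = sum(COUNTS.values())
mean_length = sum(len(code) * count for code, count in COUNTS.items()) / n_present

print(n_present)             # 584724
print(f"{mean_length:.4f}")  # 4.8495
```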
Characters and Unicode
Total characters | 0 |
---|---|
Distinct characters | 0 |
Distinct categories | 0 |
Distinct scripts | 0 |
Distinct blocks | 0 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | BP-BP |
---|---|
2nd row | BP-BP |
3rd row | BP-BP |
4th row | BP-BP |
5th row | BP-BP |
Common Values
Value | Count | Frequency (%) |
BP-BP | 409276 | 63.0% |
MD-MD | 87777 | 13.5% |
AZ-AZ | 58334 | 9.0% |
JJ | 29337 | 4.5% |
(Missing) | 65276 | 10.0% |
Length
Histogram of lengths of the category
Pie chart
Value | Count | Frequency (%) |
bp-bp | 409276 | 70.0% |
md-md | 87777 | 15.0% |
az-az | 58334 | 10.0% |
jj | 29337 | 5.0% |
Most occurring characters
Value | Count | Frequency (%) |
No values found. |
Most occurring categories
Value | Count | Frequency (%) |
No values found. |
Most occurring scripts
Value | Count | Frequency (%) |
No values found. |
Most occurring blocks
Value | Count | Frequency (%) |
No values found. |
confirmed_case_dt
DateTime
Distinct | 816 |
---|---|
Distinct (%) | 0.3% |
Missing | 409896 |
Missing (%) | 63.1% |
Memory size | 5.0 MiB |
Minimum | 2020-02-29 00:00:00 |
---|---|
Maximum | 2022-05-24 00:00:00 |
Histogram with fixed size bins (bins=50)
confirmed_case_bl
Boolean
Distinct | 2 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 634.9 KiB |
False | 409896 |
---|---|
True | 240104 |
Value | Count | Frequency (%) |
False | 409896 | 63.1% |
True | 240104 | 36.9% |
previous_infection_dt
DateTime
Distinct | 364 |
---|---|
Distinct (%) | 2.3% |
Missing | 633977 |
Missing (%) | 97.5% |
Memory size | 5.0 MiB |
Minimum | 2021-02-07 00:00:00 |
---|---|
Maximum | 2022-02-18 00:00:00 |
Histogram with fixed size bins (bins=50)
previous_infection_bl
Boolean
Distinct | 2 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 634.9 KiB |
False | 633977 |
---|---|
True | 16023 |
Value | Count | Frequency (%) |
False | 633977 | 97.5% |
True | 16023 | 2.5% |
test_type_cd
Categorical
Distinct | 3 |
---|---|
Distinct (%) | < 0.1% |
Missing | 412321 |
Missing (%) | 63.4% |
Memory size | 26.1 MiB |
PCR | 143984 |
---|---|
AG | 84152 |
other | 9543 |
Length
Max length | 5 |
---|---|
Median length | 3 |
Mean length | 2.72624422 |
Min length | 2 |
Characters and Unicode
Total characters | 0 |
---|---|
Distinct characters | 0 |
Distinct categories | 0 |
Distinct scripts | 0 |
Distinct blocks | 0 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | AG |
---|---|
2nd row | PCR |
3rd row | PCR |
4th row | AG |
5th row | PCR |
Common Values
Value | Count | Frequency (%) |
PCR | 143984 | 22.2% |
AG | 84152 | 12.9% |
other | 9543 | 1.5% |
(Missing) | 412321 | 63.4% |
Length
Histogram of lengths of the category
Pie chart
Value | Count | Frequency (%) |
pcr | 143984 | 60.6% |
ag | 84152 | 35.4% |
other | 9543 | 4.0% |
Most occurring characters
Value | Count | Frequency (%) |
No values found. |
Most occurring categories
Value | Count | Frequency (%) |
No values found. |
Most occurring scripts
Value | Count | Frequency (%) |
No values found. |
Most occurring blocks
Value | Count | Frequency (%) |
No values found. |
diabetes_bl
Boolean
Distinct | 2 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 634.9 KiB |
False | 617788 |
---|---|
True | 32212 |
Value | Count | Frequency (%) |
False | 617788 | 95.0% |
True | 32212 | 5.0% |
obesity_bl
Boolean
Distinct | 2 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 634.9 KiB |
False | 584838 |
---|---|
True | 65162 |
Value | Count | Frequency (%) |
False | 584838 | 90.0% |
True | 65162 | 10.0% |
heart_failure_bl
Boolean
Distinct | 2 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 634.9 KiB |
False | 639256 |
---|---|
True | 10744 |
Value | Count | Frequency (%) |
False | 639256 | 98.3% |
True | 10744 | 1.7% |
copd_bl
Boolean
Distinct | 2 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 634.9 KiB |
False | 622988 |
---|---|
True | 27012 |
Value | Count | Frequency (%) |
False | 622988 | 95.8% |
True | 27012 | 4.2% |
solid_tumor_without_metastasis_bl
Boolean
Distinct | 2 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 634.9 KiB |
False | 643445 |
---|---|
True | 6555 |
Value | Count | Frequency (%) |
False | 643445 | 99.0% |
True | 6555 | 1.0% |
chronic_kidney_disease_bl
Boolean
Distinct | 2 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 634.9 KiB |
False | 595959 |
---|---|
True | 54041 |
Value | Count | Frequency (%) |
False | 595959 | 91.7% |
True | 54041 | 8.3% |
sickle_cell_disease_bl
Boolean
Distinct | 2 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 634.9 KiB |
False | 584833 |
---|---|
True | 65167 |
Value | Count | Frequency (%) |
False | 584833 | 90.0% |
True | 65167 | 10.0% |
hypertension_bl
Boolean
Distinct | 2 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 634.9 KiB |
False | 574256 |
---|---|
True | 75744 |
Value | Count | Frequency (%) |
False | 574256 | 88.3% |
True | 75744 | 11.7% |
chronic_liver_disease_bl
Boolean
Distinct | 2 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 634.9 KiB |
False | 596059 |
---|---|
True | 53941 |
Value | Count | Frequency (%) |
False | 596059 | 91.7% |
True | 53941 | 8.3% |
blood_cancer_bl
Boolean
Distinct | 2 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 634.9 KiB |
False | 643571 |
---|---|
True | 6429 |
Value | Count | Frequency (%) |
False | 643571 | 99.0% |
True | 6429 | 1.0% |
transplanted_bl
Boolean
Distinct | 2 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 634.9 KiB |
False | 585408 |
---|---|
True | 64592 |
Value | Count | Frequency (%) |
False | 585408 | 90.1% |
True | 64592 | 9.9% |
hiv_infection_bl
Boolean
Distinct | 2 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 634.9 KiB |
False | 596297 |
---|---|
True | 53703 |
Value | Count | Frequency (%) |
False | 596297 | 91.7% |
True | 53703 | 8.3% |
primary_immunodeficiency_bl
Boolean
Distinct | 2 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 634.9 KiB |
False | 585184 |
---|---|
True | 64816 |
Value | Count | Frequency (%) |
False | 585184 | 90.0% |
True | 64816 | 10.0% |
immunosuppression_bl
Boolean
Distinct | 2 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 634.9 KiB |
False | 643472 |
---|---|
True | 6528 |
Value | Count | Frequency (%) |
False | 643472 | 99.0% |
True | 6528 | 1.0% |
pregnancy_bl
Boolean
Distinct | 2 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 634.9 KiB |
False | 641955 |
---|---|
True | 8045 |
Value | Count | Frequency (%) |
False | 641955 | 98.8% |
True | 8045 | 1.2% |
exitus_dt
DateTime
Distinct | 806 |
---|---|
Distinct (%) | 14.3% |
Missing | 644367 |
Missing (%) | 99.1% |
Memory size | 5.0 MiB |
Minimum | 2020-03-10 00:00:00 |
---|---|
Maximum | 2022-05-24 00:00:00 |
Histogram with fixed size bins (bins=50)
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution.
A simple visualization of nullity by column.
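The count view shows, for each column, how many cells are filled. Below is a text-only sketch. It assumes a bar whose length is proportional to the share of non-missing cells. The missing counts come from the Missing alerts, and the helper name is hypothetical.

```python
N_ROWS = 650_000

# Missing counts for a few columns, copied from the "Missing" alerts.
MISSING = {
    "dose_1_dt": 61842,
    "dose_3_dt": 317613,
    "test_type_cd": 412321,
    "exitus_dt": 644367,
    "variant_cd": 650000,
}


def nullity_bars(missing, n_rows, width=40):
    """Return (column, non-missing count, text bar) for each column."""
    bars = []
    for column, n_missing in missing.items():
        present = n_rows - n_missing
        # Bar length is proportional to the filled share; very sparse columns round to "".
        bar = "#" * round(width * present / n_rows)
        bars.append((column, present, bar))
    return bars


for column, present, bar in nullity_bars(MISSING, N_ROWS):
    print(f"{column:<14}{present:>8} {bar}")
```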
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion visually.
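A nullity matrix draws one row per record and marks each cell as filled or empty. Below is a minimal text rendering for three columns, using rows 0 to 9 of the First rows table. None marks a missing cell (NaN or NaT in the report), and the helper name is hypothetical.

```python
# (country_cd, confirmed_case_dt, test_type_cd) for rows 0-9 of "First rows".
COLUMNS = ("country_cd", "confirmed_case_dt", "test_type_cd")
ROWS = [
    ("ESP", None, None),
    ("ESP", None, None),
    ("ESP", "2022-01-01", "AG"),
    ("ESP", None, None),
    ("ESP", "2022-01-31", "PCR"),
    (None, None, None),
    ("ESP", None, None),
    ("ESP", None, None),
    ("ESP", None, None),
    ("ESP", "2021-12-14", "PCR"),
]


def nullity_matrix(rows):
    """One string per row: '#' for a present cell, '.' for a missing one."""
    return ["".join("." if cell is None else "#" for cell in row) for row in rows]


print(", ".join(COLUMNS))
for index, line in enumerate(nullity_matrix(ROWS)):
    print(f"{index:>2} {line}")
```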
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
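Nullity correlation is commonly computed as the Pearson correlation between the missingness indicators (1 = missing, 0 = present) of two columns. This is how the heatmap of the missingno library is usually described. The plain-Python sketch below assumes that definition rather than anything stated in this report. It runs on rows 0 to 9 of the First rows table.

```python
import math

# Rows 0-9 of "First rows"; None marks a missing cell.
COUNTRY = ["ESP"] * 5 + [None] + ["ESP"] * 4
CONFIRMED_DT = [None, None, "2022-01-01", None, "2022-01-31", None, None, None, None, "2021-12-14"]
TEST_TYPE = [None, None, "AG", None, "PCR", None, None, None, None, "PCR"]


def nullity_correlation(a, b):
    """Pearson correlation between the missingness indicators of two columns."""
    xs = [1.0 if v is None else 0.0 for v in a]
    ys = [1.0 if v is None else 0.0 for v in b]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return math.nan  # never (or always) missing: correlation is undefined
    return cov / math.sqrt(var_x * var_y)


print(nullity_correlation(CONFIRMED_DT, TEST_TYPE))  # 1.0 (up to rounding)
print(nullity_correlation(COUNTRY, CONFIRMED_DT))
```

In this small sample, confirmed_case_dt and test_type_cd are missing in exactly the same rows, so their nullity correlation is 1. Across all 650000 rows the correlation must be below 1, because their missing counts differ (409896 versus 412321).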
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
First rows
| | person_id | age_nm | sex_cd | socecon_lvl_cd | residence_area_cd | country_cd | foreign_bl | essential_worker_bl | institutionalized_bl | dose_1_brand_cd | dose_1_dt | dose_2_brand_cd | dose_2_dt | dose_3_brand_cd | dose_3_dt | number_doses | fully_vaccinated_dt | fully_vaccinated_bl | vaccination_schedule_cd | confirmed_case_dt | confirmed_case_bl | previous_infection_dt | previous_infection_bl | test_type_cd | variant_cd | diabetes_bl | obesity_bl | heart_failure_bl | copd_bl | solid_tumor_without_metastasis_bl | chronic_kidney_disease_bl | sickle_cell_disease_bl | hypertension_bl | chronic_liver_disease_bl | blood_cancer_bl | transplanted_bl | hiv_infection_bl | primary_immunodeficiency_bl | immunosuppression_bl | pregnancy_bl | exitus_dt | exitus_bl |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | fGlUXqtSAe | 48 | 0 | 4 | 731 | ESP | True | False | False | BP | 2021-04-19 | BP | 2021-05-09 | BP | 2021-10-16 | 4.0 | 2021-05-09 | True | BP-BP | NaT | False | NaT | False | NaN | NaN | False | False | False | False | False | True | False | False | False | False | False | False | False | False | False | NaT | False |
1 | mTAiOjWlCE | 28 | 2 | 5 | 733 | ESP | False | False | False | BP | 2021-05-17 | BP | 2021-06-07 | BP | 2021-12-31 | 5.0 | 2021-06-07 | True | BP-BP | NaT | False | NaT | False | NaN | NaN | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | NaT | False |
2 | SqCZDQQsye | 49 | 2 | 3 | 733 | ESP | False | False | False | BP | 2021-04-13 | BP | 2021-05-04 | BP | 2021-10-29 | 4.0 | 2021-05-04 | True | BP-BP | 2022-01-01 | True | NaT | False | AG | NaN | False | False | False | False | False | False | False | True | False | False | True | False | False | False | False | NaT | False |
3 | ZBYnvWCNxj | 61 | 2 | 5 | 733 | ESP | False | False | False | BP | 2021-02-06 | BP | 2021-02-28 | BP | 2021-08-14 | 3.0 | 2021-02-28 | True | BP-BP | NaT | False | NaT | False | NaN | NaN | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | NaT | False |
4 | NxljjAbkxT | 63 | 1 | 2 | 732 | ESP | False | False | False | BP | 2021-04-30 | BP | 2021-05-21 | MD | 2021-12-09 | 4.0 | 2021-05-21 | True | BP-BP | 2022-01-31 | True | NaT | False | PCR | NaN | False | True | False | False | False | False | False | True | True | False | False | False | False | False | False | NaT | False |
5 | ltdRvqgqIU | 55 | 1 | 3 | 731 | NaN | False | False | False | BP | 2021-06-13 | BP | 2021-07-03 | BP | 2021-12-16 | 3.0 | 2021-07-03 | True | BP-BP | NaT | False | NaT | False | NaN | NaN | False | False | False | True | False | False | False | False | False | False | False | False | False | False | False | NaT | False |
6 | BUCDemoQgq | 49 | 1 | 3 | 733 | ESP | False | False | False | BP | 2021-04-04 | BP | 2021-04-25 | BP | 2021-09-28 | 3.0 | 2021-04-25 | True | BP-BP | NaT | False | NaT | False | NaN | NaN | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | NaT | False |
7 | ubTFwpLnsm | 45 | 2 | 4 | 732 | ESP | False | True | False | BP | 2021-05-13 | BP | 2021-06-04 | BP | 2021-12-21 | 3.0 | 2021-06-04 | True | BP-BP | NaT | False | NaT | False | NaN | NaN | False | False | False | False | False | False | False | False | True | False | False | False | True | False | False | NaT | False |
8 | YdCCUKjYLR | 72 | 2 | 4 | 733 | ESP | False | False | False | BP | 2021-02-16 | BP | 2021-03-07 | BP | 2021-09-05 | 3.0 | 2021-03-07 | True | BP-BP | NaT | False | NaT | False | NaN | NaN | False | False | False | False | False | False | False | True | False | False | True | False | False | False | False | NaT | False |
9 | cCKxWrvsym | 49 | 2 | 4 | 733 | ESP | False | False | False | BP | 2021-03-06 | BP | 2021-03-27 | BP | 2021-08-25 | 5.0 | 2021-03-27 | True | BP-BP | 2021-12-14 | True | NaT | False | PCR | NaN | False | True | False | False | False | False | False | True | False | False | False | True | False | True | False | NaT | False |
Last rows
| person_id | age_nm | sex_cd | socecon_lvl_cd | residence_area_cd | country_cd | foreign_bl | essential_worker_bl | institutionalized_bl | dose_1_brand_cd | dose_1_dt | dose_2_brand_cd | dose_2_dt | dose_3_brand_cd | dose_3_dt | number_doses | fully_vaccinated_dt | fully_vaccinated_bl | vaccination_schedule_cd | confirmed_case_dt | confirmed_case_bl | previous_infection_dt | previous_infection_bl | test_type_cd | variant_cd | diabetes_bl | obesity_bl | heart_failure_bl | copd_bl | solid_tumor_without_metastasis_bl | chronic_kidney_disease_bl | sickle_cell_disease_bl | hypertension_bl | chronic_liver_disease_bl | blood_cancer_bl | transplanted_bl | hiv_infection_bl | primary_immunodeficiency_bl | immunosuppression_bl | pregnancy_bl | exitus_dt | exitus_bl |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
649990 | PBkHoSLhWm | 59 | 2 | 3 | 733 | ESP | False | False | False | NaN | NaT | NaN | NaT | NaN | NaT | 0.0 | NaT | False | NaN | NaT | False | NaT | False | NaN | NaN | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | NaT | False |
649991 | yeyBlYEyat | 66 | 1 | 4 | 733 | ESP | True | False | True | NaN | NaT | NaN | NaT | NaN | NaT | 0.0 | NaT | False | NaN | NaT | False | NaT | False | NaN | NaN | False | False | False | True | False | False | False | False | True | False | False | True | True | False | False | NaT | False |
649992 | uuwvWaKrgy | 55 | 2 | 3 | 733 | ESP | False | False | False | NaN | NaT | NaN | NaT | NaN | NaT | 0.0 | NaT | False | NaN | 2021-12-04 | True | NaT | False | PCR | NaN | False | False | False | False | False | True | True | False | False | False | False | False | False | False | False | NaT | False |
649993 | JwMmeuicLD | 69 | 2 | 2 | 732 | ESP | True | False | False | NaN | NaT | NaN | NaT | NaN | NaT | 0.0 | NaT | False | NaN | NaT | False | NaT | False | NaN | NaN | False | False | False | True | False | False | False | False | False | False | False | False | False | False | False | NaT | False |
649994 | qTWunusBds | 59 | 2 | 2 | 733 | ESP | False | False | False | NaN | NaT | NaN | NaT | NaN | NaT | 0.0 | NaT | False | NaN | 2022-02-13 | True | NaT | False | PCR | NaN | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | NaT | False |
649995 | EkqloIPHyk | 38 | 1 | 4 | 732 | ESP | False | False | False | NaN | NaT | NaN | NaT | NaN | NaT | 0.0 | NaT | False | NaN | NaT | False | NaT | False | NaN | NaN | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | NaT | False |
649996 | KCITuUpkyV | 59 | 1 | 4 | 733 | ESP | False | False | False | NaN | NaT | NaN | NaT | NaN | NaT | 0.0 | NaT | False | NaN | 2020-10-07 | True | NaT | False | AG | NaN | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | NaT | False |
649997 | CHZApTPduO | 68 | 2 | 3 | 732 | ESP | False | False | False | NaN | NaT | NaN | NaT | NaN | NaT | 0.0 | NaT | False | NaN | NaT | False | NaT | False | NaN | NaN | False | False | False | False | False | False | False | True | True | False | False | False | False | False | False | NaT | False |
649998 | ehMdaYLUZw | 37 | 2 | 5 | 733 | ESP | False | False | False | NaN | NaT | NaN | NaT | NaN | NaT | 0.0 | NaT | False | NaN | NaT | False | NaT | False | NaN | NaN | False | False | False | False | False | False | False | False | False | False | True | False | False | False | False | NaT | False |
649999 | KmReUzIdWC | 62 | 1 | 3 | 733 | ESP | False | False | False | NaN | NaT | NaN | NaT | NaN | NaT | 0.0 | NaT | False | NaN | 2021-07-03 | True | NaT | False | PCR | NaN | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | NaT | False |