Commit aca76b37 authored by Jeffrey Post
%% Cell type:markdown id: tags:
# Imputing missing values
%% Cell type:markdown id: tags:
## 0. Introduction to this notebook
### - Objective:
The goal is to resolve [Issue 13: DATA - How to deal with missing values](https://renkulab.io/gitlab/jeffrey.post/ssa_hiv_ml/-/issues/13)
Following the compilation of the main dataset, some values are still missing. The goal of this notebook is two-fold:
1. See how many missing values there are and for which countries and surveys
2. Look at various imputation methods
NOTE: R_mice.ipynb is an R notebook with the same purpose, but using R libraries.
### - DATA used:
- We use the main dataset "all.xlsx"
### - Methods:
- Analyze how many missing values there are and where
- Simple imputer (0s, median, mean)
- Iterative imputer
### - Outputs:
- Main dataset with no missing values
%% Cell type:code id: tags:
``` python
DATA_DIR = '/home/jovyan/ssa_hiv_ml/data/new/compilation_output/'
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
# Iterative imputer (MICE)
from sklearn.experimental import enable_iterative_imputer
from sklearn.impute import IterativeImputer
```
%% Cell type:code id: tags:
``` python
# Final contains all surveys in one dataset - no imputation yet
final=pd.read_excel(DATA_DIR+'all.xlsx')
```
%% Cell type:code id: tags:
``` python
final
```
%% Output
Country Survey Use.of.contraception Ever.paid.for.sex \
0 Angola 2015 13.3 8.9
1 Benin 2017 14.4 8.3
2 Benin 2011 14.0 8.2
3 Benin 2006 17.1 NaN
4 Benin 2001 17.8 NaN
.. ... ... ... ...
78 Zambia 2007 29.9 NaN
79 Zambia 2001 24.6 21.3
80 Zimbabwe 2015 48.6 18.4
81 Zimbabwe 2010 41.3 16.7
82 Zimbabwe 2005 40.1 NaN
Wife.beating.justified.W Wife.beating.justified.M Unprotected.paid.sex \
0 25.2 19.8 1.2540
1 31.8 14.9 1.9584
2 16.2 14.8 1.4035
3 46.5 13.5 0.9360
4 60.4 31.2 0.4000
.. ... ... ...
78 61.9 48.2 1.9800
79 85.4 69.3 5.1604
80 38.7 32.3 0.3430
81 39.6 33.0 0.3660
82 47.7 36.3 1.0222
General.fertility.rate Married.or.in.union.W Married.or.in.union.M ... \
0 21.6 55.3 47.6 ...
1 19.7 70.1 54.6 ...
2 17.5 70.4 56.8 ...
3 20.4 75.3 59.1 ...
4 19.3 73.4 55.9 ...
.. ... ... ... ...
78 21.4 61.6 52.8 ...
79 20.5 61.3 55.1 ...
80 14.4 61.8 49.9 ...
81 15.0 62.2 50.4 ...
82 13.7 57.7 45.6 ...
iso cow Age Wealth.index.Gini ART Christian Muslim \
0 AGO ANG 44.810127 51.3 25 90.513627 0.209644
1 BEN BEN 39.467312 47.8 52 52.994350 23.841808
2 BEN BEN 37.595908 43.4 39 40.000000 26.470000
3 BEN BEN 37.376238 38.6 0 42.750000 22.880000
4 BEN BEN 40.243902 38.6 0 43.220000 24.700000
.. ... ... ... ... ... ... ...
78 ZMB ZAM 41.105769 54.6 19 87.000000 0.550000
79 ZMB ZAM 44.597701 42.1 0 81.940000 0.680000
80 ZWE ZIM 40.807175 44.3 64 86.953063 0.875099
81 ZWE ZIM 42.035398 43.2 30 81.880000 1.070000
82 ZWE ZIM 46.621622 43.2 2 79.740000 0.580000
Folk.Religion Unaffiliated.Religion Other.Religion
0 4.140461 5.136268 0.000000
1 18.079096 5.084746 0.000000
2 28.400000 3.340000 1.800000
3 30.160000 3.830000 0.380000
4 28.700000 2.050000 1.330000
.. ... ... ...
78 10.490000 0.180000 1.750000
79 15.210000 0.210000 1.870000
80 3.818616 7.875895 0.477327
81 13.540000 2.560000 0.940000
82 13.700000 2.040000 3.920000
[83 rows x 50 columns]
%% Cell type:markdown id: tags:
## 1. Finding missing values
%% Cell type:markdown id: tags:
### I. Missing values per country/survey
%% Cell type:code id: tags:
``` python
# List countries and the number of buckets they are present in:
cb=final.loc[:,['Country','GY']].value_counts().unstack().notnull().sum(axis=1)
```
%% Cell type:code id: tags:
``` python
# Now list the countries and the number of missing values per survey
for country in final.Country.unique():
    n_surveys = cb[country]
    surveys = final[final.Country == country].reset_index(drop=True)
    print("{} has {} surveys".format(country, n_surveys))
    for i in range(n_surveys):
        print("Survey {} of {} has {} missing values".format(
            i + 1, surveys.Survey[i], surveys.isna().sum(axis=1)[i]))
```
%% Output
Angola has 1 surveys
Survey 1 of 2015 has 0 missing values
Benin has 4 surveys
Survey 1 of 2017 has 0 missing values
Survey 2 of 2011 has 0 missing values
Survey 3 of 2006 has 3 missing values
Survey 4 of 2001 has 8 missing values
Burkina Faso has 2 surveys
Survey 1 of 2010 has 0 missing values
Survey 2 of 2003 has 8 missing values
Burundi has 2 surveys
Survey 1 of 2016 has 0 missing values
Survey 2 of 2010 has 0 missing values
Cameroon has 3 surveys
Survey 1 of 2018 has 0 missing values
Survey 2 of 2011 has 0 missing values
Survey 3 of 2004 has 4 missing values
Chad has 2 surveys
Survey 1 of 2014 has 0 missing values
Survey 2 of 2004 has 6 missing values
Congo has 2 surveys
Survey 1 of 2011 has 0 missing values
Survey 2 of 2005 has 6 missing values
Congo Democratic Republic has 2 surveys
Survey 1 of 2013 has 0 missing values
Survey 2 of 2007 has 3 missing values
Cote d'Ivoire has 1 surveys
Survey 1 of 2011 has 0 missing values
Ethiopia has 4 surveys
Survey 1 of 2016 has 0 missing values
Survey 2 of 2011 has 0 missing values
Survey 3 of 2005 has 3 missing values
Survey 4 of 2000 has 13 missing values
Gabon has 2 surveys
Survey 1 of 2012 has 0 missing values
Survey 2 of 2000 has 20 missing values
Gambia has 1 surveys
Survey 1 of 2013 has 0 missing values
Ghana has 3 surveys
Survey 1 of 2014 has 0 missing values
Survey 2 of 2008 has 1 missing values
Survey 3 of 2003 has 4 missing values
Kenya has 3 surveys
Survey 1 of 2014 has 0 missing values
Survey 2 of 2008 has 3 missing values
Survey 3 of 2003 has 2 missing values
Lesotho has 3 surveys
Survey 1 of 2014 has 0 missing values
Survey 2 of 2009 has 4 missing values
Survey 3 of 2004 has 5 missing values
Liberia has 2 surveys
Survey 1 of 2013 has 0 missing values
Survey 2 of 2007 has 2 missing values
Malawi has 3 surveys
Survey 1 of 2015 has 0 missing values
Survey 2 of 2010 has 0 missing values
Survey 3 of 2004 has 3 missing values
Mali has 4 surveys
Survey 1 of 2018 has 0 missing values
Survey 2 of 2012 has 0 missing values
Survey 3 of 2006 has 3 missing values
Survey 4 of 2001 has 9 missing values
Mozambique has 2 surveys
Survey 1 of 2011 has 0 missing values
Survey 2 of 2003 has 2 missing values
Namibia has 3 surveys
Survey 1 of 2013 has 0 missing values
Survey 2 of 2006 has 1 missing values
Survey 3 of 2000 has 9 missing values
Niger has 2 surveys
Survey 1 of 2012 has 0 missing values
Survey 2 of 2006 has 3 missing values
Nigeria has 4 surveys
Survey 1 of 2018 has 2 missing values
Survey 2 of 2013 has 0 missing values
Survey 3 of 2008 has 1 missing values
Survey 4 of 2003 has 4 missing values
Rwanda has 3 surveys
Survey 1 of 2014 has 0 missing values
Survey 2 of 2010 has 0 missing values
Survey 3 of 2007 has 27 missing values
Senegal has 3 surveys
Survey 1 of 2018 has 8 missing values
Survey 2 of 2017 has 0 missing values
Survey 3 of 2016 has 0 missing values
Sierra Leone has 2 surveys
Survey 1 of 2013 has 0 missing values
Survey 2 of 2008 has 1 missing values
Togo has 1 surveys
Survey 1 of 2013 has 0 missing values
Uganda has 4 surveys
Survey 1 of 2016 has 0 missing values
Survey 2 of 2011 has 0 missing values
Survey 3 of 2006 has 3 missing values
Survey 4 of 2000 has 6 missing values
Zambia has 4 surveys
Survey 1 of 2018 has 0 missing values
Survey 2 of 2013 has 0 missing values
Survey 3 of 2007 has 1 missing values
Survey 4 of 2001 has 7 missing values
Zimbabwe has 3 surveys
Survey 1 of 2015 has 0 missing values
Survey 2 of 2010 has 2 missing values
Survey 3 of 2005 has 3 missing values
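%% Cell type:markdown id: tags:
The same per-survey count can also be obtained without an explicit loop. The following is only a minimal sketch on a made-up frame (`toy` and its values are illustrative assumptions, not part of the dataset): indexing by `Country` and `Survey` and summing `isna()` across columns gives one count per survey.
%% Cell type:code id: tags:
```python
import numpy as np
import pandas as pd

# Toy stand-in for `final` (hypothetical values): two countries, three surveys.
toy = pd.DataFrame({
    "Country": ["Benin", "Benin", "Chad"],
    "Survey": [2017, 2006, 2014],
    "Ever.paid.for.sex": [8.3, np.nan, 4.1],
    "Literate.W": [40.0, np.nan, 22.0],
})

# One row per (Country, Survey); count NaNs across the indicator columns.
missing_per_survey = (
    toy.set_index(["Country", "Survey"])
       .isna()
       .sum(axis=1)
       .rename("n_missing")
)
print(missing_per_survey)
```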
%% Cell type:markdown id: tags:
### II. Missing values per indicator
%% Cell type:code id: tags:
``` python
# Overall missing values per indicator
final.isnull().sum()
```
%% Output
Country 0
Survey 0
Use.of.contraception 0
Ever.paid.for.sex 21
Wife.beating.justified.W 4
Wife.beating.justified.M 10
Unprotected.paid.sex 4
General.fertility.rate 0
Married.or.in.union.W 0
Married.or.in.union.M 1
Number.of.co.wives.0 3
Number.of.co.wives.1 3
Number.of.co.wives.2 3
Number.of.wives.1 2
Number.of.wives.2 2
First.sex.by.15.W 1
First.sex.by.15.M 2
Knowledge.about.AIDS.W 7
Knowledge.about.AIDS.M 15
Buy.from.shopkeeper.with.AIDS.W 13
Buy.from.shopkeeper.with.AIDS.M 13
Justified.condom.if.husband.has.STI.W 18
Justified.condom.if.husband.has.STI.M 23
Unprotected.higher.risk.sex.W 0
Unprotected.higher.risk.sex.M 0
Mean.number.of.sexual.partners.W.Normalized 23
Mean.number.of.sexual.partners.M.Normalized 23
Ever.receiving.HIV.test.W 8
Ever.receiving.HIV.test.M 5
Men.circumcised 0
Married.women.participating.in.decisions 7
Married.women.who.disagree.with.wife.beating 6
Literate.W 2
Literate.M 3
Access.to.media.W 2
Access.to.media.M 3
Women.who.work 1
Men.who.work 2
Female.headed.household 0
GY 0
iso 0
cow 0
Age 0
Wealth.index.Gini 0
ART 0
Christian 0
Muslim 0
Folk.Religion 0
Unaffiliated.Religion 0
Other.Religion 0
dtype: int64
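%% Cell type:markdown id: tags:
Raw counts are easier to compare as shares of the 83 surveys (for example, 21 missing values for `Ever.paid.for.sex` is roughly a quarter of all rows). A minimal sketch of that conversion, run on a made-up frame (`toy` is an assumption for illustration only):
%% Cell type:code id: tags:
```python
import numpy as np
import pandas as pd

# Toy stand-in for `final` (hypothetical values).
toy = pd.DataFrame({
    "Use.of.contraception": [13.3, 14.4, 14.0, 17.1],
    "Ever.paid.for.sex": [8.9, 8.3, np.nan, np.nan],
    "Literate.W": [40.0, np.nan, 35.0, 30.0],
})

# isna().mean() is the fraction of missing rows per column; express it in percent.
missing_share = toy.isna().mean().mul(100).sort_values(ascending=False)
# Keep only the indicators that actually have gaps.
missing_share = missing_share[missing_share > 0]
print(missing_share.round(1))
```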
%% Cell type:code id: tags:
``` python
# Same but grouping by Country
final.set_index('Country').isnull().groupby(final.set_index('Country').index).sum()
```
%% Output
Survey Use.of.contraception Ever.paid.for.sex \
Country
Angola 0 0 0
Benin 0 0 2
Burkina Faso 0 0 1
Burundi 0 0 0
Cameroon 0 0 0
Chad 0 0 0
Congo 0 0 1
Congo Democratic Republic 0 0 1
Cote d'Ivoire 0 0 0
Ethiopia 0 0 1
Gabon 0 0 1
Gambia 0 0 0
Ghana 0 0 1
Kenya 0 0 1
Lesotho 0 0 0
Liberia 0 0 1
Malawi 0 0 0
Mali 0 0 2
Mozambique 0 0 0
Namibia 0 0 1
Niger 0 0 1
Nigeria 0 0 1
Rwanda 0 0 1
Senegal 0 0 1
Sierra Leone 0 0 1
Togo 0 0 0
Uganda 0 0 1
Zambia 0 0 1
Zimbabwe 0 0 1
Wife.beating.justified.W Wife.beating.justified.M \
Country
Angola 0 0
Benin 0 0
Burkina Faso 0 0
Burundi 0 0
Cameroon 0 1
Chad 1 1
Congo 0 1
Congo Democratic Republic 0 1
Cote d'Ivoire 0 0
Ethiopia 0 0
Gabon 1 1
Gambia 0 0
Ghana 0 0
Kenya 0 0
Lesotho 0 0
Liberia 0 0
Malawi 0 0
Mali 0 1
Mozambique 0 0
Namibia 1 0
Niger 0 1
Nigeria 0 0
Rwanda 1 1
Senegal 0 2
Sierra Leone 0 0
Togo 0 0
Uganda 0 0
Zambia 0 0
Zimbabwe 0 0
Unprotected.paid.sex General.fertility.rate \
Country
Angola 0 0
Benin 0 0
Burkina Faso 0 0
Burundi 0 0
Cameroon 0 0
Chad 0 0
Congo 0 0
Congo Democratic Republic 0 0
Cote d'Ivoire 0 0
Ethiopia 1 0
Gabon 1 0
Gambia 0 0
Ghana 0 0
Kenya 0 0
Lesotho 0 0
Liberia 0 0
Malawi 0 0
Mali 0 0
Mozambique 0 0
Namibia 0 0
Niger 0 0
Nigeria 0 0
Rwanda 1 0
Senegal 1 0
Sierra Leone 0 0
Togo 0 0
Uganda 0 0
Zambia 0 0
Zimbabwe 0 0
Married.or.in.union.W Married.or.in.union.M \
Country
Angola 0 0
Benin 0 0
Burkina Faso 0 0
Burundi 0 0
Cameroon 0 0
Chad 0 0
Congo 0 0
Congo Democratic Republic 0 0
Cote d'Ivoire 0 0
Ethiopia 0 0
Gabon 0 0
Gambia 0 0
Ghana 0 0
Kenya 0 0
Lesotho 0 0
Liberia 0 0
Malawi 0 0
Mali 0 0
Mozambique 0 0
Namibia 0 0
Niger 0 0
Nigeria 0 0
Rwanda 0 0
Senegal 0 1
Sierra Leone 0 0
Togo 0 0
Uganda 0 0
Zambia 0 0
Zimbabwe 0 0
Number.of.co.wives.0 ... iso cow Age \
Country ...
Angola 0 ... 0 0 0
Benin 0 ... 0 0 0
Burkina Faso 0 ... 0 0 0
Burundi 0 ... 0 0 0
Cameroon 0 ... 0 0 0
Chad 0 ... 0 0 0
Congo 0 ... 0 0 0
Congo Democratic Republic 0 ... 0 0 0
Cote d'Ivoire 0 ... 0 0 0
Ethiopia 0 ... 0 0 0
Gabon 0 ... 0 0 0
Gambia 0 ... 0 0 0
Ghana 0 ... 0 0 0
Kenya 0 ... 0 0 0
Lesotho 2 ... 0 0 0
Liberia 0 ... 0 0 0
Malawi 0 ... 0 0 0
Mali 0 ... 0 0 0
Mozambique 0 ... 0 0 0
Namibia 0 ... 0 0 0
Niger 0 ... 0 0 0
Nigeria 0 ... 0 0 0
Rwanda 1 ... 0 0 0
Senegal 0 ... 0 0 0
Sierra Leone 0 ... 0 0 0
Togo 0 ... 0 0 0
Uganda 0 ... 0 0 0
Zambia 0 ... 0 0 0
Zimbabwe 0 ... 0 0 0
Wealth.index.Gini ART Christian Muslim \
Country
Angola 0 0 0 0
Benin 0 0 0 0
Burkina Faso 0 0 0 0
Burundi 0 0 0 0
Cameroon 0 0 0 0
Chad 0 0 0 0
Congo 0 0 0 0
Congo Democratic Republic 0 0 0 0
Cote d'Ivoire 0 0 0 0
Ethiopia 0 0 0 0
Gabon 0 0 0 0
Gambia 0 0 0 0
Ghana 0 0 0 0
Kenya 0 0 0 0
Lesotho 0 0 0 0
Liberia 0 0 0 0
Malawi 0 0 0 0
Mali 0 0 0 0
Mozambique 0 0 0 0
Namibia 0 0 0 0
Niger 0 0 0 0
Nigeria 0 0 0 0
Rwanda 0 0 0 0
Senegal 0 0 0 0
Sierra Leone 0 0 0 0
Togo 0 0 0 0
Uganda 0 0 0 0
Zambia 0 0 0 0
Zimbabwe 0 0 0 0
Folk.Religion Unaffiliated.Religion \
Country
Angola 0 0
Benin 0 0
Burkina Faso 0 0
Burundi 0 0
Cameroon 0 0
Chad 0 0
Congo 0 0
Congo Democratic Republic 0 0
Cote d'Ivoire 0 0
Ethiopia 0 0
Gabon 0 0
Gambia 0 0
Ghana 0 0
Kenya 0 0
Lesotho 0 0
Liberia 0 0
Malawi 0 0
Mali 0 0
Mozambique 0 0
Namibia 0 0
Niger 0 0
Nigeria 0 0
Rwanda 0 0
Senegal 0 0
Sierra Leone 0 0
Togo 0 0
Uganda 0 0
Zambia 0 0
Zimbabwe 0 0
Other.Religion
Country
Angola 0
Benin 0
Burkina Faso 0
Burundi 0
Cameroon 0
Chad 0
Congo 0
Congo Democratic Republic 0
Cote d'Ivoire 0
Ethiopia 0
Gabon 0
Gambia 0
Ghana 0
Kenya 0
Lesotho 0
Liberia 0
Malawi 0
Mali 0
Mozambique 0
Namibia 0
Niger 0
Nigeria 0
Rwanda 0
Senegal 0
Sierra Leone 0
Togo 0
Uganda 0
Zambia 0
Zimbabwe 0
[29 rows x 49 columns]
%% Cell type:markdown id: tags:
### III. Other ways of understanding missing data
%% Cell type:code id: tags:
``` python
# Check whether any country lacks data for an indicator across all of its surveys.
# Number of missing values per indicator per country (i.e. the number of surveys in which the indicator is missing):
#final.set_index('Country').isnull().groupby(final.set_index('Country').index).sum()
# Number of surveys per country; comparing the two shows whether an indicator is missing in every survey of a country:
#final.set_index('Country').groupby(final.set_index('Country').index).size()
# Now compare.
# NOTE: comparing a DataFrame with a Series like this aligns the Series index (the country names) with the DataFrame columns, not its rows. That is why country names appear as columns in the output and every entry is False (recent pandas versions may raise an error here instead).
final.set_index('Country').isnull().groupby(final.set_index('Country').index).sum()<final.set_index('Country').groupby(final.set_index('Country').index).size()
```
%% Output
ART Access.to.media.M Access.to.media.W Age \
Country
Angola False False False False
Benin False False False False
Burkina Faso False False False False
Burundi False False False False
Cameroon False False False False
Chad False False False False
Congo False False False False
Congo Democratic Republic False False False False
Cote d'Ivoire False False False False
Ethiopia False False False False
Gabon False False False False
Gambia False False False False
Ghana False False False False
Kenya False False False False
Lesotho False False False False
Liberia False False False False
Malawi False False False False
Mali False False False False
Mozambique False False False False
Namibia False False False False
Niger False False False False
Nigeria False False False False
Rwanda False False False False
Senegal False False False False
Sierra Leone False False False False
Togo False False False False
Uganda False False False False
Zambia False False False False
Zimbabwe False False False False
Angola Benin Burkina Faso Burundi \
Country
Angola False False False False
Benin False False False False
Burkina Faso False False False False
Burundi False False False False
Cameroon False False False False
Chad False False False False
Congo False False False False
Congo Democratic Republic False False False False
Cote d'Ivoire False False False False
Ethiopia False False False False
Gabon False False False False
Gambia False False False False
Ghana False False False False
Kenya False False False False
Lesotho False False False False
Liberia False False False False
Malawi False False False False
Mali False False False False
Mozambique False False False False
Namibia False False False False
Niger False False False False
Nigeria False False False False
Rwanda False False False False
Senegal False False False False
Sierra Leone False False False False
Togo False False False False
Uganda False False False False
Zambia False False False False
Zimbabwe False False False False
Buy.from.shopkeeper.with.AIDS.M \
Country
Angola False
Benin False
Burkina Faso False
Burundi False
Cameroon False
Chad False
Congo False
Congo Democratic Republic False
Cote d'Ivoire False
Ethiopia False
Gabon False
Gambia False
Ghana False
Kenya False
Lesotho False
Liberia False
Malawi False
Mali False
Mozambique False
Namibia False
Niger False
Nigeria False
Rwanda False
Senegal False
Sierra Leone False
Togo False
Uganda False
Zambia False
Zimbabwe False
Buy.from.shopkeeper.with.AIDS.W ... \
Country ...
Angola False ...
Benin False ...
Burkina Faso False ...
Burundi False ...
Cameroon False ...
Chad False ...
Congo False ...
Congo Democratic Republic False ...
Cote d'Ivoire False ...
Ethiopia False ...
Gabon False ...
Gambia False ...
Ghana False ...
Kenya False ...
Lesotho False ...
Liberia False ...
Malawi False ...
Mali False ...
Mozambique False ...
Namibia False ...
Niger False ...
Nigeria False ...
Rwanda False ...
Senegal False ...
Sierra Leone False ...
Togo False ...
Uganda False ...
Zambia False ...
Zimbabwe False ...
Unprotected.paid.sex Use.of.contraception \
Country
Angola False False
Benin False False
Burkina Faso False False
Burundi False False
Cameroon False False
Chad False False
Congo False False
Congo Democratic Republic False False
Cote d'Ivoire False False
Ethiopia False False
Gabon False False
Gambia False False
Ghana False False
Kenya False False
Lesotho False False
Liberia False False
Malawi False False
Mali False False
Mozambique False False
Namibia False False
Niger False False
Nigeria False False
Rwanda False False
Senegal False False
Sierra Leone False False
Togo False False
Uganda False False
Zambia False False
Zimbabwe False False
Wealth.index.Gini Wife.beating.justified.M \
Country
Angola False False
Benin False False
Burkina Faso False False
Burundi False False
Cameroon False False
Chad False False
Congo False False
Congo Democratic Republic False False
Cote d'Ivoire False False
Ethiopia False False
Gabon False False
Gambia False False
Ghana False False
Kenya False False
Lesotho False False
Liberia False False
Malawi False False
Mali False False
Mozambique False False
Namibia False False
Niger False False
Nigeria False False
Rwanda False False
Senegal False False
Sierra Leone False False
Togo False False
Uganda False False
Zambia False False
Zimbabwe False False
Wife.beating.justified.W Women.who.work Zambia \
Country
Angola False False False
Benin False False False
Burkina Faso False False False
Burundi False False False
Cameroon False False False
Chad False False False
Congo False False False
Congo Democratic Republic False False False
Cote d'Ivoire False False False
Ethiopia False False False
Gabon False False False
Gambia False False False
Ghana False False False
Kenya False False False
Lesotho False False False
Liberia False False False
Malawi False False False
Mali False False False
Mozambique False False False
Namibia False False False
Niger False False False
Nigeria False False False
Rwanda False False False
Senegal False False False
Sierra Leone False False False
Togo False False False
Uganda False False False
Zambia False False False
Zimbabwe False False False
Zimbabwe cow iso
Country
Angola False False False
Benin False False False
Burkina Faso False False False
Burundi False False False
Cameroon False False False
Chad False False False
Congo False False False
Congo Democratic Republic False False False
Cote d'Ivoire False False False
Ethiopia False False False
Gabon False False False
Gambia False False False
Ghana False False False
Kenya False False False
Lesotho False False False
Liberia False False False
Malawi False False False
Mali False False False
Mozambique False False False
Namibia False False False
Niger False False False
Nigeria False False False
Rwanda False False False
Senegal False False False
Sierra Leone False False False
Togo False False False
Uganda False False False
Zambia False False False
Zimbabwe False False False
[29 rows x 78 columns]
%% Cell type:code id: tags:
``` python
mv=final.set_index('Country').isnull().groupby(final.set_index('Country').index).sum()<final.set_index('Country').groupby(final.set_index('Country').index).size()
```
%% Cell type:code id: tags:
``` python
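# Count the True entries per country
mv.sum(axis=1)
# We see all 0s. Caution: mv is all False by construction, because the DataFrame/Series comparison aligns country names with column labels instead of rows, so this check alone is not conclusive.
# The per-survey listing in section 1.I does suggest that every country has at least one survey with no missing values, i.e. at least 1 value for each indicator.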
```
%% Output
Country
Angola 0
Benin 0
Burkina Faso 0
Burundi 0
Cameroon 0
Chad 0
Congo 0
Congo Democratic Republic 0
Cote d'Ivoire 0
Ethiopia 0
Gabon 0
Gambia 0
Ghana 0
Kenya 0
Lesotho 0
Liberia 0
Malawi 0
Mali 0
Mozambique 0
Namibia 0
Niger 0
Nigeria 0
Rwanda 0
Senegal 0
Sierra Leone 0
Togo 0
Uganda 0
Zambia 0
Zimbabwe 0
dtype: int64
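%% Cell type:markdown id: tags:
A row-wise comparison needs the Series to be aligned on the index rather than on the columns. The sketch below shows one way to do this with `DataFrame.eq(..., axis=0)`. It runs on a made-up frame (`toy`, `n_missing`, `n_surveys` and `all_missing` are illustrative names, not from the notebook), in which country `B` has no value at all for indicator `x`.
%% Cell type:code id: tags:
```python
import numpy as np
import pandas as pd

# Toy stand-in for `final` (hypothetical values): B never reports x.
toy = pd.DataFrame({
    "Country": ["A", "A", "B", "B"],
    "x": [1.0, np.nan, np.nan, np.nan],
    "y": [np.nan, 2.0, 3.0, 4.0],
})

by_country = toy.set_index("Country")
n_missing = by_country.isna().groupby(level="Country").sum()  # missing per country/indicator
n_surveys = by_country.groupby(level="Country").size()        # surveys per country

# axis=0 aligns the Series on the row index (Country), not on the column labels.
all_missing = n_missing.eq(n_surveys, axis=0)
print(all_missing)
print(all_missing.sum(axis=1))
```
%% Cell type:markdown id: tags:
A `True` entry marks an indicator that is missing in every survey of that country, so a row sum of 0 means the country has at least one value for each indicator.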
%% Cell type:code id: tags:
``` python
# List the countries and indicators with the most missing values
# min_surveys: only look at countries with more than min_surveys surveys (i.e. at least min_surveys+1)
min_surveys = 2
# indic_difference: the number of non-missing values to look for
# i.e. if indic_difference is 1 and min_surveys is 2, this lists the countries that have at least 3 surveys, and the indicators for which only 1 of those surveys has a non-missing value
indic_difference = 1
for country in final.Country.unique():
    sub = final[final.Country == country]
    if sub.shape[0] > min_surveys:
        n_missing = sub.isna().sum()
        for i in range(sub.shape[1]):
            if n_missing.iloc[i] == sub.shape[0] - indic_difference:
                print(country, final.columns[i])
```
%% Output
Lesotho Number.of.co.wives.0
Lesotho Number.of.co.wives.1
Lesotho Number.of.co.wives.2
Zimbabwe Knowledge.about.AIDS.W
Zimbabwe Knowledge.about.AIDS.M
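%% Cell type:markdown id: tags:
The same listing can be expressed with grouped counts instead of nested loops. This is a sketch under the assumption that "only `indic_difference` non-missing values" is the criterion of interest. It uses a made-up frame, and the names `toy`, `n_present` and `sparse_pairs` are hypothetical.
%% Cell type:code id: tags:
```python
import numpy as np
import pandas as pd

# Toy stand-in for `final` (hypothetical values): A has 3 surveys, B has 2.
toy = pd.DataFrame({
    "Country": ["A", "A", "A", "B", "B"],
    "x": [1.0, np.nan, np.nan, 1.0, np.nan],
    "y": [1.0, 2.0, np.nan, np.nan, np.nan],
})

min_surveys = 2        # keep countries with more than this many surveys
indic_difference = 1   # number of non-missing values to look for

grouped = toy.set_index("Country").groupby(level="Country")
n_present = grouped.count()   # non-missing values per country and indicator
n_surveys = grouped.size()    # surveys per country

# Keep countries with enough surveys, then flag indicators with exactly
# `indic_difference` non-missing values.
hits = n_present[n_surveys > min_surveys].eq(indic_difference)
sparse_pairs = [
    (country, indicator)
    for country, row in hits.iterrows()
    for indicator, flag in row.items()
    if flag
]
print(sparse_pairs)
```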
%% Cell type:markdown id: tags:
## 2. Imputing the missing values
%% Cell type:code id: tags:
``` python
# Remove any unwanted columns before trying imputation
df=final.drop(columns=['Survey', 'iso', 'cow', 'GY']).set_index('Country').astype(float)
```
%% Cell type:code id: tags:
``` python
df
```
%% Output
Use.of.contraception Ever.paid.for.sex Wife.beating.justified.W \
Country
Angola 13.3 8.9 25.2
Benin 14.4 8.3 31.8
Benin 14.0 8.2 16.2
Benin 17.1 NaN 46.5
Benin 17.8 NaN 60.4
... ... ... ...
Zambia 29.9 NaN 61.9
Zambia 24.6 21.3 85.4
Zimbabwe 48.6 18.4 38.7
Zimbabwe 41.3 16.7 39.6
Zimbabwe 40.1 NaN 47.7
Wife.beating.justified.M Unprotected.paid.sex \
Country
Angola 19.8 1.2540
Benin 14.9 1.9584
Benin 14.8 1.4035
Benin 13.5 0.9360
Benin 31.2 0.4000
... ... ...
Zambia 48.2 1.9800
Zambia 69.3 5.1604
Zimbabwe 32.3 0.3430
Zimbabwe 33.0 0.3660
Zimbabwe 36.3 1.0222
General.fertility.rate Married.or.in.union.W \
Country
Angola 21.6 55.3
Benin 19.7 70.1
Benin 17.5 70.4
Benin 20.4 75.3
Benin 19.3 73.4
... ... ...
Zambia 21.4 61.6
Zambia 20.5 61.3
Zimbabwe 14.4 61.8
Zimbabwe 15.0 62.2
Zimbabwe 13.7 57.7
Married.or.in.union.M Number.of.co.wives.0 Number.of.co.wives.1 \
Country
Angola 47.6 76.9 17.2
Benin 54.6 61.3 27.7
Benin 56.8 62.9 25.2
Benin 59.1 56.4 31.0
Benin 55.9 54.2 30.0
... ... ... ...
Zambia 52.8 84.8 12.5
Zambia 55.1 83.8 11.6
Zimbabwe 49.9 87.6 8.0
Zimbabwe 50.4 84.2 8.8
Zimbabwe 45.6 83.9 7.2
... Men.who.work Female.headed.household Age \
Country ...
Angola ... 69.2 34.5 44.810127
Benin ... 84.0 24.9 39.467312
Benin ... 72.5 22.9 37.595908
Benin ... 70.5 22.5 37.376238
Benin ... 79.0 20.8 40.243902
... ... ... ... ...
Zambia ... 76.0 24.3 41.105769
Zambia ... 66.7 22.6 44.597701
Zimbabwe ... 65.0 40.6 40.807175
Zimbabwe ... 61.3 44.6 42.035398
Zimbabwe ... 62.6 37.7 46.621622
Wealth.index.Gini ART Christian Muslim Folk.Religion \
Country
Angola 51.3 25.0 90.513627 0.209644 4.140461
Benin 47.8 52.0 52.994350 23.841808 18.079096
Benin 43.4 39.0 40.000000 26.470000 28.400000
Benin 38.6 0.0 42.750000 22.880000 30.160000
Benin 38.6 0.0 43.220000 24.700000 28.700000
... ... ... ... ... ...
Zambia 54.6 19.0 87.000000 0.550000 10.490000
Zambia 42.1 0.0 81.940000 0.680000 15.210000
Zimbabwe 44.3 64.0 86.953063 0.875099 3.818616
Zimbabwe 43.2 30.0 81.880000 1.070000 13.540000
Zimbabwe 43.2 2.0 79.740000 0.580000 13.700000
Unaffiliated.Religion Other.Religion
Country
Angola 5.136268 0.000000
Benin 5.084746 0.000000
Benin 3.340000 1.800000
Benin 3.830000 0.380000
Benin 2.050000 1.330000
... ... ...
Zambia 0.180000 1.750000
Zambia 0.210000 1.870000
Zimbabwe 7.875895 0.477327
Zimbabwe 2.560000 0.940000
Zimbabwe 2.040000 3.920000
[83 rows x 45 columns]
%% Cell type:code id: tags:
``` python
df.describe()
```
%% Output
Use.of.contraception Ever.paid.for.sex Wife.beating.justified.W \
count 83.000000 62.000000 79.000000
mean 22.442169 8.377419 52.367089
std 11.627387 7.408511 17.779300
min 5.400000 0.800000 12.600000
25% 13.700000 3.125000 40.500000
50% 19.600000 6.200000 52.600000
75% 29.450000 10.275000 64.400000
max 50.200000 35.000000 88.800000
Wife.beating.justified.M Unprotected.paid.sex General.fertility.rate \
count 73.000000 79.000000 83.000000
mean 35.309589 1.282797 18.021687
std 14.634379 1.745998 3.238490
min 12.500000 0.059700 11.800000
25% 24.700000 0.364800 15.700000
50% 33.000000 0.709800 18.000000
75% 44.200000 1.322500 20.350000
max 74.800000 9.286600 26.900000
Married.or.in.union.W Married.or.in.union.M Number.of.co.wives.0 \
count 83.000000 82.000000 80.000000
mean 63.824096 50.504878 74.476250
std 10.278602 7.871450 11.046808
min 34.000000 28.800000 51.600000
25% 57.900000 47.525000 66.350000
50% 64.000000 50.650000 73.150000
75% 69.800000 56.500000 84.825000
max 88.500000 65.200000 93.200000
Number.of.co.wives.1 ... Men.who.work Female.headed.household \
count 80.000000 ... 81.000000 83.000000
mean 17.308750 ... 73.790123 27.061446
std 9.177286 ... 12.008836 7.724633
min 0.100000 ... 32.100000 9.300000
25% 9.625000 ... 65.000000 22.700000
50% 16.750000 ... 76.300000 26.600000
75% 25.800000 ... 82.100000 31.950000
max 32.700000 ... 94.000000 44.600000
Age Wealth.index.Gini ART Christian Muslim \
count 83.000000 83.000000 83.000000 83.000000 83.000000
mean 41.536783 43.267470 24.060241 56.498410 31.474679
std 2.886932 7.222325 22.075452 31.938316 33.556586
min 33.977901 29.800000 0.000000 2.407287 0.050000
25% 39.314937 39.500000 3.000000 35.750000 4.425000
50% 42.105263 42.800000 20.000000 63.070000 15.950000
75% 43.622283 46.550000 41.000000 85.125000 52.355000
max 48.491879 65.800000 79.000000 97.555386 96.379726
Folk.Religion Unaffiliated.Religion Other.Religion
count 83.000000 83.000000 83.000000
mean 9.729417 1.228020 1.061522
std 8.889242 1.833384 1.497679
min 0.000000 0.000000 0.000000
25% 4.060000 0.180000 0.170000
50% 6.390000 0.460000 0.520000
75% 13.270000 1.385000 1.380000
max 37.430000 7.875895 7.690000
[8 rows x 45 columns]
%% Cell type:markdown id: tags:
### I. SimpleImputer
%% Cell type:markdown id: tags:
#### Median strategy
%% Cell type:code id: tags:
``` python
imp_median=SimpleImputer(missing_values=np.nan, strategy='median')
imp_median.fit(df)
```
%% Output
SimpleImputer(strategy='median')
%% Cell type:code id: tags:
``` python
median=pd.DataFrame(imp_median.transform(df), columns=df.columns, index=df.index)
```
%% Cell type:code id: tags:
``` python
median.describe()
```
%% Output
Use.of.contraception Ever.paid.for.sex Wife.beating.justified.W \
count 83.000000 83.000000 83.000000
mean 22.442169 7.826506 52.378313
std 11.627387 6.460406 17.340310
min 5.400000 0.800000 12.600000
25% 13.700000 4.300000 41.600000
50% 19.600000 6.200000 52.600000
75% 29.450000 8.750000 64.200000
max 50.200000 35.000000 88.800000
Wife.beating.justified.M Unprotected.paid.sex General.fertility.rate \
count 83.000000 83.000000 83.000000
mean 35.031325 1.255183 18.021687
std 13.733881 1.707351 3.238490
min 12.500000 0.059700 11.800000
25% 25.550000 0.368500 15.700000
50% 33.000000 0.709800 18.000000
75% 43.100000 1.310500 20.350000
max 74.800000 9.286600 26.900000
Married.or.in.union.W Married.or.in.union.M Number.of.co.wives.0 \
count 83.000000 83.000000 83.000000
mean 63.824096 50.506627 74.428313
std 10.278602 7.823323 10.845709
min 34.000000 28.800000 51.600000
25% 57.900000 47.550000 66.750000
50% 64.000000 50.650000 73.150000
75% 69.800000 56.500000 84.500000
max 88.500000 65.200000 93.200000
Number.of.co.wives.1 ... Men.who.work Female.headed.household \
count 83.000000 ... 83.000000 83.000000
mean 17.288554 ... 73.850602 27.061446
std 9.008456 ... 11.867802 7.724633
min 0.100000 ... 32.100000 9.300000
25% 9.750000 ... 65.650000 22.700000
50% 16.750000 ... 76.300000 26.600000
75% 25.700000 ... 82.050000 31.950000
max 32.700000 ... 94.000000 44.600000
Age Wealth.index.Gini ART Christian Muslim \
count 83.000000 83.000000 83.000000 83.000000 83.000000
mean 41.536783 43.267470 24.060241 56.498410 31.474679
std 2.886932 7.222325 22.075452 31.938316 33.556586
min 33.977901 29.800000 0.000000 2.407287 0.050000
25% 39.314937 39.500000 3.000000 35.750000 4.425000
50% 42.105263 42.800000 20.000000 63.070000 15.950000
75% 43.622283 46.550000 41.000000 85.125000 52.355000
max 48.491879 65.800000 79.000000 97.555386 96.379726
Folk.Religion Unaffiliated.Religion Other.Religion
count 83.000000 83.000000 83.000000
mean 9.729417 1.228020 1.061522
std 8.889242 1.833384 1.497679
min 0.000000 0.000000 0.000000
25% 4.060000 0.180000 0.170000
50% 6.390000 0.460000 0.520000
75% 13.270000 1.385000 1.380000
max 37.430000 7.875895 7.690000
[8 rows x 45 columns]
%% Cell type:markdown id: tags:
#### Mean strategy
%% Cell type:code id: tags:
``` python
imp_mean=SimpleImputer(missing_values=np.nan, strategy='mean')
imp_mean.fit(df)
```
%% Output
SimpleImputer()
%% Cell type:code id: tags:
``` python
mean=pd.DataFrame(imp_mean.transform(df), columns=df.columns, index=df.index)
```
%% Cell type:code id: tags:
``` python
mean.describe()
```
%% Output
Use.of.contraception Ever.paid.for.sex Wife.beating.justified.W \
count 83.000000 83.000000 83.000000
mean 22.442169 8.377419 52.367089
std 11.627387 6.389825 17.340237
min 5.400000 0.800000 12.600000
25% 13.700000 4.300000 41.600000
50% 19.600000 8.377419 52.367089
75% 29.450000 8.750000 64.200000
max 50.200000 35.000000 88.800000
Wife.beating.justified.M Unprotected.paid.sex General.fertility.rate \
count 83.000000 83.000000 83.000000
mean 35.309589 1.282797 18.021687
std 13.713036 1.702881 3.238490
min 12.500000 0.059700 11.800000
25% 25.550000 0.368500 15.700000
50% 35.309589 0.767600 18.000000
75% 43.100000 1.310500 20.350000
max 74.800000 9.286600 26.900000
Married.or.in.union.W Married.or.in.union.M Number.of.co.wives.0 \
count 83.000000 83.000000 83.000000
mean 63.824096 50.504878 74.476250
std 10.278602 7.823306 10.842849
min 34.000000 28.800000 51.600000
25% 57.900000 47.550000 66.750000
50% 64.000000 50.504878 74.476250
75% 69.800000 56.500000 84.500000
max 88.500000 65.200000 93.200000
Number.of.co.wives.1 ... Men.who.work Female.headed.household \
count 83.000000 ... 83.000000 83.000000
mean 17.308750 ... 73.790123 27.061446
std 9.007845 ... 11.861483 7.724633
min 0.100000 ... 32.100000 9.300000
25% 9.750000 ... 65.650000 22.700000
50% 17.300000 ... 76.000000 26.600000
75% 25.700000 ... 82.050000 31.950000
max 32.700000 ... 94.000000 44.600000
Age Wealth.index.Gini ART Christian Muslim \
count 83.000000 83.000000 83.000000 83.000000 83.000000
mean 41.536783 43.267470 24.060241 56.498410 31.474679
std 2.886932 7.222325 22.075452 31.938316 33.556586
min 33.977901 29.800000 0.000000 2.407287 0.050000
25% 39.314937 39.500000 3.000000 35.750000 4.425000
50% 42.105263 42.800000 20.000000 63.070000 15.950000
75% 43.622283 46.550000 41.000000 85.125000 52.355000
max 48.491879 65.800000 79.000000 97.555386 96.379726
Folk.Religion Unaffiliated.Religion Other.Religion
count 83.000000 83.000000 83.000000
mean 9.729417 1.228020 1.061522
std 8.889242 1.833384 1.497679
min 0.000000 0.000000 0.000000
25% 4.060000 0.180000 0.170000
50% 6.390000 0.460000 0.520000
75% 13.270000 1.385000 1.380000
max 37.430000 7.875895 7.690000
[8 rows x 45 columns]
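%% Cell type:markdown id: tags:
Comparing the `describe()` outputs shows a side effect of single-value imputation: with the mean strategy the column mean is unchanged (8.377419 for `Ever.paid.for.sex`), but the standard deviation shrinks (from 7.408511 to 6.389825), because every filled value sits at the centre of the distribution. The sketch below reproduces this effect on a made-up column (`toy` and `filled` are illustrative names only).
%% Cell type:code id: tags:
```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Toy column with two gaps (hypothetical values).
toy = pd.DataFrame({"x": [1.0, 2.0, 3.0, 10.0, np.nan, np.nan]})

filled = {}
for strategy in ("mean", "median"):
    imputer = SimpleImputer(missing_values=np.nan, strategy=strategy)
    filled[strategy] = pd.DataFrame(
        imputer.fit_transform(toy), columns=toy.columns, index=toy.index
    )

# Side-by-side summary of the original and the two imputed versions.
summary = pd.DataFrame({
    "original": toy["x"].describe(),
    "mean": filled["mean"]["x"].describe(),
    "median": filled["median"]["x"].describe(),
}).loc[["count", "mean", "std"]]
print(summary)
```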
%% Cell type:markdown id: tags:
### II. Iterative Imputer
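With `verbose=2`, `IterativeImputer` reports a `Change` value and a `scaled tolerance` for each round. As far as we understand scikit-learn's implementation, the scaled tolerance is `tol` multiplied by the largest absolute observed value, and fitting stops early once the change falls below it. The following is a minimal sketch on synthetic data (`X`, `X_missing` and the 10% missingness rate are assumptions for illustration, not the notebook's data):
%% Cell type:code id: tags:
```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(0)

# Synthetic data: three strongly correlated columns.
base = rng.normal(size=(60, 1))
X = np.hstack([
    base,
    2 * base + rng.normal(scale=0.1, size=(60, 1)),
    -base + rng.normal(scale=0.1, size=(60, 1)),
])

# Knock out roughly 10% of the entries at random.
X_missing = X.copy()
X_missing[rng.random(X.shape) < 0.1] = np.nan

tol = 1e-3
imputer = IterativeImputer(missing_values=np.nan, max_iter=100, tol=tol, random_state=0)
X_filled = imputer.fit_transform(X_missing)

# Assumed stopping threshold: tol times the largest absolute observed value.
scaled_tol = tol * np.nanmax(np.abs(X_missing))
print("rounds used:", imputer.n_iter_, "scaled tolerance:", scaled_tol)
```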
%% Cell type:code id: tags:
``` python
# The iterations do not converge unless max_iter is set high (the scikit-learn default is 10)
# Here we set max_iter = 100
imp_iter=IterativeImputer(missing_values=np.nan, random_state=0, max_iter=100, verbose=2, tol=0.001)
imp_iter.fit(df)
```
%% Output
[IterativeImputer] Completing matrix with shape (83, 45)
[IterativeImputer] Ending imputation round 1/100, elapsed time 0.38
[IterativeImputer] Change: 137.9888263782507, scaled tolerance: 0.09966888445530192
[IterativeImputer] Ending imputation round 2/100, elapsed time 0.70
[IterativeImputer] Change: 40.06655407005384, scaled tolerance: 0.09966888445530192
[IterativeImputer] Ending imputation round 3/100, elapsed time 1.00
[IterativeImputer] Change: 22.61339645800811, scaled tolerance: 0.09966888445530192
[IterativeImputer] Ending imputation round 4/100, elapsed time 1.22
[IterativeImputer] Change: 16.95364246518178, scaled tolerance: 0.09966888445530192
[IterativeImputer] Ending imputation round 5/100, elapsed time 1.40
[IterativeImputer] Change: 13.188806660714018, scaled tolerance: 0.09966888445530192
[IterativeImputer] Ending imputation round 6/100, elapsed time 1.63
[IterativeImputer] Change: 10.462530105367183, scaled tolerance: 0.09966888445530192
[IterativeImputer] Ending imputation round 7/100, elapsed time 1.81
[IterativeImputer] Change: 8.465485205261345, scaled tolerance: 0.09966888445530192
[IterativeImputer] Ending imputation round 8/100, elapsed time 2.12
[IterativeImputer] Change: 7.112409323411717, scaled tolerance: 0.09966888445530192
[IterativeImputer] Ending imputation round 9/100, elapsed time 2.39
[IterativeImputer] Change: 5.948421730954363, scaled tolerance: 0.09966888445530192
[IterativeImputer] Ending imputation round 10/100, elapsed time 2.58
[IterativeImputer] Change: 4.962263924912389, scaled tolerance: 0.09966888445530192
[IterativeImputer] Ending imputation round 11/100, elapsed time 2.79
[IterativeImputer] Change: 4.300692092223311, scaled tolerance: 0.09966888445530192
[IterativeImputer] Ending imputation round 12/100, elapsed time 2.96
[IterativeImputer] Change: 3.767496523644326, scaled tolerance: 0.09966888445530192
[IterativeImputer] Ending imputation round 13/100, elapsed time 3.12
[IterativeImputer] Change: 3.307642284791865, scaled tolerance: 0.09966888445530192
[IterativeImputer] Ending imputation round 14/100, elapsed time 3.34
[IterativeImputer] Change: 2.9037528489057314, scaled tolerance: 0.09966888445530192
[IterativeImputer] Ending imputation round 15/100, elapsed time 3.53
[IterativeImputer] Change: 2.551531447575886, scaled tolerance: 0.09966888445530192
[IterativeImputer] Ending imputation round 16/100, elapsed time 3.71
[IterativeImputer] Change: 2.2456579013557745, scaled tolerance: 0.09966888445530192
[IterativeImputer] Ending imputation round 17/100, elapsed time 3.91
[IterativeImputer] Change: 1.9804205588270176, scaled tolerance: 0.09966888445530192
[IterativeImputer] Ending imputation round 18/100, elapsed time 4.11
[IterativeImputer] Change: 1.7504764189094224, scaled tolerance: 0.09966888445530192
[IterativeImputer] Ending imputation round 19/100, elapsed time 4.30
[IterativeImputer] Change: 1.5509764016079752, scaled tolerance: 0.09966888445530192
[IterativeImputer] Ending imputation round 20/100, elapsed time 4.49
[IterativeImputer] Change: 1.378195885192002, scaled tolerance: 0.09966888445530192
[IterativeImputer] Ending imputation round 21/100, elapsed time 4.68
[IterativeImputer] Change: 1.2279570499685295, scaled tolerance: 0.09966888445530192
[IterativeImputer] Ending imputation round 22/100, elapsed time 4.95
[IterativeImputer] Change: 1.0965609916210157, scaled tolerance: 0.09966888445530192
[IterativeImputer] Ending imputation round 23/100, elapsed time 5.16
[IterativeImputer] Change: 0.9813267001490678, scaled tolerance: 0.09966888445530192
[IterativeImputer] Ending imputation round 24/100, elapsed time 5.36
[IterativeImputer] Change: 0.8800008326573447, scaled tolerance: 0.09966888445530192
[IterativeImputer] Ending imputation round 25/100, elapsed time 5.55
[IterativeImputer] Change: 0.7906387557589047, scaled tolerance: 0.09966888445530192
[IterativeImputer] Ending imputation round 26/100, elapsed time 5.86
[IterativeImputer] Change: 0.7115980474983803, scaled tolerance: 0.09966888445530192
[IterativeImputer] Ending imputation round 27/100, elapsed time 6.03
[IterativeImputer] Change: 0.6415235305767608, scaled tolerance: 0.09966888445530192
[IterativeImputer] Ending imputation round 28/100, elapsed time 6.19
[IterativeImputer] Change: 0.5792331467977292, scaled tolerance: 0.09966888445530192
[IterativeImputer] Ending imputation round 29/100, elapsed time 6.39
[IterativeImputer] Change: 0.5237273496984192, scaled tolerance: 0.09966888445530192
[IterativeImputer] Ending imputation round 30/100, elapsed time 6.70
[IterativeImputer] Change: 0.47415369136145036, scaled tolerance: 0.09966888445530192
[IterativeImputer] Ending imputation round 31/100, elapsed time 6.87
[IterativeImputer] Change: 0.42978199746242396, scaled tolerance: 0.09966888445530192
[IterativeImputer] Ending imputation round 32/100, elapsed time 7.03
[IterativeImputer] Change: 0.38998528363825846, scaled tolerance: 0.09966888445530192
[IterativeImputer] Ending imputation round 33/100, elapsed time 7.19
[IterativeImputer] Change: 0.35468872773478344, scaled tolerance: 0.09966888445530192
[IterativeImputer] Ending imputation round 34/100, elapsed time 7.38
[IterativeImputer] Change: 0.32286979154861384, scaled tolerance: 0.09966888445530192
[IterativeImputer] Ending imputation round 35/100, elapsed time 7.60
[IterativeImputer] Change: 0.29411177403098465, scaled tolerance: 0.09966888445530192
[IterativeImputer] Ending imputation round 36/100, elapsed time 7.80
[IterativeImputer] Change: 0.2680853269797474, scaled tolerance: 0.09966888445530192
[IterativeImputer] Ending imputation round 37/100, elapsed time 7.98
[IterativeImputer] Change: 0.24450198878495089, scaled tolerance: 0.09966888445530192
[IterativeImputer] Ending imputation round 38/100, elapsed time 8.15
[IterativeImputer] Change: 0.22310835666523565, scaled tolerance: 0.09966888445530192
[IterativeImputer] Ending imputation round 39/100, elapsed time 8.34
[IterativeImputer] Change: 0.20368117721558976, scaled tolerance: 0.09966888445530192
[IterativeImputer] Ending imputation round 40/100, elapsed time 8.55
[IterativeImputer] Change: 0.1860232023121855, scaled tolerance: 0.09966888445530192
[IterativeImputer] Ending imputation round 41/100, elapsed time 8.82
[IterativeImputer] Change: 0.169959683374159, scaled tolerance: 0.09966888445530192
[IterativeImputer] Ending imputation round 42/100, elapsed time 9.16
[IterativeImputer] Change: 0.15533539867950164, scaled tolerance: 0.09966888445530192
[IterativeImputer] Ending imputation round 43/100, elapsed time 9.37
[IterativeImputer] Change: 0.14201212622142045, scaled tolerance: 0.09966888445530192
[IterativeImputer] Ending imputation round 44/100, elapsed time 9.57
[IterativeImputer] Change: 0.12987345002471673, scaled tolerance: 0.09966888445530192
[IterativeImputer] Ending imputation round 45/100, elapsed time 9.76
[IterativeImputer] Change: 0.12050978284319958, scaled tolerance: 0.09966888445530192
[IterativeImputer] Ending imputation round 46/100, elapsed time 9.93
[IterativeImputer] Change: 0.11267358851135878, scaled tolerance: 0.09966888445530192
[IterativeImputer] Ending imputation round 47/100, elapsed time 10.17
[IterativeImputer] Change: 0.1053476089817953, scaled tolerance: 0.09966888445530192
[IterativeImputer] Ending imputation round 48/100, elapsed time 10.35
[IterativeImputer] Change: 0.09849854648948919, scaled tolerance: 0.09966888445530192
[IterativeImputer] Early stopping criterion reached.
IterativeImputer(max_iter=100, random_state=0, verbose=2)
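%% Cell type:markdown id: tags:
The "scaled tolerance" in the log stays constant at about 0.0997. Assuming scikit-learn derives it as `tol` multiplied by the largest absolute observed value in the input, this implies a maximum of about 99.67 in `df`. The sketch below reproduces that arithmetic on a toy matrix and checks the last two logged changes against the logged threshold. The helper `scaled_tolerance` and the toy values are hypothetical and are not part of the original notebook.

```python
import numpy as np

def scaled_tolerance(X, tol=1e-3):
    """Return tol times the largest absolute observed (non-NaN) value in X."""
    X = np.asarray(X, dtype=float)
    observed = X[~np.isnan(X)]
    return tol * np.max(np.abs(observed))

# Toy matrix (hypothetical values, not the survey data)
toy = np.array([[1.0, np.nan],
                [-50.0, 2.0]])
toy_tol = scaled_tolerance(toy)  # 0.001 * 50

# Values copied from the verbose log above
logged_tol = 0.09966888445530192
change_round_47 = 0.1053476089817953
change_round_48 = 0.09849854648948919

stops_at_47 = change_round_47 < logged_tol  # still above the threshold
stops_at_48 = change_round_48 < logged_tol  # first round below the threshold
```

Round 48 is the first round whose change drops below the threshold, which matches the "Early stopping criterion reached" message.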
%% Cell type:code id: tags:
``` python
# Apply the fitted imputer and restore the original column names and index
df_iter = pd.DataFrame(imp_iter.transform(df), columns=df.columns, index=df.index)
```
%% Output
[IterativeImputer] Completing matrix with shape (83, 45)
[IterativeImputer] Ending imputation round 1/48, elapsed time 0.02
[IterativeImputer] Ending imputation round 2/48, elapsed time 0.03
[IterativeImputer] Ending imputation round 3/48, elapsed time 0.04
[IterativeImputer] Ending imputation round 4/48, elapsed time 0.05
[IterativeImputer] Ending imputation round 5/48, elapsed time 0.05
[IterativeImputer] Ending imputation round 6/48, elapsed time 0.06
[IterativeImputer] Ending imputation round 7/48, elapsed time 0.07
[IterativeImputer] Ending imputation round 8/48, elapsed time 0.08
[IterativeImputer] Ending imputation round 9/48, elapsed time 0.09
[IterativeImputer] Ending imputation round 10/48, elapsed time 0.10
[IterativeImputer] Ending imputation round 11/48, elapsed time 0.11
[IterativeImputer] Ending imputation round 12/48, elapsed time 0.12
[IterativeImputer] Ending imputation round 13/48, elapsed time 0.13
[IterativeImputer] Ending imputation round 14/48, elapsed time 0.14
[IterativeImputer] Ending imputation round 15/48, elapsed time 0.14
[IterativeImputer] Ending imputation round 16/48, elapsed time 0.15
[IterativeImputer] Ending imputation round 17/48, elapsed time 0.16
[IterativeImputer] Ending imputation round 18/48, elapsed time 0.17
[IterativeImputer] Ending imputation round 19/48, elapsed time 0.18
[IterativeImputer] Ending imputation round 20/48, elapsed time 0.19
[IterativeImputer] Ending imputation round 21/48, elapsed time 0.20
[IterativeImputer] Ending imputation round 22/48, elapsed time 0.21
[IterativeImputer] Ending imputation round 23/48, elapsed time 0.22
[IterativeImputer] Ending imputation round 24/48, elapsed time 0.23
[IterativeImputer] Ending imputation round 25/48, elapsed time 0.23
[IterativeImputer] Ending imputation round 26/48, elapsed time 0.24
[IterativeImputer] Ending imputation round 27/48, elapsed time 0.25
[IterativeImputer] Ending imputation round 28/48, elapsed time 0.26
[IterativeImputer] Ending imputation round 29/48, elapsed time 0.28
[IterativeImputer] Ending imputation round 30/48, elapsed time 0.29
[IterativeImputer] Ending imputation round 31/48, elapsed time 0.30
[IterativeImputer] Ending imputation round 32/48, elapsed time 0.32
[IterativeImputer] Ending imputation round 33/48, elapsed time 0.33
[IterativeImputer] Ending imputation round 34/48, elapsed time 0.34
[IterativeImputer] Ending imputation round 35/48, elapsed time 0.36
[IterativeImputer] Ending imputation round 36/48, elapsed time 0.37
[IterativeImputer] Ending imputation round 37/48, elapsed time 0.38
[IterativeImputer] Ending imputation round 38/48, elapsed time 0.39
[IterativeImputer] Ending imputation round 39/48, elapsed time 0.40
[IterativeImputer] Ending imputation round 40/48, elapsed time 0.41
[IterativeImputer] Ending imputation round 41/48, elapsed time 0.42
[IterativeImputer] Ending imputation round 42/48, elapsed time 0.43
[IterativeImputer] Ending imputation round 43/48, elapsed time 0.44
[IterativeImputer] Ending imputation round 44/48, elapsed time 0.45
[IterativeImputer] Ending imputation round 45/48, elapsed time 0.46
[IterativeImputer] Ending imputation round 46/48, elapsed time 0.47
[IterativeImputer] Ending imputation round 47/48, elapsed time 0.48
[IterativeImputer] Ending imputation round 48/48, elapsed time 0.50
%% Cell type:code id: tags:
``` python
df_iter.describe()
```
%% Output
Use.of.contraception Ever.paid.for.sex Wife.beating.justified.W \
count 83.000000 83.000000 83.000000
mean 22.442169 8.171549 52.551518
std 11.627387 6.928908 17.376062
min 5.400000 -1.524709 12.600000
25% 13.700000 3.566879 41.600000
50% 19.600000 6.300000 53.401258
75% 29.450000 10.550000 64.200000
max 50.200000 35.000000 88.800000
Wife.beating.justified.M Unprotected.paid.sex General.fertility.rate \
count 83.000000 83.000000 83.000000
mean 36.680310 1.249607 18.021687
std 14.414606 1.717151 3.238490
min 12.500000 -0.600710 11.800000
25% 25.550000 0.364800 15.700000
50% 36.300000 0.709800 18.000000
75% 45.604299 1.322500 20.350000
max 74.800000 9.286600 26.900000
Married.or.in.union.W Married.or.in.union.M Number.of.co.wives.0 \
count 83.000000 83.000000 83.000000
mean 63.824096 50.381359 75.109111
std 10.278602 7.903825 11.342204
min 34.000000 28.800000 51.600000
25% 57.900000 47.500000 66.750000
50% 64.000000 50.500000 74.900000
75% 69.800000 56.500000 85.450000
max 88.500000 65.200000 94.976772
Number.of.co.wives.1 ... Men.who.work Female.headed.household \
count 83.000000 ... 83.000000 83.000000
mean 16.689507 ... 73.735274 27.061446
std 9.593905 ... 11.874490 7.724633
min -3.260893 ... 32.100000 9.300000
25% 9.250000 ... 65.650000 22.700000
50% 15.600000 ... 76.000000 26.600000
75% 25.700000 ... 82.050000 31.950000
max 32.700000 ... 94.000000 44.600000
Age Wealth.index.Gini ART Christian Muslim \
count 83.000000 83.000000 83.000000 83.000000 83.000000
mean 41.536783 43.267470 24.060241 56.498410 31.474679
std 2.886932 7.222325 22.075452 31.938316 33.556586
min 33.977901 29.800000 0.000000 2.407287 0.050000
25% 39.314937 39.500000 3.000000 35.750000 4.425000
50% 42.105263 42.800000 20.000000 63.070000 15.950000
75% 43.622283 46.550000 41.000000 85.125000 52.355000
max 48.491879 65.800000 79.000000 97.555386 96.379726
Folk.Religion Unaffiliated.Religion Other.Religion
count 83.000000 83.000000 83.000000
mean 9.729417 1.228020 1.061522
std 8.889242 1.833384 1.497679
min 0.000000 0.000000 0.000000
25% 4.060000 0.180000 0.170000
50% 6.390000 0.460000 0.520000
75% 13.270000 1.385000 1.380000
max 37.430000 7.875895 7.690000
[8 rows x 45 columns]
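%% Cell type:markdown id: tags:
NOTE: the iterative imputer produces some negative values (`min` of `Ever.paid.for.sex`, `Unprotected.paid.sex` and `Number.of.co.wives.1` above), which are not possible for these indicators. One option is to bound the imputed values with the `min_value` and `max_value` parameters of `IterativeImputer`. The sketch below uses synthetic data in place of `df`. The bounds of 0 and 100 are an assumption that fits percentage columns only, and recent scikit-learn versions also accept one bound per feature as an array.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Synthetic stand-in for df (hypothetical percentages, not the survey data)
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(40, 5))
X[:, 1] = 0.5 * X[:, 0] + rng.normal(0, 5, size=40)  # one correlated column
X = np.clip(X, 0, 100)

# Knock out roughly 15% of the cells, keeping the first column fully observed
mask = rng.random(X.shape) < 0.15
mask[:, 0] = False
X_missing = X.copy()
X_missing[mask] = np.nan

# Imputed values are clipped to [min_value, max_value]
imp_bounded = IterativeImputer(random_state=0, max_iter=100,
                               min_value=0, max_value=100)
X_bounded = imp_bounded.fit_transform(X_missing)
```

Only the imputed cells are clipped, so observed values are left unchanged. Whether clipping is preferable to another estimator or a transformation of the variables would need to be checked on the real data.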
%% Cell type:code id: tags:
``` python
df.describe()
```
%% Output
Use.of.contraception Ever.paid.for.sex Wife.beating.justified.W \
count 83.000000 62.000000 79.000000
mean 22.442169 8.377419 52.367089
std 11.627387 7.408511 17.779300
min 5.400000 0.800000 12.600000
25% 13.700000 3.125000 40.500000
50% 19.600000 6.200000 52.600000
75% 29.450000 10.275000 64.400000
max 50.200000 35.000000 88.800000
Wife.beating.justified.M Unprotected.paid.sex General.fertility.rate \
count 73.000000 79.000000 83.000000
mean 35.309589 1.282797 18.021687
std 14.634379 1.745998 3.238490
min 12.500000 0.059700 11.800000
25% 24.700000 0.364800 15.700000
50% 33.000000 0.709800 18.000000
75% 44.200000 1.322500 20.350000
max 74.800000 9.286600 26.900000
Married.or.in.union.W Married.or.in.union.M Number.of.co.wives.0 \
count 83.000000 82.000000 80.000000
mean 63.824096 50.504878 74.476250
std 10.278602 7.871450 11.046808
min 34.000000 28.800000 51.600000
25% 57.900000 47.525000 66.350000
50% 64.000000 50.650000 73.150000
75% 69.800000 56.500000 84.825000
max 88.500000 65.200000 93.200000
Number.of.co.wives.1 ... Men.who.work Female.headed.household \
count 80.000000 ... 81.000000 83.000000
mean 17.308750 ... 73.790123 27.061446
std 9.177286 ... 12.008836 7.724633
min 0.100000 ... 32.100000 9.300000
25% 9.625000 ... 65.000000 22.700000
50% 16.750000 ... 76.300000 26.600000
75% 25.800000 ... 82.100000 31.950000
max 32.700000 ... 94.000000 44.600000
Age Wealth.index.Gini ART Christian Muslim \
count 83.000000 83.000000 83.000000 83.000000 83.000000
mean 41.536783 43.267470 24.060241 56.498410 31.474679
std 2.886932 7.222325 22.075452 31.938316 33.556586
min 33.977901 29.800000 0.000000 2.407287 0.050000
25% 39.314937 39.500000 3.000000 35.750000 4.425000
50% 42.105263 42.800000 20.000000 63.070000 15.950000
75% 43.622283 46.550000 41.000000 85.125000 52.355000
max 48.491879 65.800000 79.000000 97.555386 96.379726
Folk.Religion Unaffiliated.Religion Other.Religion
count 83.000000 83.000000 83.000000
mean 9.729417 1.228020 1.061522
std 8.889242 1.833384 1.497679
min 0.000000 0.000000 0.000000
25% 4.060000 0.180000 0.170000
50% 6.390000 0.460000 0.520000
75% 13.270000 1.385000 1.380000
max 37.430000 7.875895 7.690000
[8 rows x 45 columns]
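%% Cell type:markdown id: tags:
The `count` row above shows where values are missing (e.g. `Ever.paid.for.sex` has 62 of 83 values, `Wife.beating.justified.M` has 73). Comparing two `describe()` outputs by eye is error-prone, so the sketch below summarises only the cells that were filled in. The helper `imputation_summary` and the toy frames are hypothetical and are not part of the original notebook.

```python
import numpy as np
import pandas as pd

def imputation_summary(original, imputed):
    """Per-column count of missing cells and range of the values that filled them."""
    mask = original.isna()
    filled = imputed.where(mask)  # keep imputed cells only, NaN elsewhere
    summary = pd.DataFrame({
        "n_missing": mask.sum(),
        "imputed_min": filled.min(),
        "imputed_max": filled.max(),
    })
    # Report only the columns that actually had missing values
    return summary[summary["n_missing"] > 0]

# Toy frames (hypothetical values) standing in for df and df_iter
toy_df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, 6.0]})
toy_imputed = pd.DataFrame({"a": [1.0, -0.5, 3.0], "b": [4.0, 5.0, 6.0]})
toy_summary = imputation_summary(toy_df, toy_imputed)
```

In this notebook the call would be `imputation_summary(df, df_iter)`. A negative `imputed_min` flags columns where the imputer left the plausible range.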
%% Cell type:code id: tags:
``` python
```