Aurora Tsai | Carnegie Mellon University |
December 22, 2017 | aurorat-at-andrew.cmu.edu |
Parts of this tutorial are adapted from the sites listed in the references at the end.
1. Introduction
2. Visualizing Missing Data
3. MIA with MICE (Multivariate Imputation by Chained Equations)
4. MIA with the missForest package
5. MIA with the Hmisc package
6. References & other resources
We might have to work with incomplete data sets for any number of reasons. Participants may have skipped questions in a survey or assessment, or perhaps we designed our computer-mediated assessment to randomly select 50 questions from a pool of 60 for each test taker. Whatever the reason, we need principled ways of handling the missing observations.
Multiple Imputation Analysis (MIA) (Little & Rubin, 2002) is a method for filling in missing observations. It accounts for the uncertainty about the unknown true values by imputing M plausible values for each unobserved response in the data. This yields M complete versions of the data set, identical in their non-missing entries but differing in the imputed ones. Discarding all partially observed data units (e.g., through listwise deletion) is generally not recommended because it can lead to substantial bias and poor predictions (Ambler, Omar, & Royston, 2007).
It’s important to know how much missing data we have and how it is spread across our data. Data is considered Missing Completely at Random (MCAR) if “the propensity to observe a missing value in an item is unrelated to a) the value of the item itself and to other items; b) to the latent trait values; and c) to any other measured variables in the analysis” (Sulis & Porcu, 2017, p. 331). In other words, data is MCAR if no variables influence which observations are missing (e.g., it is not the case that participants skip a question because of its nature or difficulty, or because participants in class B weren’t given the question). Data is Missing at Random (MAR) if the distribution of missingness depends only on observed data.
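There is no definitive test for MCAR, but a quick informal check is to compare the observed values of one variable between cases where another variable is missing and cases where it is not. A minimal base-R sketch, using a simulated data frame `df` (hypothetical, not part of this tutorial's data):

```r
# Informal MCAR check: does x differ between rows where y is missing and not?
set.seed(1)
df <- data.frame(x = rnorm(100), y = rnorm(100))
df$y[sample(100, 20)] <- NA     # knock out 20 values completely at random

miss <- is.na(df$y)             # missingness indicator for y
t.test(df$x ~ miss)             # a significant difference suggests data are not MCAR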
For this tutorial, make sure you install and load the following packages:
library(missForest)
library(Hmisc)
library(mice)
library(VIM)
library(rms)
Let’s work with the iris sample data set in R.
data <- iris
head(iris)
Sepal.Length <dbl> | Sepal.Width <dbl> | Petal.Length <dbl> | Petal.Width <dbl> | Species <fctr> | |
---|---|---|---|---|---|
1 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
2 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
3 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
4 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
5 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
6 | 5.4 | 3.9 | 1.7 | 0.4 | setosa |
Now let’s randomly add missing values using the prodNA function from missForest.
#Produce NAs in 10% of the data
iris.mis <- prodNA(iris, noNA = 0.1)
head(iris.mis)
Sepal.Length <dbl> | Sepal.Width <dbl> | Petal.Length <dbl> | Petal.Width <dbl> | Species <fctr> | |
---|---|---|---|---|---|
1 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
2 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
3 | 4.7 | 3.2 | 1.3 | NA | setosa |
4 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
5 | 5.0 | NA | 1.4 | 0.2 | setosa |
6 | 5.4 | 3.9 | 1.7 | 0.4 | setosa |
We can create a table of missing-data patterns with the md.pattern function from the mice package:
md.pattern(iris.mis)
Sepal.Length Species Sepal.Width Petal.Length Petal.Width
90 1 1 1 1 1 0
5 0 1 1 1 1 1
11 1 1 0 1 1 1
7 1 1 1 0 1 1
16 1 1 1 1 0 1
6 1 0 1 1 1 1
1 0 1 0 1 1 2
2 0 1 1 0 1 2
3 1 1 0 0 1 2
1 0 1 1 1 0 2
2 1 1 1 0 0 2
2 0 0 1 1 1 2
1 1 0 0 1 1 2
2 1 0 1 0 1 2
1 1 0 1 1 0 2
11 12 16 16 20 75
Alternatively, we can find the number of NAs in each variable using sapply:
sapply(iris.mis, function(x) sum(is.na(x)))
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
11 16 16 20 12
Use the aggr function from the VIM package to visualize missing data.
miss_plot <- aggr(iris.mis, col=c('navyblue','yellow'),
numbers=TRUE, sortVars=TRUE,
labels=names(iris.mis), cex.axis=.7,
gap=3, ylab=c("Missing data","Pattern"))
not enough horizontal space to display frequencies
Variables sorted by number of missings:
Variable Count
Petal.Width 0.13333333
Sepal.Width 0.10666667
Petal.Length 0.10666667
Species 0.08000000
Sepal.Length 0.07333333
Using marginplot to visualize missing data for the Sepal.Length and Sepal.Width variables (columns 1 and 2):
marginplot(iris.mis[c(1,2)])
Multivariate Imputation by Chained Equations (MICE)
To impute missing values with MICE, we use the mice package. Depending on how big your data set is, this can take a while (30 seconds to a few hours), so be prepared to wait.
imputed_Data <- mice(iris.mis, m=5, maxit = 50, method = 'pmm', seed = 500)
m: the number of imputations per missing observation (5 is a common choice; it generates 5 data sets combining imputed and original values)
maxit: the maximum number of iterations the chained-equations algorithm runs before each imputed data set is saved
method: the imputation method; 'pmm' is predictive mean matching, which works for both numeric and categorical variables
seed: a random seed, set so the imputation is reproducible
We can get a summary of the data here:
summary(imputed_Data)
Multiply imputed data set
Call:
mice(data = iris.mis, m = 5, method = "pmm", maxit = 50, seed = 500)
Number of multiple imputations: 5
Missing cells per column:
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
11 16 16 20 12
Imputation methods:
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
"pmm" "pmm" "pmm" "pmm" "pmm"
VisitSequence:
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 2 3 4 5
PredictorMatrix:
Sepal.Length Sepal.Width Petal.Length Petal.Width
Sepal.Length 0 1 1 1
Sepal.Width 1 0 1 1
Petal.Length 1 1 0 1
Petal.Width 1 1 1 0
Species 1 1 1 1
Species
Sepal.Length 1
Sepal.Width 1
Petal.Length 1
Petal.Width 1
Species 0
Random generator seed value: 500
To check all 5 sets of imputed values for a given variable (such as Sepal.Width), run the following:
imputed_Data$imp$Sepal.Width
1 <dbl> | 2 <dbl> | 3 <dbl> | 4 <dbl> | 5 <dbl> | |
---|---|---|---|---|---|
5 | 3.4 | 3.6 | 3.1 | 3.8 | 3.5 |
11 | 3.5 | 3.7 | 4.1 | 3.7 | 3.7 |
28 | 3.4 | 3.7 | 3.0 | 4.1 | 3.8 |
43 | 3.2 | 3.1 | 3.1 | 3.1 | 3.1 |
52 | 2.9 | 2.9 | 3.4 | 3.4 | 3.0 |
56 | 2.9 | 2.7 | 3.0 | 2.5 | 2.6 |
57 | 3.0 | 2.9 | 2.8 | 2.7 | 2.7 |
66 | 3.0 | 3.4 | 3.0 | 2.8 | 3.1 |
86 | 3.4 | 2.8 | 2.7 | 3.2 | 2.9 |
100 | 3.0 | 2.6 | 2.6 | 3.0 | 2.9 |
Plot Sepal.Width against the other continuous variables:
xyplot(imputed_Data,Sepal.Width ~ Sepal.Length + Petal.Width,pch=18,cex=1)
densityplot(imputed_Data)
stripplot(imputed_Data, pch = 20, cex = 1.2)
Let’s extract the third imputed data set:
completeData <- complete(imputed_Data, 3)
head(completeData)
Sepal.Length <dbl> | Sepal.Width <dbl> | Petal.Length <dbl> | Petal.Width <dbl> | Species <fctr> | |
---|---|---|---|---|---|
1 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
2 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
3 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
4 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
5 | 5.0 | 3.1 | 1.4 | 0.2 | setosa |
6 | 5.4 | 3.9 | 1.7 | 0.4 | setosa |
We can use all 5 imputed data sets to build a predictive model. This only works for data where you expect some type of linear relationship.
fit <- with(data = imputed_Data, exp = lm(Sepal.Width ~ Sepal.Length + Petal.Width))
Then pool the results of the five fitted models to obtain combined coefficient estimates and standard errors:
combine <- pool(fit)
summary(combine)
est se t df
(Intercept) 1.8090186 0.32544437 5.558611 129.1517
Sepal.Length 0.3098584 0.06673043 4.643434 130.4314
Petal.Width -0.4639563 0.07382371 -6.284652 109.1799
Pr(>|t|) lo 95 hi 95 nmis fmi
(Intercept) 1.493627e-07 1.1651261 2.4529111 NA 0.06000905
Sepal.Length 8.250837e-06 0.1778443 0.4418724 11 0.05735028
Petal.Width 6.902011e-09 -0.6102697 -0.3176428 20 0.09771899
lambda
(Intercept) 0.04556452
Sepal.Length 0.04300592
Petal.Width 0.08134066
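Behind the scenes, pool() combines the per-imputation results with Rubin's rules: the pooled estimate is the mean of the estimates from the five fits, and the total variance adds the average within-imputation variance to the between-imputation variance. A sketch of the arithmetic for a single coefficient, using made-up numbers rather than the output above:

```r
# Rubin's rules for one coefficient across m imputations (illustrative numbers)
est <- c(0.31, 0.29, 0.33, 0.30, 0.32)      # per-imputation estimates
se  <- c(0.066, 0.064, 0.068, 0.067, 0.065) # per-imputation standard errors
m   <- length(est)

q_bar <- mean(est)            # pooled point estimate
W     <- mean(se^2)           # average within-imputation variance
B     <- var(est)             # between-imputation variance
Tvar  <- W + (1 + 1/m) * B    # total variance
pooled_se <- sqrt(Tvar)
```

The (1 + 1/m) factor inflates the between-imputation component to account for using a finite number of imputations.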
MIA with the missForest package
Impute missing values, using the default values for all parameters:
iris.imp <- missForest(iris.mis)
missForest iteration 1 in progress...done!
missForest iteration 2 in progress...done!
missForest iteration 3 in progress...done!
missForest iteration 4 in progress...done!
missForest iteration 5 in progress...done!
missForest iteration 6 in progress...done!
Check imputed values by data set
iris.imp$ximp
Sepal.Length <dbl> | Sepal.Width <dbl> | Petal.Length <dbl> | Petal.Width <dbl> | Species <fctr> |
---|---|---|---|---|
5.100000 | 3.500000 | 1.400000 | 0.2000000 | setosa |
4.900000 | 3.000000 | 1.400000 | 0.2000000 | setosa |
4.700000 | 3.200000 | 1.300000 | 0.2085833 | setosa |
4.600000 | 3.100000 | 1.500000 | 0.2000000 | setosa |
5.000000 | 3.273020 | 1.400000 | 0.2000000 | setosa |
5.400000 | 3.900000 | 1.700000 | 0.4000000 | setosa |
4.600000 | 3.400000 | 1.400000 | 0.1978136 | setosa |
5.000000 | 3.400000 | 1.465890 | 0.2000000 | setosa |
4.400000 | 2.900000 | 1.400000 | 0.2000000 | setosa |
4.900000 | 3.100000 | 1.500000 | 0.1000000 | setosa |
Check for imputation error:
iris.imp$OOBerror
NRMSE PFC
0.13637876 0.04347826
NRMSE is the normalized root mean squared error; it represents the error from imputing continuous values. PFC (proportion of falsely classified) represents the error from imputing categorical values. Lower values are better for both.
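One common definition, computed over the originally missing entries only, divides the mean squared imputation error by the variance of the true values; PFC is simply the fraction of imputed factor values that disagree with the truth. A sketch for one numeric and one factor column, assuming we still have the complete iris data to compare against (this approximates, but may not exactly match, missForest's internal computation):

```r
# Hand-computed imputation error for one numeric and one factor column
num_mask <- is.na(iris.mis$Sepal.Width)          # where values were missing
truth    <- iris$Sepal.Width[num_mask]           # true values from complete data
imp      <- iris.imp$ximp$Sepal.Width[num_mask]  # missForest's imputations

# NRMSE: RMSE of imputed vs. true values, normalized by the variance of the truth
nrmse <- sqrt(mean((truth - imp)^2) / var(iris$Sepal.Width))

# PFC: proportion of imputed factor values that were classified incorrectly
fac_mask <- is.na(iris.mis$Species)
pfc <- mean(iris.imp$ximp$Species[fac_mask] != iris$Species[fac_mask])
```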
Compare the imputed data set against the original, complete data to measure the true imputation error:
iris.err <- mixError(iris.imp$ximp, iris.mis, iris)
iris.err
NRMSE PFC
0.162452 0.000000
MIA with the Hmisc package
Hmisc has several functions and methods available to impute values.
Impute values based on the mean of the observed values:
iris.mis$imputed_age <- with(iris.mis, impute(Sepal.Length, mean))
head(iris.mis$imputed_age)
[1] 5.1 4.9 4.7 4.6 5.0 5.4
Impute values using randomly generated numbers:
iris.mis$imputed_age2 <- with(iris.mis, impute(Sepal.Length, 'random'))
head(iris.mis$imputed_age2)
[1] 5.1 4.9 4.7 4.6 5.0 5.4
(You can also pass functions such as min, max, or median to impute missing values.)
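For instance, median imputation with the same impute function looks like this (the column name imputed_med is just an illustrative choice):

```r
# Impute missing Sepal.Length values with the median of the observed values
iris.mis$imputed_med <- with(iris.mis, impute(Sepal.Length, median))
head(iris.mis$imputed_med)
```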
Using the aregImpute function, Hmisc performs multiple imputation using bootstrapping and predictive mean matching. A different bootstrap resample is used for each of the multiple imputations; a flexible additive model is then fit to the bootstrap sample and used to predict missing values from the non-missing values.
Predictive mean matching works well for continuous and categorical (binary & multi-level) variables, without the need for computing residuals or a maximum likelihood fit.
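To see the intuition behind predictive mean matching, here is a simplified base-R sketch (illustrative only, not aregImpute's actual algorithm): fit a regression on the complete cases, then fill each missing value with the observed value of the donor whose predicted mean is closest.

```r
# Simplified single-predictor PMM sketch (illustrative, not aregImpute's algorithm)
pmm_once <- function(y, x) {
  obs <- !is.na(y)
  d   <- data.frame(y = y, x = x)
  fit <- lm(y ~ x, data = d[obs, ])   # model fit on complete cases only
  pred <- predict(fit, newdata = d)   # predicted means for all rows
  for (i in which(!obs)) {
    donor <- which.min(abs(pred[obs] - pred[i]))  # closest observed predicted mean
    y[i] <- y[obs][donor]             # borrow that donor's observed value
  }
  y
}
```

Real implementations such as mice and aregImpute draw randomly from several of the closest donors rather than always taking the single nearest one, which preserves variability across the multiple imputations.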
impute_arg <- aregImpute(~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width + Species, data = iris.mis, n.impute = 5)
Iteration 1
Iteration 2
Iteration 3
Iteration 4
Iteration 5
Iteration 6
Iteration 7
Iteration 8
formula: the ~ Sepal.Length + … argument specifies which variables to include. In this case, we want all variables’ missing data to be imputed, so we add each one.
data: the data frame with missing data.
n.impute: the number of multiple imputations; 5 is frequently used, but 10 or more doesn’t hurt.
Note: aregImpute() automatically identifies each variable’s type and treats it accordingly.
See a summary of imputed values:
impute_arg
Multiple Imputation using Bootstrap and PMM
aregImpute(formula = ~Sepal.Length + Sepal.Width + Petal.Length +
Petal.Width + Species, data = iris.mis, n.impute = 5)
n: 150 p: 5 Imputations: 5 nk: 3
Number of NAs:
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
11 16 16 20 12
type d.f.
Sepal.Length s 2
Sepal.Width s 2
Petal.Length s 2
Petal.Width s 2
Species c 2
Transformation of Target Variables Forced to be Linear
R-squares for Predicting Non-Missing Values for Each Variable
Using Last Imputations of Predictors
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
0.862 0.796 0.982 0.956 0.986
The R-squared values indicate how well each variable’s missing values can be predicted from the other variables in iris.mis. Higher values suggest more trustworthy imputations.
Check the imputed values for the variable Sepal.Length:
impute_arg$imputed$Sepal.Length
[,1] [,2] [,3] [,4] [,5]
19 5.8 5.6 5.4 5.4 5.6
24 5.0 5.4 4.6 5.0 5.1
27 4.8 5.3 4.8 5.0 4.8
33 5.6 5.5 5.8 5.6 5.6
69 5.1 5.5 5.6 6.0 5.5
86 6.5 6.3 5.9 5.7 5.8
96 5.4 5.6 6.3 6.4 5.6
106 5.6 6.4 5.8 5.6 6.9
112 6.1 5.9 6.4 6.0 5.8
120 5.7 6.0 5.9 5.5 6.3
135 6.9 6.7 6.8 6.3 6.4
To merge one of the imputations into our original data set, we can use the impute.transcan function:
completeData2 <- impute.transcan(impute_arg, imputation=1, data=iris.mis, list.out=TRUE,pr=FALSE, check=FALSE)
head(completeData2)
$Sepal.Length
[1] 5.1 4.9 4.7 4.6 5.0 5.4 4.6 5.0 4.4 4.9 5.4 4.8
[13] 4.8 4.3 5.8 5.7 5.4 5.1 5.8* 5.1 5.4 5.1 4.6 5.0*
[25] 4.8 5.0 4.8* 5.2 5.2 4.7 4.8 5.4 5.6* 5.5 4.9 5.0
[37] 5.5 4.9 4.4 5.1 5.0 4.5 4.4 5.0 5.1 4.8 5.1 4.6
[49] 5.3 5.0 7.0 6.4 6.9 5.5 6.5 5.7 6.3 4.9 6.6 5.2
[61] 5.0 5.9 6.0 6.1 5.6 6.7 5.6 5.8 5.1* 5.6 5.9 6.1
[73] 6.3 6.1 6.4 6.6 6.8 6.7 6.0 5.7 5.5 5.5 5.8 6.0
[85] 5.4 6.5* 6.7 6.3 5.6 5.5 5.5 6.1 5.8 5.0 5.6 5.4*
[97] 5.7 6.2 5.1 5.7 6.3 5.8 7.1 6.3 6.5 5.6* 4.9 7.3
[109] 6.7 7.2 6.5 6.1* 6.8 5.7 5.8 6.4 6.5 7.7 7.7 5.7*
[121] 6.9 5.6 7.7 6.3 6.7 7.2 6.2 6.1 6.4 7.2 7.4 7.9
[133] 6.4 6.3 6.9* 7.7 6.3 6.4 6.0 6.9 6.7 6.9 5.8 6.8
[145] 6.7 6.7 6.3 6.5 6.2 5.9
$Sepal.Width
[1] 3.5 3.0 3.2 3.1 3.6* 3.9 3.4 3.4 2.9 3.1 3.5* 3.4
[13] 3.0 3.0 4.0 4.4 3.9 3.5 3.8 3.8 3.4 3.7 3.6 3.3
[25] 3.4 3.0 3.4 3.2* 3.4 3.2 3.1 3.4 4.1 4.2 3.1 3.2
[37] 3.5 3.6 3.0 3.4 3.5 2.3 2.9* 3.5 3.8 3.0 3.8 3.2
[49] 3.7 3.3 3.2 2.9* 3.1 2.3 2.8 3.0* 2.8* 2.4 2.9 2.7
[61] 2.0 3.0 2.2 2.9 2.9 2.5* 3.0 2.7 2.2 2.5 3.2 2.8
[73] 2.5 2.8 2.9 3.0 2.8 3.0 2.9 2.6 2.4 2.4 2.7 2.7
[85] 3.0 3.3* 3.1 2.3 3.0 2.5 2.6 3.0 2.6 2.3 2.7 3.0
[97] 2.9 2.9 2.5 2.6* 3.3 2.7 3.0 2.9 3.2* 3.0 2.5 2.9
[109] 2.5 3.6 3.2* 2.7 3.0 2.7* 2.8 3.2 3.0 3.8 3.2* 2.2
[121] 3.2 2.8 2.8 2.7 3.3 3.2 2.8 3.0 2.8 3.0 2.8 3.8
[133] 2.8 2.8 2.6 3.2* 3.4 3.1 3.0 3.1 3.1 3.1 2.7 3.2
[145] 3.3 3.6* 2.5 3.0 3.4 3.0
$Petal.Length
[1] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.6* 1.4 1.5 1.5 1.6
[13] 1.4 1.1 1.2 1.5 1.3 1.4 1.7 1.5 1.7 1.5 1.0 1.7
[25] 1.6* 1.6 1.6 1.5* 1.4 1.6 1.6 1.5 1.4* 1.4 1.5 1.3*
[37] 1.3 1.4 1.3 1.5 1.3 1.3 1.3 1.6 1.9 1.4 1.6 1.4
[49] 1.5 1.4 4.7 4.5 4.9 4.0 4.9* 4.5 4.7 3.3 4.6 3.9
[61] 3.7* 4.2 4.0 4.7 3.6 4.4 4.5 4.1 4.5 3.9 4.8 4.5*
[73] 4.9 4.7 4.3 4.4 4.8 5.0 4.5 4.1* 3.8 3.7 3.9 5.1
[85] 4.5 4.5 4.7 4.4 4.1 4.0 4.4 4.6 4.0 3.3 4.2 4.2
[97] 4.2 4.3 3.0 4.1 6.0 5.1 5.9 5.6 5.8 5.0* 4.5 6.3
[109] 5.8 6.1 5.8* 5.3 5.5 5.0 5.1 5.3 5.5 6.7 6.9 5.0
[121] 5.1* 4.9 6.7 4.9 5.7 6.0 4.8 4.9 5.6 5.8 6.1 6.1*
[133] 5.3* 5.1 5.6 6.9* 5.6 5.5 4.8 5.4 5.6 5.1 5.1 5.9
[145] 5.7 5.2 5.5* 5.2 5.4 5.1
$Petal.Width
[1] 0.2 0.2 0.2* 0.2 0.2 0.4 0.4* 0.2 0.2 0.1 0.2 0.2
[13] 0.1 0.1 0.2* 0.4 0.4 0.3 0.3 0.3 0.2 0.4 0.2 0.5
[25] 0.4* 0.3* 0.4 0.2 0.2 0.2 0.2* 0.4 0.1 0.4* 0.2 0.2
[37] 0.2 0.1 0.2 0.2 0.4* 0.3 0.2 0.6 0.4 0.3 0.5* 0.2
[49] 0.5* 0.2 1.4 1.5 1.5 1.3 1.5 1.3 1.6 1.0 1.3 1.4
[61] 1.0 1.5* 1.0 1.4 1.3 1.4 1.5 1.0 1.5 1.1 1.8 1.5*
[73] 1.5 1.3* 1.3* 1.4 1.4 1.7 1.5 1.0 1.1 1.0 1.2 1.6
[85] 1.5 1.6 1.5 1.3 1.3 1.3 1.2 1.4 1.2 1.0 1.3 1.5*
[97] 1.3 1.3* 1.0* 1.3 2.5 1.8* 2.1 1.8 2.2 2.1 1.7 1.8
[109] 1.8 2.5 2.0 1.9 2.1 2.0 2.4 2.3 1.8 2.2 2.3 1.5
[121] 2.3 2.0 2.0 1.8 2.1 1.8 1.8 2.0* 2.4* 1.6 1.9 2.0
[133] 2.2 1.5 1.4 2.3 2.4 1.8 1.8 2.1 2.4 2.3 1.9 2.3
[145] 2.5 2.3 1.9 2.0 2.3 1.8
$Species
[1] setosa setosa setosa setosa setosa
[6] setosa setosa setosa setosa setosa
[11] setosa setosa setosa setosa setosa
[16] setosa setosa setosa setosa setosa
[21] setosa setosa setosa setosa setosa
[26] setosa* setosa setosa setosa setosa
[31] setosa setosa setosa setosa setosa
[36] setosa setosa setosa setosa setosa
[41] setosa setosa setosa setosa setosa
[46] setosa setosa setosa setosa setosa
[51] versicolor versicolor versicolor* versicolor virginica*
[56] versicolor versicolor versicolor versicolor versicolor
[61] versicolor versicolor versicolor versicolor versicolor
[66] versicolor versicolor versicolor virginica* versicolor
[71] versicolor versicolor versicolor versicolor versicolor
[76] versicolor* versicolor versicolor versicolor versicolor*
[81] versicolor versicolor versicolor versicolor versicolor
[86] versicolor versicolor versicolor versicolor versicolor
[91] versicolor versicolor versicolor versicolor versicolor
[96] versicolor versicolor versicolor versicolor versicolor
[101] virginica virginica virginica virginica virginica
[106] virginica virginica virginica* virginica virginica
[111] virginica virginica* virginica virginica virginica
[116] virginica virginica virginica virginica* virginica
[121] virginica virginica virginica virginica virginica
[126] virginica* virginica virginica virginica virginica
[131] virginica virginica virginica virginica virginica
[136] virginica virginica virginica virginica* virginica
[141] virginica virginica virginica* virginica virginica
[146] virginica virginica virginica virginica virginica
It conveniently puts an asterisk next to each imputed value.
Now we can create a fitted model using ols (a linear model) from the rms package (Regression Modeling Strategies):
fmi <- fit.mult.impute(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species, ols, impute_arg, data=iris.mis)
Variance Inflation Factors Due to Imputation:
Intercept Sepal.Width Petal.Length
1.24 1.17 1.81
Petal.Width Species=versicolor Species=virginica
1.54 1.63 1.95
Rate of Missing Information:
Intercept Sepal.Width Petal.Length
0.19 0.15 0.45
Petal.Width Species=versicolor Species=virginica
0.35 0.39 0.49
d.f. for t-distribution for Tests of Single Coefficients:
Intercept Sepal.Width Petal.Length
109.87 183.76 20.07
Petal.Width Species=versicolor Species=virginica
32.92 26.52 16.88
The following fit components were averaged over the 5 model fits:
fitted.values stats linear.predictors
References & other resources
This tutorial synthesizes some of the research articles and online tutorials available on Multiple Imputation Analysis. However, you can learn more from the references below.
MICE: if you still have NAs in your data set after imputation, you may need to increase maxit: https://stackoverflow.com/questions/20947908/imputation-mice-in-r-still-na-left-in-dataset
“Missing data” visualizations: https://datascienceplus.com/imputing-missing-data-with-r-mice-package/
Beginner guide: https://datascienceplus.com/handling-missing-data-with-mice-package-a-simple-approach/
Explanations of MICE and types of missing data: https://www.kdnuggets.com/2017/09/missing-data-imputation-using-r.html
Ambler, G., Omar, R. Z., & Royston, P. (2007). A comparison of imputation techniques for handling missing predictor values in a risk model with a binary outcome. Statistical Methods in Medical Research, 16, 277–298.
Boone, W. J. (2016). Rasch analysis for instrument development: Why, when, and how? CBE Life Sciences Education, 15(4), rm4. http://doi.org/10.1187/cbe.16-04-0148
Finch, H. (2008). Estimation of item response theory parameters in the presence of missing data. Journal of Educational Measurement, 45(3), 225–245.
Little, R. J. A., & Rubin, D. B. (2002). Statistical analysis with missing data. Hoboken, NJ: John Wiley & Sons.
Sulis, I., & Porcu, M. (2017). Handling Missing Data in Item Response Theory. Assessing the Accuracy of a Multiple Imputation Procedure Based on Latent Class Analysis. Journal of Classification, 34(2), 327–359. https://doi.org/10.1007/s00357-017-9220-3