Title: | Converting Latent Variables into Likert Scale Responses |
---|---|
Description: | Effectively simulates the discretization process inherent to Likert scales while minimizing distortion. It converts continuous latent variables into ordinal categories to generate Likert scale item responses. Particularly useful for accurately modeling and analyzing survey data that use Likert scales, especially when applying statistical techniques that require metric data. |
Authors: | Marko Lalovic [aut, cre] |
Maintainer: | Marko Lalovic <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.2.2 |
Built: | 2025-03-08 05:01:51 UTC |
Source: | https://github.com/markolalovic/latent2likert |
Transforms the density function of a continuous random variable into a discrete probability distribution with minimal distortion using the Lloyd-Max algorithm.
discretize_density(density_fn, n_levels, eps = 1e-06)
density_fn | probability density function. |
n_levels | cardinality of the set of all possible outcomes. |
eps | convergence threshold for the algorithm. |
The function addresses the problem of transforming a continuous random variable $X$ into a discrete random variable $Y$ with minimal distortion. Distortion is measured as mean-squared error (MSE):

$$\mathrm{E}\left[(X - Y)^2\right] = \sum_{k=1}^{K} \int_{x_{k-1}}^{x_k} f_X(x) \, (x - r_k)^2 \, dx$$

where:
$f_X$ is the probability density function of $X$,
$K$ is the number of possible outcomes of $Y$,
$x_k$ are endpoints of intervals that partition the domain of $X$,
$r_k$ are representation points of the intervals.

This problem is solved using the following iterative procedure:
1. Start with an arbitrary initial set of representation points: $r_1 < r_2 < \dots < r_K$.
2. Repeat the following steps until the improvement in MSE falls below the given $\varepsilon$.
3. Calculate endpoints as $x_k = (r_{k+1} + r_k)/2$ for each $k = 1, \dots, K-1$ and set $x_0$ and $x_K$ to $-\infty$ and $\infty$, respectively.
4. Update representation points by setting $r_k$ equal to the conditional mean of $X$ given $X \in (x_{k-1}, x_k)$ for each $k = 1, \dots, K$.

With each execution of step 3 and step 4, the MSE decreases or remains the same. As MSE is nonnegative, it approaches a limit. The algorithm terminates when the improvement in MSE is less than a given $\varepsilon > 0$, ensuring convergence after a finite number of iterations.
This procedure is known as Lloyd-Max's algorithm, initially used for scalar quantization and closely related to the k-means algorithm. Local convergence has been proven for log-concave density functions by Kieffer. Many common probability distributions are log-concave including the normal and skew normal distribution, as shown by Azzalini.
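The iteration described above can be sketched in a few lines of base R. This is a minimal illustration, not the package's implementation: the function name `lloyd_max_sketch`, the starting points, the `maxit` cap, and the names of the returned list elements are all assumptions made for the example, and the integrals are evaluated numerically with `stats::integrate`.

```r
# Hypothetical helper (not part of the package): Lloyd-Max iteration for a
# density defined on the whole real line.
lloyd_max_sketch <- function(density_fn, n_levels, eps = 1e-6, maxit = 1000) {
  # Arbitrary increasing initial representation points (an assumption).
  repr <- seq(-2, 2, length.out = n_levels)
  mse_old <- Inf
  for (iter in seq_len(maxit)) {
    # Endpoints are midpoints between neighbouring representation points.
    endp <- c(-Inf, (repr[-1] + repr[-n_levels]) / 2, Inf)
    prob <- numeric(n_levels)
    for (k in seq_len(n_levels)) {
      lo <- endp[k]
      hi <- endp[k + 1]
      prob[k] <- stats::integrate(density_fn, lo, hi)$value
      # Representation point becomes the conditional mean on the interval.
      repr[k] <- stats::integrate(function(x) x * density_fn(x), lo, hi)$value / prob[k]
    }
    # Mean-squared error for the current partition and representation points.
    mse <- sum(vapply(seq_len(n_levels), function(k) {
      stats::integrate(function(x) (x - repr[k])^2 * density_fn(x),
                       endp[k], endp[k + 1])$value
    }, numeric(1)))
    # Stop once the improvement in MSE falls below eps.
    if (mse_old - mse < eps) break
    mse_old <- mse
  }
  list(prob = prob, endp = endp, repr = repr, mse = mse)
}

res <- lloyd_max_sketch(stats::dnorm, n_levels = 5)
res$prob
```

For a symmetric density such as the standard normal, the resulting endpoints and representation points should come out symmetric around zero, which is a quick sanity check on the sketch.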
A list containing:
discrete probability distribution.
endpoints of intervals that partition the continuous domain.
representation points of the intervals.
distortion measured as the mean-squared error (MSE).
Azzalini, A. (1985). A class of distributions which includes the normal ones. Scandinavian Journal of Statistics 12(2), 171–178.
Kieffer, J. (1983). Uniqueness of locally optimal quantizer for log-concave density and convex error function. IEEE Transactions on Information Theory 29, 42–47.
Lloyd, S. (1982). Least squares quantization in PCM. IEEE Transactions on Information Theory 28 (2), 129–137.
discretize_density(density_fn = stats::dnorm, n_levels = 5)
discretize_density(density_fn = function(x) {
  2 * stats::dnorm(x) * stats::pnorm(0.5 * x)
}, n_levels = 4)
Estimates the mean and standard deviation of a latent variable given the discrete probabilities of its observed Likert scale responses.
estimate_mean_and_sd(prob, n_levels, skew = 0, eps = 1e-06, maxit = 100)
prob | named vector of probabilities for each response category. |
n_levels | number of response categories for the Likert scale item. |
skew | marginal skewness of the latent variable, defaults to 0. |
eps | tolerance for convergence, defaults to 1e-6. |
maxit | maximum number of iterations, defaults to 100. |
This function uses an iterative algorithm to solve the system of non-linear equations that describe the relationship between the continuous latent variable and the observed discrete probability distribution of Likert scale responses. The algorithm ensures stability by reparameterizing the system and applying constraints to prevent stepping into invalid regions.
A numeric vector with two elements: the estimated mean and standard deviation.
prob <- c("1" = 0.313, "2" = 0.579, "3" = 0.105, "4" = 0.003)
# returns estimates that are close to the actual mean and sd: c(-1, 0.5)
estimate_mean_and_sd(prob, 5)
Estimates the location and scaling parameters of the latent variables from existing survey data.
estimate_params(data, n_levels, skew = 0)
data | survey data with columns representing individual items. Apart from this, |
n_levels | number of response categories, a vector or a number. |
skew | marginal skewness of latent variables, defaults to 0. |
The relationship between the continuous random variable $X$ and the discrete probability distribution $p_k$, for $k = 1, \dots, K$, can be described by a system of non-linear equations:

$$p_k = F_X\left(\frac{x_k - \xi}{\omega}\right) - F_X\left(\frac{x_{k-1} - \xi}{\omega}\right) \quad \text{for} \ k = 1, \dots, K$$

where:
$F_X$ is the cumulative distribution function of $X$,
$K$ is the number of possible response categories,
$x_k$ are the endpoints defining the boundaries of the response categories,
$p_k$ is the probability of the $k$-th response category,
$\xi$ is the location parameter of $X$,
$\omega$ is the scaling parameter of $X$.

The endpoints $x_k$ are calculated by discretizing a random variable $Z$ with mean 0 and standard deviation 1 that follows the same distribution as $X$. By solving the above system of non-linear equations iteratively, we can find the parameters that best fit the observed discrete probability distribution $p_k$.
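To make the system above concrete, the following sketch computes the category probabilities for a given location and scale and then recovers those two parameters from the probabilities. It rests on several assumptions that are not taken from the package: the latent variable is normal (zero skewness), the endpoint values are approximate Lloyd-Max endpoints for a standard normal with five levels, the function names are hypothetical, and a generic least-squares fit with `stats::optim` stands in for the package's own iterative solver.

```r
# Approximate Lloyd-Max endpoints for a standard normal with K = 5 levels
# (assumed values, used here only for illustration).
endp <- c(-Inf, -1.2444, -0.3823, 0.3823, 1.2444, Inf)

# Forward model: probability of each category for location xi and scale omega,
# assuming a normal latent variable.
category_probs <- function(xi, omega, endp) {
  diff(stats::pnorm((endp - xi) / omega))
}

# Hypothetical solver: least squares over (xi, log(omega)), so that the
# scale stays positive during the search.
fit_location_scale <- function(prob, endp) {
  loss <- function(par) {
    sum((category_probs(par[1], exp(par[2]), endp) - prob)^2)
  }
  fit <- stats::optim(c(0, 0), loss)  # Nelder-Mead by default
  c(mean = fit$par[1], sd = exp(fit$par[2]))
}

prob <- category_probs(xi = -1, omega = 0.5, endp = endp)
est <- fit_location_scale(prob, endp)
est
```

Optimizing over the logarithm of the scale is one simple way to keep the search inside the valid region.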
The function estimate_params:
1. Computes the proportion table of the responses for each item.
2. Estimates the probabilities $p_k$ for each item.
3. Computes the estimates of $\xi$ and $\omega$ for each item.
4. Combines the estimated parameters for all items into a table.
A table of estimated parameters for each latent variable.
discretize_density for details on calculating the endpoints, and part_bfi for an example of the survey data.
data(part_bfi)
vars <- c("A1", "A2", "A3", "A4", "A5")
estimate_params(data = part_bfi[, vars], n_levels = 6)
This dataset is a cleaned-up version of a small part of the bfi dataset from the psychTools package. It contains responses to the first 5 items of the agreeableness scale from the International Personality Item Pool (IPIP) and the gender attribute. It includes responses from 2800 subjects. Each item was answered on a six-point Likert scale ranging from 1 (very inaccurate) to 6 (very accurate). Gender was coded as 0 for male and 1 for female. Missing values were addressed using mode imputation.
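Mode imputation replaces each missing response with the most frequent observed value of the same item. A minimal sketch is shown below; the helper name `impute_mode` is hypothetical, and the dataset's actual preprocessing code may differ, for example in how ties are broken.

```r
# Hypothetical helper: replace missing values in a numeric vector of
# responses with the most frequent observed value.
impute_mode <- function(x) {
  counts <- table(x)  # table() ignores NA by default
  # On ties, which.max() picks the smallest of the most frequent values.
  mode_value <- as.numeric(names(counts)[which.max(counts)])
  x[is.na(x)] <- mode_value
  x
}

responses <- c(2, 4, NA, 4, 6, NA, 1)
imputed <- impute_mode(responses)
imputed
```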
data(part_bfi)
An object of class "data.frame" with 2800 observations on the following 6 variables:
A1: Am indifferent to the feelings of others.
A2: Inquire about others' well-being.
A3: Know how to comfort others.
A4: Love children.
A5: Make people feel at ease.
gender: Gender of the respondent.
International Personality Item Pool (https://ipip.ori.org)
https://search.r-project.org/CRAN/refmans/psychTools/html/bfi.html
Revelle, W. (2024). Psych: Procedures for Psychological, Psychometric, and Personality Research. Evanston, Illinois: Northwestern University. https://CRAN.R-project.org/package=psych
data(part_bfi)
head(part_bfi)
Plots the densities of latent variables and the corresponding transformed discrete probability distributions.
plot_likert_transform(n_items, n_levels, mean = 0, sd = 1, skew = 0)
n_items | number of Likert scale items (questions). |
n_levels | number of response categories for each Likert item. Integer or vector of integers. |
mean | means of the latent variables. Numeric or vector of numerics. Defaults to 0. |
sd | standard deviations of the latent variables. Numeric or vector of numerics. Defaults to 1. |
skew | marginal skewness of the latent variables. Numeric or vector of numerics. Defaults to 0. |
NULL. The function produces a plot.
plot_likert_transform(n_items = 3, n_levels = c(3, 4, 5))
plot_likert_transform(n_items = 3, n_levels = 5, mean = c(0, 1, 2))
plot_likert_transform(n_items = 3, n_levels = 5, sd = c(0.8, 1, 1.2))
plot_likert_transform(n_items = 3, n_levels = 5, skew = c(-0.5, 0, 0.5))
Returns a table of proportions for each possible response category.
response_prop(data, n_levels)
data | numeric vector or matrix of responses. |
n_levels | number of response categories. |
A table of response category proportions.
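For a single item, a table of this kind can be sketched with base R by counting responses over the full set of categories, so that categories nobody chose still appear with proportion zero. The helper name `prop_sketch` is hypothetical, and this is only an illustration of the idea, not the package's code.

```r
# Hypothetical helper: proportions over all categories 1..n_levels,
# including categories that never occur in the data.
prop_sketch <- function(x, n_levels) {
  counts <- table(factor(x, levels = seq_len(n_levels)))
  counts / sum(counts)
}

props <- prop_sketch(c(1, 2, 2, 3, 3, 3), n_levels = 4)
props
```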
data <- c(1, 2, 2, 3, 3, 3)
response_prop(data, n_levels = 3)

data_matrix <- matrix(c(1, 2, 2, 3, 3, 3), ncol = 2)
response_prop(data_matrix, n_levels = 3)
Generates an array of random responses to Likert-type questions based on specified latent variables.
rlikert(size, n_items, n_levels, mean = 0, sd = 1, skew = 0, corr = 0)
size | number of observations. |
n_items | number of Likert scale items (number of questions). |
n_levels | number of response categories for each item. Integer or vector of integers. |
mean | means of the latent variables. Numeric or vector of numerics. Defaults to 0. |
sd | standard deviations of the latent variables. Numeric or vector of numerics. Defaults to 1. |
skew | marginal skewness of the latent variables. Numeric or vector of numerics. Defaults to 0. |
corr | correlations between latent variables. Can be a single numeric value representing the same correlation for all pairs, or an actual correlation matrix. Defaults to 0. |
A matrix of random responses with dimensions size by n_items. The column names are Y1, Y2, ..., Yn where n is the number of items. Each entry in the matrix represents a Likert scale response, ranging from 1 to n_levels.
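The general idea of generating such a matrix can be sketched as follows: draw correlated latent values, shift and scale them, and cut them at fixed endpoints. The sketch assumes normal latent variables (zero skewness), two items with five levels each, and approximate Lloyd-Max endpoints for a standard normal; all variable names and values are illustrative and not taken from the package.

```r
set.seed(1)
size <- 1000
mu <- c(0, -1)       # latent means (illustrative values)
sigma <- c(1, 0.5)   # latent standard deviations (illustrative values)
corr <- matrix(c(1, 0.5, 0.5, 1), nrow = 2)

# Approximate Lloyd-Max endpoints for a standard normal with 5 levels
# (assumed values); only the finite cut points are needed below.
cuts <- c(-1.2444, -0.3823, 0.3823, 1.2444)

# Correlated standard normals via the Cholesky factor of the correlation matrix.
z <- matrix(stats::rnorm(size * 2), ncol = 2) %*% chol(corr)
# Scale and shift each column to the requested standard deviation and mean.
latent <- sweep(sweep(z, 2, sigma, `*`), 2, mu, `+`)

# Map each latent value to a category 1..5 using intervals (x_{k-1}, x_k].
responses <- apply(latent, 2, function(col) {
  findInterval(col, cuts, left.open = TRUE) + 1
})
colnames(responses) <- paste0("Y", 1:2)
head(responses)
```

Note that the correlation is imposed on the latent variables; the correlation between the discretized responses is generally somewhat weaker.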
# Generate responses for a single item with 5 levels
rlikert(size = 10, n_items = 1, n_levels = 5)

# Generate responses for three items with different levels and parameters
rlikert(
  size = 10, n_items = 3, n_levels = c(4, 5, 6),
  mean = c(0, -1, 0), sd = c(0.8, 1, 1), corr = 0.5
)

# Generate responses with a correlation matrix
corr <- matrix(c(
  1.00, -0.63, -0.39,
  -0.63, 1.00, 0.41,
  -0.39, 0.41, 1.00
), nrow = 3)
data <- rlikert(
  size = 1000, n_items = 3, n_levels = c(4, 5, 6),
  mean = c(0, -1, 0), sd = c(0.8, 1, 1), corr = corr
)
Simulates Likert scale item responses based on a specified number of response categories and the centered parameters of the latent variable.
simulate_likert(n_levels, cp)
n_levels | number of response categories for the Likert scale item. |
cp | centered parameters of the latent variable. Named vector including mean ( |
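The simulation process uses the following model detailed by Boari and Nai-Ruscone. Let $X$ be the continuous variable of interest, measured using Likert scale questions with $K$ response categories. The observed discrete variable $Y$ is defined as follows:

$$Y = k \quad \text{if} \quad x_{k-1} < X \le x_k \quad \text{for} \ k = 1, \dots, K$$

where $x_k$, $k = 0, \dots, K$, are endpoints defined in the domain of $X$ such that:

$$-\infty = x_0 < x_1 < \dots < x_{K-1} < x_K = \infty.$$

The endpoints dictate the transformation of the density $f_X$ of $X$ into a discrete probability distribution:

$$\Pr(Y = k) = \int_{x_{k-1}}^{x_k} f_X(x) \, dx \quad \text{for} \ k = 1, \dots, K.$$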
The continuous latent variable is modeled using a skew normal distribution.
The function simulate_likert performs the following steps:
1. Ensures the centered parameters are within the acceptable range.
2. Converts the centered parameters to direct parameters.
3. Defines the density function for the skew normal distribution.
4. Computes the probabilities for each response category using optimal endpoints.
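The steps above can be sketched in base R using the standard relationship between the centered parameters (mean, standard deviation, skewness) and the direct parameters (location, scale, shape) of the skew normal distribution. The helper name `cp_to_dp`, the 0.99 bound used for the range check, and the endpoints are assumptions made for this example; in particular, the endpoints below are arbitrary, not the optimal ones the package computes.

```r
# Hypothetical helper: convert centered parameters to direct parameters
# (location xi, scale omega, shape alpha) of a skew normal distribution.
cp_to_dp <- function(mu, sd, skew) {
  # The skewness of a skew normal is bounded (roughly 0.995 in absolute value);
  # 0.99 is an illustrative cutoff.
  stopifnot(abs(skew) < 0.99)
  ratio <- sign(skew) * (2 * abs(skew) / (4 - pi))^(1 / 3)
  m <- ratio / sqrt(1 + ratio^2)  # mean of the standardized variable
  delta <- m / sqrt(2 / pi)
  omega <- sd / sqrt(1 - m^2)
  c(xi = mu - omega * m, omega = omega, alpha = delta / sqrt(1 - delta^2))
}

dp <- cp_to_dp(mu = 0, sd = 1, skew = 0.5)

# Skew normal density expressed in the direct parameters.
density_fn <- function(x) {
  z <- (x - dp[["xi"]]) / dp[["omega"]]
  2 / dp[["omega"]] * stats::dnorm(z) * stats::pnorm(dp[["alpha"]] * z)
}

# Arbitrary illustrative endpoints (not optimal ones) for 4 categories.
endp <- c(-Inf, -1, 0, 1, Inf)
prob <- vapply(seq_len(length(endp) - 1), function(k) {
  stats::integrate(density_fn, endp[k], endp[k + 1])$value
}, numeric(1))
prob
```

Integrating `x * density_fn(x)` over the real line should give a value close to the requested mean, which is a convenient check that the conversion is consistent.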
A named vector of probabilities for each response category.
Boari, G. and Nai Ruscone, M. (2015). A procedure simulating Likert scale item responses. Electronic Journal of Applied Statistical Analysis 8(3), 288–297. doi:10.1285/i20705948v8n3p288
discretize_density for details on how to calculate the optimal endpoints.
cp <- c(mu = 0, sd = 1, skew = 0.5)
simulate_likert(n_levels = 5, cp = cp)

cp2 <- c(mu = 1, sd = 2, skew = -0.3)
simulate_likert(n_levels = 7, cp = cp2)