Package 'latent2likert' reference manual

Title:	Converting Latent Variables into Likert Scale Responses
Description:	Effectively simulates the discretization process inherent to Likert scales while minimizing distortion. It converts continuous latent variables into ordinal categories to generate Likert scale item responses. Particularly useful for accurately modeling and analyzing survey data that use Likert scales, especially when applying statistical techniques that require metric data.
Authors:	Marko Lalovic [aut, cre]
Maintainer:	Marko Lalovic <[email protected]>
License:	MIT + file LICENSE
Version:	1.2.2
Built:	2025-03-08 05:01:51 UTC
Source:	https://github.com/markolalovic/latent2likert

Discretize Density

Description

Transforms the density function of a continuous random variable into a discrete probability distribution with minimal distortion using the Lloyd-Max algorithm.

Usage

discretize_density(density_fn, n_levels, eps = 1e-06)

Arguments

`density_fn`	probability density function.
`n_levels`	cardinality of the set of all possible outcomes.
`eps`	convergence threshold for the algorithm.

Details

The function addresses the problem of transforming a continuous random variable $X$ into a discrete random variable $Y$ with minimal distortion. Distortion is measured as mean-squared error (MSE):

$\text{E}\left[ (X - Y)^2 \right] = \sum_{k=1}^{K} \int_{x_{k-1}}^{x_{k}} f_{X}(x) \left( x - r_{k} \right)^2 \, dx$

where:

$f_{X}$: the probability density function of $X$,
$K$: the number of possible outcomes of $Y$,
$x_{k}$: the endpoints of the intervals that partition the domain of $X$,
$r_{k}$: the representation points of the intervals.

This problem is solved using the following iterative procedure:

1. Start with an arbitrary initial set of representation points: $r_{1} < r_{2} < \dots < r_{K}$.
2. Repeat steps 3 and 4 until the improvement in MSE falls below a given $\varepsilon$.
3. Calculate endpoints as $x_{k} = (r_{k+1} + r_{k})/2$ for each $k = 1, \dots, K-1$ and set $x_{0}$ and $x_{K}$ to $-\infty$ and $\infty$, respectively.
4. Update representation points by setting $r_{k}$ equal to the conditional mean of $X$ given $X \in (x_{k-1}, x_{k})$ for each $k = 1, \dots, K$.

With each execution of steps 3 and 4, the MSE decreases or remains the same. As the MSE is nonnegative, it approaches a limit. The algorithm terminates when the improvement in MSE is less than a given $\varepsilon > 0$, so it stops after a finite number of iterations.

This procedure is known as the Lloyd-Max algorithm. It was initially used for scalar quantization (Lloyd, 1982) and is closely related to the k-means algorithm. Local convergence has been proven for log-concave density functions by Kieffer (1983). Many common probability distributions are log-concave, including the normal and skew normal distributions, as shown by Azzalini (1985).
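The iteration described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package source: the name `lloyd_max`, the `max_iter` safeguard, and the equally spaced starting points are all hypothetical choices, and the integrals are evaluated numerically with `stats::integrate`.

```r
# Minimal Lloyd-Max sketch in base R (illustrative; not the package source).
lloyd_max <- function(density_fn, n_levels, eps = 1e-6, max_iter = 1000) {
  # Step 1: arbitrary increasing initial representation points (assumption).
  repr <- seq(-2, 2, length.out = n_levels)

  # p-th partial moment of the density over the interval (a, b).
  moment <- function(p, a, b) {
    stats::integrate(function(x) x^p * density_fn(x), a, b)$value
  }
  # Contribution of one interval to the mean-squared error.
  sq_err <- function(a, b, r) {
    stats::integrate(function(x) (x - r)^2 * density_fn(x), a, b)$value
  }

  mse_old <- Inf
  for (iter in seq_len(max_iter)) {
    # Step 3: endpoints are midpoints of neighbouring representation points.
    endp <- c(-Inf, (repr[-1] + repr[-n_levels]) / 2, Inf)
    lower <- endp[-length(endp)]
    upper <- endp[-1]

    # Probability mass of each interval.
    prob <- mapply(function(a, b) moment(0, a, b), lower, upper)

    # Step 4: representation points are conditional means on each interval.
    repr <- mapply(function(a, b) moment(1, a, b), lower, upper) / prob

    # Distortion for the current endpoints and representation points.
    mse <- sum(mapply(sq_err, lower, upper, repr))

    # Step 2: stop once the improvement in MSE falls below eps.
    if (mse_old - mse < eps) break
    mse_old <- mse
  }
  list(prob = prob, endp = endp, repr = repr, dist = mse)
}

q <- lloyd_max(stats::dnorm, n_levels = 5, eps = 1e-8)
q$endp
q$dist
```

Unlike the value documented below, this sketch keeps the infinite outer endpoints in `endp`; whether the package does the same is not stated here.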

Value

A list containing:

prob: discrete probability distribution.
endp: endpoints of intervals that partition the continuous domain.
repr: representation points of the intervals.
dist: distortion measured as the mean-squared error (MSE).

References

Azzalini, A. (1985). A class of distributions which includes the normal ones. Scandinavian Journal of Statistics 12(2), 171–178.

Kieffer, J. (1983). Uniqueness of locally optimal quantizer for log-concave density and convex error function. IEEE Transactions on Information Theory 29, 42–47.

Lloyd, S. (1982). Least squares quantization in PCM. IEEE Transactions on Information Theory 28 (2), 129–137.

Examples

discretize_density(density_fn = stats::dnorm, n_levels = 5)
discretize_density(density_fn = function(x) {
  2 * stats::dnorm(x) * stats::pnorm(0.5 * x)
}, n_levels = 4)

Estimate mean and standard deviation

Description

Estimates the mean and standard deviation of a latent variable given the discrete probabilities of its observed Likert scale responses.

Usage

estimate_mean_and_sd(prob, n_levels, skew = 0, eps = 1e-06, maxit = 100)

Arguments

`prob`	named vector of probabilities for each response category.
`n_levels`	number of response categories for the Likert scale item.
`skew`	marginal skewness of the latent variable, defaults to 0.
`eps`	tolerance for convergence, defaults to 1e-6.
`maxit`	maximum number of iterations, defaults to 100.

Details

This function uses an iterative algorithm to solve the system of non-linear equations that describe the relationship between the continuous latent variable and the observed discrete probability distribution of Likert scale responses. The algorithm ensures stability by reparameterizing the system and applying constraints to prevent stepping into invalid regions.
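As a rough illustration of the idea, the sketch below recovers the mean and standard deviation of a normal latent variable (skew = 0) by least squares, optimizing over the logarithm of the standard deviation so that it stays positive. It is not the package's solver. The function names `model_prob` and `fit_mean_sd` are hypothetical, the endpoints are approximate Lloyd-Max thresholds for a standard normal variable with 5 levels, and padding the probability vector with a zero for the fifth category is an assumption.

```r
# Illustrative sketch only: least-squares recovery of (mean, sd) for a
# normal latent variable. Not the package's solver.

# Assumed endpoints: approximate Lloyd-Max thresholds for a standard normal
# variable discretized into 5 levels.
endp <- c(-1.2444, -0.3823, 0.3823, 1.2444)

# Model probabilities of the K categories for a latent N(mean, sd^2) variable.
model_prob <- function(mean, sd, endp) {
  diff(stats::pnorm(c(-Inf, endp, Inf), mean = mean, sd = sd))
}

fit_mean_sd <- function(prob, endp) {
  # Reparameterize: optimize over (mean, log(sd)) so that sd > 0 always.
  loss <- function(par) {
    sum((model_prob(par[1], exp(par[2]), endp) - prob)^2)
  }
  fit <- stats::optim(c(0, 0), loss)  # Nelder-Mead from mean = 0, sd = 1
  c(mean = fit$par[1], sd = exp(fit$par[2]))
}

# Four observed probabilities, padded with a zero for the fifth category
# (the padding is an assumption of this sketch).
prob <- c(0.313, 0.579, 0.105, 0.003, 0)
est <- fit_mean_sd(prob, endp)
est
```

The log transform plays the role of the reparameterization mentioned above: any real value of the second parameter maps to a valid, strictly positive standard deviation.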

Value

A numeric vector with two elements: the estimated mean and standard deviation.

Examples

prob <- c("1" = 0.313, "2" = 0.579, "3" = 0.105, "4" = 0.003)
# returns estimates that are close to the actual mean and sd: c(-1, 0.5)
estimate_mean_and_sd(prob, 5)

Estimate Latent Parameters

Description

Estimates the location and scaling parameters of the latent variables from existing survey data.

Usage

estimate_params(data, n_levels, skew = 0)

Arguments

`data`	survey data with columns representing individual items. Apart from this, `data` can be of almost any class, such as "data.frame", "matrix", or "array".
`n_levels`	number of response categories, a vector or a number.
`skew`	marginal skewness of latent variables, defaults to 0.

Details

The relationship between the continuous random variable $X$ and the discrete probability distribution $p_k$ , for $k = 1, \dots, K$ , can be described by a system of non-linear equations:

$p_{k} = F_{X}\left( \frac{x_{k} - \xi}{\omega} \right) - F_{X}\left( \frac{x_{k - 1} - \xi}{\omega} \right) \quad \text{for} \ k = 1, \dots, K$

where:

$F_{X}$: the cumulative distribution function of $X$,
$K$: the number of possible response categories,
$x_{k}$: the endpoints defining the boundaries of the response categories,
$p_{k}$: the probability of the $k$-th response category,
$\xi$: the location parameter of $X$,
$\omega$: the scaling parameter of $X$.

The endpoints $x_{k}$ are calculated by discretizing a random variable $Z$ with mean 0 and standard deviation 1 that follows the same distribution as $X$ . By solving the above system of non-linear equations iteratively, we can find the parameters that best fit the observed discrete probability distribution $p_{k}$ .

The function estimate_params:

Computes the proportion table of the responses for each item.
Estimates the probabilities $p_{k}$ for each item.
Computes the estimates of $\xi$ and $\omega$ for each item.
Combines the estimated parameters for all items into a table.
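The listed steps can be sketched in base R on simulated data. This is a simplified stand-in under stated assumptions, not the package code: it handles only normal latent variables (skew = 0), uses approximate Lloyd-Max endpoints for 5 levels, and replaces the iterative solver with a closed-form line fit based on $x_{k} = \xi + \omega \, \Phi^{-1}(P_{k})$, where $P_{k}$ is the cumulative proportion up to category $k$. The names `estimate_item`, `true`, and `items` are hypothetical.

```r
# Illustrative pipeline in base R; a simplified stand-in, not the package code.
set.seed(1)

# Assumed endpoints: approximate Lloyd-Max thresholds for a standard normal
# latent variable with K = 5 response categories.
endp <- c(-1.2444, -0.3823, 0.3823, 1.2444)
K <- length(endp) + 1

# Simulate three items with known location and scale (hypothetical values).
true <- cbind(xi = c(0, -0.5, 0.5), omega = c(1, 0.8, 1.2))
n <- 5000
items <- sapply(seq_len(nrow(true)), function(j) {
  x <- true[j, "xi"] + true[j, "omega"] * stats::rnorm(n)
  findInterval(x, endp) + 1L  # response categories 1..K
})
colnames(items) <- paste0("Y", seq_len(ncol(items)))

estimate_item <- function(y) {
  # Proportion of each response category, empty categories included.
  p <- tabulate(y, nbins = K) / length(y)
  # Cumulative proportions satisfy P_k = Phi((x_k - xi) / omega), so
  # x_k = xi + omega * qnorm(P_k); fit that line by least squares.
  z <- stats::qnorm(cumsum(p)[-K])
  ok <- is.finite(z)  # drop cumulative proportions equal to 0 or 1
  fit <- stats::lm.fit(cbind(1, z[ok]), endp[ok])
  c(xi = fit$coefficients[[1]], omega = fit$coefficients[[2]])
}

# Combine the per-item estimates into one table (items in rows).
est <- t(apply(items, 2, estimate_item))
round(est, 2)
```

With 5000 simulated respondents per item, the rows of `est` should land close to the rows of `true`; the line fit needs at least two cumulative proportions strictly between 0 and 1.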

Value

A table of estimated parameters for each latent variable.

Examples

data(part_bfi)
vars <- c("A1", "A2", "A3", "A4", "A5")
estimate_params(data = part_bfi[, vars], n_levels = 6)

Agreeableness and Gender Data

Description

This dataset is a cleaned-up version of a small part of the bfi dataset from the psychTools package. It contains responses to the first 5 items of the agreeableness scale from the International Personality Item Pool (IPIP) and the gender attribute. It includes responses from 2800 subjects. Each item was answered on a six-point Likert scale ranging from 1 (very inaccurate) to 6 (very accurate). Gender was coded as 0 for male and 1 for female. Missing values were addressed using mode imputation.
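For readers unfamiliar with mode imputation, a minimal base R sketch for a single item follows. The helper `impute_mode` is hypothetical; how the dataset was actually cleaned, including how ties between equally frequent responses were resolved, is not documented here. In this sketch, ties go to the smallest response value.

```r
# Illustrative sketch of mode imputation for one item (hypothetical helper).
impute_mode <- function(x) {
  counts <- table(x)  # NA values are excluded by default
  # Most frequent observed value; ties resolve to the smallest value.
  mode_value <- as.numeric(names(counts)[which.max(counts)])
  x[is.na(x)] <- mode_value
  x
}

impute_mode(c(2, 4, 4, NA, 5, 4, NA))
# returns 2 4 4 4 5 4 4
```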

Usage

data(part_bfi)

Format

An object of class "data.frame" with 2800 observations on the following 6 variables:

A1: Am indifferent to the feelings of others.
A2: Inquire about others' well-being.
A3: Know how to comfort others.
A4: Love children.
A5: Make people feel at ease.
gender: Gender of the respondent.

Source

International Personality Item Pool (https://ipip.ori.org)

https://search.r-project.org/CRAN/refmans/psychTools/html/bfi.html

References

Revelle, W. (2024). Psych: Procedures for Psychological, Psychometric, and Personality Research. Evanston, Illinois: Northwestern University. https://CRAN.R-project.org/package=psych

Examples

data(part_bfi)
head(part_bfi)

Plot Transformation

Description

Plots the densities of latent variables and the corresponding transformed discrete probability distributions.

Usage

plot_likert_transform(n_items, n_levels, mean = 0, sd = 1, skew = 0)

Arguments

`n_items`	number of Likert scale items (questions).
`n_levels`	number of response categories for each Likert item. Integer or vector of integers.
`mean`	means of the latent variables. Numeric or vector of numerics. Defaults to 0.
`sd`	standard deviations of the latent variables. Numeric or vector of numerics. Defaults to 1.
`skew`	marginal skewness of the latent variables. Numeric or vector of numerics. Defaults to 0.

Value

NULL. The function produces a plot.

Examples

plot_likert_transform(n_items = 3, n_levels = c(3, 4, 5))
plot_likert_transform(n_items = 3, n_levels = 5, mean = c(0, 1, 2))
plot_likert_transform(n_items = 3, n_levels = 5, sd = c(0.8, 1, 1.2))
plot_likert_transform(n_items = 3, n_levels = 5, skew = c(-0.5, 0, 0.5))

Calculate Response Proportions

Description

Returns a table of proportions for each possible response category.

Usage

response_prop(data, n_levels)

Arguments

`data`	numeric vector or matrix of responses.
`n_levels`	number of response categories.

Value

A table of response category proportions.

Examples

data <- c(1, 2, 2, 3, 3, 3)
response_prop(data, n_levels = 3)

data_matrix <- matrix(c(1, 2, 2, 3, 3, 3), ncol = 2)
response_prop(data_matrix, n_levels = 3)

Generate Random Responses

Description

Generates an array of random responses to Likert-type questions based on specified latent variables.

Usage

rlikert(size, n_items, n_levels, mean = 0, sd = 1, skew = 0, corr = 0)

Arguments

`size`	number of observations.
`n_items`	number of Likert scale items (number of questions).
`n_levels`	number of response categories for each item. Integer or vector of integers.
`mean`	means of the latent variables. Numeric or vector of numerics. Defaults to 0.
`sd`	standard deviations of the latent variables. Numeric or vector of numerics. Defaults to 1.
`skew`	marginal skewness of the latent variables. Numeric or vector of numerics. Defaults to 0.
`corr`	correlations between latent variables. Can be a single numeric value representing the same correlation for all pairs, or an actual correlation matrix. Defaults to 0.

Value

A matrix of random responses with dimensions size by n_items. The column names are Y1, Y2, ..., Yn where n is the number of items. Each entry in the matrix represents a Likert scale response, ranging from 1 to n_levels.
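Conceptually, correlated responses arise from correlated latent variables that are cut at fixed endpoints. The base R sketch below illustrates this for two symmetric latent variables (skew = 0, mean 0, sd 1) only. It is not the package implementation, and the endpoints are approximate Lloyd-Max thresholds for a standard normal variable with 5 levels, taken as an assumption.

```r
# Illustrative sketch for symmetric (skew = 0) latent variables only;
# not the package implementation.
set.seed(42)

# Assumed endpoints: approximate Lloyd-Max thresholds for a standard normal
# variable with 5 response categories.
endp <- c(-1.2444, -0.3823, 0.3823, 1.2444)

size <- 1000
corr <- matrix(c(
  1.0, 0.5,
  0.5, 1.0
), nrow = 2)

# Correlated standard normal draws: with R = chol(corr), the rows of
# Z %*% R have covariance t(R) %*% R = corr.
z <- matrix(stats::rnorm(size * 2), ncol = 2) %*% chol(corr)

# Cut each latent column at the endpoints to obtain responses in 1..5.
y <- apply(z, 2, function(col) findInterval(col, endp) + 1L)
colnames(y) <- c("Y1", "Y2")

head(y)
cor(y)[1, 2]  # positive; discretization typically attenuates it below 0.5
```

The correlation of the discrete responses is generally somewhat weaker than the latent correlation, which is worth keeping in mind when choosing `corr`.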

Examples

# Generate responses for a single item with 5 levels
rlikert(size = 10, n_items = 1, n_levels = 5)

# Generate responses for three items with different levels and parameters
rlikert(
  size = 10, n_items = 3, n_levels = c(4, 5, 6),
  mean = c(0, -1, 0), sd = c(0.8, 1, 1), corr = 0.5
)

# Generate responses with a correlation matrix
corr <- matrix(c(
  1.00, -0.63, -0.39,
  -0.63, 1.00, 0.41,
  -0.39, 0.41, 1.00
), nrow = 3)
data <- rlikert(
  size = 1000, n_items = 3, n_levels = c(4, 5, 6),
  mean = c(0, -1, 0), sd = c(0.8, 1, 1), corr = corr
)

Simulate Likert Scale Item Responses

Description

Simulates Likert scale item responses based on a specified number of response categories and the centered parameters of the latent variable.

Usage

simulate_likert(n_levels, cp)

Arguments

`n_levels`	number of response categories for the Likert scale item.
`cp`	centered parameters of the latent variable. Named vector including mean (`mu`), standard deviation (`sd`), and skewness (`skew`). Skewness must be between -0.95 and 0.95.

Details

The simulation process uses the model described by Boari and Nai Ruscone (2015). Let $X$ be the continuous variable of interest, measured using Likert scale questions with $K$ response categories. The observed discrete variable $Y$ is defined as follows:

$Y = k, \quad \text{ if } \ \ x_{k - 1} < X \leq x_{k} \quad \text{ for } \ \ k = 1, \dots, K$

where $x_{k}$ , $k = 0, \dots, K$ are endpoints defined in the domain of $X$ such that:

$-\infty = x_{0} < x_{1} < \dots < x_{K - 1} < x_{K} = \infty.$

The endpoints dictate the transformation of the density $f_{X}$ of $X$ into a discrete probability distribution:

$\text{Pr}(Y = k) = \int_{x_{k - 1}}^{x_{k}} f_{X}(x) \, dx \quad \text{ for } \ \ k = 1, \dots, K.$

The continuous latent variable is modeled using a skew normal distribution. The function simulate_likert performs the following steps:

Ensures the centered parameters are within the acceptable range.
Converts the centered parameters to direct parameters.
Defines the density function for the skew normal distribution.
Computes the probabilities for each response category using optimal endpoints.
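The steps listed above can be sketched in base R as follows. This is an illustration, not the package implementation: the helper names `cp_to_dp` and `dskew` are hypothetical, the conversion uses the standard moment formulas of the skew normal distribution, and the endpoints are equally spaced placeholders instead of the optimal ones.

```r
# Illustrative sketch of the listed steps; not the package implementation.
cp <- c(mu = 0, sd = 1, skew = 0.5)

# Check that the skewness lies in the range stated for `cp`.
stopifnot(abs(cp[["skew"]]) <= 0.95)

# Centered parameters -> direct parameters (xi, omega, alpha) of the skew
# normal distribution, using its standard moment formulas.
cp_to_dp <- function(cp) {
  g <- abs(cp[["skew"]])^(2 / 3)
  delta <- sign(cp[["skew"]]) *
    sqrt((pi / 2) * g / (g + ((4 - pi) / 2)^(2 / 3)))
  omega <- cp[["sd"]] / sqrt(1 - 2 * delta^2 / pi)
  xi <- cp[["mu"]] - omega * delta * sqrt(2 / pi)
  c(xi = xi, omega = omega, alpha = delta / sqrt(1 - delta^2))
}
dp <- cp_to_dp(cp)

# Skew normal density expressed in the direct parameters.
dskew <- function(x) {
  z <- (x - dp[["xi"]]) / dp[["omega"]]
  2 / dp[["omega"]] * stats::dnorm(z) * stats::pnorm(dp[["alpha"]] * z)
}

# Category probabilities as integrals of the density between endpoints.
# These endpoints are placeholders (assumption), not the optimal ones.
endp <- c(-Inf, -1.5, -0.5, 0.5, 1.5, Inf)
prob <- mapply(
  function(a, b) stats::integrate(dskew, a, b)$value,
  endp[-length(endp)], endp[-1]
)
names(prob) <- seq_along(prob)
round(prob, 3)
```

Integrating `x * dskew(x)` and the second and third central moments numerically is a quick way to confirm that the converted density reproduces the requested mean, standard deviation, and skewness.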

Value

A named vector of probabilities for each response category.

References

Boari, G. and Nai Ruscone, M. (2015). A procedure simulating Likert scale item responses. Electronic Journal of Applied Statistical Analysis 8(3), 288–297. doi:10.1285/i20705948v8n3p288

Examples

cp <- c(mu = 0, sd = 1, skew = 0.5)
simulate_likert(n_levels = 5, cp = cp)
cp2 <- c(mu = 1, sd = 2, skew = -0.3)
simulate_likert(n_levels = 7, cp = cp2)
