Public Methods and Types · HighDimMixedModels.jl

HighDimMixedModels.Control — Type

Control

Hyperparameters for the coordinate descent algorithm

Fields

tol: Small positive number, default is 1e-4, providing convergence tolerance
seed: Random seed, default 770. The only randomness in the algorithm occurs when initializing the fixed effect parameters (in the data splits for the cross validation).
trace: Bool, default false. If true, prints cost and size of active set over the course of the algorithm.
max_iter: Integer, default 1000, giving maximum number of iterations in the coordinate gradient descent.
max_armijo: Integer, default 20, giving the maximum number of steps in the Armijo algorithm. If the maximum is reached, the algorithm does not update the current coordinate and proceeds to the next coordinate.
act_num: Integer, default 5. All of the fixed effect parameters are updated only every act_num iterations. In the other iterations, only the parameters in the current active set are updated.
a₀: a₀ in the Armijo step, default 1.0. See Schelldorfer et al. (2010) for details about this and the next five fields.
δ: δ in the Armijo step, default 0.1.
ρ: ρ in the Armijo step, default 0.001.
γ: γ in the Armijo step, default 0.0.
lower: Lower bound for the Hessian, default 1e-6.
upper: Upper bound for the Hessian, default 1e8.
var_int: Tuple with bounds of the interval on which to optimize when updating a diagonal entry of L, default (0, 100). See the Optim.jl documentation, section "minimizing a univariate function on a bounded interval".
cov_int: Tuple with bounds of the interval on which to optimize when updating a non-diagonal entry of L, default (-50, 50). See the Optim.jl documentation, section "minimizing a univariate function on a bounded interval".
optimize_method: Symbol denoting method for performing the univariate optimization, either :Brent or :GoldenSection, default is :Brent
thres: If an update to a diagonal entry of L is smaller than thres, the parameter is set to 0
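
To make the roles of a₀, δ, ρ and max_armijo concrete, here is a minimal sketch of a generic Armijo backtracking step on a smooth one-dimensional toy objective. It is not the package's armijo! routine: the helper name armijo_step and the toy objective are assumptions for illustration, and the sketch omits the penalty term and the γ-weighted Hessian term that appear in Schelldorfer et al. (2010).

```julia
# Generic Armijo backtracking (illustrative; `armijo_step` is a hypothetical helper,
# not part of HighDimMixedModels.jl).
# `slope` is the directional derivative of f at x along d (negative for a descent direction).
function armijo_step(f, x, d, slope; a₀=1.0, δ=0.1, ρ=0.001, max_armijo=20)
    fx = f(x)
    for l in 0:(max_armijo - 1)
        α = a₀ * δ^l                          # candidate step size a₀·δˡ
        if f(x + α * d) <= fx + α * ρ * slope
            return x + α * d, true            # sufficient decrease reached
        end
    end
    return x, false                           # maximum reached: leave x unchanged
end

f(x) = (x - 2)^2            # toy objective, assumed for this sketch
x = 0.0
g = 2 * (x - 2)             # derivative of f at x
d = -g                      # steepest descent direction
x_new, ok = armijo_step(f, x, d, g * d)
```

With these defaults, each rejected trial shrinks the step by a factor of δ, and a failed search returns the starting point, which mirrors the "do not update the current coordinate" behaviour described for max_armijo.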


HighDimMixedModels.HDMModel — Type

HDMModel

Results of a fitted model

Fields

data: NamedTuple containing the input data used for fitting the model
weights: Vector of penalty weights used in the model
init_coef: NamedTuple containing the initial coefficient values
init_log_like: Initial log-likelihood value
init_objective: Initial objective function value
init_nz: Number of non-zero components in the initial estimate of fixed effects
penalty: String indicating the type of penalty used in the model
standardize: Boolean indicating whether the input data was standardized
λ: Regularization hyperparameter
scada: Hyperparameter relevant to the scad penalty
σ²: Estimated variance parameter
L: Lower triangular matrix representing the Cholesky factor of the random effect covariance matrix
fixef: Vector of estimated fixed effects
ranef: Vector of g vectors, each of length m, holding the random effect BLUPs for each group
fitted: Vector of fitted values, including random effects
resid: Vector of residuals, including random effects
log_like: Log-likelihood value at convergence
objective: Objective function value at convergence
npar: Total number of parameters in the model
nz: Number of non-zero fixed effects
deviance: Deviance value
num_arm: Number of times armijo! needed to be called
arm_con: Number of times the Armijo algorithm failed to converge
aic: Akaike Information Criterion
bic: Bayesian Information Criterion
iterations: Number of iterations performed
ψstr: Assumed structure of the random effect covariance matrix
ψ: Estimated random effect covariance matrix, i.e. L * L'
control: Control object containing hyperparameters that were used for the coordinate descent algorithm
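
The relationship between the L and ψ fields can be checked with a few lines of plain Julia. This is a sketch with made-up numbers for m = 2, not output from a fitted model; the variable names are assumptions.

```julia
using LinearAlgebra

# Hypothetical lower triangular Cholesky factor for m = 2 random effects.
L = [1.0 0.0;
     0.5 2.0]

ψ = L * L'                       # random effect covariance matrix, as in the ψ field

# Under a diagonal structure the factor has no off-diagonal entries,
# so the covariance is diagonal with squared entries.
L_diag = [1.0, 2.0]
ψ_diag = Diagonal(L_diag .^ 2)
```

Parameterizing the covariance through L keeps ψ symmetric and positive semidefinite by construction, whatever values the entries of L take.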


HighDimMixedModels.hdmm — Function

hdmm(X::Matrix{<:Real}, G::Matrix{<:Real}, y::Vector{<:Real}, 
     grp::Vector{<:Union{String, Int64}}, Z::Matrix{<:Real}=X; 
     <keyword arguments>)

Fit a penalized linear mixed effect model using the coordinate gradient descent (CGD) algorithm and return a fitted model of type HDMModel.

Arguments

X: Low dimensional (N by q) design matrix for unpenalized fixed effects (first column must be all 1's to fit intercept)
G: High dimensional (N by p) design matrix for penalized fixed effects (should not include column of 1's)
y: Length N response vector
grp: Length N vector with group assignments of each observation
Z=X: Random effects design matrix (N by m), which should contain a subset of the columns of X (defaults to X)

Keyword arguments

penalty::String="scad": Either "scad" or "lasso"
standardize::Bool=true: Whether to standardize the columns of G before fitting. The value of λ (and wts) should be chosen accordingly. Coefficient estimates are returned on the original scale.
λ::Real=10.0: Positive number providing the regularization parameter for the penalty
wts::Union{Vector,Nothing}=nothing: If specified, the penalty on covariate j will be λ/wⱼ, so this argument is useful if you want to penalize some covariates more than others.
scada::Real=3.7: Positive number providing the extra tuning parameter for the scad penalty (ignored for lasso)
max_active::Real=length(y)/2: Maximum number of fixed effects estimated non-zero (defaults to half the total sample size)
ψstr::String="diag": One of "ident", "diag", or "sym", specifying the structure of the random effects' covariance matrix
init_coef::Union{Tuple,Nothing} = nothing: If specified, provides the initialization to the algorithm. See notes below for more details
control::Control = Control(): Custom struct with hyperparameters of the CGD algorithm, defaults are in documentation of Control struct
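
For readers unfamiliar with the two penalties, the sketch below writes out the textbook lasso and SCAD penalty functions for a single coefficient, and shows how weights turn a common λ into per-covariate values λ/wⱼ. The function names lasso_pen and scad_pen are hypothetical and the code is not taken from the package; it only illustrates what λ, scada and wts control.

```julia
# Textbook penalty functions for one coefficient β (illustrative helpers,
# not part of HighDimMixedModels.jl).
lasso_pen(β, λ) = λ * abs(β)

function scad_pen(β, λ; a=3.7)
    b = abs(β)
    if b <= λ
        return λ * b                                         # same as lasso near zero
    elseif b <= a * λ
        return (2a * λ * b - b^2 - λ^2) / (2 * (a - 1))      # quadratic transition
    else
        return λ^2 * (a + 1) / 2                             # constant for large |β|
    end
end

λ = 10.0
wts = [1.0, 2.0, 0.5]            # hypothetical weights for three covariates
λ_eff = λ ./ wts                 # effective penalty level λ/wⱼ per covariate
```

Unlike the lasso, the SCAD penalty stops growing once |β| exceeds a·λ, so large coefficients are not shrunk further; a larger weight wⱼ lowers the penalty on covariate j.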

Notes

The initialization to the descent algorithm can be specified in the init_coef argument as a tuple of the form (β, L, σ²), where

β is a vector of length p + q providing an initial estimate of the fixed effect coefficients
L is the Cholesky factor of the random effect covariance matrix, and is represented as
- a scalar if ψstr="ident"
- a vector of length m if ψstr="diag"
- a lower triangular matrix of size m by m if ψstr="sym"
σ² is a scalar providing an initial estimate of the noise variance

If the init_coef argument is not specified, we obtain initial parameter estimates in the following manner:

A LASSO that ignores random effects (with λ chosen using cross validation) is performed to estimate the fixed effect parameters.
L, assumed temporarily to be a scalar, and σ² are estimated to maximize the likelihood given these estimated fixed effect parameters.
If ψstr is "diag" or "sym", the scalar L is converted to a vector or matrix by repeating the scalar or filling the diagonal of a matrix with the scalar, respectively.
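
The shapes described above can be sketched in plain Julia. The dimensions and starting values below are made up, and the snippet only builds the (β, L, σ²) tuples; whether hdmm accepts exactly these container types is an assumption, so the call itself is left as a comment.

```julia
using LinearAlgebra

q, p, m = 2, 5, 2                 # hypothetical dimensions
β  = zeros(p + q)                 # initial fixed effects, length p + q
σ² = 1.0                          # initial noise variance
L_scalar = 0.5                    # form used when ψstr = "ident"

# Expanding the scalar as in the last initialization step:
L_diag = fill(L_scalar, m)                  # ψstr = "diag": repeat the scalar
L_sym  = Matrix(L_scalar * I, m, m)         # ψstr = "sym": scalar on the diagonal

init_ident = (β, L_scalar, σ²)
init_diag  = (β, L_diag, σ²)
init_sym   = (β, L_sym, σ²)

# Assumed usage, requires the package and data X, G, y, grp:
# fit = hdmm(X, G, y, grp; ψstr="sym", init_coef=init_sym)
```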


StatsAPI.coeftable — Function

coeftable(fit::HDMModel, names::Vector{String}=string.(1:length(fit.fixef)))

Return a table of the selected coefficients, i.e. those not set to 0, from the model.

Arguments

fit::HDMModel: A fitted model.
names::Vector{String}: Names of all the coefficients in the model (not just those selected), defaults to integer names

Returns

A StatsBase.CoefTable object.
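
The point about names covering all coefficients, not only the selected ones, can be illustrated without the package. The estimates below are made up and the variable names are assumptions; the snippet only mimics the selection of non-zero entries.

```julia
fixef = [1.3, 0.0, -0.7, 0.0, 0.2]          # hypothetical fixed effect estimates
coef_names = string.(1:length(fixef))        # default integer names, one per coefficient

selected = findall(!iszero, fixef)           # indices of coefficients not set to 0
for j in selected
    println(coef_names[j], " => ", fixef[j])
end
```

Because names are indexed by position in the full coefficient vector, a selected coefficient keeps the label it had before selection.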


StatsAPI.fitted — Method

fitted(fit::HDMModel)

Return the fitted values of the model. The predictions account for the random effects.


StatsAPI.residuals — Method

residuals(fit::HDMModel)

Return the residuals of the model. The predictions they are based on account for the random effects.
