| Title: | Competing Risks Models for Multi-Center Survival Data with Frailty |
|---|---|
| Description: | Implements methods for analyzing competing risks data in multi-center survival studies using frailty models. The approach relies on a mixed proportional hazards model for the sub-distribution, allowing for cluster-specific random effects. The package provides tools for model estimation with or without frailty using Maximum Likelihood (ML) and Restricted Maximum Likelihood (REML). It supports flexible modeling of between-center heterogeneity and is particularly suited for multi-center clinical trials or registries. Core features include data simulation, likelihood computation, cluster-dependent censoring options, and testing of frailty effects. For methodological details, see Katsahian et al. (2006) <doi:10.1002/sim.2684>. |
| Authors: | Benjamin Delmas [aut], Lucas Ducrot [aut, cre] (ORCID: <https://orcid.org/0000-0002-8020-2897>), Sandrine Katsahian [aut] (ORCID: <https://orcid.org/0000-0002-7261-0671>), Inria [cph] |
| Maintainer: | Lucas Ducrot <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 0.1.1 |
| Built: | 2026-05-28 07:25:35 UTC |
| Source: | https://github.com/teamheka/frailtycomprisk |
Validates whether the submitted data frame is correctly structured for the analysis of multicentre competing risks with frailty.
check_data_format(df)check_data_format(df)
df |
A data frame to be checked. |
TRUE if df has the expected structure.
Otherwise, the function stops with an error describing the formatting issue.
n_cov = 1 n_cluster = 5 n_per_cluster = 100 n = n_cluster * n_per_cluster a1 = 0 b1 = 1 G <- rep(1:n_cluster, each = n_per_cluster) Z <- matrix(runif(n * n_cov, a1, b1), ncol = n_cov) df = simulate_data( G, Z, prop = 0.6, beta = c(0), theta = 0.01, cens = TRUE, pcens = 0.25, tau = 0 ) check_data_format(df)n_cov = 1 n_cluster = 5 n_per_cluster = 100 n = n_cluster * n_per_cluster a1 = 0 b1 = 1 G <- rep(1:n_cluster, each = n_per_cluster) Z <- matrix(runif(n * n_cov, a1, b1), ncol = n_cov) df = simulate_data( G, Z, prop = 0.6, beta = c(0), theta = 0.01, cens = TRUE, pcens = 0.25, tau = 0 ) check_data_format(df)
This function computes the matrix of inverse probability of censoring weights (IPCW), used in multicenter competing risks models with frailty. It supports both Kaplan-Meier and Nelson-Aalen estimators for the censoring distribution.
compute_M_optimized(times, status, f, u_bis, clusters)compute_M_optimized(times, status, f, u_bis, clusters)
times |
A numeric vector of observed times (events or censoring). |
status |
A numeric vector of status indicators (0 = censored, 1 or 2 = cause of failure). |
f |
A string, either |
u_bis |
A numeric vector of cluster-level random effects (used only if |
clusters |
A vector indicating cluster membership for each subject (same length as |
If f == "KaplanMeier_Censoring_vectorized", then u_bis is ignored.
The weight matrix accounts for pairwise contributions, depending on event times and causes.
A square matrix M of size N x N, where M[i, j] is the IPCW weight between subject i and subject j.
KaplanMeier_Censoring_vectorized(), Nelson_Censoring_vectorized(), create_G_function()
This function creates a right-continuous step function (vectorized) based on estimated values of the censoring survival function at given unique times.
create_G_function(unique_times, G_vals)create_G_function(unique_times, G_vals)
unique_times |
A numeric vector of increasing unique times (e.g., censoring times). |
G_vals |
A numeric vector of the same length as |
The resulting function can be used to evaluate for any vector of time points.
A function G(t) that can be evaluated on a numeric vector of time points.
Computes the Kaplan-Meier estimator of the censoring distribution at given time points.
KaplanMeier_Censoring_vectorized(times, status, unique_times)KaplanMeier_Censoring_vectorized(times, status, unique_times)
times |
A numeric vector of observed times. |
status |
A numeric vector of status indicators (0 = censored, >0 = event). |
unique_times |
A numeric vector of time points at which to evaluate the estimator. |
A numeric vector of estimated survival probabilities at each unique time.
Computes the contribution to the log-likelihood from individuals who experienced event type 1,
using a weighting matrix W and an IPCW matrix M.
logLikelihood_1(status, M, W)logLikelihood_1(status, M, W)
status |
A numeric vector indicating the event type:
|
M |
A numeric matrix (usually computed with |
W |
A sparse matrix (e.g., of class |
For each individual i such that status[i] == 1, the function computes:
and sums this across all such individuals.
A numeric scalar: the log-likelihood contribution from all individuals who failed from cause 1.
compute_M_optimized(), Matrix::Matrix
This function computes the second component of the log-likelihood (penalty term) associated with the random effects (frailty terms) in a mixed-effects model.
logLikelihood_2(u, theta, K)logLikelihood_2(u, theta, K)
u |
A numeric vector of length |
theta |
A positive numeric scalar, representing the variance component of the random effects. |
K |
An integer representing the number of clusters (i.e., the length of |
The penalty is derived from the Gaussian assumption on the random effects and
involves both the dimension of the random effects vector and the variance parameter theta.
A numeric scalar representing the log-likelihood penalty contribution from the random effects.
Performs maximum likelihood estimation (MLE) for the effect of covariates on the cause-specific hazard function of cause 1 in a competing risks framework.
Ml_CompRisk(data, max_iter = 100, tol = 1e-06)Ml_CompRisk(data, max_iter = 100, tol = 1e-06)
data |
A data frame containing:
|
max_iter |
Maximum number of Newton-Raphson iterations (default = 100). |
tol |
Convergence tolerance for the change in parameter estimates (default = 1e-6). |
The function fits a Cox-type model for the subdistribution hazard of cause 1, treating all other causes (including censoring) as censored (i.e., status != 1 becomes 0). It uses a Newton-Raphson procedure to maximize the partial likelihood.
The design matrix is reordered internally by event time to facilitate risk set computation.
A named list with one element:
A numeric vector of estimated regression coefficients for the subdistribution hazard of cause 1. Each coefficient represents the log effect of the corresponding covariate on the subdistribution hazard.
Reml_Cox_frailty, Ml_Cox, logLikelihood_1
data <- data.frame( times = c(0.099, 0.006, 0.260, 0.151, 0.262, 0.019, 0.026, 0.241, 0.017, 0.195, 0.007, 0.057, 0.241, 0.132, 0.044, 0.242, 0.416, 0.096, 0.172, 0.015), status = c(1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), clusters = c(1,1,1,1,1, 1,1,1,1,1, 2,2,2,2,2, 2,2,2,2,2), V1 = c(0.720, 0.289, 0.382, 0.995, 0.045, 0.048, 0.042, 0.810, 0.964, 0.781, 0.443, 0.303, 0.902, 0.596, 0.462, 0.521, 0.656, 0.164, 0.261, 0.635) ) beta_hat <- Ml_CompRisk(data)data <- data.frame( times = c(0.099, 0.006, 0.260, 0.151, 0.262, 0.019, 0.026, 0.241, 0.017, 0.195, 0.007, 0.057, 0.241, 0.132, 0.044, 0.242, 0.416, 0.096, 0.172, 0.015), status = c(1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), clusters = c(1,1,1,1,1, 1,1,1,1,1, 2,2,2,2,2, 2,2,2,2,2), V1 = c(0.720, 0.289, 0.382, 0.995, 0.045, 0.048, 0.042, 0.810, 0.964, 0.781, 0.443, 0.303, 0.902, 0.596, 0.462, 0.521, 0.656, 0.164, 0.261, 0.635) ) beta_hat <- Ml_CompRisk(data)
This function computes the maximum likelihood estimates of the regression coefficients in the Cox proportional hazards model using Newton-Raphson iterations.
Ml_Cox(data, max_iter = 100, tol = 1e-06)Ml_Cox(data, max_iter = 100, tol = 1e-06)
data |
A data frame with columns: |
max_iter |
Maximum number of iterations (default = 100). |
tol |
Convergence tolerance for the Euclidean norm (default = 1e-6). |
A named list with one element:
A numeric vector of estimated regression coefficients in the Cox proportional hazards model. Each coefficient represents the log hazard ratio associated with the corresponding covariate.
Computes the estimated censoring survival probabilities using a Nelson-Aalen-type estimator, accounting for shared frailty through a cluster-specific random effect.
Nelson_Censoring_vectorized(times, unique_times, status, clusters, u_bis)Nelson_Censoring_vectorized(times, unique_times, status, clusters, u_bis)
times |
A numeric vector of observed times. |
unique_times |
A numeric vector of distinct, sorted time points. |
status |
A numeric vector where 0 indicates censoring. |
clusters |
A vector indicating the cluster/group for each observation. |
u_bis |
A numeric vector of cluster-specific random effects (e.g., frailty values). |
A numeric vector of estimated censoring survival probabilities for each individual.
Provides a unified wrapper to estimate model parameters under four different modeling strategies:
Competing risks with shared frailty for cause 1 ("CompRisk_frailty")
Competing risks without frailty for cause 1 ("CompRisk")
Standard Cox model with shared frailty ("Cox_frailty")
Standard Cox model without frailty ("Cox")
Parameters_estimation( data, method = "CompRisk_frailty", cluster_censoring = FALSE, max_iter = 300, tol = 1e-06, threshold = 1e-06 )Parameters_estimation( data, method = "CompRisk_frailty", cluster_censoring = FALSE, max_iter = 300, tol = 1e-06, threshold = 1e-06 )
data |
A data frame containing at least the following columns:
|
method |
Character string indicating the estimation method. One of:
|
cluster_censoring |
Logical. If |
max_iter |
Maximum number of iterations for the selected model estimation (default = 300). |
tol |
Convergence tolerance (default = 1e-6). |
threshold |
Lower bound for the frailty variance parameter |
This wrapper allows seamless switching between various modeling frameworks. It automatically calls:
Reml_CompRisk_frailty for "CompRisk_frailty"
Ml_CompRisk for "CompRisk"
Reml_Cox_frailty for "Cox_frailty"
Ml_Cox for "Cox"
Data format is validated via the helper function check_data_format. An error is raised if input format is invalid or if cluster_censoring is used with an incompatible method.
A named list whose structure depends on method.
For "CompRisk" and "Cox", the returned list contains:
Numeric vector of estimated regression coefficients.
For "CompRisk_frailty" and "Cox_frailty", the returned list may contain:
Numeric vector of estimated fixed-effect coefficients.
Numeric vector of estimated cluster-specific frailties.
Estimated frailty variance parameter.
P-value for testing the null hypothesis of no frailty effect.
Returns NULL if estimation fails.
Reml_CompRisk_frailty, Ml_CompRisk, Reml_Cox_frailty, Ml_Cox, check_data_format
data <- data.frame( times = c(4, 6, 10, 12, 3), status = c(1, 0, 2, 1, 0), clusters = c(1, 1, 2, 2, 3), x1 = rnorm(5), x2 = sample(0:1, 5, replace = TRUE) ) result <- Parameters_estimation(data, method = "CompRisk_frailty") result$betadata <- data.frame( times = c(4, 6, 10, 12, 3), status = c(1, 0, 2, 1, 0), clusters = c(1, 1, 2, 2, 3), x1 = rnorm(5), x2 = sample(0:1, 5, replace = TRUE) ) result <- Parameters_estimation(data, method = "CompRisk_frailty") result$beta
Fits a cause-specific hazard model for cause 1 in a competing risks framework using restricted maximum likelihood (REML) with a shared frailty term for clusters.
Reml_CompRisk_frailty( data, cluster_censoring = FALSE, max_iter = 300, tol = 1e-06, threshold = 1e-05 )Reml_CompRisk_frailty( data, cluster_censoring = FALSE, max_iter = 300, tol = 1e-06, threshold = 1e-05 )
data |
A data frame with at least the following columns:
|
cluster_censoring |
Logical. If |
max_iter |
Maximum number of iterations for the Newton-Raphson algorithm (default = 300). |
tol |
Convergence tolerance for the likelihood and frailty variance (default = 1e-6). |
threshold |
Lower bound for the frailty variance parameter |
The function fits a proportional hazards model for cause 1, accounting for unobserved heterogeneity via a shared frailty term on clusters. When cluster_censoring = TRUE, a frailty-adjusted survival curve is used to correct for informative censoring using either a Kaplan-Meier or Nelson-Aalen-based estimator.
If the estimated frailty variance becomes smaller than the provided threshold, the model reverts to a standard Cox model for cause 1 using Ml_CompRisk.
A list with:
Estimated regression coefficients for the cause 1 hazard.
Estimated cluster-specific random effects (frailties).
Estimated frailty variance.
P-value testing the null hypothesis .
Ml_CompRisk, Reml_Cox_frailty, logLikelihood_1, compute_M_optimized
data <- data.frame( times = c(4, 6, 10, 12, 3), status = c(1, 0, 2, 1, 0), clusters = c(1, 1, 2, 2, 3), x1 = rnorm(5), x2 = sample(0:1, 5, replace = TRUE) ) result <- Reml_CompRisk_frailty(data) result$beta result$p_valuedata <- data.frame( times = c(4, 6, 10, 12, 3), status = c(1, 0, 2, 1, 0), clusters = c(1, 1, 2, 2, 3), x1 = rnorm(5), x2 = sample(0:1, 5, replace = TRUE) ) result <- Reml_CompRisk_frailty(data) result$beta result$p_value
Estimates the regression coefficients, random effects, and frailty variance in a Cox proportional hazards model with shared frailty using Restricted Maximum Likelihood (REML).
Reml_Cox_frailty(data, max_iter = 300, tol = 1e-06, threshold = 1e-05)Reml_Cox_frailty(data, max_iter = 300, tol = 1e-06, threshold = 1e-05)
data |
A data frame containing:
|
max_iter |
Maximum number of Newton-Raphson iterations (default = 300). |
tol |
Tolerance for convergence (default = 1e-6). |
threshold |
Lower bound for the frailty variance parameter |
The function uses a REML-based Newton-Raphson approach to estimate the parameters.
A penalized likelihood including a log-likelihood contribution for the frailty variance is maximized.
The hypothesis test of is based on a Wald-type statistic. A high p-value (e.g. > 0.05) may indicate that the random effects (frailties) are negligible.
A list with the following elements:
Estimated fixed effects (regression coefficients).
Estimated random effects (frailties) for each cluster.
Estimated variance of the random effects.
P-value for testing the null hypothesis (no frailty).
Ml_Cox, logLikelihood_1, logLikelihood_2
This function simulates clustered time-to-event data under a competing risks framework with shared frailty and the possibility of random censoring. Cause 1 is modeled through an inversion method of the subdistribution function, and cause 2 is introduced when inversion fails.
simulate_data( G, Z = NULL, prop = 0.6, beta = NULL, theta = 0.5, cens = FALSE, pcens = 0.25, tau = 0.5 )simulate_data( G, Z = NULL, prop = 0.6, beta = NULL, theta = 0.5, cens = FALSE, pcens = 0.25, tau = 0.5 )
G |
A vector of group or cluster identifiers (length |
Z |
A matrix of covariates (dimensions |
prop |
Proportion of individuals susceptible to cause 1 (default: |
beta |
A numeric vector of regression coefficients, one per covariate (length |
theta |
Variance of the shared frailty term for event times (cause 1). |
cens |
Logical, indicating whether censoring should be simulated (default: |
pcens |
Target censoring proportion (default: |
tau |
Variance of the shared frailty term for censoring times (default: |
A data.frame with one row per individual and at least the columns:
Observed follow-up time.
Event indicator: 0 for censoring, 1 for cause 1,
2 for cause 2.
Cluster identifier.
If covariates are simulated, additional numeric covariate columns are included.
n_cov <- 1 n_cluster <- 5 n_per_cluster <- 100 n <- n_cluster * n_per_cluster G <- rep(1:n_cluster, each = n_per_cluster) Z <- matrix(runif(n * n_cov), ncol = n_cov) df <- simulate_data( G = G, Z = Z, prop = 0.6, beta = c(0.5), theta = 0.01, cens = TRUE, pcens = 0.25, tau = 0.01 ) head(df) table(df$status)n_cov <- 1 n_cluster <- 5 n_per_cluster <- 100 n <- n_cluster * n_per_cluster G <- rep(1:n_cluster, each = n_per_cluster) Z <- matrix(runif(n * n_cov), ncol = n_cov) df <- simulate_data( G = G, Z = Z, prop = 0.6, beta = c(0.5), theta = 0.01, cens = TRUE, pcens = 0.25, tau = 0.01 ) head(df) table(df$status)