Package 'mmcif' reference manual

Title:	Mixed Multivariate Cumulative Incidence Functions
Description:	Fits the mixed cumulative incidence functions model suggested by <doi:10.1093/biostatistics/kxx072> which decomposes within cluster dependence of risk and timing. The estimation method supports computation in parallel using a shared memory C++ implementation. A sandwich estimator of the covariance matrix is available. Natural cubic splines are used to provide a flexible model for the cumulative incidence functions.
Authors:	Benjamin Christoffersen [cre, aut] , Mark Clements [cph], Alan Genz [cph], Frank Bretz [cph], Torsten Hothorn [cph], R-core [cph]
Maintainer:	Benjamin Christoffersen <[email protected]>
License:	GPL (>= 3)
Version:	0.1.2
Built:	2025-03-04 04:04:29 UTC
Source:	https://github.com/boennecd/mmcif

Computes the Log Cholesky Decomposition and the Inverse

Description

Computes the log Cholesky decomposition and the inverse of it. The functions are provided as the log Cholesky decomposition is used in the parameterization of the covariance matrix.

Usage

log_chol(x)

log_chol_inv(x)
log_chol(x)

log_chol_inv(x)

Arguments

`x`	A positive definite matrix or a vector with a log Cholesky decomposition.

Value

A numeric vector with the log Cholesky decomposition or a matrix with the inverse.

Examples

set.seed(1)
S <- drop(rWishart(1, 10, diag(10)))
log_chol(S)
stopifnot(isTRUE(all.equal(S, log_chol_inv(log_chol(S)))),
          (NCOL(S) * (NCOL(S) + 1L)) %/% 2L == length(log_chol(S)))

set.seed(1)
S <- drop(rWishart(1, 10, diag(10)))
log_chol(S)
stopifnot(isTRUE(all.equal(S, log_chol_inv(log_chol(S)))),
          (NCOL(S) * (NCOL(S) + 1L)) %/% 2L == length(log_chol(S)))

Sets up an Object to Compute the Log Composite Likelihood

Description

Sets up the R and C++ objects that are needed to evaluate the log composite likelihood. This reduces to a log likelihood when only clusters of size one or two are used.

Usage

mmcif_data(
  formula,
  data,
  cause,
  time,
  cluster_id,
  max_time,
  spline_df = 3L,
  left_trunc = NULL,
  ghq_data = NULL,
  strata = NULL,
  knots = NULL,
  boundary_quantiles = c(0.025, 0.975)
)
mmcif_data(
  formula,
  data,
  cause,
  time,
  cluster_id,
  max_time,
  spline_df = 3L,
  left_trunc = NULL,
  ghq_data = NULL,
  strata = NULL,
  knots = NULL,
  boundary_quantiles = c(0.025, 0.975)
)

Arguments

`formula`	`formula` for covariates in the risk and trajectories.
`data`	`data.frame` with the covariate and outcome information.
`cause`	an integer vector with the cause of each outcome. If there are `n_causes` of outcome, then the vector should have values in `1:(n_causes + 1)` with `n_causes + 1` indicating censoring.
`time`	a numeric vector with the observed times.
`cluster_id`	an integer vector with the cluster id of each individual.
`max_time`	the maximum time after which there are no observed events. It is denoted by $\tau$ in the original article (Cederkvist et al., 2019).
`spline_df`	degrees of freedom to use for each spline in the cumulative incidence functions.
`left_trunc`	numeric vector with left-truncation times. `NULL` implies that there are not any individuals with left-truncation.
`ghq_data`	the default Gauss-Hermite quadrature nodes and weights to use. It should be a list with two elements called `"node"` and `"weight"`. A default is provided if `NULL` is passed.
`strata`	an integer vector or a factor vector with the strata of each individual. `NULL` implies that there are no strata.
`knots`	A list of lists with knots for the splines. The inner lists needs to have elements called `"knots"` and `"boundary_knots"` which are passed to a function like `ns`. `NULL` yields defaults based on the quantiles of the observed event times. Note that the knots needs to be on the `atanh((time - max_time / 2) / (max_time / 2))` scale.
`boundary_quantiles`	two dimensional numerical vector with boundary quantile probabilities after which the natural cubic splines for the time transformations are restricted to be linear. Only relevant if `knots` is not `NULL`.

Value

An object of class mmcif which is needed for the other functions in the package.

References

Cederkvist, L., Holst, K. K., Andersen, K. K., & Scheike, T. H. (2019). Modeling the cumulative incidence function of multivariate competing risks data allowing for within-cluster dependence of risk and timing. Biostatistics, Apr 1, 20(2), 199-217.

Examples

if(require(mets)){
  # prepare the data
  data(prt)

  # truncate the time
  max_time <- 90
  prt <- within(prt, {
    status[time >= max_time] <- 0
    time <- pmin(time, max_time)
  })

  # select the DZ twins and re-code the status
  prt_use <- subset(prt, zyg == "DZ") |>
    transform(status = ifelse(status == 0, 3L, status))

  # randomly sub-sample
  set.seed(1)
  prt_use <- subset(
    prt_use, id %in% sample(unique(id), length(unique(id)) %/% 10L))

  mmcif_obj <- mmcif_data(
    ~ country - 1, prt_use, status, time, id, max_time,
    2L, strata = country)
}

if(require(mets)){
  # prepare the data
  data(prt)

  # truncate the time
  max_time <- 90
  prt <- within(prt, {
    status[time >= max_time] <- 0
    time <- pmin(time, max_time)
  })

  # select the DZ twins and re-code the status
  prt_use <- subset(prt, zyg == "DZ") |>
    transform(status = ifelse(status == 0, 3L, status))

  # randomly sub-sample
  set.seed(1)
  prt_use <- subset(
    prt_use, id %in% sample(unique(id), length(unique(id)) %/% 10L))

  mmcif_obj <- mmcif_data(
    ~ country - 1, prt_use, status, time, id, max_time,
    2L, strata = country)
}

Fits a Mixed Competing Risk Model

Description

Fits mixed cumulative incidence functions model by maximizing the log composite likelihood function.

Usage

mmcif_fit(
  par,
  object,
  n_threads = 1L,
  control.outer = list(itmax = 100L, method = "nlminb", kkt2.check = FALSE, trace =
    FALSE),
  control.optim = list(eval.max = 10000L, iter.max = 10000L),
  ghq_data = object$ghq_data,
  ...
)
mmcif_fit(
  par,
  object,
  n_threads = 1L,
  control.outer = list(itmax = 100L, method = "nlminb", kkt2.check = FALSE, trace =
    FALSE),
  control.optim = list(eval.max = 10000L, iter.max = 10000L),
  ghq_data = object$ghq_data,
  ...
)

Arguments

`par`	numeric vector with parameters. This is using a log Cholesky decomposition for the covariance matrix.
`object`	an object from `mmcif_data`.
`n_threads`	the number of threads to use.
`control.outer`, `control.optim`, `...`	arguments passed to `auglag`.
`ghq_data`	the Gauss-Hermite quadrature nodes and weights to use. It should be a list with two elements called `"node"` and `"weight"`. The argument can also be a list with lists with different sets of quadrature nodes. In this case, fits are successively made using the previous fit as the starting value. This may reduce the computation time by starting with fewer quadrature nodes.

Value

The output from auglag.

References

Examples

if(require(mets)){
  # prepare the data
  data(prt)

  # truncate the time
  max_time <- 90
  prt <- within(prt, {
    status[time >= max_time] <- 0
    time <- pmin(time, max_time)
  })

  # select the DZ twins and re-code the status
  prt_use <- subset(prt, zyg == "DZ") |>
    transform(status = ifelse(status == 0, 3L, status))

  # randomly sub-sample
  set.seed(1)
  prt_use <- subset(
    prt_use, id %in% sample(unique(id), length(unique(id)) %/% 10L))

  n_threads <- 2L
  mmcif_obj <- mmcif_data(
    ~ country - 1, prt_use, status, time, id, max_time,
    2L, strata = country)

  # get the staring values
  start_vals <- mmcif_start_values(mmcif_obj, n_threads = n_threads)

  # estimate the parameters
  ests <- mmcif_fit(start_vals$upper, mmcif_obj, n_threads = n_threads)

  # show the estimated covariance matrix of the random effects
  tail(ests$par, 10L) |> log_chol_inv() |> print()

  # gradient is ~ zero
  mmcif_logLik_grad(
    mmcif_obj, ests$par, is_log_chol = TRUE, n_threads = n_threads) |>
    print()
}

if(require(mets)){
  # prepare the data
  data(prt)

  # truncate the time
  max_time <- 90
  prt <- within(prt, {
    status[time >= max_time] <- 0
    time <- pmin(time, max_time)
  })

  # select the DZ twins and re-code the status
  prt_use <- subset(prt, zyg == "DZ") |>
    transform(status = ifelse(status == 0, 3L, status))

  # randomly sub-sample
  set.seed(1)
  prt_use <- subset(
    prt_use, id %in% sample(unique(id), length(unique(id)) %/% 10L))

  n_threads <- 2L
  mmcif_obj <- mmcif_data(
    ~ country - 1, prt_use, status, time, id, max_time,
    2L, strata = country)

  # get the staring values
  start_vals <- mmcif_start_values(mmcif_obj, n_threads = n_threads)

  # estimate the parameters
  ests <- mmcif_fit(start_vals$upper, mmcif_obj, n_threads = n_threads)

  # show the estimated covariance matrix of the random effects
  tail(ests$par, 10L) |> log_chol_inv() |> print()

  # gradient is ~ zero
  mmcif_logLik_grad(
    mmcif_obj, ests$par, is_log_chol = TRUE, n_threads = n_threads) |>
    print()
}

Evaluates the Log Composite Likelihood and its Gradient

Description

Evaluates the log composite likelihood and its gradient using adaptive Gauss-Hermite quadrature.

Usage

mmcif_logLik(
  object,
  par,
  ghq_data = object$ghq_data,
  n_threads = 1L,
  is_log_chol = FALSE
)

mmcif_logLik_grad(
  object,
  par,
  ghq_data = object$ghq_data,
  n_threads = 1L,
  is_log_chol = FALSE
)
mmcif_logLik(
  object,
  par,
  ghq_data = object$ghq_data,
  n_threads = 1L,
  is_log_chol = FALSE
)

mmcif_logLik_grad(
  object,
  par,
  ghq_data = object$ghq_data,
  n_threads = 1L,
  is_log_chol = FALSE
)

Arguments

`object`	an object from `mmcif_data`.
`par`	numeric vector with parameters. This is either using a log Cholesky decomposition for the covariance matrix or the covariance matrix.
`ghq_data`	the Gauss-Hermite quadrature nodes and weights to use. It should be a list with two elements called `"node"` and `"weight"`. A default is provided if `NULL` is passed.
`n_threads`	the number of threads to use.
`is_log_chol`	logical for whether a log Cholesky decomposition is used for the covariance matrix or the full covariance matrix.

Value

A numeric vector with either the log composite likelihood or the gradient of it.

Examples

if(require(mets)){
  # prepare the data
  data(prt)

  # truncate the time
  max_time <- 90
  prt <- within(prt, {
    status[time >= max_time] <- 0
    time <- pmin(time, max_time)
  })

  # select the DZ twins and re-code the status
  prt_use <- subset(prt, zyg == "DZ") |>
    transform(status = ifelse(status == 0, 3L, status))

  # randomly sub-sample
  set.seed(1)
  prt_use <- subset(
    prt_use, id %in% sample(unique(id), length(unique(id)) %/% 10L))

  n_threads <- 2L
  mmcif_obj <- mmcif_data(
    ~ country - 1, prt_use, status, time, id, max_time,
    2L, strata = country)

  # get the staring values
  start_vals <- mmcif_start_values(mmcif_obj, n_threads = n_threads)

  # compute the log composite likelihood and the gradient at the starting
  # values
  mmcif_logLik(
    mmcif_obj, start_vals$upper, is_log_chol = TRUE, n_threads = n_threads) |>
    print()
  mmcif_logLik_grad(
    mmcif_obj, start_vals$upper, is_log_chol = TRUE, n_threads = n_threads) |>
    print()
}

if(require(mets)){
  # prepare the data
  data(prt)

  # truncate the time
  max_time <- 90
  prt <- within(prt, {
    status[time >= max_time] <- 0
    time <- pmin(time, max_time)
  })

  # select the DZ twins and re-code the status
  prt_use <- subset(prt, zyg == "DZ") |>
    transform(status = ifelse(status == 0, 3L, status))

  # randomly sub-sample
  set.seed(1)
  prt_use <- subset(
    prt_use, id %in% sample(unique(id), length(unique(id)) %/% 10L))

  n_threads <- 2L
  mmcif_obj <- mmcif_data(
    ~ country - 1, prt_use, status, time, id, max_time,
    2L, strata = country)

  # get the staring values
  start_vals <- mmcif_start_values(mmcif_obj, n_threads = n_threads)

  # compute the log composite likelihood and the gradient at the starting
  # values
  mmcif_logLik(
    mmcif_obj, start_vals$upper, is_log_chol = TRUE, n_threads = n_threads) |>
    print()
  mmcif_logLik_grad(
    mmcif_obj, start_vals$upper, is_log_chol = TRUE, n_threads = n_threads) |>
    print()
}

Computes Marginal Measures Using Two Observations

Description

Computes bivariate figures such as conditional CIFs, survival probabilities, and hazards or bivariate CIFs, densities, and survival probabilities.

Usage

mmcif_pd_cond(
  par,
  object,
  newdata,
  cause,
  time,
  left_trunc = NULL,
  ghq_data = object$ghq_data,
  strata = NULL,
  which_cond,
  type_cond = "derivative",
  type_obs = "cumulative"
)

mmcif_pd_bivariate(
  par,
  object,
  newdata,
  cause,
  time,
  left_trunc = NULL,
  ghq_data = object$ghq_data,
  strata = NULL,
  use_log = FALSE,
  type = c("cumulative", "cumulative")
)
mmcif_pd_cond(
  par,
  object,
  newdata,
  cause,
  time,
  left_trunc = NULL,
  ghq_data = object$ghq_data,
  strata = NULL,
  which_cond,
  type_cond = "derivative",
  type_obs = "cumulative"
)

mmcif_pd_bivariate(
  par,
  object,
  newdata,
  cause,
  time,
  left_trunc = NULL,
  ghq_data = object$ghq_data,
  strata = NULL,
  use_log = FALSE,
  type = c("cumulative", "cumulative")
)

Arguments

`par`	numeric vector with the model parameters.
`object`	an object from `mmcif_data`.
`newdata`	a `data.frame` with data for the observations. It needs to have two rows.
`cause`	an integer vector with the cause of each outcome. If there are `n_causes` of outcome, then the vector should have values in `1:(n_causes + 1)` with `n_causes + 1` indicating censoring.
`time`	a numeric vector with the observed times.
`left_trunc`	numeric vector with left-truncation times. `NULL` implies that there are not any individuals with left-truncation.
`ghq_data`	the Gauss-Hermite quadrature nodes and weights to use. It should be a list with two elements called `"node"` and `"weight"`. A default is provided if `NULL` is passed.
`strata`	an integer vector or a factor vector with the strata of each individual. `NULL` implies that there are no strata.
`which_cond`	an integer with value one or two for the index of the individual that is being conditioned on.
`type_cond`	a character for the type of outcome that is being conditioned on. `"derivative"` for the derivative of a CIF or `"cumulative"` for a CIF or the survival probability.
`type_obs`	a character the type of conditional measure. It can be `"derivative"` for the derivative of a CIF, `"cumulative"` for a CIF or the survival probability, and `"hazard"` for the hazard.
`use_log`	a logical for whether the returned output should be on the log scale.
`type`	a 2D character vector for the type of measures for each observation. The elements can be `"derivative"` for the derivative of a CIF or `"cumulative"` for a CIF or the survival probability.

Value

A numeric scalar with the requested quantity.

Examples

if(require(mets)){
  data(prt)

  # truncate the time
  max_time <- 90
  prt <- within(prt, {
    status[time >= max_time] <- 0
    time <- pmin(time, max_time)
  })

  # select the DZ twins and re-code the status
  prt_use <- subset(prt, zyg == "DZ") |>
    transform(status = ifelse(status == 0, 3L, status))

  # Gauss Hermite quadrature nodes and weights from fastGHQuad::gaussHermiteData
  ghq_data <- list(
    node = c(-3.43615911883774, -2.53273167423279, -1.75668364929988, -1.03661082978951,
             -0.342901327223705, 0.342901327223705, 1.03661082978951, 1.75668364929988,
             2.53273167423279, 3.43615911883774),
    weight = c(7.6404328552326e-06, 0.00134364574678124, 0.0338743944554811, 0.240138611082314,
               0.610862633735326,0.610862633735326, 0.240138611082315, 0.033874394455481,
               0.00134364574678124, 7.64043285523265e-06))

  # setup the object for the computation
  mmcif_obj <- mmcif_data(
    ~ country - 1, prt_use, status, time, id, max_time,
    2L, strata = country, ghq_data = ghq_data)

  # previous estimates
  par <- c(0.727279974859164, 0.640534073288067, 0.429437766165371, 0.434367104339573,
           -2.4737847536253, -1.49576564624673, -1.89966050143904, -1.58881346649412,
           -5.5431198001029, -3.5328359024178, -5.82305147022587, -3.4531896212114,
           -5.29132887832377, -3.36106297109548, -6.03690322125729, -3.49516746825624,
           2.55000711185704, 2.71995985605891, 2.61971498736444, 3.05976391058032,
           -5.97173564860957, -3.37912051983482, -5.14324860374941, -3.36396780694965,
           -6.02337246348561, -3.03754644968859, -5.51267338700737, -3.01148582224673,
           2.69665543753264, 2.59359057553995, 2.7938341786374, 2.70689750644755,
           -0.362056555418564, 0.24088005091276, 0.124070380635372, -0.246152029808377,
           -0.0445628476462479, -0.911485513197845, -0.27911988106887, -0.359648419277058,
           -0.242711959678559, -6.84897302527358)

  # the test data we will use
  test_dat <- data.frame(
    country = factor(c("Norway", "Norway"), levels(prt_use$country)),
    status = c(1L, 2L), time = c(60, 75))

  # probability that both experience the event prior to the two times
  mmcif_pd_bivariate(
    par = par, object = mmcif_obj, newdata = test_dat, cause = status,
    strata = country, ghq_data = ghq_data, time = time, type =
      c("cumulative", "cumulative")) |>
    print()

  # density that one experiences an event at the point and the other
  # experiences an event prior to the point
  mmcif_pd_bivariate(
    par = par, object = mmcif_obj, newdata = test_dat, cause = status,
    strata = country, ghq_data = ghq_data, time = time, type =
      c("derivative", "cumulative")) |>
    print()

  # probability that both survive up to the passed points
  mmcif_pd_bivariate(
    par = par, object = mmcif_obj, newdata = test_dat, cause = c(3L, 3L),
    strata = country, ghq_data = ghq_data, time = time, type =
      c("cumulative", "cumulative")) |>
    print()

  # conditional hazard given that the other experiences an event prior to time
  mmcif_pd_cond(
    par = par, object = mmcif_obj, newdata = test_dat, cause = status,
    strata = country, ghq_data = ghq_data, time = time, which_cond = 1L,
    type_cond = "cumulative", type_obs = "hazard") |>
    print()

  # conditional CIF given that the other experiences an event prior to the
  # time
  mmcif_pd_cond(
    par = par, object = mmcif_obj, newdata = test_dat, cause = c(2L, 2L),
    strata = country, ghq_data = ghq_data, time = time, which_cond = 1L,
    type_cond = "cumulative", type_obs = "cumulative") |>
    print()

  # same but given that the other experiences the event at the point
  mmcif_pd_cond(
    par = par, object = mmcif_obj, newdata = test_dat, cause = c(2L, 2L),
    strata = country, ghq_data = ghq_data, time = time, which_cond = 1L,
    type_cond = "derivative", type_obs = "cumulative") |>
    print()
}

if(require(mets)){
  data(prt)

  # truncate the time
  max_time <- 90
  prt <- within(prt, {
    status[time >= max_time] <- 0
    time <- pmin(time, max_time)
  })

  # select the DZ twins and re-code the status
  prt_use <- subset(prt, zyg == "DZ") |>
    transform(status = ifelse(status == 0, 3L, status))

  # Gauss Hermite quadrature nodes and weights from fastGHQuad::gaussHermiteData
  ghq_data <- list(
    node = c(-3.43615911883774, -2.53273167423279, -1.75668364929988, -1.03661082978951,
             -0.342901327223705, 0.342901327223705, 1.03661082978951, 1.75668364929988,
             2.53273167423279, 3.43615911883774),
    weight = c(7.6404328552326e-06, 0.00134364574678124, 0.0338743944554811, 0.240138611082314,
               0.610862633735326,0.610862633735326, 0.240138611082315, 0.033874394455481,
               0.00134364574678124, 7.64043285523265e-06))

  # setup the object for the computation
  mmcif_obj <- mmcif_data(
    ~ country - 1, prt_use, status, time, id, max_time,
    2L, strata = country, ghq_data = ghq_data)

  # previous estimates
  par <- c(0.727279974859164, 0.640534073288067, 0.429437766165371, 0.434367104339573,
           -2.4737847536253, -1.49576564624673, -1.89966050143904, -1.58881346649412,
           -5.5431198001029, -3.5328359024178, -5.82305147022587, -3.4531896212114,
           -5.29132887832377, -3.36106297109548, -6.03690322125729, -3.49516746825624,
           2.55000711185704, 2.71995985605891, 2.61971498736444, 3.05976391058032,
           -5.97173564860957, -3.37912051983482, -5.14324860374941, -3.36396780694965,
           -6.02337246348561, -3.03754644968859, -5.51267338700737, -3.01148582224673,
           2.69665543753264, 2.59359057553995, 2.7938341786374, 2.70689750644755,
           -0.362056555418564, 0.24088005091276, 0.124070380635372, -0.246152029808377,
           -0.0445628476462479, -0.911485513197845, -0.27911988106887, -0.359648419277058,
           -0.242711959678559, -6.84897302527358)

  # the test data we will use
  test_dat <- data.frame(
    country = factor(c("Norway", "Norway"), levels(prt_use$country)),
    status = c(1L, 2L), time = c(60, 75))

  # probability that both experience the event prior to the two times
  mmcif_pd_bivariate(
    par = par, object = mmcif_obj, newdata = test_dat, cause = status,
    strata = country, ghq_data = ghq_data, time = time, type =
      c("cumulative", "cumulative")) |>
    print()

  # density that one experiences an event at the point and the other
  # experiences an event prior to the point
  mmcif_pd_bivariate(
    par = par, object = mmcif_obj, newdata = test_dat, cause = status,
    strata = country, ghq_data = ghq_data, time = time, type =
      c("derivative", "cumulative")) |>
    print()

  # probability that both survive up to the passed points
  mmcif_pd_bivariate(
    par = par, object = mmcif_obj, newdata = test_dat, cause = c(3L, 3L),
    strata = country, ghq_data = ghq_data, time = time, type =
      c("cumulative", "cumulative")) |>
    print()

  # conditional hazard given that the other experiences an event prior to time
  mmcif_pd_cond(
    par = par, object = mmcif_obj, newdata = test_dat, cause = status,
    strata = country, ghq_data = ghq_data, time = time, which_cond = 1L,
    type_cond = "cumulative", type_obs = "hazard") |>
    print()

  # conditional CIF given that the other experiences an event prior to the
  # time
  mmcif_pd_cond(
    par = par, object = mmcif_obj, newdata = test_dat, cause = c(2L, 2L),
    strata = country, ghq_data = ghq_data, time = time, which_cond = 1L,
    type_cond = "cumulative", type_obs = "cumulative") |>
    print()

  # same but given that the other experiences the event at the point
  mmcif_pd_cond(
    par = par, object = mmcif_obj, newdata = test_dat, cause = c(2L, 2L),
    strata = country, ghq_data = ghq_data, time = time, which_cond = 1L,
    type_cond = "derivative", type_obs = "cumulative") |>
    print()
}

Computes Marginal Measures for One Observation

Description

Computes the marginal cumulative incidence functions (CIF), marginal survival function or the derivative of the CIF.

Usage

mmcif_pd_univariate(
  par,
  object,
  newdata,
  cause,
  time,
  left_trunc = NULL,
  ghq_data = object$ghq_data,
  strata = NULL,
  use_log = FALSE,
  type = "cumulative"
)
mmcif_pd_univariate(
  par,
  object,
  newdata,
  cause,
  time,
  left_trunc = NULL,
  ghq_data = object$ghq_data,
  strata = NULL,
  use_log = FALSE,
  type = "cumulative"
)

Arguments

`par`	numeric vector with the model parameters.
`object`	an object from `mmcif_data`.
`newdata`	a `data.frame` with data for the observation. It needs to have one row.
`cause`	an integer vector with the cause of each outcome. If there are `n_causes` of outcome, then the vector should have values in `1:(n_causes + 1)` with `n_causes + 1` indicating censoring.
`time`	a numeric vector with the observed times.
`left_trunc`	numeric vector with left-truncation times. `NULL` implies that there are not any individuals with left-truncation.
`ghq_data`	the Gauss-Hermite quadrature nodes and weights to use. It should be a list with two elements called `"node"` and `"weight"`. A default is provided if `NULL` is passed.
`strata`	an integer vector or a factor vector with the strata of each individual. `NULL` implies that there are no strata.
`use_log`	a logical for whether the returned output should be on the log scale.
`type`	a character for the type of measures for the observation. It can have value `"derivative"` for the derivative of a CIF or `"cumulative"` for a CIF or the survival probability.

Value

A numeric scalar with the requested quantity.

Examples

if(require(mets)){
  data(prt)

  # truncate the time
  max_time <- 90
  prt <- within(prt, {
    status[time >= max_time] <- 0
    time <- pmin(time, max_time)
  })

  # select the DZ twins and re-code the status
  prt_use <- subset(prt, zyg == "DZ") |>
    transform(status = ifelse(status == 0, 3L, status))

  # Gauss Hermite quadrature nodes and weights from fastGHQuad::gaussHermiteData
  ghq_data <- list(
    node = c(-3.43615911883774, -2.53273167423279, -1.75668364929988, -1.03661082978951,
             -0.342901327223705, 0.342901327223705, 1.03661082978951, 1.75668364929988,
             2.53273167423279, 3.43615911883774),
    weight = c(7.6404328552326e-06, 0.00134364574678124, 0.0338743944554811, 0.240138611082314,
               0.610862633735326,0.610862633735326, 0.240138611082315, 0.033874394455481,
               0.00134364574678124, 7.64043285523265e-06))

  # setup the object for the computation
  mmcif_obj <- mmcif_data(
    ~ country - 1, prt_use, status, time, id, max_time,
    2L, strata = country, ghq_data = ghq_data)

  # previous estimates
  par <- c(0.727279974859164, 0.640534073288067, 0.429437766165371, 0.434367104339573,
           -2.4737847536253, -1.49576564624673, -1.89966050143904, -1.58881346649412,
           -5.5431198001029, -3.5328359024178, -5.82305147022587, -3.4531896212114,
           -5.29132887832377, -3.36106297109548, -6.03690322125729, -3.49516746825624,
           2.55000711185704, 2.71995985605891, 2.61971498736444, 3.05976391058032,
           -5.97173564860957, -3.37912051983482, -5.14324860374941, -3.36396780694965,
           -6.02337246348561, -3.03754644968859, -5.51267338700737, -3.01148582224673,
           2.69665543753264, 2.59359057553995, 2.7938341786374, 2.70689750644755,
           -0.362056555418564, 0.24088005091276, 0.124070380635372, -0.246152029808377,
           -0.0445628476462479, -0.911485513197845, -0.27911988106887, -0.359648419277058,
           -0.242711959678559, -6.84897302527358)

  # the test data we will use
  test_dat <- data.frame(country = factor("Norway", levels(prt_use$country)),
                         status = 2L)

  # compute the CIF
  mmcif_pd_univariate(
    par = par, object = mmcif_obj, newdata = test_dat, cause = status,
    strata = country, ghq_data = ghq_data, time = 75, type = "cumulative") |>
    print()

  # compute the derivative of the CIF
  mmcif_pd_univariate(
    par = par, object = mmcif_obj, newdata = test_dat, cause = status,
    strata = country, ghq_data = ghq_data, time = 75, type = "derivative") |>
    print()

  # compute the survival probability
  mmcif_pd_univariate(
    par = par, object = mmcif_obj, newdata = test_dat, cause = 3L,
    strata = country, ghq_data = ghq_data, time = 75, type = "cumulative") |>
    print()
}

if(require(mets)){
  data(prt)

  # truncate the time
  max_time <- 90
  prt <- within(prt, {
    status[time >= max_time] <- 0
    time <- pmin(time, max_time)
  })

  # select the DZ twins and re-code the status
  prt_use <- subset(prt, zyg == "DZ") |>
    transform(status = ifelse(status == 0, 3L, status))

  # Gauss Hermite quadrature nodes and weights from fastGHQuad::gaussHermiteData
  ghq_data <- list(
    node = c(-3.43615911883774, -2.53273167423279, -1.75668364929988, -1.03661082978951,
             -0.342901327223705, 0.342901327223705, 1.03661082978951, 1.75668364929988,
             2.53273167423279, 3.43615911883774),
    weight = c(7.6404328552326e-06, 0.00134364574678124, 0.0338743944554811, 0.240138611082314,
               0.610862633735326,0.610862633735326, 0.240138611082315, 0.033874394455481,
               0.00134364574678124, 7.64043285523265e-06))

  # setup the object for the computation
  mmcif_obj <- mmcif_data(
    ~ country - 1, prt_use, status, time, id, max_time,
    2L, strata = country, ghq_data = ghq_data)

  # previous estimates
  par <- c(0.727279974859164, 0.640534073288067, 0.429437766165371, 0.434367104339573,
           -2.4737847536253, -1.49576564624673, -1.89966050143904, -1.58881346649412,
           -5.5431198001029, -3.5328359024178, -5.82305147022587, -3.4531896212114,
           -5.29132887832377, -3.36106297109548, -6.03690322125729, -3.49516746825624,
           2.55000711185704, 2.71995985605891, 2.61971498736444, 3.05976391058032,
           -5.97173564860957, -3.37912051983482, -5.14324860374941, -3.36396780694965,
           -6.02337246348561, -3.03754644968859, -5.51267338700737, -3.01148582224673,
           2.69665543753264, 2.59359057553995, 2.7938341786374, 2.70689750644755,
           -0.362056555418564, 0.24088005091276, 0.124070380635372, -0.246152029808377,
           -0.0445628476462479, -0.911485513197845, -0.27911988106887, -0.359648419277058,
           -0.242711959678559, -6.84897302527358)

  # the test data we will use
  test_dat <- data.frame(country = factor("Norway", levels(prt_use$country)),
                         status = 2L)

  # compute the CIF
  mmcif_pd_univariate(
    par = par, object = mmcif_obj, newdata = test_dat, cause = status,
    strata = country, ghq_data = ghq_data, time = 75, type = "cumulative") |>
    print()

  # compute the derivative of the CIF
  mmcif_pd_univariate(
    par = par, object = mmcif_obj, newdata = test_dat, cause = status,
    strata = country, ghq_data = ghq_data, time = 75, type = "derivative") |>
    print()

  # compute the survival probability
  mmcif_pd_univariate(
    par = par, object = mmcif_obj, newdata = test_dat, cause = 3L,
    strata = country, ghq_data = ghq_data, time = 75, type = "cumulative") |>
    print()
}

Computes the Sandwich Estimator

Description

Computes the sandwich estimator of the covariance matrix. The parameter that is passed is using the log Cholesky decomposition. The Hessian is computed using numerical differentiation with Richardson extrapolation to refine the estimate.

Usage

mmcif_sandwich(
  object,
  par,
  ghq_data = object$ghq_data,
  n_threads = 1L,
  eps = 0.01,
  scale = 2,
  tol = 1e-08,
  order = 3L
)
mmcif_sandwich(
  object,
  par,
  ghq_data = object$ghq_data,
  n_threads = 1L,
  eps = 0.01,
  scale = 2,
  tol = 1e-08,
  order = 3L
)

Arguments

`object`	an object from `mmcif_data`.
`par`	numeric vector with the parameters to compute the sandwich estimator at.
`ghq_data`	the Gauss-Hermite quadrature nodes and weights to use. It should be a list with two elements called `"node"` and `"weight"`. A default is provided if `NULL` is passed.
`n_threads`	the number of threads to use.
`eps`	determines the step size in the numerical differentiation using `max(sqrt(.Machine$double.eps), \|par[i]\| * eps)` for each parameter `i`.
`scale`	scaling factor in the Richardson extrapolation. Each step is smaller by a factor `scale`.
`tol`	relative convergence criteria in the extrapolation given by `max(tol, \|g[i]\| * tol)` with `g` being the gradient and for each parameter `i`.
`order`	maximum number of iteration of the Richardson extrapolation.

Value

The sandwich estimator along attributes called

"meat" for the "meat" of the sandwich estimator.
"hessian" for the Hessian of the log composite likelihood.
"res vcov" which is the sandwich estimator where the last elements are the upper triangle of the covariance matrix of the random effects rather than the log Cholesky decomposition of the matrix.

References

Examples

if(require(mets)){
  # prepare the data
  data(prt)

  # truncate the time
  max_time <- 90
  prt <- within(prt, {
    status[time >= max_time] <- 0
    time <- pmin(time, max_time)
  })

  # select the DZ twins and re-code the status
  prt_use <- subset(prt, zyg == "DZ") |>
    transform(status = ifelse(status == 0, 3L, status))

  # randomly sub-sample
  set.seed(1)
  prt_use <- subset(
    prt_use, id %in% sample(unique(id), length(unique(id)) %/% 10L))

  n_threads <- 2L
  mmcif_obj <- mmcif_data(
    ~ country - 1, prt_use, status, time, id, max_time,
    2L, strata = country)

  # get the staring values
  start_vals <- mmcif_start_values(mmcif_obj, n_threads = n_threads)

  # estimate the parameters
  ests <- mmcif_fit(start_vals$upper, mmcif_obj, n_threads = n_threads)

  # get the sandwich estimator
  vcov_est <- mmcif_sandwich(
    mmcif_obj, ests$par, n_threads = n_threads, order = 2L)

  # show the parameter estimates along with the standard errors
  rbind(Estimate = ests$par,
        SE = sqrt(diag(vcov_est))) |>
    print()

  # show the upper triangle of the covariance matrix and the SEs
  rbind(`Estimate (vcov)` = tail(ests$par, 10) |> log_chol_inv() |>
          (\(x) x[upper.tri(x, TRUE)])() ,
        SE = attr(vcov_est, "res vcov") |> diag() |> sqrt() |> tail(10)) |>
    print()
}

if(require(mets)){
  # prepare the data
  data(prt)

  # truncate the time
  max_time <- 90
  prt <- within(prt, {
    status[time >= max_time] <- 0
    time <- pmin(time, max_time)
  })

  # select the DZ twins and re-code the status
  prt_use <- subset(prt, zyg == "DZ") |>
    transform(status = ifelse(status == 0, 3L, status))

  # randomly sub-sample
  set.seed(1)
  prt_use <- subset(
    prt_use, id %in% sample(unique(id), length(unique(id)) %/% 10L))

  n_threads <- 2L
  mmcif_obj <- mmcif_data(
    ~ country - 1, prt_use, status, time, id, max_time,
    2L, strata = country)

  # get the staring values
  start_vals <- mmcif_start_values(mmcif_obj, n_threads = n_threads)

  # estimate the parameters
  ests <- mmcif_fit(start_vals$upper, mmcif_obj, n_threads = n_threads)

  # get the sandwich estimator
  vcov_est <- mmcif_sandwich(
    mmcif_obj, ests$par, n_threads = n_threads, order = 2L)

  # show the parameter estimates along with the standard errors
  rbind(Estimate = ests$par,
        SE = sqrt(diag(vcov_est))) |>
    print()

  # show the upper triangle of the covariance matrix and the SEs
  rbind(`Estimate (vcov)` = tail(ests$par, 10) |> log_chol_inv() |>
          (\(x) x[upper.tri(x, TRUE)])() ,
        SE = attr(vcov_est, "res vcov") |> diag() |> sqrt() |> tail(10)) |>
    print()
}

Finds Staring Values

Description

Fast heuristic for finding starting values for the mixed cumulative incidence functions model.

Usage

mmcif_start_values(object, n_threads = 1L, vcov_start = NULL)
mmcif_start_values(object, n_threads = 1L, vcov_start = NULL)

Arguments

`object`	an object from `mmcif_data`.
`n_threads`	the number of threads to use.
`vcov_start`	starting value for the covariance matrix of the random effects. `NULL` yields the identity matrix.

Value

A list with

an element called "full" with the starting value where the last components are the covariance matrix.
an element called "upper" the staring values where the covariance matrix is stored as a log Cholesky decomposition. This is used e.g. for optimization with mmcif_fit.

Examples

if(require(mets)){
  # prepare the data
  data(prt)

  # truncate the time
  max_time <- 90
  prt <- within(prt, {
    status[time >= max_time] <- 0
    time <- pmin(time, max_time)
  })

  # select the DZ twins and re-code the status
  prt_use <- subset(prt, zyg == "DZ") |>
    transform(status = ifelse(status == 0, 3L, status))

  # randomly sub-sample
  set.seed(1)
  prt_use <- subset(
    prt_use, id %in% sample(unique(id), length(unique(id)) %/% 10L))

  n_threads <- 2L
  mmcif_obj <- mmcif_data(
    ~ country - 1, prt_use, status, time, id, max_time,
    2L, strata = country)

  # get the staring values
  start_vals <- mmcif_start_values(mmcif_obj, n_threads = n_threads)

  # the starting values
  print(start_vals)
}

if(require(mets)){
  # prepare the data
  data(prt)

  # truncate the time
  max_time <- 90
  prt <- within(prt, {
    status[time >= max_time] <- 0
    time <- pmin(time, max_time)
  })

  # select the DZ twins and re-code the status
  prt_use <- subset(prt, zyg == "DZ") |>
    transform(status = ifelse(status == 0, 3L, status))

  # randomly sub-sample
  set.seed(1)
  prt_use <- subset(
    prt_use, id %in% sample(unique(id), length(unique(id)) %/% 10L))

  n_threads <- 2L
  mmcif_obj <- mmcif_data(
    ~ country - 1, prt_use, status, time, id, max_time,
    2L, strata = country)

  # get the staring values
  start_vals <- mmcif_start_values(mmcif_obj, n_threads = n_threads)

  # the starting values
  print(start_vals)
}

Package 'mmcif'

Help Index

Computes the Log Cholesky Decomposition and the Inverse

Description

Usage

Arguments

Value

Examples

Sets up an Object to Compute the Log Composite Likelihood

Description

Usage

Arguments

Value

References

See Also

Examples

Fits a Mixed Competing Risk Model

Description

Usage

Arguments

Value

References

See Also

Examples

Evaluates the Log Composite Likelihood and its Gradient

Description

Usage

Arguments

Value

Examples

Computes Marginal Measures Using Two Observations

Description

Usage

Arguments

Value

See Also

Examples

Computes Marginal Measures for One Observation

Description

Usage

Arguments

Value

See Also

Examples

Computes the Sandwich Estimator

Description

Usage

Arguments

Value

References

See Also

Examples

Finds Staring Values

Description

Usage

Arguments

Value

Examples