Fit the occARU model
Fits a Bayesian multispecies occupancy model with a count observation model in Stan,
using data prepared by make_data(). Requires CmdStan >= 2.36.0, which can
be installed with setup_occARU().
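Before fitting, it can help to confirm that the installed CmdStan meets the version requirement. The following is a minimal sketch, not part of occARU: the helper name cmdstan_ok() is hypothetical, and the check assumes CmdStan is located through cmdstanr.

```r
# Hypothetical helper (not part of occARU): TRUE if `version` satisfies
# the minimum CmdStan version stated above. Unparseable version strings
# are treated as not satisfying the requirement.
cmdstan_ok <- function(version, minimum = "2.36.0") {
  v <- numeric_version(version, strict = FALSE)
  !is.na(v) && v >= numeric_version(minimum)
}

# Query the installed CmdStan only if cmdstanr is available.
if (requireNamespace("cmdstanr", quietly = TRUE)) {
  installed <- cmdstanr::cmdstan_version(error_on_NA = FALSE)
  if (is.null(installed) || !cmdstan_ok(installed)) {
    message("CmdStan >= 2.36.0 not found; consider running setup_occARU().")
  }
}
```

The comparison uses numeric_version() rather than string comparison, so that "2.4.0" is correctly ordered before "2.36.0".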
Usage
fit_model(
  data,
  stan_file = NULL,
  spatial = c("gp", "mvn", "none"),
  temporal = c("gp", "mvn", "none"),
  periodic_gp = FALSE,
  period = NULL,
  species_length_scales = FALSE,
  project_kappa = TRUE,
  overdispersion = c("none", "nb", "olre"),
  variance_decomposition = c("dirichlet", "logistic-normal"),
  latent = TRUE,
  loo_draws = 100L,
  ppc = c("Q", "y", "both", "none"),
  prior = set_priors(verbose = FALSE),
  init = "pathfinder",
  pathfinder_args = list(),
  threads = 1L,
  grainsize = 1L,
  ...
)
Arguments
- data
An occARU_data object produced by make_data().
- stan_file
character. Path to a custom Stan file. If NULL (default), uses the built-in multispecies occARU model, or the single-species version if only one species is included. Intended for advanced users who have modified the Stan program; note that custom models will likely require corresponding changes to the output of make_data().
- spatial
character. Structure of site-level random effects. "gp" (default) fits a hierarchical multi-species spatial Gaussian process with an exponentiated quadratic kernel, which is the recommended option. "mvn" fits an unstructured multivariate normal, and "none" omits site-level random effects entirely.
- temporal
character. Structure of survey-level random effects. "gp" (default) fits a hierarchical multi-species temporal Gaussian process with an exponentiated quadratic kernel, which is the recommended option. "mvn" fits an unstructured multivariate normal, and "none" omits survey-level random effects entirely.
- periodic_gp
logical. If TRUE, a periodic kernel is added to the temporal GP kernel. Only used when temporal = "gp". Default: FALSE.
- period
Positive numeric. Period length in survey units (i.e. number of survey periods per cycle). Only used when temporal = "gp" and periodic_gp = TRUE. Defaults to 365 / survey_length, corresponding to an annual cycle. For example, with survey_length = 7 the default is period = 52.1. Override if your data span a different temporal cycle.
- species_length_scales
logical. If TRUE, species-specific GP length scales are estimated for each kernel, each drawn independently from the shared length scale priors. Note that enabling this requires a separate Cholesky decomposition for each species and each GP at every iteration, which can substantially increase sampling time. If FALSE (default), only one Cholesky decomposition is performed per GP. Only used when multiple species are included and spatial = "gp" or temporal = "gp".
- project_kappa
logical. If TRUE (default), applies an orthogonal projection to the random survey effects, constructed from the site-averaged survey predictor design matrix. Ignored when no survey predictors are provided.
- overdispersion
character. Overdispersion model for the observation process. One of "none" (Poisson, default), "nb" (negative binomial), or "olre" (correlated observation-level random effects).
- variance_decomposition
character. Prior for variance partitions. One of "dirichlet" (default) or "logistic-normal".
- latent
logical. If TRUE (default), latent occupancy states z are recovered for each species using the forward-backward sampling algorithm.
- loo_draws
Non-negative integer. Number of Monte Carlo draws used to estimate the marginal log-likelihood, integrating over site-level random effects and/or Poisson OLREs, for PSIS-LOO-CV via loo::loo(). Default: 100, which produces an additional [S, I] matrix log_lik2 by marginalising over site effects (and OLRE residuals if applicable) via Monte Carlo integration. log_lik2 is recommended over log_lik for PSIS-LOO-CV as it produces better Pareto-k diagnostics. Set to 0 to disable, returning only log_lik. Only used when spatial is not "none" or overdispersion = "olre".
- ppc
character. Posterior predictive checks to compute. One of "Q" (default), "y", "both", or "none". "y" returns the full [I, J, S] prediction array (yrep); "Q" returns only aggregated counts [I, S] (Qrep). For large datasets, "Q" or "none" can substantially reduce memory usage and sampling time.
- prior
An occARU_priors object from set_priors(). If omitted, default priors are used.
- init
character, numeric, or list. Initialisation strategy passed to cmdstanr::CmdStanModel$sample(). One of:
  - "pathfinder"
    Default. Use Pathfinder to generate initial values (see cmdstanr::CmdStanModel$pathfinder()). Recommended for complex models as it can substantially reduce warmup time and improve convergence.
  - A numeric scalar
    Initialise all parameters uniformly in [-init, init].
  - A list
    Custom initial values passed directly to cmdstanr::CmdStanModel$sample().
- pathfinder_args
Named list of additional arguments passed to cmdstanr::CmdStanModel$pathfinder() when init = "pathfinder". Overrides defaults (refresh = 0, sig_figs = 14, init = 0.1, num_paths = chains, num_threads = chains). Default: list().
- threads
Positive integer. Number of threads for within-chain parallelisation via reduce_sum(). Default: 1 (no parallelisation). The total number of threads used is threads * chains, so for optimal performance set threads = floor(available_cores / chains). For example, 8 cores with 4 chains gives threads = 2.
- grainsize
Positive integer. Chunk size (number of sites) for within-chain parallelisation via reduce_sum(). Only used when threads > 1. Default: 1, which lets Stan automatically determine the optimal chunk size. Increase if you have many sites and want to reduce parallelisation overhead. See the Stan User's Guide for details on tuning grainsize.
- ...
Additional arguments passed to cmdstanr::CmdStanModel$sample(). Uses parallel chains by default. All other sampling arguments use Stan defaults.
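The arithmetic behind the threads and period defaults described above can be sketched in base R. The helper names threads_per_chain() and default_period() are hypothetical and are not part of occARU; they only restate the formulas given in the argument descriptions.

```r
# Hypothetical helper: threads per chain such that threads * chains
# does not exceed the number of available cores (never below 1).
threads_per_chain <- function(available_cores, chains) {
  max(1L, as.integer(floor(available_cores / chains)))
}

# Default period as described above: number of survey periods per
# annual cycle, given the survey length in days.
default_period <- function(survey_length) {
  365 / survey_length
}

threads_per_chain(8, 4)       # 2, matching the example above
round(default_period(7), 1)   # 52.1, matching the example above
```

In practice, available_cores could be taken from parallel::detectCores(), although that count may include logical cores.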
Value
A CmdStanFit object with occARU-specific attributes attached:
- stan_data
The full Stan data list passed to the model, including prior hyperparameters.
- occARU_data
The original occARU_data object from make_data().
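Assuming these are ordinary R attributes, they can be retrieved with attr(). The sketch below uses a plain list as a stand-in for a fitted object, so the contents shown are illustrative placeholders and not the real structure of stan_data or occARU_data.

```r
# Stand-in for a fitted object; a real one would come from fit_model().
# The element names and values below are illustrative assumptions only.
fit <- structure(
  list(),
  stan_data = list(S = 3L, I = 10L),
  occARU_data = list(note = "placeholder")
)

# exact = TRUE avoids partial matching of attribute names.
stan_data <- attr(fit, "stan_data", exact = TRUE)
occ_data <- attr(fit, "occARU_data", exact = TRUE)

names(stan_data)
```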
See also
make_data(), set_priors(), setup_occARU(),
cmdstanr::CmdStanMCMC for methods on the fitted model object.
The statistical model is described in
vignette("model", package = "occARU").
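Examples
A minimal usage sketch based on the Usage section above. The argument values are illustrative, and dat is a hypothetical name for an object returned by make_data(); the call is skipped unless occARU is installed and dat exists.

```r
# Argument names come from the Usage section; values are illustrative.
fit_args <- list(
  spatial = "gp",
  temporal = "gp",
  periodic_gp = TRUE,
  period = 365 / 7,        # assumes 7-day survey periods
  overdispersion = "nb",
  ppc = "Q",
  threads = 2L             # total threads used is threads * chains
)

# `dat` is a hypothetical occARU_data object produced by make_data().
if (requireNamespace("occARU", quietly = TRUE) && exists("dat")) {
  fit <- do.call(occARU::fit_model, c(list(data = dat), fit_args))
}
```

Collecting the arguments in a list keeps the configuration in one place, which makes it easier to refit the same data with, for example, a different overdispersion setting.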