Prepare data for the occARU model
Transforms raw deployment and observation data into a named list suitable
for passing to fit_model(). Follows the
camtrapDP data format by default.
Site coordinates are automatically projected from WGS84 latitude/longitude
to UTM (km), with the zone auto-detected from the mean longitude.
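The zone auto-detection described above can be sketched with the standard UTM zone formula. This is a minimal illustration in base R, not the package's internal code: the helper name utm_zone_from_lon and the example longitudes are hypothetical, and special-case zones (e.g. around Norway and Svalbard) are ignored.

```r
# Sketch only: standard UTM zone formula, assumed rather than taken from occARU.
utm_zone_from_lon <- function(longitude) {
  mean_lon <- mean(longitude, na.rm = TRUE)
  # UTM zones are 6 degrees wide, numbered 1 to 60 eastwards from 180 degrees W.
  zone <- floor((mean_lon + 180) / 6) + 1
  # Clamp so that a mean longitude of exactly 180 maps to zone 60.
  as.integer(min(max(zone, 1), 60))
}

lon <- c(4.35, 4.90, 5.12)  # hypothetical site longitudes (degrees East)
zone <- utm_zone_from_lon(lon)
zone
```

Because a single zone is chosen from the mean longitude, sites far from that zone's central meridian will carry more projection distortion, which is worth keeping in mind for very wide study areas.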
Usage
make_data(
  deployments,
  observations,
  failures = NULL,
  deploymentID = deploymentID,
  deploymentStart = deploymentStart,
  deploymentEnd = deploymentEnd,
  latitude = latitude,
  longitude = longitude,
  season = season,
  eventStart = eventStart,
  scientificName = scientificName,
  count = count,
  failureStart = failureStart,
  failureEnd = failureEnd,
  survey_length = 1L,
  thin_minutes = 30,
  day_start = c("midday", "midnight"),
  occupancy_site_predictors = NULL,
  detection_site_predictors = NULL,
  survey_predictors = NULL,
  date = date,
  summary_functions = NULL,
  scale_predictors = TRUE,
  verbose = TRUE
)
Arguments
- deployments
A dataframe of deployment information, one row per site (and potentially season). Must contain columns deploymentID, deploymentStart, and deploymentEnd (or equivalents specified via the corresponding arguments). Optionally, latitude and longitude columns enable the spatial Gaussian process. If multiple seasons, must also contain column season.
- observations
A dataframe of observation records. Must contain columns deploymentID, eventStart, scientificName, and count (or equivalents specified via the corresponding arguments). If multiple seasons, must also contain column season.
- failures
Optional dataframe of ARU failure periods. Must contain columns deploymentID, failureStart, and failureEnd, with each row corresponding to one failure period at a deploymentID from failureStart to failureEnd (inclusive). See find_failures().
- deploymentID
<data-masking> Column name for sites (ARUs). Retains levels if supplied as a factor. Default: deploymentID.
- deploymentStart
<data-masking> Date. Column name for deployment start dates in deployments. Default: deploymentStart.
- deploymentEnd
<data-masking> Date. Column name for deployment end dates in deployments. Default: deploymentEnd.
- latitude
<data-masking> numeric. Column name for WGS84 latitude in deployments. If omitted alongside longitude, no spatial Gaussian process is fitted. Default: latitude.
- longitude
<data-masking> numeric. Column name for WGS84 longitude in deployments. Default: longitude.
- season
<data-masking> Optional column specifying season. The column must be a factor to ensure correct ordering. If the column is not present in deployments, all observations are treated as a single season. Default: season.
- eventStart
<data-masking> POSIXt. Column name for observation timestamps in observations. Default: eventStart.
- scientificName
<data-masking> Column name for species names in observations. Retains levels if supplied as a factor. Default: scientificName.
- count
<data-masking> integerish. Column name for number of individuals per observation record. Default: count.
- failureStart
<data-masking> Date. Column name for failure start dates in failures. Default: failureStart.
- failureEnd
<data-masking> Date. Column name for failure end dates (inclusive) in failures. Default: failureEnd.
- survey_length
Positive integer. Defines the length of each survey period in days. Observations are aggregated within each survey period by summing count, and recording effort (Delta) is computed as the fraction of the survey length the ARU was active. For example, survey_length = 7 aggregates to weekly survey periods, with Delta ranging from 0 (ARU failed all week) to 1 (ARU active all week). Longer periods reduce the number of surveys J but increase the counts per survey, trading off temporal resolution against model complexity and the closure assumption within a survey period. Default: 1L.
- thin_minutes
Non-negative numeric. If supplied, observations within thin_minutes minutes of each other (per site and species) are thinned to a single observation, retaining the record with the highest count. Thinning is performed via thin_observations(). Default: 30.
- day_start
Whether survey days start at "midnight" or "midday". Default: "midday".
- occupancy_site_predictors
Optional dataframe of site-level covariates for the occupancy submodel. Must contain a deploymentID column with the same entries as deployments. Predictor columns must be numeric (continuous), factor (unordered categorical), or ordered factor (ordinal). If multiple seasons, each deploymentID requires a value for each season it was deployed.
- detection_site_predictors
Optional dataframe of site-level covariates for the detection submodel. Same column-type rules as occupancy_site_predictors. If identical to occupancy_site_predictors, the same matrices are reused.
- survey_predictors
Optional dataframe of site-by-survey level covariates, with one row per site and date. Must contain deploymentID and date columns. Predictor columns follow the same type rules as the site-level predictor dataframes. Must cover the full deployment period for each deploymentID.
- date
<data-masking> Column name for dates in survey_predictors. Default: date.
- summary_functions
An optional named list mapping continuous survey predictor column names to summary functions, used when aggregating survey predictors over survey_length-length periods. Each value can be a function name as a string (e.g. "sum") or a function object (e.g. sum). Numeric predictors not named in summary_functions are summarised with mean; categorical and ordinal predictors are summarised with the modal value. Default: NULL.
- scale_predictors
Logical. If TRUE, continuous predictors are scaled to zero mean and unit variance. Survey predictors are scaled using parameters derived from site-averaged values per survey period (a [P, J] matrix) rather than the raw [I, P, J] array, so that spatial variation across sites does not inflate the scaling. Scaling parameters (means and SDs) are stored as an attribute. Default: TRUE.
- verbose
Logical. If TRUE (default), prints data.
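The aggregation controlled by survey_length can be illustrated for a single deployment in base R. This is a sketch under stated assumptions, not the package's implementation: the dates, the daily counts, and all variable names are hypothetical, the deployment is assumed to span a whole number of survey periods, and how make_data() treats counts recorded on failed days or a partial final period is not specified here.

```r
# Hypothetical single deployment; all names and values are illustrative.
survey_length    <- 7L
deployment_start <- as.Date("2024-05-01")
deployment_end   <- as.Date("2024-05-21")  # 21 days, i.e. 3 weekly periods
failure_start    <- as.Date("2024-05-10")
failure_end      <- as.Date("2024-05-12")  # inclusive: 3 failed days

days <- seq(deployment_start, deployment_end, by = "day")

# Survey period index: days 1-7 -> 1, days 8-14 -> 2, and so on.
survey <- (as.integer(days - deployment_start) %/% survey_length) + 1L

# A day is active unless it falls inside the (inclusive) failure period.
active <- !(days >= failure_start & days <= failure_end)

# Recording effort: fraction of each survey period the ARU was active.
Delta <- tapply(active, survey, sum) / survey_length

# Daily counts are summed within each survey period.
daily_count <- rep(c(0L, 2L, 1L), length.out = length(days))
y <- tapply(daily_count, survey, sum)

Delta  # 1, 4/7, 1: the failure falls entirely in the second week
y      # 6, 8, 7
```

With survey_length = 1L every day is its own survey period, so Delta under this sketch would be either 0 or 1.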
Value
A named list of class "occARU_data" containing all inputs
required by the occARU Stan model, except for model specification
arguments which are added by fit_model(). The list contains:
- I
Number of sites (ARUs).
- J
Number of survey periods.
- K
Number of seasons (if multiseason).
- S
Number of species.
- Delta
[I, J(, K)] array of recording effort (0-1).
- y
[I, J(, K), S] array of detection counts.
- XY
[I, 2] matrix of UTM coordinates in km, or zeros if coordinates not supplied.
- P
Integer vector of length 3: number of continuous predictors for occupancy, site-level detection, and site-by-survey detection.
- P_cat
Integer vector of length 3: number of categorical predictors for each component.
- P_ord
Integer vector of length 3: number of ordinal predictors for each component.
- X1
[P[1](, K), I] occupancy continuous design array.
- X_cat1
[P_cat[1](, K), I] occupancy categorical integer array.
- X_ord1
[P_ord[1](, K), I] occupancy ordinal integer array.
- X2
[P[2](, K), I] site-level detection continuous design array.
- X_cat2
[P_cat[2](, K), I] site-level detection categorical integer array.
- X_ord2
[P_ord[2](, K), I] site-level detection ordinal integer array.
- X3
[I(, K), P[3], J] site-by-survey level detection continuous array.
- X_cat3
[I(, K), P_cat[3], J] site-by-survey categorical integer array.
- X_ord3
[I(, K), P_ord[3], J] site-by-survey ordinal integer array.
The object also carries the following attributes, accessible via
attr():
- sites
Character vector of site identifiers.
- surveys
tibble of start dates and indices for each survey period per season.
- seasons
Character vector of season identifiers.
- species
Character vector of species names.
- utm_crs
Character. PROJ string of the UTM coordinate reference system used to transform site coordinates, or NULL if no coordinates were supplied.
- scaling
tibble of means and standard deviations used to standardise continuous predictors, or NULL if scale_predictors = FALSE.
- levels
Named list of category levels for categorical and ordinal predictors.
- survey_length
- thin_minutes
- reference_dates
- day_start
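The means and standard deviations stored in the scaling attribute for survey predictors come from site-averaged values, as described under scale_predictors. The base R sketch below shows one way such scaling could work for an [I, P, J] array. It is an assumption-laden illustration, not the package's code: the data are simulated, the variable names are hypothetical, and pooling the [P, J] matrix across survey periods into one mean and SD per predictor is an assumption.

```r
set.seed(1)
I <- 4; P <- 2; J <- 3  # sites, continuous survey predictors, survey periods

# Simulated raw site-by-survey predictor array, [I, P, J].
X3_raw <- array(rnorm(I * P * J, mean = 10, sd = 3), dim = c(I, P, J))

# Average over sites (dimension 1) to obtain a [P, J] matrix.
site_avg <- apply(X3_raw, c(2, 3), mean)

# Assumed pooling: one mean and one SD per predictor across survey periods.
centre <- rowMeans(site_avg)
spread <- apply(site_avg, 1, sd)

# Apply the same parameters to every site and survey period.
X3 <- sweep(sweep(X3_raw, 2, centre, "-"), 2, spread, "/")

# The site-averaged scaled values now have mean 0 and SD 1 per predictor.
check <- apply(X3, c(2, 3), mean)
rowMeans(check)
apply(check, 1, sd)
```

Under this scheme the scaled array itself generally has a standard deviation above 1 per predictor, because between-site variation is deliberately left out of the scaling parameters.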
See also
fit_model(), thin_observations(), find_failures()
The model is described in detail in vignette("model", package = "occARU").