Prepare data for the occARU model
Transforms raw deployment and observation data into a named list suitable
for passing to occARU(). Column names follow the
camtrapDP data format by default.
Site coordinates are automatically projected from WGS84 latitude/longitude
to UTM (km), with the zone auto-detected from the mean longitude.
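The zone detection described above can be sketched in base R using the standard 6-degree UTM zone formula. This is a minimal illustration, not the code used by make_data(): utm_zone() and utm_proj_string() are hypothetical helpers, and selecting a southern-hemisphere zone (+south) when the mean latitude is negative is an assumption.

```r
# Hypothetical helper: standard 6-degree UTM zone for the mean longitude.
# Zones are numbered 1-60 eastwards starting at -180 degrees.
utm_zone <- function(longitude) {
  lon <- mean(longitude, na.rm = TRUE)
  as.integer(floor((lon + 180) / 6) %% 60 + 1)
}

# Hypothetical helper: PROJ string for that zone, with kilometre units.
# Assumption: a negative mean latitude selects the southern-hemisphere zone.
utm_proj_string <- function(latitude, longitude) {
  zone <- utm_zone(longitude)
  south <- if (mean(latitude, na.rm = TRUE) < 0) " +south" else ""
  sprintf("+proj=utm +zone=%d%s +datum=WGS84 +units=km", zone, south)
}

lat <- c(51.2, 51.5, 51.4)
lon <- c(-1.8, -1.2, -1.5)
utm_zone(lon)               # mean longitude -1.5 falls in zone 30
utm_proj_string(lat, lon)   # "+proj=utm +zone=30 +datum=WGS84 +units=km"
```

Using a single zone from the mean longitude keeps all sites in one planar coordinate system, which suits a spatial Gaussian process over distances in km; the actual transformation would be done by a projection library such as sf.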
Usage
make_data(
deployments,
observations,
failures = NULL,
locationID = locationID,
deploymentStart = deploymentStart,
deploymentEnd = deploymentEnd,
latitude = latitude,
longitude = longitude,
region = region,
season = season,
eventStart = eventStart,
scientificName = scientificName,
count = count,
failureStart = failureStart,
failureEnd = failureEnd,
survey_length = 1L,
thin_minutes = 30,
day_start = c("midday", "midnight"),
occupancy_site_predictors = NULL,
detection_site_predictors = NULL,
survey_predictors = NULL,
date = date,
summary_functions = NULL,
scale_predictors = TRUE,
verbose = TRUE
)
Arguments
- deployments
A dataframe of deployment information, one row per site (and per season, if there are multiple seasons). Must contain columns locationID, deploymentStart, and deploymentEnd (or equivalents specified via the corresponding arguments). Optionally, latitude and longitude columns enable the spatial Gaussian process. If multiple seasons, must also contain column season.
- observations
A dataframe of observation records. Must contain columns locationID, eventStart, scientificName, and count (or equivalents specified via the corresponding arguments). If multiple seasons, must also contain column season.
- failures
Optional dataframe of ARU failure periods. Must contain columns locationID, failureStart, and failureEnd, with each row corresponding to one failure period at a locationID from failureStart to failureEnd (inclusive). See find_failures().
- locationID
<data-masking> Column name for sites (ARUs). Retains levels if supplied as a factor. Default: locationID.
- deploymentStart
<data-masking> Date. Column name for deployment start dates in deployments. Default: deploymentStart.
- deploymentEnd
<data-masking> Date. Column name for deployment end dates in deployments. Default: deploymentEnd.
- latitude
<data-masking> numeric. Column name for WGS84 latitude in deployments. If both latitude and longitude are omitted, no spatial Gaussian process is fitted. Default: latitude.
- longitude
<data-masking> numeric. Column name for WGS84 longitude in deployments. Default: longitude.
- region
<data-masking> Optional column in deployments specifying region, defined as a cluster of ARUs. Leads to faster model fits when spatial site effects are included in occARU(). If the column is not present in deployments, all observations are treated as a single region. Default: region.
- season
<data-masking> Optional column in deployments specifying season. The column must be a factor to ensure correct ordering. If the column is not present, a single season is assumed. Default: season. See find_seasons().
- eventStart
<data-masking> POSIXt. Column name for observation timestamps in observations. Default: eventStart.
- scientificName
<data-masking> Column name for species names in observations. Retains levels if supplied as a factor. Default: scientificName.
- count
<data-masking> integerish. Column name for the number of individuals per observation record. Default: count.
- failureStart
<data-masking> Date. Column name for failure start dates in failures. Default: failureStart.
- failureEnd
<data-masking> Date. Column name for failure end dates (inclusive) in failures. Default: failureEnd.
- survey_length
Positive integer. Defines the length of each survey period in days. Observations are aggregated within each survey period by summing count, and recording effort (Delta) is computed as the fraction of the survey length the ARU was active. For example, survey_length = 7 aggregates to weekly survey periods, with Delta ranging from 0 (ARU failed all week) to 1 (ARU active all week). Longer periods reduce the number of surveys J but increase the counts per survey: they lower temporal resolution and model complexity, and rely more heavily on the closure assumption within each survey period. Default: 1L.
- thin_minutes
Non-negative numeric. If supplied, observations within thin_minutes minutes of each other (per site and species) are thinned to a single observation, retaining the record with the highest count. Thinning is performed via thin_observations(). Default: 30.
- day_start
Whether survey days start at "midnight" or "midday". Default: "midday".
- occupancy_site_predictors
Optional dataframe of site-level covariates for the occupancy submodel. Must contain a locationID column with the same entries as deployments. Predictor columns must be numeric (continuous), factor (unordered categorical), or ordered factor (ordinal). If multiple seasons, each locationID requires a value for each season in which it was deployed.
- detection_site_predictors
Optional dataframe of site-level covariates for the detection submodel. Same column-type rules as occupancy_site_predictors. If identical to occupancy_site_predictors, the same matrices are reused.
- survey_predictors
Optional dataframe of site-by-survey level covariates, with one row per site and date. Must contain locationID and date columns. Predictor columns follow the same type rules as the site-level predictor dataframes. Must cover the full deployment period for each locationID.
- date
<data-masking> Column name for dates in survey_predictors. Default: date.
- summary_functions
An optional named list mapping continuous survey predictor column names to summary functions, used when aggregating survey predictors over periods of length survey_length. Each value can be a function name as a string (e.g. "sum") or a function object (e.g. sum). Numeric predictors not named in summary_functions are summarised with mean; categorical and ordinal predictors are summarised with the modal value. Default: NULL.
- scale_predictors
Logical. If TRUE, continuous predictors are scaled to zero mean and unit variance. Survey predictors are scaled using parameters derived from site-averaged values per survey period (a [P, J] matrix) rather than the raw [I, P, J] array, so that spatial variation across sites does not inflate the standard deviation used for scaling. Scaling parameters (means and SDs) are stored in the scaling attribute. Default: TRUE.
- verbose
Logical. If TRUE (default), prints data.
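To make the interaction of day_start, survey_length and Delta concrete, the following base-R sketch aggregates toy data for one site and species. It is not the package's implementation. survey_day() and survey_period() are hypothetical helpers, and the sketch assumes that (a) a "midday" start assigns each timestamp to the survey day beginning at the preceding noon, (b) survey periods are counted from the deployment start, and (c) failure days are excluded whole.

```r
# Hypothetical helper: survey day of each timestamp.
# Assumption: with a midday start a survey day runs noon to noon, so shifting
# timestamps back 12 hours lets as.Date() return the correct day.
survey_day <- function(time, day_start = c("midday", "midnight")) {
  day_start <- match.arg(day_start)
  shift <- if (day_start == "midday") 12 * 3600 else 0
  as.Date(time - shift, tz = "UTC")
}

# Hypothetical helper: survey period index j, counted from the first day
survey_period <- function(day, first_day, survey_length) {
  as.integer(day - first_day) %/% survey_length + 1L
}

deployment_start <- as.Date("2024-05-01")
deployment_end   <- as.Date("2024-05-14")
failed_days      <- seq(as.Date("2024-05-03"), as.Date("2024-05-05"), by = "day")

obs <- data.frame(
  eventStart = as.POSIXct(c("2024-05-01 23:10:00", "2024-05-02 01:30:00",
                            "2024-05-09 14:00:00"), tz = "UTC"),
  count = c(1L, 2L, 3L)
)

survey_length <- 7L
days   <- seq(deployment_start, deployment_end, by = "day")
active <- !(days %in% failed_days)
j_day  <- survey_period(days, deployment_start, survey_length)

# Recording effort: fraction of the survey length the ARU was active
Delta <- tapply(active, j_day, sum) / survey_length

# Detection counts: sum of count within each survey period (0 if none)
j_obs <- survey_period(survey_day(obs$eventStart, "midday"),
                       deployment_start, survey_length)
y <- tapply(obs$count, factor(j_obs, levels = sort(unique(j_day))),
            sum, default = 0L)
```

Here the two night-time records either side of midnight fall on the same midday-to-midday survey day, Delta is 4/7 for the first week (three failed days) and 1 for the second, and both weeks have a summed count of 3.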
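The thin_minutes rule can also be illustrated for a single site and species. This is a hypothetical sketch, not thin_observations() itself: thin_sketch() is an illustrative helper, and it assumes one reading of "within thin_minutes of each other", in which records are chained into a cluster while each gap to the previous record is at most thin_minutes, and the highest-count record of each cluster is kept.

```r
# Hypothetical sketch: thin records for ONE site and species.
# Assumption: consecutive records no more than thin_minutes apart are
# chained into one cluster; the record with the highest count is kept.
thin_sketch <- function(obs, thin_minutes = 30) {
  obs <- obs[order(obs$eventStart), , drop = FALSE]
  gap <- c(Inf, diff(as.numeric(obs$eventStart)) / 60)  # gaps in minutes
  cluster <- cumsum(gap > thin_minutes)                 # new cluster after a long gap
  keep <- vapply(split(seq_len(nrow(obs)), cluster),
                 function(i) i[which.max(obs$count[i])], integer(1))
  obs[keep, , drop = FALSE]
}

obs <- data.frame(
  eventStart = as.POSIXct(c("2024-05-01 05:00:00", "2024-05-01 05:20:00",
                            "2024-05-01 05:40:00", "2024-05-01 07:00:00"),
                          tz = "UTC"),
  count = c(1L, 4L, 2L, 1L)
)
thinned <- thin_sketch(obs, thin_minutes = 30)
thinned$count   # 4 1
```

Under this chaining reading, the 05:00 and 05:40 records end up in the same cluster even though they are 40 minutes apart; a fixed-window rule would keep them separate, so results depend on which definition thin_observations() actually uses.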
Value
A named list of class "occARU_data" containing all inputs
required by the occARU Stan model, except for model specification
arguments, which are added by occARU(). The list contains:
- I: Number of sites (ARUs).
- R: Number of regions (groups of sites).
- J: Number of survey periods (maximum).
- K: Number of seasons.
- S: Number of species.
- tau: Interval length in years between the end of the previous deployment and the start of the current deployment (multi-season data only).
- dyn: Indicator for dynamic occupancy, set when at least one site was deployed over multiple seasons.
- Delta: [K, J, I] array of recording effort (0-1).
- y: [K, I, J, S] array of detection counts.
- XY: [I, 2] matrix of UTM coordinates in km, or zeros if coordinates are not supplied.
- P: Integer vector of length 3: number of continuous predictors for occupancy, site-level detection, and site-by-survey detection.
- P_cat: Integer vector of length 3: number of categorical predictors for each component.
- P_ord: Integer vector of length 3: number of ordinal predictors for each component.
- X1: [K, I, P[1]] occupancy continuous design array.
- X_cat1: [K, I, P_cat[1]] occupancy categorical integer array.
- X_ord1: [K, I, P_ord[1]] occupancy ordinal integer array.
- X2: [K, I, P[2]] site-level detection continuous design array.
- X_cat2: [K, I, P_cat[2]] site-level detection categorical integer array.
- X_ord2: [K, I, P_ord[2]] site-level detection ordinal integer array.
- X3: [K, I, J, P[3]] site-by-survey detection continuous design array.
- X_cat3: [K, I, J, P_cat[3]] site-by-survey detection categorical integer array.
- X_ord3: [K, I, J, P_ord[3]] site-by-survey detection ordinal integer array.
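Because Delta is indexed [K, J, I] while y is indexed [K, I, J, S], code that combines them has to keep track of which margin is the site. The sketch below builds toy arrays with those dimension orders (the values are made up, not output of make_data(); with a real object, assumed here to be called dat, they would be dat$Delta and dat$y) and computes two simple summaries.

```r
# Toy dimensions: K seasons, I sites, J surveys, S species
K <- 1L; I <- 3L; J <- 4L; S <- 2L

# Recording effort is indexed [K, J, I]
Delta <- array(1, dim = c(K, J, I))
Delta[1, 3:4, 2] <- 0            # site 2 inactive in surveys 3 and 4

# Detection counts are indexed [K, I, J, S]
y <- array(0L, dim = c(K, I, J, S))
y[1, 1, 2, 1] <- 3L              # species 1 at site 1, survey 2
y[1, 3, 1, 2] <- 1L              # species 2 at site 3, survey 1

# Surveys with any effort per season and site: the site index of Delta is 3rd
active_surveys <- apply(Delta > 0, c(1, 3), sum)   # [K, I]

# Naive occupancy: detected in at least one survey, per season, site, species
naive_occ <- apply(y > 0, c(1, 2, 4), any)         # [K, I, S]

active_surveys[1, ]   # 4 2 4
naive_occ[1, , 1]     # TRUE FALSE FALSE
```

Naive occupancy ignores imperfect detection and is shown only to illustrate indexing; estimating occupancy is the job of occARU().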
The object also carries the following attributes, accessible via attr():
- deployments: The processed deployments.
- observations: The processed and thinned observations, aggregated to the chosen survey length.
- sites: Character vector of site identifiers.
- regions: Character vector of region identifiers.
- surveys: tibble of start dates and indices for each survey period per season.
- seasons: Character vector of season identifiers.
- species: Character vector of species names.
- utm_crs: Character. PROJ string of the UTM coordinate reference system used to transform site coordinates, or NULL if no coordinates were supplied.
- scaling: tibble of means and standard deviations used to standardise continuous predictors, or NULL if scale_predictors = FALSE.
- levels: Named list of category levels for categorical and ordinal predictors.
- reference_dates: First deploymentStart per season.
- survey_length, thin_minutes, day_start: The values passed to the corresponding arguments.
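The scaling rule for survey predictors (see scale_predictors) can be sketched for one continuous predictor held as an [I, J] matrix. This is a hypothetical base-R illustration, not the package's code: the mean and SD are computed from the J site-averaged values and then applied to every site. The centre/spread pair stands in for the values stored in the scaling attribute; that tibble's column names are not assumed here.

```r
# Toy survey predictor as an [I, J] matrix: rows are sites, columns surveys.
# Sites differ by a constant offset; surveys share a common trend.
x <- rbind(c(10, 12, 14, 16),
           c(20, 22, 24, 26),
           c(30, 32, 34, 36))

# Site-averaged value per survey period (length J)
site_avg <- colMeans(x)

# Scaling parameters come from the J site averages, not all I * J values,
# so differences between sites do not inflate the SD.
centre <- mean(site_avg)
spread <- sd(site_avg)
x_scaled <- (x - centre) / spread

# For comparison: SD of the raw values, which includes between-site spread
sd_raw <- sd(as.vector(x))

# Back-transform to the original units with the stored parameters
x_back <- x_scaled * spread + centre
```

With this choice the site-averaged series has mean 0 and SD 1, while individual sites keep their offsets (here about plus or minus 3.9 SD units), so coefficients describe change along the shared temporal signal rather than being shrunk by spatial spread.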
See also
occARU(), plot_deployments(), plot_observations(),
thin_observations(), find_failures(), find_seasons()
The model is described in detail in vignette("model").