data-sources.qmd

---
title: "Data Sources"
---

## Assembling Source Data

Where and how you access the source data for your telemetry study will depend
on the type of tag and vendor. Argos location data is available to users from the 
Argos website and many third party sites and repositories (e.g. movebank, 
sea-turtle.org) have API connections to ArgosWeb that can facilitate data access.
Each tag manufacturer often has additional data streams included within the
Argos transmission that require specific software or processing to translate.
Both the Sea Mammal Research Unit (SMRU) and Wildlife Computers provide online
data portals that provide users access to the location and additional sensor
data.

Regardless of how the data are retrieved, these files should be treated as read
only and not edited. ArgosWeb and vendors have, typically, kept the data formats
and structure consistent over time. This affords end users the ability to develop
custom processing scripts without much fear the formats will change and break
their scripts.

## Data Management

The quantity and diversity of data generated from bio-logging studies can
quickly become overwhelming especially in situations where a single device
is deployed for several months and collects data from multiple on-board
sensors. For some research projects, storing these data in a central
relational database (e.g. PostgreSQL) can have important benefits. If
storing bio-logger data in a relational database, explore one that provides
native support or extensions for spatial data types and operations (e.g. PostGIS,
SpatiaLite). For many, though, simply storing data as and organized collection
of plain text files (e.g. *.csv) is sufficient. This approach can be further
improved by converting the comma-separated files to `parquet` files and leaning
on the the Apache `arrow` package for R. These `parquet` files provide 
modern and efficient file compression, a columnar-based structure, and are
optimized for big data. Even if a project starts as small data, it's often
useful to design your data management with growth in mind.

## Wildlife Computers Data Sources

We have the most first-hand experience with data originating from Wildlife
Computers devices and data that have been processed through the Wildlife
Computers Data Portal. This doesn't mean the techniques, packages, and
analysis presented require Wildlife Computers devices. Or, that most of what
is presented is not applicable to other bio-logging data sources. When
possible, we've tried to create examples and processes that are agnostic to
the type of deployed device.

:::{.callout-tip appearance="minimal"}
## The `wcUtils` Package

Frequent users of the Wildlife Computers Data Portal or data from their
bio-loggers processed through the DAP program may find the `wcutils` package
for R useful. This package is maintained by Josh London and provides several
utility function for downloading data from the WCDP and for tidy processing
of typical data files.
:::