Commit a6fe7ee6 authored by Timm Schoening
parent 0d2825ec
# Standard operating procedures
# Introduction
Publishing marine image data in a FAIR and open way requires data curation for both image data and image metadata. These quality control steps need to follow common standard operating procedures (SOPs) to facilitate joint data interpretation. This repository mainly collects SOPs for the QA/QC steps between acquisition and publication. It also contains some example SOPs on image acquisition for the interested user. It does not yet provide operational steps for the publication phase (e.g. physical file transfer to Pangaea).
# Fundamentals
## Data Structure:
How data is structured on disk should not be enforced by any SOP. Nevertheless, we recommend the following structure to aid the automation of data curation and publication workflows. It is based on several best practices:

    ├── <event_1>
    │   ├── <sensor_x>
    │   │   ├── external/        (Optional) External data that affects the creation of raw data (e.g. calibration curves)
    │   │   ├── raw/             The raw data as recorded by the sensor (e.g. acoustic soundings)
    │   │   ├── intermediate/    (Optional) Intermediate data that will not be archived. Playground or sandbox for working with the raw data
    │   │   ├── processed/       Processed data that has been QA/QC'd and is ready for publication (e.g. map grids)
    │   │   ├── data_products/   (Optional) Data products created from the processed data for visualization or as combinations of data of several events (e.g. geological maps)
    │   │   └── protocol/        Documentation on how the data was created, curated, processed, visualized, etc.
    │   ├── <sensor_y>           Same as above for the next sensor deployed during this event
    │   └── protocol/            General information on this event (e.g. ROV deployment plan)
    └── <event_2>                The same as above for the next event
        ├── <sensor_x>
        └── <sensor_z>

On German research vessels, the "scientists folder" on the network or the new "Mass-Data-Module" (MDM, installed in 2021) will mostly act as the root folder `/<volume>/<project>`. For researchers who bring their own mass storage or NAS devices, the root may instead be a path on their own hardware. Some disciplines or groups prefer to split their data by sensor first. This is not recommended but certainly possible. In that case, the paths would look like this:

    ├── <sensor_x>
    │   ├── <event_1>
    │   └── <event_2>
    └── <sensor_y>
        ├── <event_1>
        └── <event_3>

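Automation scripts can support both orderings by building paths in one place. The sketch below is illustrative only: `data_path` and its `sensor_first` flag are hypothetical names, and it assumes that the subfolders (`raw/`, `processed/`, etc.) sit at the lowest level in either layout, which the text above does not state explicitly.

```python
from pathlib import Path


def data_path(root, event, sensor, subdir, sensor_first=False):
    """Return the folder for one event/sensor/subdir combination.

    sensor_first=False gives the recommended <event>/<sensor> ordering,
    sensor_first=True gives the alternative <sensor>/<event> ordering.
    """
    root = Path(root)
    if sensor_first:
        return root / sensor / event / subdir
    return root / event / sensor / subdir
```

Keeping the ordering behind a single flag means the rest of a curation workflow does not need to know which layout a group has chosen.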
## Provenance documentation:
Provenance documentation of (automated) SOP steps is required to enable reusability of data and validity checks. Provenance information needs to document the entities, agents and activities involved. It should facilitate reproducibility, but its main purpose is to document execution steps rather than to enable fully automated re-execution, which would require automated setup of the software environment (through Docker etc.). Provenance of individual SOP steps should be recorded in a machine-readable fashion (i.e. a **yaml** or json file) like so:

    path: </path/to/executable>
    hash: <md5 hash of executable binary>
    time: <utc time of execution, milliseconds since epoch>
    version: <version string of executable>
    - name: <param-x_name>
      value: <param-x_value>
      [hash: md5 hash of file at <param-x_value> (optional, only for files)]
    - name: <param-y_name>
      value: <param-y_value>

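Such a record can be assembled with the Python standard library. The following is a minimal sketch under stated assumptions, not the reference implementation of these SOPs: the function names are hypothetical, the output is written as JSON (the text allows YAML or JSON), and the key `parameters` that groups the parameter list is an assumption, since the template above does not name it.

```python
import hashlib
import json
import time
from pathlib import Path


def md5_of_file(path):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_provenance(executable, version, params):
    """Assemble a provenance record for one SOP step.

    params maps parameter names to values. A value that points to an
    existing file additionally gets the MD5 hash of that file.
    """
    entries = []
    for name, value in params.items():
        entry = {"name": name, "value": str(value)}
        if Path(str(value)).is_file():
            entry["hash"] = md5_of_file(value)
        entries.append(entry)
    return {
        "path": str(executable),
        "hash": md5_of_file(executable),
        "time": int(time.time() * 1000),  # UTC, milliseconds since epoch
        "version": version,
        "parameters": entries,  # key name is an assumption, see lead-in
    }


def write_provenance(record, out_path):
    """Write the record as a JSON file."""
    with open(out_path, "w", encoding="utf-8") as fh:
        json.dump(record, fh, indent=2)
```

A natural place for the resulting file would be the `protocol/` folder of the sensor that the step operated on, although the SOPs do not prescribe this.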
# Standard operating procedures (SOPs)
## QA/QC steps between acquisition and publication
- Creating navigation data per image item (in prep.)
- Creating image FAIR Digital Objects (iFDOs) for a deployment (in prep.)
- Creating proxy FAIR Digital Objects (pFDOs) for a deployment (in prep.)
- Creating semantic FAIR Digital Objects (sFDOs) for a deployment (in prep.)
## Image acquisition
- At sea using ROVs (in prep.)
- At sea using AUVs (in prep.)
- At sea using OFOSs (in prep.)
## Other
- Publishing image data in Pangaea (in prep.)