Commit 4fd3464b authored by Timm Schoening
## FAIRness
The buzzterm [FAIR](https://www.go-fair.org/fair-principles/) is an acronym for Findable, Accessible, Interoperable and Reusable. Which you probably knew already. Sorry. But in case you didn't: it's the big idea currently driving research data management (RDM) and a massive challenge for RDM infrastructure providers, maintainers and users. But it is also a fantastic opportunity to open up research by making research efforts more prominent and results more reliable. It appears everywhere nowadays, most importantly in project calls and thus project proposals. So while there is no way around it, embracing this new data culture will also change the face of research for the better!
## FAIR marine images
Data management for images is hard. Much harder than for most other data domains in research. The data volume is massive (> 1 TB per deployment) and the data is inherently unstructured and thus not directly comprehensible for humans or machines. So of course, making marine imagery FAIR is also a major challenge, but one that is particularly worthwhile. It will i) simplify working and researching with image data by providing well-structured information and formats, ii) simplify software development by providing community-driven interfaces, and iii) allow researchers to gain credit for imaging efforts by making image data publicly visible - for collaboration, for inspiration and to advertise one's research.
Curiously, FAIR does not necessarily mean open. So, in theory, your data can be FAIR yet still be very much locked up. This would go against the principles and vision outlined above, so in the context of the MareHub AG Videos/Images, FAIR also always means open. Period.
(Well, 99% period: we accept that some research requires obfuscating parts of the data or metadata, e.g. for personal rights, safety concerns and maybe other reasons. If you are in doubt, you probably belong to the 99%. Although these exceptions exist, you will certainly know whether your research falls within the 1%.)
## FAIRness of marine images
What FAIR means and how to become FAIR is an active field of research. There are vast amounts of information available: e.g. by the [EOSC](https://eosc-portal.eu/) (European Open Science Cloud) or the [RDA](https://rd-alliance.org/) (Research Data Alliance). Their working groups produce fantastic implementation guidelines for users, managers, governing agencies etc., including ones on _Metrics to assess FAIRness_ (by EOSC: [Recommendations on FAIR metrics](https://doi.org/10.2777/70791), by RDA: [FAIR Data Maturity Model](https://doi.org/10.15497/rda00050)).
The following table applies these metrics to marine images to track our status at achieving our goal: to make marine images FAIR.
| [FAIR principle](https://www.go-fair.org/fair-principles/) | [FAIR Metrics ID](https://doi.org/10.2777/70791) | Indicator | Priority | Marine images |
| - | - | - | - | - |
| F1 | RDA-F1-01M | Metadata is identified by a persistent identifier | Essential | [ ] Handle |
| F1 | RDA-F1-01D | Data is identified by a persistent identifier | Essential | [ ] Handle |
| F1 | RDA-F1-02M | Metadata is identified by a globally unique identifier | Essential | [x] UUID |
| F1 | RDA-F1-02D | Data is identified by a globally unique identifier | Essential | [x] UUID |
| F2 | RDA-F2-01M | Rich metadata is provided to allow discovery | Essential | [x] [iFDO](https://gitlab.hzdr.de/datahub/marehub/ag-videosimages/metadata-profiles-fdos/-/blob/master/MareHub_AGVI_iFDO.md) |
| F3 | RDA-F3-01M | Metadata includes the identifier for the data | Essential | [x] iFDO |
| F4 | RDA-F4-01M | Metadata is offered in such a way that it can be harvested and indexed | Essential | [ ] OSIS-API |
| A1 | RDA-A1-01M | Metadata contains information to enable the user to get access to the data | Important | [x] iFDO |
| A1 | RDA-A1-02M | Metadata can be accessed manually | Essential | [x] [OSIS-www](https://osis.geomar.de/app) |
| A1 | RDA-A1-02D | Data can be accessed manually | Essential | [x] Elements |
| A1 | RDA-A1-03M | Metadata identifier resolves to a metadata record | Essential | [x] iFDO |
| A1 | RDA-A1-03D | Data identifier resolves to a digital object | Essential | [x] iFDO |
| A1 | RDA-A1-04M | Metadata is accessed through standardised protocol | Essential | [x] https |
| A1 | RDA-A1-04D | Data is accessible through standardised protocol | Essential | [x] https |
| A1 | RDA-A1-05D | Data can be accessed automatically | Important | [x] Elements-API |
| A1.1 | RDA-A1.1-01M | Metadata is accessible through a free access protocol | Essential | [x] https |
| A1.1 | RDA-A1.1-01D | Data is accessible through a free access protocol | Important | [x] https |
| A1.2 | RDA-A1.2-01D | Data is accessible through an access protocol that supports authentication and authorisation | Useful | [ ] _No_ |
| A2 | RDA-A2-01M | Metadata is guaranteed to remain available after data is no longer available | Essential | [x] Yes |
| I1 | RDA-I1-01M | Metadata uses knowledge representation expressed in standardised format | Important | [x] [MVP](https://gitlab.hzdr.de/datahub/marehub/ag-videosimages/metadata-profiles-fdos) |
| I1 | RDA-I1-01D | Data uses knowledge representation expressed in standardised format | Important | [x] jpg, tif, png |
| I1 | RDA-I1-02M | Metadata uses machine-understandable knowledge representation | Important | [x] FDO |
| I1 | RDA-I1-02D | Data uses machine-understandable knowledge representation | Important | [x] jpg, tif, png |
| I2 | RDA-I2-01M | Metadata uses FAIR-compliant vocabularies | Important | [ ] MVP Git? |
| I2 | RDA-I2-01D | Data uses FAIR-compliant vocabularies | Useful | [x] MVP Git |
| I3 | RDA-I3-01M | Metadata includes references to other metadata | Important | [x] Orcid, URN |
| I3 | RDA-I3-01D | Data includes references to other data | Useful | [ ] _No_ |
| I3 | RDA-I3-02M | Metadata includes references to other data | Useful | [ ] _No_ |
| I3 | RDA-I3-02D | Data includes qualified references to other data | Useful | [ ] _No_ |
| I3 | RDA-I3-03M | Metadata includes qualified references to other metadata | Important | [ ] **No** |
| I3 | RDA-I3-04M | Metadata includes qualified references to other data | Useful | [ ] _No_ |
| R1 | RDA-R1-01M | Plurality of accurate and relevant attributes are provided to allow reuse | Essential | [x] iFDO |
| R1.1 | RDA-R1.1-01M | Metadata includes information about the license under which the data can be reused | Essential | [x] iFDO |
| R1.1 | RDA-R1.1-02M | Metadata refers to a standard reuse license | Important | [x] iFDO |
| R1.1 | RDA-R1.1-03M | Metadata refers to a machine-understandable reuse license | Important | [x] iFDO |
| R1.2 | RDA-R1.2-01M | Metadata includes provenance information according to community-specific standards | Important | [ ] **No** |
| R1.2 | RDA-R1.2-02M | Metadata includes provenance information according to a cross-community language | Useful | [ ] _No_ |
| R1.3 | RDA-R1.3-01M | Metadata complies with a community standard | Essential | [x] MVP |
| R1.3 | RDA-R1.3-01D | Data complies with a community standard | Essential | [x] MVP |
| R1.3 | RDA-R1.3-03M | Metadata is expressed in compliance with a machine-understandable community standard | Essential | [x] MVP |
| R1.3 | RDA-R1.3-02D | Data is expressed in compliance with a machine-understandable community standard | Important | [ ] **No** |
_(as of April 2021 - while there are still challenges to overcome, we are close to achieving FAIRness - at least conceptually)_
Creative Commons Attribution-NonCommercial 4.0 International Public License
By exercising the Licensed Rights (defined below), You accept and agree to be bound by the terms and conditions of this Creative Commons Attribution-NonCommercial 4.0 International Public License ("Public License"). To the extent this Public License may be interpreted as a contract, You are granted the Licensed Rights in consideration of Your acceptance of these terms and conditions, and the Licensor grants You such rights in consideration of benefits the Licensor receives from making the Licensed Material available under these terms and conditions.
Section 1 – Definitions.
Adapted Material means material subject to Copyright and Similar Rights that is derived from or based upon the Licensed Material and in which the Licensed Material is translated, altered, arranged, transformed, or otherwise modified in a manner requiring permission under the Copyright and Similar Rights held by the Licensor. For purposes of this Public License, where the Licensed Material is a musical work, performance, or sound recording, Adapted Material is always produced where the Licensed Material is synched in timed relation with a moving image.
Adapter's License means the license You apply to Your Copyright and Similar Rights in Your contributions to Adapted Material in accordance with the terms and conditions of this Public License.
Copyright and Similar Rights means copyright and/or similar rights closely related to copyright including, without limitation, performance, broadcast, sound recording, and Sui Generis Database Rights, without regard to how the rights are labeled or categorized. For purposes of this Public License, the rights specified in Section 2(b)(1)-(2) are not Copyright and Similar Rights.
Effective Technological Measures means those measures that, in the absence of proper authority, may not be circumvented under laws fulfilling obligations under Article 11 of the WIPO Copyright Treaty adopted on December 20, 1996, and/or similar international agreements.
Exceptions and Limitations means fair use, fair dealing, and/or any other exception or limitation to Copyright and Similar Rights that applies to Your use of the Licensed Material.
Licensed Material means the artistic or literary work, database, or other material to which the Licensor applied this Public License.
Licensed Rights means the rights granted to You subject to the terms and conditions of this Public License, which are limited to all Copyright and Similar Rights that apply to Your use of the Licensed Material and that the Licensor has authority to license.
Licensor means the individual(s) or entity(ies) granting rights under this Public License.
NonCommercial means not primarily intended for or directed towards commercial advantage or monetary compensation. For purposes of this Public License, the exchange of the Licensed Material for other material subject to Copyright and Similar Rights by digital file-sharing or similar means is NonCommercial provided there is no payment of monetary compensation in connection with the exchange.
Share means to provide material to the public by any means or process that requires permission under the Licensed Rights, such as reproduction, public display, public performance, distribution, dissemination, communication, or importation, and to make material available to the public including in ways that members of the public may access the material from a place and at a time individually chosen by them.
Sui Generis Database Rights means rights other than copyright resulting from Directive 96/9/EC of the European Parliament and of the Council of 11 March 1996 on the legal protection of databases, as amended and/or succeeded, as well as other essentially equivalent rights anywhere in the world.
You means the individual or entity exercising the Licensed Rights under this Public License. Your has a corresponding meaning.
Section 2 – Scope.
License grant.
Subject to the terms and conditions of this Public License, the Licensor hereby grants You a worldwide, royalty-free, non-sublicensable, non-exclusive, irrevocable license to exercise the Licensed Rights in the Licensed Material to:
reproduce and Share the Licensed Material, in whole or in part, for NonCommercial purposes only; and
produce, reproduce, and Share Adapted Material for NonCommercial purposes only.
Exceptions and Limitations. For the avoidance of doubt, where Exceptions and Limitations apply to Your use, this Public License does not apply, and You do not need to comply with its terms and conditions.
Term. The term of this Public License is specified in Section 6(a).
Media and formats; technical modifications allowed. The Licensor authorizes You to exercise the Licensed Rights in all media and formats whether now known or hereafter created, and to make technical modifications necessary to do so. The Licensor waives and/or agrees not to assert any right or authority to forbid You from making technical modifications necessary to exercise the Licensed Rights, including technical modifications necessary to circumvent Effective Technological Measures. For purposes of this Public License, simply making modifications authorized by this Section 2(a)(4) never produces Adapted Material.
Downstream recipients.
Offer from the Licensor – Licensed Material. Every recipient of the Licensed Material automatically receives an offer from the Licensor to exercise the Licensed Rights under the terms and conditions of this Public License.
No downstream restrictions. You may not offer or impose any additional or different terms or conditions on, or apply any Effective Technological Measures to, the Licensed Material if doing so restricts exercise of the Licensed Rights by any recipient of the Licensed Material.
No endorsement. Nothing in this Public License constitutes or may be construed as permission to assert or imply that You are, or that Your use of the Licensed Material is, connected with, or sponsored, endorsed, or granted official status by, the Licensor or others designated to receive attribution as provided in Section 3(a)(1)(A)(i).
Other rights.
Moral rights, such as the right of integrity, are not licensed under this Public License, nor are publicity, privacy, and/or other similar personality rights; however, to the extent possible, the Licensor waives and/or agrees not to assert any such rights held by the Licensor to the limited extent necessary to allow You to exercise the Licensed Rights, but not otherwise.
Patent and trademark rights are not licensed under this Public License.
To the extent possible, the Licensor waives any right to collect royalties from You for the exercise of the Licensed Rights, whether directly or through a collecting society under any voluntary or waivable statutory or compulsory licensing scheme. In all other cases the Licensor expressly reserves any right to collect such royalties, including when the Licensed Material is used other than for NonCommercial purposes.
Section 3 – License Conditions.
Your exercise of the Licensed Rights is expressly made subject to the following conditions.
Attribution.
If You Share the Licensed Material (including in modified form), You must:
retain the following if it is supplied by the Licensor with the Licensed Material:
identification of the creator(s) of the Licensed Material and any others designated to receive attribution, in any reasonable manner requested by the Licensor (including by pseudonym if designated);
a copyright notice;
a notice that refers to this Public License;
a notice that refers to the disclaimer of warranties;
a URI or hyperlink to the Licensed Material to the extent reasonably practicable;
indicate if You modified the Licensed Material and retain an indication of any previous modifications; and
indicate the Licensed Material is licensed under this Public License, and include the text of, or the URI or hyperlink to, this Public License.
You may satisfy the conditions in Section 3(a)(1) in any reasonable manner based on the medium, means, and context in which You Share the Licensed Material. For example, it may be reasonable to satisfy the conditions by providing a URI or hyperlink to a resource that includes the required information.
If requested by the Licensor, You must remove any of the information required by Section 3(a)(1)(A) to the extent reasonably practicable.
If You Share Adapted Material You produce, the Adapter's License You apply must not prevent recipients of the Adapted Material from complying with this Public License.
Section 4 – Sui Generis Database Rights.
Where the Licensed Rights include Sui Generis Database Rights that apply to Your use of the Licensed Material:
for the avoidance of doubt, Section 2(a)(1) grants You the right to extract, reuse, reproduce, and Share all or a substantial portion of the contents of the database for NonCommercial purposes only;
if You include all or a substantial portion of the database contents in a database in which You have Sui Generis Database Rights, then the database in which You have Sui Generis Database Rights (but not its individual contents) is Adapted Material; and
You must comply with the conditions in Section 3(a) if You Share all or a substantial portion of the contents of the database.
For the avoidance of doubt, this Section 4 supplements and does not replace Your obligations under this Public License where the Licensed Rights include other Copyright and Similar Rights.
Section 5 – Disclaimer of Warranties and Limitation of Liability.
Unless otherwise separately undertaken by the Licensor, to the extent possible, the Licensor offers the Licensed Material as-is and as-available, and makes no representations or warranties of any kind concerning the Licensed Material, whether express, implied, statutory, or other. This includes, without limitation, warranties of title, merchantability, fitness for a particular purpose, non-infringement, absence of latent or other defects, accuracy, or the presence or absence of errors, whether or not known or discoverable. Where disclaimers of warranties are not allowed in full or in part, this disclaimer may not apply to You.
To the extent possible, in no event will the Licensor be liable to You on any legal theory (including, without limitation, negligence) or otherwise for any direct, special, indirect, incidental, consequential, punitive, exemplary, or other losses, costs, expenses, or damages arising out of this Public License or use of the Licensed Material, even if the Licensor has been advised of the possibility of such losses, costs, expenses, or damages. Where a limitation of liability is not allowed in full or in part, this limitation may not apply to You.
The disclaimer of warranties and limitation of liability provided above shall be interpreted in a manner that, to the extent possible, most closely approximates an absolute disclaimer and waiver of all liability.
Section 6 – Term and Termination.
This Public License applies for the term of the Copyright and Similar Rights licensed here. However, if You fail to comply with this Public License, then Your rights under this Public License terminate automatically.
Where Your right to use the Licensed Material has terminated under Section 6(a), it reinstates:
automatically as of the date the violation is cured, provided it is cured within 30 days of Your discovery of the violation; or
upon express reinstatement by the Licensor.
For the avoidance of doubt, this Section 6(b) does not affect any right the Licensor may have to seek remedies for Your violations of this Public License.
For the avoidance of doubt, the Licensor may also offer the Licensed Material under separate terms or conditions or stop distributing the Licensed Material at any time; however, doing so will not terminate this Public License.
Sections 1, 5, 6, 7, and 8 survive termination of this Public License.
Section 7 – Other Terms and Conditions.
The Licensor shall not be bound by any additional or different terms or conditions communicated by You unless expressly agreed.
Any arrangements, understandings, or agreements regarding the Licensed Material not stated herein are separate from and independent of the terms and conditions of this Public License.
Section 8 – Interpretation.
For the avoidance of doubt, this Public License does not, and shall not be interpreted to, reduce, limit, restrict, or impose conditions on any use of the Licensed Material that could lawfully be made without permission under this Public License.
To the extent possible, if any provision of this Public License is deemed unenforceable, it shall be automatically reformed to the minimum extent necessary to make it enforceable. If the provision cannot be reformed, it shall be severed from this Public License without affecting the enforceability of the remaining terms and conditions.
No term or condition of this Public License will be waived and no failure to comply consented to unless expressly agreed to by the Licensor.
Nothing in this Public License constitutes or may be interpreted as a limitation upon, or waiver of, any privileges and immunities that apply to the Licensor or You, including from the legal processes of any jurisdiction or authority.
# iFDO capture fields
Information on how image data was captured can be crucial to understand information extracted from the images. It is thus highly recommended to enrich all iFDOs with capture information. Some capture metadata are specified here for the purpose of promoting imagery in the marine data portal and in other contexts. The potential metadata in the iFDO capture fields is expected to grow over time, as additional (marine) imaging domains make use of this concept. Below you find a pool of iFDO capture fields which are highly recommended to be added to your iFDO. Only with these fields populated will your dataset shine in the marine data portal!
## File format
All iFDO capture fields shall be stored alongside the core metadata in your iFDO file! They do not take up a specific section of the file; rather, the values are intermixed into the image-set-header and image-set-items sections!
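A minimal sketch of how this intermixing could look in a *.yaml iFDO file (the field values and the file name are illustrative, not taken from a real dataset):

```yaml
image-set-header:
  image-set-name: SO268-1_021-1_GMR_CAM-23
  # capture fields that are static for the whole set sit in the header
  image-set-acquisition: photo
  image-set-illumination: artificial light
image-set-items:
  SO268-1_021-1_GMR_CAM-23_20190513_131415.jpg:
    # capture fields that vary per image sit with the item, without the "-set" term
    image-area: 5.3
```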
# Recommended iFDO capture fields (essential to be part of the marine data portal!)
| Field | Format / Values / Unit | Comment |
| ----- | ---------------------- | ------- |
| image-set-acquisition | photo, video, scan | photo: still images, video: moving images, scan: microscopy / slide scans |
| image-set-quality | raw, processed, product | raw: straight from the sensor, processed: QA/QC'd, product: image data ready for interpretation |
| image-set-deployment | mapping, stationary, survey, exploration, experiment, sampling | mapping: planned path execution along 2-3 spatial axes, stationary: fixed spatial position, survey: planned path execution along free path, exploration: unplanned path execution, experiment: observation of manipulated environment, sampling: ex-situ imaging of samples taken by other method |
| image-set-navigation | satellite, beacon, transponder, reconstructed | satellite: GPS/Galileo etc., beacon: USBL etc., transponder: LBL etc., reconstructed: position estimated from other measures like cable length and course over ground |
| image-set-scale-reference | 3D camera, calibrated camera, laser marker, optical flow | 3D camera: the imaging system provides scale directly, calibrated camera: image data and additional external data like object distance provide scale together, laser marker: scale information is embedded in the visual data, optical flow: scale is computed from the relative movement of the images and the camera navigation data |
| image-set-illumination | sunlight, artificial light, mixed light | sunlight: the scene is only illuminated by the sun, artificial light: the scene is only illuminated by artificial light, mixed light: both sunlight and artificial light illuminate the scene |
| image-set-resolution | km, hm, dam, m, cm, mm, µm | average size of one pixel of an image |
| image-set-marine-zone | seafloor, water column, sea surface, atmosphere, laboratory | seafloor: images taken in/on/right above the seafloor, water column: images taken in the free water without the seafloor or the sea surface in sight, sea surface: images taken right below the sea surface, atmosphere: images taken outside of the water, laboratory: images taken ex-situ |
| image-set-spectral-resolution | grayscale, rgb, multi-spectral, hyper-spectral | grayscale: single channel imagery, rgb: three channel imagery, multi-spectral: 4-10 channel imagery, hyper-spectral: 10+ channel imagery |
| image-set-capture-mode | timer, manual, mixed | whether the time points of image capture were systematic, human-triggered or both |
| image-area | Float [m^2] | The footprint of the entire image in square meters |
| image-pixel-per-millimeter | Float [px/mm^2 = MPx/m^2] | Resolution of the imagery in pixels per square millimeter, which is identical to megapixels per square meter |
| image-meters-above-ground | Float [m] | Distance of the camera to the seafloor |
| image-acquisition-settings | **yaml**/json, free keys | All the information that is recorded by the camera in the EXIF, IPTC etc. As a dict. Includes ISO, aperture, etc. |
| image-camera-intrinsics | **yaml**/json, free keys | 3x3 K matrix encoding the six intrinsic parameters (f, m_x, m_y, u_0, v_0, gamma): focal length [px], inverse pixel width & height, principal point x/y, skew coefficient. |
| image-camera-extrinsics | **yaml**/json, free keys | 4x4 pose matrix (R,T). See: http://dx.doi.org/10.1201/9781315368597 |
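The unit equivalence in image-pixel-per-millimeter (px/mm^2 = MPx/m^2) follows directly from the pixel count and image-area; a sketch in Python (the function name is illustrative, not part of the spec):

```python
def pixel_per_millimeter(width_px: int, height_px: int, image_area_m2: float) -> float:
    """Pixels per square millimeter, identical to megapixels per square meter."""
    area_mm2 = image_area_m2 * 1e6  # 1 m^2 = 1,000,000 mm^2
    return (width_px * height_px) / area_mm2

# A 12 MPx image (4000 x 3000 px) covering 6 m^2 of seafloor:
print(pixel_per_millimeter(4000, 3000, 6.0))  # 2.0 px/mm^2 = 2.0 MPx/m^2
```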
# Further domain-specific or otherwise valuable iFDO capture fields
| Field | Format / Values / Unit | Comment |
| ----- | ---------------------- | ------- |
| image-set-spatial-constraints | Text | A description / definition of the spatial extent of the study area (inside which the photographs were captured), including boundaries and reasons for constraints (e.g. scientific, practical) |
| image-set-temporal-constraints | Text | A description / definition of the temporal extent, including boundaries and reasons for constraints (e.g. scientific, practical) |
| image-set-target-environment | Text | A description, delineation, and definition of the habitat or environment of study, including boundaries of such |
| image-set-objective | Text | A general translation of the aims and objectives of the study, as they pertain to biology and method scope. This should define the primary and secondary data to be measured and to what precision. |
| image-set-reference-calibration | Text | Calibration data and information on calibration process |
| image-set-time-synchronisation | Text | Synchronisation procedure and determined time offsets between camera recording values and UTC |
| image-set-item-identification-scheme | Text | How the image file names are constructed. Should be like this: `<project>_<event>_<sensor>_<date>_<time>.<ext>` |
| image-set-curation-protocol | Text | A description of the image and metadata curation steps and results |
| ... and many more to come | | Please suggest more! |
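The suggested item identification scheme can be applied programmatically; a minimal sketch (the helper name and the example values are illustrative):

```python
from datetime import datetime, timezone

def make_image_filename(project: str, event: str, sensor: str,
                        dt: datetime, ext: str = "jpg") -> str:
    """Build a file name following <project>_<event>_<sensor>_<date>_<time>.<ext>."""
    return f"{project}_{event}_{sensor}_{dt:%Y%m%d}_{dt:%H%M%S}.{ext}"

dt = datetime(2019, 5, 13, 13, 14, 15, tzinfo=timezone.utc)
print(make_image_filename("SO268-1", "021-1", "GMR_CAM-23", dt))
# SO268-1_021-1_GMR_CAM-23_20190513_131415.jpg
```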
# Example
- [iFDO capture example](https://gitlab.hzdr.de/datahub/marehub/ag-videosimages/metadata-vocabulary-profile/-/blob/master/SO268-1_021-1_GMR_CAM-23_example-pFDO.yaml): SO268-1_021-1_GMR_CAM-23_example-iFDO_capture.yaml
- [iFDO capture vocabulary](https://gitlab.hzdr.de/datahub/marehub/ag-videosimages/metadata-profiles-fdos/-/blob/master/MareHub_AGVI_iFDO_capture-vocabulary.yaml): the set of allowed terms to be used in the set of recommended iFDO capture fields
# iFDO content fields
Image data is inherently unstructured, and obtaining a glimpse of its content is hard for humans as well as machines. The iFDO content fields are a mechanism to encode the content of image data by means of visual, textual or other data proxies (annotations, previews, descriptions, categorisations, etc.). These can take various forms, as described below. Simple examples of visual proxies are thumbnails for images or the average intensity along a video.
# Further domain-specific iFDO content fields
| Field | Format / Values / Unit | Comment |
| ----- | ---------------------- | ------- |
| image-entropy | Float | 1D time series constructed of single entropy values for each image / frame `<image filename 1>: <entropy 1>\n<image filename 2>: <entropy 2\n...>` |
| image-particle-count | Int | 1D time series constructed of single particle/object count values for each image / frame `<image filename 1>: <particle count 1>\n<image filename 2>: <particle count 2\n...>` |
| image-average-color | [Int,Int,Int] | Set of `n` 1D time series constructed of the average colour for each image / frame and the `n` channels of an image (e.g. 3 for RGB) `<image filename 1>:\n\t<channel 0>: <value>\n\t<channel 1>: <value>\n<image filename 2>:\n\t<channel 0>: <value>\n...>` |
| image-mpeg7-colorlayout | [Float,Float,...] | An nD feature vector per image / frame of varying dimensionality according to the chosen descriptor settings. |
| image-mpeg7-colorstatistic | [Float,Float,...] | An nD feature vector per image / frame of varying dimensionality according to the chosen descriptor settings. |
| image-mpeg7-colorstructure | [Float,Float,...] | An nD feature vector per image / frame of varying dimensionality according to the chosen descriptor settings. |
| image-mpeg7-dominantcolor | [Float,Float,...] | An nD feature vector per image / frame of varying dimensionality according to the chosen descriptor settings. |
| image-mpeg7-edgehistogram | [Float,Float,...] | An nD feature vector per image / frame of varying dimensionality according to the chosen descriptor settings. |
| image-mpeg7-homogeneoustexture | [Float,Float,...] | An nD feature vector per image / frame of varying dimensionality according to the chosen descriptor settings. |
| image-mpeg7-scalablecolor | [Float,Float,...] | An nD feature vector per image / frame of varying dimensionality according to the chosen descriptor settings. |
**More fields on annotation will follow soon!**
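The per-image time-series encoding used in the table above (one `<image filename>: <value>` pair per line) could be produced like this (a sketch; the function name and values are illustrative):

```python
def encode_time_series(values: dict) -> str:
    """Encode per-image scalar values as '<image filename>: <value>' lines."""
    return "\n".join(f"{name}: {value}" for name, value in values.items())

entropy = {
    "img_00001.jpg": 7.21,
    "img_00002.jpg": 7.35,
}
print(encode_time_series(entropy))
# img_00001.jpg: 7.21
# img_00002.jpg: 7.35
```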
# Example:
[iFDO content example](https://gitlab.hzdr.de/datahub/marehub/ag-videosimages/metadata-vocabulary-profile/-/blob/master/MareHub_AGVI_example-iFDO_content.md)
# iFDO - image FAIR Digital Object
Marine image data collections need a core set of standardized metadata for FAIR and open publication. An entire image set (e.g. one deployment) requires header information on the ownership and allowed usage of the collection. Numerical metadata on the acquisition position is required for each image. It is recommended to provide further optional metadata based on the imaging use case. The following list of fields defines the required and optional keys and values that need to be provided to be in accordance with the MareHub protocol on marine imaging.
## File format
All image metadata shall be stored in one image FAIR digital object (iFDO) file. This file shall contain all iFDO fields, most importantly the header and detail fields for the image set (aka the core iFDO fields). The file should be human- and machine-readable, hence the *.yaml format is recommended. The file name should be: `<project>_<event>_<sensor>_iFDO.yaml`
All core metadata fields can be part of the image-set-header part in case they are static (always the same value for each image). In case they do vary across the dataset, the metadata of each image may instead carry this value in the image-set-items part, but without the "-set" term (e.g. image-acquisition-settings). The metadata for an image always supersedes the corresponding metadata for the image set! Bold: suggested best practice.
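The precedence rule (per-image metadata supersedes image-set metadata, with the "-set" term dropped) could be resolved like this (a sketch, not an official implementation; the function name is illustrative):

```python
def effective_metadata(header: dict, item: dict) -> dict:
    """Merge image-set-header fields with one item's fields.

    Header keys lose their '-set' term (image-set-x -> image-x);
    a value present in the item always supersedes the header value.
    """
    merged = {k.replace("image-set-", "image-", 1): v for k, v in header.items()}
    merged.update(item)  # per-image values win
    return merged

header = {"image-set-acquisition-settings": {"iso": 200}}
item = {"image-acquisition-settings": {"iso": 400}}
print(effective_metadata(header, item))
# {'image-acquisition-settings': {'iso': 400}}
```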
# iFDO core fields:
## Header information in the image-set-header part
| Field | Format / Values / Unit | Comment |
| ----- | ---------------------- | ------- |
| image-set-name | Needs to include `<project>, <event>, <sensor>` and purpose | A unique name for the image set |
| image-set-project | `<project>` | Project |
| image-set-context | `<context>` | Expedition or cruise or experiment or ... |
| image-set-abstract | Text | 500 - 2000 characters describing what, when, where, why and how the data was collected. Includes general information on the event (aka station, experiment), e.g. overlap between images/frames, parameters on platform movement, aims, purpose of image capture etc. |
| image-set-event | `<event>` | One event of a project or expedition or cruise or experiment or ... |
| image-set-platform | `<platform>` | Sensors URN or Equipment Git ID (Handle) |
| image-set-sensor | `<sensor>` | Sensors URN or Equipment Git ID (Handle) |
| image-set-uuid | UUID | A UUID (**version 4 - random**) for the entire image set |
| image-set-handle | Handle String | A Handle (using the UUID?) to point to the landing page of the data set |
| image-set-data-handle | Handle String | A Handle (using the UUID?) to point from the metadata to the data |
| image-set-metadata-handle | Handle String | A Handle (using the UUID?) to point to this metadata record |
| image-set-creators | **ORCIDs** | ORCIDs (or names and e-mail addresses) |
| image-set-pi | **ORCID** | ORCID (or name & e-mail) of the principal investigator |
| image-set-license | **CC-BY** / CC-0 | License to use the data (should be FAIR!) |
| image-set-copyright | String | Copyright sentence / contact person or office |
| image-set-crs | **EPSG:4326**, ... | The coordinate reference system |
| image-set-coordinate-uncertainty | Float [m] | Average/static uncertainty of coordinates in this dataset, given in meters |
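As a sketch of how such a header could be assembled programmatically before dumping it to the *.yaml file (the field values are hypothetical; only the keys come from the table above):

```python
import uuid

# Minimal sketch (not an official template): assembling a subset of the
# required image-set-header fields as a Python dict. All values are
# illustrative placeholders.
project, event, sensor = "SO268-1", "021-1", "GMR_CAM-23"

header = {
    "image-set-name": f"{project}_{event}_{sensor} seafloor mapping",
    "image-set-project": project,
    "image-set-event": event,
    "image-set-sensor": sensor,
    "image-set-uuid": str(uuid.uuid4()),      # version 4 (random), as required
    "image-set-license": "CC-BY",
    "image-set-crs": "EPSG:4326",
    "image-set-coordinate-uncertainty": 7.5,  # [m], hypothetical value
}
```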
## Image item information in the image-set-items part (frequency: per image or per second of video):
| Field | Format / Values / Unit | Comment |
| ----- | ---------------------- | ------- |
| image-uuid | UUID | UUID (**version 4 - random**) for the image file (still or moving) |
| image-filename | `<project>_<event>_<sensor>_[free-text_]<datetime>.[jpg,png,tif,raw,...]` | A filename string to identify the image data on disk (no absolute path!) |
| image-hash | **SHA256** | A hash to represent the whole file (including UUID in metadata!) to verify integrity on disk |
| image-datetime | UTC: YYYY-MM-DD HH:MM:SS.SSSSS | |
| image-longitude | longitude [deg] | Decimal degrees: D.DDDDDDD |
| image-latitude | latitude [deg] | Decimal degrees: D.DDDDDDD |
| image-depth | Float [m] | Use when the camera is below the water surface (positive values) |
| image-altitude | Float [m] | Use when the camera is above the water surface (positive values) |
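Two of these fields can be derived directly from the file itself. A minimal Python sketch computing the SHA256 image-hash and checking a filename (the regex is only an illustration matching the example filename pattern, not a normative definition):

```python
import hashlib
import re
import tempfile

# Illustrative pattern only: anything ending in _YYYYMMDD_HHMMSS.<ext>
FILENAME_RE = re.compile(r"^.+_\d{8}_\d{6}\.(jpg|png|tif|raw)$")

def sha256_of(path: str, chunk: int = 1 << 20) -> str:
    """Stream a file through SHA256 so large images do not need to fit in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

# self-contained demo on a temporary file (stand-in for a real image)
with tempfile.NamedTemporaryFile(suffix=".jpg", delete=False) as f:
    f.write(b"not a real image")
    demo_path = f.name

image_hash = sha256_of(demo_path)
matches = FILENAME_RE.match("SO268-1_21-1_GMR_CAM-23_20190513_131415.jpg") is not None
print(matches)  # True
```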
# Example:
[iFDO example](https://gitlab.hzdr.de/datahub/marehub/ag-videosimages/metadata-vocabulary-profile/-/blob/master/SO268-1_021-1_GMR_CAM-23_example-iFDO_core.yaml): SO268-1_021-1_GMR_CAM-23_example-iFDO_core.yaml
image-set-acquisition:
photo: still images
video: moving images
scan: microscopy / slide scans
image-set-quality:
raw: straight from the sensor
processed: QA/QC'd
product: image data ready for interpretation
image-set-deployment:
mapping: planned path execution along 2-3 spatial axes
stationary: fixed spatial position
survey: planned path execution along free path
exploration: unplanned path execution
experiment: observation of manipulated environment
sampling: ex-situ imaging of samples taken by other method
image-set-navigation:
satellite: GPS/Galileo etc.
beacon: USBL etc.
transponder: LBL etc.
reconstructed: position estimated from other measures like cable length and course over ground
image-set-scale-reference:
3D camera: the imaging system provides scale directly
calibrated camera: image data and additional external data like object distance provide scale together
laser marker: scale information is embedded in the visual data
optical flow: scale is computed from the relative movement of the images and the camera navigation data
image-set-illumination:
sunlight: the scene is only illuminated by the sun
artificial light: the scene is only illuminated by artificial light
mixed light: both sunlight and artificial light illuminate the scene
image-set-resolution:
km: average size of one pixel in an image of the data set is 1 km = 1000 m
hm: average size of one pixel in an image of the data set is 1 hm = 100 m
dam: average size of one pixel in an image of the data set is 1 dam = 10 m
m: average size of one pixel in an image of the data set is 1 m
dm: average size of one pixel in an image of the data set is 1 dm = 0.1 m
cm: average size of one pixel in an image of the data set is 1 cm = 0.01 m
mm: average size of one pixel in an image of the data set is 1 mm = 0.001 m
µm: average size of one pixel in an image of the data set is 1 µm = 0.000001 m
image-set-marine-zone:
seafloor: images taken in/on/right above the seafloor
water column: images taken in the free water without the seafloor or the sea surface in sight
sea surface: images taken right below the sea surface
atmosphere: images taken outside of the water
laboratory: images taken ex-situ
image-set-spectral-resolution:
grayscale: single channel imagery
rgb: three channel imagery
multi-spectral: 4-10 channel imagery
hyper-spectral: 10+ channel imagery
image-set-bit-resolution: # Integer value of the number of bits used to encode intensity values in the image, all positive integers are allowed
1: binary image
8: 8 bit image
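A simple validator against these controlled vocabularies might look like this (hypothetical helper; only a subset of the vocabularies above is shown):

```python
# Sketch (assumption): validating field values against the controlled
# vocabularies above before writing them into an iFDO/pFDO file.
VOCABULARIES = {
    "image-set-acquisition": {"photo", "video", "scan"},
    "image-set-quality": {"raw", "processed", "product"},
    "image-set-deployment": {"mapping", "stationary", "survey",
                             "exploration", "experiment", "sampling"},
    "image-set-navigation": {"satellite", "beacon", "transponder", "reconstructed"},
    "image-set-marine-zone": {"seafloor", "water column", "sea surface",
                              "atmosphere", "laboratory"},
}

def invalid_fields(metadata: dict) -> list:
    """Return the fields whose values are not in the controlled vocabulary."""
    return [key for key, value in metadata.items()
            if key in VOCABULARIES and value not in VOCABULARIES[key]]

print(invalid_fields({"image-set-acquisition": "photo",
                      "image-set-deployment": "drifting"}))  # ['image-set-deployment']
```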
# Root path for all configuration activities and data files for curation
root_path: /home/myqt/project/curation/
# Data path where the event/sensor data resides
base_path: /volumes/project/
# Remote data path where computation nodes see the event/sensor data
base_path_remote: /shipcc/project/
# False (recommended) expects data to be structured like so: ___base_path___event/sensor/, True expects: ___base_path___gear/event/sensor/
use_gear_folders: False
# General information about the project
project:
number: "PRJ23"
title: "Assessing the impacts of holothurian harvesting."
acronym: "Holothurian Impact"
vessel: "RV Moon"
vessel_acronym: "MO"
start:
port: "Manzanillo"
date: "2019-02-15 06:00:00+0000"
end:
port: "Vancouver"
date: "2020-05-27 08:00:00+0000"
pi:
name: "Dr. Jane Doe"
affiliation: "GEOMAR Helmholtz Centre for Ocean Research Kiel"
info:
de: "Ein deutscher Text mit ca. 1000 Zeichen der das Projekt beschreibt."
en: "An english text of ca. 1000 characters length, describing the project."
# Equipment operated in this cruise, extracted from the equipment Git
equipment:
platforms:
- eqid: MO_PFM-1_OFOS
- eqid: MO_PFM-2_ROV
...
cameras:
- eqid: MO_CAM-1_Photo_OFOS
camera: Canon EOS 5D Mark IV
lens: Canon EF24 f/1.4L II USM
focallength: 24mm
pfdo-proxy-core:
acquisition: photo
image-quality: raw
deployment: survey
navigation: beacon
scale-reference: laser marker
illumination: artificial light
resolution: mm
zone: seafloor
spectral-resolution: rgb
- eqid: MO-CAM-2_ROV_Video
...
# Settings on how imagery is created, required to create FDOs
images:
copyright:
artist: PRJ23 team
credit: PRJ23 team & Dr. Jane Doe
editor: John Doe
copyright: (c) GEOMAR Helmholtz Centre for Ocean Research Kiel. Contact: presse@geomar.de
license: CC-BY
exif:
imagedescription: "Acquired by camera ___DEPLOYMENT:CAMERAID___ mounted on platform ___DEPLOYMENT:PLATFORM___ during cruise ___CRUISE:NUMBER___ (station: ___DEPLOYMENT:STATION___). Navigation data were automatically edited by the MarIQT software (removal of outliers, smoothed and splined to fill time gaps) and linked to the image data by timestamp."
# Settings on navigation data processing and DSHIP interaction
navigation:
# File that contains all navigation data for all the deployments of an entire cruise
# and for all (relevant) navigation beacons. Should have been exported from DSHIP with a ca. 10 minute interval
# Appended to the ___base_path___
all_underwater_navgiation_file: files/PRJ23_all-underwater-navigation.dat
# Set this path to the correct device operation file for the cruise
# You can create this file using the BSH DSHIP export
# Appended to the ___base_path___
all_dship_device_operation_file: files/PRJ23_all-device-operations.dat
# A folder that contains many navigation data exports from DSHIP (zip files)
# Appended to the ___base_path___
dship_navigation_folder: files/dship_zips/
# Make sure this is the correct time format in the navigation .dat files!
dship_date_format: "%Y/%m/%d %H:%M:%S"
# Frequency at which smoothed navigation data shall be extracted from DSHIP (in seconds)
data_frequency_seconds: 5
# Maximum reasonable depth value achieved during this project in meters (to remove outliers)
max_depth: 6000
# Information on data curator
dship_user_name: jdoe
dship_user_mail: jdoe@geomar.de
# The equipment Git sensor id
satellite_navigation:
sensor_equipment_id: MO_NAV-1_GPS_Saab
underwater_navigation:
sensor_equipment_id: MO_NAV-2_USBL_Posidonia
sensor_type: posidonia
# How each gear has to be processed for USBL smoothing (key is device type, value is processing type)
navigation_processing_parameters:
DEFAULT:
- name: beacon_id
value: 2
- name: max_vertical_speed
value: 3.0
unit: m/s
- name: max_lateral_speed
value: 2.0
unit: m/s
- name: max_time_gap
value: 300
unit: s
- name: smoothing_gauss_half_width
value: 60
unit: s
- name: outlier_check_min_neighbors
value: 5
unit: number
- name: max_allowed_outlier_lateral_dist
value: 10
unit: m
- name: max_allowed_outlier_vertical_dist
value: 10
unit: m
- name: outlier_check_time_window_size
value: 60
unit: s
OFOS:
- name: processing_type
value: transect
- name: beacon_id
value: 1
ROV:
- name: processing_type
value: transect
- name: beacon_id
value: 4
annotations:
annotation_type: OFOP_obser
label_groups:
ignore: [all,the,annotation,categories,you,want,to,ignore]
major_group_1:
element_1.1: parent_element
element_1.2: parent_element
...
major_group_2:
element_2.1: parent_element
element_2.2: parent_element
...
label_mappings:
from_type: to_type
...
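To illustrate two of the navigation settings above, a minimal Python sketch (the record layout is hypothetical) that parses timestamps with dship_date_format and drops depth outliers beyond max_depth:

```python
from datetime import datetime

# Sketch (assumption): applying the dship_date_format and max_depth settings
# from the configuration above while reading navigation records.
# The (timestamp, depth) record layout is hypothetical.
DSHIP_DATE_FORMAT = "%Y/%m/%d %H:%M:%S"
MAX_DEPTH = 6000  # [m]

records = [
    ("2019/05/13 13:14:15", 4120.3),
    ("2019/05/13 13:14:20", 99999.0),  # obvious outlier
]

clean = [(datetime.strptime(ts, DSHIP_DATE_FORMAT), depth)
         for ts, depth in records
         if 0 <= depth <= MAX_DEPTH]
print(len(clean))  # 1
```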
# Mapping the MareHub AG V/I vocabulary terms to other resources
The MareHub AG V/I vocabulary on marine images (for [iFDOs](https://gitlab.hzdr.de/datahub/marehub/ag-videosimages/metadata-profiles-fdos/-/blob/master/MareHub_AGVI_iFDO.md)) and on proxies for images ([pFDOs](https://gitlab.hzdr.de/datahub/marehub/ag-videosimages/metadata-profiles-fdos/-/blob/master/MareHub_AGVI_pFDO.md)) is an aggregate, combining the efforts of proven or up-and-coming vocabularies, ontologies, metadata standards etc. The following tables provide an overview of the corresponding fields from these various sources:
## Image set: header information (frequency: per dataset)
| [iFDOs](https://gitlab.hzdr.de/datahub/marehub/ag-videosimages/metadata-profiles-fdos/-/blob/master/MareHub_AGVI_iFDO.md) | [DarwinCore](https://dwc.tdwg.org/terms) | [Pangaea](https://wiki.pangaea.de/wiki/Data_submission) | [schema.org](https://schema.org/docs/schemas.html) | [OBIS](https://obis.org/manual/darwincore/) | [MEDIN/BODC](https://www.medin.org.uk/sites/medin/files/documents/MEDIN_Schema_Documentation3_1_brief.pdf) | [INSPIRE](https://inspire.ec.europa.eu/Technical-Guidelines/Data-Specifications/2892) | [SMarTaR-ID](https://doi.org/10.1371/journal.pone.0218904) |
| - | - | - | - | - | - | - | - |
| image-set-name | datasetName | Dataset:Title | name | datasetName | Resource title | | |
| image-set-project | | Dataset:Project | isPartOf | | | | |
| image-set-context | | | | | | | |
| image-set-abstract | | Dataset:Abstract | | | Resource abstract | | |
| image-set-event | | Event | | | | | |
| image-set-platform | | Event:Platform | | | | | |
| image-set-sensor | | Event:Sensor | | | | | |
| image-set-uuid | datasetID | | identifier | datasetID | Unique resource identifier | | |
| image-set-doi | | DOI (after publication) | | | | | |
| image-set-data-handle | | PANGAEA API Link | | | Resource locator | | |
| image-set-metadata-handle | | PANGAEA API Link | | | | | |
| image-set-creators | rightsHolder(?) | Dataset:Authors | creator | rightsHolder(?) | Responsible party | | |
| image-set-pi | | Dataset:PI | | | Originator | | |
| image-set-license | license | Dataset:License | license | license | Limitations on public access/ Conditions applying for access and use | | |
| image-set-crs | verbatimSRS | | | verbatimSRS | Spatial reference system | | |
| image-set-type | type | | ImageObject / VideoObject | type | Resource type / Data format | | |
| image-set-coordinate-uncertainty | coordinatePrecision | Coordinate uncertainty (170981) | | coordinateUncertaintyInMeters | | | |
| *image-set-event-information | | | | | | | |
| *image-set-reference-calibration | | | | | | | |
| *image-set-time-synchronisation | | | | | | | |
| *image-set-item-identification-scheme | | | | | | | |
| *image-set-curation-protocol | | | | | Lineage(?) | | |
| *image-set-acquisition-settings | | | | | Lineage(?) | | |
| *image-set-camera-intrinsics | | | | | | | |
| *image-set-camera-extrinsics | | | | | | | |
_*: optional_
## Images: item information (frequency: per image or per second of video)
| [iFDOs](https://gitlab.hzdr.de/datahub/marehub/ag-videosimages/metadata-profiles-fdos/-/blob/master/MareHub_AGVI_iFDO.md) | [DarwinCore](https://dwc.tdwg.org/terms) | [Pangaea](https://wiki.pangaea.de/wiki/Data_submission) | [schema.org](https://schema.org/docs/schemas.html) | [OBIS](https://obis.org/manual/darwincore/) | [MEDIN/BODC](https://www.medin.org.uk/sites/medin/files/documents/MEDIN_Schema_Documentation3_1_brief.pdf) | [INSPIRE](https://inspire.ec.europa.eu/Technical-Guidelines/Data-Specifications/2892) | [SMarTaR-ID](https://doi.org/10.1371/journal.pone.0218904) |
| - | - | - | - | - | - | - | - |
| image-uuid | eventID | | identifier | eventID | | | |
| image-filename | | File name (25541) | name | | | | |
| image-hash | | | | | | | |
| image-datetime | eventDate | DATE/TIME (1599) | dateCreated | eventDate | Temporal reference | | |
| image-longitude | decimalLongitude | LONGITUDE (1601) | | decimalLongitude | | | |
| image-latitude | decimalLatitude | LATITUDE (1600) | | decimalLatitude | | | |
| image-depth | verbatimDepth | DEPTH, water [m] (1619) | | minimumDepthInMeters / maximumDepthInMeters | Vertical extent information | | |
| image-altitude | verbatimElevation | ALTITUDE [m] (4607) | | | | | |
| *image-pixel-per-millimeter | | Image resolution (172673) | | | Spatial resolution | | |
| *image-meters-above-ground | | HEIGHT above ground [m] (56349) | | | Distance | | |
| *image-area-square-meter | | | | | | | |
_*: optional_
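The per-image mapping above can be applied mechanically. A minimal sketch (hypothetical helper, covering only fields with a DarwinCore counterpart in the table):

```python
# Sketch (assumption): translating per-image iFDO fields to their DarwinCore
# counterparts from the table above; fields without a counterpart are skipped.
IFDO_TO_DWC = {
    "image-uuid": "eventID",
    "image-datetime": "eventDate",
    "image-longitude": "decimalLongitude",
    "image-latitude": "decimalLatitude",
    "image-depth": "verbatimDepth",
    "image-altitude": "verbatimElevation",
}

def to_darwincore(item: dict) -> dict:
    """Return a DarwinCore record for one image item, dropping unmapped fields."""
    return {IFDO_TO_DWC[k]: v for k, v in item.items() if k in IFDO_TO_DWC}

print(to_darwincore({"image-latitude": 11.049, "image-hash": "ab12"}))
# {'decimalLatitude': 11.049}
```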
## Disclaimer
Vocabularies, repositories or other sources that do not enforce structured metadata or data (like Zenodo or DRYAD) are not considered here as they do not lead to FAIRness of data. Similarly, all projects, workflows, etc. that rely on these sources are omitted as well.
Other sources not (yet?) included here:
- http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.733.1984&rep=rep1&type=pdf
- https://www.nature.com/articles/sdata2018181
> **Preamble:** We strive to make marine image data [FAIR](https://gitlab.hzdr.de/datahub/marehub/ag-videosimages/metadata-profiles-fdos/-/blob/master/FAIR-marine-images.md). We maintain - here in this repository - [data profiles](https://gitlab.hzdr.de/datahub/marehub/ag-videosimages/metadata-profiles-fdos) to establish a common language for marine imagery, we develop best-practice [operating procedures](https://gitlab.hzdr.de/datahub/marehub/ag-videosimages/standard-operating-procedures) for handling marine images and we develop [software tools](https://gitlab.hzdr.de/datahub/marehub/ag-videosimages/software/mar-iqt) to apply the vocabulary and procedures to marine imagery.
# Introduction
Achieving FAIRness and openness of (marine) image data requires structured and standardized metadata on the image data itself and on the visual and semantic image data content. This metadata shall be provided in the form of FAIR digital objects (FDOs). The content of this repository describes how FDOs for images (aka iFDOs) shall be structured. Put simply, an iFDO is a human- and machine-readable file for an entire image set - except that it does not contain the actual image data, only references to it!
## Delicious (marine) image data
iFDOs consist of various metadata fields. Some are required, some are recommended, some are optional. You will only achieve FAIRness of your image data with the required core fields populated. You will only gain visibility and credit for your image data with the recommended capture fields populated. And you will only have awesome image data in case you also populate the content fields. As a bonus you can add your own domain-specific optional fields.
<img style="margin: auto 0" src="https://gitlab.hzdr.de/datahub/marehub/ag-videosimages/metadata-profiles-fdos/-/raw/master/graphics/delicious_iFDOs_CCC.png" title="Delicious iFDOs" width="800"><br>
An iFDO file consists of a header part and an item information part. For the core iFDO fields, the header part contains all metadata that is identical for all image items. The item information part contains all metadata for which individual images require their own specific value. An example is the coordinate: for a moving camera, each image item requires its own image-latitude value. In case of a stationary camera, the image-set-latitude value can be used instead.
## iFDO sections
- **Core fields (required)**: [iFDO core](https://gitlab.hzdr.de/datahub/marehub/ag-videosimages/metadata-vocabulary-profile/-/blob/master/MareHub_AGVI_iFDO-core.md)
- **Capture fields (recommended)**: [iFDO capture](https://gitlab.hzdr.de/datahub/marehub/ag-videosimages/metadata-vocabulary-profile/-/blob/master/MareHub_AGVI_iFDO-capture.md)
- **Content fields (optional)**: [iFDO content](https://gitlab.hzdr.de/datahub/marehub/ag-videosimages/metadata-vocabulary-profile/-/blob/master/MareHub_AGVI_iFDO-content.md)
## Version
Continuous versioning of the FDOs is given by the commit hashes. Releases with numerical versioning will be created upon major changes.
## Metadata vocabulary terms
The following wording will be used throughout the documents.
- images: photos (still images) and videos (moving images) acquired by cameras, recording the optical spectrum of light (http://purl.org/dc/dcmitype/Image)
- still image: A static visual representation (http://purl.org/dc/dcmitype/StillImage)
- moving image: A series of visual representations imparting an impression of motion when shown in succession (http://purl.org/dc/dcmitype/MovingImage)
- image set: a collection of at least one, but usually many, images (http://purl.org/dc/dcmitype/Collection)
- `<tag>`: represent various placeholders for information (like variables)
- `<project>`: the project, expedition or cruise acronym
- `<event>`: part of a project, this refers to the station number in marine sciences
- `<sensor>`: a unique, human-readable identifier (or nickname) for the data acquisition device / camera
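A small sketch of how these placeholders expand into the recommended file names (the values are illustrative):

```python
# Sketch (assumption): expanding the <project>/<event>/<sensor> placeholders
# into the recommended file names; all values are illustrative examples.
placeholders = {"project": "SO268-1", "event": "021-1", "sensor": "GMR_CAM-23"}

ifdo_name = "{project}_{event}_{sensor}_iFDO.yaml".format(**placeholders)
image_name = "{project}_{event}_{sensor}_20190513_131415.jpg".format(**placeholders)
print(ifdo_name)  # SO268-1_021-1_GMR_CAM-23_iFDO.yaml
```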
# This repository is outdated!
Please find the iFDO and SOP documentation pages here: https://datahub.pages.hzdr.de/marehub/ag-videosimages/fair-marine-images/sops/SOP_image-curation/ which are built from here: https://gitlab.hzdr.de/datahub/marehub/ag-videosimages/fair-marine-images
image-set-acquisition: photo
image-set-quality: raw
image-set-deployment: survey
image-set-navigation: beacon
image-set-scale-reference: laser marker
image-set-illumination: artificial light
image-set-resolution: mm
image-set-marine-zone: seafloor
image-set-spectral-resolution: rgb
image-set-bit-resolution: 8
image-set-header:
image-set-sequence-image: 20.500.12085/2a2360e9-f7be-4ad2-be04-0ea0b4cbdc58
...
image-set-items:
SO268-1_21-1_GMR_CAM-23_20190513_131415.jpg:
image-entropy: 0.475