Background
The AusSeabed program outlines three program themes, one of which is the development of a “data hub”.
Geoscience Australia (GA) is the lead organisation for the data hub program theme, which is primarily infrastructure- and systems-focused.
The data hub is being developed as a multi-component suite of services that are underpinned by a federated data storage model, with each connected data store being referred to as a “local hub”. At the time of writing, there are two local hubs under development, one at GA and one at CSIRO.
Purpose
The purpose of this document is to outline the incoming data policies for the AusSeabed GA Local Hub.
Data Acceptance Policies
The following are the principles for data being accepted into the GA Local Hub:
Data must be owned by GA (GA is the lead organisation of the survey), or GA must be a partner/collaborator in its acquisition, or
If GA has not held an interest in the acquisition or preparation of the data, the GA Local Hub will accept the data if it is not readily available through another publicly accessible channel.
Data can only be submitted by an agreed Contributing Data Partner. The basis for becoming a Contributing Data Partner can be found in the “Becoming a Contributing Partner” document.
Data must be provided in the prescribed formats outlined in Appendix A.
Data must be submitted along with the contributing organisation’s profile, which provides GA with the right to store, distribute and process the data for its own outputs.
Authority to publish does not negate the need for GA to provide appropriate citation for the data.
All data submitted to the GA Local Hub must be compliant with the AusSeabed Guidelines ver 2.0 (currently in draft).
In order for data to be processed to an AusSeabed-consistent output, data should be provided in one of the identified incoming formats outlined in the AusSeabed Guidelines, or should be easily converted to align with these formats.
Specifically, all incoming data should include any auxiliary data required to meet the AusSeabed Guidelines.
Incoming Channels (future state)
Two incoming data submission channels will be supported into the future:
The provision of a physical hard drive that will then be registered into The Hub’s infrastructure. This option will:
Provide an ongoing service that reflects the way data has traditionally been provided by surveyors. This will allow for a gradual education process, as well as a maturation of the tools and management practices within AusSeabed as it evolves its data management into the future.
Support the provision of historical archives, which are expected to be a significant source of previously unpublished bathymetry data in the first couple of years of operation.
The registration of an external data provider into The Hub’s infrastructure for direct submission. This option will:
Begin to service more technical users who would like to partner with GA to provide data during, or immediately after, a survey for rapid visualisation and access.
Allow for rapid connection between the various components of the data hub, and therefore support a “real-time” narrative of the survey’s progress.
Data Levels
Standard definitions of data levels are necessary to indicate how data has been processed, to ensure consistency, and to limit ambiguity when discussing, delivering or describing data. The following definitions are modelled on those defined by NASA for Earth Observation data products.
Appendix A GA Local Hub Data Submission Requirements
Data Level | Notes |
L0 | L0 – Raw Data. L0 data is in native format, as recorded by the sensor. It should include all datagrams required for comprehensive bathymetry and backscatter processing, including raw backscatter per beam (BA) and raw backscatter in time series (TS), plus all required ancillary data. Water column data, if collected, should be stored in a separate file. |
L1 | L1 – Georeferenced Point Data, uncleaned. L1 data is raw data processed to form a georeferenced bathymetric point cloud with no cleaning. |
L2 | L2 – Georeferenced Point Data, fully cleaned. L2 data is cleaned relative to the CUBE surface. |
L3 | L3 – Gridded Final Survey Product. L3 data is the delivered final survey data for a survey. L3 will be a CSAR CUBE / SDTP surface, which will include density and uncertainty. |
L4 | L4 – Verified Final Survey Data. L4 data is the final survey surface that is verified and accepted by the contributing organisation. |
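The level definitions above can be captured in code so that submission metadata uses the same labels consistently. The following is an illustrative sketch only, not a prescribed schema; the survey identifier and dictionary keys are assumed for the example.

```python
from enum import Enum

class DataLevel(Enum):
    """Bathymetry processing levels, modelled on NASA Earth Observation product levels."""
    L0 = "Raw data in native sensor format, including all datagrams and ancillary data"
    L1 = "Georeferenced bathymetric point cloud, uncleaned"
    L2 = "Georeferenced point data, fully cleaned relative to the CUBE surface"
    L3 = "Gridded final survey product (CUBE surface including density and uncertainty)"
    L4 = "Final survey surface verified and accepted by the contributing organisation"

# Example: tagging a hypothetical submission record with its processing level
submission = {"survey_id": "EXAMPLE-001", "level": DataLevel.L2.name}
print(submission["level"])  # L2
```

Using an enumeration like this limits level labels to the agreed vocabulary, so a submission cannot be tagged with an undefined level.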
Data Versions
Versioning of bathymetry data is classified as V0 and V1, and refers directly to the vertical datum at which the data is published, as well as to the state of the data.
Version 0
V0 (Version 0) bathymetry data is published at geodetic vertical datums (e.g. Mean Sea Level (MSL)) and mainly refers to data that has been manually cleaned and distributed for release.
Rough Notes
At the time of writing there is a considerable “backlog” of data that has been cleaned for delivery in response to a client request but has either:
Been distributed through GA’s public facing tools as a bespoke deliverable (either for an individual survey, or as part of a compilation), or
Has never been distributed at all.
In both of these instances, the data represents a product of value, albeit limited in that the processing undertaken to date has not been explicitly captured. It is with this data in mind that the GA-ASB data hub is defining a “version 0” for this backlog data.
Important to note: “version 0” is not relevant to any new, incoming data, nor to data that has never been cleaned or has previously only been partially cleaned. Version 0 represents a “quick win” and will be replaced at a later date.
Version 1
V1 (Version 1) is classified as bathymetry data that may have been manually cleaned and surface cleaned, and that is exported to two vertical datums: the ellipsoid and MSL. This version will be used for all new surveys.
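Because the version and vertical datum together identify a published product, it can help to encode both explicitly in product identifiers. The sketch below is a hypothetical naming convention for illustration only; the filename pattern, datum labels and file extension are assumptions, not prescribed by this policy.

```python
def product_name(survey_id: str, version: int, datum: str) -> str:
    """Build an illustrative product filename encoding version and vertical datum.

    V0 products carry a single geodetic datum (e.g. MSL); V1 products are
    exported to both the ellipsoid and MSL, giving two files per survey.
    """
    if datum not in ("MSL", "Ellipsoid"):
        raise ValueError(f"Unrecognised vertical datum: {datum}")
    return f"{survey_id}_V{version}_{datum}.tif"

# A V1 survey would yield one product per datum:
for datum in ("Ellipsoid", "MSL"):
    print(product_name("EXAMPLE-001", 1, datum))
```

Making the datum part of the identifier avoids ambiguity when both exports of a V1 survey sit side by side in the same store.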
Rough Notes
Version 1 represents the first truly consistent processing outcome for the GA-ASB Data Hub. Any data within the existing backlog that has not been produced as a V0 will be produced as a V1, via a prioritised publication schedule that takes into account both backlog and new incoming data.
Version 1 will be:
Created using an automated, reproducible processing pipeline