When the EarthCube CRESCYNT Coral Reef Science and Cyberinfrastructure Network concluded two major workshops in March 2018, there were some clear and interesting outcomes beyond the practical training and data exploration goals accomplished. Both workshops were structured around Data Science for Coral Reefs. At the end of the first, focused on Data Rescue and data management, participants decided that the most important new topic they had learned about was metadata and its uses. At the end of the second, focused on Data Integration and Team Science, people had realized how essential good metadata is for making datasets at disparate scales work together well. These metadata lessons were important emergent outcomes, and participants asked that data and metadata experts get together, work from the data challenges that arose, and recommend metadata practices and standards that would suit the coral reef community and its very broad range of data types, repositories, and pre-repository metadata needs for research, storage, sharing, and analysis.
Fortunately, we were able to do exactly that with one final workshop. Through a jointly staged CRESCYNT-DDStudio workshop, we pulled together a group of metadata experts, coral reef data managers, representative scientists, and the EarthCube Data Discovery Studio’s scientists and software developers, who focus on metadata enhancement for finding and using data.
Special guests included Ted Habermann (Metadata 2020 project, co-author of “The influence of community recommendations on metadata completeness”); Stephen Richard (experience with schema.org and metadata standards authoring); coral reef data experts Gastil Gastil-Buhl (Moorea Coral Reef LTER), Hannah Ake (BCO-DMO), and Sarah O’Connor and Zachary Mason (NOAA NCEI’s user metadata writing interface and CoRIS), who came from the three biggest formal repositories for coral reef research data in the US or sponsored by NSF; Eric Lingerfelt, the EarthCube Technical Officer; guests from Scripps; and DDH team members Ilya Zaslavsky, Karen Stocks, Gary Hudman, David Valentine, and Tom Whitenack with their broad and integrative metadata, software, and domain expertise. Ilya and Karen kindly hosted the group at UCSD’s San Diego Supercomputer Center and Scripps Institution of Oceanography.
Important outcomes from the workshop were mutualistic for the two projects. For CRESCYNT, they included cross-mapping an essential set of metadata (as defined by appropriate community repositories) to web standards and producing a draft ISO metadata profile for coral reef data at two levels of dataset access: (1) discovery and sharing (a simpler form with freeform text entry in many of the fields), and (2) understanding and usability at the workbench level (a more detailed form with options to supply more highly specified fields). We will finish writing these and offer them to the coral reef community for feedback and potential adoption.
For the Data Discovery Studio (formerly known as Data Discovery Hub), important outcomes included exploration of how the enhanced metadata can be used at different repositories and in science use cases (including the coral reef use case), a deep dive into the future trajectory of Data Discovery Studio, and some initial planning for an upcoming data science competition that will involve the coral reef data (details to be announced). Read more about DDStudio and its broader work, and watch for a Data Discovery Science Competition in January 2019!
We gratefully acknowledge the generosity of our hosts, workshop travel support from NSF, the active work and engagement of our participants, and the organizations that allowed their employees time to attend and contribute to this collective effort.