Written while embedded in our CRESCYNT Data Science for Coral Reefs workshops. Amazingly, everyone who participated in workshop 1 – Data Science for Coral Reefs: Data Rescue – learned even more than they thought they would. We’ve had wonderful NCEAS trainers, spectacular participants with amazing datasets, and a lot of hard work over 4 days (March 7-10, 2018).
In the second intensive workshop – Data Science for Coral Reefs: Data Integration and Team Science – people will be introduced to RStudio and GitHub if they have not used them before, and then we will explore techniques for integrating disparate datasets. We’ll start with one pair of datasets at a time. Efforts may involve extracting data from one dataset based on observations from another, or upscaling, downscaling, resampling, or summarizing to make intervals and scales mesh – exactly the kind of process that coral reef researchers have said is a recurring challenge in asking bigger science questions.
Each workshop group is writing a paper to summarize and share lessons learned, so please stay tuned for those!
We experimented with an unusual process for these workshops: two days of training followed by two days of workathon. We’re liking it! Tell us what you think about these topics and training materials. What other workshop outputs would you like to see?
In preparation for an upcoming Data Science for Coral Reefs: Data Rescue workshop, Dr. James W. Porter of the University of Georgia spoke eloquently about his own efforts to preserve historic coral reef imagery captured in Discovery Bay, Jamaica, from as early as 1976. It’s a story from the trenches with a senior scientist’s perspective, outlining the effort and steps needed to accomplish preservation of critical data, in this case characterizing a healthy reef over 40 years ago.
Enjoy this insightful 26-min audio description, recorded on 2018-01-04.
Transcript from 2018-01-04 (lightly edited):
This is Dr. Jim Porter from the University of Georgia. I’m talking about the preservation of a data set that is at least 42 years old now and started with a photographic record that I began making in Discovery Bay, on the north coast of Jamaica, in 1976. I always believed that the information that photographs would reveal would be important, specifically because I had tried other techniques of line transecting and those were very ephemeral. They were hard to relocate in exactly the same place. And in addition to that they only captured a line’s worth of data. And yet coral reefs are three-dimensional and have a great deal of material on them not well captured in the linear transect. So those data were… I was very consistent about photographing from 1976 to 1986.
But eventually funding ran out and I began focusing on physiological studies. But toward the end of my career I realized that I was sitting on a gold mine. So, the first thing that’s important when considering a dataset and whether it should be preserved or not is the individual’s belief in the material. Now it’s not always necessary for the material to be your own for you to believe in it. For instance, I’m working on Tom Goreau, Sr.’s collection which I have here at the University of Georgia. I neither made it nor in any way contributed to its preservation but I’ve realized that it’s extremely important and therefore I’m going to be spending a lot of time on it. But in both cases, the photographic record from Jamaica, as well as the coral collection itself – those two activities have in common my belief in the importance of the material.
The reason that the belief in the material is so important is that the effort required to capture and preserve it is high, and you’ve got to have a belief in the material in order to take the steps to assure the QA/QC of the data you’re preserving, as well as to put in the many hours required to get it into digital format. And believing in the material then should take another step, which is a very self-effacing review of whether you believe the material to be of real significance to others. There’s nothing wrong with memorabilia. We all keep scrapbooks and photographs that we like – things relating to friends and family, and times that made us who we are as scientists and people. However, the kind of data preservation that we’re talking about here goes beyond that – it could have 50 or 100 years’ worth of utility.
Those kinds of data really do need to be of some kind of value, and the value could be global, regional, or possibly even local. Many local studies can be of importance in a variety of ways: the specialness of the environment, or the possibility that people will come back to that same special environment in the future. The other thing, number two on the list (first is belief in the material), is that you’ve got to understand that the context in which you place your data is much more important to assure its survival and utility than the specificity of the data. Numbers for their own sake are numbers. Numbers in the service of science become science. It is the context in which you place your data that will assure its future utility and preservation.