CRESCYNT Data Science for Coral Reefs Workshop 1 – Data Rescue


We’re extremely pleased to be able to offer two workshops in March 2018 at NCEAS. The first is CRESCYNT Data Science for Coral Reefs Workshop 1: Data Rescue. Apply here.

When: March 7-10, 2018
Where: NCEAS, Santa Barbara, California, USA

Workshop description:

Recommended for senior scientists with rich “dark” data on coral reefs that needs to be harvested and made accessible in an open repository. Students or staff working with senior scientists are also encouraged to apply. Days 1 and 2 of the workshop will cover the basic principles of data archiving and data repositories, including Darwin Core and EML metadata formats, how to write good metadata, how to archive data on the KNB data repository and elsewhere, data preservation workflow and best practices, and how to improve data discoverability and reusability. Participants will then spend approximately 2 days working in pairs to archive their own data using these principles, so applying with a team member from your research group is highly recommended.

The workshop is limited to 20 participants. We encourage you to apply via this form. Workshop costs will be covered with support from the NSF EarthCube CRESCYNT RCN. Participants will publish data during the workshop process, and we anticipate widely sharing workshop outcomes, including workflows and recommendations. Because coral reef science embodies a wide range of data types (spreadsheets, images, videos, field notes, large ‘omics text files, etc.), anticipate some significant pre-workshop prep effort.

Related post: CRESCYNT Toolbox – Estate Planning for Your Data

Go to the blog Masterpost or the CRESCYNT website or NSF EarthCube.

CRESCYNT Toolbox – Disaster Planning and Recovery

With computers, the question is not whether they will fail, but when.

tl;dr – It’s very practical to have cloud storage backup in addition to still-useful external hard drive backup routines. Here are some secure cloud alternatives.

Personal note. I’ve had hard drive failures due to lightning strike; simultaneous death of mirrored hard drives within a RAID; drenching from an upper floor emergency shower left flowing by a disgruntled chemistry student; and most recently, demise of my laptop by sudden immersion in salt water (don’t ask). By some intersection of luck and diligence, on each occasion recent backups were available for data recovery. In the most recent remake, it was a revelation how much work is now backed up via regular entry into the casual cloud.

This latest digital landing was mercifully soft (…cloudlike). Because of work portability, my recent sequential backup habit has been to a paid unshared Dropbox account; $10/mo is a bargain for peace of mind (beyond a certain size, restoration is not drag-and-drop). A surprising number of files these days are embedded in multiple team projects – much on Google Drive – so all of that was available, with revision history. Group conversations and files were on Slack and email. One auxiliary brain (iPhone) was in a waterproof case with cloud backup, and another auxiliary brain (project/task tracking) was in a web app, KanbanFlow. Past years of long-term archives were already on external hard drives in two different cities. GitHub is an amazing place to develop, document, recover and share work in progress and products, but it is not a long-term curated data repository. For valuable datasets, the rule is to simplify formats, attach metadata, and update media periodically.
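If you do move valuable datasets to fresh media now and then, it helps to confirm the copy actually matches the original. Here is a minimal Python sketch of one way to do that with SHA-256 checksums, using only the standard library. The function names (`sha256_of`, `build_manifest`, `verify_copy`) and the example paths are made up for illustration; this is not a CRESCYNT tool or anyone's official workflow.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_copy(source: Path, backup: Path) -> list[str]:
    """Return relative paths that are missing from, or differ in, the backup."""
    source_manifest = build_manifest(source)
    backup_manifest = build_manifest(backup)
    return [
        rel
        for rel, checksum in source_manifest.items()
        if backup_manifest.get(rel) != checksum
    ]


# Hypothetical usage: both paths are placeholders for your own locations.
# problems = verify_copy(Path("~/reef_data").expanduser(), Path("/Volumes/Backup/reef_data"))
# print(problems or "backup matches source")
```

An empty list means every file in the source has an identical twin in the backup. Saving the manifest alongside the data would also give a future reader something to check against, though that step is left out here.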

Thinking about your own locations for data storage and access? Check out this review of more secure alternatives to – and apps on top of – Dropbox. Some, like OwnCloud, can serve as both storage and linked access for platforms like Agave. A strength of some current analytical platforms is that they can access multiple data storage locations; for example, Open Science Framework can access Dropbox, Google Drive, GitHub, Box, figshare, and now Dataverse and Amazon Web Services as well.

A collaborator recently pointed out that the expense of any particular type of data storage is really the expense of its backup processes: frequency, automation, security, and combination of archiving media. Justifying the expense can come down to this question: What would it cost to replace these data? Some things are more priceless than others.

Disaster Planning and Recovery tools. To go beyond data recovery in your planning, here’s an online guide for IT disaster recovery planning and cyberattacks. How much of a problem is this really? See Google’s real-time attack map (hit “play”). Better to plan than fear. You did update those default passwords on your devices, yes?

Feel free to share your own digital-disaster-recovery story in the comments.
