Written while embedded in our CRESCYNT Data Science for Coral Reefs workshops. Amazingly, everyone who participated in workshop 1 – Data Science for Coral Reefs: Data Rescue – learned even more than they thought they would. We’ve had wonderful NCEAS trainers, spectacular participants with amazing datasets, and a lot of hard work over 4 days (March 7-10, 2018).
In the second intensive workshop – Data Science for Coral Reefs: Data Integration and Team Science – people will be introduced to RStudio and GitHub if they have not used them before, and then we will explore techniques for integrating disparate datasets. We’ll start with one pair of datasets at a time. Efforts may involve extracting data from one dataset based on observations from another, or upscaling, downscaling, resampling, or summarizing to make intervals and scales mesh. This is exactly the kind of process that coral reef researchers have said is a recurring challenge in asking bigger science questions.
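To make "summarizing to make intervals and scales mesh" concrete, here is a minimal sketch of one such integration step. It is written in plain Python so that it is self-contained, and it is not taken from the workshop materials. The site names, readings, and function names are all hypothetical. It summarizes sub-daily temperature logger readings to daily means, then attaches each daily mean to the survey observation from the same site and day.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical temperature logger records: (site, ISO timestamp, degrees C).
logger_readings = [
    ("reef_a", "2018-03-07T00:00", 26.0),
    ("reef_a", "2018-03-07T12:00", 28.0),
    ("reef_a", "2018-03-08T00:00", 27.0),
    ("reef_b", "2018-03-07T06:00", 25.0),
]

# Hypothetical survey observations: (site, ISO date, percent coral cover).
surveys = [
    ("reef_a", "2018-03-07", 41.5),
    ("reef_b", "2018-03-07", 33.0),
    ("reef_b", "2018-03-08", 32.0),
]


def daily_mean_temperature(readings):
    """Summarize sub-daily readings to one mean value per (site, day)."""
    grouped = defaultdict(list)
    for site, timestamp, temp_c in readings:
        day = datetime.fromisoformat(timestamp).date().isoformat()
        grouped[(site, day)].append(temp_c)
    return {key: mean(values) for key, values in grouped.items()}


def attach_temperature(survey_rows, daily_means):
    """Keep every survey row; the temperature is None when no reading matches."""
    return [
        (site, day, cover, daily_means.get((site, day)))
        for site, day, cover in survey_rows
    ]


daily_means = daily_mean_temperature(logger_readings)
integrated = attach_temperature(surveys, daily_means)

for row in integrated:
    print(row)
```

Keeping unmatched survey rows, with a missing temperature, rather than dropping them is a deliberate choice here: it makes gaps between the two datasets visible instead of silently shrinking the result.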
Each workshop group is writing a paper to summarize and share lessons learned, so please stay tuned for those!
We experimented with an unusual process for these workshops: two days of training followed by two days of workathon. We’re liking it! Tell us what you think about these topics and training materials. What other workshop outputs would you like to see?
We’re extremely pleased to be able to offer two workshops in March 2018 at NCEAS. The second is CRESCYNT Data Science for Coral Reefs Workshop 2: Data Modeling, Data Integration and Team Science. Apply here.
When: March 12-15, 2018
Where: NCEAS, Santa Barbara, CA
This workshop is recommended for early to mid-career and senior scientists with interest in applying technical skills to collaborative research questions and committed to subsequently sharing what they learn. Participants will learn how to structure and combine heterogeneous data sets relevant to coral reef scientists in a collaborative way. Days 1 and 2 of the workshop will cover reproducible workflows using R/RStudio and RMarkdown, collaborative coding with GitHub, strategies for team research, data modeling and data wrangling, and advanced data integration and visualization tools. Participants will then spend 2 days working in small teams to integrate various coral reef datasets, practicing the skills learned and developing workflows for data tidying and integration.
In a previous post we offered some solid supportive resources for learning R – a healthy dinner with lots of great vegetables. Here we offer a dessert cart of rich resources for data visualization and graphing. It’s a powerful motivation for using R.
First up is The New R Graph Gallery – extensive, useful, and actually new. “It contains more than 200 data visualizations categorized by type, along with the R code that created them. You can browse the gallery by types of chart (boxplots, maps, histograms, interactive charts, 3-D charts, etc), or search the chart descriptions. Once you’ve found a chart you like, you can admire it in the gallery (and interact with it, if possible), and also find the R code which you can adapt for your own use. Some entries even include mini-tutorials describing how the chart was made.” (Description by Revolutions.)
Sometimes we want (or need) plain vanilla – something clean and elegant rather than extravagant. Check out A Compendium of Clean Graphs in R, including code. Many examples are especially well-suited for the spartan challenge of conveying information in grayscale. The R Graph Catalog is a similar resource.
If you’re just getting started with R, take a look at the Painless Data Visualization section (p. 17 onward) in this downloadable Beginner’s Guide.
If you’re already skilled in R and want a new challenge, an indirect method of harnessing some of the power of D3.js for interactive web visualizations is available through plotly for R. Here are guides to getting started with plotly and ggplot2, plotly and Shiny, and a gallery. The resources offer code and in some cases the chance to open a visualization and modify its data.
We are driven to learn like sharks: constantly take in new flows, or die. In a recent workshop, when coral reef scientists were asked “How many of you use R?”, 60% raised a hand. When asked “How many of you are comfortable with and love using R?”, only about 15% kept a hand up.
Here’s where to go to learn to love R more.
You likely already know of the R Project, free and open source software for statistical computing and graphics. You may already know of the reliability of the Comprehensive R Archive Network or CRAN repository, favored by many over other potential sources of community-generated code because of its metadata and testing requirements; it now hosts over 9,300 packages (sorted by date and name).
You may not know of the new R course finder, an online directory you can search and filter to find the best online R course for your next step (note there are often free versions or segments of even the pay courses listed). There are YouTube videos for R learning, like twotorials (two-minute tutorials) and YaRrr! (because pirates) with book.
A very recent book is getting rave reviews from both statistics and programming viewpoints: The Book of R by Tilman Davies (preview it here). The author writes:
“The Book of R …represents the introduction to the language that I wish I’d had when I began exploring R, combined with the first-year fundamentals of statistics as a discipline, implemented in R…. Try not to be afraid of R. It will do exactly what you tell it to – nothing more, nothing less. When something doesn’t work as expected or an error occurs, this literal behavior works in your favor…. Especially in your early stages of learning…try to use R for everything, even for very simple tasks or calculations you might usually do elsewhere. This will force your mind to switch to ‘R mode’ more often, and it’ll get you comfortable with the environment quickly.”
We’ll soon host a guest blogpost on some exploratory coral symbiont data analyses, visualizations, and comments generated in R Markdown, which is RStudio’s method for preserving code and output in one running web document. The work is beautiful and useful, and highlights the use of an electronic notebook as a way to capture and share data exploration, analysis and visualization, and to tell a data story. (A major advance to that software was announced this week in the form of R Notebook, which will ship within the next couple of months.)
Why is it worth learning to love R more?
R helps make sure your data work is reproducible (such an issue for science), repeatable (valuable for any processing you have to do periodically), and reusable (on other datasets or data versions, or by colleagues or your future self).
A couple of high-level languages, like R and Python, are becoming more popular each year, and are finding their way as general purpose tools into analytical platforms. These will serve as primary sources of flexibility in cyberinfrastructure platforms now available or under development. Our future selves thank us for the learning investment.
I would suggest that they get a copy of the R for Data Science book written by Hadley Wickham and Garrett Grolemund…. Also, when you have questions or run into problems don’t give up. There’s a lot of great activity around R on stackoverflow and other places and there’s an excellent chance you’re going to find the answers to your questions if you look carefully for them.
Further Update: In January 2018, Kaggle released resources for Hands-On Data Science learning, including lessons for R in data setup, data visualization, and machine learning.