This lesson is still being designed and assembled (Pre-Alpha version)

A Biologist's Guide to Programming and Data Analysis with R and BASH

Description

This is the second lesson in the two-part workshop series Programming and Data Analysis with R and BASH, which is delivered as part of the NFCDS pedagogy fellowship program.

Given the increasing amount of data being generated today, programmatic data analysis is an important skill in a wide range of fields. The R programming language and the Unix/Linux command line are powerful tools for analyzing data. Better still, the wide variety of R and command line tools can be combined through scripting to create custom pipelines for analyzing large or complex data sets.

Outcomes

In this hands-on workshop, participants will be introduced to scripting and biological data analysis with R and Unix/Linux tools, with a focus on comparative techniques for creating modular R and BASH scripts. Participants will also explore how to create custom scripts that automate the analysis of large data sets.

Audience

This lesson is designed for anyone interested in learning how to combine R and BASH scripting to automate the analysis of biological data sets. For example, it suits biologists who want to learn how to create pipelines that use a combination of R scripts to analyze and visualize data. Another potential application of the skills learned in this workshop is creating a set of scripts that work together to integrate specific omics tools into a data analysis workflow.

Prerequisites

  1. Participants should be comfortable using a computer and applying basic statistical methods. Furthermore, participants should have:
    • limited experience with BASH
    • limited experience with R
  2. Each participant needs to have access to a computer with Unix/Linux or the Windows Ubuntu app, and the necessary bioinformatics software. To get started, follow the directions in the Setup tab.

  3. Since this is an introductory workshop, we will be available 30 minutes before the workshop to walk through installing the necessary software.

  4. Please complete the pre-workshop survey before getting started with the workshop.

Exit Ticket

Please complete the post-workshop survey at the end of the workshop, before you leave.

Schedule

Setup   Download files required for the lesson

00:00   1. Omics Data Collection & Preparation
        What is the experimental design for the data set we are using?
        How do I navigate the terminal?
        How should I prepare data for analysis?
        What are the most common file formats of transcriptomic and genomic data?
        What are some databases and software tools I can use to collect omics data?

01:00   2. DE Analysis with Exact Tests
        How do I need to prepare my data for analysis in R?
        What standard statistical methods do I need to analyze this data set?
        What packages are available to me for biostatistical data analysis in R?
        How do I perform basic differential gene expression analysis using R packages?

02:00   3. Break
        Take a break!

02:10   4. DE Analysis with Generalized Linear Models
        How do I need to prepare my data for analysis in R?
        What standard statistical methods do I need to analyze this data set?
        What packages are available to me for biostatistical data analysis in R?
        How do I perform advanced differential gene expression analysis using R packages?

03:30   5. Supplemental - Introduction to Omics Data Analysis
        What are bioinformatics and biostatistics?
        What are the similarities between bioinformatics and biostatistics?
        How can I combine R and BASH scripts to automate my data analysis workflow?
        What will we be covering in this workshop?
        How can I use R and BASH scripts to automate my data analysis process?

03:30   Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.