This lesson is still being designed and assembled (Pre-Alpha version)

BASH Fundamentals

Overview

Teaching: 30 min
Exercises: 15 min
Questions
  • What is the BASH command language?

  • How do I write code in the BASH command language?

  • What are the components and features of BASH?

  • How can I write and run BASH code?

Objectives
  • Become familiar with the syntax and common functions of the BASH language.

  • Become comfortable with working in the terminal.

  • Extend knowledge of R to learn about complementary programs used in the Unix/Linux terminal.

  • Practice writing BASH code to perform basic operations.

  • Discover important similarities and differences between R and BASH programming.

The BASH Programming Language

The BASH command language (Bourne Again SHell) is an sh-compatible shell and programming language. A shell is a program through which a user communicates with the operating system or a software program (application), and BASH provides both that interface and a language for scripting it.

BASH is the default shell on most Linux operating system installations, and its wide distribution with Linux and Unix systems makes it an important tool to know.

BASH Programming Language Image source

BASH is itself a command interpreter: it executes commands read from the standard input (e.g., typed into a terminal) or from a file. By convention, BASH script files end with the .sh extension, in contrast to the .R or .r extension of R scripts.
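
As a minimal sketch of what such a script file might contain, assuming a hypothetical file name hello.sh that is not part of this lesson's materials:

```bash
#!/usr/bin/env bash
# hello.sh (hypothetical file name, used only for illustration)

# store a message in a variable, then print it to the screen
greeting="Hello from a BASH script"
echo "$greeting"
```

If these lines were saved to hello.sh, the script could be run from the terminal with bash hello.sh.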

The Utility and Components of BASH

To many beginning programmers BASH can appear intimidating, which can make it difficult to get started. But there are just a few components of BASH that we need to know to understand how it integrates with the computer system.

Components of the Computer with Shell Image source

The primary components of BASH include:

BASH vs RStudio

So, we can see that there are some important similarities and differences between BASH and RStudio components. These include:

Discussion

What are some other similarities and differences between BASH and RStudio?

Tip!

Notice that there is a Terminal tab in the RStudio window next to the Console tab. This allows you to run BASH commands in the RStudio interface, which also provides a convenient location to write and edit BASH script files.

Terminal in RStudio Image source

BASH Programming Language Syntax

Remember that the syntax of a programming language defines the meaning of specific combinations of words and symbols. This is why we call programming coding. Each programming language uses different combinations of words and symbols to get the computer to follow the instructions specified in your code.

Variables & Data Types

Similar to R, in the BASH programming language a combination of letters and symbols is used to give names (variables) to the data you are actively using in the memory of your computer system. However, in contrast to many programming languages, you do not have to declare (set) the data type of variables. That is, BASH variables are untyped and, in essence, character strings.

We use the = operator in BASH to initialize a variable and assign it a value. Again, this means that the variable is a name tag that points to a specific piece of data in the memory of the computer system. This is in contrast to the <- assignment operator that we typically use to assign value to variables in the R programming language.

# R: here is an integer value
8

# R: here is a variable with an assigned value of 8
my_value <- 8

# BASH: here is a variable with an assigned value of 8
my_value=8

Discussion

What happens when you enter the following BASH code in the command line?

8

And what happens when you enter this piece of BASH code in the command line?

my_value = 8

Finally, what happens when you enter this piece of BASH code in the command line?

# this is a comment

Checklist

Note that there are some features common to how we format and initialize variables in BASH:

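  • variable names are case sensitive; by convention, upper case is used for environment variables and lower case for variables in your own code
  • do not put spaces on either side of the = operator
  • variable names can contain letters, numbers, or underscores, and cannot begin with a number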
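
The points above might be illustrated with the following sketch. The variable names are made up for illustration, and the $ prefix used to print each value is explained later in this lesson.

```bash
# valid: no spaces on either side of the = operator
my_value=8
sample_2_name="control"

# print both values to the screen
echo "$my_value"
echo "$sample_2_name"

# invalid: with spaces, BASH reads my_value as the name of a command to run,
# so the following line would fail if it were uncommented
# my_value = 8
```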

In the R language the = operator can also be used to assign a value to a variable, but <- is the conventional choice and = is mostly used to pass named arguments in function calls. BASH, in contrast, has only the = operator for assignment.

Another difference is how we refer to a variable once it exists. To use a variable in R we simply call it by its name. However, for BASH variables we need to prepend the $ operator to the name of the variable that we have initialized to refer to it and similarly “call it by its name”. Before BASH interprets (runs) each line of code entered in the command line or shell script, it first checks to see if any variable names are present by looking for the $ operator, and substitutes the stored value for each one.

# R: assign a variable a value of 1
my_value <- 1

# R: print the value stored in the previous variable
print(my_value)

# BASH: assign a variable a value of 1 and print the stored value
my_value=1; echo $my_value
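
To make the role of the $ operator more concrete, here is a small sketch. The variable name label is made up for illustration.

```bash
my_value=1

# without the $ operator, echo prints the literal text my_value
echo my_value

# with the $ operator, BASH substitutes the stored value
echo "$my_value"

# curly braces mark where the variable name ends, so text can follow it directly
label="${my_value}st place"
echo "$label"
```

The last line prints 1st place. Without the curly braces, BASH would look for a variable named my_valuest instead.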

Discussion

What happens if you enter the following BASH code into the terminal’s command line?

my_value=1
echo $my_value

Now, what happens when you open a new terminal or tab (environment) and enter the following code?

echo $my_value

Since variables in BASH are essentially character strings, how can we perform mathematical operations? We can use BASH commands that treat the values of variables as numbers in order to perform arithmetic operations and comparisons.

The let function in BASH allows you to perform arithmetic operations using the following operator symbols:

Recall that in the R programming language we have access to the following arithmetic operators:

As a first step to learning how to perform arithmetic in BASH, we should check out the documentation for the let function.

To find the documentation for functions in BASH we can search the internet for that function’s manual. So, to find the let documentation we will search “let manual bash”. The top search result has a description of the syntax and purpose of the let function.
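
As an alternative to searching the internet, BASH also includes a help command that prints documentation for its built-in commands. A short sketch, assuming it is run in a BASH session:

```bash
# report how BASH interprets the name let
type let

# print the built-in documentation for the let command
help let
```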

Now let’s try an example comparison of arithmetic operations in R and BASH.

# R: addition using two different variables
my_value_1 <- 5
my_value_2 <- 10
my_result <- my_value_1 + my_value_2

# BASH: addition using two different variables
my_value1=5; my_value2=10; let "my_result=$my_value1 + $my_value2"; echo $my_result
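
The sketch below applies let to a few other arithmetic operators. The variable names are made up for illustration. Inside a let expression, variable names may be written with or without the $ operator. The power line uses arithmetic expansion, written $(( )), which is a different BASH feature from let and is shown here only for comparison.

```bash
a=7
b=2

let "sum = a + b"          # addition
let "difference = a - b"   # subtraction
let "product = a * b"      # multiplication
let "quotient = a / b"     # integer division: the remainder is discarded
let "remainder = a % b"    # modulo: the remainder after division

# arithmetic expansion, an alternative to let
power=$(( a ** b ))

# prints: 9 5 14 3 1 49
echo "$sum $difference $product $quotient $remainder $power"
```

Note that BASH arithmetic works on integers only, so 7 / 2 gives 3 rather than 3.5 as it would in R.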

Discussion

You may have noticed a few interesting details about the formatting of the BASH code above, particularly in contrast to the R code. Let’s discuss some of those differences.

A couple of motivating questions:

  1. Why do we prepend the $ operator to the my_result variable only in the echo function?
  2. Why do we include the ; symbol at the end of each line (piece) of code?

BASH Commands - Printing & Arrays

Again, one of the most fundamental functions in any programming language is one that allows you to print data to the screen. The most common command to print outputs in BASH is named echo.

Tip!

Note that we call the BASH functions that we enter into the command line commands. Functions more specifically refer to the code definition that underlies the command being used to call the function.

After searching the internet for “echo manual bash”, we can see that this function has the following syntax:

echo [options]... [String]...

Checklist

We now see that the syntax for calling (running) a function in BASH has the following features:

  • function name
  • white space
  • options (arguments)
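
Here is a sketch of this syntax with two options. The -n and -e options shown are those of the echo command built into BASH; other shells may treat them differently.

```bash
# no options: print the string followed by a newline
echo "cool cool cool"

# -n option: do not print the trailing newline,
# so the output of the next echo continues on the same line
echo -n "cool "
echo "cool cool"

# -e option: interpret backslash escapes such as \t (tab)
echo -e "cool\tcool\tcool"
```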

Now let’s take a look at these methods for printing data to the screen in both R and BASH in action.

# R: print a character object to the screen
print("cool cool cool")

# BASH: print a character string to the screen
echo "cool cool cool"

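Recall that in the R programming language vectors, matrices, and data frames are the named storage objects that contain 1D and 2D collections of data. In BASH we can use arrays to create similar 1D collections of data. There are two types of arrays in the BASH language: indexed arrays, whose elements are referenced by an integer position, and associative arrays, whose elements are referenced by a character string key.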

Tip!

As of macOS Catalina, Apple has adopted Z shell (zsh) as the default shell in place of BASH. There are a few differences between BASH and Z shell, many of which are centered on the user interface.

But there is an important difference with array indexing between BASH and Z shell. In the BASH language arrays start at the integer 0, whereas in Z shell array indexing begins with the integer 1.

BASH vs Z Shell Image source

Similar to R, we can easily create indexed 1D arrays using shorthand, without using an explicit function call (command).

# R: variable with an assigned value of a 1D vector object
my_vector <- 5:10

# R: view the data contents of my_vector using the print function
print(my_vector)

# R: shorthand way to view the data contents of my_vector
my_vector

# R: access the second element of the vector stored in my_vector
my_vector[2]

# BASH: create an indexed array and print all of its elements
# (echo $my_array on its own prints only the first element)
my_array=(5 4 3 2 1); echo "${my_array[@]}"
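
A few more operations on indexed arrays, as a sketch. It assumes BASH rather than Z shell, so indexing starts at 0.

```bash
my_array=(5 4 3 2 1)

# indexing starts at 0 in BASH, so index 0 holds the first element
echo "${my_array[0]}"

# index 1 holds the second element
echo "${my_array[1]}"

# the number of elements in the array
echo "${#my_array[@]}"

# append a new element, then print every element
my_array+=(0)
echo "${my_array[@]}"
```

The four echo commands print 5, 4, 5, and 5 4 3 2 1 0, in that order.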

Discussion

Why do we need to use the echo BASH command to print the contents of a variable to the screen?

The simple shorthand form of creating arrays in BASH is very convenient. More powerfully, we can use the declare command to create both indexed and associative arrays in BASH.

First, we will create an indexed array in both the R and BASH languages:

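# R: character vector with two elements
my_list <- c("first", "second")

# R: view the second element of the vector
print(my_list[2])

# BASH: declare an indexed array, assign two elements, and print the one at index 2
# (the curly braces are required; without them BASH expands $my_indexed_array and then prints the literal text [2])
declare -a my_indexed_array; my_indexed_array[1]="first"; my_indexed_array[2]="second"; echo "${my_indexed_array[2]}"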

Now, to create an associative array in R and BASH:

# R: named list of values
my_list <- list(cat = "Meow", dog = "Woof")

# R: view the element of the list named dog
print(my_list$dog)

# BASH: declare an associative array, assign two keys, and print the value stored under the key dog
declare -A my_assoc_array; my_assoc_array[cat]="Meow"; my_assoc_array[dog]="Woof"; echo "${my_assoc_array[dog]}"
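
Associative arrays can also report their keys, values, and size. A minimal sketch, assuming BASH version 4 or later (older versions do not support declare -A); the array name animal_sounds is made up for illustration.

```bash
declare -A animal_sounds
animal_sounds[cat]="Meow"
animal_sounds[dog]="Woof"

# the number of key-value pairs stored in the array
echo "${#animal_sounds[@]}"

# the ! prefix expands to the keys; their order is not guaranteed
echo "${!animal_sounds[@]}"

# without the ! prefix, the same expansion gives the values
echo "${animal_sounds[@]}"
```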

Discussion

What happens when you enter the following BASH code into the command line?

declare -A my_assoc_array; my_assoc_array[cat]="Meow"; my_assoc_array[dog]="Woof"; echo "${my_assoc_array[2]}"

And what happens when you enter the following R code in the RStudio console?

# named list of values
my_list <- list(cat = "Meow", dog = "Woof")

# view the second element of the list
print(my_list[2])

Advanced Coding Challenge

Note that BASH does not support multi-dimensional arrays, such as 2D arrays. But it is possible to simulate a multi-dimensional collection of data, for example by using associative arrays.

Simulating 2D Arrays in BASH Image source

Try creating your own 2D array in the BASH command language!
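
One possible approach, sketched under the assumption of BASH version 4 or later, is to build each key of an associative array from a row number and a column number separated by a comma. The array name grid and its contents are made up for illustration.

```bash
declare -A grid

# each key is the text "row,column"
grid[0,0]="a"; grid[0,1]="b"
grid[1,0]="c"; grid[1,1]="d"

# look up a single cell using variables for the row and column
row=1; col=0
echo "${grid[$row,$col]}"

# print the collection one row per line
echo "${grid[0,0]} ${grid[0,1]}"
echo "${grid[1,0]} ${grid[1,1]}"
```

The first echo prints c, and the last two print a b and c d. BASH treats 1,0 as an ordinary character string key, so this only imitates a 2D structure.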

So, we can use functions and evaluate mathematical expressions in BASH like we have done using the R programming language in RStudio. But our experience coding in the BASH terminal and command line so far has not been nearly as easy and streamlined as when using RStudio. For example, we have to write code in the restrictive and clunky terminal user interface.

Key Points

  • BASH and R share a lot of the same basic functionalities.

  • Use the -h flag to examine the description of some BASH commands.

  • Search the internet for further information about BASH commands.

  • Copy and paste!