BASH Fundamentals
Overview
Teaching: 30 min
Exercises: 15 min
Questions
What is the BASH command language?
How do I write code in the BASH command language?
What are the components and features of BASH?
How can I write and run BASH code?
Objectives
Become familiar with the syntax and common functions of the BASH language.
Become comfortable with working in the terminal.
Extend knowledge of R to learn about complementary programs used in the Unix/Linux terminal.
Practice writing BASH code to perform basic operations.
Discover important similarities and differences between R and BASH programming.
The BASH Programming Language
The BASH command language (Bourne Again SHell) is an sh-compatible shell and command language. A shell is a program through which a user communicates with the operating system or a software program (application).
BASH is the default shell on most Linux operating system installations, and its wide distribution with Linux and Unix systems makes it an important tool to know.
The BASH language is used to communicate with the interpreter component of a computer system. The interpreter executes commands read from the standard input (e.g., typed into the terminal) or from a file. By convention, BASH script files end with the .sh extension, in contrast to the .R or .r extension of R scripts.
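As a rough sketch of the two input routes described above, the following runs a command once from a file and once from standard input. The file name hello.sh and the temporary directory are illustrative assumptions, not part of the lesson materials.

```bash
# make a temporary directory to hold a hypothetical script file
script_dir=$(mktemp -d)

# write a two-line script; the first line names the interpreter to use
printf '%s\n' '#!/usr/bin/env bash' 'echo "hello from a file"' > "$script_dir/hello.sh"

# route 1: the interpreter reads commands from a file
file_output=$(bash "$script_dir/hello.sh")
echo "$file_output"

# route 2: the interpreter reads commands from standard input
stdin_output=$(echo 'echo "hello from standard input"' | bash)
echo "$stdin_output"

# clean up the temporary directory
rm -r "$script_dir"
```

In both cases the same interpreter does the work; only the source of the commands differs.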
The Utility and Components of BASH
To many beginning programmers BASH can appear intimidating, which can make it difficult to get started with BASH programming. But there are just a few components of BASH that we need to know to understand how BASH integrates with the computer system.
The primary components of BASH include:
- shell - general name for any user space program with an interface (user interface) that allows access to resources (data) in the computer system
- terminal - allows a user to work with a shell interactively, using a keyboard to provide input and a display (monitor) to see the output on the screen
- command line (command prompt) - enables a human operator (user) to interface with a shell that is running in the terminal
BASH vs RStudio
So, we can see that there are some important similarities and differences between BASH and RStudio components. These include:
- terminal vs console - similar to the console in RStudio, the command line in the BASH terminal allows you to run code using a keyboard and display outputs using a screen
- R vs BASH scripts - both R and BASH scripts enable users to save code to run later and automate tasks, possibly in batches
Discussion
What are some other similarities and differences between BASH and RStudio?
Tip!
Notice that there is a Terminal tab in the RStudio window with the Console component. This allows you to run BASH commands in the RStudio interface, which also provides a convenient location to write and edit BASH script files.
BASH Programming Language Syntax
Remember that the syntax of a programming language defines the meaning of specific combinations of words and symbols. This is why we call programming coding. Each programming language uses different combinations of words and symbols to get the computer to follow the instructions specified in your code.
Variables & Data Types
Similar to R, in the BASH programming language a combination of letters and symbols is used to give names (variables) to the data you are actively using in the memory of your computer system. However, in contrast to many programming languages, you do not have to declare (set) the data type of variables. That is, BASH variables are untyped and, in essence, character strings.
We use the = operator in BASH to initialize a variable and assign it a value. Again, this means that the variable is a name tag that points to a specific piece of data in the memory of the computer system. This is in contrast to the <- assignment operator that we typically use to assign value to variables in the R programming language.
# here is an integer value
8
# here is a variable with an assigned value of 8
my_value <- 8
my_value=8
Discussion
What happens when you enter the following BASH code in the command line?
8
And what happens when you enter this piece of BASH code in the command line?
my_value = 8
Finally, what happens when you enter this piece of BASH code in the command line?
# this is a comment
Checklist
Note that there are some features common to how we format and initialize variables in BASH:
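- variable names are case sensitive; upper case names are conventionally used for environment variables, while lower case names work well for variables in your own code
- do not put spaces on either side of the = operator when initializing a variable
- variable names can have letters, numbers, or underscores, but cannot begin with a number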
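A minimal sketch of these formatting rules in practice. All of the variable names below are made up for illustration, and the $ prefix used in the last line is covered in the text that follows.

```bash
# no spaces around the = operator
my_value=8

# quotes keep a value that contains spaces together as one string
my_text="cool cool cool"

# letters, numbers, and underscores are all allowed after the first character
my_value_2=10

# names are case sensitive, so this is a separate variable from my_value
MY_VALUE=99

echo "$my_value $my_text $my_value_2 $MY_VALUE"
```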
In the R language the = operator can also be used to assign a value to a variable at the top level of your code, although the <- operator is typically preferred. Inside an R function call, = instead matches a value to a named argument. In BASH, = is the only assignment operator.
To use a variable in R we simply need to call it by its name. However, for BASH variables we need to prepend the $ operator to the name of the variable that we have initialized to refer to it and similarly “call it by its name”. Before BASH runs each line of code entered in the command line or shell script, it first checks to see if any variable names are present by looking for the $ operator and substitutes the stored value in their place.
# assign a variable a value of 1
my_value <- 1
# print the value stored in the previous variable
print(my_value)
my_value=1; echo $my_value
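To make the role of the $ operator more concrete, here is a small sketch that prints the same name with and without it. The curly brace form is a standard BASH feature not introduced above; it marks where the variable name ends when other text follows it directly.

```bash
my_value=1

# without $, echo prints the literal text my_value
echo my_value

# with $, BASH substitutes the stored value before running echo
echo $my_value

# curly braces mark where the name ends when text follows it directly
echo "${my_value}st place"
```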
Discussion
What happens if you enter the following BASH code into the terminal’s command line?
my_value=1 echo $my_value
Now, what happens when you open a new terminal or tab (environment) and enter the following code?
echo $my_value
Since variables in BASH are essentially character strings, how can we perform mathematical operations? Well, we can use functions in BASH to give context and perform arithmetic operations and comparisons on variables.
The let function in BASH allows you to perform integer arithmetic operations using operator symbols that include:
- addition +
- subtraction -
- multiplication *
- division / (integer division, so any fractional part of the result is dropped)
- modulus (remainder) %
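The operators listed above can be sketched with let as follows. The variable names are illustrative only. Note that let works with whole numbers, so the division result has no fractional part.

```bash
first=17; second=5

# inside let, variable names can be used with or without the $ prefix
let "sum = first + second"
let "difference = first - second"
let "quotient = first / second"    # integer division: the fractional part is dropped
let "remainder = first % second"

echo "$sum $difference $quotient $remainder"
```

With these inputs the four results are 22, 12, 3, and 2.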
Recall that in the R programming language we have access to the following arithmetic operators:
- addition +
- subtraction -
- multiplication *
- division /
- exponentiation ^ or **
As a first step to learning how to perform arithmetic in BASH, we should check out the documentation for the let function.
To find the documentation for functions in BASH we can search the internet for that function’s manual. So, to find the let documentation we will search “let manual bash”. The top search result has a description of the syntax and purpose of the let function.
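In addition to searching the internet, BASH carries its own short documentation for commands that are built into the shell, such as let. The sketch below uses the type and help builtins for this; how much text help prints may vary between BASH versions.

```bash
# check what kind of command let is (builtin, file, function, ...)
let_kind=$(type -t let)
echo "$let_kind"

# print the first few lines of the built-in documentation for let
help let | head -n 5
```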
Now let’s try an example comparison of arithmetic operations in R and BASH.
# addition using two different variables
my_value_1 <- 5
my_value_2 <- 10
my_result <- my_value_1 + my_value_2
my_value1=5; my_value2=10; let "my_result=$my_value1 + $my_value2"; echo $my_result
Discussion
You may have noticed a few interesting details about the formatting of the BASH code above, particularly in contrast to the R code. Let’s discuss some of those differences.
A couple of motivating questions:
- Why do we prepend the $ operator to the my_result variable only in the echo function?
- Why do we include the ; symbol at the end of each line (piece) of code?
BASH Commands - Printing & Arrays
Again, one of the most fundamental functions in any programming language is one that allows you to print data to the screen. The most common command to print outputs in BASH is named echo.
Tip!
Note that we call the BASH functions that we enter into the command line commands. Functions more specifically refer to the code definition that underlies the command being used to call the function.
After searching the internet for “echo manual bash”, we can see that this function has the following syntax:
echo [options]... [String]...
Checklist
We now see that the syntax for calling (running) a function in BASH has the following features:
- function name
- white space
- options (arguments)
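As a small illustration of this pattern, the sketch below calls echo with and without options. The -n and -e options are supported by the echo built into BASH; other shells may treat them differently.

```bash
# function name, white space, then the string to print
echo "cool cool cool"

# -n is an option that suppresses the trailing newline
echo -n "no newline here"
echo " ...so this continues on the same line"

# -e is an option that turns on backslash escapes such as \t (tab)
echo -e "left\tright"
```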
Now let’s take a look at these methods for printing data to the screen in both R and BASH in action.
# print a character object to the screen
print("cool cool cool")
echo "cool cool cool"
Recall that in the R programming language vectors, matrices, and data frames are the named storage structures that contain 1D and 2D collections of data. In BASH we can use arrays to create similar 1D collections of data. There are two types of arrays in the BASH language:
- indexed arrays - are ordered lists of items in which the keys (indexes) are integer numbers
- associative arrays or hash tables - are arrays in which the keys (indexes) are represented by arbitrary strings, rather than integers
Tip!
As of macOS Catalina, Apple has adopted Z shell (zsh) as the default shell, replacing BASH. There are a few differences between BASH and Z shell, many of which are centered on the user interface.
But there is an important difference with array indexing between BASH and Z shell. In the BASH language arrays start at the integer 0, whereas in Z shell array indexing begins with the integer 1.
Similar to R, we can easily create indexed 1D arrays using shorthand, without using an explicit function call (command).
# variable with an assigned value of a 1D vector object
my_vector <- 5:10
# view the data contents of my_vector using the print function
print(my_vector)
# short hand way to view the data contents of my_vector
my_vector
# access the second element of the vector stored in my_vector
my_vector[2]
my_array=(5 4 3 2 1); echo "${my_array[@]}"; echo "${my_array[1]}"
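A few more ways of working with an indexed array are sketched below, using the same example values. The helper variable names are illustrative. Curly braces are needed whenever an index or the special [@] and [*] subscripts are used.

```bash
my_array=(5 4 3 2 1)

# indexing starts at 0 in BASH, so this is the first element
first_element=${my_array[0]}

# [*] inside quotes joins every element into one string separated by spaces
all_elements="${my_array[*]}"

# the # prefix counts the elements
element_count=${#my_array[@]}

# += with parentheses appends a new element to the end
my_array+=(0)

echo "$first_element"
echo "$all_elements"
echo "$element_count"
echo "${my_array[@]}"
```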
Discussion
Why do we need to use the echo BASH command to print the contents of a variable to the screen?
The simple shorthand form of creating arrays in BASH is very convenient. More powerfully, we can use the declare command to create both indexed and associative arrays in BASH.
First, we will create an indexed array in both the R and BASH languages:
# character vector with two elements
my_list <- c("first", "second")
# view the second element of the vector
print(my_list[2])
declare -a my_indexed_array; my_indexed_array[1]="first"; my_indexed_array[2]="second"; echo "${my_indexed_array[2]}"
Now, to create an associative array in R and BASH:
# named list of values
my_list <- list(cat = "Meow", dog = "Woof")
# view the element of the list named dog
print(my_list$dog)
declare -A my_assoc_array; my_assoc_array[cat]="Meow"; my_assoc_array[dog]="Woof"; echo "${my_assoc_array[dog]}"
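The following sketch shows a few more operations on an associative array. The array name my_sounds is made up for illustration. Associative arrays require BASH version 4.0 or newer, so this will not work in the older BASH 3.2 that ships with macOS.

```bash
declare -A my_sounds
my_sounds[cat]="Meow"
my_sounds[dog]="Woof"

# look up a value by its key
dog_sound=${my_sounds[dog]}
echo "$dog_sound"

# count the number of key and value pairs
entry_count=${#my_sounds[@]}
echo "$entry_count"

# ${!name[@]} lists the keys; their order is not guaranteed
for animal in "${!my_sounds[@]}"; do
  echo "$animal says ${my_sounds[$animal]}"
done
```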
Discussion
What happens when you enter the following BASH code into the command line?
declare -A my_assoc_array; my_assoc_array[cat]="Meow"; my_assoc_array[dog]="Woof"; echo "${my_assoc_array[2]}"
And what happens when you enter the following R code in the RStudio console?
# named list of values
my_list <- list(cat = "Meow", dog = "Woof")
# view the contents of the list variable
print(my_list[2])
Advanced Coding Challenge
Note that it is not possible to create true multi-dimensional arrays, such as 2D arrays, in the BASH language. But it is possible to simulate a multi-dimensional collection of data using associative arrays, for example.
Try creating your own 2D array in the BASH command language!
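If you get stuck, here is one possible approach, offered as a sketch rather than the only solution. It assumes BASH 4.0 or newer and uses an associative array whose keys are strings of the form "row,column". The names grid, rows, cols, and row_text are illustrative.

```bash
# an associative array stands in for a 2D grid
declare -A grid
rows=2
cols=3

# fill each cell, using "row,column" as the key
for ((r = 0; r < rows; r++)); do
  for ((c = 0; c < cols; c++)); do
    grid["$r,$c"]=$((r * cols + c))
  done
done

# look up the cell in row 1, column 2
echo "${grid[1,2]}"

# print the grid with one row per line
for ((r = 0; r < rows; r++)); do
  row_text=""
  for ((c = 0; c < cols; c++)); do
    row_text+="${grid[$r,$c]} "
  done
  echo "$row_text"
done
```

BASH treats the key 1,2 as a plain string here, so the "2D" structure is a naming convention rather than a feature of the language.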
So, we can use functions and evaluate mathematical expressions in BASH like we have done using the R programming language in RStudio. But our experience coding while using the BASH terminal and command line so far has not been nearly as easy and streamlined as when using RStudio. For example, we have to write code in the restrictive and clunky terminal user interface.
Key Points
BASH and R share a lot of the same basic functionalities.
Use the help command or the --help flag to examine the description of many BASH commands.
Search the internet for further information about BASH commands.
Copy and paste!