1 Readings

Practical Computing for Biologists: Chapters 1, 4, 5, Appendix 3.

Unix Basics from UConn CBC: http://bioinformatics.uconn.edu/unix-basics/

Software Carpentry Shell Novice lesson: Episodes 1-4: https://swcarpentry.github.io/shell-novice/

2 Lesson overview

Command   Function
pwd       Print Working Directory
cd        Change Directory (cd, cd -, cd ~/<directory>, cd ..)
ls        List files (ls -F, ls --help)
man       Manual page for a command
mkdir     Make a new directory
emacs     A rudimentary command-line text editor
rm        Delete file(s); use rm -r <directory> to delete directory and contents
touch     Create empty file
cp        Copy files and directories
mv        Move or rename files and directories
wc        Word Count (wc -l)
cat       Print an entire file, or concatenate multiple files
less      Read a file, one page at a time
sort      Sort lines (sort -n, sort -r)
head      Read beginning lines of a file (head -n #)
tail      Read last few lines of a file (tail -n #)
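
As a small illustration of the sort options listed above, the sketch below sorts three numbers stored in a throwaway file. The file name numbers.txt and its contents are made up for this example, and everything happens in a temporary directory.

```shell
# Create a temporary directory so nothing in your own folders is touched
demo=$(mktemp -d)

# Hypothetical example data: one number per line
printf '10\n9\n100\n' > "$demo/numbers.txt"

# Default sort compares lines as text, character by character
sort "$demo/numbers.txt"

# -n sorts by numeric value instead
sort -n "$demo/numbers.txt"

# -r reverses the order; combined with -n the largest number comes first
sort -n -r "$demo/numbers.txt"
```

Comparing the first and second outputs shows why -n matters whenever a column holds numbers rather than words.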

2.1 Concepts:

  • Terminology and basic commands
  • Directory structure
  • Navigating through command line: relative vs absolute paths
  • Examining file contents
  • Using the * wildcard to select multiple files in a directory, using the [] wildcard to select one or more letters, e.g. *[AB].txt for file names ending in A or B.
  • The Standard Output (stdout) of one command can be passed to the Standard Input (stdin) of the next command using pipes (|).
  • Redirect stdout to files (>).
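
The wildcard, pipe, and redirect concepts above can be sketched together as follows. The file names (sampleA.txt, notes.md, AB_list.log, and so on) are invented for illustration, and the commands run in a temporary directory.

```shell
# Work in a temporary directory with a few empty example files
demo=$(mktemp -d)
cd "$demo"
touch sampleA.txt sampleB.txt sampleC.txt notes.md

# * matches any run of characters: every file ending in .txt
ls *.txt

# [AB] matches exactly one character, either A or B
ls *[AB].txt

# A pipe (|) sends the stdout of ls to the stdin of wc, which counts the lines
ls *.txt | wc -l

# > redirects stdout into a file instead of the screen
ls *[AB].txt > AB_list.log
cat AB_list.log

# Return to the directory you started in
cd - > /dev/null
```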

2.2 Why learn the command-line?

  • The majority of bioinformatics and computational software is developed for command-line use from the shell
  • Low system resource use (processor and memory)
  • Easier to automate and more adaptable than GUIs
  • Web-based tools (e.g. Galaxy, GenomeSpace) are at risk of becoming obsolete as more scientists develop command-line competence

Notes about reading these documents:

Sections highlighted in grey are shell input, or “standard input” (stdin).
Lines that follow them and are prefixed by '##' denote shell output, or “standard output” (stdout):

x='Welcome to MEDS 5420'
echo $x
## Welcome to MEDS 5420

3 Interacting with your computer: the terminal

You will communicate with the operating system (OS) by typing commands into the terminal window.

4 Bash, Shell, Terminal, Command Line: What’s the difference?

Command Line is the most general term and refers to typing commands directly into a terminal for the computer to execute.
Shell (sh) is a specific program (and language, written by Steve Bourne while at Bell Labs) that processes commands and returns output.
Bash stands for Bourne Again Shell and is an updated version of the Shell language. It is the most popular shell.
Terminal is a user interface that takes input and provides output in text format; the interface passes the input to Shell or Bash, which processes the command.

6 Dealing with files and text:

We’re going to start using more system utilities, also called command-line utilities. The general format is:

command [options] target_file(s)
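
As a minimal sketch of this pattern, the same command can be run alone, with an option, and with an option plus a target. The directory project and the file data.txt are hypothetical names created only for this example.

```shell
# Set up a temporary directory containing one folder and one file
demo=$(mktemp -d)
cd "$demo"
mkdir project
touch project/data.txt

# command alone: list the current directory
ls

# command + option: -F marks directories with a trailing /
ls -F

# command + option + target: list the contents of the project directory
ls -F project

# Return to the directory you started in
cd - > /dev/null
```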

First we will make a MEDS5420 folder in our home directory:

cd ~
mkdir MEDS5420

Go to GitHub /guertinlab/meds5420/Lecture2_command_line/ and download lec02_files.zip. If your browser automatically unzips compressed files, you need to change this preference (in Safari: Preferences > General > uncheck open “safe” files after downloading).

Use the Terminal window to list the contents of the Downloads folder to confirm the download.

Let’s move the downloaded file to the ‘MEDS5420’ folder you created:

mv ~/Downloads/lec02_files.zip ~/MEDS5420/

If you are using Ubuntu in Windows, you can access your Windows C drive from the Ubuntu Terminal at the path /mnt/c/. Downloaded files are usually located in /mnt/c/Users/<username>/Downloads. The following command will move the file to your MEDS5420 directory:

mv /mnt/c/Users/<username>/Downloads/lec02_files.zip ~/MEDS5420

Move (mv) can also be used to rename files:

mv <old_name> <new_name>
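
The difference between renaming and moving can be sketched with throwaway files. The names draft.txt, final.txt, and archive are made up for this example.

```shell
# Work in a temporary directory with one file and one folder
demo=$(mktemp -d)
cd "$demo"
touch draft.txt
mkdir archive

# New name in the same directory: this renames the file
mv draft.txt final.txt

# Existing directory as the last argument: this moves the file into it
mv final.txt archive/

# The file now lives in archive/ under its new name
ls archive

# Return to the directory you started in
cd - > /dev/null
```

In both cases the original name no longer exists afterwards; mv does not leave a copy behind.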

Now switch (navigate) to the MEDS5420 folder.

cd ~/MEDS5420/

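To work with the zip file, use unzip. Start with the ‘-v’ option:

unzip -v lec02_files.zip

The format is:

unzip [options] <file.zip> [-d <target_directory>]

Check the contents of the folder to see the results. Note that in the widely used Info-ZIP version of unzip, ‘-v’ only lists the contents of the archive in verbose form and does not extract anything.
What happens if you run this without the ‘-v’ option and without specifying the target directory?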

*Note on unzip usage: Depending on your OS, the ‘-d’ option may be needed in order to unzip the contents into a specific folder. In this case you also need to give the name of the output directory into which the files will be unpacked. Example:

unzip -d lec02_files lec02_files.zip

Viewing file content

Data from HTS experiments is generally in the form of large text files. These files can crash your computer if you try to open them with standard GUI programs (gedit, TextEdit or Word). There are lots of ways to get around this.

To view the beginning of a file:

head Wonderful_world.txt
## What A Wonderful World
## 
## By Bob Thiele, George David White
## 
## I see trees of green
## Red roses too
## I see them bloom
## For me and you
## And I think to myself
## What a wonderful world

To view the end of a file:

tail -n 3 Wonderful_world.txt
## 
## Yes, I think to myself
## What a wonderful world
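
Because head and tail both read from stdin, they can be chained with a pipe to pull out a block of lines from the middle of a file. The sketch below uses a made-up file, numbers.txt, holding the numbers 1 to 10.

```shell
# Example file in a temporary directory: the numbers 1 to 10, one per line
demo=$(mktemp -d)
seq 1 10 > "$demo/numbers.txt"

# First three lines
head -n 3 "$demo/numbers.txt"

# Last three lines
tail -n 3 "$demo/numbers.txt"

# Lines 4 to 6: take the first six lines, then keep the last three of those
head -n 6 "$demo/numbers.txt" | tail -n 3
```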

You can incrementally load parts of a file with less:

less the_raven.txt

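When using less you can navigate with the following commands (see Appendix 3 for more):

  • space: move forward one page
  • b: move back one page
  • /pattern: search forward for pattern
  • q: quit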

Print entire contents to screen:

cat the_raven.txt

* If you accidentally print a large file to the screen, press control-c to cancel it.

Getting information about files

How many lines, words, and characters (bytes) does my file have?

wc the_raven.txt
##      127    1073    6906 the_raven.txt

Just count the number of lines:

wc -l the_raven.txt
##      127 the_raven.txt

Have a look at the manual for wc to see other output options.
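
As a quick sketch of the most common options, the example below counts lines, words, and bytes separately in a small made-up file (short.txt). The exact spacing of the output differs between systems, so only the counts are noted in the comments.

```shell
# Two-line example file in a temporary directory
demo=$(mktemp -d)
printf 'one two\nthree\n' > "$demo/short.txt"

wc -l "$demo/short.txt"   # lines: 2
wc -w "$demo/short.txt"   # words: 3
wc -c "$demo/short.txt"   # bytes: 14

# Reading from stdin (<) prints the count without the file name
wc -l < "$demo/short.txt"
```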

Basic file manipulation

Use touch to create an empty file:

touch empty_file.txt

Anything that is printed to the screen can be saved in a file using the redirection operator (>). If the file already exists, > overwrites it:

cat the_raven.txt > raven_copy.txt

Screen output can also be appended to the end of an existing file with >>:

cat the_raven.txt >> empty_file.txt

Multiple files can be pooled in this way:

cat the_raven.txt Wonderful_world.txt > pool.txt
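
The practical difference between the two redirection operators is easiest to see on a tiny file. This is a minimal sketch using a hypothetical file, log.txt, and assumes the shell's default settings.

```shell
# Start with a temporary directory
demo=$(mktemp -d)

# > creates log.txt (or replaces it if it already exists)
echo "first" > "$demo/log.txt"

# >> adds to the end of the existing file
echo "second" >> "$demo/log.txt"
cat "$demo/log.txt"

# > again: the previous two lines are replaced
echo "third" > "$demo/log.txt"
cat "$demo/log.txt"
```

The last cat shows only the final line, which is why >> is the safer choice when adding to a file you want to keep.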

6.1 Exercise 2: Copy with Multiple Filenames

What does cp do when given several filenames and a directory name, as in:

mkdir backup
cp the_raven.txt Thoreau_quotes.txt backup

What does cp do when given three or more filenames, as in:

cp the_raven.txt Thoreau_quotes.txt animal.txt

7 Next Time

7.1 TO DO: Get a scripting text editor

MAC USERS:
Download TextWrangler here: http://download.cnet.com/TextWrangler/3000-2351_4-10220012.html and then install it.

OR

BBEdit: https://www.barebones.com/products/bbedit/

PC USERS: Download Notepad++ here: https://notepad-plus-plus.org/
You can also use Sublime or Visual Studio.


Note: You can also use emacs or other command line editors such as nano or vim. We will be using emacs when we work on the server soon.

8 Answers to Exercises

8.1 Answers to Exercise 1

  1. No: . stands for the current directory
  2. No: / stands for the root directory
  3. Yes: Dr. McClintock’s home directory is /home/mcclintock
  4. No: this goes up two levels, i.e. ends in /home
  5. Yes: ~ stands for the home directory, /home/mcclintock
  6. No: this would navigate into a directory home
  7. Yes: unnecessarily complicated, but correct
  8. Yes: shortcut to go back to the home directory
  9. Yes: goes up one level

8.2 Answers to Exercise 2

In the first instance, cp will make a copy of each of the files, the_raven.txt and Thoreau_quotes.txt, in the directory backup/.

In the second instance, cp gives an error when we provide three files as arguments. To understand the error, see the output of cp --help or man cp. The usage line towards the top indicates that the last argument must be a directory when we provide more than two arguments.
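
Both cases can be reproduced safely with empty placeholder files. The names a.txt, b.txt, and c.txt are made up for this sketch, and the exact wording of the error message depends on your system.

```shell
# Work in a temporary directory with three empty files and one folder
demo=$(mktemp -d)
cd "$demo"
touch a.txt b.txt c.txt
mkdir backup

# Last argument is a directory: both files are copied into it
cp a.txt b.txt backup
ls backup

# Last argument is a regular file: cp refuses and reports an error
cp a.txt b.txt c.txt || echo "cp failed because c.txt is not a directory"

# Return to the directory you started in
cd - > /dev/null
```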