
1 Readings

Practical Computing for Biologists: Chapters 2, 5, 6, 16, Appendices 2, 3.

Unix “Basics” and “Finding Things” from UConn CBC: http://bioinformatics.uconn.edu/unix-basics/

Software Carpentry Shell Novice lesson: Episodes 5-7: https://swcarpentry.github.io/shell-novice/

Review basic commands and server access from UConn_Unix_basics

2 Last Time:

2.1 Command line navigation:


1. By using the complete (absolute) path, which starts from the root directory (/) or from your home directory (~):

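head /home/username/MEDS5420/lec02_files/the_raven.txt    # typical Linux home directory
head /Users/username/MEDS5420/lec02_files/the_raven.txt   # typical macOS home directory
head ~/MEDS5420/lec02_files/the_raven.txt                 # ~ is shorthand for your home directory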

OR

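2. By using a relative path, which starts from your current working directory (the examples below assume you are in /home or /Users):

head ./username/MEDS5420/lec02_files/the_raven.txt   # ./ means "the current directory"
head username/MEDS5420/lec02_files/the_raven.txt     # same file; the leading ./ is optional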
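To see that an absolute path and a relative path can name the same file, here is a minimal sketch that rebuilds the directory layout in a throwaway temporary folder. The $demo directory plays the role of /home, and the one-line stand-in for the_raven.txt is an assumption for illustration, not the real course file:

```shell
# Throwaway stand-in for /home, with the same layout as the course folder
demo=$(mktemp -d)
mkdir -p "$demo/username/MEDS5420/lec02_files"
printf 'Once upon a midnight dreary\n' > "$demo/username/MEDS5420/lec02_files/the_raven.txt"

# Absolute path: works no matter what the current working directory is
head "$demo/username/MEDS5420/lec02_files/the_raven.txt"

# Relative paths: only work because we first move into $demo
cd "$demo"
head ./username/MEDS5420/lec02_files/the_raven.txt
head username/MEDS5420/lec02_files/the_raven.txt
```

All three head commands print the same line. If you run the relative versions from any other directory, they fail with "No such file or directory".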

2.2 Command line utilities so far:

1. pwd - print working directory
2. ls - list directory contents
3. mkdir - create a directory
4. unzip - extract files from a .zip archive
5. mv - move or rename file
6. cp - copy file
7. cat - print contents of file
8. touch - create empty file
9. rm - remove file
10. wc - count lines, words, and characters in file
11. > - redirect output to a file (creates it, or overwrites it if it already exists)
12. >> - redirect output to append to the end of a file
13. * - wildcard that matches any string of characters (including none)
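A short sketch that exercises several of these utilities in a scratch directory. The directory and file names (notes, log.txt, and so on) are hypothetical. Pay attention to how > replaces a file's contents while >> adds to them:

```shell
# Work in a throwaway directory so nothing in your course folder is touched
demo=$(mktemp -d)
cd "$demo"

mkdir notes                            # create a directory
touch notes/empty.txt                  # create an empty file
echo "first line" > notes/log.txt      # > creates log.txt (or overwrites it)
echo "second line" >> notes/log.txt    # >> appends to the end of log.txt
cat notes/log.txt                      # prints both lines
cp notes/log.txt notes/copy.txt        # copy a file
mv notes/copy.txt notes/renamed.txt    # move (here: rename) a file
wc -l notes/*.txt                      # line counts for every .txt file in notes
echo "overwritten" > notes/log.txt     # > again: the old contents are replaced
cat notes/log.txt                      # prints only "overwritten"
rm notes/empty.txt                     # remove a file
ls notes                               # log.txt and renamed.txt remain
pwd                                    # the scratch directory we are working in
```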

3 Pipes, filtering with wildcards, redirecting outputs to files

One can select multiple files using the * wildcard. Navigate to the ~/MEDS5420/lec02_files directory and type:

wc *.txt

By default, wc prints three columns: the number of lines, words, and characters. To show only the number of lines, add the -l option:

wc -l *.txt

We can make a wildcard more specific with brackets, [], which match any one of the characters listed inside them:

wc -l [Wt]*.txt
    # matches .txt files whose names start with an uppercase "W" or a lowercase "t"
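A minimal sketch of how * and [] match. It uses empty files created with touch in a scratch directory, and the file names are made up for illustration. Matching is case-sensitive, so Turtle.txt and wolf.txt are not matched by [Wt]*.txt:

```shell
# Scratch directory with a few empty, made-up files
demo=$(mktemp -d)
cd "$demo"
touch Walrus.txt tiger.txt Turtle.txt wolf.txt notes.csv

echo *.txt        # every name ending in .txt (notes.csv is not matched)
echo [Wt]*.txt    # only Walrus.txt and tiger.txt: first character must be W or t
wc -l [Wt]*.txt   # the shell expands the pattern before wc runs
```

Running echo on a pattern is a convenient way to preview which files a command will receive before you run something destructive like rm.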

Let’s find which file is shortest. First, save the wc output to disk with the redirection operator >. Then use cat or less to verify that lengths.txt contains the same output that wc printed:

wc -l *.txt > lengths.txt
cat lengths.txt
less lengths.txt

To find the shortest file, sort the lengths numerically with sort -n. Then keep only the first (smallest) line with head -n 1:

sort -n lengths.txt > sorted-lengths.txt
head -n 1 sorted-lengths.txt
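The -n option matters here. Without it, sort compares text character by character, so "10" sorts before "2". Here is a quick check on a made-up file of three numbers:

```shell
# Scratch directory with a made-up list of numbers
demo=$(mktemp -d)
cd "$demo"
printf '10\n2\n33\n' > numbers.txt

sort numbers.txt      # text order, character by character: 10, 2, 33
sort -n numbers.txt   # numeric order: 2, 10, 33
```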

Intermediate files like these clutter the directory and can be confusing, especially in more complex problems. A pipe (|) avoids them by sending the output of one command directly to the next command as its input:

wc -l *.txt | sort -n | head -n 1
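Pipelines are easiest to build one stage at a time, checking the output after each |. Below is a sketch with three made-up files of known length. The file names are hypothetical:

```shell
# Scratch directory with three made-up files of 1, 2, and 3 lines
demo=$(mktemp -d)
cd "$demo"
printf 'a\n' > one.txt
printf 'a\nb\n' > two.txt
printf 'a\nb\nc\n' > three.txt

# Stage 1: one line per file, plus a final "total" line
wc -l *.txt
# Stage 2: sorted by count; the "total" line is the largest, so it sorts last
wc -l *.txt | sort -n
# Stage 3: keep only the first line, i.e. the shortest file
wc -l *.txt | sort -n | head -n 1
```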

3.1 Exercise 1: Pipe Reading Comprehension

A file called animals.txt contains the following data:

deer
rabbit
raccoon
rabbit
deer
fox
rabbit
bear

3.1.1 Part 1:

What text passes through each of the pipes and the final redirect in the pipeline below? Work through the input by hand before you run the command or break it into parts.

cat animals.txt | head -n 5 | tail -n 3 | sort > final.txt

3.1.2 Part 2:

Modify the commands so that the final output contains only the three rabbit lines.

4 Additional Commands:

4.1 File Compression

Command   Function
gzip      compression/decompression tool using Lempel-Ziv coding (LZ77)
tar       bundle files and folders into a single archive file
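A minimal sketch of both tools on a made-up project folder in a scratch directory. The names project, copy.txt, and restored are hypothetical, and only widely available options (-d for gzip; -c, -t, -x, -f, and -C for tar) are used:

```shell
# Scratch directory with a small made-up project folder
demo=$(mktemp -d)
cd "$demo"
mkdir project
printf 'line %s\n' 1 2 3 > project/data.txt
cp project/data.txt copy.txt

# gzip compresses one file in place: copy.txt is replaced by copy.txt.gz
gzip copy.txt
ls
# gzip -d (same as gunzip) decompresses, restoring copy.txt
gzip -d copy.txt.gz

# tar bundles a folder: -c create, -f name of the archive file
tar -cf project.tar project
# -t lists what is inside the archive without extracting it
tar -tf project.tar
# -x extracts; -C sets the directory to extract into
mkdir restored
tar -xf project.tar -C restored
cat restored/project/data.txt
```

The same -xf options should work for unpacking a plain tar archive such as data-shell.tar.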

4.2 Finding things:

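  • Files in directories
  • Words in files

Command   Function
grep      Global Regular Expression Print: print lines that match a pattern (useful flags: -w whole words only, -i ignore case, -v invert the match, -n show line numbers)
find      recursively list files and directories, filtering by criteria such as name or type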
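A sketch of grep's flags and a few find filters on a small made-up file tree. The zoo directory and its contents are hypothetical:

```shell
# Scratch directory with a small made-up file tree
demo=$(mktemp -d)
cd "$demo"
mkdir -p zoo/notes
printf 'deer\nrabbit\nRabbit stew\nrabbits\nfox\n' > zoo/animals.txt
printf 'bear\n' > zoo/notes/more.txt

grep rabbit zoo/animals.txt      # lines containing "rabbit": rabbit, rabbits
grep -i rabbit zoo/animals.txt   # -i ignores case: also matches "Rabbit stew"
grep -w rabbit zoo/animals.txt   # -w whole words only: "rabbit" but not "rabbits"
grep -v rabbit zoo/animals.txt   # -v inverts: lines that do NOT contain "rabbit"
grep -n fox zoo/animals.txt      # -n prefixes each match with its line number: 5:fox

find zoo                         # recursively list everything under zoo
find zoo -type f                 # files only, no directories
find zoo -name '*.txt'           # filter by name; quote the wildcard so find, not the shell, expands it
```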

4.3 Concepts:

1. Variables (creating and printing to screen).
2. Basics of shell scripts.
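As a preview of both concepts, here is a minimal sketch. The variable names and the hello.sh script are made up for illustration. The script file is written with a here-document (cat > file <<'EOF') only so that the example is self-contained; in practice you would create it in a text editor:

```shell
# Create a variable: no spaces are allowed around "="
course="MEDS5420"
# Print it: put $ in front of the name (double quotes still expand it)
echo "Welcome to $course"
# Curly braces mark where the name ends when other text follows;
# without them, "$course_notes" would look up a different, empty variable
echo "${course}_notes.txt"

# A shell script is a text file of commands (hypothetical example below)
demo=$(mktemp -d)
cat > "$demo/hello.sh" <<'EOF'
#!/bin/bash
# hello.sh: store a name in a variable, then print a greeting
name="world"
echo "Hello, $name"
EOF
bash "$demo/hello.sh"       # run the script with bash
```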

5 Dealing with compressed files (archives)

Download data-shell.tar from GitHub and move it to your MEDS5420 folder. See the third code chunk of section 6 of Lecture 2 for how to do this on Windows.

We already unzipped a file using unzip:

unzip -d Example_files Example_files.zip

Other types of archives you will encounter:
.tar   # bundles multiple files or folders into one file
.gz    # file compressed with gzip