Practical Computing for Biologists: Chapters 2, 5, 6, 16, Appendices 2, 3.
Unix “Basics” and “Finding Things” from UConn CBC: http://bioinformatics.uconn.edu/unix-basics/
Software Carpentry Shell Novice lesson: Episodes 5-7: https://swcarpentry.github.io/shell-novice/
Review basic commands and server access from UConn_Unix_basics
1. pwd
- print working directory
2. ls
- list directory contents
3. mkdir
- create a directory
4. unzip
- decompression
5. mv
- move file
6. cp
- copy file
7. cat
- print contents of file
8. touch
- create empty file
9. rm
- remove file
10. wc
- count lines/words/characters in file
11. >
- redirects output to new file
12. >>
- redirects output to append to existing file
13. *
- wildcard that matches any string of characters in a file name
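The difference between > and >> is easiest to see on a scratch file. The following is a minimal sketch, assuming a throwaway directory made with mktemp; the file name notes.txt is hypothetical and chosen only for illustration.

```shell
# Work in a throwaway directory so nothing in your own folders is touched.
workdir=$(mktemp -d)
cd "$workdir"

echo "first line" > notes.txt     # > creates notes.txt (or overwrites it)
echo "second line" >> notes.txt   # >> appends to the existing file
cat notes.txt                     # prints both lines

echo "replacement" > notes.txt    # > again: the two earlier lines are lost
wc -l < notes.txt                 # counts a single line
```

Because > silently overwrites, it is worth checking with ls that the target name is not already in use before redirecting into it.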
One can select multiple files using the * wildcard. Navigate to the ~/MEDS5420/lec02_files directory and type:
wc *.txt
Instead of seeing three columns of numbers (lines, words, and characters), we can limit the wc command to show only the number of lines using the -l argument:
wc -l *.txt
One can also add some specificity to wildcards using brackets: []
wc -l [Wt]*.txt
# this matches files whose names start with "W" or "t" and end in .txt
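To see which names a bracket pattern selects, it can help to pass the pattern to echo first. The sketch below assumes three hypothetical files (Whale.txt, tiger.txt, zebra.txt) created in a temporary directory; they are not part of the lecture files.

```shell
# Create three small example files in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir"
printf 'a\nb\n' > Whale.txt
printf 'a\n' > tiger.txt
printf 'a\nb\nc\n' > zebra.txt

# The shell expands the pattern into file names before the command runs,
# so echo shows exactly what wc will receive.
echo [Wt]*.txt

# Only Whale.txt and tiger.txt are counted; zebra.txt does not match.
wc -l [Wt]*.txt
```

The order in which the matched names are listed can differ between systems, because it depends on the locale settings.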
Let’s find which file is shortest. First, save the wc output to disk with the redirection operator >; then verify that the contents of lengths.txt match what wc produces, using cat or less:
wc -l *.txt > lengths.txt
cat lengths.txt
less lengths.txt
To find the shortest file, we sort the lengths numerically using the sort command with the -n flag. The shortest file is then the first line, which we pick out using head -n 1:
sort -n lengths.txt > sorted-lengths.txt
head -n 1 sorted-lengths.txt
Using intermediate files can be confusing, especially in more complex problems. We can avoid the messy files and the extra typing by using pipes (|):
wc -l *.txt | sort -n | head -n 1
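When wc is given more than one file, it also prints a "total" line. That line sorts to the bottom, so it does not interfere with finding the shortest file, but it matters when looking for the longest one. A minimal sketch, assuming three hypothetical files a.txt, b.txt, and c.txt in a temporary directory:

```shell
# Three example files with 3, 1, and 2 lines.
workdir=$(mktemp -d)
cd "$workdir"
printf '1\n2\n3\n' > a.txt
printf '1\n' > b.txt
printf '1\n2\n' > c.txt

# Full sorted listing: shortest file first, the "total" line last.
wc -l *.txt | sort -n

# Shortest file: the first line of the sorted listing (b.txt here).
wc -l *.txt | sort -n | head -n 1

# Longest file: take the last two lines, then keep the first of those,
# which skips the "total" line (a.txt here).
wc -l *.txt | sort -n | tail -n 2 | head -n 1
```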
A file called animals.txt contains the following data:
deer
rabbit
raccoon
rabbit
deer
fox
rabbit
bear
What text passes through each of the pipes and the final redirect in the pipeline below? Work through the input by hand before you run the command or break it into pieces.
cat animals.txt | head -n 5 | tail -n 3 | sort > final.txt
Alter the commands so that the final output contains only the three rabbit lines.
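Once you have an answer on paper, one way to check it is to run the pipeline one stage at a time and compare each result with your prediction. The sketch below assumes animals.txt is recreated in a temporary directory with the eight lines listed above; the echo separators are only labels added for readability.

```shell
# Recreate animals.txt exactly as listed in the exercise.
workdir=$(mktemp -d)
cd "$workdir"
printf '%s\n' deer rabbit raccoon rabbit deer fox rabbit bear > animals.txt

# Run the pipeline one stage longer each time.
echo "--- after head -n 5"
cat animals.txt | head -n 5

echo "--- after tail -n 3"
cat animals.txt | head -n 5 | tail -n 3

# The last stage writes to a file, so print the file to see the result.
echo "--- contents of final.txt"
cat animals.txt | head -n 5 | tail -n 3 | sort > final.txt
cat final.txt
```

The same stage-by-stage approach works for testing your altered version of the pipeline.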
Command | Function |
---|---|
gzip | Compression/decompression tool using Lempel-Ziv coding (LZ77) |
tar | Bundles files and folders into a single archive |
Command | Function |
---|---|
grep | Global Regular Expression Print (useful flags: -w, -i, -v, -n) |
find | Recursively list all files and directories and filter |
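The grep flags listed above, along with two common uses of find, can be tried on a small test directory. This is a minimal sketch: the data directory and its files are hypothetical and are created in a temporary location.

```shell
# Build a small directory tree to search.
workdir=$(mktemp -d)
cd "$workdir"
mkdir -p data/sub
printf 'Rabbit\nrabbit\nrabbits\nfox\n' > data/animals.txt
printf 'fox\n' > data/sub/more.txt

grep rabbit data/animals.txt      # lines containing "rabbit" (case-sensitive)
grep -w rabbit data/animals.txt   # whole-word matches only, so not "rabbits"
grep -i rabbit data/animals.txt   # ignore case, so "Rabbit" matches too
grep -v rabbit data/animals.txt   # invert: lines that do NOT contain "rabbit"
grep -n fox data/animals.txt      # prefix each match with its line number

find data -name "*.txt"           # every .txt file under data/, recursively
find data -type d                 # directories only
```

Quoting the pattern given to find -name keeps the shell from expanding the * itself, so that find receives the pattern unchanged.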
1. Variables (creating and printing to screen).
2. Basics of shell scripts.
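As a preview of both topics, the sketch below creates a variable, prints it to the screen, and then writes and runs a two-line script. The variable name course and the script name hello.sh are hypothetical, and the script is written with a here-document only so that the example is self-contained; in practice you would type it in a text editor.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# 1. Variables: assign with = (no spaces around it), read with $.
course="MEDS5420"
echo "Course: $course"
echo "Folder: ${course}_data"   # braces separate the name from the text after it

# 2. A shell script is a text file of commands.
#    The quoted 'EOF' stops the shell from expanding $1 while writing the file.
cat > hello.sh << 'EOF'
#!/bin/bash
name="$1"             # $1 is the first argument given on the command line
echo "Hello, $name"
EOF

bash hello.sh world   # run the script with "world" as its argument
```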
Download and move the data-shell.tar from GitHub to your MEDS5420 folder. See the third code chunk of section 6 of Lecture 2 for how to accomplish this for Windows OS.
We already unzipped a file using unzip:
unzip -d Example_files Example_files.zip
Other types of archives you will encounter:
.tar # bundles multiple files or folders
.gz # file compressed with gzip
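The two archive types are often combined: tar bundles a folder into a single file, and gzip then compresses that file. A minimal sketch of the round trip, assuming a hypothetical folder named project created in a temporary directory:

```shell
# Create a small folder to archive.
workdir=$(mktemp -d)
cd "$workdir"
mkdir project
echo "sample" > project/a.txt
echo "sample" > project/b.txt

tar -cf project.tar project   # -c create an archive, -f name of the archive file
gzip project.tar              # compress; project.tar is replaced by project.tar.gz
tar -tzf project.tar.gz       # -t list the contents without extracting

rm -r project                 # remove the original folder
tar -xzf project.tar.gz       # -x extract, -z decompress with gzip on the way
ls project                    # the folder and both files are back
```

Listing an archive with -t before extracting it is a useful habit, since it shows where the files will be written.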