Readings: Practical Computing for Biologists: Chapter 6.
Review basic commands and server access from UConn_Unix_basics
find
find .
find
shows us files and directories.
The power of find is in specifying “tests” or “filters”.
find . -type f
find . -type d
The above filters search for files and directories, respectively.
Search depth:
One can specify how far down the file hierarchy to go by controlling depth
(first you should navigate one directory closer to root than the data-shell
directory):
find ./data-shell -maxdepth 2 -mindepth 2 -type f
The above command searches for all files two directory levels within the data-shell folder.
Quick try: Try other combinations of levels and types and verify by counting number of items in output.
Let’s try matching by name:
find ./data-shell -name "*.txt"
Quick try: Combine find
and grep
to find the number of text files within 2 and 3 levels inside of the data-shell folder.
The find command can be given several other criteria known as “tests”
to locate files with specific attributes, such as creation time, size,
permissions, or ownership. Use man find
to explore these, and then
write a single command to find all .txt files in or below the current
directory that were modified in the last 168 hours.
Hint 1: you will need to use three tests: -type
, -name
, and -mtime
Hint 2: The value for -mtime
will need to be negative - why?
variables are strings that can be assigned values. To create a variable use the following format:
var=variable # when setting variable do not use spaces
To see what the variable is you can print it to the screen with echo:
echo $var # the '$' designates that this is a variable
*Try using echo without the dollar sign.
Whole sentences or lists can be designated as variables:
fileList=*.txt
echo $fileList
## SC_example.txt TS_example.txt Thoreau_quotes.txt Wonderful_world.txt animals.txt color-table.txt haiku.txt mm9_genes.txt newTable.txt the_raven.txt
If your variable is going to be combined with another string, make sure you surround the variable with a curly bracket. For instance:
School=MEDS
echo listing: $School
echo this class is $School_5420 # something's wrong
echo this class is ${School}_5420 # properly inserts variable
## listing: MEDS
## this class is
## this class is MEDS_5420
In short, double quotes allows variables within strings to be interpreted, whereas single quotes makes them literal.
Try out this example:
instructor="Michael Guertin"
echo "My instructor for MEDS5420 is $instructor"
## My instructor for MEDS5420 is Michael Guertin
or
instructor="Michael Guertin"
echo 'My instructor for MEDS5420 is $instructor'
## My instructor for MEDS5420 is $instructor
Uses for backticks - the key usually just under the escape key.
Backticks allow one to insert the result of a command in place within a command line.
One nice use for this is to set variables as outputs of commands.
Here’s an example with the command date
that prints the date and time to the screen:
Compare the two following examples:
info=date
echo The date and time is: $info
## The date and time is: date
vs.
info=`date`
echo The date and time is: $info
## The date and time is: Thu Feb 9 15:54:42 EST 2023
Backticks can cause problems when using other quotations, so there is another way to run a command in place:
echo The date and time is: $(date)
## The date and time is: Thu Feb 9 15:54:42 EST 2023
Scripts are a logially ordered set of commands used to process files. They can be simple routines or complex programs. There are three main things one needs when writing scripts in general:
1. The commands themselves
2. Information about what language should be used to interpret the commands
3. Permission to execute commands in script file.
Certain text editors are designed for scripting and can recognize code. Importantly, they do not embed the document or fonts with hidden characteristics that can cause problems when running programs on you computer. There are three features that you should look for in an editor:
1. language specific highlighting
2. line number display
3. version control
MAC USERS: Download BBedit here: http://www.barebones.com/products/bbedit/download.html?ref=tw_alert: http://www.barebones.com/products/bbedit/download.html?ref=tw_alert and then install it:
Open your text editor: gedit on Linux or BBEdit or textEdit on Mac.
PC USERS: download notepad here: https://notepad-plus-plus.org/download/v7.5.8.html
Note: You can also use emacs or other command line editors such as nano
or vim
. These command line editors are the functional equivalent of opening a file in BBEdit, TextEdit, or NotePad. THe interface is a bit clunky and requires keyboard prompts to save, write, and exit. We will be using emacs
or nano
when we work on the server next time.
A quick primer for emacs
is:
#generate the file
touch filename.sh
# open the emacs command line editor
emacs filename.sh
#you are now in EMACS
write some code
ctrl-X ctrl-W to save as another name
make edits
ctrl-X ctrl-S to save
ctrl-X ctrl-C to exit
#you are back in the Terminal
A quick primer for nano
is:
touch filename_nano.sh
nano filename_nano.sh
#you are now in NANO
write some code
ctrl-O, then <Enter/Return> to save
ctrl-O, then backspace to write as a new file name, then <Enter/Return> to save
make edits
ctrl-O, then <Enter/Return> to save
ctrl-X to exit
The first line is the Shebang line:
#! /bin/sh
or sometimes
#! /usr/bin/sh
to find out where your shell is type:
which sh
Let’s try a simple script:
ls -la
echo "Above are the directory listings for the following folder"
pwd
Create a new folder in your MEDS5420 folder called ‘scripts’
Save your script as ls.sh
Go to the directory where ls.sh is and try to execute it:
./ls.sh
In order to run this script we need to give the proper permissions. To see permissions for files, type:
ls -la
The columns are:
1. permissions
2. owner
3. group
4. size
5. last modifiation
6. name
In permissions: ‘d’=directory, ‘-’ = file, ‘r’ = read, ‘w’ = write, ‘x’ = execute.
Three sets of permissions are shown: User (owner), Group, Other users.
To give permission to execute type:
chmod +x ls.sh
Now use ls -la
to view permissions and then try to execute.
Other ways to designate permissions:
To give permission for everone to read, write, and execute a script use:
chmod 777 ls.sh
list="1 2 4 6"
for x in $list
do
echo Hello people of MEDS_5420
done
## Hello people of MEDS_5420
## Hello people of MEDS_5420
## Hello people of MEDS_5420
## Hello people of MEDS_5420
The numbers mean nothing here. They are just placeholders such that every time an item is encountered the loop repeats itself. For instance:
list="a b c"
for x in $list
do
echo Hello people of MEDS_5420
done
## Hello people of MEDS_5420
## Hello people of MEDS_5420
## Hello people of MEDS_5420
Create a shell script, called grad_folders.sh
that does the following:
1. Create a variable that lists the following items: notebook, raw_data, figures, manuscripts.
2. Create a folder for each item that is named the same as each item.
3. Print to the screen what is happening (i.e. that you are creating a folder).
4. Copy the haiku.txt file into each folder.
1. How to navigate your computer from your terminal and create or find files and folders (cd, ls, mv, rm, mkdir, touch, find
)
2. How to view the content of files (head, tail, less), and search for specific lines or items (grep)
3. How to combine multiple files together (cat) and redirect terminal output into new or existing files (> or > > ).
4. How to string several commands together with pipes (‘|’)
5. The importance of quoting syntax.
6. The beginnings of shell scripting.
1. Learn a few more useful terminal commands: sort, cut, uniq
Download color-table.txt
from the Lecture 4 folder in GitHub and move this file to your MEDS5420 folder.
It is not immediately apparent how to download the file from GitHub, but you do have access to the Raw file by clicking Raw. We can use a command in the Terminal to directly retrive this raw file without having to click Save As in the browser.
If you have a Mac, then curl
is the default method to retrieve files from URLs:
curl -O https://raw.githubusercontent.com/guertinlab/meds5420/main/Lecture4_command_line/color-table.txt
#the manual will tell you what the -O (the letter, not a zero) option does
Linux OS have wget
:
wget https://raw.githubusercontent.com/guertinlab/meds5420/main/Lecture4_command_line/color-table.txt
Let’s tuck the color-table.txt file away into some new directories:
#start from the MEDS5420 folder:
mkdir ./in_class
mkdir ./in_class/colors
mv color-table.txt ./in_class/colors
table="./in_class/colors/color-table.txt"
head $table
## This 1 red
## is 2 orange
## a 4 yellow
## test 4 green
## this 7 blue
## is 6 purple
## only 80 red
## a 19 orange
## test 100 yellow
## if 6 green
The cut command is useful for splitting strings based on user-defined delimiters. For instance, if I want to extract only the time from the date command you can try this:
# selects the 4th item after the line is split by spaces (" ")
echo "the date and time is:"
echo $(date)
echo "the time is:"
echo $(date) | cut -d " " -f 4
## the date and time is:
## Thu Feb 9 15:54:42 EST 2023
## the time is:
## 15:54:42
-d
: is the delimiter set by user. Default is tab: \t
-f
: the fields to select for after splitting. Can list each (1,2,3) or list a range (1-3)
reversing a string:
# returns backwards string
echo `date` | rev
## 3202 TSE 24:45:51 9 beF uhT
Can be done on any part of a pipe
echo $(date)
echo "the reverse time is:"
echo $(date) | cut -d " " -f 4 |rev
## Thu Feb 9 15:54:42 EST 2023
## the reverse time is:
## 24:45:51
Extracting columns from tables: Cut can also be used to extract columns from tables.
Let’s just get the first column of color-table.txt
:
table="./in_class/colors/color-table.txt"
cat $table | cut -f 1
## This
## is
## a
## test
## this
## is
## only
## a
## test
## if
## this
## had
## been
## a
## real
## emergency
Sorting columns: Simple sorting of columns
sort -k 2 color-table.txt | head -n 5
# k followed by a number represents the column to sort by.
##
## This 1 red
## test 100 yellow
## a 19 orange
## is 2 orange
Notice how numbers are handled. They are handled as a string of numbers and each position in evaluated seperately. If you want a true numeric sort, try this:
#numerical sort on column 2
sort -k 2n color-table.txt | head -n 5
##
## This 1 red
## is 2 orange
## a 4 yellow
## real 4 yellow
The option -k 2
sorts on column 2, but if column 2 is identical, the row will continue to be sorted until a distinct character can differentiate. The following will only sort on column 2 and retain the original relative order of row that have identical column 2 values.
#numerical sort on column 2
sort -k 2,2n color-table.txt | head -n 5
##
## This 1 red
## is 2 orange
## a 4 yellow
## real 4 yellow
Numerical sorts are ascending, to return a descending sort, try the following:
#numerical sort on column 2
sort -k 2nr color-table.txt | head -n 5
## test 100 yellow
## only 80 red
## a 60 orange
## had 54 purple
## been 23 red
Finding unique items in list. You can use uniq
to determine how many times an item appears in a list.
# -c prints the number of each item
cat color-table.txt | cut -f 3 | uniq -c | head
## 1 red
## 1 orange
## 1 yellow
## 1 green
## 1 blue
## 1 purple
## 1 red
## 1 orange
## 1 yellow
## 1 green
One pitfall is that it only operates on adjacent items, so lists must be sorted first:
# sorting first gives true number of unique items
cat color-table.txt | cut -f 3 |sort | uniq -c | head
## 1
## 2 blue
## 3 green
## 3 orange
## 2 purple
## 3 red
## 3 yellow
Consider this filename at the end of the path: /tempdata3/MEDS5420/data/raw/chip_repA.txt
Devise a way to split the string and report the filename without referencing the exact position. i.e. imagine that you want to get the last item in the path without knowing how long the path is.
find . -type f -name '*.txt' -mtime -7
#! /bin/sh
folders="notebook raw_data figures manuscripts"
for x in $folders
do
echo "creating the following directory": $x
mkdir $x
cp ~/path/to/haiku.txt $x
done
Retrieve the last item in a string of unknown length
file="/tempdata3/MEDS5420/data/raw/chip_repA.txt"
echo $file | rev | cut -d "/" -f 1 | rev