R
and Basic of plotting with R
Download and install the correct version of R
(and RStudio
if you’d like) from:
http://www.rstudio.com/products/rstudio/download/
Also, copy the following files from the server to your computer or home directory on Xanadu for loading into R
:
/home/FCAM/meds5420/data/ATF1/closest_peak_TSS/peaks_promoters.bed
/home/FCAM/meds5420/data/ATF1/reads_in_peaks/ATF1_summits_rep1_counts_2col.bed
/home/FCAM/meds5420/data/ATF1/reads_in_peaks/ATF1_summits_rep2_counts_2col.bed
Make a directory in your class folder on your machine (or on Xanadu), and move the file there.
R
and Basic of plotting with R
R
: subsetting tablesOpen an R
environment and set you working directory to folder where the peaks_promoters.bed
file is.
getwd() # no input or options needed
## [1] "/Users/guertinlab/Documents/class/2023_lec20"
setwd("/Users/guertinlab/MEDS5420")
getwd()
The peaks_promoters.bed
file contains information on which gene TSS is closest to ATF1 peaks that we called with MACS. Recall that we used bedtools closest
to find the distance between ATF1 summits and TSSs and report them relative to genes with consideration of strand.
Read this table into your R session with read.table
:
#read in table
closest <- read.table("peaks_promoters.bed")
#check structure
head(closest,3)
## V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12
## 1 chr1 778706 778707 ATF1_peak_1 82.7650 chr1 778747 778748 LINC01409 ENSG00000237491 + 41
## 2 chr1 1000890 1000891 ATF1_peak_2 41.9708 chr1 1001138 1001139 ISG15 ENSG00000187608 + 248
## 3 chr1 1471676 1471677 ATF1_peak_3 30.5627 chr1 1471765 1471766 ATAD3B ENSG00000160072 + 89
Let’s check how many chromosomes are present in this data:
#report unique items in a list or vector
unique(closest[,1])
## [1] "chr1" "chr10" "chr11" "chr11_KI270832v1_alt" "chr12" "chr13"
## [7] "chr14" "chr14_GL000225v1_random" "chr14_KI270846v1_alt" "chr15" "chr16" "chr17"
## [13] "chr17_KI270857v1_alt" "chr18" "chr18_GL383572v1_alt" "chr18_KI270863v1_alt" "chr18_KI270911v1_alt" "chr19"
## [19] "chr1_KI270763v1_alt" "chr2" "chr20" "chr21" "chr21_KI270872v1_alt" "chr22"
## [25] "chr22_KI270733v1_random" "chr3" "chr3_KI270779v1_alt" "chr3_KI270924v1_alt" "chr4" "chr5"
## [31] "chr5_GL339449v2_alt" "chr5_KI270791v1_alt" "chr5_KI270897v1_alt" "chr6" "chr6_GL000252v2_alt" "chr6_GL000254v2_alt"
## [37] "chr6_GL000255v2_alt" "chr6_KI270797v1_alt" "chr7" "chr7_KI270809v1_alt" "chr8" "chr8_KI270816v1_alt"
## [43] "chr9" "chr9_KI270718v1_random" "chrUn_GL000216v2" "chrUn_GL000220v1" "chrX" "chrX_KI270880v1_alt"
## [49] "chrX_KI270913v1_alt"
Looks like this is the whole dataset, but what if you only want the part from your chromosome. Here are two ways to do this:
#subset a table based on a logical match to a column
closest_chr17<- subset(closest, closest[,1]=='chr17')
head(closest_chr17, 3)
## V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12
## 203 chr17 782326 782327 ATF1_peak_202 66.2886 chr17 782353 782354 MRM3 ENSG00000171861 + 27
## 204 chr17 4143675 4143676 ATF1_peak_203 14.7257 chr17 4143168 4143169 CYB5D2 ENSG00000167740 + -507
## 205 chr17 4704157 4704158 ATF1_peak_204 30.8864 chr17 4704336 4704337 PELP1 ENSG00000141456 - 179
#subset a table based on a logical match to a column
closest_chr17<- closest[closest[,1]=='chr17',]
#notice the syntax is based on table indexing
head(closest_chr17, 3)
## V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12
## 203 chr17 782326 782327 ATF1_peak_202 66.2886 chr17 782353 782354 MRM3 ENSG00000171861 + 27
## 204 chr17 4143675 4143676 ATF1_peak_203 14.7257 chr17 4143168 4143169 CYB5D2 ENSG00000167740 + -507
## 205 chr17 4704157 4704158 ATF1_peak_204 30.8864 chr17 4704336 4704337 PELP1 ENSG00000141456 - 179
Notes on relational operators for logical tests:
Site: http://sites.stat.psu.edu/~dhunter/R/html/base/html/Comparison.html
Operator : Interpretation
<
y : x is less than y>
y : x is greater than y<=
y : x is less than or equal to y>=
y : x is greater than or equal to y==
y : x is equal to y!=
y : x is not equal to yAlso when conducting logical tests:
&
= and|
= orOne simple way to plot the distribution of a data set is to use boxplots:
boxplot(closest[,12])
Figure 1: anatomy of a box plot
median - Middle dark line.
box - Interquartile ranges. i.e. extending from 25th percentile to 75th percentile.
whisker - Extremes (cover 90% of the data).
outliers - Points outside of whiskers.
The default parameters from boxplot
are not great for publication. However, there are options specific to the boxplot function that we can use to make the figure more meaningful and understandable.
For instance: see help(boxplots)
)
outline=F
)border="red"
)lwd=2
)boxplot(closest[,12], outline=FALSE, border='red', col='white', lwd=2)
lwd = line width
col = inside box color
border= border color
outline= display outliers (if TRUE)