First, let’s look at the results from our MEME analysis. You can copy them from the server at this location:
/home/FCAM/meds5420/motif
Open the .html and .txt files to see the results.
Position Frequency Matrix (PFM): the frequency of each base at each position within the motif (raw data).
Position Weight Matrix (PWM): a score for the probability that a base will be present at a given position. It takes into account the number of sequences and the background frequency of each base. PWMs are a more realistic reflection of the binding strength of a protein for a given sequence.
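As a rough illustration of the difference, the sketch below converts the raw counts at one motif position into log-odds weights. The pseudocount of 1 per base, the uniform background of 0.25, and the function name `pfm_column_to_weights` are assumptions made for this example; MEME and other tools use their own pseudocounts and background models.

```shell
# Convert PFM counts at one position into PWM-style log-odds weights.
# Assumptions (illustrative only): pseudocount of 1 per base, uniform background of 0.25.
pfm_column_to_weights() {
  # stdin: one "<base> <count>" line per base; stdout: "<base> <log2 odds>"
  awk '{ base[NR] = $1; count[NR] = $2; total += $2 }
       END {
         for (i = 1; i <= NR; i++) {
           p = (count[i] + 1) / (total + NR)          # smoothed frequency of this base
           printf "%s %.2f\n", base[i], log(p / 0.25) / log(2)
         }
       }'
}

# Counts for one position of a hypothetical motif built from 97 sequences
printf 'A 0\nC 94\nG 1\nT 2\n' | pfm_column_to_weights
```

A base seen more often than the background expectation gets a positive weight, and a rarer base gets a negative weight.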
Matrices can be in many formats; see: http://meme-suite.org/doc/overview.html#motif_conversion_utilities
You can create sequence logos from your enriched sequences with WebLogo: http://weblogo.berkeley.edu/logo.cgi or create a vector image using MEME's ceqlogo:
singularity exec /isg/shared/apps/meme/5.4.1/meme.sif ceqlogo -i meme.txt -m 2 -f EPS -o atf1.eps
Motif databases:
JASPAR: http://jaspardev.genereg.net/
CisBP: http://cisbp.ccbr.utoronto.ca/TFTools.php
Transfac: http://www.gene-regulation.com/pub/databases.html
HOCOMOCO: http://hocomoco11.autosome.ru/
Notes:
JASPAR DB is highly curated from a number of sources.
CisBP is largely a single experimental effort; see: http://www.sciencedirect.com/science/article/pii/S0092867414010368
HOMER: http://homer.ucsd.edu/homer/custom.motifs
Transfac is NOT open access. However, UConn recently purchased University-wide licenses for GeneXplain, which accesses the Transfac database: http://genexplain.com/
HOCOMOCO was made from the motif search tool ChIPMunk (not covered in this course): http://autosome.ru/ChIPMunk/
NOTE: The database must be in MEME format. Several formats can be converted using tools from the MEME Suite; see: http://meme-suite.org/doc/overview.html#motif_conversion_utilities
For instance, the JASPAR motifs look like this:
head ./JASPAR_all_matrix.txt
## >MA0001.1 AGL3
## A [ 0 3 79 40 66 48 65 11 65 0 ]
## C [94 75 4 3 1 2 5 2 3 3 ]
## G [ 1 0 3 4 1 0 5 3 28 88 ]
## T [ 2 19 11 50 29 47 22 81 1 6 ]
## >MA0002.1 RUNX1
## A [10 12 4 1 2 2 0 0 0 8 13 ]
## C [ 2 2 7 1 0 8 0 0 1 2 2 ]
## G [ 3 1 1 0 23 0 26 26 0 0 4 ]
## T [11 11 14 24 1 16 0 0 25 16 7 ]
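To get a quick inventory of a file in this layout, a small awk sketch like the one below can list each motif's ID, name, and width. The function name `jaspar_summary` is made up for this example, and it assumes the file looks exactly like the excerpt above (a `>` header line followed by bracketed A, C, G, T rows).

```shell
# Summarize a JASPAR-format file: motif ID, TF name, and motif width.
jaspar_summary() {
  # stdin: JASPAR-format text; stdout: "<ID> <name> <width>" per motif
  awk '/^>/     { id = substr($1, 2); name = $2 }    # header line: ">ID NAME"
       /^A *\[/ { gsub(/[^0-9 ]/, " ")               # keep only the counts of the A row
                  print id, name, NF }'              # number of counts = motif width
}

# The two motifs shown above
jaspar_summary <<'EOF'
>MA0001.1 AGL3
A [ 0 3 79 40 66 48 65 11 65 0 ]
C [94 75 4 3 1 2 5 2 3 3 ]
G [ 1 0 3 4 1 0 5 3 28 88 ]
T [ 2 19 11 50 29 47 22 81 1 6 ]
>MA0002.1 RUNX1
A [10 12 4 1 2 2 0 0 0 8 13 ]
C [ 2 2 7 1 0 8 0 0 1 2 2 ]
G [ 3 1 1 0 23 0 26 26 0 0 4 ]
T [11 11 14 24 1 16 0 0 25 16 7 ]
EOF
# prints:
# MA0001.1 AGL3 10
# MA0002.1 RUNX1 11
```

On the full file, the same function can be fed with `jaspar_summary < ./JASPAR_all_matrix.txt`.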
To convert a whole directory of JASPAR motif files:
singularity exec /isg/shared/apps/meme/5.4.1/meme.sif jaspar2meme -pfm DIRECTORY_INPUT > jaspar.meme
-pfm: specifies the input format
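Conceptually, the conversion turns the count rows (one row per base) into probability rows (one row per position, with columns A, C, G, T). The sketch below shows that idea for a single motif. It is not the jaspar2meme implementation: the function name `counts_to_probs` is invented here, no pseudocounts are applied, and the three-decimal rounding is an arbitrary choice.

```shell
# Turn ONE JASPAR count matrix (rows = A, C, G, T) into per-position probabilities
# (rows = positions, columns = A C G T), which is the layout MEME format uses.
# Illustrative only: no pseudocounts, single motif on stdin.
counts_to_probs() {
  awk '/^[ACGT] *\[/ {
         b = $1                         # base label, read before the line is cleaned
         gsub(/[^0-9 ]/, " ")           # drop the label and brackets, keep the counts
         for (i = 1; i <= NF; i++) { n[b, i] = $i; tot[i] += $i }
         w = NF                         # motif width
       }
       END {
         for (i = 1; i <= w; i++)
           printf "%.3f %.3f %.3f %.3f\n",
                  n["A", i] / tot[i], n["C", i] / tot[i], n["G", i] / tot[i], n["T", i] / tot[i]
       }'
}

# The AGL3 matrix from the JASPAR example
counts_to_probs <<'EOF'
>MA0001.1 AGL3
A [ 0 3 79 40 66 48 65 11 65 0 ]
C [94 75 4 3 1 2 5 2 3 3 ]
G [ 1 0 3 4 1 0 5 3 28 88 ]
T [ 2 19 11 50 29 47 22 81 1 6 ]
EOF
```

Each output row sums to approximately 1, just like the rows of a letter-probability matrix in a MEME-format file.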
head -28 ./jaspar.meme | tail -20
##
## MOTIF CN0001.1 LM1
##
## letter-probability matrix: alength= 4 w= 16 nsites= 5332 E= 0
## 0.168230 0.079895 0.383721 0.368155
## 0.045949 0.003938 0.918792 0.031320
## 0.009377 0.051013 0.010315 0.929295
## 0.031508 0.018942 0.009002 0.940548
## 0.107839 0.052138 0.721868 0.118155
## 0.005064 0.874156 0.014254 0.106527
## 0.003563 0.904351 0.000750 0.091335
## 0.918980 0.022881 0.021943 0.036197
## 0.066954 0.029632 0.039197 0.864216
## 0.207802 0.000750 0.788635 0.002813
## 0.019505 0.006377 0.969242 0.004876
## 0.509002 0.249625 0.051013 0.190360
## 0.744561 0.018192 0.197299 0.039947
## 0.955551 0.010128 0.023631 0.010690
## 0.015191 0.915229 0.010503 0.059077
## 0.368530 0.346774 0.064891 0.219805
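Each row of the letter-probability matrix is one motif position, with columns in the order A, C, G, T. A quick way to read such a matrix is to take the most probable base in each row. The sketch below does this for the first motif in a file; the function name `meme_consensus` is made up for this example, and it assumes a DNA alphabet with four columns per row.

```shell
# Print the most probable base at each position of the FIRST motif in MEME-format input.
# Assumes a DNA motif with columns in the order A C G T.
meme_consensus() {
  awk 'BEGIN { split("A C G T", base, " ") }
       /^letter-probability matrix/ { in_matrix = 1; next }
       in_matrix && NF == 4 {
         best = 1
         for (i = 2; i <= 4; i++) if ($i > $best) best = i   # column with the highest probability
         seq = seq base[best]
         next
       }
       in_matrix { exit }               # first non-matrix line ends the motif
       END { print seq }'
}

# First six positions of the motif shown above
meme_consensus <<'EOF'
MOTIF CN0001.1 LM1

letter-probability matrix: alength= 4 w= 16 nsites= 5332 E= 0
  0.168230 0.079895 0.383721 0.368155
  0.045949 0.003938 0.918792 0.031320
  0.009377 0.051013 0.010315 0.929295
  0.031508 0.018942 0.009002 0.940548
  0.107839 0.052138 0.721868 0.118155
  0.005064 0.874156 0.014254 0.106527
EOF
# prints: GGTTGC
```

Note that a simple consensus hides ambiguity: the first position above is nearly a tie between G and T, which is why logos and full matrices are more informative.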
A complete list of JASPAR and other database motifs can be found and downloaded from the MEME website:
https://meme-suite.org/meme/db/motifs
I typically download the databases directly from meme: https://meme-suite.org/meme/meme-software/Databases/motifs/motif_databases.12.22.tgz
You can find a few database files here: /home/FCAM/meds5420/TF_db/JASPAR/
I found that you need to copy them locally to use the files as input for TOMTOM.
We can use TOMTOM to compare our discovered motif to databases of known motifs.
See the documentation: http://meme-suite.org/doc/tomtom.html?man_type=web
The output is an .html file and a text file. These files show the names of the matched motifs (database IDs), the significance of each match, and the relevant consensus sequences.
Basic usage:
singularity exec /isg/shared/apps/meme/5.4.1/meme.sif tomtom -eps -oc tomtom_OUTPUT meme.txt DATABASE.meme
-eps: creates a sequence logo of your motif aligned to each similar known motif.
-oc: output folder.
meme.txt is the output text file from your MEME analysis.
DATABASE.meme is the database containing the PWMs of known TFs.
Here’s an example usage with more options:
singularity exec /isg/shared/apps/meme/5.4.1/meme.sif tomtom -no-ssc -oc tomtom_OUTPUT -verbosity 1 -min-overlap 5 -mi 1 -evalue -thresh 0.05 meme.txt DATABASE.meme
-mi: which motif to use, given by its index (depends on the number of motifs in your meme.txt file)
-verbosity (1-5): level of progress reporting
-min-overlap: minimum number of bases overlapping between your motif and the database motifs
-evalue: use the E-value (the p-value for the match corrected for multiple testing) for the significance threshold
-thresh: threshold to apply to the significance testing
So far we have done the basic motif analysis in discrete steps. MEME now offers a semi-customizable pipeline for motif discovery and comparison to databases. However, it does not run MAST and FIMO.
MEME-ChIP can:
I had to copy JASPAR2022_CORE_vertebrates*redundant.meme to my local directory for tomtom to work properly.
Documentation:
http://meme-suite.org/doc/meme-chip.html?man_type=web
Example:
singularity exec /isg/shared/apps/meme/5.4.1/meme.sif meme-chip -oc meme_chip_ATF1 -db JASPAR2022_CORE_vertebrates_non-redundant.meme ATF1_summit_101bp_top200.fasta -meme-nmotifs 2 -minw 5 -maxw 8 -meme-mod zoops
Note: options for each program in the pipeline can be specified by prefixing the option with the program name, as shown with -meme-nmotifs and -meme-mod above.
In Class Exercise:
Run long commands in the background (add & to the end of the command) and then work on the other exercises or the midterm.
There are converted JASPAR databases in the following location: /home/FCAM/meds5420/TF_db/JASPAR/. Use either database to compare to your motif using tomtom.
View the beginning of the tomtom.tsv output. This is just a tab-separated values (.tsv) text file. Notice that the protein ID is in a format like MA0604.1.
Use grep on the original JASPAR .meme file to find the common transcription factor names of some of your top hits. Use grep to see if any Atf1 motifs are found.
Copy the .html file to your local computer and view it in the browser. Any surprises regarding the TFs found?
Try searching for your motif using the web version of TOMTOM. Do you see any differences in the results?
Optional: Try running meme-chip on your ATF1 data and viewing all the resulting files.
Many transcription factors belong to families that have arisen through genome or local duplication events, or in rare cases through convergent evolution. TFs in families provide redundancy, but sequence divergence among family members allows certain TFs to interact with different partners and/or respond to different signaling cues. However, the DNA binding domains are often the most conserved part of these proteins, which results in overlapping and similar binding sites for seemingly distinct TFs.
Running TOMTOM:
# switch to the directory above where meme was performed:
singularity exec /isg/shared/apps/meme/5.4.1/meme.sif tomtom -no-ssc -oc tomtom_OUTPUT -verbosity 1 -min-overlap 5 -mi 1 -evalue -thresh 0.05 ATF1_classic.meme_output/meme.txt JASPAR2022_CORE_vertebrates_non-redundant.meme
grep out some top hits:
grep MA1131.1 /home/FCAM/meds5420/TF_db/JASPAR/JASPAR2022_CORE_vertebrates_redundant.meme
grep MA0604.1 /home/FCAM/meds5420/TF_db/JASPAR/JASPAR2022_CORE_vertebrates_redundant.meme
# -f - tells grep to read its search patterns from stdin (here, the motif IDs from cut):
grep -i atf1 JASPAR2022_CORE_vertebrates_redundant.meme | cut -d ' ' -f2 | grep -f - tomtom.tsv
You can also search JASPAR online.
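Instead of grepping IDs one at a time, the hits can be annotated in a single pass. The sketch below joins the target IDs in tomtom.tsv to the `MOTIF <ID> <name>` lines of the database. It assumes that Target_ID is column 2 and E-value is column 5 of tomtom.tsv, so check the header of your own file first. The function name `annotate_hits` and the IDs, names, and values in the demo files are all made up for illustration.

```shell
# Attach TF names to tomtom hits by joining Target_ID to the database MOTIF lines.
# Assumed tomtom.tsv layout: column 2 = Target_ID, column 5 = E-value.
annotate_hits() {
  # $1: MEME-format database, $2: tomtom.tsv
  awk -F'\t' -v OFS='\t' '
       NR == FNR { split($0, f, " "); if (f[1] == "MOTIF") name[f[2]] = f[3]; next }
       FNR == 1 || /^#/ || NF < 6 { next }      # skip header, comment, and blank lines
       { print $2, name[$2], $5 }' "$1" "$2"
}

# Tiny made-up files to show the behavior (not real results)
db=$(mktemp); hits=$(mktemp)
printf 'MOTIF MA0000.1 TF_A\nMOTIF MA0000.2 TF_B\n' > "$db"
printf 'Query_ID\tTarget_ID\tOptimal_offset\tp-value\tE-value\tq-value\n' > "$hits"
printf '1\tMA0000.2\t0\t1e-6\t1e-3\t2e-3\n' >> "$hits"
annotate_hits "$db" "$hits"
rm -f "$db" "$hits"
```

With the course files, the call would look like `annotate_hits JASPAR2022_CORE_vertebrates_non-redundant.meme tomtom.tsv`, run from the directory that contains both files.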
Example working meme-ChIP command:
singularity exec /isg/shared/apps/meme/5.4.1/meme.sif meme-chip -oc meme_chip_ATF1 -db JASPAR2022_CORE_vertebrates_non-redundant.meme ATF1_summit_101bp_top200.fasta -meme-nmotifs 2 -minw 5 -maxw 8 -meme-mod zoops