Installed bioinformatics programs
1.0 Edition
Published Feb 04 2016
Copyright © 2016 IBt-UNAM
Table of Contents
Prologue
1. List of installed programs
2. Using the programs
    2.1. Module
    2.2. Using script files
    2.3. Special programs
3. Copyrights
    3.1. Copyrights of the programs
List of Tables
1-1. List of programs
Prologue
This document presents the bioinformatics programs installed on the cluster, as a help to anyone who wants to use
them.
Chapter 1. List of installed programs
The following list shows the programs that were installed on the cluster in addition to the default ones.
Table 1-1. List of programs
Program | Version | Description | Location
Abyss | 1.5.1 | Assembly By Short Sequences | /share/apps/uuab/abyss-1.5.1
Art | 1.5.1 | Art: a next-generation sequencing read simulator. | /share/apps/uuab/Art-1.5.1
Autodock | 4.2 | AutoDock is a suite of automated docking tools. | /share/apps/Autodock-4.2
Autodock Vina | 1.1.2 | AutoDock Vina is an open-source program for doing molecular docking. | /share/apps/Autodock_vina-1.1.2/
Bamtools | 2.3.0 | Toolkit for reading, writing, and manipulating BAM files | /share/apps/uuab/bamtools-2.3.0/
bcl2fastq | 2.17.1.14 | To demultiplex data and convert BCL files to FASTQ file formats | /share/apps/bcl2fastq2-2.17.1.14/
BEDTools | 2.22.0 | Finding feature overlaps and computing coverage | /share/apps/BEDTools-2.20.0/
BEAST | 1.7.5 | BEAST is a cross-platform program for Bayesian MCMC analysis of molecular sequences. | /share/apps/BEASTv1.7.5/
Blast+ | 2.2.29 | The NCBI Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences. | /share/apps/ncbi-blast-2.2.29/
Bowtie1 | 1.1.2 | Ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. | /share/apps/Bowtie1-1.1.2/
Bowtie2 | 2.2.4 | Ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. | /share/apps/uuab/Bowtie2-2.2.4/
Bwa | 0.7.12 | Burrows-Wheeler Aligner | /share/apps/uuab/Bwa-0.7.12/
Casava | 1.8.2 | Illumina toolkit | /share/apps/CASAVA-1.8.2/
Cd-hit | 4.6.4 | Cluster Database at High Identity with Tolerance. | /share/apps/Cd-hit-4.6.4/
Circos | 0.69 | Circos is a software package for visualizing data and information. | /share/apps/Circos-0.69/
Cleaveland | 3.0.1 | CleaveLand 3 is designed to process degradome data and small RNA target predictions and to output concise results depicting sliced small RNA target RNAs. | /share/apps/uuab/Cleaveland-3.0.1/
Clustal Omega | 1.1.0 | Clustal Omega is the latest addition to the Clustal family. It offers a significant increase in scalability over previous versions, allowing hundreds of thousands of sequences to be aligned in only a few hours. | /share/apps/Clustal_Omega-1.1.0/
Cortex | 1.0.5.15 | Cortex is an efficient and low-memory software framework for analysis of genomes using sequence data. | /share/apps/uuab/cortexassembler-1.0.5.15
Cushaw2 | 2.4.3 | Fast and parallel gapped read alignment to large genomes | /share/apps/uuab/Cushaw2-2.4.3
Dfilter | N/C | Dfilter is a generalized signal detection tool for analyzing next-gen massively-parallel sequencing data by using ROC-AUC maximizing linear filter. | /share/apps/uuab/Dfilter
FastQC | 0.11.2 | A Quality Control application for FastQ files | /share/apps/uuab/FastQC
FastX-Toolkit | 0.7 | Short-read pre-processing tools | /share/apps/FastX-Toolkit
Flash | 1.2.11 | Fast Length Adjustment of Short reads | /share/apps/Flash
FreeBayes | 0.9.20 | Bayesian haplotype-based polymorphism discovery and genotyping | /share/apps/Freebayes
FragGeneScan | 1.18 | FragGeneScan is an application for finding (fragmented) genes in short reads. | /share/apps/uuab/FragGeneScan-1.17
FusionMap | 12/08/2012 | FusionMap is an efficient fusion aligner which aligns reads spanning fusion junctions directly to the genome without prior knowledge of potential fusion regions. | /share/apps/FusionMap
GenomeAnalysisToolkit (GATK) | 2.2.5 | Software package developed at the Broad Institute to analyse next-generation resequencing data | /share/apps/uuab/GenomeAnalysisTKLite-2.2-5
Genomemapper | 0.4.3 | Short read mapping tool designed for accurate read alignments | /share/apps/uuab/genomemapper-0.4.3
Hisat2 | 2.0.1 | HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) against the general human population (as well as against a single reference genome). | /share/apps/Hisat2-2.0.1
Hmmer | 3.1b1 | HMMER is used for searching sequence databases for homologs of protein sequences, and for making protein sequence alignments. | /share/apps/Hmmer-3.1b1
iCORN | 0.97 | Iterative Correction of Reference Nucleotides | /share/apps/uuab/iCORN-0.97
iq-tree | 0.9.5 | Efficient phylogenetic tree reconstruction and ultrafast bootstrap approximation | /share/apps/iq-tree-0.9.5/
Jellyfish | 1.1.11 | Fast, Parallel k-mer Counting for DNA | /share/apps/jellyfish-1.1.11/
Jmodeltest2 | 2.1.2 | Tool to carry out statistical selection of best-fit models of nucleotide substitution | /share/apps/jmodeltest-2.1.2/
Mafft | 7.058 | Multiple alignment program for amino acid or nucleotide sequences | /share/apps/Mafft-7.058/
Meme | 4.10.0 | Motif-based sequence analysis tools | /share/apps/Meme-4.9.1
Meta-IDBA | 1.1.1 | Meta-IDBA is an iterative De Bruijn Graph De Novo short read assembler specially designed for de novo metagenomic assembly. | /share/apps/Meta-IDBA-1.1.1
MetaVelvet | 1.2.01 | Short read assembler for metagenomics | /share/apps/MetaVelvet-1.2.01
MicroRazerS | 1.0 | MicroRazerS - Rapid Alignment of Small RNA Reads | /share/apps/uuab/MicroRazerS
MIP Scaffolder | 0.5 | MIP Scaffolder is a program for scaffolding contigs produced by fragment assemblers using mate pair data such as those generated by ABI SOLiD or Illumina Genome Analyzer. | /share/apps/uuab/mip-scaffolder-0.5/
MIRA | 4.0rc1 | MIRA is the swiss army knife of sequence assembly that I’ve used and developed during the past 10 years to get assembly jobs I work on done efficiently - and especially accurately. | /share/apps/uuab/mira-4.0rc1/
mirDeep2 | 0.0.5 | Discovering known and novel miRNAs from deep sequencing data. | /share/apps/mirdeep2/
Mopac | 2012 | MOPAC (Molecular Orbital PACkage) is a semiempirical quantum chemistry program based on Dewar and Thiel’s NDDO approximation. | /share/apps/Mopac
Mothur | 1.36.1 | This project seeks to develop a single piece of open-source, expandable software to fill the bioinformatics needs of the microbial ecology community. | /share/apps/Mothur-1.36.1/
MrBayes | 3.2.5 | Bayesian Inference of Phylogeny | /share/apps/MrBayes-3.2.5
Muscle | 3.8.31 | Multiple sequence alignment (Faster and more accurate than CLUSTALW) | /opt/Bio/muscle/
Namd | 2.9 | Scalable molecular dynamics | /share/apps/NAMD-2-9
Namd (with GPU CUDA) | 2.9 | Scalable molecular dynamics | /share/apps/NAMD-2-9-CUDA5.5
PeakSeq | 1.1 | PeakSeq is a program for identifying and ranking peak regions in ChIP-Seq experiments. | /share/apps/PeakSeq-1.1
Parallel-Meta | 2.3 | Parallel-META is a software toolkit which can parallelly analyze massive metagenomic data for taxonomical and functional structure. | /share/apps/uuab/parallel-meta
PhyloBayes Mpi | 1.5a | A Bayesian software for phylogenetic reconstruction using mixture models | /share/apps/PhyloBayesMPI
Picard-tools | 1.114 | Tools (in Java) for working with next generation sequencing data in BAM file | /share/apps/picard-tools-1.114
Pindel | 0.2.4 | Detect breakpoints of large deletions, medium sized insertions, inversions, tandem duplications ... | /share/apps/Pindel-0.2.4
Pita | 6 | The PITA executable allows you to identify and score microRNA targets on UTRs | /share/apps/pita-6
PHLAWD | 3.3a | Phylogenetic dataset construction | /share/apps/Phlawd
PhyloPhlAn | - | Microbial Tree of Life using 400 universal proteins | /share/apps/phylophlan
PhyloSift | 00.1 | Phylogenetic and taxonomic analysis for genomes and metagenomes | /share/apps/PhyloSift
Phyml | 20120412 | PhyML is a phylogeny software based on the maximum-likelihood principle. | /opt/Bio/phyml
Pplacer | 1.1.alpha14 | Pplacer places query sequences on a fixed reference phylogenetic tree to maximize phylogenetic likelihood or posterior probability according to a reference alignment. | /share/apps/pplacer-1.1.alpha14
Quake | 0.3.4 | Quake is a package to correct substitution sequencing errors in experiments with deep coverage (e.g. >15X), specifically intended for Illumina sequencing reads. | /share/apps/uuab/Quake-0.3.4/
Quicktree | 1.1 | QuickTree is an efficient implementation of the Neighbor-Joining algorithm, capable of reconstructing phylogenies from huge alignments in time less than the age of the universe. | /share/apps/uuab/Quicktree-1.1/
Qiime | 1.9.1 | QIIME is an open-source bioinformatics pipeline for performing microbiome analysis from raw DNA sequencing data. | ...Python program...
R | 3.0.3 | Software environment for statistical computing and graphics. | /share/apps/R-3.0.3
R | 3.0.3 | Software environment for statistical computing and graphics (with the gcc 4.8 libraries; see Special programs). | /share/apps/R-3.0.3-gcc48
Randfold | 2.0 | The software compute the probability that, for a given RNA sequence, the Minimum Free Energy (MFE) of the secondary structure is different from a distribution of MFE computed with random sequences. | /share/apps/randfold-2.0
Ray | 2.3.1 | Parallel genome assemblies for parallel DNA sequencing. | /share/apps/Ray-2.3.1
RDPTools | 11.4 | RDP provides quality-controlled, aligned and annotated Bacterial and Archaeal 16S rRNA sequences... | /share/apps/RDPTools/
Reapr | 1.0.16 | Recognising Errors in Assemblies using Paired Reads | /share/apps/uuab/Reapr_1.0.16
RepeatMasker | 3.0.0 | Screens DNA sequences for interspersed repeats and low complexity DNA sequences | /share/apps/uuab/RepeatMasker
RFMix | 1.5.4 | A discriminative method for local ancestry inference (python 2.7) | /share/apps/RFMix_v1.5.4
RSeQC | 2.3.2 | RNA-seq quality control package | /share/apps/uuab/RSeQC-2.3.2/
Samtools | 1.0.18 | Flexible generic format for storing nucleotide sequence alignment | /share/apps/uuab/samtools-0.1.18
Schrodinger | 2012 | Molecule modeling and simulation | /share/apps/Schrodinger
SGA | 0.9.34 | De novo assembler designed to assemble large genomes from high coverage short read data. | /share/apps/uuab/SGA
Shore | 0.8.0 | SHORE, for Short Read, is a mapping and analysis pipeline for short read data produced on the Illumina platform. | /share/apps/uuab/shore-0.8.0
Siomics | 1.4 | SIOMICS Extension is an extension version of SIOMICS, which adjusts the predicted motifs based on the motif instances. | /share/apps/SIOMICS-1.4
Smalt | 0.7.6 | Efficiently aligns DNA sequencing reads with a reference genome | /share/apps/uuab/Smalt-0.7.6
SnpEff | 3.1 | It’s a variant annotation and effect prediction tool. It annotates and predicts the effects of variants on genes (such as amino acid changes). | /share/apps/snpEff
snpomatic | - | SNP-o-matic is a fast, stringent short-read mapping software. | /share/apps/uuab/snpomatic
Soap2 | 2.21 | Updated version of SOAP software for short oligonucleotide alignment | /share/apps/uuab/soap2.21
SOAPdenovo | 1.05 | novel short-read assembly method that can build a de novo draft assembly for the human-sized genomes. | /share/apps/uuab/SOAPdenovo-V1.05/
SOAPdenovo2 | r223 | An empirically improved memory-efficient short-read de novo assembler | /share/apps/uuab/SOAPdenovo2-r223
SortMeRNA | 2.0 | a biological sequence analysis tool for filtering, mapping and OTU-picking NGS reads. | /share/apps/SortMeRNA
Spades | 3.0.0 | SPAdes Genome Assembler | /share/apps/uuab/spades-2.4.0
Sspace basic | 2.0 | SSPACE is a stand-alone program for scaffolding pre-assembled contigs using paired-read data. | /share/apps/uuab/Sspace-basic-2.0
TopHat | 2.0.12 | TopHat is a fast splice junction mapper for RNA-Seq reads. | /share/apps/uuab/Tophat-2.0.12
Trimmomatic | 0.33 | A flexible read trimming tool for Illumina NGS data | /share/apps/Trimmomatic-0.33/
Trinity | 2.1.1 | RNA-Seq De novo Assembly | /share/apps/trinityrnaseq-2.1.1/
Trinotate | 2.0.2 | Trinotate is an extension of the Trinity project. | /share/apps/Trinotate-2.0.2/
Usearch | 8.1.1756 | USEARCH is a unique sequence analysis tool with thousands of users world-wide. | /share/apps/Usearch-8.1.1756/
Vcftools | 0.1.9 | a program package designed for working with VCF files, such as those generated by the 1000 Genomes Project. | /share/apps/vcftools_0.1.9
Velvet | 1.2.09 | Sequence assembler for very short reads | /share/apps/uuab/velvet-1.2.09
ViennaRNA | 2.0.7 | RNA Secondary Structure Prediction and Comparison | /share/apps/ViennaRNA-2.0.7/
WGS | 7.0 | Whole-Genome Shotgun Assembler | /share/apps/uuab/wgs-7.0/
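Several rows of Table 1-1 give a version that differs from the one in the directory name (BEDTools, Meme and Spades, for example), so it can be worth confirming that a directory exists before relying on it. The following is a minimal sketch of such a check. The function name check_install_dirs is hypothetical, and the three paths are simply copied from the table.

```shell
# Hypothetical helper: report which installation directories exist on this machine.
check_install_dirs() {
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            echo "found: $dir"
        else
            echo "missing: $dir"
        fi
    done
}

# Three locations copied from Table 1-1; any other row can be checked the same way.
check_install_dirs \
    /share/apps/uuab/abyss-1.5.1 \
    /share/apps/ncbi-blast-2.2.29 \
    /share/apps/uuab/velvet-1.2.09
```

The function only reads the file system, so it can be run safely on a login node.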
Chapter 2. Using the programs
2.1. Module
Some programs can already be used through the module command. This command sets the environment variables a
program needs, and it can unset them again in the same way.
In a normal session, you can see the modules that are currently loaded:
[jerome@cluster ~]$ module list
Currently Loaded Modulefiles:
1) rocks-openmpi
You can also find out which modules are available:
[jerome@cluster ~]$ module avail
--------------------------------------------------------------------- /usr/share/Modules/modulefiles
compilers/gcc-4.8.1
dot
module-info
null
programs
compilers/python-2.7
module-cvs
modules
programs/R-3.0.3
programs
For example, to use MrBayes in its latest version, 3.2.2:
[jerome@compute-0-0 ~]$ mb --version
MrBayes v3.1.2
.../..
[jerome@compute-0-0 ~]$ module load programs/mrbayes-3.2.2
[jerome@compute-0-0 ~]$ mb --version
MrBayes v3.2.2 x64
.../..
[jerome@compute-0-0 ~]$ module unload programs/mrbayes-3.2.2
[jerome@compute-0-0 ~]$ mb --version
MrBayes v3.1.2
.../..
For now, not every program can be used through the module command. To see which ones can, check the "Module"
column in the complete table of available programs.
IMPORTANT: to use a module in a script, you must add this line before loading it:
#$ .... SGE options
source $HOME/.bashrc    # <--- Add this line!
module load the_module_i_want
.../...
This is necessary because a job does not open a connection that runs the shell's normal startup. We are looking into
how to fix this problem.
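The workaround can be put together into a complete job script. The following is a minimal sketch and not the cluster's official template. The file name mrbayes_job.sh, the job name, and the choice of the -S, -cwd, -N and -j options are assumptions made for illustration. Only the source $HOME/.bashrc line, the module name and the mb --version call come from this section.

```shell
# Write a small SGE job script (file name and job name are illustrative assumptions).
cat > mrbayes_job.sh <<'EOF'
#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -N mrbayes_version
#$ -j y

# Workaround described in this section: run the shell startup file so that
# the module command is defined inside the job.
source $HOME/.bashrc
module load programs/mrbayes-3.2.2

mb --version
EOF

# Check the syntax locally before submitting.
bash -n mrbayes_job.sh && echo "mrbayes_job.sh: syntax OK"
```

On the cluster, the script would then be submitted with qsub mrbayes_job.sh.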
2.2. Using script files
To use the bioinformatics programs, you must at least define the path to them. Files were created to make
defining those paths easier.
Depending on the location of the program you want to use, we recommend adding the following lines at the start of
your job script:
source /share/apps/Profiles/share-profile.sh
source /share/apps/uuab/Profiles/uuab-profile.sh
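A job script can guard against a missing profile file instead of failing silently later. The sketch below shows one way to do this. The helper name source_profile is hypothetical, and using bwa as the program to look for is an assumption: this document does not say which programs each profile puts on the PATH.

```shell
# Hypothetical helper: source a profile file only if it is readable.
source_profile() {
    if [ -r "$1" ]; then
        . "$1"
    else
        echo "profile not found: $1" >&2
        return 1
    fi
}

# The two profile files named in this section; a missing one is reported and skipped.
for profile in /share/apps/Profiles/share-profile.sh \
               /share/apps/uuab/Profiles/uuab-profile.sh; do
    source_profile "$profile" || true
done

# Assumption: bwa is one of the programs the profiles put on the PATH.
if command -v bwa >/dev/null 2>&1; then
    echo "bwa found at: $(command -v bwa)"
else
    echo "bwa is not on the PATH" >&2
fi
```

Replacing bwa with the executable you actually need gives a quick check at the top of the job.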
Programs written in Python (such as PhyloPhlAn) depend heavily on dedicated bioinformatics libraries that were
installed beforehand. To use them easily, we recommend defining the paths to those libraries like this:
source /share/apps/Profiles/python-modules.sh
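After sourcing the profile, a quick import test shows whether the Python libraries are really reachable. This is a minimal sketch under two assumptions: that the interpreter set up by the profile is called python, and that Biopython (import name Bio) is one of the libraries it exposes. The helper name can_import is hypothetical.

```shell
# Source the profile from this section if it is present on this machine.
if [ -r /share/apps/Profiles/python-modules.sh ]; then
    . /share/apps/Profiles/python-modules.sh
fi

# Hypothetical helper: succeed only if the given Python module can be imported.
# Assumption: the interpreter to test is the `python` found on the PATH.
can_import() {
    python -c "import $1" >/dev/null 2>&1
}

if can_import Bio; then
    python -c "import Bio; print(Bio.__version__)"
else
    echo "Biopython cannot be imported; check that python-modules.sh was sourced" >&2
fi
```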
2.3. Special programs
For Abyss, source its profile file like this:
source /share/apps/uuab/Profiles/abyss-profile.sh
This gives you the right variables for using it.
To use JmodelTest2:
export JMODELTEST_HOME=/share/apps/jmodeltest-2.1.2
java -jar $JMODELTEST_HOME/jModelTest.jar -d $JMODELTEST_HOME/example-data/aP6.fas -f -i -g 4 -s 11
This gives you the right variables for using it.
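The JmodelTest2 command can be wrapped in a small function so that a job stops with a clear message when the jar file is missing. This is only a sketch: the function name run_jmodeltest and the log file name in the commented example are hypothetical. The installation path and the options (-d, -f, -i, -g 4, -s 11) are the ones shown in this section.

```shell
# Installation directory from this section; can be overridden from the environment.
JMODELTEST_HOME=${JMODELTEST_HOME:-/share/apps/jmodeltest-2.1.2}
export JMODELTEST_HOME

# Hypothetical wrapper: run jModelTest on the alignment given as $1,
# with the same options as the example in this section.
run_jmodeltest() {
    jar="$JMODELTEST_HOME/jModelTest.jar"
    if [ ! -f "$jar" ]; then
        echo "jModelTest.jar not found in $JMODELTEST_HOME" >&2
        return 1
    fi
    java -jar "$jar" -d "$1" -f -i -g 4 -s 11
}

# Example call (left commented out so the snippet has no side effects):
# run_jmodeltest "$JMODELTEST_HOME/example-data/aP6.fas" > aP6.jmodeltest.log
```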
For MrBayes v3.2.2, source its profile file like this:
source /share/apps/Profiles/mrbayes-3.2.2-profile.sh
For Mopac, source its profile file like this:
source /share/apps/Profiles/mopac-profile.sh
This gives you the right variables for using it.
R v3.0.3 requires the gcc 4.8 runtime libraries to be configured. The simplest way is to source this script, which
also sets the PATH for R:
source /share/apps/Profiles/gcc481-profile.sh
This gives you the right variables for using it.
To use Python and BioPython, source the profile file like this:
source /share/apps/Profiles/python-modules.sh
This gives you the right variables for using them.
For Scipy (Python), source the profile file like this:
source /share/apps/Profiles/python-scipy.sh
This gives you the right variables for using it.
Some programs, MicroRazerS for example, were compiled with a more recent version of gcc (> v4.4). The resulting
error usually looks like this:
/usr/lib64/libstdc++.so.6: version ‘GLIBCXX_3.4.9’ not found (required by .....)
In this case, you must use the gcc runtime libraries that were compiled on the cluster for this purpose. To speed up
the setup, run this command before using the programs:
source /share/apps/Profiles/gcc46-profile.sh
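To confirm the diagnosis, you can list the GLIBCXX symbol versions that a given libstdc++ file provides and compare them with the one named in the error. The sketch below does this with grep alone. The helper names list_glibcxx and has_glibcxx are hypothetical, and the library path and version are the ones quoted in the error message.

```shell
# Hypothetical helper: list the GLIBCXX symbol versions found in a libstdc++ file.
# grep -a reads the binary file as text, and -o prints only the matching strings.
list_glibcxx() {
    LC_ALL=C grep -a -o 'GLIBCXX_[0-9][0-9.]*' "$1" | LC_ALL=C sort -u
}

# Hypothetical helper: succeed if the file provides exactly the version given as $2.
has_glibcxx() {
    list_glibcxx "$1" | grep -qxF "$2"
}

lib=/usr/lib64/libstdc++.so.6   # path taken from the error message
if [ -r "$lib" ]; then
    if has_glibcxx "$lib" GLIBCXX_3.4.9; then
        echo "GLIBCXX_3.4.9 is provided by $lib"
    else
        echo "GLIBCXX_3.4.9 is missing from $lib; source gcc46-profile.sh first"
    fi
fi
```

Running the same check after sourcing the profile, against whichever libstdc++ the profile selects, shows whether the newer library is being picked up.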
The iq-tree program was compiled with a more recent version of gcc, so before using it you should set up its
environment like this:
source /share/apps/Profiles/gcc481-profile.sh
Chapter 3. Copyrights
3.1. Copyrights of the programs
The programs presented in this document are free of charge.
The software contained in this distribution is released under the academic license agreement, which requires
acknowledging the use of the software in any published work that results from it.