Installed Bioinformatics Programs

1.0 Edition
Published Feb 04 2016
Copyright © 2016 IBt-UNAM

Table of Contents

Prologue
1. List of installed programs
2. Using the programs
    2.1. Module
    2.2. Using script files
    2.3. Special programs
3. Copyrights
    3.1. Copyrights of the programs

List of Tables

1-1. List of programs

Prologue

This document presents the bioinformatics programs installed on the cluster, with the aim of helping those who wish to use them.

Chapter 1. List of installed programs

The following list shows the programs that were installed in addition to the cluster's defaults.

Table 1-1. List of programs

Program | Version | Topic | Location
Abyss | 1.5.1 | Assembly By Short Sequences | /share/apps/uuab/abyss-1.5.1
Art | 1.5.1 | Art: a next-generation sequencing read simulator. | /share/apps/uuab/Art-1.5.1
Autodock | 4.2 | AutoDock is a suite of automated docking tools. | /share/apps/Autodock-4.2
Autodock Vina | 1.1.2 | AutoDock Vina is an open-source program for doing molecular docking. | /share/apps/Autodock_vina-1.1.2/
Bamtools | 2.3.0 | Toolkit for reading, writing, and manipulating BAM files | /share/apps/uuab/bamtools-2.3.0/
bcl2fastq | 2.17.1.14 | To demultiplex data and convert BCL files to FASTQ file formats | /share/apps/bcl2fastq2-2.17.1.14/
BEDTools | 2.22.0 | Finding feature overlaps and computing coverage | /share/apps/BEDTools-2.20.0/
BEAST | 1.7.5 | BEAST is a cross-platform program for Bayesian MCMC analysis of molecular sequences. | /share/apps/BEASTv1.7.5/
Blast+ | 2.2.29 | The NCBI Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences. | /share/apps/ncbi-blast-2.2.29/
Bowtie1 | 1.1.2 | Ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. | /share/apps/Bowtie1-1.1.2/
Bowtie2 | 2.2.4 | Ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. | /share/apps/uuab/Bowtie2-2.2.4/
Bwa | 0.7.12 | Burrows-Wheeler Aligner | /share/apps/uuab/Bwa-0.7.12/
Casava | 1.8.2 | Illumina toolkit | /share/apps/CASAVA-1.8.2/
Cd-hit | 4.6.4 | Cluster Database at High Identity with Tolerance. | /share/apps/Cd-hit-4.6.4/
Circos | 0.69 | Circos is a software package for visualizing data and information. | /share/apps/Circos-0.69/
Cleaveland | 3.0.1 | CleaveLand 3 is designed to process degradome data and small RNA target predictions and to output concise results depicting sliced small RNA target RNAs. | /share/apps/uuab/Cleaveland-3.0.1/
Clustal Omega | 1.1.0 | Clustal Omega is the latest addition to the Clustal family. It offers a significant increase in scalability over previous versions, allowing hundreds of thousands of sequences to be aligned in only a few hours. | /share/apps/Clustal_Omega-1.1.0/
Cortex | 1.0.5.15 | Cortex is an efficient and low-memory software framework for analysis of genomes using sequence data. | /share/apps/uuab/cortexassembler-1.0.5.15
Cushaw2 | 2.4.3 | Fast and parallel gapped read alignment to large genomes | /share/apps/uuab/Cushaw2-2.4.3
Dfilter | N/C | Dfilter is a generalized signal detection tool for analyzing next-gen massively-parallel sequencing data by using a ROC-AUC maximizing linear filter. | /share/apps/uuab/Dfilter
FastQC | 0.11.2 | A quality control application for FastQ files | /share/apps/uuab/FastQC
FastX-Toolkit | 0.7 | Short-read pre-processing tools | /share/apps/FastX-Toolkit
Flash | 1.2.11 | Fast Length Adjustment of Short reads | /share/apps/Flash
FreeBayes | 0.9.20 | Bayesian haplotype-based polymorphism discovery and genotyping | /share/apps/Freebayes
FragGeneScan | 1.18 | FragGeneScan is an application for finding (fragmented) genes in short reads. | /share/apps/uuab/FragGeneScan-1.17
FusionMap | 12/08/2012 | FusionMap is an efficient fusion aligner which aligns reads spanning fusion junctions directly to the genome without prior knowledge of potential fusion regions. | /share/apps/FusionMap
GenomeAnalysisToolkit | 2.2.5 | Software package developed at the Broad Institute to analyse next-generation resequencing data (GATK) | /share/apps/uuab/GenomeAnalysisTKLite-2.2-5
Genomemapper | 0.4.3 | Short read mapping tool designed for accurate read alignments | /share/apps/uuab/genomemapper-0.4.3
Hisat2 | 2.0.1 | HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) against the general human population (as well as against a single reference genome). | /share/apps/Hisat2-2.0.1
Hmmer | 3.1b1 | HMMER is used for searching sequence databases for homologs of protein sequences, and for making protein sequence alignments. | /share/apps/Hmmer-3.1b1
iCORN | 0.97 | Iterative Correction of Reference Nucleotides | /share/apps/uuab/iCORN-0.97
iq-tree | 0.9.5 | Efficient phylogenetic tree reconstruction and ultrafast bootstrap approximation | /share/apps/iq-tree-0.9.5/
Jellyfish | 1.1.11 | Fast, parallel k-mer counting for DNA | /share/apps/jellyfish-1.1.11/
Jmodeltest2 | 2.1.2 | Tool to carry out statistical selection of best-fit models of nucleotide substitution | /share/apps/jmodeltest-2.1.2/
Mafft | 7.058 | Multiple alignment program for amino acid or nucleotide sequences | /share/apps/Mafft-7.058/
Meme | 4.10.0 | Motif-based sequence analysis tools | /share/apps/Meme-4.9.1
Meta-IDBA | 1.1.1 | Meta-IDBA is an iterative De Bruijn Graph De Novo short read assembler specially designed for de novo metagenomic assembly. | /share/apps/Meta-IDBA-1.1.1
MetaVelvet | 1.2.01 | Short read assembler for metagenomics | /share/apps/MetaVelvet-1.2.01
MicroRazerS | 1.0 | MicroRazerS - Rapid Alignment of Small RNA Reads | /share/apps/uuab/MicroRazerS
MIP Scaffolder | 0.5 | MIP Scaffolder is a program for scaffolding contigs produced by fragment assemblers using mate pair data such as those generated by ABI SOLiD or Illumina Genome Analyzer. | /share/apps/uuab/mip-scaffolder-0.5/
MIRA | 4.0rc1 | MIRA is the swiss army knife of sequence assembly that I’ve used and developed during the past 10 years to get assembly jobs I work on done efficiently - and especially accurately. | /share/apps/uuab/mira-4.0rc1/
mirDeep2 | 0.0.5 | Discovering known and novel miRNAs from deep sequencing data. | /share/apps/mirdeep2/
Mopac | 2012 | MOPAC (Molecular Orbital PACkage) is a semiempirical quantum chemistry program based on Dewar and Thiel’s NDDO approximation. | /share/apps/Mopac
Mothur | 1.36.1 | This project seeks to develop a single piece of open-source, expandable software to fill the bioinformatics needs of the microbial ecology community. | /share/apps/Mothur-1.36.1/
MrBayes | 3.2.5 | Bayesian Inference of Phylogeny | /share/apps/MrBayes-3.2.5
Muscle | 3.8.31 | Multiple sequence alignment (faster and more accurate than CLUSTALW) | /opt/Bio/muscle/
Namd | 2.9 | Scalable molecular dynamics | /share/apps/NAMD-2-9
Namd | 2.9 (with GPU CUDA) | Scalable molecular dynamics | /share/apps/NAMD-2-9-CUDA5.5
PeakSeq | 1.1 | PeakSeq is a program for identifying and ranking peak regions in ChIP-Seq experiments. | /share/apps/PeakSeq-1.1
Parallel-Meta | 2.3 | Parallel-META is a software toolkit which can analyze massive metagenomic data in parallel for taxonomical and functional structure. | /share/apps/uuab/parallel-meta
PhyloBayes Mpi | 1.5a | A Bayesian software for phylogenetic reconstruction using mixture models | /share/apps/PhyloBayesMPI
Picard-tools | 1.114 | Tools (in Java) for working with next generation sequencing data in BAM files | /share/apps/picard-tools-1.114
Pindel | 0.2.4 | Detect breakpoints of large deletions, medium sized insertions, inversions, tandem duplications ... | /share/apps/Pindel-0.2.4
Pita | 6 | The PITA executable allows you to identify and score microRNA targets on UTRs | /share/apps/pita-6
PHLAWD | 3.3a | Phylogenetic dataset construction | /share/apps/Phlawd
PhyloPhlAn | (not listed) | Microbial Tree of Life using 400 universal proteins | /share/apps/phylophlan
PhyloSift | 00.1 | Phylogenetic and taxonomic analysis for genomes and metagenomes | /share/apps/PhyloSift
Phyml | 20120412 | PhyML is a phylogeny software based on the maximum-likelihood principle. | /opt/Bio/phyml
Pplacer | 1.1.alpha14 | Pplacer places query sequences on a fixed reference phylogenetic tree to maximize phylogenetic likelihood or posterior probability according to a reference alignment. | /share/apps/pplacer-1.1.alpha14
Quake | 0.3.4 | Quake is a package to correct substitution sequencing errors in experiments with deep coverage (e.g. >15X), specifically intended for Illumina sequencing reads. | /share/apps/uuab/Quake-0.3.4/
Quicktree | 1.1 | QuickTree is an efficient implementation of the Neighbor-Joining algorithm, capable of reconstructing phylogenies from huge alignments in time less than the age of the universe. | /share/apps/uuab/Quicktree-1.1/
Qiime | 1.9.1 | QIIME is an open-source bioinformatics pipeline for performing microbiome analysis from raw DNA sequencing data. | (Python program)
R | 3.0.3 | Software environment for statistical computing and graphics. | /share/apps/R-3.0.3
R | 3.0.3 | Software environment for statistical computing and graphics (with glibc = 4.8; see Special programs). | /share/apps/R-3.0.3-gcc48
Randfold | 2.0 | The software computes the probability that, for a given RNA sequence, the Minimum Free Energy (MFE) of the secondary structure is different from a distribution of MFE computed with random sequences. | /share/apps/randfold-2.0
Ray | 2.3.1 | Parallel genome assemblies for parallel DNA sequencing. | /share/apps/Ray-2.3.1
RDPTools | 11.4 | RDP provides quality-controlled, aligned and annotated Bacterial and Archaeal 16S rRNA sequences... | /share/apps/RDPTools/
Reapr | 1.0.16 | Recognising Errors in Assemblies using Paired Reads | /share/apps/uuab/Reapr_1.0.16
RepeatMasker | 3.0.0 | Screens DNA sequences for interspersed repeats and low complexity DNA sequences | /share/apps/uuab/RepeatMasker
RFMix | 1.5.4 | A discriminative method for local ancestry inference (python 2.7) | /share/apps/RFMix_v1.5.4
RSeQC | 2.3.2 | RNA-seq quality control package | /share/apps/uuab/RSeQC-2.3.2/
Samtools | 1.0.18 | Flexible generic format for storing nucleotide sequence alignment | /share/apps/uuab/samtools-0.1.18
Schrodinger | 2012 | Molecule modeling and simulation | /share/apps/Schrodinger
SGA | 0.9.34 | De novo assembler designed to assemble large genomes from high coverage short read data. | /share/apps/uuab/SGA
Shore | 0.8.0 | SHORE, for Short Read, is a mapping and analysis pipeline for short read data produced on the Illumina platform. | /share/apps/uuab/shore-0.8.0
Siomics | 1.4 | SIOMICS Extension is an extended version of SIOMICS, which adjusts the predicted motifs based on the motif instances. | /share/apps/SIOMICS-1.4
Smalt | 0.7.6 | Efficiently aligns DNA sequencing reads with a reference genome | /share/apps/uuab/Smalt-0.7.6
SnpEff | 3.1 | It’s a variant annotation and effect prediction tool. It annotates and predicts the effects of variants on genes (such as amino acid changes). | /share/apps/snpEff
snpomatic | (not listed) | SNP-o-matic is a fast, stringent short-read mapping software. | /share/apps/uuab/snpomatic
Soap2 | 2.21 | Updated version of SOAP software for short oligonucleotide alignment | /share/apps/uuab/soap2.21
SOAPdenovo | 1.05 | Novel short-read assembly method that can build a de novo draft assembly for human-sized genomes. | /share/apps/uuab/SOAPdenovo-V1.05/
SOAPdenovo2 | r223 | An empirically improved memory-efficient short-read de novo assembler | /share/apps/uuab/SOAPdenovo2-r223
SortMeRNA | 2.0 | A biological sequence analysis tool for filtering, mapping and OTU-picking NGS reads. | /share/apps/SortMeRNA
Spades | 3.0.0 | SPAdes Genome Assembler | /share/apps/uuab/spades-2.4.0
Sspace basic | 2.0 | SSPACE is a stand-alone program for scaffolding pre-assembled contigs using paired-read data. | /share/apps/uuab/Sspace-basic-2.0
TopHat | 2.0.12 | TopHat is a fast splice junction mapper for RNA-Seq reads. | /share/apps/uuab/Tophat-2.0.12
Trimmomatic | 0.33 | A flexible read trimming tool for Illumina NGS data | /share/apps/Trimmomatic-0.33/
Trinity | 2.1.1 | RNA-Seq De novo Assembly | /share/apps/trinityrnaseq-2.1.1/
Trinotate | 2.0.2 | Trinotate is an extension of the Trinity project. | /share/apps/Trinotate-2.0.2/
Usearch | 8.1.1756 | USEARCH is a unique sequence analysis tool with thousands of users world-wide. | /share/apps/Usearch-8.1.1756/
Vcftools | 0.1.9 | A program package designed for working with VCF files, such as those generated by the 1000 Genomes Project. | /share/apps/vcftools_0.1.9
Velvet | 1.2.09 | Sequence assembler for very short reads | /share/apps/uuab/velvet-1.2.09
ViennaRNA | 2.0.7 | RNA Secondary Structure Prediction and Comparison | /share/apps/ViennaRNA-2.0.7/
WGS | 7.0 | Whole-Genome Shotgun Assembler | /share/apps/uuab/wgs-7.0/

Chapter 2. Using the programs

2.1. Module

Some programs can already be used through the module command. This command sets up the environment variables a program needs, and it can undo that configuration in the same way.

In a normal session, you can see which modules are currently loaded:

    [jerome@cluster ~]$ module list
    Currently Loaded Modulefiles:
      1) rocks-openmpi

You can find out which modules are available:

    [jerome@cluster ~]$ module avail
    --------------------------------------------------------------------- /usr/share/Modules/modulefiles
    compilers/gcc-4.8.1   dot         module-info  null              programs
    compilers/python-2.7  module-cvs  modules      programs/R-3.0.3  programs

To use MrBayes in its latest version, 3.2.2:

    [jerome@compute-0-0 ~]$ mb --version
    MrBayes v3.1.2
    .../..
    [jerome@compute-0-0 ~]$ module load programs/mrbayes-3.2.2
    [jerome@compute-0-0 ~]$ mb --version
    MrBayes v3.2.2 x64
    .../..
    [jerome@compute-0-0 ~]$ module unload programs/mrbayes-3.2.2
    [jerome@compute-0-0 ~]$ mb --version
    MrBayes v3.1.2
    .../..

For now, not every program can be used through the module command. To check which ones can, see the "Module" column in the complete table of available programs.

IMPORTANT! To use a module in a script, you must source your .bashrc before loading it:

    #$ .... SGE options
    source $HOME/.bashrc          <--- Add this line!
    module load the_module_I_want
    .../...

The problem is that a job does not open a session with the normal shell start-up. We are looking into how to fix it.

2.2. Using script files

To use the bioinformatics programs, you must at least define the path to them. To that end, files were created that make it easier to define those paths.
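The profile files themselves are not reproduced in this document. As a rough sketch only, assuming such a file does little more than put a program's directory on the PATH, it might look like the following. The function name prepend_path and the bin subdirectory are hypothetical and chosen for illustration; the real profiles may well set other variables too.

```shell
# Hypothetical sketch of what a profile file might contain.
# prepend_path is an illustrative helper, not part of the cluster's profiles.

# Add a directory to the front of PATH unless it is already listed.
prepend_path() {
    case ":${PATH}:" in
        *":$1:"*) ;;                 # already present: leave PATH untouched
        *) PATH="$1:${PATH}" ;;      # otherwise put it first
    esac
}

# Install prefix taken from Table 1-1; the bin subdirectory is an assumption.
prepend_path /share/apps/uuab/abyss-1.5.1/bin
export PATH
```

Because of the membership check, sourcing such a file more than once in the same job does not keep lengthening PATH.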
Depending on the location of the program you want to use, it is advisable to include the following lines at the start of your job script:

    source /share/apps/Profiles/share-profile.sh
    source /share/apps/uuab/Profiles/uuab-profile.sh

Programs written in Python (such as PhyloPhlAn) depend largely on bioinformatics libraries that were installed beforehand. To use them easily, define the paths to those libraries as follows:

    source /share/apps/Profiles/python-modules.sh

2.3. Special programs

Each of the commands below sets the appropriate variables for the program in question.

For Abyss, use its profile file:

    source /share/apps/uuab/Profiles/abyss-profile.sh

To use JmodelTest2:

    export JMODELTEST_HOME=/share/apps/jmodeltest-2.1.2
    java -jar $JMODELTEST_HOME/jModelTest.jar -d $JMODELTEST_HOME/example-data/aP6.fas -f -i -g 4 -s 11

For MrBayes v3.2.2, use its profile file:

    source /share/apps/Profiles/mrbayes-3.2.2-profile.sh

For Mopac, use its profile file:

    source /share/apps/Profiles/mopac-profile.sh

R v3.0.3 requires the glibc v4.8 libraries to be set up. The simplest way is to use this script, which also sets the PATH for R:

    source /share/apps/Profiles/gcc481-profile.sh

To use Python and BioPython, use this profile file:

    source /share/apps/Profiles/python-modules.sh

For Scipy (Python), use this profile file:

    source /share/apps/Profiles/python-scipy.sh

Some programs, MicroRazerS for example, are compiled with more recent versions of gcc (> v4.4).
The typical error looks like this:

    /usr/lib64/libstdc++.so.6: version ‘GLIBCXX_3.4.9’ not found (required by .....)

In that case, you must use the glibc libraries that were compiled on the cluster for this purpose. To speed up the configuration, run this command before using the programs:

    source /share/apps/Profiles/gcc46-profile.sh

The iq-tree program was compiled with a more recent version of gcc, so before using it, set up its environment as follows:

    source /share/apps/Profiles/gcc481-profile.sh

Chapter 3. Copyrights

3.1. Copyrights of the programs

The programs presented in this document are free of charge. The software contained in this distribution is released under the academic license agreement, which requires acknowledging the use of the software in any published work that results from it.