Biologically relevant data comes in a variety of forms: next-generation sequencing (NGS), gene expression, mass spectrometry, sequence data and more. As a result, tools that process these types of data require deep technical knowledge of both data science and biology to derive value.

The BIOVIA Pipeline Pilot Biology Collections provide the technical foundation to allow scientists and programmers to rapidly develop solutions to analyze biological data.

Craft actionable, scientifically relevant insights.

Next Generation Sequencing Collection

Life science research organizations are applying large-scale genomic sequencing data to explore areas such as personalized medicine, agricultural research and biofuel development. The Next Generation Sequencing (NGS) Collection offers a comprehensive assortment of NGS data analysis pipelines ready to analyze data with unparalleled power and flexibility.

  • Generate a wide range of NGS analyses
    • De novo assembly, mapping to reference, SNP and structural variant detection, RNA-Seq, CNV-Seq, ChIP-Seq, Methyl-Seq and large-scale genome comparisons
  • Integrate industry-standard algorithms
    • BWA-MEM, Bowtie 2, GATK, BreakDancer, TopHat, Cufflinks, SAMtools, Velvet
  • Simplify the use of NGS data files
    • Reference sequences, alignments and feature annotations
  • Streamline analysis, interpretation and reporting
    • GBrowse, IGV, Tablet and Circos, plus interactive graphs, tables and charts

Sequence Analysis

BIOVIA Pipeline Pilot Sequence Analysis offers essential bioinformatics capabilities and algorithms for creating practical sequence analysis workflows. With more than 180 component functions, you can analyze and annotate DNA and protein sequences using a variety of industry-standard methods, or build your own.

  • Generate sequence alignments
    • Align multiple sequences with ClustalW and build hidden Markov models (HMMs)
  • Simplify pattern matching
    • Identify PROSITE regions, GC-rich regions, proteolytic cleavage sites, restriction enzyme sites, signal peptide cleavage sites, open reading frames or regular expression patterns
  • Perform similarity searching
  • Annotate and manipulate sequences
    • For DNA: primer identification, GC content, six-frame translations, reverse complement and siRNA target site prediction
    • For proteins: back translation, secondary structure prediction and isoelectric point
  • Integrate third-party tools and databases
    • Run BioPerl, NCBI BLAST, GCG programs, EMBOSS tools, BioJava, Entrez and EB-eye queries
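
Several of the DNA annotations named above (GC content, reverse complement, six-frame translation, restriction enzyme site matching) are simple enough to sketch in a few lines. The Python below is an illustrative sketch only: it is not Pipeline Pilot component code, the function names are hypothetical, and it assumes the standard genetic code and unambiguous A/C/G/T input.

```python
import re

# Standard genetic code, built from the conventional T, C, A, G codon ordering.
_BASES = "TCAG"
_AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
          "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = dict(
    zip((a + b + c for a in _BASES for b in _BASES for c in _BASES), _AMINO)
)


def gc_content(seq):
    """Return the fraction of G and C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons in seq; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq):
    """Return translations for the three forward and three reverse frames."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


def find_sites(seq, pattern):
    """Return 0-based start positions of a regex pattern, overlaps included."""
    return [m.start() for m in re.finditer(f"(?={pattern})", seq.upper())]


if __name__ == "__main__":
    dna = "ATGGAATTCGCTTAA"  # made-up example sequence
    print(gc_content(dna))
    print(reverse_complement(dna))
    print(six_frame_translations(dna))
    print(find_sites(dna, "GAATTC"))  # GAATTC is the EcoRI recognition site
```

The lookahead in `find_sites` is a design choice that lets overlapping matches be reported, which matters for short or degenerate patterns.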

Gene Expression and Mass Spectrometry for Proteomics

Omics-based analyses require large volumes of data and a wide range of interdisciplinary data types. The Gene Expression and Mass Spectrometry Collections offer comprehensive tools to create and automate customized omics workflows.

  • Increase accessibility
    • Use Bioconductor tools without scripting or R packages
  • Access third-party data sources
    • Download and analyze GEO data sets
  • Extract peptides and mapped proteins
    • X!Tandem
  • Extract, identify and align feature peaks
    • XCMS
  • Analyze tagged samples
    • Calculate protein abundance ratios with ASAPRatio
  • Support a variety of formats
    • Read .RAW*, .wiff, SEQUEST DTA, ANDI (netCDF), Mascot MGF or mzXML files
  • Visualize data
    • Use interactive chromatograms, 2D mass spec run charts, scan charts, feature peak charts, retention time drift charts, fragmentograms, peptide and protein viewers with drill-down, and heat maps