A set of Perl scripts for querying and analyzing genomic data
Tools are included for processing microarray and next-generation sequencing data, converting between data file formats, querying datasets, and performing general high-level analysis of datasets.
This toolbox of programs relies on storing genome annotation, microarray, and next-generation sequencing data in local BioPerl databases, allowing data to be retrieved relative to any annotated feature in the database.
While referencing genomic annotation and features from a database is convenient, it is not required. Simple BED-style input files are also supported for data collection.
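To illustrate what a BED-style input record carries, here is a minimal sketch in Python (not the toolbox's own Perl code). It assumes the standard BED layout: chromosome, 0-based start, exclusive end, then optional name, score, and strand columns. The names `BedInterval` and `parse_bed_line` are hypothetical and exist only for this example.

```python
from typing import NamedTuple, Optional


class BedInterval(NamedTuple):
    """One BED record (hypothetical container for this illustration)."""
    chrom: str
    start: int                    # 0-based, inclusive
    end: int                      # exclusive
    name: Optional[str] = None
    strand: Optional[str] = None


def parse_bed_line(line: str) -> Optional[BedInterval]:
    """Parse one BED line; return None for blank, comment, or header lines."""
    fields = line.split()
    if not fields or fields[0].startswith("#") or fields[0] in ("track", "browser"):
        return None
    if len(fields) < 3:
        raise ValueError(f"BED line needs at least 3 columns: {line!r}")
    name = fields[3] if len(fields) > 3 else None
    strand = fields[5] if len(fields) > 5 else None
    return BedInterval(fields[0], int(fields[1]), int(fields[2]), name, strand)


if __name__ == "__main__":
    record = parse_bed_line("chrI\t100\t200\tgeneA\t0\t+")
    print(record, "length:", record.end - record.start)
```

Because the end coordinate is exclusive, the interval length is simply `end - start`.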
Also included are programs for converting and importing data from UCSC gene tables and Ensembl, as well as a variety of other formats, into a GFF3 file that can be loaded into a BioPerl database.
What's New in This Release:
- Updated to include native support for USeq archive files with data collection scripts. USeq files may be used in the same manner as BigWig, BigBed, or Bam files for data collection. USeq files may be generated using tools from the USeq package (useq.sourceforge.net). The Bio::DB::USeq adaptor is available via CPAN.
- Added new script filter_bam.pl, which can filter alignments based on various criteria and write a new Bam file. Filters are one or more boolean tests, including attributes, scores, lengths, sequence, etc.
- Added new script get_bam_seq_stats.pl, which collects information about the read sequences themselves and summarizes the sequence composition and nucleotide frequencies, suitable for generating sequence logos.
- Updated script manipulate_datasets.pl to allow any integer to be used when formatting decimal values.
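As a rough illustration of the kind of per-position nucleotide summary that feeds a sequence logo, here is a minimal sketch in Python. It is not the implementation used by get_bam_seq_stats.pl. The function name `position_frequencies` is hypothetical, and the sketch assumes the read sequences have already been extracted as plain strings.

```python
from collections import Counter


def position_frequencies(reads, alphabet="ACGTN"):
    """Return one dict per read position mapping each base to its fraction.

    Reads may differ in length; each position is normalized by the number
    of reads that actually cover it. Unknown characters are counted as N.
    """
    if not reads:
        return []
    length = max(len(read) for read in reads)
    counts = [Counter() for _ in range(length)]
    for read in reads:
        for i, base in enumerate(read.upper()):
            counts[i][base if base in alphabet else "N"] += 1
    freqs = []
    for counter in counts:
        total = sum(counter.values())
        freqs.append({base: counter[base] / total for base in alphabet})
    return freqs


if __name__ == "__main__":
    example_reads = ["ACGT", "ACGA", "TCGT", "AC"]
    for pos, row in enumerate(position_frequencies(example_reads)):
        print(pos, {base: round(frac, 2) for base, frac in row.items()})
```

Each row of the result sums to 1, so it can be passed directly to a logo-drawing tool that expects a position frequency matrix.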