PyCogent Changelog

What's new in PyCogent 1.5.3

Sep 15, 2012
  • New Features:
  • Added a withoutLostSpans() method to Feature objects in cogent.core.annotation. Useful after projecting features from one aligned sequence across to another. Implemented for ordinary Features and SimpleVariables but not xxy_list Variables.
  • Changes:
  • Make Span.remapWith() a little clearer in cogent.core.location.
  • Tidy annotation remapping code in cogent.core.annotation and cogent.core.sequence so that new positions are only calculated once when slicing, projecting, or otherwise remapping parts of sequences. The old code was needlessly doing it twice.
  • Bug Fixes:
  • Fixed bug in BLAT application controller (cogent.app.blat) which would drop some input sequences when running assign_dna_reads_to_protein_database.
  • Prevent negative widths from arising in cogent.draw.compatibility when alignment is too wide.

New in PyCogent 1.5.2 (Sep 8, 2012)

  • New Features:
  • Added new mantel_test function to cogent.maths.stats.test that allows the type of significance test to be specified. This function is meant to replace the pre-existing mantel function.
  • Added new correlation_test function to cogent.maths.stats.test that computes the correlation (pearson or spearman) between two vectors, in addition to parametric and nonparametric tests of significance and confidence intervals. This function gives more control and information than the pre-existing correlation function. The spearman function is also a new addition.
  • Added new mc_t_two_sample function to cogent.maths.stats.test that performs a two-sample t-test and uses Monte Carlo permutations to determine nonparametric significance (similar to R's Deducer::perm.t.test).
  • Added guppy 1.1, pplacer 1.1, ParsInsert 1.04, usearch 5.2.32, rtax 0.981, raxml 7.3.0, BLAT 34, and BWA 0.6.2 application controllers.
  • Added new functions to cogent.maths.stats.rarefaction that provide alternative ways to perform rarefaction subsampling.
  • Added convenience wrappers assign_dna_reads_to_database, assign_dna_reads_to_protein_database, and assign_dna_reads_to_dna_database for BLAT, BWA, and usearch with consistent interface across all three.
  • Changes:
  • Minimum matplotlib version now set to 1.1.0.
  • Minimum Vienna package version now set to 1.8.5.
  • The pearson function in cogent.maths.stats.test has more robust error-checking.
  • The mantel and mantel_test functions in cogent.maths.stats.test now check for symmetric, hollow distance matrices as input by default, with an option to disable these checks.
  • cogent.draw.distribution_plots now uses matplotlib proxy Artists for legend creation (this simplifies the code a bit). Added ability to set the size of plot figures through two new optional parameters to generate_box_plots and generate_comparative_plots. More robust checks have been put in place in case making room for labels fails (this now uses matplotlib 1.1.0's new tight_layout() functionality, but this can still fail in some cases).
  • cogent.app.raxml (version 7.0.3) is now deprecated and will be removed in 1.6.0. Please use cogent.app.raxml_v730 instead (version 7.3.0).
  • cogent.app.muscle (version 3.6) is now deprecated and will be removed in 1.6.0. Please use cogent.app.muscle_v38 instead (version 3.8).
  • Updated cogent.app.uclust to handle --stepwords and --w.
  • Bug Fixes:
  • Improved handling of reading frames from Ensembl.
  • Actually included the test_ensembl/test_metazoa.py file that was accidentally overlooked.
  • Fixed small diff in PostScript output from RNAfold.
  • Deprecation and discontinued warnings are now not ignored by default. cogent.util.warning was ignored in Python 2.7 because it uses DeprecationWarnings. These warnings are temporarily forced to not be ignored.
  • Included test_app/test_formatdb.py and test_app/test_mothur.py files in alltests.py.
  • Fixed test_safe_md5 in tests.test_util.test_misc to no longer run an MD5 over a file in PyCogent (this caused the test to break when a new release went out because the MD5 changes due to the new version string). The test now writes a temporary file populated with fixed data and computes the MD5 from that.
  • Fixed data_file_links.html in the PyCogent documentation to correctly point to several data files that were previously unreachable.
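
The mc_t_two_sample entry in the 1.5.2 features describes a two-sample t-test whose significance is estimated from Monte Carlo permutations. The sketch below illustrates that general technique using only the standard library. It is not PyCogent's implementation: the names t_statistic and permutation_t_test, the pooled-variance statistic, the two-sided comparison, and the default of 999 permutations are all assumptions made for illustration.

```python
import math
import random


def t_statistic(xs, ys):
    """Pooled-variance two-sample t statistic (assumes non-zero variance)."""
    nx, ny = len(xs), len(ys)
    mean_x = sum(xs) / nx
    mean_y = sum(ys) / ny
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    pooled_var = (ss_x + ss_y) / (nx + ny - 2)
    return (mean_x - mean_y) / math.sqrt(pooled_var * (1.0 / nx + 1.0 / ny))


def permutation_t_test(xs, ys, permutations=999, seed=None):
    """Return (observed t, two-sided Monte Carlo p-value).

    Hypothetical helper: group labels are shuffled `permutations` times and
    the p-value is the share of shuffles whose |t| is at least as large as
    the observed |t|, counting the observed arrangement itself once.
    """
    rng = random.Random(seed)
    observed = t_statistic(xs, ys)
    pooled = list(xs) + list(ys)
    n_first = len(xs)
    extreme = 0
    for _ in range(permutations):
        rng.shuffle(pooled)
        t_perm = t_statistic(pooled[:n_first], pooled[n_first:])
        if abs(t_perm) >= abs(observed):
            extreme += 1
    p_value = (extreme + 1) / (permutations + 1)
    return observed, p_value


if __name__ == "__main__":
    control = [1.0, 2.0, 3.0, 4.0, 5.0]
    treated = [11.0, 12.0, 13.0, 14.0, 15.0]
    print(permutation_t_test(control, treated, seed=42))
```

Adding one to both the numerator and the denominator keeps the estimated p-value strictly above zero, a common convention for Monte Carlo tests.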

New in PyCogent 1.5.1 (Jun 7, 2011)

  • New Features:
  • Alignments can now add sequences that are pairwise aligned to a sequence already present in the alignment.
  • Alignment.addSeqs has more flexibility with the specific order of sequences now controllable by the user. Thanks to Jan Kosinski for these two very useful patches!
  • Increased options for reading Table data from files: limit keyword; and, line based (as distinct from column-based) type-casting of delimited files.
  • Flexible parser for raw Greengenes 16S records
  • Add fast pairwise distance estimation for DNA/RNA sequences. Currently only Jukes-Cantor 1969 and Tamura-Nei 1993 distances are provided. A cookbook entry was added to building_phylogenies.
  • Added a PredefinedNucleotide substitution model class. This class uses Cython implementations for nucleotide models where analytical solutions are available. Substantial speedups are achieved. These implementations do not support obtaining the rate matrix. Use the older style implementation if you require that (toggled by the rate_matrix_required argument).
  • Added fit_function function. This allows fitting any model to an x and y dataset, using simplex to reduce the error between the model and the data.
  • Added parsers for bowtie and for BLAT’s PSL format
  • Table can now read/write gzipped files.
  • GeneticCode class has a getStopIndices method. Returns the index positions of stop codons in a sequence for a user specified translation frame.
  • Added LogDet metric to cogent.evolve.pairwise_distance. With able assistance from Yicheng Zhu. Thanks Yicheng!
  • Added jackknife code to cogent.maths.stats.jackknife. This can be used to measure confidence of an estimate from a single vector or a matrix. Thanks to Anuj Pahwa for help implementing this!
  • Added abundance-based Jaccard beta diversity index (Chao et al., 2005)
  • Changes:
  • Python 2.6 is now the minimum required version
  • We have removed code authored by Ziheng Yang as it is not available under an open source license. We note a modest performance hit for nucleotide and dinucleotide models. Codon models are not affected. The PredefinedNucleotide models recently added are faster than the older approach that used Yang’s code.
  • The PredefinedNucleotide models are now available via cogent.evolve.models. The old-style (slower) nucleotide models can be obtained by setting rate_matrix_required=True.
  • RichGenbankParser can now return WGS blocks
  • Bug Fixes:
  • fixed bug that crept into doing consensus trees from tree collections. Thanks to Klara Verbyla for catching this one!
  • fixed a bug (#3170464) affecting obtaining sequences from non-chromosome level coordinate systems. Thanks to brandoninvergo for reporting and Hua Ying for the patch!
  • fixed a bug (#2987278) associated with missing unit tests for gbseq.py
  • fixed a bug (#2987264) associated with missing unit tests for paml_matrix.py
  • fixed a bug (#2987238) associated with missing unit tests for tinyseq.py
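
The 1.5.1 features mention fast pairwise distance estimation under Jukes-Cantor 1969. As a reminder of what that distance is, the sketch below applies the standard formula d = -3/4 ln(1 - 4p/3), where p is the proportion of differing sites. The function name jukes_cantor_distance and the choice to skip gap and ambiguity characters are assumptions for illustration; this is plain Python, not the optimised PyCogent code.

```python
import math


def jukes_cantor_distance(seq1, seq2):
    """Jukes-Cantor (1969) distance between two aligned DNA/RNA sequences.

    Hypothetical helper: positions where either sequence has anything other
    than A, C, G or T/U (for example gaps) are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    compared = 0
    differences = 0
    # Treat RNA as DNA so that U and T are the same state.
    pairs = zip(seq1.upper().replace("U", "T"), seq2.upper().replace("U", "T"))
    for a, b in pairs:
        if a not in valid or b not in valid:
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    p = differences / compared
    if p >= 0.75:
        raise ValueError("distance is undefined when p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    print(jukes_cantor_distance("ACGTACGTAC", "ACGTACGTAT"))
```

The correction matters most for divergent sequences: as p approaches 0.75 the estimated distance grows without bound, which is why the sketch refuses to return a value there.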

New in PyCogent 1.5 (Nov 9, 2010)

  • New Features:
  • major additions to Cookbook. Thanks to the many contributors (too many to list here)!
  • added AlleleFreqs attribute to ensembl Variation objects.
  • added getGeneByStableId method to genome objects.
  • added Introns attribute to Transcript objects and an Intron class. (Thanks to Hua Ying for this patch!)
  • added Mann-Whitney test and a Monte-Carlo version
  • exploratory and confirmatory period estimation techniques (suitable for symbolic and continuous data)
  • Information theoretic measures (AIC and BIC) added
  • drawing of trees with collapsed nodes
  • progress display indicator support for terminal and GUI apps
  • added parser for Illumina HiSeq2000 and GAiix sequence files as cogent.parse.illumina_sequence.MinimalIlluminaSequenceParser.
  • added parser for FASTQ files, one of the output options for Illumina's workflow; also added cookbook demo.
  • added functionality for parsing of SFF files without the Roche tools in cogent.parse.binary_sff
  • Changes:
  • thousand fold performance improvement to nmds
  • >10-fold performance improvements to some Table operations
  • Bug Fixes:
  • Fixed a bug in cogent.core.alphabet that resulted in 4 tests err'ing out when using NumPy v1.4.1
  • Sourceforge bugs 2987289, 2987277, 2987378, 2987272, 2987269 were addressed and fixed
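
The 1.5 features list information theoretic measures (AIC and BIC). For readers unfamiliar with them, the sketch below computes both from a log-likelihood using the standard textbook definitions. The function names aic and bic and their signatures are assumptions for illustration and are not taken from PyCogent.

```python
import math


def aic(log_likelihood, num_params):
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2 * num_params - 2 * log_likelihood


def bic(log_likelihood, num_params, sample_size):
    """Bayesian Information Criterion: k ln(n) - 2 ln L."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return num_params * math.log(sample_size) - 2 * log_likelihood


if __name__ == "__main__":
    # Hypothetical fit: log-likelihood -100 with 3 free parameters, 100 sites.
    print(aic(-100.0, 3))
    print(bic(-100.0, 3, 100))
```

Lower values indicate a better trade-off between fit and complexity under both measures. BIC penalises extra parameters more heavily than AIC once the sample size exceeds about 7 (that is, when ln n > 2).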

New in PyCogent 1.4.1 (Apr 15, 2010)

  • New Features
  • Simplified getting genetic variation from Ensembl and provided the protein location of nonsynonymous variants.
  • Rate heterogeneity variants of pre-canned continuous-time substitution models are now easier to define.
  • Added implementation of generalised neighbour joining.
  • New capabilities for examining genetic variants using Ensembl.
  • Phylogenetic methods that can return collections of trees do so as a TreeCollections object, which has writeToFile and getConsensusTree methods.
  • Added uclust application controller which currently supports uclust v1.1.579.
  • Changes
  • Major additions to Cookbook documentation courtesy of Tom Elliot. Thanks Tom!
  • Improvements to parallelisation.

New in PyCogent 1.4 (Apr 5, 2010)

  • added support for manipulating and handling macromolecular structures. This includes a PDB file format parser, a hierarchical data structure to represent molecules, various utilities to manipulate (e.g. clean up) molecules, and efficient surface area and proximity-contact calculation via Cython. Expansion into unit cells and crystal lattices is also possible.
  • added a KD-Tree class for fast nearest neighbour look-ups. Currently supports k-nearest-neighbour and all-neighbours-within-radius queries, in 3D only.
  • added new tools for evaluating clustering stresses, goodness_of_fit, in cogent.cluster.
  • added a new clustering tool, procrustes analysis, in cogent.cluster.
  • phylo.distance.EstimateDistances class has a new argument, modify_lf. This allows the user to modify the likelihood function, possibly constraining parameters, freeing them, setting bounds, pre-optimising, etc.
  • cogent Table mods; added transposed and normalized methods while the summed method can now return column or row sums.
  • Added a new context dependent model class. The conditional nucleotide frequency weighted class has been demonstrated to be superior to the model forms of Goldman and Yang (1994) and Muse and Gaut (1994). The publication supporting this claim is In press at Mol Biol Evol, authored by Yap, Lindsay, Easteal and Huttley.
  • added new argument to LoadTable to facilitate speedier loading of large files. The static_column_types argument auto-generates a separator format parser with column conversion for numeric columns.
  • added BLAST XML parser + tests.
  • Add compatibility matrix module for determining reticulate evolution.
  • Added ‘start’ and ‘order’ options to the WLS and ML tree finding method .trex(). These allow the search of tree-space to be constrained to start at a particular subtree and to proceed in a specified order of leaf additions.
  • Consensus tree of weighted trees from phylo.maximum_likelihood.
  • Add Alignment.withGapsFrom() aka mirrorGaps, mirrors the gaps into a provided alignment.
  • added ANOVA to maths.stats.test
  • LoadTable gets an optional argument (static_column_types) to simplify speedy loading of big files.
  • Changes
  • Python 2.4 is no longer supported.
  • NumPy 1.3 is now the minimum supported NumPy version.
  • zlib is now a dependency.
  • cogent.format.table.asReportlabTable is being discontinued in version 1.5. This is the last dependency on reportlab and removal will simplify installation.
  • the conditional nucleotide model (Yap et al 2009) will be made the default model form for context dependent models in version 1.5.
  • Change required MPI library from PyxMPI to mpi4py.
  • Move all of the cogent.draw.matplotlib.* modules up to cogent.draw.*
  • Substitute matplotlib for reportlab throughout cogent.draw
  • cogent.db.ensembl code updated to work with the latest Ensembl release (56)
  • Added a motif prob pseudocount option, used for initial values of optimisable mprobs
  • The mlagan application controller has been removed.

New in PyCogent 1.3.1 (Nov 18, 2009)

  • PyPI install now works; see the workaround on bugs 2807542 and 2807539.
  • Cogent 1.2 - 1.3 New Features:
  • Python2.6 is now supported
  • added cogent.cluster.nmds, code to perform nonmetric multidimensional scaling. Not as fast as other implementations (e.g. isoMDS in R's MASS package).
  • Documentation ported to using the Sphinx documentation generator.
  • Major additions to documentation in doc/examples.
  • Added partial support for querying the Ensembl MySQL databases. This capacity has additional dependencies (MySQL-python and SQLAlchemy). This module should be considered alpha-level code (although it has worked reliably for some time in the hands of the developers).
  • Introduced a new substitution model family. This family has the same form as that originally described by Muse and Gaut.

New in PyCogent 1.3 (Jun 17, 2009)

  • New Features:
  • Python2.6 is now supported
  • added cogent.cluster.nmds, code to perform nonmetric multidimensional scaling. Not as fast as other implementations (e.g. isoMDS in R's MASS package).
  • Documentation ported to using the Sphinx documentation generator.
  • Major additions to documentation in doc/examples.
  • Added partial support for querying the Ensembl MySQL databases. This capacity has additional dependencies (MySQL-python and SQLAlchemy). This module should be considered alpha-level code (although it has worked reliably for some time in the hands of the developers).
  • Introduced a new substitution model family. This family has the same form as that originally described by Muse and Gaut (Mol Biol Evol, 11, 715-24). These models were applied in the article by Lindsay et al. (2008, Biol Direct, 3, 52). Model state defaults to the tuple weighted matrix (eg. the Goldman and Yang codon models). Selecting the nucleotide weighted matrix is done using the mprob_model argument.
  • Likelihood functions now have a getStatistics method. This returns cogent Table objects. Optional arguments are with_motif_probs and with_titles where the latter refers to the Table.Title attribute being set.
  • Added rna_struct formatter and rna_plot parser
  • A fast unifrac method implementation.
  • Added new methods on tree related objects: TreeNode.getNodesDict; TreeNode.reassignNames; PhyloNode.tipsWithinDistance; PhyloNode.totalDescendingBranchLength
  • Adopted Sphinx documentation system, added many new use cases, improved existing ones.
  • added setTablesFormat to likelihood function. This allows setting the spacing, display precision of the stats tables resulting from printing a likelihood function.
  • Added non-parametric multidimensional scaling (nmds) method.
  • Added a separate app controller for FastTree v1.0
  • New protein MolType, PROTEIN_WITH_STOP, that supports the stop codon. New sequence objects, ProteinWithStopSequence and ModelProteinWithStopSequence, support the new MolType.
  • Support for Cython added.
  • Changes:
  • reconstructAncestralSequences has been deprecated in favour of reconstructAncestralSeqs. It will be removed in version 1.4.
  • updated parsers
  • TreeNode.getNewick is now iterative. For recursive, use TreeNode.getNewickRecursive. Both the iterative and recursive methods of getNewick now support the keyword 'escape_name'. DndParser now supports the keyword 'unescape_name'. DndParser unescape_name will now try to remove underscores (like underscore_unmunge).
  • Generalized MinimalRnaalifoldParser to parse structures and energies from RNAfold as well.
  • PhyloNode.tipToTipDistances can now work on a subset of tips by passing either a list of tip names or a list of tip nodes using the endpoints param.
  • Deprecated reconstructAncestralSequences in favour of reconstructAncestralSeqs.
  • updated app controller parameters for FastTree v1.1
  • Allow and require a more recent version of Pyrex.
  • LoadTree is now a method of cogent.__init__
  • Warning: Pyrex is no longer the accepted way to develop extensions. Use Cython instead.
  • Bug Fixes:
  • Fixed the alignment sample methods and xsample (randint had the wrong max argument)
  • Fixes the tests that no longer work with NCBI's API changes, and sticks a big warning for the unwary in the ncbi module pointing users to the "official" list of reported rettypes. Note that the rettypes changed recently and NCBI says they do not plan to support the old behavior.
  • The TreeNode operators cmp, contains, and any operator that relies on those methods will now only perform comparisons against the object's id. Prior behavior first checked the TreeNode's Name attribute and then the object id if the Name was not present. This resulted in ambiguous behavior in certain situations.
  • Added type conversion to Mantel so it works on lists.
  • Kendall tau fix, bug 2794469
  • Table now raises a RuntimeError if provided malformed data.
  • Fixed silent type conversion in TreeNode, bug 2804431
  • RangeNode is now properly passing kwargs to baseclass, bug 2804441
  • DndParser was not producing correct trees in niche cases, bug 2798580
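
The 1.3 changes note that TreeNode.getNewick is now iterative, with the recursive form kept as getNewickRecursive. One practical reason to prefer an iterative traversal is that very deep trees can exceed Python's recursion limit. The sketch below shows the general idea with an explicit stack. The Node class and newick_iterative function are hypothetical stand-ins, not PyCogent's TreeNode, and branch lengths and name escaping are left out.

```python
class Node:
    """Minimal hypothetical tree node: a name plus a list of children."""

    def __init__(self, name=None, children=None):
        self.name = name
        self.children = children or []


def newick_iterative(root):
    """Serialise a tree to Newick with an explicit stack instead of recursion."""
    pieces = []
    # Each stack entry is (node, index of the next child to visit).
    stack = [(root, 0)]
    while stack:
        node, child_index = stack.pop()
        if child_index == 0 and node.children:
            pieces.append("(")
        if child_index < len(node.children):
            if child_index > 0:
                pieces.append(",")
            # Come back to this node after the child has been written.
            stack.append((node, child_index + 1))
            stack.append((node.children[child_index], 0))
        else:
            if node.children:
                pieces.append(")")
            if node.name:
                pieces.append(node.name)
    return "".join(pieces) + ";"


if __name__ == "__main__":
    tree = Node("root", [Node("c", [Node("a"), Node("b")]), Node("d")])
    print(newick_iterative(tree))  # prints ((a,b)c,d)root;
```

Because the pending work lives in a list rather than on the call stack, the depth of the tree is limited only by available memory.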

New in PyCogent 1.2 (Dec 19, 2008)

  • New Features:
  • Code for performing molecular coevolution/covariation analyses on multiple sequence alignments, plus support files. (Described in J. Caporaso et al. BMC Evol Biol, 8(1):327, 2008.)
  • App controller for CD-HIT (http://www.bioinformatics.org/cd-hit/)
  • A ParameterEnumerator object is now available in cogent.app.util. This object iterates over a range of parameters, returning parameter dicts that can be used with the relevant app controller.
  • Sequence and alignment objects that inherit from Annotatable can now mask regions of sequence, returning new objects where the observed sequence characters in the regions spanned by the annotations are replaced by a mask character (mask_char).
  • Table.count method. Counts the number of rows satisfying some condition.
  • Format writer for stockholm and clustal formats.
  • App controllers for dotur, infernal, RNAplot, RNAalifold. Parsers for infernal and dotur.
  • Empirical nucleotide substitution matrix estimation code (Described in M. Oscamou et al. BMC Bioinformatics, 9(1):511, 2008)
  • New Documentation:
  • Querying NCBI
  • The motif module.
  • UPGMA clustering
  • For using the ParameterCombinations object and generating command lines
  • Coevolution modelling
  • Sequence annotation handling
  • Table manipulation
  • Principal coordinates analysis (PCoA)
  • Genetic code objects
  • How to construct profiles, consensus sequences, etc.
  • Changes:
  • PyCogent no longer relies on the Python math module. All math functions are now imported from numpy. The main motivation was to remove casting between numpy and Python types, such as a 'numpy.float64' variable being unknowingly converted to a Python 'float'.
  • Table.getDistinctValues now handles multiple columns.
  • Table.Header is now an immutable property of Tables. Use the withNewHeader method to modify Header labels.
  • Bug Fixes:
  • LoadTable was ignoring title argument for standard file read.
  • Fixed bug in Table.joined. When a join produces no result, now returns a Table with 0 rows.
  • Improved consistency of LoadTable with previous behaviour of cogent.Table
  • Added methods to detect large sequences/alphabets and handle counts from sequence triples correctly.
  • goldman_q_dna_pair() and goldman_q_rna_pair() now average the frequency matrix used.
  • Reverse complement of annotations with disjoint spans now correctly preserves order.
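
The 1.2 features describe masking: returning a new object in which the characters spanned by annotations are replaced by a mask character (mask_char). The sketch below shows that operation on a plain string. It assumes spans are given as zero-based, half-open (start, end) tuples; the function name mask_regions and this span representation are illustrative only and do not reflect PyCogent's annotation objects.

```python
def mask_regions(sequence, spans, mask_char="?"):
    """Return a copy of `sequence` with every position inside `spans` masked.

    Hypothetical helper: `spans` is an iterable of zero-based, half-open
    (start, end) tuples. Spans may overlap and are clipped to the sequence.
    """
    if len(mask_char) != 1:
        raise ValueError("mask_char must be a single character")
    chars = list(sequence)
    for start, end in spans:
        start = max(start, 0)
        end = min(end, len(chars))
        for index in range(start, end):
            chars[index] = mask_char
    return "".join(chars)


if __name__ == "__main__":
    print(mask_regions("ACGTACGT", [(2, 5)]))  # prints AC???CGT
```

The input is never modified; a new string is returned, matching the changelog's description of masking as producing new objects.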