June 14th, 2013
Features:
· Added (nearly-complete) Python support via SWIG
· Qt5 support for matrix visualization (Display and Spy)
· Switched from QR algorithm to DQDS for computing singular values of bidiagonal matrices
· Many more test matrices
· A preliminary toolchain file for Stampede
· Simplified exception handling via ReportException
· Version of Gemv which explicitly zeros the output
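DQDS is the differential quotient-difference algorithm with shifts. As a rough illustration, the sketch below shows the simplest member of that family: the unshifted dqd iteration, with no shifts, deflation, or safeguards. It works on the squares q_i = a_i^2 and e_i = b_i^2 of an upper-bidiagonal matrix with diagonal a and superdiagonal b. The function name and convergence tolerance are hypothetical, and this is not Elemental's implementation.

```cpp
#include <algorithm>
#include <cmath>
#include <functional>
#include <stdexcept>
#include <vector>

// Hypothetical helper: singular values of an upper-bidiagonal matrix with
// diagonal `a` and superdiagonal `b` via the zero-shift dqd iteration.
// Assumes every diagonal entry is nonzero (q_i > 0), so no division by zero.
std::vector<double> DqdSingularValues(const std::vector<double>& a,
                                      const std::vector<double>& b,
                                      int maxIter = 10000) {
    const std::size_t n = a.size();
    if (n == 0) return {};
    if (b.size() + 1 != n) throw std::invalid_argument("need n-1 superdiagonal entries");

    // qd variables are the squares of the bidiagonal entries.
    std::vector<double> q(n), e(n - 1);
    double trace = 0.0;  // q and e sum to trace(B^T B)
    for (std::size_t i = 0; i < n; ++i) { q[i] = a[i] * a[i]; trace += q[i]; }
    for (std::size_t i = 0; i + 1 < n; ++i) { e[i] = b[i] * b[i]; trace += e[i]; }
    const double tol = 1e-32 * trace;  // e_i below this is treated as zero

    for (int it = 0; it < maxIter; ++it) {
        bool converged = true;
        for (double ei : e) if (ei > tol) { converged = false; break; }
        if (converged) break;

        // One differential qd step with zero shift.
        double d = q[0];
        for (std::size_t k = 0; k + 1 < n; ++k) {
            const double qhat = d + e[k];
            const double t = q[k + 1] / qhat;  // q[k+1] is still the old value
            e[k] *= t;
            d *= t;
            q[k] = qhat;
        }
        q[n - 1] = d;
    }

    // Each q_i now approximates sigma_i^2.
    std::vector<double> sigma(n);
    for (std::size_t i = 0; i < n; ++i) sigma[i] = std::sqrt(q[i]);
    std::sort(sigma.begin(), sigma.end(), std::greater<double>());
    return sigma;
}
```

For B = [[1, 1], [0, 1]] this converges to the golden ratio and its reciprocal. The practical algorithm adds shifts because the unshifted version converges only linearly, at the rate of (sigma_{k+1}/sigma_k)^2.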
Fixed Bugs:
· Fixed major call-stack manipulation problem
· Fixed several bugs in test matrices
· Fixed syntax errors in ID and SkewHermitianEig
· Matrix (and DistMatrix) no longer allocates memory when shrinking a view
· Avoiding MPI_{Init,Query}_thread when they are not available
Maintenance:
· Shortened Grid constructors since grid width was redundant
· Simplified typical usage of SetDiagonal/MakeTrapezoidal/ScaleTrapezoid
· Defaulting to one thread for PMRRR
· Removed NullStreamBuffer since it conflicted with SWIG
May 2nd, 2013
Features:
· Added thresholded version of BusingerGolub QR
· Added interpolative decompositions (IDs)
· Added skeleton decompositions
· Added configurable tolerance for pseudoinverses
Bug fixes:
· Fixed several recently introduced mistakes in Determinant calculations
Maintenance:
· Fixed several mistakes in the documentation
· Merged HermitianPseudoinverse.hpp into Pseudoinverse.hpp
April 29th, 2013
Features:
· Added Businger/Golub pivoted QR and the ability to early-exit (qr::BusingerGolub)
· Added thresholded SVD based upon the cross-product algorithm (svd::Thresholded)
· Added RRQR preprocessing for faster SingularValueSoftThresholding (SVT)
· Updated RPCA example to allow for use of RRQR-preprocessed SVT
· Added implementation of Cannon's algorithm (gemm::Cannon_NN)
· Added Kahan, GKS, and Extended Kahan matrices
· Added HermitianSign and UnitaryCoherence calculations
· Added sequential version of HermitianEig
· Added versions of a few routines (e.g., Gemm) which handle zero initialization
· Added a MemSwap function to complement MemCopy and MemZero
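To show the data movement behind Cannon's algorithm, the sketch below simulates it sequentially on a p x p "process grid" where each process owns a single entry of A, B, and C, so n == p. It first skews the inputs, then alternates local multiply-accumulate steps with cyclic shifts. It is an illustration under that one-entry-per-process assumption, not Elemental's gemm::Cannon_NN, which works on distributed blocks.

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // square, row-major

// Sequential simulation of Cannon's algorithm with one entry per "process".
Matrix CannonSimulated(const Matrix& A, const Matrix& B) {
    const std::size_t p = A.size();
    Matrix a(p, std::vector<double>(p, 0.0)), b = a, c = a;

    // Initial skew: row i of A shifted left by i, column j of B shifted up by j.
    for (std::size_t i = 0; i < p; ++i)
        for (std::size_t j = 0; j < p; ++j) {
            a[i][j] = A[i][(j + i) % p];
            b[i][j] = B[(i + j) % p][j];
        }

    for (std::size_t step = 0; step < p; ++step) {
        // Every process multiplies its local pair and accumulates into C.
        for (std::size_t i = 0; i < p; ++i)
            for (std::size_t j = 0; j < p; ++j)
                c[i][j] += a[i][j] * b[i][j];

        // Cyclically shift A one process to the left and B one process up.
        Matrix aNext = a, bNext = b;
        for (std::size_t i = 0; i < p; ++i)
            for (std::size_t j = 0; j < p; ++j) {
                aNext[i][j] = a[i][(j + 1) % p];
                bNext[i][j] = b[(i + 1) % p][j];
            }
        a.swap(aNext);
        b.swap(bNext);
    }
    return c;
}
```

After step s, process (i,j) holds A(i,k) and B(k,j) with k = (i + j + s) mod p. Over p steps, k takes every value once, so C(i,j) accumulates the full inner product. Each step needs only nearest-neighbor communication.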
Bug fixes:
· Fixed several zero initialization bugs
· Fixed shared library linking for PMRRR on Macs
· Fixed syntax error in DistMatrix driver
Maintenance:
· Beginning to maintain REFERENCES and PUBLICATIONS files
· Switched to much simpler RAII-based call-stack manipulation
· Added BASE(F) macro to avoid repeatedly using 'typename Base::type'
· Removed Parallel Linear Congruential Generator in favor of simpler scheme
· All special matrix function routines now take the output matrix as first argument
· Began removing 'internal' namespaces in favor of routine-specific namespaces
· Renamed 'HouseholderSolve' to 'LeastSquares'
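For reference, the classical Kahan matrix added in the April 29th release can be written as K = diag(1, s, ..., s^{n-1}) T. Here T is unit upper triangular with -c in every strictly upper entry, and s = sqrt(1 - c^2). The sketch below builds it densely. The function name and the choice of c as the parameter are assumptions; Elemental's own routine may be parameterized differently.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical sketch of the classical Kahan matrix (row-major, n x n):
//   K_ii = s^i,  K_ij = -c * s^i for j > i,  K_ij = 0 for j < i,
// with s = sqrt(1 - c^2) and 0 < c < 1.
std::vector<std::vector<double>> KahanMatrix(std::size_t n, double c) {
    if (c <= 0.0 || c >= 1.0) throw std::invalid_argument("need 0 < c < 1");
    const double s = std::sqrt(1.0 - c * c);
    std::vector<std::vector<double>> K(n, std::vector<double>(n, 0.0));
    double scale = 1.0;  // s^i for the current row i
    for (std::size_t i = 0; i < n; ++i) {
        K[i][i] = scale;
        for (std::size_t j = i + 1; j < n; ++j) K[i][j] = -c * scale;
        scale *= s;
    }
    return K;
}
```

A short geometric-series calculation shows that every column has unit 2-norm: c^2 (1 - s^{2j}) / (1 - s^2) + s^{2j} = 1. Column norms therefore give pivoted QR nothing to distinguish, which is why this matrix is a standard example where column pivoting can fail to reveal the numerical rank.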
March 8th, 2013
Features:
· Added Schatten, KyFan, and entrywise norms (and explicitly exposed each norm)
· Added Hadamard products
· Added Cholesky-based QR factorization for tall-skinny matrices
· Faster builds due to support for non-monolithic header inclusion
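Cholesky-based QR for a tall-skinny A forms the small Gram matrix A^T A, factors it as R^T R, and recovers Q = A R^{-1}. The sketch below is a minimal sequential version of that idea in real arithmetic, with column-major storage, a hypothetical function name, and no reorthogonalization pass. It is not Elemental's distributed routine.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Q is m x n and R is n x n, both column-major: X[i + j*ld].
struct QR { std::vector<double> Q, R; };

QR CholeskyQR(const std::vector<double>& A, std::size_t m, std::size_t n) {
    // Gram matrix G = A^T A (only the upper triangle is used).
    std::vector<double> G(n * n, 0.0);
    for (std::size_t j = 0; j < n; ++j)
        for (std::size_t i = 0; i <= j; ++i)
            for (std::size_t k = 0; k < m; ++k)
                G[i + j * n] += A[k + i * m] * A[k + j * m];

    // Upper Cholesky factor: G = R^T R.
    std::vector<double> R(n * n, 0.0);
    for (std::size_t j = 0; j < n; ++j) {
        double diag = G[j + j * n];
        for (std::size_t k = 0; k < j; ++k) diag -= R[k + j * n] * R[k + j * n];
        if (diag <= 0.0) throw std::runtime_error("A^T A is not numerically positive definite");
        R[j + j * n] = std::sqrt(diag);
        for (std::size_t i = j + 1; i < n; ++i) {
            double v = G[j + i * n];
            for (std::size_t k = 0; k < j; ++k) v -= R[k + j * n] * R[k + i * n];
            R[j + i * n] = v / R[j + j * n];
        }
    }

    // Q = A R^{-1}: forward substitution along each row of A.
    std::vector<double> Q(m * n, 0.0);
    for (std::size_t r = 0; r < m; ++r)
        for (std::size_t j = 0; j < n; ++j) {
            double v = A[r + j * m];
            for (std::size_t k = 0; k < j; ++k) v -= Q[r + k * m] * R[k + j * n];
            Q[r + j * m] = v / R[j + j * n];
        }
    return {Q, R};
}
```

The appeal for tall-skinny matrices is that the only global reduction is the n x n Gram matrix. The trade-off is that the loss of orthogonality grows like the square of the condition number of A.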
Bug fixes:
· Fixed bugs in Trtrmm and Trdtrmm which affected upper HPDInverse
· Fixed bugs in [MD,*] and [*,MD] alignment routines
· Fixed bug in [MC,MR] TransposeFrom member function
· Added check for MPI threading support when initializing Elemental in hybrid mode
· Fixed conjugation problem in sequential complex Hilbert-Schmidt inner product
· Fixed detection of MPI_Reduce_scatter_block
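The conjugation fix above concerns the Hilbert-Schmidt inner product <A, B> = tr(A^H B), which is the sum of conj(A_ij) * B_ij over all entries. A minimal sequential sketch follows, assuming both matrices are stored as flat arrays in the same order. The function name is hypothetical.

```cpp
#include <complex>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Sequential Hilbert-Schmidt (Frobenius) inner product tr(A^H B).
std::complex<double> HilbertSchmidt(const std::vector<std::complex<double>>& A,
                                    const std::vector<std::complex<double>>& B) {
    if (A.size() != B.size()) throw std::invalid_argument("size mismatch");
    std::complex<double> sum = 0.0;
    for (std::size_t k = 0; k < A.size(); ++k)
        sum += std::conj(A[k]) * B[k];  // conjugating A makes <A,A> = ||A||_F^2
    return sum;
}
```

Dropping the conjugate changes the result for complex data. For the 1 x 1 matrix A = [i], <A, A> should be 1, but the unconjugated sum gives -1.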
Miscellaneous:
· Removed unnecessary 'Local' prefix from many routines, e.g., LocalLength and LocalMatrix
· Separated LogBarrier and LogDetDivergence into a folder for convex optimization
· No longer attempt to automatically detect MKL during configuration
· Simplified inclusion of Elemental as CMake subproject
· Pulled many member functions out of (and generally simplified) DistMatrix class
· Removed many redundancies with respect to symmetric and Hermitian implementations
· Greatly simplified Dot and Dotu implementations
· Added detection of MPI_Comm_set_errhandler (which replaces MPI_Errhandler_set, removed in MPI-3)
· Shortened code for MPI wrappers
· Many more toolchain files (and removed support for XL)
Open issues:
· Forcing PMRRR to avoid threading in Pure builds when launched in a multi-threaded environment
December 17th, 2012
Features:
· Nuclear norm, two-norm, and condition number routines
· Hilbert-Schmidt inner products
· Log barrier and log-det divergence routines for HPD matrices
· More special matrices (e.g., Jacobi matrices for Legendre polynomials)
· Several improved algorithms for Trmm and Trsm (thanks to Bryan Marker)
· Greatly lowered communication in Trsv
· Hegst has been split into TwoSidedTrmm/TwoSidedTrsm and extended to handle more general triangular matrices
· Trtrsm partially implemented (triangular solve against triangular matrix)
· Better command-line argument processing (via Choice)
· There are now Blue Gene/Q toolchain files (thanks to Jeff Hammond)
· 'elemvariables' Makefile include is now more robust
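The Jacobi matrix for the Legendre polynomials is symmetric tridiagonal with a zero diagonal and off-diagonal entries beta_k = k / sqrt(4k^2 - 1). The sketch below builds it densely. The function name is hypothetical and not necessarily the name Elemental uses.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch: n x n Jacobi matrix of the Legendre polynomials.
// Zero diagonal; off-diagonals beta_k = k / sqrt(4k^2 - 1), k = 1..n-1.
std::vector<std::vector<double>> LegendreJacobi(std::size_t n) {
    std::vector<std::vector<double>> J(n, std::vector<double>(n, 0.0));
    for (std::size_t k = 1; k < n; ++k) {
        const double kk = static_cast<double>(k);
        const double beta = kk / std::sqrt(4.0 * kk * kk - 1.0);
        J[k - 1][k] = beta;  // symmetric tridiagonal
        J[k][k - 1] = beta;
    }
    return J;
}
```

Its eigenvalues are the n-point Gauss-Legendre nodes. For n = 3, the characteristic polynomial is lambda (lambda^2 - beta_1^2 - beta_2^2). Since beta_1^2 + beta_2^2 = 1/3 + 4/15 = 3/5, the nodes are 0 and ±sqrt(3/5).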
Bug fixes:
· Fixed mishandling of input buffers for PMRRR (the interface cannot be const since the buffers are modified)
· Avoided bug in HybridDebug mode caused by modifying the call stack from within an OpenMP loop (originally pointed out by Miles Lubin)
Miscellaneous:
· Driver for HermitianEig now allows for testing clustered eigenvalues
· Removed GFlops utility drivers since they were better left to the drivers
· The New BSD License is now referenced rather than reproduced in every file
· Restructuring of implementations within header files
Open issues:
· Forcing PMRRR to avoid threading in Pure builds when launched in a multi-threaded environment
August 7th, 2012
Improvements:
· Added a PETSc-style Makefile include (elemvariables) to simplify the usage of the library
· Added a large number of links into the documentation
· Added a toolchain file for NERSC's Hopper (and simplified the others)
Bug fixes:
· Fixed several mistakes in workspace sizes for calls to LAPACK's bidiagonal SVD
· Fixed several mistakes in the SVD routines
· Fixed a missing alignment free in Trdtrmm
· Restored support for MPI-1 by only using MPI_IN_PLACE when it is available
· Moved prototypes for BLAS and LAPACK functions out of header files to avoid conflicts with previous definitions
· Avoiding problems from the availability of OpenMP changing between configuration and compilation
Syntactic changes:
· Renamed the CMake options "BUILD_TESTS" and "BUILD_EXAMPLES" to "ELEM_TESTS" and "ELEM_EXAMPLES"
· Renamed "LUSolve" to "SolveAfterLU"
July 7th, 2012
New functionality:
· SVD support through the bidiagonal QR algorithm. If libFLAME is linked, a high-performance QR algorithm will be used.
· Pseudoinverses and polar decompositions through the new SVD routine
· QR-based Dynamically-Weighted Halley iteration (QDWH) for computing the polar decomposition, with versions for both general and Hermitian matrices
· Support for fast expansions of packed Householder reflectors for a few cases (i.e., those needed for QR and LQ decompositions)
· Explicit QR and LQ decompositions
· Cheap two-norm estimates
· 'Norm' now supports all DistMatrix distributions, instead of just [MC,MR]
· DistMatrix now supports 'viewing' processes that do not actively own data; this makes temporarily distributing to a subset of processes (e.g., a perfect square) less of a hack
· MakeHermitian, MakeSymmetric, and MakeReal were added
· LUSolve was added for solving systems using an existing LU factorization, with or without partial pivoting
· The routine Hetrmm, for forming one half of the Hermitian result L^H L or U U^H, was generalized to also support symmetric updates and the name was changed to Trtrmm
· The routine Trdtrmm was added in order to aid in the inversion of symmetric/Hermitian-indefinite matrices and forms L^H inv(D) L or U inv(D) U^H (or the symmetric counterpart)
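The products described for Trtrmm and Trdtrmm can be written entrywise. In the lower, real case, S = L^T inv(D) L has entries S_ij = sum over k of L_ki L_kj / d_k. The dense sequential sketch below computes that formula, and passing d = all ones gives the Trtrmm-style product L^T L. The changelog notes that only one half of the result is formed; this sketch returns the full symmetric matrix for clarity. The function name is hypothetical.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Dense sketch of S = L^T inv(D) L for lower-triangular L (row-major, n x n)
// and D = diag(d). Not Elemental's Trdtrmm, which works on one triangle.
std::vector<std::vector<double>> LtInvDL(const std::vector<std::vector<double>>& L,
                                         const std::vector<double>& d) {
    const std::size_t n = L.size();
    if (d.size() != n) throw std::invalid_argument("size mismatch");
    std::vector<std::vector<double>> S(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j <= i; ++j) {
            // L_ki = 0 for k < i, so the sum can start at k = i.
            double v = 0.0;
            for (std::size_t k = i; k < n; ++k) v += L[k][i] * L[k][j] / d[k];
            S[i][j] = v;
            S[j][i] = v;  // mirror into the other triangle
        }
    return S;
}
```

For L = [[1, 0], [2, 1]] and d = (2, 4), this gives S = [[1.5, 0.5], [0.5, 0.25]]. With d = (1, 1), it gives L^T L = [[5, 2], [2, 1]].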
Performance improvements:
· Faster ApplyPackedReflectors implementations
· Many variants of Gemm are now faster due to avoiding cache-unfriendly redistributions
Bug fixes:
· Fixed subtle issue in Householder reflection generation when the norm of the lower part of the vector was zero
· Fixed namespacing complaints from new versions of GCC and Clang
· Fixed mistakes in 1-2-1 and Wilkinson matrix generation
· Fixed missing installation of FCMangle.h and cmake-dummy-lib
· Fixed leakage of viewingGroup in the Grid destructor
· Fixed mistake in parallel Adjoint and Transpose routines
Semantic changes:
· Shortened 'SetLocalEntry' and friends to the form 'SetLocal' in order to be more consistent with the distributed equivalent, 'Set'
· Expanded routines for extracting real and imaginary parts of complex data from the form 'Real' to 'RealPart'
· Shortened many redundant filenames
May 8th, 2012
Feature Improvements:
· Support for generating shared libraries was added with the "-D SHARED_LIBRARIES=ON" configuration option
· Elemental can now generate many different types of special matrices (Hilbert, Walsh, DFT, etc.)
· The routine "NormalUniformSpectrum" was added which generates a complex matrix by uniformly sampling the eigenvalues from a ball in the complex plane.
· The routines "MakeHermitianUniform" and "MakeUniformHPD" were removed in favor of the function "HermitianUniformSpectrum", which takes the interval from which the eigenvalues are uniformly sampled.
· Many more examples in the examples/ folder
· HermitianSVD was added
· More functions are now supported by the Complex class
· A Shaheen (not just Intrepid) BG/P toolchain file now exists
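The NormalUniformSpectrum entry describes uniformly sampling eigenvalues from a ball in the complex plane. The sketch below covers only that sampling step and uses a hypothetical function name. Forming the normal matrix itself, for example as Q diag(lambda) Q^H with a random unitary Q, is omitted. The radius is drawn as R * sqrt(u) rather than R * u so that the samples are uniform in area rather than clustered near the center.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <random>
#include <vector>

// Sample n points uniformly from the disk |z - center| <= radius.
std::vector<std::complex<double>> SampleDiskSpectrum(std::size_t n,
                                                     std::complex<double> center,
                                                     double radius,
                                                     unsigned seed = 0) {
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    const double twoPi = 2.0 * std::acos(-1.0);
    std::vector<std::complex<double>> lambda(n);
    for (auto& z : lambda) {
        const double r = radius * std::sqrt(unif(gen));  // sqrt gives uniform area density
        const double theta = twoPi * unif(gen);
        z = center + std::polar(r, theta);
    }
    return lambda;
}
```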
Minor improvements and changes:
· The routine "SquareRoot" was changed to "HPSDSquareRoot" since versions which do not assume a Hermitian Positive Semi-Definite matrix may eventually be added
· The enum "Side" was changed to "LeftOrRight" for clarity
· The enum "Diagonal" was changed to "UnitOrNonUnit" for clarity
· The DistMatrix implementation was greatly simplified, and many utility functions were pulled out of the class. For instance, DistMatrix member functions "MakeIdentity", "MakeTrapezoidal", and "MakeZero" were all removed from the class and are now external.
· The documentation now uses a better style (Haiku instead of Default)
Bug fixes:
· Several buffers in ApplyPackedReflectors are now explicitly initialized to zero, since any entries that happened to start as NaN would propagate
· Fixed several bugs in the new DistMatrix constructors where the proper row and column shifts were not correctly set
· MPI_Comm_f2c is now used to translate Fortran communicator handles into C in the experimental F90 interface
· "CMAKE_REQUIRED_INCLUDE" -> "CMAKE_REQUIRED_INCLUDES" in main CMakeLists.txt