docx2txt Changelog

New in docx2txt 1.4 (May 16, 2014)

  • New feature:
  • Added configuration variable config_unzip_opts. This removes the dependency on the unzip program and lets users use other unzipping programs such as 7z, pkzipc, or winzip.
  • Updates:
  • Fixed list numbering.
  • Improved list/paragraph indentation and corresponding code.
  • Updated README with brief guidance on how this utility can be used to recover text from a corrupted docx file.

New in docx2txt 1.3 (Apr 8, 2014)

  • New features:
  • Added support for handling lists (bullet, decimal, letter, roman), along with an attempt at indentation.
  • Updates:
  • Added configuration variable config_twipsPerChar.
  • Removed configuration variables: config_listIndent, config_exp_extra_deEscape.
  • Text output omits deleted text. This matters when changes are being tracked in the docx document.
  • Text output omits non-document_text content marked by wp/wp14 tags.
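
The list handling and the twips-based indentation noted above can be illustrated with a minimal Python sketch. This is not the docx2txt Perl code. The format names follow the WordprocessingML numFmt values, while the function names, the "*" bullet character, the trailing "." on markers, the repeated-letter scheme past "z", and the default of 120 twips per character are all assumptions made for illustration.

```python
def to_roman(n):
    """Convert a positive integer (1..3999) to an upper-case Roman numeral."""
    pairs = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
             (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
             (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]
    out = []
    for value, symbol in pairs:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


def to_letters(n):
    """1 -> a, 26 -> z, 27 -> aa, 28 -> bb (assumed repeated-letter scheme)."""
    letter = chr(ord("a") + (n - 1) % 26)
    return letter * ((n - 1) // 26 + 1)


def list_marker(num_fmt, n):
    """Return the text marker for item number n in the given list format."""
    if num_fmt == "bullet":
        return "*"  # assumed plain-text stand-in for a bullet
    if num_fmt == "decimal":
        return f"{n}."
    if num_fmt == "lowerLetter":
        return to_letters(n) + "."
    if num_fmt == "upperLetter":
        return to_letters(n).upper() + "."
    if num_fmt == "lowerRoman":
        return to_roman(n).lower() + "."
    if num_fmt == "upperRoman":
        return to_roman(n) + "."
    raise ValueError(f"unsupported list format: {num_fmt}")


def render_item(text, num_fmt, n, left_twips, twips_per_char=120):
    """Indent by left_twips converted to characters, then add the marker."""
    indent = " " * (left_twips // twips_per_char)
    return f"{indent}{list_marker(num_fmt, n)} {text}"
```

A twip is 1/1440 of an inch, so with the assumed default a 720-twip (half-inch) left indent becomes six spaces, and render_item("Third point", "lowerRoman", 3, 720) yields the item prefixed by six spaces and "iii.".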

New in docx2txt 1.2 (Jan 16, 2012)

  • New features:
  • The Perl script now accepts a docx file from standard input, and it works with input/output redirection. Refer to the documentation for more information.
  • Script files and the configuration file can be installed in separate directories on non-Windows systems when the Makefile is used for installation.
  • The Linux Makefile also attempts to update the system configuration directory in the installed Perl script to the desired directory.
  • User-specific and system-wide configuration files can be maintained separately, even on Windows.
  • Updates:
  • "-h" has to be given as the first argument to the Perl script to get usage help.
  • Added new configuration variable "config_tempDir".
  • The configuration file is now looked for uniformly, in this order: the current directory, the user configuration directory (APPDATA on Windows, HOME on non-Windows systems), and the system configuration directory (the script files' location on Windows; /etc or the directory set during installation on non-Windows systems).
  • Documentation has been updated with usage examples and information on how .docx file text content can directly be viewed using Vim and Emacs editors.
  • Improved handling of special (non-text) characters, along with support for more non-text characters like fractions.
  • Fixed Bug #3463033: added ' and " to docx specific escape character conversions.
  • Fixed incorrect code that had been committed during the earlier nullDevice fix for Cygwin.
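
The configuration lookup order described above can be sketched as follows. This is a Python illustration only, not the Perl implementation. The function names and the file name docx2txt.config used in the usage note are assumptions, and the environment is passed in as a plain dictionary so the logic can be tried without touching the real system.

```python
import os


def config_search_dirs(is_windows, env, script_dir, system_dir="/etc"):
    """Return candidate directories in the order the changelog describes."""
    dirs = ["."]  # 1. current directory
    # 2. user configuration directory: APPDATA on Windows, HOME elsewhere
    user_dir = env.get("APPDATA" if is_windows else "HOME")
    if user_dir:
        dirs.append(user_dir)
    # 3. system configuration directory: script location on Windows,
    #    /etc (or the directory chosen at installation) elsewhere
    dirs.append(script_dir if is_windows else system_dir)
    return dirs


def find_config(name, dirs, exists=os.path.isfile):
    """Return the first existing candidate path, or None if none is found."""
    for directory in dirs:
        candidate = os.path.join(directory, name)
        if exists(candidate):
            return candidate
    return None
```

For example, find_config("docx2txt.config", config_search_dirs(False, dict(os.environ), "/usr/local/bin")) returns the first match among the current directory, HOME, and /etc, so a file in the current directory takes precedence over a user-level or system-wide one.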

New in docx2txt 1.1 (Dec 12, 2011)

  • New features:
  • Added a check for existence of unzip command.
  • Configuration file is looked for in HOME directory as well.
  • Updates:
  • Configuration variables now begin with config_ .
  • Fixed bugs #3003903, #3082018 and #3082035.
  • Fixed nullDevice for Cygwin.
  • Superscripted cross-references are placed within now.

New in docx2txt 1.0.0 (Oct 6, 2009)

  • New features:
  • Input argument can also be a directory holding the unzipped content of .docx file.
  • Windows wrapper script, and support for using CakeCmd command line unzipper.
  • Configuration file support for easy control over settings.
  • Windows installation script.
  • Updates:
  • A hyperlink is not displayed if the hyperlink and the hyperlinked text are the same, even if the user has enabled hyperlink display.
  • Improved handling of short line justification, capturing many cases that were missed in earlier approach.
  • Path names containing spaces are now handled.
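
The hyperlink display rule above amounts to a small decision, sketched here in Python. The function name, the show_hyperlink parameter, and the bracketed output format are assumptions for illustration, not taken from the script.

```python
def render_link(text, url, show_hyperlink):
    """Return the plain-text rendering of a hyperlinked run."""
    # Skip the URL when display is disabled, or when it would only
    # repeat the visible text.
    if not show_hyperlink or url == text:
        return text
    return f"{text} [HYPERLINK: {url}]"
```

With display enabled, a run whose visible text is already the URL is printed once, while a run with different link text gets the URL appended after it.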

New in docx2txt 0.4 (Sep 7, 2009)

  • The user can control whether a hyperlink is displayed along with its linked text.
  • TOC-related cleanup; the TOC had not been addressed until now.
  • Many new character conversions (check the script code for details).
  • Character conversion mappings are now organised in tabular form.
  • Currency characters are converted to their respective full currency names.
  • Code tweaks to speed up the conversion process.
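
As a rough illustration of a tabular conversion mapping like the one mentioned above, the following Python sketch replaces a few currency characters with their names in a single pass. The table contents and names here are illustrative assumptions; the script code remains the reference for the actual mappings.

```python
# Hypothetical excerpt of a character conversion table.
CURRENCY_NAMES = {
    "€": "Euro",
    "£": "Pound",
    "¥": "Yen",
}


def spell_out_currency(text):
    """Replace each mapped currency character with its full name."""
    return text.translate(str.maketrans(CURRENCY_NAMES))
```

Keeping the mappings in one table means a new conversion is a one-line addition rather than another substitution statement.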

New in docx2txt 0.3 (Aug 22, 2009)

  • docx2txt.pl invocation has been changed a little.
  • User involvement during installation is reduced.
  • Added some suggestions on how Windows users can use this tool.