K2pdfopt Changelog

New in K2pdfopt 2.33 (Nov 19, 2015)

  • NEW FEATURES:
  • Compiled with GCC v5.2.0 and MuPDF v1.7a (released May 7, 2015). The MuPDF upgrade involved modifying a significant amount of the MuPDF interface code in the willus library since Artifex changed the APIs on several functions, but the bulk of the logic did not change. I uncovered a bug in the pdf_dict_del() function as well (reported).
  • The -i option displays information about the source PDF file. Added to MS Windows GUI also.
  • Added -fr option to rotate wide-aspect-ratio figures to landscape. http://www.mobileread.com/forums/showthread.php?p=3060339#post3060339
  • Added Kindle Paperwhite 3 (2015 release) and Pocketbook Basic 2 to dev list (from http://www.mobileread.com/forums/showthread.php?t=253579)
  • Smarter sorting of red regions on a multiple-column page. See pageregion_sort() function in pageregions.c.
  • New -ibox option has same format as -cbox, but these boxes are ignored by k2pdfopt--they are "whited out" in the source file. For native output, the contents may still be visible in the output.
  • The -neg option now attempts to only negate text passages to white on black and to leave figures alone. Use -neg+ to negate everything. http://www.mobileread.com/forums/showthread.php?p=3104536#post3104536
  • Added option -ehl to erase horizontal lines in the document. Works exactly like the -evl option.
  • Added -author and -title options to specify the author and title of the output PDF. http://www.mobileread.com/forums/showthread.php?p=3112052#post3112052
  • Added -px option to exclude a set of pages, e.g. -px 4,7,10-20. http://www.mobileread.com/forums/showthread.php?p=3112052#post3112052
  • User can use color markings to tell k2pdfopt where to apply page breaks to the output file. http://www.mobileread.com/forums/showthread.php?p=3152988#post3152988
  • The -? option can now be followed by a (wildcard) matching string to show the usage of a particular option, e.g. -? -ws.
  • BUG FIXES:
  • With notes options turned on (-nl / -nr), k2pdfopt will still search for multiple columns if no notes are found on the page. In addition, the -crgh option now more directly affects column divider finding. See textrows_remove_small_rows() call in bmpregion_find_multicolumn_divider(). http://www.mobileread.com/forums/showthread.php?p=3148589#post3148589
  • Fixed multiple file select (broke when I converted to wide chars in v2.30).
  • Modified bmpregion_hyphen_detect() to be less strict about rejecting hyphens that aren't exactly centered. Also modified calculation of lcheight in bmpregion_calc_bbox()--see the function. http://www.mobileread.com/forums/showthread.php?p=3119501#post3119501
  • The k2pdfopt web site and help pages work again from the help menu.
  • Turned off some debugging text from the bmp_autocrop2 function in k2bmp.c.
  • Not really a bug fix, but the command-line help is now shown in Courier New in MS Windows (a mono-spaced font).
  • In info_update() in wmupdf.c in the willus library, I now check whether the Info dictionary can be resolved, i.e. whether it can be parsed correctly. If not, I discard the dictionary. This fixes a bug that a user reported to me in an e-mail on 15 April 2015. The user had a PDF file with a corrupt "Info" dictionary.
  • WPDFOUTLINE structures correctly freed.
  • MuPDF v1.7 stores ligatured characters differently than previous versions in its internal character arrays, so I had to compensate for this.
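
As a rough illustration of the page-list format used by -px above (e.g. 4,7,10-20), here is a minimal Python sketch. It is not k2pdfopt's parser: the function names are hypothetical, and it assumes only single pages and closed N-M ranges, clamped to the length of the document.

```python
def parse_page_list(spec, last_page):
    """Expand a page list such as "4,7,10-20" into sorted page numbers.

    Hypothetical helper: handles only "N" and "N-M" items and clamps
    every range to 1..last_page.
    """
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first_text, last_text = part.split("-", 1)
            first, last = int(first_text), int(last_text)
        else:
            first = last = int(part)
        # Clamp to the document; a range entirely past the end adds nothing.
        first = max(first, 1)
        last = min(last, last_page)
        pages.update(range(first, last + 1))
    return sorted(pages)


def pages_to_convert(excluded_spec, last_page):
    """Return every page of the document except the excluded ones."""
    excluded = set(parse_page_list(excluded_spec, last_page))
    return [p for p in range(1, last_page + 1) if p not in excluded]
```

For a 12-page source file, pages_to_convert("4,7,10-20", 12) leaves pages 1, 2, 3, 5, 6, 8, and 9.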

New in K2pdfopt 2.32 (Mar 7, 2015)

  • NEW FEATURES:
  • A new auto-cropping feature (-ac) has been added where k2pdfopt will attempt to crop out dark edges due to scanning / copying artifacts.
  • There is a checkbox for it in the MS Windows GUI.
  • MS Windows GUI: graphical selection of crop margins has now been implemented. When you click the "Select Margins" button, k2pdfopt will overlay all of the pages in the "Pages to Convert" box and allow you to select a rectangular crop region with the mouse, which it will then use to populate the "Crop Margins" values.
  • The -ls option now takes an optional page range so that it can be applied to specified pages. A control was added to the MS Windows GUI for this.
  • Improved the context sensitive help in the MS Windows GUI.
  • OTHER:
  • Updated the list of devices and their dimensions from the information collected at http://www.mobileread.com/forums/showthread.php?t=253579
  • Clarified -cbox usage.
  • Added source code flow description to k2pdfopt.c.
  • BUG FIXES:
  • Fixed a bug with notes in the margins (-nl/-nr options), checking for notesrows->n==0 (was causing a crash).
  • Clarified usage for -m option.
  • Added MS Windows GUI confirmation of Tesseract initialization (in the conversion dialog box).
  • -cbox- did not work correctly if the page range was beyond the last page of the source PDF. This is fixed.
  • Fixed a preview error noted by a mobileread member. Was due to not correctly clearing a WTEXTCHARS structure in ocrlayer_bounding_box_inches().
  • Fixed some issues with masterinfo_should_flush() where it wasn't correctly figuring out the next page.
  • Fixed issues selecting the text in text edit boxes as they gain focus (either through tabbing or mouse clicks).
  • The number of inserted rows added by textrows_find_doubles() is now limited to a reasonable number. This was going out of control in one oddball case and (I think) causing k2pdfopt to crash.

New in K2pdfopt 2.31 (Dec 29, 2014)

  • NEW FEATURES:
  • Added -ppgs option to post process output with ghostscript pdfwrite device. This can improve text selection when there are overlapping cropped regions. Recommended in 7 Dec 2014 e-mail.
  • NEW DEVICES:
  • Added separate Kindle Paperwhite 2 resolution: 710x960. From 12 Dec 2014 e-mail.
  • Added Kindle Voyage.
  • BUG FIXES:
  • Erase vertical lines works correctly from the MS Windows GUI again.
  • OTHER:
  • Compiled with MuPDF v1.6 library.
  • Separated out functions in wmupdf.c that do not depend on MuPDF. Those are now in wpdf.c. This helps the KOReader build. https://github.com/koreader/koreader-base/pull/290 (2 Dec 2014) Also e-mail on 7 Dec 2014.
  • No longer assumes that WIN32 = GUI. To compile so as not to use the WIN32 API, define NO_WIN32_API from the compile command line.

New in K2pdfopt 2.30 (Nov 27, 2014)

  • NEW FEATURES:
  • Added -colorfg and -colorbg options to adjust the foreground (text) and background colors of the output file (only works for bitmapped output files--doesn't work for native PDF output). You can even use a background bitmap (which will be tiled).
  • BUG FIXES:
  • Marking corners now works for color bitmapped output.
  • Compiled with OpenJPEG v2.1.0--fixed some cases with incorrectly reading PDF files. (beta2) The typical symptom is reported as "JPX stream not read correctly." Issue reported in 2 Nov 2014 e-mail.
  • Removed "wrectmaps->n=..." debugging output from k2ocr.c. (beta3)
  • Clarified intended use of -rt vs. -ls.
  • Fixed buffer overrun in k2gui_cbox_set_pages_completed(). Reported via e-mail on 10 Nov 2014.
  • Preview mode correctly turns on color for native PDF output (also turns off gamma). If the user tries to turn off color output in the MS Windows GUI while native PDF output is checked, a dialog box will pop up explaining why it won't uncheck.
  • Preview mode correctly puts up alert box when source file is not found.
  • File open / folder open puts up alert box if file or folder is not found.
  • The -ocrsp+ command-line option is now correctly recognized.
  • GOCR is correctly used if Tesseract cannot be initialized.

New in K2pdfopt 2.21 (Jul 26, 2014)

  • Compiled with MuPDF v1.5 (a mostly-bug-fix upgrade highly recommended by the MuPDF folks).

New in K2pdfopt 2.18 (Jul 4, 2014)

  • Fixed a problem where scaling sometimes got out of control with tall regions. This was causing excessively large bitmaps to be allocated, which would sometimes run the system out of memory. Search for "2.18" in k2proc.c.

New in K2pdfopt 2.17a (Jun 3, 2014)

  • Fixes MuPDF v1.4 problem where it was not correctly using MS Windows system fonts (introduced in v2.17).
  • Compiled w/gcc 4.8.3.

New in K2pdfopt 2.17 (May 19, 2014)

  • ENHANCEMENTS:
  • Compiled with the latest versions of MuPDF (1.4), Turbo JPEG (1.3.1), libpng (1.6.10), and freetype (2.5.3).

New in K2pdfopt 2.16 (May 5, 2014)

  • BUG FIXES:
  • Avoid zero-value return from masterinfo_break_point().
  • TOC positioning fixed when source pages aren't large enough to cause a new destination page (see k2publish.c). Also, wmupdf output now correctly handles UTF-8 outline/TOC titles.

New in K2pdfopt 2.15 (Mar 28, 2014)

  • ENHANCEMENTS:
  • The -cbox option usage has been rewritten and hopefully clarified. It is a very powerful and useful new option as of v2.10.
  • BUG FIXES:
  • Added a specific variable to track Tesseract initialization (it was sometimes getting missed if it was turned on after a conversion in the GUI).
  • Mode selection works in GUI (wasn't correctly selecting "fitpage") and in text menu again (1-7-14 e-mail).
  • Fixed memory leak in bmp_detect_vertical_lines() in k2bmp.c. http://www.mobileread.com/forums/showthread.php?p=2737816#post2737816
  • Fixed several memory leaks--made sure bmpregion_free() is called for each declared BMPREGION, and also patched wmupdf.c to fix two memory leaks. The most significant memory leak was in the dtcompress.c library function, though, which has also been fixed. http://www.mobileread.com/forums/showthread.php?p=2752370#post2752370
  • Fixed incorrectly formed #ifdef HAVE_OCR_LIB in k2publish.c. Thanks to user facut on mobileread.com, who e-mailed me about this on 21 March 2014.

New in K2pdfopt 2.14 (Jan 3, 2014)

  • Compiled using dtcompress.c module in willus library which avoids requiring the dev's custom modification to zlib.
  • willus lib modules use more standard include callouts for MuPDF and DjVu.
  • Added CMakeLists.txt files to source distribution.
  • Correctly re-compiled Win32 build (wasn't done correctly in v2.13).

New in K2pdfopt 2.12 (Dec 3, 2013)

  • BUG FIXES:
  • No longer writes k2pdfopt_out.png when previewing in the GUI.
  • Removed DLL dependencies from 64-bit Windows compile.

New in K2pdfopt 2.10 (Nov 25, 2013)

  • NEW FEATURES:
  • The PDF "Outlines" tree (often called "bookmarks" by PDF viewers) that helps you navigate the PDF file and is usually shown in the left pane of the PDF viewer is now preserved in the converted file. Or you can create your own bookmarks from a simple text file if your PDF source file doesn't have one (or if you want to change it). See the -toc, -toclist, and -tocsave command-line options. (toc = Table of Contents.) Destination page breaks are forced at outline anchor pages by default (see -bp option).
  • A new -cbox option allows you to specify a crop box to be applied to each page. You can specify more than one, and each separate crop box will be rendered to a different output page, similar to the way the -grid option works. See -cbox in the command usage. Using -mode crop with -cbox, you can crop a source PDF file to a destination PDF file. You can specify different crop boxes for even and odd pages, as well.
  • The -bpl option now allows you to specify a list of source pages where destination page breaks will be forced.
  • Three new modes: -mode trim causes the source page to be trimmed and the destination to be sized to the trimmed source. -mode fitpage is similar, but squeezes the trimmed source page into the specified device output screen size. -mode crop is a complement to the -cbox option and causes each cropped box to be placed on a new page the size of the cropped box.
  • ENHANCEMENTS:
  • Windows versions are compiled with gcc 4.8.2.
  • The Win64 binary is now compressed with UPX 3.91w which finally is able to compress the Win64/PE format.
  • BUG FIXES:
  • In native output, consecutive streams now delimited by white space. http://www.mobileread.com/forums/showthread.php?p=2655550#post2655550
  • Pages with no "/Contents" entry are correctly handled.
  • Re-wrote masterinfo_break_point() to make use of bmpregion_find_textrows() so that decisions on where to break pages in the "fitwidth" mode should be more consistent and also will be affected by the -gtr option. http://www.mobileread.com/forums/showthread.php?p=2686067#post2686067
  • Removed last vestiges of -pi option (interactive menu 'w' option was incorrectly still using it).
  • The vert_line_erase() function in k2bmp.c now correctly handles the cbmp pointer when it is an 8-bit bitmap.
  • Fixed a flow problem in k2file.c (k2pdfopt_proc_one() function) which was causing the GUI preview not to work with -mode copy.
  • The textrows_remove_small_rows() function no longer includes figures (REGION_TYPE_FIGURE) when doing statistics on the row heights.

New in K2pdfopt 2.03 (Sep 23, 2013)

  • ENHANCEMENTS:
  • MuPDF library now uses the Sumatra versions of pdf-font.c and pdf-fontfile.c so that it correctly checks Windows system fonts for non-embedded fonts in the PDF file.
  • BUG FIXES:
  • Native mode is correctly turned off as the default setting.
  • Native mode output works correctly from the MS Windows GUI.
  • Check boxes made consistent (native/wrap/OCR) with quick sanity check call.

New in K2pdfopt 2.02 (Sep 19, 2013)

  • ENHANCEMENTS:
  • The main bitmap resampling function in the willus library, bmp_resample(), now uses an alternate fixed-precision version (with virtually no accuracy loss) on non-64-bit compiles, where it has been tested to be significantly (50% to 100%) faster (on 64-bit modern Intel CPUs, the floating-point version is actually fastest). This should improve k2pdfopt performance on 32-bit and ARM implementations, including on the KOReader, for example. The resample routine also checks to see if no size change is required--if so, it does a simple bmp_copy() call instead of resampling.
  • GENERIC BUG FIXES:
  • echo_source_page_count correctly initialized to zero.
  • Hyphen detection turned on (has been disabled since at least v1.66). (See bmpregion_hyphen_detect() function.)
  • gap_sorted variable in textwords_add_word_gaps() is no longer static.
  • Changed Kindle Paperwhite dimensions to 658 x 889 based on 9-15-13 e-mail feedback.
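
To illustrate what a fixed-precision alternative to floating-point resampling looks like, here is a minimal Python sketch of 1-D linear interpolation done both ways. It is not the bmp_resample() code from the willus library: the 16.16 fixed-point format, the function names, and the choice of linear interpolation are all assumptions made for the example. It also mirrors the shortcut mentioned above of simply copying when no size change is required.

```python
FRAC_BITS = 16            # assumed 16.16 fixed-point format
ONE = 1 << FRAC_BITS


def resample_row_float(src, new_len):
    """Linear interpolation of one row of pixel values using floats."""
    n = len(src)
    if new_len == n:
        return list(src)              # no size change: plain copy
    if n == 1 or new_len == 1:
        return [src[0]] * new_len
    scale = (n - 1) / (new_len - 1)
    out = []
    for i in range(new_len):
        x = i * scale
        x0 = min(int(x), n - 2)       # left neighbour, kept in range
        t = x - x0
        out.append(int(src[x0] * (1 - t) + src[x0 + 1] * t + 0.5))
    return out


def resample_row_fixed(src, new_len):
    """Same interpolation using only integer adds, multiplies and shifts."""
    n = len(src)
    if new_len == n:
        return list(src)              # no size change: plain copy
    if n == 1 or new_len == 1:
        return [src[0]] * new_len
    step = ((n - 1) << FRAC_BITS) // (new_len - 1)
    out = []
    pos = 0                           # source position in 16.16 fixed point
    for _ in range(new_len):
        x0 = min(pos >> FRAC_BITS, n - 2)
        frac = pos - (x0 << FRAC_BITS)
        value = src[x0] * (ONE - frac) + src[x0 + 1] * frac
        out.append((value + ONE // 2) >> FRAC_BITS)   # round to nearest
        pos += step
    return out
```

On 8-bit pixel values the two versions should agree to within one grey level for typical row lengths. Python will not show the speed difference described above; the point is only that the inner loop of the fixed-precision version needs no floating-point operations.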

New in K2pdfopt 2.01 (Sep 16, 2013)

  • BUG FIXES:
  • Fixed significant memory leak in wmupdf.c (added wtextchars_free call in wtextchars_text_inside()). Was causing k2pdfopt to crash on conversions of large PDF files.
  • Better feedback after preview button is pressed. Also works correctly when re-sizing the window during a preview.
  • Inserted 0.1-second sleep at end of k2gui_preview_start()--seems to prevent occasional problems with the preview.

New in K2pdfopt 1.66 (Jul 24, 2013)

  • NEW FEATURES:
  • Option -bp+ will break pages between the green regions as marked by the -sm option (feature request from 6-2-13 e-mail).
  • BUG FIXES:
  • Option -mode def correctly sets margins to zero instead of 0.25. It also now correctly turns off native mode and landscape mode.
  • Prevents infinite OCR font sizes from being written to PDF file. (5-26-2013 private message at mobileread.com about a problem converting a DjVu file.)
  • Fixed array-out-of-bounds issue when searching for the column divider, particularly with blank pages. See "v1.66 fix" in k2proc.c. Fixes these reported issues: 1. 4-27-13 e-mail (alice.pdf) 2. http://www.mobileread.com/forums/showthread.php?p=2558185 3. 7-10-13 e-mail (failed on pages 1, 2, and 14).
  • Fixed breakinfo_find_doubles() in breakinfo.c to avoid an infinite loop situation. See "v1.66 fix" notes. Fixes this reported issue: 7-7-13 e-mail (pages 98 and 187 failed).
  • Fixed bug in MuPDF library when fz_ensure_buffer() is called with buf->cap==1 (results in infinite loop). Reported in 6-18-13 e-mail on a conversion that hung with -mode fw.
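
The buf->cap==1 hang above is the kind of bug that shows up when a buffer grows by an integer ratio. The Python sketch below assumes a "multiply by 3/2" growth rule purely for illustration (the actual MuPDF code is not reproduced here, and all names are hypothetical). With integer arithmetic, a capacity of 1 never grows, so a loop that waits for the capacity to reach the requested size never ends.

```python
def grow_naive(cap):
    """Assumed growth rule: 3/2 with integer division. 1 * 3 // 2 is still 1."""
    return (cap * 3) // 2


def grow_guarded(cap):
    """Same rule, but always grows by at least one element."""
    return max((cap * 3) // 2, cap + 1)


def ensure_capacity(cap, needed, grow, max_steps=64):
    """Grow cap until it reaches needed.

    Returns the final capacity, or None if it has not converged after
    max_steps growth steps (where an unbounded C loop would spin forever).
    """
    for _ in range(max_steps):
        if cap >= needed:
            return cap
        cap = grow(cap)
    return None
```

Starting from a capacity of 1, ensure_capacity(1, 100, grow_naive) gives up and returns None, while the guarded rule reaches the requested size in a handful of steps.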

New in K2pdfopt 1.65 (Apr 9, 2013)

  • NEW FEATURES / OPTIONS:
  • Added Kobo Glo and Kobo Touch device settings. (http://www.mobileread.com/forums/showpost.php?p=2441354&postcount=336)
  • Re-vamped the bmp_source_page_add() function so that the logic that breaks the page out into displayable rectangular regions can be used in other places (e.g. by the OCR fill-in function).
  • Added option -ocrcols which sets the max number of columns for processing with OCR (if different from the -col value). You would use this if you want to OCR a PDF file using -mode copy, but the file has multiple columns of text. (http://www.mobileread.com/forums/showpost.php?p=2442523&postcount=341)
  • Added option -rsf (row-split figure-of-merit) which controls a new algorithm which goes back and looks for rows of text which should be split into two (or three) separate rows. This is meant to help catch those cases where k2pdfopt should have split apart two rows of text but did not because of a small amount of overlap. See breakinfo_find_doubles() in breakinfo.c.
  • LIBRARY UPDATES:
  • Compiled with latest versions of major libraries: MuPDF 1.2, DjVu 3.5.25.3, FreeType 2.4.11, Turbo JPEG 1.2.1, PNG 1.5.14, Z-lib 1.2.7.
  • Linux version now compiled with gcc 4.7.2 in Ubuntu 12.
  • TWEAKS:
  • Clarified usage for -vb in k2usage.c
  • Changed "destination" to "E-reader" in places on the k2 interactive menu and device menu.
  • Put "disclaimer" in OCR usage which clarifies the purpose.
  • Default crop margins are now zero (was 0.25 inches). This was confusing too many people. (http://www.mobileread.com/forums/showpost.php?p=2456032&postcount=352)
  • In bmp_region_vertically_break(), different width regions and regions with different ending/starting row heights cause a vertical gap to be inserted in the output.
  • BUG FIXES:
  • Call k2pdfopt_settings_sanity_check() once per source document. This fixes a crash when converting multiple files. (Certain vars weren't getting correctly initialized on the 2nd, 3rd, etc. conversion files.) (http://www.mobileread.com/forums/showpost.php?p=2409726&postcount=317)
  • Fixed array-out-of-bounds access in k2proc.c (bmpregion_find_multicolumn_divider function) which occasionally caused k2pdfopt to terminate abnormally (typically when converting mostly blank pages). (http://www.mobileread.com/forums/showpost.php?p=2456548&postcount=356)
  • Fixed k2pdfopt_proc_one() in k2file.c so that native PDF output is turned off if the source file is not PDF (e.g. DjVu conversion).
  • Fixed spacing between regions with -vb -2 or -vb -1 (gap between pages where new chapter starts, for example--font change, etc.). (http://www.mobileread.com/forums/showpost.php?p=2373550&postcount=292)
  • Minimum width in vertical line detection is now 1 pixel. (http://www.mobileread.com/forums/showpost.php?p=2452356&postcount=345)
  • Better diagnostic output on TESSDATA_PREFIX env var.
  • Fixed native PDF output so that scientific notation is not allowed in PDF clipping commands. This was causing native conversions not to work correctly in some cases. (http://www.mobileread.com/forums/showpost.php?p=2467063&postcount=371)
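
PDF content streams do not accept exponent notation for numbers, so a very small coordinate written as something like 1.2345e-05 makes a clipping command invalid. The Python sketch below shows one way to format numbers safely. It is not the code from wmupdf.c: the helper names and the choice of five decimal places are assumptions.

```python
def pdf_number(value, decimals=5):
    """Format a number for a PDF content stream without exponent notation."""
    text = f"{value:.{decimals}f}"          # fixed-point, never uses "e"
    if "." in text:
        text = text.rstrip("0").rstrip(".")  # drop trailing zeros and dot
    if text in ("-0", ""):
        text = "0"
    return text


def clip_rect(x, y, width, height):
    """Build a rectangle clipping command: 'x y w h re W n'."""
    numbers = " ".join(pdf_number(v) for v in (x, y, width, height))
    return numbers + " re W n"
```

For example, clip_rect(0, 1e-7, 612, 792.5) produces "0 0 612 792.5 re W n", whereas naive string conversion of 1e-7 would have written 1e-07 into the stream.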

New in K2pdfopt 1.64a (Jan 11, 2013)

  • Fixed bug in Native PDF output introduced in v1.64. (stream_deflate function in wmupdf.c)

New in K2pdfopt 1.64 (Jan 11, 2013)

  • Native PDF output changed so that source pages are converted to XObjects (Form type). This should be much more robust when putting contents from multiple source pages onto a single destination page.
  • Added profile for Kindle Paperwhite. (-dev kpw)
  • The fontdata.c file in willus lib has been reduced to only one font in order to reduce the size of the k2pdfopt binaries since k2pdfopt only uses one font for the -sm option.
  • The page width and height can now be specified in terms of the trimmed source page width and height. Use 't' for the units, e.g. -w 1t -h 1t. This would typically be used with the -mode copy and/or -grid options.
  • The -bp option can now take a numeric argument (inches) to insert a gap (of that many inches) between each source page.
  • There is now an interactive menu option for selecting the OCR language training file (Tesseract OCR only).
  • Fixed memory leak in bmpregion_find_multicolumn_divider().
  • Fixed default value for -col in usage.
  • Clarified -ocrlang usage.
  • Compiled Linux versions with -static and -static-libstdc++ to hopefully reduce shared library incompatibilities.

New in K2pdfopt 1.33 (Nov 29, 2011)

  • Added autodetection of the orientation of the PDF file. This is somewhat experimental and comes with several caveats, but I have made it the default because I think it works pretty well.
  • Caveat #1: It assumes the PDF/DJVU file is mostly lines of text and looks for regularly spaced lines of text to determine the orientation.
  • Caveat #2: If it determines that the page is sideways, it rotates it 90 degrees clockwise, so it may end up upside down.
  • The autodetection is set with the -rt command-line option (or the "rt" menu option):
  • Set it to a number to rotate your PDF/DJVU file that many degrees counter-clockwise.
  • Set it to "auto" and k2pdfopt will examine up to 10 pages of the file to determine the orientation it will use.
  • Set it to "aep" to auto-detect the rotation of every page. If you have different pages that are rotated differently from each other within one file, you can use this option to try to auto-rotate each page.
  • To revert to v1.32 and turn off the orientation detection, just put -rt 0 on the command line.
  • Added option to attempt full justification when breaking lines of text. This is experimental and will only work well if the output dpi is chosen so that rows break approximately evenly. To turn on, use the "j" option in the interactive menu or the -j command-line option with a + after the selection, e.g.
  • -j 0+ (left/full justification)
  • -j 1+ (center/full justification)
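
The orientation autodetection above relies on the page being mostly regularly spaced lines of text. One simple way to exploit that, sketched below in Python, is to compare how much the amount of ink varies from row to row with how much it varies from column to column. This is a stand-in technique chosen for illustration, not the algorithm k2pdfopt uses, and the function names are hypothetical. The bitmap is assumed to be a list of rows of 0/1 pixels, with 1 meaning ink.

```python
def ink_profile_variance(lines):
    """Variance of the ink fraction per line (each line is a list of 0/1)."""
    fractions = [sum(line) / len(line) for line in lines]
    mean = sum(fractions) / len(fractions)
    return sum((f - mean) ** 2 for f in fractions) / len(fractions)


def looks_sideways(bitmap):
    """Guess whether text lines run vertically instead of horizontally.

    Upright text alternates between inked rows and blank gaps, so the row
    profile varies strongly. If the column profile varies more than the
    row profile, the page is probably rotated by 90 degrees.
    """
    rows = bitmap
    cols = [list(col) for col in zip(*bitmap)]
    return ink_profile_variance(cols) > ink_profile_variance(rows)


def synthetic_text_page(height=40, width=30):
    """Toy page: two inked rows, then two blank rows, repeated."""
    return [[1 if (y % 4) < 2 else 0] * width for y in range(height)]
```

On the toy page, looks_sideways() returns False for the upright bitmap and True for its transpose. As Caveat #2 notes for the real feature, a test like this cannot tell which of the two 90-degree rotations is correct.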