New in LinkChecker 9.3 (Sep 6, 2014)
- Features:
- checking: Parse and check links in PDF files.
- checking: Parse Refresh: and Content-Location: HTTP headers for URLs.
- Changes:
- plugins: PDF and Word checks are now parser plugins (PdfParser, WordParser). Neither plugin is enabled by default since both require third-party modules.
- plugins: Print a warning for enabled plugins that cannot import required third-party modules.
- checking: Treat empty URLs the same as the parent URL. Closes: GH bug #524
- installation: Replaced the twill dependency with local code.
- Fixes:
- checking: Catch XML parse errors in sitemap XML files and print them as warnings. Patch by Mark-Hetherington. Closes: GH bug #516
- checking: Fix internal URL match pattern. Patch by Mark-Hetherington. Closes: GH bug #510
- checking: Recalculate extern status after HTTP redirection. Patch by Mark-Hetherington. Closes: GH bug #515
- checking: Do not strip quotes from already resolved URLs. Closes: GH bug #521
- cgi: Sanitize configuration. Closes: GH bug #519
- checking: Use user-supplied authentication and proxies when requesting robots.txt.
- plugins: Fix Word file check plugin. Closes: GH bug #530
New in LinkChecker 9.2 (Sep 6, 2014)
- Fixes:
- checking: Don't scan external robots.txt sitemap URLs. Closes: GH bug #495
- installation: Correct case for pip install command. Closes: GH bug #498
- Features:
- checking: Parse and check HTTP Link: headers.
- checking: Support parsing of HTML image srcset attributes.
- checking: Support parsing of HTML schema itemtype attributes.
New in LinkChecker 9.1 (Mar 31, 2014)
- Features:
- installation: Use .gz compression for source release to support "pip install".
New in LinkChecker 9.0 (Mar 5, 2014)
- Features:
- checking: Support connection and content check plugins.
- checking: Move lots of custom checks like Antivirus and syntax checks into plugins (see upgrading.txt for more info).
- checking: Add options to limit the number of requests per second, allowed URL schemes and maximum file or download size. Closes: GH bug #397, #465, #420
- checking: Support checking Sitemap: URLs in robots.txt files.
- checking: Reduced memory usage when caching checked links. Closes: GH bug #429
- gui: UI language can be changed dynamically. Closes: GH bug #391
- Changes:
- checking: Use the Python requests module for HTTP and HTTPS requests. Closes: GH bug #393, #463, #417
- logging: Removed download, domains and robots.txt statistics.
- logging: HTML output is now in HTML5.
- checking: Removed 301 warning since 301 redirects are used a lot without updating the old URL links. Also, recursive redirection is not checked any more since there is a maximum redirection limit anyway. Closes: GH bug #444, #419
- checking: Disallowed access by robots.txt is an info now, not a warning. Otherwise it produces a lot of warnings which is counter-productive.
- checking: Do not check SMTP connections for mailto: URLs anymore. It resulted in lots of false warnings since spam prevention usually disallows direct SMTP connections from unrecognized client IPs.
- checking: Only internal URLs are checked by default. To check external URLs, use --check-extern. Closes: GH bug #394, #460
- checking: Document that gconf and KDE proxy settings are parsed. Closes: GH bug #424
- checking: Disable twill page refreshing. Closes: GH bug #423
- checking: The default number of checking threads is 10 now instead of 100.
- Fixes:
- logging: Status was printed every second regardless of the configured wait time.
- logging: Add missing column name to SQL insert command. Closes: GH bug #399
- checking: Several speed and memory usage improvements.
- logging: Fix --no-warnings option. Closes: GH bug #457
- logging: The -o none now sets the exit code. Closes: GH bug #451
- checking: For login pages, use twill form field counter if the field has neither name nor id. Closes: GH bug #428
- configuration: Check regular expressions for errors. Closes: GH bug #410
New in LinkChecker 8.6 (Jan 9, 2014)
- Changes:
- checking: Add "Accept" HTTP header.
- Fixes:
- installer: Include missing logger classes for the Windows and OSX installers.
New in LinkChecker 8.5 (Dec 27, 2013)
- Features:
- checking: Make per-host connection limits configurable.
- checking: Avoid DoS in SSL certificate host matcher.
- Changes:
- checking: Always use the W3C validator to check HTML or CSS syntax.
- checking: Remove the http-wrong-redirect warning.
- checking: Remove the url-content-duplicate warning.
- checking: Make SSL certificate verification optional and allow user-specified certificate files. Closes: GH bug #387
- cmdline: Replace argument parsing. No changes in functionality; only the help text is formatted differently.
- gui: Check early if help files are not found. Closes: GH bug #437
- gui: Remember the last "Save result as" selection. Closes: GH bug #380
- Fixes:
- checking: Apache Coyote (the HTTP server of Tomcat) sends the wrong Content-Type on HEAD requests. Automatically fall back to GET in this case. Closes: GH bug #414
- checking: Do not use GET on POST forms. Closes: GH bug #405
- scripts: Fix argument parsing in linkchecker-nagios. Closes: GH bug #404
- installation: Fix building on OS X systems.
New in LinkChecker 8.4 (Jan 26, 2013)
- Features:
- checking: Support URLs.
- logging: Sending SIGUSR1 signal prints the stack trace of all current running threads. This makes debugging deadlocks easier.
- gui: Support Drag-and-Drop of local files. If the local file is a LinkChecker project (.lcp) file it is loaded, else the check URL is set to the local file URL.
- Changes:
- checking: Increase per-host connection limits to speed up checking.
- Fixes:
- checking: Fix a crash when closing a Word document after scanning failed. Closes: GH bug #369
- checking: Catch UnicodeError from idna.encode() fixing an internal error when trying to connect to certain invalid hostnames.
- checking: Always close HTTP connections without body content. See also http://bugs.python.org/issue16298 Closes: GH bug #376
New in LinkChecker 8.3 (Jan 7, 2013)
- Features:
- project: The project moved to GitHub.
- Changes:
- logging: Print system arguments (sys.argv) and variable values in internal error information.
- installation: Install the dns Python module into linkcheck_dns subdirectory to avoid conflicts with an upstream python-dns installation.
- Fixes:
- gui: Fix storing of ignore lines in options.
New in LinkChecker 8.2 (Nov 10, 2012)
- Changes:
- checking: Print a warning when passwords are found in the configuration file and the file is accessible by others.
- checking: Add debug statements for unparseable content types. Closes: SF bug #3579714
- checking: Turn off caching. This improves memory performance drastically and it's a very seldom used feature - judging from user feedback over the years and my own experience.
- checking: Only allow checking of local files when parent URL does not exist or it's also a file URL.
- Fixes:
- checking: Fix anchor checking of cached HTTP URLs. Closes: SF bug #3577743
- checking: Fix cookie path matching with empty paths. Closes: SF bug #3578005
- checking: Fix handling of non-ASCII exceptions (regression in 8.1). Closes: SF bug #3579766
- configuration: Fix configuration directory creation on Windows systems.
New in LinkChecker 8.1 (Oct 15, 2012)
- Features:
- checking: Allow specification of maximum checking time or maximum number of checked URLs.
- checking: Send an HTTP Do-Not-Track header.
- checking: Check URL length. Print error on URL longer than 2000 characters, warning for longer than 255 characters.
- checking: Warn about duplicate URL contents.
- logging: A new XML sitemap logger can be used that implements the protocol defined at http://www.sitemaps.org/protocol.php.
- Changes:
- doc: Mention 7-zip and Peazip to extract the .tar.xz under Windows. Closes: SF bug #3564733
- logging: Print download and cache statistics in text output logger.
- logging: Print warning tag in text output logger. Makes warning filtering easier.
- logging: Make the last modification time a separate field in logging output. See doc/upgrading.txt for compatibility changes.
- logging: All sitemap loggers log all valid URLs regardless of the --warnings or --complete options. This way the sitemaps can be logged to file without changing the output of URLs in other loggers.
- logging: Ignored warnings are now never logged, even when the URL has errors.
- checking: Improved robots.txt caching by using finer grained locking.
- checking: Limit number of concurrent connections to FTP and HTTP servers. This avoids spurious BadStatusLine errors.
- Fixes:
- logging: Close logger properly on I/O errors. Closes: SF bug #3567476
- checking: Fix wrong method name when printing SSL certificate warnings.
- checking: Catch ValueError on invalid cookie expiration dates. Patch from Charles Jones. Closes: SF bug #3575556
- checking: Detect and handle remote filesystem errors when checking local file links.
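A minimal sketch of the URL length rule introduced in 8.1, assuming only the thresholds stated in that entry (error above 2000 characters, warning above 255). The function and constant names are hypothetical; this is an illustration, not LinkChecker's own implementation.

```python
# Hypothetical illustration of the 8.1 URL length rule; not LinkChecker's code.
MAX_URL_LEN_ERROR = 2000   # URLs longer than this are reported as errors
MAX_URL_LEN_WARNING = 255  # URLs longer than this get a warning


def check_url_length(url: str) -> str:
    """Classify a URL by its length as 'error', 'warning' or 'ok'."""
    if len(url) > MAX_URL_LEN_ERROR:
        return "error"
    if len(url) > MAX_URL_LEN_WARNING:
        return "warning"
    return "ok"


if __name__ == "__main__":
    base = "http://example.com/"
    for url in (base, base + "a" * 300, base + "a" * 2000):
        print(len(url), check_url_length(url))
```

Checking the error threshold first means a very long URL yields only the error, not an additional warning.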
New in LinkChecker 8.0 (Sep 3, 2012)
- Features:
- checking: Verify SSL certificates for HTTPS connections. Both the hostname and the expiration date are checked.
- checking: Always compare encoded anchor names.
- checking: Support WML sites.
- checking: Show number of parsed URLs in page content.
- cmdline: Added Nagios plugin script.
- Changes:
- dependencies: Python >= 2.7.2 is now required
- gui: Display debug output text with fixed-width font.
- gui: Display the real name in the URL properties.
- gui: Make URL properties selectable with the mouse.
- checking: Ignore feed: URLs.
- checking: --ignore-url now really ignores the URLs instead of checking only the syntax.
- checking: Increase the default number of checker threads from 10 to 100.
- Fixes:
- gui: Fix saving of the debugmemory option.
- checking: Do not handle attribute as parent URL but as normal URL to be checked.
- checking: Fix UNC path handling on Windows.
- checking: Detect more sites not supporting HEAD requests properly.
New in LinkChecker 7.9 (Jun 11, 2012)
- Fixes:
- checking: Catch any errors initializing the MIME database. Closes: SF bug #3528450
- checking: Fix writing temporary files.
- checking: Properly handle URLs with user/password information. Closes: SF bug #3529812
- Changes:
- checking: Ignore URLs from local PHP files with execution directives. Prevents false errors when checking local PHP files. Closes: SF bug #3532763
- checking: Allow configuration of local webroot directory to enable checking of local HTML files with absolute URLs. Closes: SF bug #3533203
- Features:
- installation: Support RPM building with cx_Freeze.
- installation: Added .desktop files for POSIX systems.
- checking: Allow writing of a memory dump file to debug memory problems.
New in LinkChecker 7.8 (May 14, 2012)
- Fixes:
- checking: Always use GET for Zope servers since their HEAD support is broken.
- installation: Install correct MSVC++ runtime DLL version for Windows.
- installation: Install missing Python modules for twill, cssutils and HTMLTidy.
- Changes:
- documentation: Made the --ignore-url documentation more clear. Patch from Charles Jones.
- installation: Report missing py2app instead of generating a Distutils error.
- documentation: Fix typo in linkcheckerrc.5 manual page.
- Features:
- installation: Add dependency declaration documentation to setup.py.
New in LinkChecker 7.7 (Apr 23, 2012)
- Fixes:
- checking: Detect invalid empty cookie values.
- checking: Fix cache key for URL connections on redirect.
- gui: Fix update check when content could not be downloaded.
- i18n: Make locale domain name lowercase, fixing the .mo-file lookup on Unix systems.
- checking: Fix CSV output with German locale.
- checking: Write correct statistics when saving results in the GUI.
- Changes:
- cmdline: Remove deprecated options --check-css-w3 and --check-html-w3.
- Features:
- cgi: Added a WSGI script to replace the CGI script.
New in LinkChecker 7.6 (Apr 2, 2012)
- Fixes:
- checking: Recheck extern status on HTTP redirects even if domain did not change. Patch by Charles Jones. Closes: SF bug #3495407
- checking: Fix non-ASCII HTTP header handling. Closes: SF bug #3495621
- checking: Fix non-ASCII HTTP header debugging. Closes: SF bug #3488675
- checking: Improved error message for connect errors to the ClamAV virus checking daemon.
- gui: Replace configuration filename in options dialog.
- checking: Honor the charset encoding of the Content-Type HTTP header when parsing HTML. Fixes characters displayed as '?' for non-ISO-8859-1 websites. Closes: SF bug #3388257
- checking: HTML parser detects and handles invalid comments. Closes: SF bug #3509848
- checking: Store cookies on redirects. Patch by Charles Jones. Closes: SF bug #3513345
- checking: Fix concatenation of multiple cookie values. Patch by Charles Jones.
- logging: Encode comments when logging CSV comments. Closes: SF bug #3513415
- Changes:
- checking: Add real url to cache. Improves output for cached errors.
- checking: Specify timeout for SMTP connections. Avoids spurious connect errors when checking email addresses. Closes: SF bug #3504366
- Features:
- config: Allow --pause and --cookiefile to be set in configuration file.
New in LinkChecker 7.5 (Feb 14, 2012)
- Fixes:
- checking: Properly handle non-ASCII HTTP header values. Closes: SF bug #3473359
- checking: Work around a Squid proxy bug which resulted in not detecting broken links. Closes: SF bug #3472341
- documentation: Fix typo in the manual page. Closes: SF bug #3485876
- Changes:
- checking: Add steam:// URIs to the list of ignored URIs. Closes: SF bug #3471570
- checking: Deprecate the --check-html-w3 and --check-css-w3 options. The W3C checkers are automatically used if a local check library is not installed.
- distribution: The portable version of LinkChecker no longer writes the configuration file to the user directory, so it can be used on a foreign system without leaving any traces behind.
- Features:
- gui: Add Ctrl-L shortcut to highlight the URL input.
- gui: Support loading and saving of project files. Closes: SF bug #3467492
New in LinkChecker 7.4 (Jan 7, 2012)
- Fixes:
- gui: Fix saving of check results as a file. Closes: SF bug #3466545, #3470389
- Changes:
- checking: The archive attribute of <object> and <applet> is a comma-separated list of URIs. The value is now split and each URI is checked separately.
- cmdline: Remove deprecated options.
- configuration: The dictionary-based logging configuration is now used. The logging.conf file has been removed.
- dependencies: Python >= 2.7 is now required
- Features:
- checking: Add HTML5 link elements and attributes.
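A minimal sketch of the 7.4 change to archive attribute handling: split the comma-separated value so each URI can be checked separately. The function name is hypothetical, and resolving each entry against the page URL with `urllib.parse.urljoin` is an assumption for illustration, not necessarily what LinkChecker does internally.

```python
from typing import List
from urllib.parse import urljoin


def split_archive_attribute(value: str, base_url: str) -> List[str]:
    """Split a comma-separated archive attribute into absolute URLs.

    Hypothetical helper for illustration; not LinkChecker's API.
    """
    urls = []
    for part in value.split(","):
        part = part.strip()  # tolerate spaces around the commas
        if part:  # skip empty entries, e.g. from a trailing comma
            urls.append(urljoin(base_url, part))
    return urls


if __name__ == "__main__":
    for url in split_archive_attribute("a.jar, lib/b.jar,", "http://example.com/applets/"):
        print(url)  # each URI would then be queued for a separate check
```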
New in LinkChecker 7.3 (Dec 27, 2011)
- Fixes:
- configuration: Properly detect home directory on OS X systems. Closes: SF bug #3423110
- checking: Proper error reporting for too-long Unicode hostnames. Closes: SF bug #3438553
- checking: Do not remove whitespace inside URLs given on the commandline or GUI. Only remove whitespace at the start and end.
- cmdline: Return with non-zero exit value when internal program errors occurred.
- gui: Fix saving of check results as a file.
- Changes:
- gui: Display all options in one dialog instead of tabbed panes.
- Features:
- gui: Add configuration for warning strings instead of regular expressions. The regular expressions can still be configured in the configuration file.
- gui: Add configuration for ignore URL patterns. Closes: SF bug #3311262
- checking: Support parsing of Safari Bookmark files.
New in LinkChecker 7.2 (Oct 20, 2011)
- Fixes:
- checking: HTML parser now correctly detects character encoding for some sites. Closes: SF bug #3388291
- logging: Fix SQL output. Closes: SF bug #3415274, #3422230
- checking: Fix W3C HTML checking by using the new soap12 output. Closes: SF bug #3413022
- gui: Fix startup when configuration file contains errors. Closes: SF bug #3392021
- checking: Ignore errors trying to get FTP feature set. Closes: SF bug #3424719
- Changes:
- configuration: Parse logger and logging part names case-insensitively. Closes: SF bug #3380114
- gui: Add actions to find bookmark files to the edit menu.
- Features:
- checking: If a warning regex is configured, multiple matches in the URL content are added as warnings. Closes: SF bug #3412317
- gui: Allow configuration of a warning regex.
New in LinkChecker 7.1 (Aug 8, 2011)
- Fixes:
- checking: HTML parser detects and handles stray "
New in LinkChecker 7.0 (May 28, 2011)
- Fixes:
- doc: Correct reference to RFC 2616 for cookie file format. Closes: SF bug #3299557
- checking: HTML parser detects and handles stray "
New in LinkChecker 6.9 (May 7, 2011)
- Fixes:
- gui: Correctly reset logger statistics.
- gui: Fixed saving of parent URL source.
- installer: Fixed portable windows version by not compressing DLLs.
- checking: Catch socket errors when resolving GeoIP country data.
- Changes:
- checking: Automatically allow redirections from URLs given by the user.
- checking: Limit download file size to 5MB. SF bug #3297970
- gui: While checking, show new URLs added in the URL list view by scrolling down.
- gui: Display release date in about dialog. Closes: SF bug #3297255
- gui: Warn before closing changed editor window. Closes: SF bug #3297245
- doc: Improved warningregex example in default configuration file. Closes: SF bug #3297254
- Features:
- gui: Add syntax highlighting for Qt editor in case QScintilla is not installed.
- gui: Highlight check results and colorize number of errors.
- gui: Reload configuration after changes have been made in the editor. Closes: SF bug #3297242
New in LinkChecker 6.8 (Apr 27, 2011)
- Fixes:
- checking: Make module detection more robust by catching OSError.
- Changes:
- gui: Print detected module information in about dialog.
- gui: Close application on Ctrl-C.
- checking: Ignore redirections if the scheme is not HTTP, HTTPS or FTP.
- build: Ship Microsoft C++ runtime files directly instead of the installer package.
- gui: Make QScintilla editor optional by falling back to a QPlainText editor.
- Features:
- build: Support building a binary installer on 64-bit Windows systems.
- build: The Windows installer is now signed with a local self-signed certificate.
- build: Added a Mac OS X binary installer.
- network: Support getting network information on Mac OS X systems.
New in LinkChecker 6.7 (Apr 12, 2011)
- Changes:
- checking: Parse PHP files recursively.
- gui: Remove reset button from option dialog.
- Features:
- gui: Add update check for newer versions of LinkChecker.
New in LinkChecker 6.6 (Mar 26, 2011)
- Fixes:
- gui: Really read system and user configuration file.
- gui: Fix "File->Save results" command. Closes: SF bug #3223290
- Changes:
- logging: Add warning tag attribute in XML loggers.
- Features:
- gui: Added a crash handler which displays exceptions in a dialog window.
New in LinkChecker 6.5 (Mar 14, 2011)
- Fixes:
- checking: Fix typo calling get_temp_file() function. Closes: SF bug #3196917
- checking: Prevent false positives when detecting the MIME type of certain archive files.
- checking: Correct conversion between file URLs and encoded filenames. Fixes false errors when handling files with Unicode encodings.
- checking: Work around a Python 2.7 regression in parsing certain URLs with paths starting with a digit.
- cmdline: Fix filename completion if path starts with ~
- cgi: Prevent encoding errors printing to sys.stdout using an encoding wrapper.
- Changes:
- checking: Use HTTP GET requests to work around buggy IIS servers sending false positive status codes for HEAD requests.
- checking: Strip leading and trailing whitespace from URLs and print a warning instead of reporting errors. Also, all embedded whitespace is stripped from URLs given on the commandline or in the GUI. Closes: SF bug #3196918
- Features:
- configuration: Support reading GNOME and KDE proxy settings.
New in LinkChecker 6.4 (Feb 21, 2011)
- Fixes:
- checking: Do not remove CGI parameters when joining URLs.
- checking: Correctly detect empty FTP paths as directories.
- checking: Reuse connections more than once and ensure they are closed before expiring.
- checking: Make sure "ignore" URL patterns are checked before "nofollow" URL patterns. Closes: SF bug #3184973
- install: Properly include all linkcheck.dns submodules in the .exe installer.
- gui: Remove old context menu action to view URL properties.
- gui: Disable viewing of parent URL source if it's a directory.
- Changes:
- gui: Use Alt-key shortcuts for menu entries.
- checking: Improved thread locking and reduce calls to time.sleep().
- cmdline: Deprecate the --priority commandline option. Now the check process runs with normal priority.
- cmdline: Deprecate the --allow-root commandline option. Root privileges are now always dropped.
- cmdline: Deprecate the --interactive commandline option. It has no effect anymore.
- Features:
- checking: Added support for Google Chrome bookmark files.
- gui: Preselect filename on save dialog when editing file:// URLs. Closes: SF bug #3176022
- gui: Add context menu entries for finding Google Chrome and Opera bookmark files.
New in LinkChecker 6.3 (Feb 7, 2011)
- Fixes:
- install: Fixed the install instructions. Closes: SF bug #3153484
- logging: Enforce encoding error policy when writing to stdout.
- checking: Prevent error message from Geoip by using the correct API function when no city database is installed.
- checking: Properly detect case where IPv6 is not supported. Closes: SF bug #3167249
- Changes:
- gui: Detect local or development versions in update check.
New in LinkChecker 6.2 (Jan 7, 2011)
- Changes:
- checking: Parse PHP files recursively.
- gui: Remove reset button from option dialog.
- Features:
- gui: Add update check for newer versions of LinkChecker.
New in LinkChecker 6.1 (Dec 27, 2010)
- Fixes:
- checking: Fix broken anchor checking. Closes: SF bug #3140765
- checking: Properly detect filenames with spaces as internal links when given as start URL.
- logging: Allow Unicode strings to be written to stdout without encoding errors on Unix systems.
- logging: Fix missing content type for cached URLs.
- gui: Reset statistics before each run.
- Changes:
- install: Compress Windows installer with upx, saving some bytes.
- Features:
- gui: Add URL input context menu action to paste Firefox bookmark file.
- install: Added a portable package for Windows.
New in LinkChecker 6.0 (Dec 20, 2010)
- Fixes:
- checking: Fall back to HTTP GET requests when the connection has been reset since some servers tend to do this for HEAD requests. Closes: SF bug #3114622
- gui: Activate links in property dialog.
- gui: Fix sorting of columns in URL result list. Closes: SF bug #3131401
- checking: Fix wrong __init__ call to URL proxy handler. Closes: SF bug #3118254
- checking: Catch socket errors (for example socket.timeout) when closing SMTP connections.
- Changes:
- dependencies: Require and use Python 2.6.
- cmdline: Removed deprecated options --no-anchor-caching and --no-proxy-for.
- config: Remove backwards compatibility parsing and require the new multiline configuration syntax.
- logging: Use codecs module for proper output encoding. Closes: SF bug #3114624
- checking: The maximum file size of FTP files is now limited to 10MB.
- checking: Remove warning about using Unicode domains which are more widely supported now.
- logging: The unique ID of a URL is not printed out anymore. Instead, the cache URL key should be used to uniquely identify URLs.
- gui: Display URL properties in main window instead of an extra dialog.
- Features:
- logging: More statistic information about content types and URL lengths is printed out.
- gui: Store column widths in registry settings.
- gui: Add ability to save results to local files with File->Save.
- gui: Assume the entered URL starts with http:// if it has no scheme specified and is not a valid local file.
- gui: Display check statistics in main window.
- gui: There is now a clear button in the URL input field if any text has been written to it.
New in LinkChecker 5.5 (Nov 22, 2010)
- Fixes:
- checking: Do not check content of already cached URLs. Closes: SF bug #1720083
- checking: Do not parse URL CGI part recursively, avoiding maximum recursion limit errors. Closes: SF bug #3096115
- logging: Avoid error when logger fields "intro" or "outro" are configured.
- logging: Correctly quote edge labels of graph output formats and remove whitespace.
- checking: Make sure the check for external domain is done after all HTTP redirections.
- checking: Check for allowed content read before trying to parse anchors in HTML file. Closes: SF bug #3110569
- Changes:
- cmdline: Don't log a warning if URL has been redirected. Closes: SF bug #3078820
- checking: Do not print warnings for HTTP -> HTTPS and HTTPS -> HTTP redirects any more.
- logging: Changed comment format in GML output to be able to load the graph in gephi.
- gui: Remove timeout and thread options.
- checking: Do not report irc:// hyperlinks as errors, ignore them instead. Closes: SF bug #3106302
- Features:
- gui: Add command to save the parent URL source in a local file.
- gui: Show configuration files in option dialog and allow them to be edited. Closes: SF bug #3102201
- gui: Added dialog to show detailed URL properties on double click.
- gui: Store GUI options in registry settings.
New in LinkChecker 5.0.2 (Feb 18, 2009)
- Added some fixes for the Windows .exe binary GUI.
New in LinkChecker 5.0.1 (Feb 2, 2009)
- Remove unit tests from distribution to avoid antivirus software alarms with the virus filter tests.
New in LinkChecker 5.0 (Jan 26, 2009)
- A new GUI client for checking has been added and invalid handling of persistent connections has been fixed.