Screaming Frog SEO Spider Changelog

What's new in Screaming Frog SEO Spider 9.2

May 3, 2018

Speed up XML Sitemap generation exports.
Add ability to cancel XML Sitemap exports.
Add an option to start without initialising the Embedded Browser (Configuration->System->Embedded Browser). This is for users that can’t update their security settings, and don’t require JavaScript crawling.
Increase custom extraction max length to 32,000 characters.
Prevent users from setting database directory to read-only locations.
Fix switching to tree view with a search in place shows “Searching” dialog, forever, and ever.
Fix incorrect inlink count after re-spider.
Fix crash when performing a search.
Fix project saved failed for list mode crawl with hreflang data.
Fix crash when re-spidering in list mode.
Fix crash in ‘Bulk Export > All Page Source’ export.
Fix webpage cut off in screenshots.
Fix search in tree view, while crawling doesn’t keep up to date.
Fix tree view export missing address column.
Fix hreflang XML sitemaps missing namespace.
Fix needless namespaces from XML sitemaps.
Fix blocked by Cross-Origin Resource Sharing policy incorrectly reported during JavaScript rendering.
Fix crash loading in large crawl in database mode.

New in Screaming Frog SEO Spider 9.1 (May 3, 2018)

New in Screaming Frog SEO Spider 9.0 (May 3, 2018)

Configurable Database Storage (Scale)
In-App Memory Allocation
Store & View HTML & Rendered HTML
Custom HTTP Headers
XML Sitemap Improvements
Updated SERP Snippet Emulator
Post Crawl API Requests
Smaller updates and bug fixes:
While we have introduced the new database storage mode to improve scalability, regular memory storage performance has also been significantly improved. The SEO Spider uses less memory, which will enable users to crawl more URLs than previous iterations of the SEO Spider.
The ‘exclude‘ configuration now works instantly, as it is applied to URLs already waiting in the queue. Previously the exclude would only work on new URLs discovered, and rather than those already found and waiting in the queue. This meant you could apply an exclude, and it would be some time before the SEO Spider stopped crawling URLs that matched your exclude regex. Not anymore.
The ‘inlinks’ and ‘outlinks’ tabs (and exports) now include all sources of a URL, not just links (HTML anchor elements) as the source. Previously if a URL was discovered only via a canonical, hreflang, or rel next/prev attribute, the ‘inlinks’ tab would be blank and users would have to rely on the ‘crawl path report’, or various error reports to confirm the source of the crawled URL. Now these are included within ‘inlinks’ and ‘outlinks’ and the ‘type’ defines the source element (ahref, HTML canonical etc).
In line with Google’s plan to stop using the old AJAX crawling scheme (and rendering the #! URL directly), we have adjusted the default rendering to text only. You can switch between text only, old AJAX crawling scheme and JavaScript rendering.
You can now choose to ‘cancel’ either loading in a crawl, exporting data or running a search or sort.
We’ve added some rather lovely line numbers to the custom robots.txt feature.
To match Google’s rendering characteristics, we now allow blob URLs during JS rendering crawl.
We renamed the old ‘GA & GSC Not Matched’ report to the ‘Orphan Pages‘ report, so it’s a bit more obvious.
URL Rewriting now applies to list mode input.
There’s now a handy ‘strip all parameters’ option within URL Rewriting for ease.
We have introduced numerous JavaScript rendering stability improvements.
The Chromium version used for rendering is now reported in the ‘Help > Debug’ dialog.
List mode now supports .gz file uploads.
The SEO Spider now includes Java 8 update 161, with several bug fixes.
Fix: The SEO Spider would incorrectly crawl all ‘outlinks’ from JavaScript redirect pages, or pages with a meta refresh with ‘Always Follow Redirects’ ticked under the advanced configuration. Thanks to our friend Fili Weise on spotting that one!
Fix: Ahrefs integration requesting domain and subdomain data multiple times.
Fix: Ahrefs integration not requesting information for HTTP and HTTPS on (sub)domain level.
Fix: The crawl path report was missing some link types, which has now been corrected.
Fix: Incorrect robots.txt behaviour for rules ending *$.
Fix: Auth Browser cookie expiration date invalid for non UK locales.

New in Screaming Frog SEO Spider 8.0 (Jul 18, 2017)

Updated User Interface:
The SEO Spider has ruined many an SEO’s slide deck over the years, with its retro (yet, beautifully functional) user interface, and we felt it was finally deserving of an update. However, please don’t panic – it retains the core usability and data led functionality of what made the interface loved by users. And, it still stays true to its fairly retro styling.
It’s a little more modern and has splashes of colour, but now also takes advantage of new technologies in the updated framework, and works with HDPI monitors by default.
External Link Metrics Integration:
You can now connect to Majestic, Ahrefs and Moz APIs and pull in external link metrics during a crawl. This has been a much-requested feature and is extremely useful for performing a content audit, or quickly bulk checking link metrics against a list of URLs.
When you have connected to an API, link metrics will appear in real time, under the new ‘Link Metrics’ tab and in the ‘Internal’ tab, so they can be combined with all the usual crawl and analytical data.
We’ve now also introduced an ‘API’ tab into the right-hand window pane, to allow users to keep an eye on progress.
You will be required to have an account with the tool providers to pull in data using your own API credentials. Each of the tools offer different functionality and metrics from their APIs, and you’re able to customise what data you want to pull in.
The SEO Spider will calculate the API usage of pulling data based upon your API plan (where possible via the API), and can even combine link counts for HTTP and HTTPS versions of URLs for Majestic and Ahrefs to help you save time.
Moz is the only tool with a free (slower, and limited API), as well as a paid plan, which you can select and allows requests to be super fast.
So you can pull in Moz metrics such as Page Authority, Domain Authority, or Spam Score and lots more.
Custom Configuration Profiles:
You can already adjust and save your configuration to be the default, however, we know users want to be able to switch between multiple set-ups quickly, depending on the crawl type, client or objective. Hence, you are now able to create multiple custom configuration profiles and seamlessly switch between them.
There isn’t a limit to the number of profiles, you can create as many as you like. The custom configuration profiles are saved within your user directory, so you can also copy and share your favourite profiles with colleagues for them to load and use.
JavaScript Redirects:
The SEO Spider will now discover and report on JavaScript redirects. The SEO Spider was the first commercial crawler with JavaScript rendering, and this functionality has been advanced further to help identify client-side redirects, which is another first.
While not strictly speaking a response code, they can be viewed under the ‘Response Codes’ tab and ‘Redirection (JavaScript)’ filter. Meta Refreshes are now also included within this area and treated in a similar way to regular server-side and client-side redirect reporting.
HSTS Support:
HTTP Strict Transport Security (HSTS) is a server directive that forces all connections over HTTPS. If any ‘insecure’ links are discovered in a crawl with a Strict-Transport-Security header set, the SEO Spider will show a 307 response with a status message of ‘HSTS Policy’.
The SEO Spider will request the HTTPS version as instructed, but highlight this with a 307 response (inline with browsers, such as Chrome), to help identify when HSTS and insecure links are used (rather than just requesting the secure version, and not highlighting that insecure links actually exist).
The search engines and browsers will only request the HTTPS version, so obviously the 307 response HSTS policy should not be considered as a real temporary redirect and ‘a redirect to fix’. John Mueller discussed this in a Google+ post last year.
Hreflang Auditing In XML Sitemaps:
The SEO Spider already extracts, crawls and reports on hreflang attributes delivered by HTML link element and HTTP Header, and will now for XML Sitemaps in list mode as well.
There’s now an extra column under the ‘hreflang’ tab, for ‘Sitemap hreflang’ which allows users to audit for common issues, such as missing confirmation links, incorrect language codes, not using the canonical, and much more.
Fetch & Render Screenshots Exporting:
You can view the rendered page the SEO Spider crawled in the ‘Rendered Page’ tab at the bottom of the user interface when crawling in JavaScript rendering mode.
Version 8.0 also includes a number of smaller updates, which include:
The ‘Internal’ tab now has new columns for ‘Unique Inlinks’, ‘Unique Outlinks’ and ‘Unique External Outlinks’ numbers. The unique number of ‘inlinks’ was previously only available within the ‘Site Structure’ tab and displays a percentage of the overall number of pages linking to a page.
A new ‘Noindex Confirmation Links’ filter is available within the ‘Hreflang’ tab and corresponding export in the ‘Reports > Hreflang > Noindex Confirmation Links’ menu.
An ‘Occurences’ column has been added to the Hreflang tab to count the number on each page and identify potential problems.
A new ‘Encoded URL’ column has been added to ‘Internal’ and ‘Response Codes’ tab.
The ‘Level’ column has been renamed to ‘Crawl Depth’ to avoid confusion & support queries.
There’s a new ‘External Links’ export under the ‘Bulk Export’ top level menu, which provides all source pages with external links.
The SERP Snippet tool has been updated to refine pixel widths within the SERPs.
Java is now bundled with the SEO Spider, so it doesn’t have to be downloaded separately anymore.
Added a new preset user-agent for SeznamBot (for a search engine in the Czech Republic). Thanks to Jaroslav for the suggestion.
The insecure content report now includes hreflang and rel=“next” and rel=“prev” links.
You can highlight multiple rows, right click and open them all in a browser now.
List mode now supports Sitemap Index files (alongside usual sitemap .xml files).
We also fixed up some bugs:
Fixed a couple of crashes in JavaScript rendering.
Fixed parsing of query strings in the canonical HTTP header.
Fixed a bug with missing confirmation links of external URLs.
Fixed a few crashes in Xpath and in GA integration.
Fixed filtering out custom headers in rendering requests, causing some rendering to fail.

New in Screaming Frog SEO Spider 7.0 (Dec 27, 2016)

‘Fetch & Render’ (Rendered Screen Shots):
You can now view the rendered page the SEO Spider crawled in the new ‘Rendered Page’ tab which dynamically appears at the bottom of the user interface when crawling in JavaScript rendering mode. This populates the lower window pane when selecting URLs in the top window.
This feature is enabled by default when using the new JavaScript rendering functionality, and allows you to set the AJAX timeout and viewport size to view and test various scenarios. With Google’s much discussed mobile first index, this allows you to set the user-agent and viewport as Googlebot Smartphone and see exactly how every page renders on mobile.
Viewing the rendered page is vital when analysing what a modern search bot is able to see and is particularly useful when performing a review in staging, where you can’t rely on Google’s own Fetch & Render in Search Console.
Blocked Resources:
The SEO Spider now reports on blocked resources, which can be seen individually for each page within the ‘Rendered Page’ tab, adjacent to the rendered screen shots.
The blocked resources can also be seen under ‘Response Codes > Blocked Resource’ tab and filter. The pages this impacts and the individual blocked resources can also be exported in bulk via the ‘Bulk Export > Response Codes > Blocked Resource Inlinks’ report.
Custom robots.txt:
You can download, edit and test a site’s robots.txt using the new custom robots.txt feature under ‘Configuration > robots.txt > Custom’. The new feature allows you to add multiple robots.txt at subdomain level, test directives in the SEO Spider and view URLs which are blocked or allowed.
During a crawl you can filter blocked URLs based upon the custom robots.txt (‘Response Codes > Blocked by robots.txt’) and see the matches robots.txt directive line.
Custom robots.txt is a useful alternative if you’re uncomfortable using the regex exclude feature, or if you’d just prefer to use robots.txt directives to control a crawl.
The custom robots.txt uses the selected user-agent in the configuration, and works well with the new fetch and render feature, where you can test how a web page might render with blocked resources.
We considered including a check for a double UTF-8 byte order mark (BOM), which can be a problem for Google. According to the spec, it invalidates the line – however this will generally only ever be due to user error. We don’t have any problem parsing it and believe Google should really update their behaviour to make up for potential mistakes.
Please note – The changes you make to the robots.txt within the SEO Spider, do not impact your live robots.txt uploaded to your server.
hreflang Attributes:
First of all, apologies this one has been a long time coming. The SEO Spider now extracts, crawls and reports on hreflang attributes delivered by HTML link element and HTTP Header. They are also extracted from Sitemaps when crawled in list mode.
While users have historically used custom extraction to collect hreflang, by default these can now be viewed under the ‘hreflang’ tab, with filters for common issues.
While hreflang is a fairly simple concept, there’s plenty of issues that can be encountered in the implementation. We believe this is the most comprehensive auditing for hreflang currently available anywhere and includes checks for missing confirmation links, inconsistent languages, incorrect language/regional codes, non-canonical confirmation links, multiple entries, missing self-reference, not using the canonical, missing the x-default, and missing hreflang completely.
Additionally, there are four new hreflang reports available to allow data to be exported in bulk (under the ‘reports’ top level menu).
This feature can be fairly resource-intensive on large sites, so extraction and crawling are entirely configurable under ‘Configuration > Spider’.
rel=”next” and rel=”prev” Errors:
This report highlights errors and issues with rel=”next” and rel=”prev” attributes, which are of course used to indicate paginated content.
The report will show any rel=”next” and rel=”prev” URLs which have a no response, blocked by robots.txt, 3XX redirect, 4XX, or 5XX error (anything other than a 200 ‘OK’ response).
This report also provides data on any URLs which are discovered only via a rel=”next” and rel=”prev” attribute and are not linked-to from the site (in the ‘unlinked’ column when ‘true’).
Maintain List Order Export:
One of our most requested features has been the ability to maintain the order of URLs when uploaded in list mode, so users can then export the data in the same order and easily match it up against the original data.
Unfortunately it’s not as simple as keeping the order within the interface, as the SEO Spider performs some normalisation under the covers and removes duplicates, which meant it made more sense to produce a way to export data in the original order.
Hence, we have introduced a new ‘export’ button which appears next to the ‘upload’ and ‘start’ buttons at the top of the user interface (when in list mode) which produces an export with data in the same order as it was uploaded.
Maintain list order export
The data in the export will be in the same order and include all of the exact URLs in the original upload, including duplicates or any fix-ups performed.
Web Forms Authentication (Crawl Behind A Login):
The SEO Spider has supported basic and digest standards-based authentication for some time, which enables users to crawl staging and development sites. However, there are other web forms and areas which require you to log in with cookies which have been inaccessible, until now.
We have introduced a new ‘authentication’ configuration (under ‘Configuration > Authentication), which allows users to log in to any web form within the SEO Spider Chromium browser, and then crawl it.
This means virtually all password-protected areas, intranets and anything which requires a web form login can now be crawled.
Please note – This feature is extremely powerful and often areas behind logins will contain links to actions which a user doesn’t want to press (for example ‘delete’). The SEO Spider will obviously crawl every link, so please use responsibly, and not on your precious fantasy football team. With great power comes great responsibility(!).
Also includes some other smaller updates and bug fixes in version 7.0 of the Screaming Frog SEO Spider, which include the following:
All images now appear under the ‘Images’ tab. Previously the SEO Spider would only show ‘internal’ images from the same subdomain under the ‘images’ tab. All other images would appear under the ‘external’ tab. We’ve changed this behaviour as it was outdated, so now all images appear under ‘images’ regardless.
The URL rewriting ‘remove parameters’ input is now a blank field (similar to ‘include‘ and ‘exclude‘ configurations), which allows users to bulk upload parameters one per line, rather than manually inputting and entering each separate parameter.
The SEO Spider will now find the page title element anywhere in the HTML (not just the HEAD), like Googlebot. Not that we recommend having it anywhere else!
Introduced tri-state row sorting, allowing users to clear a sort and revert back to crawl order.
The maximum XML sitemap size has been increased to 50MB from 10MB, in line with Sitemaps.org updated protocol.
Fixed a crash in custom extraction!
Fixed a crash when using the date range Google Analytics configuration.
Fixed exports ignoring column order and visibility.
Fixed cookies set via JavaScript not working in rendered mode.
Fixed issue where SERP title and description widths were different for master view and SERP Snippet table on Windows for Thai language.

New in Screaming Frog SEO Spider 6.0.0 (Jul 21, 2016)

Rendered Crawling (JavaScript):
There were two things we set out to do at the start of the year. Firstly, understand exactly what the search engines are able to crawl and index. This is why we created the Screaming Frog Log File Analyser, as a crawler will only ever be a simulation of search bot behaviour.
Secondly, we wanted to crawl rendered pages and read the DOM. It’s been known for a long time that Googlebot acts more like a modern day browser, rendering content, crawling and indexing JavaScript and dynamically generated content rather well. The SEO Spider is now able to render and crawl web pages in a similar way.
You can choose whether to crawl the static HTML, obey the old AJAX crawling scheme or fully render web pages, meaning executing and crawling of JavaScript and dynamic content.
Google deprecated their old AJAX crawling scheme and we have seen JavaScript frameworks such as AngularJS (with links or utilising the HTML5 History API) crawled, indexed and ranking like a typical static HTML site. I highly recommend reading Adam Audette’s Googlebot JavaScript testing from last year if you’re not already familiar.
After much research and testing, we integrated the Chromium project library for our rendering engine to emulate Google as closely as possible. Some of you may remember the excellent ‘Googlebot is Chrome‘ post from Mike King back in 2011 which discusses Googlebot essentially being a headless browser.
Configurable Columns & Ordering:
You’re now able to configure which columns are displayed in each tab of the SEO Spider (by clicking the ‘+’ in the top window pane).
You can also drag and drop the columns into any order and this will be remembered (even after a restart).
To revert back to the default columns and ordering, simply right click on the ‘+’ symbol and click ‘Reset Columns’ or click on ‘Configuration > User Interface > Reset Columns For All Tables’.
XML Sitemap & Sitemap Index Crawling:
The SEO Spider already allows crawling of XML sitemaps in list mode, by uploading the .xml file (number 8 in the ‘10 features in the SEO Spider you should really know‘ post) which was always a little clunky to have to save it if it was already live (but handy when it wasn’t uploaded!).
So we’ve now introduced the ability to enter a sitemap URL to crawl it (‘List Mode > Download Sitemap’).
Previously if a site had multiple sitemaps, you’d have to upload and crawl them separately as well.
Now if you have a sitemap index file to manage multiple sitemaps, you can enter the sitemap index file URL and the SEO Spider will download all sitemaps and subsequent URLs within them!
Improved Custom Extraction – Multiple Values & Functions:
We listened to feedback that users often wanted to extract multiple values, without having to use multiple extractors. For example, previously to collect 10 values, you’d need to use 10 extractors and index selectors ([1],[2] etc) with Xpath.
We’ve changed this behavior, so by default a single extractor will collect all values found and report them via a single extractor for XPath, CSS Path and Regex. If you have 20 hreflang values, you can use a single extractor to collect them all and the SEO Spider will dynamically add additional columns for however many are required. You’ll still have 9 extractors left to play with as well.
rel=“next” and rel=“prev” Elements Now Crawled:
The SEO Spider can now crawl rel=“next” and rel=“prev” elements whereas previously the tool merely reported them. Now if a URL has not already been discovered, the URL will be added to the queue and the URLs will be crawled if the configuration is enabled (‘Configuration > Spider > Basic Tab > Crawl Next/Prev’).
rel=“next” and rel=“prev” elements are not counted as ‘Inlinks’ (in the lower window tab) as they are not links in a traditional sense. Hence, if a URL does not have any ‘Inlinks’ in the crawl, it might well be due to discovery from a rel=“next” and rel=“prev” or a canonical. We recommend using the ‘Crawl Path Report‘ to show how the page was discovered, which will show the full path.
There’s also a new ‘respect next/prev’ configuration option (under ‘Configuration > Spider > Advanced tab’) which will hide any URLs with a ‘prev’ element, so they are not considered as duplicates of the first page in the series.
Updated SERP Snippet Emulator:
Earlier this year in May Google increased the column width of the organic SERPs from 512px to 600px on desktop, which means titles and description snippets are longer. Google displays and truncates SERP snippets based on characters’ pixel width rather than number of characters, which can make it challenging to optimise.
Our previous research showed Google used to truncate page titles at around 482px on desktop. With the change, we have updated our research and logic in the SERP snippet emulator to match Google’s new truncation point before an ellipses (…), which for page titles on desktop is around 570px.
Our research shows that while the space for descriptions has also increased they are still being truncated far earlier at a similar point to the older 512px width SERP. The SERP snippet emulator will only bold keywords within the snippet description, not in the title, in the same way as the Google SERPs.
Please note – You may occasionally see our SERP snippet emulator be a word out in either direction compared to what you see in the Google SERP. There will always be some pixel differences, which mean that the pixel boundary might not be in the exact same spot that Google calculate 100% of the time.
We are still seeing Google play to different rules at times as well, where some snippets have a longer pixel cut off point, particularly for descriptions! The SERP snippet emulator is therefore not always exact, but a good rule of thumb.
Other Updates:
A new ‘Text Ratio’ column has been introduced in the internal tab which calculates the text to HTML ratio.
Google updated their Search Analytics API, so the SEO Spider can now retrieve more than 5k rows of data from Search Console.
There’s a new ‘search query filter’ for Search Console, which allows users to include or exclude keywords (under ‘Configuration > API Access > Google Search Console > Dimension tab’). This should be useful for excluding brand queries for example.
There’s a new configuration to extract images from the IMG srcset attribute under ‘Configuration > Advanced’.
The new Googlebot smartphone user-agent has been included.
Updated our support for relative base tags.
Removed the blank line at the start of Excel exports.
Fixed a bug with word count which could make it less accurate.
Fixed a bug with GSC CTR numbers.

New in Screaming Frog SEO Spider 5.0.0 (Sep 8, 2015)

Google Search Analytics Integration:
You can now connect to the Google Search Analytics API and pull in impression, click, CTR and average position data from your Search Console profile. Alongside Google Analytics integration, this should be valuable for Panda and content audits respectively.
View & Audit URLs Blocked By Robots.txt:
You can now view URLs disallowed by the robots.txt protocol during a crawl.
Disallowed URLs will appear with a ‘status’ as ‘Blocked by Robots.txt’ and there’s a new ‘Blocked by Robots.txt’ filter under the ‘Response Codes’ tab, where these can be viewed efficiently.
The ‘Blocked by Robots.txt’ filter also displays a ‘Matched Robots.txt Line’ column, which provides the line number and disallow path of the robots.txt entry that’s excluding each URL. This should make auditing robots.txt files simple!
GA & GSC Not Matched Report:
The ‘GA Not Matched’ report has been replaced with the new ‘GA & GSC Not Matched Report’ which now provides consolidated information on URLs discovered via the Google Search Analytics API, as well as the Google Analytics API, but were not found in the crawl.
This report can be found under ‘reports’ in the top level menu and will only populate when you have connected to an API and the crawl has finished.
There’s a new ‘source’ column next to each URL, which details the API(s) it was discovered (sometimes this can be both GA and GSC), but not found to match any URLs found within the crawl.
Configurable Accept-Language Header:
Google introduced local-aware crawl configurations earlier this year for pages believed to adapt content served, based on the request’s language and perceived location.
This essentially means Googlebot can crawl from different IP addresses around the world and with an Accept-Language HTTP header in the request. Hence, like Googlebot, there are scenarios where you may wish to supply this header to crawl locale-adaptive content, with various language and region pairs. You can already use the proxy configuration to change your IP as well.
You can find the new ‘Accept-Language’ configuration under ‘Configuration > HTTP Header > Accept-Language’.
We have some common presets covered, but the combinations are huge, so there is a custom option available which you can just set to any value required.
Smaller Updates & Fixes:
The Analytics and Search Console tabs have been updated to allow URLs blocked by robots.txt to appear, which we believe to be HTML, based upon file type.
The maximum number of Google Analytics metrics you can collect from the API has been increased from 20 to 30. Google restrict the API to 10 metrics for each query, so if you select more than 10 metrics (or multiple dimensions), then we will make more queries (and it may take a little longer to receive the data).
With the introduction of the new ‘Accept-Language’ configuration, the ‘User-Agent’ configuration is now under ‘Configuration > HTTP Header > User-Agent’.
We added the ‘MJ12Bot’ to our list of preconfigured user-agents after a chat with our friends at Majestic.
Fixed a crash in XPath custom extraction.
Fixed a crash on start up with Windows Look & Feel and JRE 8 update 60.
Fixed a bug with character encoding.
Fixed an issue with Excel file exports, which write numbers with decimal places as strings, rather than numbers.
Fixed a bug with Google Analytics integration where the use of hostname in some queries was causing ‘Selected dimensions and metrics cannot be queried together errors’.

New in Screaming Frog SEO Spider 4.0 (Jul 7, 2015)

New in Screaming Frog SEO Spider 3.0.0 (Feb 11, 2015)

This update includes a new way of analyzing a crawl, additional sitemap features and insecure content reporting, which will help with all those HTTPS migrations! As always thanks to everyone for their continued support, feedback and suggestions for the tool.
NEWS FEATURES:
Tree View:
You can now switch from the usual ‘list view’ of a crawl, to a more traditional directory ‘tree view’ format, while still maintaining the granular detail of each URL crawled you see in the standard list view.
This additional view will hopefully help provide an alternative perspective when analyzing a website’s architecture.
The SEO Spider doesn’t crawl this way natively, so switching to ‘tree view’ from ‘list view’ will take a little time to build, & you may see a progress bar on larger crawls for instance. This has been requested as a feature for quite sometime, so thanks to all for their feedback.
Insecure Content Report:
We have introduced a ‘protocol’ tab, to allow you to easily filter and analyze by secure and non secure URLs at a glance (as well as other protocols potentially in the future). As an extension to this, there’s also a new ‘insecure content’ report which will show any HTTPS URLs which have insecure elements on them. It’s very easy to miss some insecure content, which often only get picked up on go live in a browser.
So if you’re working on HTTP to HTTPS migrations, this should be particularly useful. This report will identify any secure pages, which link out to insecure content, such as internal HTTP links, images, JS, CSS, external CDN’s, social profiles etc.
Image Sitemaps & Updated XML Sitemap Features:
You can now add images to your XML sitemap or create an image sitemap file.
You now have the ability to include images which appear under the ‘internal’ tab from a normal crawl, or images which sit on a CDN (and appear under the ‘external’ tab).
Typically you don’t want to include images like logos in an image sitemap, so you can also choose to only include images with a certain number of source attribute references. To help with this, we have introduced a new column in the ‘images’ tab which shows how many times an image is referenced (IMG Inlinks).
This is a nice easy way to exclude logos or social media icons, which are often linked to sitewide for example. You can also right-click and ‘remove’ any images or URLs you don’t want to include obviously too! The ‘IMG Inlinks’ is also very useful when viewing images with missing alt text, as you may wish to ignore social profiles without them etc.
There’s now also plenty more options when generating an XML sitemap. You can choose whether to include ‘noindex’, canonicalised, paginated or PDFs in the sitemap for example. Plus you now also have greater control over the lastmod, priority and change frequency.
Paste URLs In List Mode:
To help save time, you can now paste URLs directly into the SEO Spider in ‘list’ mode, or enter URLs manually (into a window) and upload a file like normal.
Hopefully these additional options will be useful and help save time, particularly when you don’t want to save a file first to upload.
Improved Bulk Exporting:
We plan on making the exporting function entirely customizable, but for now bulk exporting has been improved so you can export all inlinks (or ‘source’ links) to the custom filter and directives, such as ‘noindex’ or ‘canonicalised’ pages if you wish to analyze crawl efficiency for example.
Windows Look & Feel:
There’s a new ‘user interface’ configuration for Windows only, that allows users to enable ‘Windows look and feel’. This will then adhere to the scaling settings a user has, which can be useful for some newer systems with very high resolutions.
UPDATES:
You can now view the ‘Last-Modified’ header response within a column in the ‘Internal’ tab. This can be helpful for tracking down new, old, or pages within a certain date range. ‘Response time’ of URLs has also been moved into the internal tab as well (which used to just be in the ‘Response Codes’ tab, thanks to RaphSEO for that one).
The parser has been updated so it’s less strict about the validity of HTML mark-up. For example, in the past if you had invalid HTML mark-up in the HEAD, page titles, meta descriptions or word count may not always be collected. Now the SEO Spider will simply ignore it and collect the content of elements regardless.
There’s now a ‘mobile-friendly’ entry in the description prefix dropdown menu of the SERP panel. From our testing, these are not used within the description truncation calculations by Google (so you have the same amount of space for characters as pre there introduction).
We now read the contents of robots.txt files only if the response code is 200 OK. Previously we read the contents irrespective of the response code.
Loading of large crawl files has been optimised, so this should be much quicker.
We now remove ‘tabs’ from links, just like Google do (again, as per internal testing). So if a link on a page contains the tab character, it will be removed.
We have formatted numbers displayed in filter total and progress at the bottom. This is useful when crawling at scale! For example, you will see 500,000 rather than 500000.
The number of rows in the filter drop down have been increased, so users don’t have to scroll.
The default response timeout has been increased from 10 secs to 20 secs, as there appears to be plenty of slow responding websites still out there unfortunately!
The lower window pane cells are now individually selectable, like the main window pane.
The ‘search’ button next to the search field has been removed, as it was fairly redundant as you can just press ‘Enter’ to search.
There’s been a few updates and improvements to the GUI that you may notice.
FIXES:
Fixed a bug with ‘Depth Stats’, where the percentage didn’t always add up to 100%.
Fixed a bug when crawling from the domain root (without www.) and the ‘crawl all subdomains’ configuration ticked, which caused all external domains to be treated as internal.
Fixed a bug with inconsistent URL encoding. The UI now always shows the non URL encoded version of a URL. If a URL is linked to both encoded and unencoded, we’ll now only show the URL once.
Fixed a crash in Configuration->URL Rewriting->Regex Replace, as reported by a couple of users.
Fixed a crash for a bound checking issue, as reported by Ahmed Khalifa.
Fixed a bug where unchecking the ‘Check External’ tickbox still checks external links, that are not HTML anchors (so still checks images, CSS etc).
Fixed a bug where the leading international character was stripped out from SERP title preview.
Fixed a bug when crawling links which contained a new line. Google removes and ignores them, so we do now as well.
Fixed a bug where AJAX URLs are UTF-16 encoded using a BOM. We now derive encoding from a BOM, if it’s present.

New in Screaming Frog SEO Spider 2.55 (Jul 29, 2014)

New in Screaming Frog SEO Spider 2.50 (Jul 1, 2014)

New in Screaming Frog SEO Spider 2.40 (May 16, 2014)

SERP Snippets Now Editable:
First of all, the SERP snippet tool we released in our previous version has been updated extensively to include a variety of new features. The tool now allows you to preview SERP snippets by device type (whether it’s desktop, tablet or mobile) which all have their own respective pixel limits for snippets. You can also bold keywords, add rich snippets or description prefixes like a date to see how the page may appear in Google.
he largest update is that the tool now allows you to edit page titles and meta descriptions directly in the SEO Spider as well. This subsequently updates the SERP snippet preview and the table calculations letting you know the number of pixels you have before a word is truncated. It also updates the text in the SEO Spider itself and will be remembered automatically, unless you click the ‘reset title and description’ button. You can make as many edits to page titles and descriptions and they will all be remembered.
This means you can also export the changes you have made in the SEO Spider and send them over to your developer or client to update in their CMS. This feature means you don’t have to try and guesstimate pixel widths in Excel (or elsewhere!) and should provide greater control over your search snippets. You can quickly filter for page titles or descriptions which are over pixel width limits, view the truncations and SERP snippets in the tool, make any necessary edits and then export them. (Please remember, just because a word is truncated it does not mean it’s not counted algorithmically by Google).
SERP Mode For Uploading Page Titles & Descriptions:
You can now switch to ‘SERP mode’ and upload page titles and meta descriptions directly into the SEO Spider to calculate pixel widths. There is no crawling involved in this mode, so they do not need to be live on a website.
This means you can export page titles and descriptions from the SEO Spider, make bulk edits in Excel (if that’s your preference, rather than in the tool itself) and then upload them back into the tool to understand how they may appear in Google’s SERPs.
Under ‘reports’, we have a new ‘SERP Summary’ report which is in the format required to re-upload page titles and descriptions. We simply require three headers for ‘URL’, ‘Title’ and ‘Description’.
Crawl Overview Right Hand Window Pane:
We received a lot of positive response to our crawl overview report when it was released last year. However, we felt that it was a little hidden away, so we have introduced a new right hand window which includes the crawl overview report as default. This overview pane updates alongside the crawl, which means you can see which tabs and filters are populated at a glance during the crawl and their respective percentages.
his means you don’t need to click on the tabs and filters to uncover issues, you can just browse and click on these directly as they arise. The ‘Site structure’ tab provides more detail on the depth and most linked to pages without needing to export the ‘crawl overview’ report or sort the data. The ‘response times’ tab provides a quick overview of response time from the SEO Spider requests. This new window pane will be updated further in the next few weeks.
You can choose to hide this window, if you prefer the older format.
Ajax Crawling #!:
Some of you may remember an older version of the SEO Spider which had an iteration of Ajax crawling, which was removed in a later version. We have redeveloped this feature so the SEO Spider can now crawl Ajax as per Google’s Ajax crawling scheme also sometimes (annoyingly) referred to as hashbang URLs (#!).
There is also an Ajax tab in the UI, which shows both the ugly and pretty URLs, with filters for hash fragments. Some pages may not use hash fragments (such as a homepage), so the ‘fragment’ meta tag can be used to recognise an Ajax page. In the same way as Google, the SEO Spider will then fetch the ugly version of the URL.
Canonical Errors Report:
Under the ‘reports‘ menu, we have introduced a ‘canonical errors’ report which includes any canonicals which have no response, are a 3XX redirect or a 4XX or 5XX error.
This report also provides data on any URLs which are discovered only via a canonical and are not linked to from the site (so not html anchors to the URL). This report will hopefully help save time, so canonicals don’t have to be audited separately via list mode.
Other Smaller Updates:
We have also made a large number of other updates, these include the following –
A ‘crawl canonicals‘ configuration option (which is ticked by default) has been included, so the user can decide whether they want to actually crawl canonicals or just reference them.
Added new Googlebot for Smartphones user-agent and retired the Googlebot-Mobile for Smartphones UA. Thanks to Glenn Gabe for the reminder.
The ‘Advanced Export’ has been renamed to ‘Bulk Export‘. ‘XML Sitemap‘ has been moved under a ‘Sitemaps’ specific navigation item.
Added a new ‘No Canonical’ filter to the directives tab which helps view any html pages or PDFs without a canonical.
Improved performance of .xlsx file writing to be close to .csv and .xls
‘Meta data’ has been renamed to ‘Meta Robots’.
The SEO Spider now always supplies the Accept-Encoding header to work around several sites that are 404 or 301’ing based on it not being there (even though it’s not actually a requirement…).
Allow user to cancel when uploading in list mode.
Provide feedback in stages when reading a file in list mode.
Max out Excel lines per sheet limits for each format (65,536 for xls, and 1,048,576 for xlsx).
The lower window ‘URL info’ tab now contains much more data collected about the URL.
‘All links’ in the ‘Advanced Export’ has been renamed to ‘All In Links’ to provide further clarity.
The UI has been lightened and there’s a little more padding now.
Fixed a bug where empty alt tags were not being picked up as ‘missing’. Thanks to the quite brilliant Ian Macfarlane for reporting it.
Fixed a bug upon some URLs erroring upon upload in list mode. Thanks again to Fili for that one.
Fixed a bug in the custom filter export due to the file name including a colon as default. Oops!
Fixed a bug with images disappearing in the lower window pane, when clicking through URLs.

New in Screaming Frog SEO Spider 2.30 (Mar 21, 2014)

New in Screaming Frog SEO Spider 2.22 (Dec 4, 2013)

New in Screaming Frog SEO Spider 2.21 (Dec 4, 2013)

New in Screaming Frog SEO Spider 2.20 (Jul 15, 2013)

Redirect Chains Report:
There is a new ‘reports’ menu in the top level navigation of the UI, which contains the redirect chains report. This report essentially maps out chains of redirects, the number of hops along the way and will identify the source, as well as if there is a loop. This is really useful as the latency for users can be longer with a chain, a little extra PageRank can dissipate in each hop and a large chain of 301s can be seen as a 404 by Google.
Another very cool part of the redirect chain report is how it works for site migrations alongside the new ‘Always follow redirects‘ option (in the ‘advanced tab’ of the spider configuration). Now when you tick this box, the SEO spider will continue to crawl redirects even in list mode and ignore crawl depth.
Previously the SEO Spider would only crawl the first redirect and report the redirect URL target under the ‘Response Codes’ tab. However, as list mode is essentially working at a crawl depth of ’0′, you wouldn’t see the status of the redirect target which, particularly on migrations is required when a large number of URLs are changed. Potentially a URL could 301, then 301 again and then 404. To find this previously, you had to upload each set of target URLs each time to analyze responses and the destination. Now, the SEO Spider will continue to crawl, until it has found the final target URL. For example, you can view redirect chains in a nice little report in list mode now.
Crawl Path Report:
Have you ever wanted to know how a URL was discovered? Obviously you can view ‘in links’ of a URL, but when there is a particularly deep page, or perhaps an infinite URLs issue caused by incorrect relative linking, it can be a pain to track down the originating source URL (Tip! – To find the source manually, sort URLs alphabetically and find the shortest URL in the sequence!). However, now on right click of a URL (under ‘export’), you can see how the spider discovered a URL and what crawl path it took from start to finish.
Respect noindex & Canonical:
You now have the option to ‘respect’ noindex and canonical directives. If you tick this box under the advanced tab of the spider configuration, the SEO Spider will respect them. This means ‘noindex’ URLs will obviously still be crawled, but they will not appear in the interface (in any tab) and URLs which have been ‘canonicalised’ will also not appear either. This is useful when analysing duplicate page titles, or descriptions which have been fixed by using one of these directives above.
rel=“next” and rel=“prev”:
The SEO Spider now collects these html link elements designed to indicate the relationship between URLs in a paginated series. rel=“next” and rel=“prev” can now be seen under the ‘directives’ tab.
Custom Filters Now Regex:
Similar to our include, exclude and internal search function, the custom filters now support regex, rather than just query string. Oh and we have increased the number of filters from five to ten and included ‘occurrences’. So if you’re searching for an analytics UA ID or a particular phrase, the number of times it appears within the source code of a URL will be reported as well.
Crawl Overview Report:
Under the new ‘reports’ menu discussed above, we have included a little ‘crawl overview report’. This does exactly what it says on the tin and provides an overview of the crawl, including total number of URLs encountered in the crawl, the total actually crawled, the content types, response codes etc with proportions. This will hopefully provide another quick easy way to analyze overall site health at a glance.
We have also changed the max page title length to 65 characters (although it seems now to be based on pixel image width), added a few more preset mobile user agents, fixed some bugs (such as large sitemaps being created over the 10Mb limit) and made other smaller tweaks along the way.

New in Screaming Frog SEO Spider 2.10 (Jan 30, 2013)

New in Screaming Frog SEO Spider 2.00 (Jun 27, 2012)

Word count – The SEO spider now counts the number of words on a given URL between the body tags. This is useful for finding low content pages, you can read our word count definition here.
URL rewriting – The SEO spider now allows you to rewrite URLs. This is particularly useful for sites with session IDs or excess parameters, you can now simply remove them from the URLs using this feature. You can read about URL rewriting in our user guide.
Auto check for updates – You don’t have to manually check for updates anymore, we let you know when one is available. (You can also disable this feature!)
Remove URLs – We allow you to delete URLs completely from the SEO spider (upon the right click). So if you only wish to export certain URLs, or create a sitemap with specific URLs, you can do it in the interface (rather than exporting to Excel).
Advanced exports – We have renamed the ‘export’ option in the top level menu, to ‘Advanced export’ to differentiate it from the usual ‘export’ option. This area allows you to export in bulk, rather than just from the window in your current view. We have included additional exports under this section as well, including exporting of all alt text and anchor text. You can read more about the advanced export feature in our user guide.
Crawling outside of sub folders / domains – As default the SEO spider has always crawled from sub domain or sub directory forwards. This is really useful for most sites, but there are some configurations where this can be a pain. So, we have provided a couple of extra options to crawl outside of start sub folders or sub domains for greater control of crawl. So you can now crawl from anywhere you’d like on the site using this feature and we will crawl all URLs for example. Both of these new options can be found under the ‘include‘ option in configuration.
Amended the definition of ‘internal’ and ‘external’ – Historically links from the domain you are crawling can be included under the ‘external’ tab as well as the ‘internal’ tab. If you crawled from a sub folder for example, anything outside of that sub folder would be treated as ‘external’, including links from the domain you are crawling. This was by design for a number of reasons, but we understand that it has at times been a cause of some confusion. Hence, we have changed our crawling. The ‘external’ tab, will now only show links pointing to other domains (or subdomains).
Renamed ‘Meta & Canonical’ – We amended this tab name to ‘directives’, as it makes more sense. This gives us more room to include additional directives under this tab such as rel=alternate and rel=prev/next etc.
Fixed Keep Alive Headers Issue – There was a bug in the Mac version of the software with keep alive headers, this has been fixed.

Screaming Frog SEO Spider Changelog

What's new in Screaming Frog SEO Spider 9.2

New in Screaming Frog SEO Spider 9.1 (May 3, 2018)

New in Screaming Frog SEO Spider 9.0 (May 3, 2018)

New in Screaming Frog SEO Spider 8.0 (Jul 18, 2017)

New in Screaming Frog SEO Spider 7.0 (Dec 27, 2016)

New in Screaming Frog SEO Spider 6.0.0 (Jul 21, 2016)

New in Screaming Frog SEO Spider 5.0.0 (Sep 8, 2015)

New in Screaming Frog SEO Spider 4.0 (Jul 7, 2015)

New in Screaming Frog SEO Spider 3.0.0 (Feb 11, 2015)

New in Screaming Frog SEO Spider 2.55 (Jul 29, 2014)

New in Screaming Frog SEO Spider 2.50 (Jul 1, 2014)

New in Screaming Frog SEO Spider 2.40 (May 16, 2014)

New in Screaming Frog SEO Spider 2.30 (Mar 21, 2014)

New in Screaming Frog SEO Spider 2.22 (Dec 4, 2013)

New in Screaming Frog SEO Spider 2.21 (Dec 4, 2013)

New in Screaming Frog SEO Spider 2.20 (Jul 15, 2013)

New in Screaming Frog SEO Spider 2.10 (Jan 30, 2013)

New in Screaming Frog SEO Spider 2.00 (Jun 27, 2012)

New in Screaming Frog SEO Spider 1.9 (Feb 21, 2012)