WebLech is a free, open-source, fully featured website download and mirroring tool written in Java. It supports many of the features needed to download websites and emulates standard web-browser behavior as closely as possible.
WebLech is multithreaded and comes with a GUI console.
Here are some key features of "WebLech URL Spider":
· Open Source MIT Licence means it's completely free: you can use, modify and redistribute it as you like
· Pure Java code means you can run it on any Java-enabled computer
· Multi-threaded operation for downloading lots of files at once
· Supports basic HTTP authentication for accessing password-protected sites
· HTTP referrer support maintains link information between pages (needed to spider some websites)
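To show what basic authentication and referrer support involve at the HTTP level, here is a minimal sketch using the standard java.net.http API (Java 11+). It is not WebLech's own code: the class PageRequests and its build method are hypothetical. The request carries an "Authorization: Basic" header built from the user name and password, plus a Referer header naming the page on which the link was found.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical helper for illustration only; not part of WebLech. */
final class PageRequests {
    private PageRequests() {}

    /**
     * Builds a GET request for pageUrl that sends HTTP Basic credentials
     * and, if referrer is non-null, a Referer header pointing at the page
     * the link was found on.
     */
    static HttpRequest build(String pageUrl, String referrer,
                             String user, String password) {
        // Basic auth token is base64("user:password")
        String token = Base64.getEncoder().encodeToString(
                (user + ":" + password).getBytes(StandardCharsets.UTF_8));
        HttpRequest.Builder builder = HttpRequest.newBuilder(URI.create(pageUrl))
                .header("Authorization", "Basic " + token)
                .GET();
        if (referrer != null) {
            // The HTTP header name is spelled "Referer"
            builder.header("Referer", referrer);
        }
        return builder.build();
    }
}
```

The resulting request can be sent with an HttpClient from a pool of worker threads, which is one way to get the multi-threaded downloading described above.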
Lots of configuration options:
· Depth-first or breadth-first traversal of the site
· Candidate URL filtering, so you can stick to one web server, one directory, or spider the whole web
· Configurable caching of downloaded files allows restarting without downloading everything again
· URL prioritization, so you can get interesting files first and leave boring files till last (or ignore them completely)
· Checkpointing, so you can snapshot the spider's state in the middle of a run and restart without lots of reprocessing
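The traversal, filtering, prioritization and checkpointing options can be pictured as a single URL frontier. The sketch below is an assumption about how such a frontier might work, not WebLech's actual implementation: the class UrlFrontier, the prefix-based filter and the substring keyword list are all hypothetical. Depth-first mode pushes new URLs onto the front of a deque, breadth-first mode appends them to the back, interesting URLs are always handed out before boring ones, and standard Java serialization stands in for checkpointing.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical URL frontier for illustration; not WebLech's real class. */
final class UrlFrontier implements Serializable {
    private static final long serialVersionUID = 1L;

    private final boolean depthFirst;            // true: LIFO order, false: FIFO order
    private final String allowedPrefix;          // candidate filter (one server or directory)
    private final List<String> interestingWords; // substrings marking a URL as interesting
    private final Deque<String> interesting = new ArrayDeque<>();
    private final Deque<String> boring = new ArrayDeque<>();
    private final Set<String> seen = new HashSet<>();

    UrlFrontier(boolean depthFirst, String allowedPrefix, List<String> interestingWords) {
        this.depthFirst = depthFirst;
        this.allowedPrefix = allowedPrefix;
        this.interestingWords = new ArrayList<>(interestingWords);
    }

    /** Queues a URL unless it fails the filter or has already been queued. */
    void add(String url) {
        if (!url.startsWith(allowedPrefix) || !seen.add(url)) {
            return;
        }
        boolean isInteresting = interestingWords.stream().anyMatch(url::contains);
        Deque<String> queue = isInteresting ? interesting : boring;
        if (depthFirst) {
            queue.addFirst(url); // newest links are visited next
        } else {
            queue.addLast(url);  // links are visited in discovery order
        }
    }

    /** Returns the next URL, preferring interesting ones, or null when empty. */
    String next() {
        String url = interesting.pollFirst();
        return url != null ? url : boring.pollFirst();
    }

    /** Checkpoint: writes the whole frontier (queues and seen set) to a file. */
    void save(File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(this);
        }
    }

    /** Restores a frontier previously written by save(). */
    static UrlFrontier load(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return (UrlFrontier) in.readObject();
        }
    }
}
```

For example, with allowedPrefix set to "http://example.com/docs/" and the keyword list ["pdf"], this frontier would hand out every queued PDF under /docs/ before any other page there, and ignore links to other servers entirely.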
Requirements:
· Java
What's New in This Release:
· Added classification of URLs as "interesting" or "boring" by simple string matching. Interesting URLs are downloaded in preference to boring ones.
· Separated Spider from the UI, which is now in ui/TextSpider.
· Added checkpointing and resume functionality, so the spider can be killed and restarted without doing lots of processing.
· Fixed URL retrieval so that URLs containing a fragment (a # followed by an anchor name) are not treated as new URLs.
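The fragment fix comes down to normalizing URLs before checking whether they have already been seen. Here is a minimal sketch, assuming URLs are handled as plain strings; the UrlKeys class and its methods are hypothetical, not taken from WebLech. Everything from the first # onward is dropped, because the fragment is never sent to the server, so page.html#intro and page.html map to the same entry.

```java
import java.util.Set;

/** Hypothetical helper for illustration only; not part of WebLech. */
final class UrlKeys {
    private UrlKeys() {}

    /** Removes the fragment (everything from the first '#') from a URL. */
    static String stripFragment(String url) {
        int hash = url.indexOf('#');
        return hash >= 0 ? url.substring(0, hash) : url;
    }

    /** Records the fragment-free URL; returns true only the first time it is seen. */
    static boolean firstVisit(Set<String> visited, String url) {
        return visited.add(stripFragment(url));
    }
}
```

Cutting at the first # is safe here because an unescaped # can only appear in a URL as the fragment delimiter.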