The Lemur Project 4.12

Language modeling and information retrieval application
The Lemur Toolkit is a free and open-source application designed to facilitate research in language modeling and information retrieval. It includes technologies such as ad hoc and distributed retrieval, cross-language IR, summarization, filtering, and classification.

Main features:

  • Sophisticated structured query languages (using InQuery and Indri)
  • Support for XML and structured document retrieval
  • Commonly used with a wide range of research test collections (e.g., TREC CDs 1-5, wt10g, RCV1, gov, gov2)
  • Index your web pages with an "out-of-the-box" site search capability
  • Interactive interfaces for Windows, Linux, and Web
  • Distributed information retrieval and document clustering applications
  • Cross-platform, fast and modular code written in C++
  • C++, Java and C# APIs
  • Free and open-source software
  • In use for over 6 years by a large and growing user community

Indexing:

  • Multiple indexing methods for small, medium and large-scale (terabyte) collections
  • Built-in support for English, Chinese and Arabic text
  • Porter and Krovetz word stemming
  • Incremental indexing
  • Out-of-the-box indexing support for TREC Text, TREC Web, plain text, HTML, XML, PDF, MBox, Microsoft Word, and Microsoft PowerPoint
  • Indexes inline and offset text annotations (e.g., part-of-speech and named entities)
  • Indexes document attributes

Retrieval:

  • Supports major language modeling approaches such as Indri and KL-divergence, as well as vector space, tf.idf, Okapi and InQuery
  • Relevance- and pseudo-relevance feedback
  • Wildcard term expansion (using Indri)
  • Passage and XML element retrieval
  • Cross-lingual retrieval
  • Smoothing via Dirichlet priors and Markov chains
  • Supports arbitrary document priors (e.g., PageRank, URL depth)
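
To make the "smoothing via Dirichlet priors" item above more concrete, here is a minimal sketch of query-likelihood scoring with Dirichlet prior smoothing, where the smoothed term probability is (count in document + mu * collection probability) / (document length + mu). This is not the Lemur or Indri API: the function name `dirichlet_score`, the toy documents, and the value of `mu` are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def dirichlet_score(query_terms, doc_terms, collection_counts, collection_len, mu=2000.0):
    """Log query likelihood of a document under Dirichlet prior smoothing.

    Hypothetical helper for illustration only; not part of the Lemur Toolkit.
    """
    doc_counts = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for term in query_terms:
        # Background probability of the term in the whole collection.
        p_coll = collection_counts.get(term, 0) / collection_len
        if p_coll == 0.0:
            # Terms unseen in the collection are skipped in this sketch.
            continue
        # Smoothed document model: interpolates document and collection evidence.
        p = (doc_counts[term] + mu * p_coll) / (doc_len + mu)
        score += math.log(p)
    return score


# Toy collection (assumed data, for illustration).
docs = {
    "d1": "lemur toolkit language modeling retrieval".split(),
    "d2": "cooking recipes and kitchen tips".split(),
}
collection_counts = Counter(t for d in docs.values() for t in d)
collection_len = sum(collection_counts.values())

query = ["language", "modeling"]
ranked = sorted(
    docs,
    key=lambda d: dirichlet_score(query, docs[d], collection_counts, collection_len, mu=10.0),
    reverse=True,
)
print(ranked)
```

A larger `mu` pulls every document model closer to the collection model, so very short documents are not over-rewarded for a single matching term.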

last updated on: June 25th, 2010, 13:56 GMT
file size: 63.2 MB
developed by: The Lemur Team
operating system(s): Mac OS X
binary format: Universal Binary
What's New in This Release:
  • Version: 4.12
  • BUG# 3014524 -- Update Google parser for query log toolbar server.
  • BUG# 3014521 -- Query log toolbar server can now be run with an optional hostname parameter, which will be used instead of localhost if ...