Speaker Info:
Daniel Naber, Senior Developer
Description of Talk:
Apache Lucene is a collection of search-related software at the Apache Software Foundation, most notably Lucene Java (often just called "Lucene"), Solr, and Nutch.
Considering its active community and the number of high-profile deployments, Lucene Java is by far the most successful open source fulltext search library. Among many others, it powers the search at Wikipedia and monster.com as well as the desktop search tool Beagle.
Technically, Lucene is a pure Java library that requires Java 1.4 and has no external dependencies. Solr and Nutch are Java-based applications that are built on Lucene Java and that can be used almost without programming knowledge.
The talk will introduce fulltext indexing and searching with Lucene, Solr, and Nutch. It will describe the important steps in fulltext information retrieval: file format conversion, metadata extraction, text normalization, and the indexing step itself. Examples will show how easy Lucene Java is to use and when it may be more sensible to use Solr or Nutch, or even a standard relational database.
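To make the normalization and indexing steps concrete, here is a minimal sketch of a toy inverted index in plain Java. It is not Lucene's API and leaves out file format conversion and metadata extraction; the class name `TinyInvertedIndex` and its methods are hypothetical, and the code assumes Java 8 or later rather than the Java 1.4 that Lucene itself requires.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy inverted index for illustration only; this is NOT Lucene's API.
class TinyInvertedIndex {
    // term -> sorted set of ids of the documents that contain the term
    private final Map<String, Set<Integer>> postings = new TreeMap<String, Set<Integer>>();

    // Text normalization: lowercase, then split on anything that is not a letter or digit.
    static List<String> normalize(String text) {
        List<String> terms = new ArrayList<String>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    // Indexing step: record the document id under every normalized term.
    void add(int docId, String text) {
        for (String term : normalize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<Integer>()).add(docId);
        }
    }

    // Single-term search: normalize the query word the same way, then look it up.
    Set<Integer> search(String word) {
        List<String> terms = normalize(word);
        if (terms.isEmpty()) {
            return new TreeSet<Integer>();
        }
        Set<Integer> hits = postings.get(terms.get(0));
        return hits == null ? new TreeSet<Integer>() : new TreeSet<Integer>(hits);
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.add(1, "Lucene is a search library.");
        index.add(2, "Solr is a search server built on Lucene.");
        System.out.println(index.search("LUCENE")); // prints [1, 2]
        System.out.println(index.search("server")); // prints [2]
    }
}
```

Because the query word goes through the same normalization as the indexed text, "LUCENE" matches documents that contain "Lucene". A real library adds much more on top of this idea, such as ranking, phrase queries, and on-disk storage.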
Lucene Java will be explained using Java code examples, showing how the important classes fit together. Solr is a Lucene-based search server with HTTP interfaces that expect and return XML documents. It has some higher-level features, such as a web frontend, replication, and caching, which make it an interesting alternative even for software developers who are willing to learn Lucene Java. Solr's configuration files will be explained and the XML format will be shown.
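As a rough illustration of the XML that Solr expects when documents are added, the following sketch builds an `<add>` message in plain Java. The helper class `SolrAddMessage` and the field names `id` and `title` are assumptions made for this example; the fields that are actually accepted depend on the schema configured for the Solr instance.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Builds a Solr-style XML "add" message as a string (hypothetical helper, no Solr classes used).
class SolrAddMessage {
    // Escape the characters that are special in XML text and attribute values.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // One <doc> with one <field name="..."> element per map entry, in insertion order.
    static String build(Map<String, String> fields) {
        StringBuilder xml = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            xml.append("<field name=\"").append(escapeXml(field.getKey())).append("\">")
               .append(escapeXml(field.getValue()))
               .append("</field>");
        }
        return xml.append("</doc></add>").toString();
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<String, String>();
        doc.put("id", "talk-1");                 // assumed field name
        doc.put("title", "Lucene, Solr & Nutch"); // assumed field name
        System.out.println(build(doc));
    }
}
```

A message like this would be sent to Solr's update handler with an HTTP POST and made searchable by a subsequent `<commit/>` message. The exact URL depends on how the server is deployed.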
Unlike Lucene Java, Nutch is not a library but a complete web search engine. Technically, Nutch is Lucene plus a web crawler, a plug-in system, document converters, and a web search front end. A short demonstration will show how to get the crawler started.