Apache Lucene v.3.0.1


Apache Lucene 3.0.1 is a flexible, open-source library for building search software.

Major Features:
  • Lucene Java, the flagship sub-project, provides Java-based indexing and search technology, as well as spellchecking, hit highlighting and advanced analysis/tokenization capabilities.
  • Droids is an intelligent robot crawling framework currently in incubation.
  • Lucene.Net is a source code, class-per-class, API-per-API and algorithmic port of the Lucene Java search engine to the C# and .NET platform utilizing Microsoft .NET Framework. Lucene.Net is currently under incubation.
  • Lucy is a loose C port of Lucene Java, with Perl and Ruby bindings.
  • Mahout is a subproject with the goal of creating a suite of scalable machine learning libraries.
  • Nutch builds on Lucene Java to provide web search application software.
  • Open Relevance Project is a new subproject with the aim of collecting and distributing free materials for relevance testing and performance.
  • PyLucene is a Python port of the Lucene Java project.
  • Solr is a high performance search server built using Lucene Java, with XML/HTTP and JSON/Python/Ruby APIs, hit highlighting, faceted search, caching, replication, and a web admin interface.
  • Tika is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries.

Enhancements:

Changes in backwards compatibility policy (1)
  • LUCENE-2123: Removed the protected inner class ScoreTerm from FuzzyQuery. The change was needed because the comparator of this class had to be changed in an incompatible way. The class was never intended to be public. (Uwe Schindler, Mike McCandless)

Bug fixes (10)
  • LUCENE-2092: BooleanQuery was ignoring disableCoord in its hashCode and equals methods, which caused incorrect results when caching BooleanQueries. (Chris Hostetter, Mike McCandless)
  • LUCENE-2095: Fixes: when two threads call IndexWriter.commit() at the same time, it's possible for commit to return control back to one of the threads before all changes are actually committed. (Sanne Grinovero via Mike McCandless)
  • LUCENE-2132: Fix the demo result.jsp to use QueryParser with a Version argument. (Brian Li via Robert Muir)
  • LUCENE-2166: Don't incorrectly keep warning about the same immense term, when IndexWriter.infoStream is on. (Mike McCandless)
  • LUCENE-2158: At high indexing rates, NRT reader could temporarily lose deletions. (Mike McCandless)
  • LUCENE-2182: DEFAULT_ATTRIBUTE_FACTORY was failing to load implementation class when interface was loaded by a different class loader. (Uwe Schindler, reported on java-user by Ahmed El-dawy)
  • LUCENE-2257: Increase max number of unique terms in one segment to termIndexInterval (default 128) * ~2.1 billion = ~274 billion. (Tom Burton-West via Mike McCandless)
  • LUCENE-2260: Fixed AttributeSource to not hold a strong reference to the Attribute/AttributeImpl classes, which prevented unloading of custom attributes loaded by other classloaders (e.g. in Solr plugins). (Uwe Schindler)
  • LUCENE-1941: Fix Min/MaxPayloadFunction returning 0 when only one payload is present. (Erik Hatcher, Mike McCandless via Uwe Schindler)
  • LUCENE-2270: Queries consisting of all zero-boost clauses (for example, text:foo^0) sorted incorrectly and produced invalid docids. (yonik)

API Changes (4)
  • LUCENE-1609: Restore IndexReader.getTermInfosIndexDivisor (it was accidentally removed in 3.0.0). (Mike McCandless)
  • LUCENE-1972: Restore SortField.getComparatorSource (it was accidentally removed in 3.0.0). (John Wang via Uwe Schindler)
  • LUCENE-2190: Added a new class CustomScoreProvider to the function package that can be subclassed to provide custom scoring to CustomScoreQuery. The methods in CustomScoreQuery that did this before were deprecated and replaced by a method getCustomScoreProvider(IndexReader) that returns a custom score implementation using the above class. The change is necessary with per-segment searching, as CustomScoreQuery is a stateless class (like all other Queries) and does not know about the currently searched segment. This API works similarly to Filter's getDocIdSet(IndexReader). (Paul chez Jamespot via Mike McCandless, Uwe Schindler)
  • LUCENE-2080: Deprecate Version.LUCENE_CURRENT, as using this constant will cause backwards compatibility problems when upgrading Lucene. See the Version javadocs for additional information. (Robert Muir)

Optimizations (3)
  • LUCENE-2086: When resolving deleted terms, do so in term sort order for better performance. (Bogdan Ghidireac via Mike McCandless)
  • LUCENE-2123 (partly): Fixes a slowdown / memory issue added by LUCENE-504. (Uwe Schindler, Robert Muir, Mike McCandless)
  • LUCENE-2258: Remove unneeded synchronization in FuzzyTermEnum. (Uwe Schindler, Robert Muir)

Test Cases (3)
  • LUCENE-2114: Change TestFilteredSearch to test on multi-segment index as well. (Simon Willnauer via Mike McCandless)
  • LUCENE-2211: Improves BaseTokenStreamTestCase to use a fake attribute that checks if clearAttributes() was called correctly. (Uwe Schindler, Robert Muir)
  • LUCENE-2207, LUCENE-2219: Improve BaseTokenStreamTestCase to check if end() is implemented correctly. (Koji Sekiguchi, Robert Muir)

Documentation (1)
  • LUCENE-2114: Improve javadocs of Filter to call out that the provided reader is per-segment. (Simon Willnauer via Mike McCandless)
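The per-segment term limit quoted for LUCENE-2257 (termIndexInterval * ~2.1 billion = ~274 billion) can be checked with a few lines of plain Java. This is only a sketch of the arithmetic, not Lucene code: the class name TermLimit and the method maxUniqueTerms are hypothetical, and it assumes that "~2.1 billion" refers to Integer.MAX_VALUE, which the release note does not state explicitly.

```java
public class TermLimit {
    // Assumption: the "~2.1 billion" factor in the release note is
    // Integer.MAX_VALUE (2,147,483,647). The note itself does not say so.
    static final long ASSUMED_FACTOR = Integer.MAX_VALUE;

    // Hypothetical helper: multiplies the interval by the assumed factor.
    // The cast to long avoids int overflow in the multiplication.
    static long maxUniqueTerms(int termIndexInterval) {
        return (long) termIndexInterval * ASSUMED_FACTOR;
    }

    public static void main(String[] args) {
        // 128 is the default termIndexInterval named in the release note.
        System.out.println(maxUniqueTerms(128)); // prints 274877906816
    }
}
```

Under that assumption the product is about 274.9 billion, which matches the "~274 billion" figure in the note. A larger termIndexInterval would raise the limit proportionally.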


Other software of Apache Software Foundation
  • Apache AxKit: Apache AxKit is a convenient server management tool for Apache. It provides on-the-fly conversion from XML to any format, such as HTML, WAP or text, using either W3C standard techniques or flexible custom code. AxKit also uses a ...
  • Apache Bean Validation v.0.3: The goal of the Bean Validation project is to deliver an implementation of the Bean Validation Specification (JSR303) ...
  • Apache Chainsaw v2 Build 2006-03-02: Apache Chainsaw v2 Build 2006-03-02 is a companion application to Log4j written by members of the Log4j development community. Like a number of Open Source projects, this new version was built upon inspirations, ...

New Miscellaneous software
  • Excavator v.1.0.11: Diggernaut is a cloud-based service for web scraping, data extraction and other ETL (Extract, Transform, Load) tasks. If you don't have any programming skills, you can use the Excavator tool to build configurations for your scrapers.
  • SocialMedia driven App Developer v.2.76: If you could look 1 year ahead, what would you want to see? The SocialMedia driven App Developer calculates your shareware's success. Fine-tune as many cost-driving parameters as you want. See revenue and net profit results displayed over 48 months.
  • InstallAware Free Installer v.X6: InstallAware's new Free Installer runs inside Visual Studio and creates setups automatically, by scanning your loaded solutions for dependencies and output files, and including them in your setup. This special edition of InstallAware is freeware!
  • Metamill v.8.1.1921: Metamill is a professional UML modeling tool. Supports the UML 2.4 standard. All 14 UML diagrams supported. Round-trip engineering for Python, ADA, Java, C, C++, C# and VB.Net. RTF and HTML document generation.
  • Apache Lenya v.2.0: Apache Lenya 2.0 is a content management system with revision control, multi-site management, scheduling, search, WYSIWYG editors, and workflow. It is Free / Open Source. It has many other features. Major Features: ...
  • Apache Maven v.2.2.1: Apache Maven 2.2.1 is a software project management and comprehension tool that is based on the concept of a project object model (POM) and can manage a project's build, reporting and documentation from a central piece of ...