In the previous article we gave basic information on how to enable indexing of binary files, i.e. MS Word, PDF or LibreOffice files. Today we will do the same thing using the Data Import Handler. Since a new version of the Solr server (3.1) was released a few days ago, the following guidelines are based on that version. For the purpose of this article I used the “example” application – all of the changes relate to this application.
Lucene and Solr 3.1
A few minutes ago the Lucene and Solr committers published a new, stable version of the Lucene library and the Solr search engine, both numbered 3.1. There are numerous changes, but I’ll mention just some of them (following Grant Ingersoll’s announcement).
Solr 1.4: Local Params
Several months ago, during one of our projects, I tried to construct a query with optimal faceting. The problem was that we needed filters (fq) in the query, but at the same time we needed faceting that was not filtered. Up to a certain point this was not possible in Solr – you had to make two queries. Now you can do it with one query. Let’s meet LocalParams.
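As a rough illustration of the idea, the sketch below builds the parameters for a single request in which the filter is tagged and the facet excludes that tag, so the facet counts ignore the filter. The `{!tag=...}` and `{!ex=...}` local params are Solr syntax; the tag name `flt`, the field names and the helper function are assumptions made up for this example and do not come from the post.

```python
from urllib.parse import urlencode


def build_unfiltered_facet_params(query, filter_field, filter_value, facet_field):
    """Build Solr request parameters where the facet ignores the tagged filter."""
    tag = "flt"  # hypothetical tag name, any identifier would do
    return [
        ("q", query),
        # Tag the filter query so it can be referenced later.
        ("fq", "{!tag=%s}%s:%s" % (tag, filter_field, filter_value)),
        ("facet", "true"),
        # Exclude the tagged filter when computing counts for this facet.
        ("facet.field", "{!ex=%s}%s" % (tag, facet_field)),
    ]


# Hypothetical fields: results filtered by year, facet counts over all years.
params = build_unfiltered_facet_params("audi", "year", "2010", "year")
query_string = urlencode(params)  # ready to append to the select handler URL
```

The results are still narrowed by the filter, while the facet on the same field is computed as if the filter were not there, which is what previously required a second query.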
Solr and Tika integration (part 1 – basics)
Indexing so-called “rich documents”, i.e. binary files such as PDF, DOC, RTF and so on, has always required some additional work on the developer’s side, at least to extract the contents of the file and prepare it in a format understood by the search engine, in this case Solr. To minimize this work I decided to look at Apache Tika and the integration of this library with Solr.
“Car sale application” – Spatial Search, adding location data (part 3)
The number of announcements in our database is so large that our website users have started to look for other ways to filter and sort search results. We need to add functionality that allows us to work with location data related to the cars.
Data Import Handler & XML – nested entities
Data Import Handler is a very nice and powerful tool. The following entry describes a problem I ran into recently, along with its solutions.
Sorting by function value in Solr (SOLR-1297)
In Solr 3.1 and later we have a very interesting feature that enables us to sort by function value. What does that give us? Actually, a few interesting possibilities.
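To make this more concrete, here is a minimal sketch of how such a request could be assembled: the `sort` parameter carries a function expression instead of a plain field name. The `sum` function is part of Solr's function query syntax; the field names `price` and `popularity` and the helper itself are assumptions for illustration only.

```python
from urllib.parse import urlencode


def build_function_sort_params(query, function, direction="desc"):
    """Build Solr request parameters that sort results by a function value."""
    if direction not in ("asc", "desc"):
        raise ValueError("direction must be 'asc' or 'desc'")
    return [
        ("q", query),
        # The sort clause is a function expression followed by the direction.
        ("sort", "%s %s" % (function, direction)),
    ]


# Hypothetical numeric fields: order documents by price plus popularity.
sort_params = build_function_sort_params("*:*", "sum(price,popularity)")
sort_query_string = urlencode(sort_params)
```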
Waiting for 4.0: SOLR-2272 – Solr and JOIN functionality
Index – delete or update?
From time to time, when working with Solr, a problem comes up: how to update the structure of a Solr index. There are various reasons for such changes (new functional requirements, optimization, or anything else), but the reason is not important. What is important is the question that arises: should we remove the index, or simply change the structure and do a full indexing? Contrary to appearances, the answer depends on the changes we made to the structure of the index.
“Car sale” application – WordDelimiterFilter and PatternReplaceFilter, helping to improve search results (part 2)
In the first part of our “Car sale” application posts we created a standard index structure by configuring the schema.xml file. With this configuration, it didn’t take long to hear the first complaints from the website users. Why don’t I get any search results when I enter the phrase “audi a”? I would like to see announcements with “Audi A6” and “Audi A8”, for example. I entered the phrase “Honda crv” – 0 results; “Suzuki maruti” – none. Are there no related offers in the announcement database? There are! But the current configuration of the searchable field type (field “content”, type “text”) does not allow us to find those offers using the queries we entered. That’s why WordDelimiterFilter and PatternReplaceFilter need to enter the battlefield.
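To show why splitting and normalizing tokens helps here, the sketch below imitates the two ideas in plain Python. It is only a rough approximation and not the actual behaviour of the Solr filters: the real WordDelimiterFilter has more rules and options (case changes, for example), and the pattern used for the replacement step is an assumption, not the one from the post.

```python
import re


def split_word_parts(token):
    """Very rough stand-in for word-delimiter splitting:
    break a token on non-alphanumerics and letter/digit boundaries."""
    return [part.lower() for part in re.findall(r"[A-Za-z]+|[0-9]+", token)]


def strip_non_alphanumerics(token):
    """Very rough stand-in for a pattern replacement step:
    drop every character that is not a letter or a digit."""
    return re.sub(r"[^A-Za-z0-9]", "", token).lower()


def analyze(text):
    """Split on whitespace, then split every token into its word parts."""
    parts = []
    for token in text.split():
        parts.extend(split_word_parts(token))
    return parts


indexed = analyze("Audi A6")   # ['audi', 'a', '6']
queried = analyze("audi a")    # ['audi', 'a']
matches = all(term in indexed for term in queried)  # True
```

With the model name split into "a" and "6", the query "audi a" finds a matching document, and `strip_non_alphanumerics("CR-V")` yields "crv", which lines up with what a user typing “Honda crv” would enter.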