Indexing files like doc, pdf – Solr and Tika integration

In the previous article we covered the basics of enabling the indexing of binary files, i.e. MS Word, PDF or LibreOffice files. Today we will do the same thing using the Data Import Handler. Since a new version of the Solr server (3.1) was released a few days ago, the following guidelines are based on that version. For the purpose of the article I used the “example” application, and all of the changes relate to it.

Read more

Solr 1.4: Local Params

Several months ago, during one of our projects, I tried to construct a query with optimal faceting. The problem was that we needed filters (fq) in the query, but at the same time we needed faceting that was not affected by those filters. For a long time this was not possible in Solr with a single request: you had to send two queries. Now you can do it with one. Let’s meet LocalParams.
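As a minimal sketch of the idea, the snippet below builds a single request in which the filter is tagged and the facet excludes that tag, so the facet counts are computed as if the filter were not applied. The `{!tag=...}` and `{!ex=...}` local params are Solr syntax; the host, the `brand` field and the `flt` tag name are assumptions made up for this illustration.

```python
from urllib.parse import urlencode

def build_facet_query(base_url, text, filter_field, filter_value, tag="flt"):
    """Build a Solr select URL with a tagged filter and an unfiltered facet."""
    params = [
        ("q", text),
        # Tag the filter query so it can be referenced by name.
        ("fq", "{!tag=%s}%s:%s" % (tag, filter_field, filter_value)),
        ("facet", "true"),
        # Exclude the tagged filter when counting facet values for this field.
        ("facet.field", "{!ex=%s}%s" % (tag, filter_field)),
    ]
    return base_url + "?" + urlencode(params)

# Hypothetical host and field names, for illustration only.
url = build_facet_query("http://localhost:8983/solr/select", "sedan", "brand", "audi")
print(url)
```

The result list is still restricted to `brand:audi`, while the `brand` facet lists counts for every brand matching the main query.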

Read more

Solr and Tika integration (part 1 – basics)

Indexing so-called “rich documents”, i.e. binary files such as pdf, doc or rtf, has always required some additional work on the developer’s side, at least to extract the contents of the file and prepare them in a format understood by the search engine, in this case Solr. To minimize this work I decided to look at Apache Tika and the integration of this library with Solr.
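A rough sketch of what such a request can look like, assuming the extracting request handler is registered at `/update/extract` as in the Solr example configuration. The host, the document id and the helper name are hypothetical; the snippet only builds the URL to which the binary file would be posted.

```python
from urllib.parse import urlencode

def build_extract_url(solr_base, doc_id, commit=True):
    """Build the URL a rich document (pdf, doc, ...) would be POSTed to."""
    params = [
        # literal.<field> sets a field value that is not taken from the file itself.
        ("literal.id", doc_id),
    ]
    if commit:
        # Ask Solr to commit right after the document is indexed.
        params.append(("commit", "true"))
    return solr_base.rstrip("/") + "/update/extract?" + urlencode(params)

# Hypothetical base URL and id; the file body would be sent as the POST payload.
extract_url = build_extract_url("http://localhost:8983/solr", "doc1")
print(extract_url)
```

Tika then parses the posted file on the server side, so the client does not have to extract the text itself.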

Read more

Index – delete or update?

From time to time, when working with Solr, a problem comes up: how to update the structure of the index. There are various reasons for such changes (new functional requirements, optimization, or anything else), and the reason itself is not important. What is important is the question that arises: should we delete the index, or simply change the structure and run a full indexing? Contrary to appearances, the answer depends on the changes we made to the structure of the index.

Read more

“Car sale” application – WordDelimiterFilter and PatternReplaceFilter, helping to improve search results (part 2)

In the first part of our “Car sale” application series we created a standard index structure by configuring the schema.xml file. With this configuration it didn’t take long to hear the first complaints from the website users. Why don’t I receive any search results when I enter the phrase “audi a”? I would like to see announcements with “Audi A6” and “Audi A8”, for example. I entered the phrase “Honda crv” and got 0 results; “Suzuki maruti” returned none either. Are there no related offers in the announcement database? There are! But the current configuration of the searchable field type (field “content”, type “text”) does not allow us to find those offers using the queries we’ve entered. That’s why the WordDelimiterFilter and PatternReplaceFilter need to enter the battlefield.
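To illustrate the kind of token handling involved, here is a small Python approximation. It is not Solr’s implementation and not the configuration from the post: the functions are hypothetical stand-ins that split a token on letter/digit boundaries, case changes and punctuation, add the concatenated form, and show a simple pattern replacement. The spelling “CR-V” is assumed for the indexed announcement.

```python
import re

def split_word_parts(token):
    """Split on punctuation, letter/digit boundaries and lower-to-upper case changes."""
    return re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", token)

def index_terms(token):
    """Return the lowercased parts of a token plus their concatenation."""
    parts = [p.lower() for p in split_word_parts(token)]
    terms = set(parts)
    terms.add("".join(parts))
    return terms

def pattern_replace(token, pattern=r"[^a-z0-9]", replacement=""):
    """Lowercase the token and replace everything that matches the pattern."""
    return re.sub(pattern, replacement, token.lower())

print(index_terms("A6"))        # contains "a", so the query term "a" can match
print(index_terms("CR-V"))      # contains "crv", so the query term "crv" can match
print(pattern_replace("CR-V"))
```

With terms produced this way, a query for “audi a” can match a document containing “Audi A6”, and “crv” can match “CR-V”, which is the behaviour the users asked for.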

Read more
