Indexing files like doc, pdf – Solr and Tika integration

In the previous article we gave basic information about how to enable the indexing of binary files, i.e. MS Word, PDF or LibreOffice files. Today we will do the same thing using the Data Import Handler. A new version of the Solr server (3.1) was released a few days ago, so the following guidelines are based on that version. For the purposes of this article I used the “example” application; all of the changes described apply to it.
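
As a rough illustration only, the sketch below generates a data-config.xml for the Data Import Handler. In it, FileListEntityProcessor walks a directory and TikaEntityProcessor extracts each file's text and metadata. It assumes the DIH extras jar and the Tika libraries are on Solr's classpath. The directory, the file pattern and the target fields (id, author, title, text) are hypothetical and must match your own schema.xml. Python is used only to keep the example self-contained; the resulting XML is what Solr actually reads.

```python
import xml.etree.ElementTree as ET


def build_tika_dih_config(base_dir: str, file_pattern: str) -> str:
    """Return a data-config.xml string that feeds every matching file to Tika.

    Assumption: TikaEntityProcessor (DIH extras) is available to Solr.
    The schema field names used below are illustrative placeholders.
    """
    root = ET.Element("dataConfig")
    # BinFileDataSource hands the raw bytes of a file to the entity processor.
    ET.SubElement(root, "dataSource", type="BinFileDataSource", name="bin")
    document = ET.SubElement(root, "document")

    # Outer entity: only lists files. rootEntity="false" means documents are
    # created from the inner (Tika) entity's rows, not from the file listing.
    files = ET.SubElement(
        document, "entity",
        name="files",
        processor="FileListEntityProcessor",
        baseDir=base_dir,
        fileName=file_pattern,
        recursive="true",
        rootEntity="false",
        dataSource="null",
    )
    # Use the absolute file path as the unique key (hypothetical choice).
    ET.SubElement(files, "field", column="fileAbsolutePath", name="id")

    # Inner entity: parses each listed file with Tika as plain text.
    tika = ET.SubElement(
        files, "entity",
        name="tika",
        processor="TikaEntityProcessor",
        url="${files.fileAbsolutePath}",
        format="text",
        dataSource="bin",
    )
    # meta="true" reads the value from the file's metadata, not its body.
    ET.SubElement(tika, "field", column="Author", name="author", meta="true")
    ET.SubElement(tika, "field", column="title", name="title", meta="true")
    ET.SubElement(tika, "field", column="text", name="text")
    return ET.tostring(root, encoding="unicode")
```

The output would be saved as the configuration file referenced by the DIH request handler in solrconfig.xml. An import can then be started with a request such as /dataimport?command=full-import, assuming the handler is registered under that name.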


Solr 1.4: Local Params

Several months ago, during one of my projects, I tried to construct a query with optimal faceting. The problem was that we needed filters (fq) in the query, but at the same time we needed faceting that was not affected by those filters. Until recently this was not possible in Solr: you had to send two queries. Now you can do it with a single query. Let’s meet LocalParams.
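
A sketch of the idea, assuming a hypothetical doctype field. The filter query is labelled with the tag local parameter, and the facet excludes it with ex. As a result, the facet counts ignore that filter while the search results still respect it. The helper below only builds the query string for the /select handler; the field, value and tag name are illustrative.

```python
from urllib.parse import urlencode


def build_multiselect_facet_query(q: str, field: str, value: str) -> str:
    """Filter on field:value, but compute facet counts for field as if
    that filter were not applied (one request instead of two)."""
    tag = "sel"  # arbitrary label linking the filter to the facet exclusion
    params = [
        ("q", q),
        # {!tag=sel} marks this filter query so it can be referenced later.
        ("fq", f"{{!tag={tag}}}{field}:{value}"),
        ("facet", "true"),
        # {!ex=sel} tells Solr to ignore the tagged filter for this facet only.
        ("facet.field", f"{{!ex={tag}}}{field}"),
    ]
    return urlencode(params)
```

For example, build_multiselect_facet_query("*:*", "doctype", "pdf") encodes fq={!tag=sel}doctype:pdf and facet.field={!ex=sel}doctype. The documents are restricted to PDFs, yet the doctype facet still shows counts for every document type.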


Solr and Tika integration (part 1 – basics)

Indexing so-called “rich documents”, i.e. binary files such as pdf, doc or rtf, has always required some additional work on the developer's side, at least to extract the contents of the file and prepare them in a format the search engine (in this case Solr) understands. To minimize this work I decided to take a look at Apache Tika and its integration with Solr.
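
A minimal sketch of the basic Tika integration: sending one file to Solr Cell (the ExtractingRequestHandler, which uses Tika internally). It assumes the handler is registered at /update/extract, as in the example application, and that the schema's unique key field is id. The function only prepares the HTTP request. The Solr URL, document id and file bytes are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_extract_request(solr_base: str, doc_id: str,
                          data: bytes, content_type: str) -> Request:
    """Prepare (but do not send) a POST of a binary file to Solr Cell."""
    params = urlencode({
        "literal.id": doc_id,  # value for the unique key; Tika cannot derive it
        "commit": "true",      # make the document searchable immediately
    })
    url = f"{solr_base.rstrip('/')}/update/extract?{params}"
    # The raw file bytes form the request body; Content-Type describes them.
    return Request(url, data=data,
                   headers={"Content-Type": content_type}, method="POST")
```

Passing the returned request to urllib.request.urlopen would send it. Tika then extracts the text and metadata on the Solr side, so the client never parses the PDF or Word file itself.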

