General – Solr.pl

Apache Solr 9.8.0

Rafał Kuć — Thu, 23 Jan 2025 19:33:38 +0000

It is a pleasure to inform you that the new version of the Solr search server has been released. It is the next release from the 9.x branch and it is numbered 9.8.

Some of the changes introduced in Solr 9.8:

Solr cross data center feature graduated into a main Solr feature!
A given request may now be limited in the amount of memory it can use by means of the memAllowed parameter.
The lib tags in solrconfig.xml are now silently ignored unless you include the SOLR_CONFIG_LIB_ENABLED environment variable set to true.
A new parser called knn_text_to_vector was added; it allows text embeddings to be calculated using external LLMs.

We encourage you to read the whole list of changes at: https://solr.apache.org/docs/9_8_0/changes/Changes.html.

Apache Solr 9.8 can be downloaded from https://dlcdn.apache.org/solr/.

Solr Cookbook Third Edition for less than 6 euro

Rafał Kuć — Fri, 18 Dec 2015 13:56:36 +0000

A quick note for those of you who are interested in my latest Solr book, the Solr Cookbook Third Edition. Packt Publishing is offering the book for less than 6 euros from 17th December until the end of the year. If you are interested, you can buy the book at the discounted price on the Solr Cookbook Third Edition Sale website.

Solr Cookbook, Third Edition

Rafał Kuć — Fri, 23 Jan 2015 13:49:20 +0000

As usual, when we do not update solr.pl for a long time, it doesn’t mean that we are not doing anything. As during the previous period of silence, we were writing. This time, two years after the publication of Apache Solr 4 Cookbook, we are proud to announce that Solr Cookbook Third Edition will be published this Monday, 26.01.2015.

As with the previous edition of the cookbook, we took the time to rebuild the book: all recipes were updated, half of the previous content was thrown away, and new content was added. The most important thing, to our minds, is that Solr Cookbook Third Edition covers both Solr 4.x (based on the newest version, 4.10.3) and Solr 5.0, which should be released very soon.

The book targets beginners and intermediate users working with Apache Solr. You’ll find recipes that should make your life easier when you take your first steps with Solr and when you encounter the common problems that intermediate users tend to struggle with. However, I don’t recommend the book for those of you who know everything about Solr. You may find parts of the book interesting, but this book is not aimed at you.

The list of chapters from the book is as follows:

Apache Solr Configuration
Indexing Your Data
Analyzing Your Text Data
Querying Solr
Faceting
Improving Solr Performance
In the Cloud
Using Additional Solr Functionalities
Dealing with Problems
Real-life Situations

More information about the book, along with a free chapter (which will be available after the official publication of the book), can be found on the Packt Publishing web page dedicated to the book: https://www.packtpub.com/big-data-and-business-intelligence/solr-cookbook-third-edition.

Win Elasticsearch Server second edition e-book

Rafał Kuć — Sun, 08 Jun 2014 13:20:11 +0000

Together with Packt Publishing we are giving away copies of our latest book, “Elasticsearch Server 2nd Edition”. Although it is not about Solr, it gives you a chance to learn about Elasticsearch and compare it to Solr. For the readers of solr.pl we will modify the competition question: “What unusual use case have you implemented using Apache Solr?” You can also give an example of functionality that can’t be achieved with Elasticsearch.

All the information about the competition can be found at http://elasticsearchserverbook.com/win-elasticsearch-server-second-editon-ebook/.

SolrCloud – What happens when ZooKeeper fails?

Rafał Kuć — Mon, 02 Dec 2013 14:13:35 +0000

One of the questions I tend to get is what happens to a SolrCloud cluster when ZooKeeper fails. Of course, we are not talking about the failure of a single ZooKeeper instance, but about the whole ensemble being inaccessible, so that no quorum is present. Because the answer to this question is very easy to verify, I decided to write a simple blog post showing what happens when ZooKeeper fails.

Test environment

The test environment was very simple:

A single virtual machine running under Linux operating system
A single instance of ZooKeeper (which will be suitable for our test)
Two Solr instances with a single collection deployed
Solr 4.6

In order to create our test collection I’ve uploaded the configuration to ZooKeeper and used the following command:

curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=collection1&numShards=2&replicationFactor=1'
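The same request can also be assembled in code. The Python sketch below only builds the Collections API URL from the parameters used in the command above; the helper name create_collection_url is made up for this illustration and is not part of Solr.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards, replication_factor):
    """Build a Collections API CREATE URL (hypothetical helper, not a Solr API)."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    }
    return base_url.rstrip("/") + "/admin/collections?" + urlencode(params)


url = create_collection_url("http://localhost:8983/solr", "collection1", 2, 1)
print(url)
```

Keeping the parameters in a dictionary makes it easy to change the number of shards or the replication factor without editing the URL by hand.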

The cloud view of the example cluster was as follows:

Test data indexing

The next step in our test will be indexing. We will index a few example documents that are provided with Solr in the exampledocs directory. The following commands were used to index the data:

curl 'localhost:8983/solr/collection1/update?commit=true' --data-binary @mem.xml -H 'Content-type:application/xml'
curl 'localhost:8983/solr/collection1/update?commit=true' --data-binary @monitor.xml -H 'Content-type:application/xml'
curl 'localhost:8983/solr/collection1/update?commit=true' --data-binary @monitor2.xml -H 'Content-type:application/xml'
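For readers who prefer to script the indexing step, here is a minimal Python sketch of an equivalent request, assuming the same update URL and content type as the curl calls above. The helper build_update_request and the inline sample document are illustrative only; the request is prepared but not sent.

```python
import urllib.request


def build_update_request(solr_url, collection, xml_bytes, commit=True):
    """Prepare (but do not send) an XML update request, mirroring the curl calls."""
    url = f"{solr_url.rstrip('/')}/{collection}/update"
    if commit:
        url += "?commit=true"
    return urllib.request.Request(
        url,
        data=xml_bytes,
        headers={"Content-Type": "application/xml"},
        method="POST",
    )


# Inline stand-in document; in the test above the body came from the example files.
sample = b'<add><doc><field name="id">example-1</field></doc></add>'
request = build_update_request("http://localhost:8983/solr", "collection1", sample)
# To actually send it against a running Solr: urllib.request.urlopen(request)
```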

After executing the above commands we get the following number of documents:

The whole collection holds 5 documents
Shard located on Solr running on port 8983 has 1 document
Shard located on Solr running on port 7983 has 4 documents

Querying with ZooKeeper not present

Now we go to the next step: we shut down our ZooKeeper instance and try to run a simple query by sending the following command:

curl 'localhost:8983/solr/collection1/select?q=*:*&indent=true'

As a result we get the following response:



 
  0
  16
  
   true
   *:*
  
 


 TWINX2048-3200PRO 
 CORSAIR  XMS 2GB (2 x 1GB) 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) Dual Channel Kit System Memory - Retail
 Corsair Microsystems Inc.
 corsair
 
  electronics
  memory
 
 
  CAS latency 2,    2-3-3-6 timing, 2.75v, unbuffered, heat-spreader
 
 185.0
 185,USD
 5
 true
 37.7752,-122.4232
 2006-02-13T15:26:37Z
 electronics|6.0 memory|3.0
 1453219034197655552


 VS1GB400C3
 CORSAIR ValueSelect 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - Retail
 Corsair Microsystems Inc.
 corsair
 
  electronics
  memory
 
 74.99
 74.99,USD
 7
 true
 37.7752,-100.0232
 2006-02-13T15:26:37Z
 electronics|4.0 memory|2.0
 1453219034252181504


 VDBDB1A16
 A-DATA V-Series 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - OEM
 A-DATA Technology Inc.
 corsair
 
  electronics
  memory
 
 
  CAS latency 3,     2.7v
 
 0
 true
 45.18414,-93.88141
 2006-02-13T15:26:37Z
 electronics|0.9 memory|0.1
 1453219034255327232


 3007WFP
 Dell Widescreen UltraSharp 3007WFP
 Dell, Inc.
 dell
 
  electronics
  monitor
 
 
  30" TFT active matrix LCD, 2560 x 1600, .25mm dot pitch, 700:1 contrast
 
 USB cable
 401.6
 2199.0
 2199,USD
 6
 true
 43.17614,-90.57341
 1453219041357332480


 VA902B
 ViewSonic VA902B - flat panel display - TFT - 19"
 ViewSonic Corp.
 viewsonic
 
  electronics
  monitor
 
 
  19" TFT active matrix LCD, 8ms response time, 1280 x 1024 native resolution
 
 190.4
 279.95
 279.95,USD
 6
 true
 45.18814,-93.88541
 1453219045997281280

As we can see, Solr responded correctly. This is because Solr already has the clusterstate.json file cached. To search, Solr doesn’t need to update that file, so search should work, and it does, as we could see.

Indexing with failed ZooKeeper

Without turning on our ZooKeeper instance we try to run the following command:

curl 'localhost:8983/solr/collection1/update?commit=true' --data-binary @hd.xml -H 'Content-type:application/xml'

The above command should result in indexing the contents of the hd.xml file. After a fairly long wait, Solr responds with the following information:



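<response>
 <lst name="responseHeader">
  <int name="status">503</int>
  <int name="QTime">15096</int>
 </lst>
 <lst name="error">
  <str name="msg">Cannot talk to ZooKeeper - Updates are disabled.</str>
  <int name="code">503</int>
 </lst>
</response>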

So, as you can see, we are not able to index data without a working ZooKeeper ensemble.

Starting ZooKeeper again

So let’s see what happens when we start our ZooKeeper instance again without restarting the Solr nodes. After starting ZooKeeper, we run the same indexing command once again:

curl 'localhost:8983/solr/collection1/update?commit=true' --data-binary @hd.xml -H 'Content-type:application/xml'

And this time the response is different:

As we can see the indexing request was successful this time. This allows us to assume that the connection to ZooKeeper was re-established by Solr. We can see that in Solr and ZooKeeper logs.
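Since updates were rejected with a 503 error while ZooKeeper was down and accepted again once it came back, an indexing client could simply retry. The Python sketch below illustrates that idea under stated assumptions: send is a placeholder for whatever code posts the update and returns the HTTP status, and the attempt count and delay are arbitrary values, not recommendations from the test.

```python
import time


def index_with_retry(send, attempts=5, delay_seconds=1.0, sleep=time.sleep):
    """Call send() until it stops returning HTTP 503 or attempts run out.

    send is any zero-argument callable returning an HTTP status code; it is a
    placeholder for the client code that posts the update to Solr.
    """
    status = None
    for attempt in range(1, attempts + 1):
        status = send()
        if status != 503:
            return status, attempt
        if attempt < attempts:
            sleep(delay_seconds)
    return status, attempts


# Simulated run: two 503 answers while ZooKeeper is down, then success.
responses = iter([503, 503, 200])
result = index_with_retry(lambda: next(responses), sleep=lambda _: None)
print(result)  # (200, 3)
```

The sleep function is injectable so the simulated run finishes instantly; a real client would keep the default and probably use a longer delay.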

Short summary

As you can see, our short test allowed us to see what happens when the ZooKeeper ensemble fails and what we can expect from Solr in such rare cases. I hope this blog entry will clear up some doubts about SolrCloud and its usefulness.

Please also remember that during the test the cluster state did not change: all shards were accessible and working. In the next blog entry about SolrCloud we will see what happens when shards or replicas fail while ZooKeeper is down.

Random documents from result set (Giveaway results !)

Marek Rogoziński — Tue, 02 Apr 2013 11:56:24 +0000

And now two birds with one stone: a new article and the Apache Solr 4 Cookbook giveaway results. In this article we would like to show you how to implement random ordering of documents in the result set using Apache Solr. Our example is a real-life scenario: we used it to draw two giveaway participants. The two comment authors at the top of the result set will receive the ebook.

Documents

Our documents contain information about the participants of the competition: their id, name (as the author field) and email. For example, one record looks like this:


  1
  Solr.pl author
  blog(at)solr.pl

Our very big data contains 19 records; maybe we should have used map/reduce? :)

Schema

The schema.xml file describing the structure of the index is also very simple. In our case it contains the following fields:

Additional configuration

Now we need to make sure that the schema.xml file contains the following type and field definitions:

In the example schema.xml file provided with the standard Solr distribution, this type (based on solr.RandomSortField) and the random_* dynamic field are available by default. We will need them to randomize our result set.

Running a query with random sorting

Running a query with random sorting is a little bit tricky. We build the query as we usually do, except for the sorting. For the sort parameter we use the previously defined dynamic field with the random prefix. For example:

localhost:8983/solr/competition/select?q=*:*&sort=random_12939291%20desc

How does it work?

Solr calculates the ordering of the documents based on the name of the random field and the index version. This means that every time you use the same field name with the same index (unchanged between queries), you get results ordered in exactly the same way. This is a disadvantage of the method, but sometimes it can be quite handy, for example when paging (we don’t want a different result ordering for each page, right?). Because of this, the application that sends queries to Solr has to generate the field name.
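A minimal Python sketch of generating the field name on the application side might look as follows. It assumes the dynamic field prefix random_ used in the example query; the helper names, the page size and the way the seed is drawn are illustrative choices only.

```python
import random
from urllib.parse import urlencode


def random_sort_field(seed, prefix="random_"):
    """Name of the dynamic field used for sorting; the seed picks the ordering."""
    return f"{prefix}{seed}"


def page_params(seed, page, rows=10):
    """Query parameters for one page; every page reuses the same seed."""
    return {
        "q": "*:*",
        "sort": random_sort_field(seed) + " desc",
        "start": page * rows,
        "rows": rows,
    }


seed = random.randint(0, 2**31 - 1)  # generated once, e.g. per user session
first = page_params(seed, page=0)
second = page_params(seed, page=1)
print("select?" + urlencode(first))
```

Drawing the seed once and reusing it for every page keeps the ordering stable while the user pages through the results; a new seed gives a new ordering.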

And now – Giveaway results !

We used the query described above. The number used in the sort field is absolutely random; it was randomized by saying: “Dad, tell me some random numbers” :). So the whole query we used was:

localhost:8983/solr/collection1/select?q=*:*&indent=true&rows=2&sort=random_3721117253841%20desc

The above query gave the following results:


  
    9
    Rajeev Srivastava
    [CENSORED]
    1431017731370516481
  
    8
    Evgeny
    [CENSORED]
    1431017731370516480

And the winners are

Rajeev
Evgeny

Congratulations! We will contact you in the very near future with further information about how to receive your prizes. Once again, congratulations! Also, to all the other participants: thank you for participating and for your comments!

Win Free Copies of Packt’s new book on Apache Solr (updated)

Rafał Kuć — Fri, 15 Mar 2013 12:54:11 +0000

Readers will be pleased to know that we have teamed up with Packt Publishing to organize a giveaway of the Apache Solr 4 Cookbook. Two lucky winners will each win a copy of the book (in eBook format). Keep reading to find out how you can be one of the lucky winners.

Let’s start with a little reminder about the book:

Learn how to make Apache Solr search faster, more complete, and comprehensively scalable
Solve performance, setup, configuration, analysis, and query problems in no time
Get to grips with, and master, the new exciting features of Apache Solr 4

Read more about this book and download free Sample Chapter.

How to Enter ?

All you need to do is head on over to the book page (Apache Solr 4 Cookbook), look through the product description of the book, and drop a line in the comments below this post to let us know what interests you the most about this book. It’s that simple.

Product Description: http://www.packtpub.com/apache-solr-4-cookbook/book

Deadline

The contest will close on 28.03.2013. Winners will be contacted by email, so be sure to use your real email address when you comment!

Who Will Win ?

The winners will be chosen randomly by the Solr.pl team from the readers who entered the competition with an on-topic comment.

If you want to increase your chances of winning, write a small review of the book using the sample chapter on Amazon.com and also forward the same post to bhavins@packtpub.com.

Book Format

The free copies will be provided in eBook format.

Update

The contest is now officially closed. Thank you to all the participants. The winners will be announced in a dedicated blog post right after Easter, on Tuesday 2nd of April.

New Book: ElasticSearch Server!

Marek Rogoziński — Tue, 29 Jan 2013 11:23:12 +0000

In the blog post dedicated to Solr 4.0 Cookbook we gave a small hint that the cookbook was not the only project occupying our free time. Today we can officially say that a few months of hard work are slowly coming to an end: we can announce a new book about one of the greatest pieces of open-source software, the ElasticSearch Server book!

The ElasticSearch Server book describes the most important and commonly used features of ElasticSearch (at least from our perspective). Examples of the topics discussed:

ElasticSearch installation and configuration
Static and dynamic index structure creation
Querying ElasticSearch with Query DSL explained
Using filters
Faceting
Routing
Indexing data that is not flat

We also talk about:

Autocomplete and how to implement it using ElasticSearch
Percolator – what is it and how to use it
ElasticSearch monitoring and being a fireman
And much, much more

Even though we work with ElasticSearch every day, we only realized how big it is after digging into all the functionalities and use cases. Because of that we were not able to describe all the features, but we hope we managed to choose the ones that are the most interesting and most needed.

Do you think that we should start writing about a new enterprise search engine on solr.pl?

Solr 4.0 Cookbook

Rafał Kuć — Tue, 11 Dec 2012 11:19:09 +0000

Because it has been rather quiet on solr.pl lately, we would like to show you one of the reasons for this situation.

We are pleased to inform you that the updated version of the cookbook, “Solr 4.0 Cookbook”, will be available in March 2013. The book is focused on the latest available version of the Solr server, 4.0. As before, the content is divided into ten thematic chapters and the book again follows the cookbook convention, which means that each recipe is focused on solving one particular problem.

From our point of view, it is worth noting that most of the book was rewritten. More than half of the old recipes were removed and replaced with new ones, for example ones describing SolrCloud.

If you are interested, please refer to the Packt Publishing page: http://www.packtpub.com/apache-solr-4-cookbook/book. More details soon.

Rich documents processing – on the search or application side?

Rafał Kuć — Mon, 11 Jun 2012 21:48:05 +0000

When indexing so-called “rich documents”, we should sometimes think about where we want those documents to be processed: should we send them to Apache Solr (or another search engine, like ElasticSearch) and forget about them, or should we use Apache Tika first and send the extracted content, along with other information, for indexing?

Options

As I wrote a few lines above, we have two options. The first is to send the binaries to the search engine and, in Solr’s case, use ExtractingRequestHandler (information about integrating Solr with Apache Tika can be found here), so that it does all the work for us. The second is to use the same (or almost the same) functionality to parse the binary documents and get their contents before sending them to Solr. Of course there is a third option, not possible in most cases: get the documents you want to index in a format understandable by Solr.

Processing on the Search Server Side

The simplest approach is to process your “rich documents” on the search server side. Let’s assume it’s Apache Solr. We configure the ExtractingRequestHandler the way we want it to work and forget about everything else. But it’s not always the right approach. Imagine a situation where your indexing server is almost 100% utilized. If you added another source of load, you would probably suffer from performance problems. In such cases you will probably want to do it the other way.
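To make the server-side option a bit more concrete, here is a small Python sketch that builds the URL a binary file would be posted to. It assumes the handler is registered at /update/extract, as in the example solrconfig.xml, and that the document identifier is passed with the literal.id parameter; the core name and the helper extract_url are illustrative.

```python
from urllib.parse import urlencode


def extract_url(solr_url, core, doc_id, commit=True):
    """URL for posting a binary file to an extracting handler.

    Assumes the handler is registered at /update/extract; the real path is
    whatever your own solrconfig.xml defines.
    """
    params = {"literal.id": doc_id}
    if commit:
        params["commit"] = "true"
    return f"{solr_url.rstrip('/')}/{core}/update/extract?{urlencode(params)}"


url = extract_url("http://localhost:8983/solr", "collection1", "doc1")
print(url)
```

With this approach the client only uploads the file; parsing and field extraction happen inside Solr, which is exactly the extra load discussed above.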

Processing Outside of the Search Server

If the number of rich documents is huge or your indexing server is almost completely utilized, then it may be a good idea to process your binary files before sending them to the indexing server. Using Apache Tika, for example, we can build (quite easily) a good and reliable solution for processing rich documents in our application. Of course, such an approach requires a bit of knowledge of Java (or whatever other language you use for content extraction). It can save us from a situation where the indexing server is overloaded and, because of the amount of data, we can’t do anything about it.
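The shape of such an application-side pipeline can be sketched in a few lines of Python. This is only an outline under stated assumptions: extract_text stands in for the real extraction step (for example a call into Apache Tika), fake_extractor is a dummy used to keep the example self-contained, and the field names content and author are hypothetical.

```python
import json


def build_solr_document(doc_id, raw_bytes, extract_text, extra_fields=None):
    """Turn a binary file into a plain Solr document on the application side.

    extract_text is a placeholder for the real extraction step; here it is
    just a callable taking bytes and returning str.
    """
    doc = {"id": doc_id, "content": extract_text(raw_bytes)}
    doc.update(extra_fields or {})
    return doc


def fake_extractor(raw_bytes):
    # Stand-in for Tika: pretend the file is already UTF-8 text.
    return raw_bytes.decode("utf-8").strip()


doc = build_solr_document("report-1", b"  Quarterly report  ", fake_extractor,
                          {"author": "example"})
payload = json.dumps([doc])  # JSON body that could be posted to the update handler
print(payload)
```

Because the extraction runs in the application, it can be moved to separate machines or throttled without touching the indexing server.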

A Few Words at the End

Once every few weeks we will publish posts that don’t cover a specific Apache Solr functionality, but instead discuss a general search problem or describe the architecture of a system that has search as one of its parts. We hope that such posts will allow us and you to look at search topics a bit more widely than only from the Apache Solr point of view.