Random documents from result set (Giveaway results!)

And now two birds with one stone: a new article and the Apache Solr 4 Cookbook giveaway results. In this article we would like to show you how to implement random ordering of documents in the result set using Apache Solr. Our example is a real-life scenario: we’ve used it to draw the two giveaway winners. The two comment authors who end up at the top of the result set will receive the ebook.

Documents

Our documents contain information about the participants of the competition: their id, name (stored in the author field) and email. For example, one record looks like this:

<doc>
  <field name="id">1</field>
  <field name="author">Solr.pl author</field>
  <field name="email">blog(at)solr.pl</field>
</doc>

Our very big data contains 19 records, so maybe we should have used map/reduce? :)

Schema

The schema.xml file describing the structure of the index is also very simple. In our case it contains the following fields:

<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="author" type="text_general" indexed="true" stored="true"/>
<field name="email" type="text_general" indexed="true" stored="true"/>

Additional configuration

Now we need to make sure that the schema.xml file contains the following field type and dynamic field definitions:

<fieldType name="random" class="solr.RandomSortField" indexed="true" />
<dynamicField name="random_*" type="random" />

In the example schema.xml file provided with the standard Solr distribution package, this type and dynamic field are available by default. We will need them to randomize our result set.

Running a query with random sorting

Running a query with random sorting is a little bit tricky. We build the query as we usually do, except for the sorting. For the sort parameter we use the previously defined dynamic field, that is, any field name starting with the random_ prefix. For example:

localhost:8983/solr/competition/select?q=*:*&sort=random_12939291%20desc

How it works

Solr calculates the ordering of the documents based on the name of the random field and the index version. This means that every time you use the same field name on the same index (one that was not changed between queries), you will get results ordered in exactly the same way. This is a disadvantage of this method, but it can also be quite handy, for example when paging (we don’t want a different ordering for each page, right?). Because of this, the application that sends queries to Solr has to generate the field name itself, using a new name whenever a new ordering is needed.
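A minimal sketch of how an application could generate such a field name and reuse it while paging. The helper names random_sort_field and build_select_url are hypothetical, the base URL is taken from the example query in this article, and only the Python standard library is used. Treat it as an illustration under those assumptions, not as the only way to do it:

```python
import secrets
from urllib.parse import urlencode


def random_sort_field(seed=None):
    """Return a dynamic field name such as 'random_12939291'.

    With no seed, a new number is drawn, which should give a new ordering.
    Passing the same seed again yields the same field name, so the ordering
    stays stable as long as the index has not changed.
    """
    if seed is None:
        seed = secrets.randbelow(2**31)  # the suffix only has to differ between draws
    return f"random_{seed}"


def build_select_url(base_url, sort_field, start=0, rows=10):
    """Build a select URL that sorts all documents by the given random field."""
    params = {
        "q": "*:*",
        "sort": f"{sort_field} desc",
        "start": start,
        "rows": rows,
    }
    return f"{base_url}/select?{urlencode(params)}"


# Assumed base URL, copied from the example query in this article.
BASE_URL = "http://localhost:8983/solr/competition"

# Draw the field name once, then reuse it for every page of the same listing.
field = random_sort_field()
page_1 = build_select_url(BASE_URL, field, start=0, rows=10)
page_2 = build_select_url(BASE_URL, field, start=10, rows=10)
```

Both page URLs share the same sort field, so the second page continues the ordering of the first one. Calling random_sort_field() again gives a different field name and therefore a reshuffled result set.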

And now, the giveaway results!

We used the query described above. The number in the sort field is absolutely random: it was picked by saying “Dad, tell me some random numbers” :). The whole query we used was:

localhost:8983/solr/collection1/select?q=*:*&indent=true&rows=2&sort=random_3721117253841%20desc

This query gave the following results:

<result name="response" numFound="19" start="0">
  <doc>
    <str name="id">9</str>
    <str name="author">Rajeev Srivastava</str>
    <str name="email">[CENSORED]</str>
    <long name="_version_">1431017731370516481</long></doc>
  <doc>
    <str name="id">8</str>
    <str name="author">Evgeny</str>
    <str name="email">[CENSORED]</str>
    <long name="_version_">1431017731370516480</long></doc>
</result>
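If you would rather read the winners out of such a response in code than by eye, something like the following could work. It is only a sketch: the function name extract_authors is hypothetical, the sample text is the result fragment shown above, and the parsing relies on Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

# The result fragment from this article, with emails left censored.
SAMPLE_RESPONSE = """<result name="response" numFound="19" start="0">
  <doc>
    <str name="id">9</str>
    <str name="author">Rajeev Srivastava</str>
    <str name="email">[CENSORED]</str>
    <long name="_version_">1431017731370516481</long></doc>
  <doc>
    <str name="id">8</str>
    <str name="author">Evgeny</str>
    <str name="email">[CENSORED]</str>
    <long name="_version_">1431017731370516480</long></doc>
</result>"""


def extract_authors(xml_text):
    """Return the author of every <doc> element, in response order."""
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree, so this works for the bare <result>
    # fragment as well as for a document that wraps it in another element.
    return [doc.findtext("str[@name='author']") for doc in root.iter("doc")]


print(extract_authors(SAMPLE_RESPONSE))
```

Because the query was sent with rows=2, the list contains exactly the two documents that ended up at the top of the random ordering.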

And the winners are

  • Rajeev
  • Evgeny

Congratulations! We will contact you in the very near future with further information about how to receive your prizes. To all the other participants: thank you for taking part and for your comments!
