Quick look – FieldCollapsing

FieldCollapsing, in other words the grouping of search results, has just been committed to the svn repository. I decided to take a look at this functionality and see how it works.

One note up front: FieldCollapsing is only available in Solr 4.0, which is the development version of the Solr project, and it is rather unlikely to be transferred to version 3.X.

FieldCollapsing – what is it?

Imagine that our index contains information about companies from different cities. We want to show our users one (or, say, two or three) companies per city, limited of course to the companies that match the search criteria. How do we do that? Just use the FieldCollapsing mechanism. It groups the returned results based on the contents of a field. Each group can be reduced to a single document or to a fixed number of documents.
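
To make the idea concrete, here is a minimal sketch of the grouping logic in plain Python. It is not how Solr implements FieldCollapsing; the function name collapse, the company names and the cities are all made up for illustration, and the input list is assumed to be already sorted by relevance.

```python
from collections import OrderedDict


def collapse(docs, field, limit=1):
    """Group ranked docs by `field`, keeping at most `limit` docs per group.

    Hypothetical helper for illustration only; Solr does this server-side.
    """
    groups = OrderedDict()
    for doc in docs:
        key = doc.get(field)
        bucket = groups.setdefault(key, {"numFound": 0, "docs": []})
        bucket["numFound"] += 1           # count every match in the group
        if len(bucket["docs"]) < limit:   # but keep only the top `limit`
            bucket["docs"].append(doc)
    return groups


# Made-up sample data, assumed to be in ranked order.
companies = [
    {"name": "Acme", "city": "Warsaw"},
    {"name": "Globex", "city": "Krakow"},
    {"name": "Initech", "city": "Warsaw"},
    {"name": "Umbrella", "city": "Krakow"},
    {"name": "Hooli", "city": "Warsaw"},
]

result = collapse(companies, "city", limit=2)
for city, group in result.items():
    print(city, group["numFound"], [d["name"] for d in group["docs"]])
```

Each group remembers how many documents matched in total but shows only the first few of them, which is the same shape as the grouped response described later in this post.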

Parameters

As with most features available in Solr, the behavior of the FieldCollapsing mechanism can be configured through a number of parameters. Here they are:

  • group – setting this parameter to true enables the FieldCollapsing mechanism. The default value is false.
  • group.field – the name of the field whose contents the results are grouped by.
  • group.func – a function definition; results are grouped by the value the function returns.
  • group.limit – the number of documents returned in each group. The default is 1.
  • group.sort – how the documents within each group are sorted. The default value is score desc.

It is worth noting that the rows parameter passed with the query determines the number of groups returned in the search results, not the number of individual documents. The behaviour of the sort parameter also changes: it tells Solr how to sort the groups, not the individual documents. Groups will be sorted based on the field contents of the first document in each group.
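
As a rough sketch of how these parameters fit together, the snippet below assembles a grouping query URL using only the Python standard library. The helper name build_group_query is hypothetical, the host and path are the ones used in the example query in this post, and the specific values (two documents per group, five groups) are picked only for illustration.

```python
from urllib.parse import urlencode


def build_group_query(base_url, q, field, limit=1, rows=10,
                      group_sort="score desc"):
    """Build a Solr select URL with grouping enabled (hypothetical helper)."""
    params = [
        ("q", q),
        ("group", "true"),           # turn FieldCollapsing on
        ("group.field", field),      # field to group by
        ("group.limit", limit),      # documents returned per group
        ("group.sort", group_sort),  # ordering inside each group
        ("rows", rows),              # number of groups, not documents
        ("indent", "true"),
    ]
    return base_url + "?" + urlencode(params)


url = build_group_query("http://localhost:8983/solr/select/", "*:*",
                        "inStock", limit=2, rows=5)
print(url)
```

With rows=5 and group.limit=2, such a request would return at most five groups with at most two documents each, assuming the parameters behave as described above.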

Search Results

Search results differ from those we are accustomed to. They are grouped according to the parameters we passed. The main element of the search results is no longer a document; with FieldCollapsing it is a group of documents. The documents are shown within the groups (their number is defined by the group.limit parameter). For example, sending the following query:

http://localhost:8983/solr/select/?q=*:*&group=true&group.field=inStock&indent=true

to a Solr instance whose index was created by indexing all the XML documents from the exampledocs directory results in the following response:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">0</int>
  <lst name="params">
    <str name="group.field">inStock</str>
    <str name="group">true</str>
    <str name="indent">true</str>
    <str name="q">*:*</str>
  </lst>
</lst>
<lst name="grouped">
  <lst name="inStock">
    <int name="matches">19</int>
    <arr name="groups">
      <lst>
        <str name="groupValue">T</str>
        <result name="doclist" numFound="15" start="0">
          <doc>
            <arr name="cat"><str>electronics</str><str>hard drive</str></arr>
            <arr name="features"><str>7200RPM, 8MB cache, IDE Ultra ATA-133</str><str>NoiseGuard, SilentSeek technology, Fluid Dynamic Bearing (FDB) motor</str></arr>
            <str name="id">SP2514N</str>
            <bool name="inStock">true</bool>
            <str name="manu">Samsung Electronics Co. Ltd.</str>
            <date name="manufacturedate_dt">2006-02-13T15:26:37Z</date>
            <str name="name">Samsung SpinPoint P120 SP2514N - hard drive - 250 GB - ATA-133</str>
            <int name="popularity">6</int>
            <float name="price">92.0</float>
            <str name="store">45.17614,-93.87341</str>
            <double name="store_0_d">45.17614</double>
            <double name="store_1_d">-93.87341</double>
            <str name="store_lat_lon">45.17614,-93.87341</str>
          </doc>
        </result>
      </lst>
      <lst>
        <str name="groupValue">F</str>
        <result name="doclist" numFound="4" start="0">
          <doc>
            <arr name="cat"><str>electronics</str><str>connector</str></arr>
            <arr name="features"><str>car power adapter, white</str></arr>
            <str name="id">F8V7067-APL-KIT</str>
            <bool name="inStock">false</bool>
            <str name="manu">Belkin</str>
            <date name="manufacturedate_dt">2005-08-01T16:30:25Z</date>
            <str name="name">Belkin Mobile Power Cord for iPod w/ Dock</str>
            <int name="popularity">1</int>
            <float name="price">19.95</float>
            <str name="store">45.17614,-93.87341</str>
            <double name="store_0_d">45.17614</double>
            <double name="store_1_d">-93.87341</double>
            <str name="store_lat_lon">45.17614,-93.87341</str>
            <float name="weight">4.0</float>
          </doc>
        </result>
      </lst>
    </arr>
  </lst>
</lst>
</response>
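
A client has to walk this nested structure instead of a flat list of documents. The following is a minimal sketch using Python's standard xml.etree.ElementTree; the function name summarize_groups is hypothetical, and SAMPLE is a trimmed copy of the response above that keeps only the id of each document.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the response shown above (only the id field is kept).
SAMPLE = """<response>
<lst name="grouped">
  <lst name="inStock">
    <int name="matches">19</int>
    <arr name="groups">
      <lst>
        <str name="groupValue">T</str>
        <result name="doclist" numFound="15" start="0">
          <doc><str name="id">SP2514N</str></doc>
        </result>
      </lst>
      <lst>
        <str name="groupValue">F</str>
        <result name="doclist" numFound="4" start="0">
          <doc><str name="id">F8V7067-APL-KIT</str></doc>
        </result>
      </lst>
    </arr>
  </lst>
</lst>
</response>"""


def summarize_groups(xml_text, field):
    """Return (matches, [(groupValue, numFound, [doc ids])]) for one field."""
    root = ET.fromstring(xml_text)
    field_node = root.find("lst[@name='grouped']/lst[@name='%s']" % field)
    matches = int(field_node.find("int[@name='matches']").text)
    groups = []
    for group in field_node.find("arr[@name='groups']").findall("lst"):
        value = group.find("str[@name='groupValue']").text
        doclist = group.find("result[@name='doclist']")
        ids = [d.find("str[@name='id']").text for d in doclist.findall("doc")]
        groups.append((value, int(doclist.get("numFound")), ids))
    return matches, groups


matches, groups = summarize_groups(SAMPLE, "inStock")
print(matches)
for value, num_found, ids in groups:
    print(value, num_found, ids)
```

Note that numFound on each doclist is the size of the whole group, while the number of doc elements actually present is capped by group.limit.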

At the end

This is an interesting feature that will certainly find use in some systems. Note, however, that this functionality will be developed further. So far there is no support for distributed search or for grouping on multivalued fields. For now there is no point in performance testing, first because of the changes still to come to the mechanism, and second because Lucene and Solr 4.0 are both still in development. I will definitely be watching how this functionality evolves 😉
