Hierarchical faceting – Pivot facets in trunk

In many of the implementations I took part in, sooner or later the question arose: how can we get faceting results as a tree structure? There were some tricks for that, of course, but they required modifying the data and processing the results appropriately on the application side. That was neither particularly functional nor especially comfortable. However, a few days ago the Solr 4.0 code base was enhanced with the code tracked as SOLR-792 in JIRA. Let’s see how to get faceting results as a tree.
Important note: at this point this functionality is only available in Solr 4.0, which is the development version. To use it you need to download the code from the trunk of the Lucene/Solr SVN repository.

A few words at the beginning

In many projects I had the opportunity to work on, there was a need for hierarchical faceting. One of the simplest examples is the requirement to show cities within provinces, along with the number of documents in each province and in each city. Until recently, it was impossible to achieve this without changing the structure of the data. Now it is possible ;)

Indexing

To keep things simple, I decided to use the sample XML documents available in the /exampledocs directory of the example deployment. I also didn’t modify the schema.xml or solrconfig.xml files, so the configuration is the standard one. That’s all when it comes to configuration, so we can start the indexing process (I ran the command from the $SOLR_HOME/exampledocs/ directory):

./post.sh *.xml

After several screens of information, our data is indexed.

The mechanism

Using hierarchical faceting is not difficult. The Solr creators gave us two parameters in addition to the ones we already know:

  • facet.pivot – a comma-separated list of fields that defines which fields to use, and in what order, when calculating the structure,
  • facet.pivot.mincount – the minimum number of documents a value must have to be included in the faceting results. The default value is 1.
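
As a minimal sketch of how these two parameters fit into a request, the snippet below builds a select URL in Python. The helper name pivot_query_url and the mincount value of 2 are my own illustration, not part of Solr; only the parameter names come from the list above.

```python
from urllib.parse import urlencode


def pivot_query_url(base_url, pivot_fields, mincount=None, query="*:*"):
    """Build a select URL that asks Solr for pivot faceting (hypothetical helper)."""
    params = [
        ("q", query),
        ("facet", "true"),
        # Order matters: the first field becomes the top level of the hierarchy.
        ("facet.pivot", ",".join(pivot_fields)),
    ]
    if mincount is not None:
        # Optional: skip values that match fewer documents than this.
        params.append(("facet.pivot.mincount", str(mincount)))
    # Keep '*', ':' and ',' unescaped so the URL stays readable.
    return base_url + "?" + urlencode(params, safe="*:,")


url = pivot_query_url("http://localhost:8983/solr/select/", ["cat", "inStock"], mincount=2)
print(url)
```

Leaving mincount out omits the parameter entirely, so Solr falls back to its default of 1.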

So let’s try it.

Queries

Let’s start with two fields. I query for all the documents in the index and add the parameter facet.pivot=cat,inStock to tell Solr that I want hierarchical faceting results where the first level of the hierarchy is the cat field and the second level is the inStock field. The query looks as follows:

http://localhost:8983/solr/select/?q=*:*&facet=true&facet.pivot=cat,inStock

To shorten the listing, I omitted the header and the part responsible for the search results.

<?xml version="1.0" encoding="UTF-8"?>
<response>
.
.
.
<result name="response" numFound="19" start="0"/>
<lst name="facet_counts">
  <lst name="facet_queries"/>
  <lst name="facet_fields"/>
  <lst name="facet_dates"/>
  <lst name="facet_ranges"/>
  <lst name="facet_pivot">
    <arr name="cat,inStock">
      <lst>
        <str name="field">cat</str>
        <str name="value">electronics</str>
        <int name="count">17</int>
        <arr name="pivot">
          <lst>
            <str name="field">inStock</str>
            <bool name="value">true</bool>
            <int name="count">13</int>
          </lst>
          <lst>
            <str name="field">inStock</str>
            <bool name="value">false</bool>
            <int name="count">4</int>
          </lst>
        </arr>
      </lst>
      <lst>
        <str name="field">cat</str>
        <str name="value">memory</str>
        <int name="count">6</int>
        <arr name="pivot">
          <lst>
            <str name="field">inStock</str>
            <bool name="value">true</bool>
            <int name="count">6</int>
          </lst>
        </arr>
      </lst>
      <lst>
        <str name="field">cat</str>
        <str name="value">connector</str>
        <int name="count">2</int>
        <arr name="pivot">
          <lst>
            <str name="field">inStock</str>
            <bool name="value">false</bool>
            <int name="count">2</int>
          </lst>
        </arr>
      </lst>
      <lst>
        <str name="field">cat</str>
        <str name="value">graphics card</str>
        <int name="count">2</int>
        <arr name="pivot">
          <lst>
            <str name="field">inStock</str>
            <bool name="value">false</bool>
            <int name="count">2</int>
          </lst>
        </arr>
      </lst>
      <lst>
        <str name="field">cat</str>
        <str name="value">hard drive</str>
        <int name="count">2</int>
        <arr name="pivot">
          <lst>
            <str name="field">inStock</str>
            <bool name="value">true</bool>
            <int name="count">2</int>
          </lst>
        </arr>
      </lst>
      <lst>
        <str name="field">cat</str>
        <str name="value">monitor</str>
        <int name="count">2</int>
        <arr name="pivot">
          <lst>
            <str name="field">inStock</str>
            <bool name="value">true</bool>
            <int name="count">2</int>
          </lst>
        </arr>
      </lst>
      <lst>
        <str name="field">cat</str>
        <str name="value">search</str>
        <int name="count">2</int>
        <arr name="pivot">
          <lst>
            <str name="field">inStock</str>
            <bool name="value">true</bool>
            <int name="count">2</int>
          </lst>
        </arr>
      </lst>
      <lst>
        <str name="field">cat</str>
        <str name="value">software</str>
        <int name="count">2</int>
        <arr name="pivot">
          <lst>
            <str name="field">inStock</str>
            <bool name="value">true</bool>
            <int name="count">2</int>
          </lst>
        </arr>
      </lst>
    </arr>
  </lst>
</lst>
</response>

The presentation of faceting results is different in this case. Each top-level entry has elements defining the field (the tag with the attribute name=”field”), the value (the tag with the attribute name=”value”) and the number of documents (the tag with the attribute name=”count”). Next comes the second level of the hierarchy (the tag with the attribute name=”pivot”). The second level contains the same elements as the first level: field, value and the number of documents with a given value.
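
To make that structure concrete, here is a rough Python sketch that turns a facet_pivot response into nested dictionaries. The sample is a trimmed copy of the response above, and the function names parse_pivot_entry and parse_facet_pivot are my own; it assumes the XML layout stays as shown in this post.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the response above: one top-level value with two children.
SAMPLE = """<response>
<lst name="facet_counts">
  <lst name="facet_pivot">
    <arr name="cat,inStock">
      <lst>
        <str name="field">cat</str>
        <str name="value">electronics</str>
        <int name="count">17</int>
        <arr name="pivot">
          <lst>
            <str name="field">inStock</str>
            <bool name="value">true</bool>
            <int name="count">13</int>
          </lst>
          <lst>
            <str name="field">inStock</str>
            <bool name="value">false</bool>
            <int name="count">4</int>
          </lst>
        </arr>
      </lst>
    </arr>
  </lst>
</lst>
</response>"""


def parse_pivot_entry(lst):
    """Turn one <lst> pivot entry into a dict, recursing into child pivots."""
    entry = {"field": None, "value": None, "count": 0, "pivot": []}
    for child in lst:
        name = child.get("name")
        if name in ("field", "value"):
            entry[name] = child.text
        elif name == "count":
            entry["count"] = int(child.text)
        elif name == "pivot":
            entry["pivot"] = [parse_pivot_entry(sub) for sub in child.findall("lst")]
    return entry


def parse_facet_pivot(xml_text):
    """Map each facet.pivot field list to its list of top-level entries."""
    root = ET.fromstring(xml_text)
    path = "./lst[@name='facet_counts']/lst[@name='facet_pivot']/arr"
    return {
        arr.get("name"): [parse_pivot_entry(lst) for lst in arr.findall("lst")]
        for arr in root.findall(path)
    }


tree = parse_facet_pivot(SAMPLE)
print(tree["cat,inStock"][0]["value"], tree["cat,inStock"][0]["count"])
```

Because parse_pivot_entry calls itself for every nested pivot array, the same code should handle deeper hierarchies without changes.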

Let’s see how this mechanism deals with more levels of depth. To check that, I ran the following query:

http://localhost:8983/solr/select/?q=*:*&facet=true&facet.pivot=cat,inStock,features

Again I omitted the response header and the search results, leaving only the faceting results. In addition, due to the length of the faceting results, I only show the first top-level entry:

<?xml version="1.0" encoding="UTF-8"?>
<response>
.
.
.
<result name="response" numFound="19" start="0"/>
<lst name="facet_counts">
  <lst name="facet_queries"/>
  <lst name="facet_fields"/>
  <lst name="facet_dates"/>
  <lst name="facet_ranges"/>
  <lst name="facet_pivot">
    <arr name="cat,inStock,features">
      <lst>
        <str name="field">cat</str>
        <str name="value">electronics</str>
        <int name="count">17</int>
        <arr name="pivot">
          <lst>
            <str name="field">inStock</str>
            <bool name="value">true</bool>
            <int name="count">13</int>
            <arr name="pivot">
              <lst>
                <str name="field">features</str>
                <str name="value">2</str>
                <int name="count">7</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">3</str>
                <int name="count">7</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">lcd</str>
                <int name="count">5</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">x</str>
                <int name="count">5</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">ca</str>
                <int name="count">4</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">latenc</str>
                <int name="count">4</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">tft</str>
                <int name="count">4</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">v</str>
                <int name="count">4</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">0</str>
                <int name="count">3</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">1</str>
                <int name="count">3</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">25</str>
                <int name="count">3</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">30</str>
                <int name="count">3</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">5</str>
                <int name="count">3</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">7</str>
                <int name="count">3</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">8</str>
                <int name="count">3</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">time</str>
                <int name="count">3</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">up</str>
                <int name="count">3</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">000</str>
                <int name="count">2</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">19</str>
                <int name="count">2</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">20</str>
                <int name="count">2</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">2336</str>
                <int name="count">2</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">27</str>
                <int name="count">2</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">275</str>
                <int name="count">2</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">6</str>
                <int name="count">2</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">75</str>
                <int name="count">2</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">activ</str>
                <int name="count">2</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">built</str>
                <int name="count">2</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">cach</str>
                <int name="count">2</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">color</str>
                <int name="count">2</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">flash</str>
                <int name="count">2</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">heat</str>
                <int name="count">2</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">heatspread</str>
                <int name="count">2</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">matrix</str>
                <int name="count">2</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">mb</str>
                <int name="count">2</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">ms</str>
                <int name="count">2</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">photo</str>
                <int name="count">2</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">resolut</str>
                <int name="count">2</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">seek</str>
                <int name="count">2</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">speed</str>
                <int name="count">2</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">spreader</str>
                <int name="count">2</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">unbuff</str>
                <int name="count">2</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">usb</str>
                <int name="count">2</int>
              </lst>
            </arr>
          </lst>
          <lst>
            <str name="field">inStock</str>
            <bool name="value">false</bool>
            <int name="count">4</int>
            <arr name="pivot">
              <lst>
                <str name="field">features</str>
                <str name="value">0</str>
                <int name="count">2</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">1</str>
                <int name="count">2</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">16</str>
                <int name="count">2</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">2</str>
                <int name="count">2</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">20</str>
                <int name="count">2</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">3</str>
                <int name="count">2</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">9</str>
                <int name="count">2</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">90</str>
                <int name="count">2</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">adapt</str>
                <int name="count">2</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">car</str>
                <int name="count">2</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">clock</str>
                <int name="count">2</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">direct</str>
                <int name="count">2</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">directx</str>
                <int name="count">2</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">dual</str>
                <int name="count">2</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">dvi</str>
                <int name="count">2</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">express</str>
                <int name="count">2</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">gddr</str>
                <int name="count">2</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">ghz</str>
                <int name="count">2</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">gl</str>
                <int name="count">2</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">gpu</str>
                <int name="count">2</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">gpuvpu</str>
                <int name="count">2</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">hdtv</str>
                <int name="count">2</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">mb</str>
                <int name="count">2</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">mhz</str>
                <int name="count">2</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">open</str>
                <int name="count">2</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">opengl</str>
                <int name="count">2</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">out</str>
                <int name="count">2</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">pci</str>
                <int name="count">2</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">power</str>
                <int name="count">2</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">vpu</str>
                <int name="count">2</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">white</str>
                <int name="count">2</int>
              </lst>
              <lst>
                <str name="field">features</str>
                <str name="value">x</str>
                <int name="count">2</int>
              </lst>
            </arr>
          </lst>
        </arr>
      </lst>
    </arr>
  </lst>
</lst>
</response>

As the example shows, Solr had no problems calculating the hierarchy correctly in this case either. In terms of data, this example is almost the same as the previous one; it just contains one more level of depth.
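
Since every level repeats the same field, value and count elements, a tree of any depth can be walked with one recursive function. The sketch below is only an illustration: the data is a hand-copied excerpt of the response above (most features values left out), and render_pivot is a name I made up.

```python
# Hand-copied excerpt of the three-level response above (most values omitted).
EXCERPT = [
    {
        "field": "cat", "value": "electronics", "count": 17,
        "pivot": [
            {"field": "inStock", "value": "true", "count": 13,
             "pivot": [
                 {"field": "features", "value": "2", "count": 7},
                 {"field": "features", "value": "lcd", "count": 5},
             ]},
            {"field": "inStock", "value": "false", "count": 4,
             "pivot": [
                 {"field": "features", "value": "0", "count": 2},
             ]},
        ],
    },
]


def render_pivot(entries, depth=0):
    """Return one indented text line per pivot entry, depth first."""
    lines = []
    for entry in entries:
        lines.append("  " * depth + f"{entry['field']}={entry['value']} ({entry['count']})")
        # Entries on the deepest level have no "pivot" key, so default to an empty list.
        lines.extend(render_pivot(entry.get("pivot", []), depth + 1))
    return lines


lines = render_pivot(EXCERPT)
print("\n".join(lines))
```

One thing worth noticing in the full listing: the inStock counts (13 and 4) add up to the parent count of 17, while the features counts do not add up to their parent. The features values look like analyzed tokens (latenc, resolut), so a single document can contribute to many of them.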

A few words at the end

In my opinion this is one of the more useful features for the “ordinary” user. Unfortunately, so far it is only available in the development version of Solr. I have not found any information on whether there are plans to port this functionality to version 1.5 of Solr, which lives in the branch_3x branch in SVN. However, the important thing is that this functionality was committed, and sooner or later Solr users will be able to use it.

This entry was posted on Monday, October 25th, 2010 at 07:25 and is filed under About Solr.

11 Responses to “Hierarchical faceting – Pivot facets in trunk”

  1. Nicolas Says:

    Is this code for SOLR-792 already available in the trunk of Solr 4.0?

  2. gr0 Says:

    Yes, the code is already committed to trunk. I wrote this post with the help of the trunk version of Solr. There have also been some changes to the functionality since the post was published.

  3. Vijay Says:

    I have Solr 1.4.1 and I am trying pivots as given above without changing any configuration, but I don’t see any results for pivot. Any clue?

  4. gr0 Says:

    Vijay, as I wrote at the beginning of the post, pivot facets are not available in Solr 1.4.1. You need to use the trunk version of Solr from the SVN repository.

  5. Rajani Maski Says:

    I run a normal facet query with q parameter q=*:* and did facet=on&facet.field=stock&facet.filed=place&facet.field=quantity&facet.mincout=1

    Results i got is-

    10
    10
    10
    10

    10
    10

    10
    10

    Now when I am doing this facet.pivot query with same q paramater (q= *:* )and same data set ..
    query – facet.pivot=stock,place,quality&facet.mincout=1

    Result I get is like this-

    The point is: why am I not getting the result hierarchy for “wheat” when it shows up in the flat faceting above?

  6. Rajani Maski Says:

    Sorry, the above post didn’t include my tagged Solr results. Don’t know why!
    10 had tags attached.
    Let me repost

  7. Rajani Maski Says:

    10
    10
    10
    10

    10
    10

    10
    10

  8. Rajani Maski Says:

    It is not letting me add solr search results with tags. I have posted same query in solruser list ..

    This is the link:

    http://www.mail-archive.com/solr-user@lucene.apache.org/msg47072.html

  9. gr0 Says:

    What Solr version are you using? Remember that this feature requires Solr 4.0, so you need to get that version from the SVN repository.

  10. Steve Says:

    In your opinion, do you think it would be safe to base any software on this at this point? I have a project I’d love to bring to solr that’s being redone in the next 3 months. This feature would be critical to making it possible (without a ton of unnecessary extra facet queries, which would be performance death). I’m really tempted but if it’s only in the dev branch right now… it’s so hard to get a sense of when that means it will be production-ready! I can deal with it right now but is it worth my time when I need it to be released before May…?

  11. gr0 Says:

    In my opinion the 4.0 version won’t be released before May. I personally haven’t used 4.0 in a production environment. But, from my personal experience, we didn’t have any problems going into production on so-called “dev” branches; actually almost all projects I’ve dealt with ran on development branches. It’s up to you, but if you are sure you can test the functionality before going live, and pivot facets are crucial for you, I would consider taking the 4.0 version. Just remember that when upgrading between versions of Solr 4.0 a full reindex may be needed.