Solr 5.1: New faceting API (quick look)

With the recent release of Solr 5.1 we got some nice new functionality. One of the new features is the faceting module, which allows us to send faceting requests in JSON format in the request body. In this blog entry we will take a quick look at the functionality and see how Solr is changing when it comes to real-time data analysis.

Test data

For the purpose of our tests I’ll use the data provided with the default Solr distribution. To run Solr and index the data I use the following command:

bin/solr start -e techproducts

Starting Solr using this command will result in both starting Solr and indexing the data to a new collection called techproducts.

The first example

Let’s start with a very simple example of using the new faceting module: a query that makes Solr return a list of the categories and manufacturers of our products, with counts. The query we are used to would look as follows:

curl 'localhost:8983/solr/techproducts/select?q=*:*&rows=0&indent=true&facet=true&facet.field=manu_id_s&facet.field=cat'

The query that will use the new faceting API looks as follows:

curl http://localhost:8983/solr/techproducts/query -d 'q=*:*&rows=0&
 json.facet={
  categories : {
   terms : {
    field: cat
   }
  },
  manu : {
   terms : {
    field: manu_id_s
   }
  }
 }'

The result returned by Solr is as follows:

{
  "responseHeader":{
    "status":0,
    "QTime":2,
    "params":{
      "q":"*:*",
      "json.facet":"{\n  categories : {\n   terms : {\n    field: cat\n   } \n  },\n  manu : {\n   terms : {\n    field: manu_id_s\n   }\n  }\n}",
      "rows":"0"}},
  "response":{"numFound":32,"start":0,"docs":[]
  },
  "facets":{
    "count":32,
    "categories":{
      "buckets":[{
          "val":"electronics",
          "count":12},
        {
          "val":"currency",
          "count":4},
        {
          "val":"memory",
          "count":3},
        {
          "val":"connector",
          "count":2},
        {
          "val":"graphics card",
          "count":2},
        {
          "val":"hard drive",
          "count":2},
        {
          "val":"search",
          "count":2},
        {
          "val":"software",
          "count":2},
        {
          "val":"camera",
          "count":1},
        {
          "val":"copier",
          "count":1}]},
    "manu":{
      "buckets":[{
          "val":"corsair",
          "count":3},
        {
          "val":"belkin",
          "count":2},
        {
          "val":"canon",
          "count":2},
        {
          "val":"apple",
          "count":1},
        {
          "val":"asus",
          "count":1},
        {
          "val":"ati",
          "count":1},
        {
          "val":"boa",
          "count":1},
        {
          "val":"dell",
          "count":1},
        {
          "val":"eu",
          "count":1},
        {
          "val":"maxtor",
          "count":1}]}}}

As we can see, neither the query nor the results look like what we are used to, so let’s discuss both briefly.

The query

The second query was sent to the /query handler, but that is not that important. What we are interested in is the query itself. The parameters were not sent as HTTP request parameters; instead, we included them in the request body, although still in the form we are used to. In addition, we included something new: the json.facet parameter, which holds our faceting definition.
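
For readers who prefer a script over curl, here is a minimal Python sketch of the same idea: the usual parameters and json.facet are form-encoded and placed in the request body. It assumes the techproducts example is running on localhost:8983, and it serializes the facet definition as strict JSON (quoted keys) rather than the relaxed syntax used in the curl command. The request is only built here, not sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Two terms facets, written as a Python dict. The labels "categories"
# and "manu" are arbitrary names chosen for this sketch.
facet_definition = {
    "categories": {"terms": {"field": "cat"}},
    "manu": {"terms": {"field": "manu_id_s"}},
}

# Ordinary query parameters plus json.facet, all sent in the request body.
params = {
    "q": "*:*",
    "rows": 0,
    "json.facet": json.dumps(facet_definition),
}
body = urlencode(params).encode("utf-8")

# The request is only constructed, not sent. To execute it against a
# running Solr, pass it to urllib.request.urlopen(request).
request = Request("http://localhost:8983/solr/techproducts/query", data=body)

print(request.get_method())  # POST, because a body is attached
```

Building the definition as a dict and serializing it with json.dumps avoids hand-escaping quotes and newlines in the shell.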

In Solr 5.1, each faceting definition we want to use should be defined as follows:

NAME: {
 TYPE: {
  PARAMETER: PARAMETER_VALUE
  ...
  PARAMETER: PARAMETER_VALUE
 }
}

In Solr 5.2 this format will change slightly, but we will get back to that once Solr 5.2 is released. In our example we used the terms faceting type, which works similarly to the facet.field we are used to. Other possible types include query and range. Each type has its own set of properties; we advise visiting the official Solr documentation for details on the available parameters.
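
The NAME, TYPE and PARAMETER nesting can also be generated programmatically. The sketch below uses a hypothetical helper called facet (it is not part of Solr or any client library) that produces one definition in the Solr 5.1 shape. Only the terms type with its field parameter is shown, since that is what the example above uses; other types would go through the same helper with whatever parameters the Solr documentation lists for them.

```python
import json


def facet(name, facet_type, **parameters):
    """Hypothetical helper: build one facet in the Solr 5.1 shape,
    NAME -> TYPE -> {PARAMETER: PARAMETER_VALUE, ...}."""
    return {name: {facet_type: parameters}}


# Combine several definitions into a single json.facet value.
json_facet = {}
json_facet.update(facet("categories", "terms", field="cat"))
json_facet.update(facet("manu", "terms", field="manu_id_s"))

# Strict JSON that can be sent as the value of the json.facet parameter.
print(json.dumps(json_facet, indent=1))
```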

The results

If we look at the results returned by Solr, we will see something called buckets. Those are key-value pairs (val and count) that describe the returned faceting results. This looks new, but we have already seen the same information in standard faceting, just under a different name. However, when using the new faceting API we can expect two types of responses: one is a list of buckets and the other is a single value. We will get back to this and describe why it is important once Solr 5.2 is released.
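
To make the bucket structure concrete, here is a small Python sketch that turns the buckets of one facet into a plain value-to-count mapping. The JSON is a shortened copy of the facets section from the response shown earlier, and bucket_counts is a name made up for this illustration.

```python
import json

# Shortened copy of the "facets" section from the response shown earlier.
response_text = """
{"facets": {"count": 32,
  "categories": {"buckets": [
    {"val": "electronics", "count": 12},
    {"val": "currency", "count": 4},
    {"val": "memory", "count": 3}]}}}
"""


def bucket_counts(facets, name):
    """Map each bucket's val to its count for the facet called name."""
    return {bucket["val"]: bucket["count"] for bucket in facets[name]["buckets"]}


facets = json.loads(response_text)["facets"]
counts = bucket_counts(facets, "categories")
print(counts)  # {'electronics': 12, 'currency': 4, 'memory': 3}
```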

Faceting and functions

Let’s try one more example: finding the average price of our products and the 99th percentile. With the new faceting API this is quick and easy, because we can use functions in faceting. For example, the query that fulfills our requirement looks as follows:

curl http://localhost:8983/solr/techproducts/query -d 'q=*:*&rows=0&
 json.facet={
  average:"avg(price)",
  percentile:"percentile(price,99)"
 }'

The response generated by Solr looks as follows:

{
  "responseHeader":{
    "status":0,
    "QTime":2,
    "params":{
      "q":"*:*",
      "json.facet":"{\n  average:\"avg(price)\",\n  percentile:\"percentile(price,99)\"\n }",
      "rows":"0"}},
  "response":{"numFound":32,"start":0,"docs":[]
  },
  "facets":{
    "count":32,
    "average":164.10218846797943,
    "percentile":1966.6484985351556}}

Solr returned the results we wanted in the form of a single value for each facet.
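
On the client side, such single-value results can be read directly by name. The sketch below parses the facets section of the response above and keeps only the entries that are plain numbers, skipping the overall count. The function name single_values is an assumption made for this example, and the filter would also skip a facet that we ourselves had named count.

```python
import json

# The "facets" section of the response shown above.
response_text = """
{"facets": {"count": 32,
  "average": 164.10218846797943,
  "percentile": 1966.6484985351556}}
"""


def single_values(facets):
    """Keep facets whose result is a plain number, skipping the overall count."""
    return {
        name: value
        for name, value in facets.items()
        if name != "count" and isinstance(value, (int, float))
    }


stats = single_values(json.loads(response_text)["facets"])
print(round(stats["average"], 2))     # 164.1
print(round(stats["percentile"], 2))  # 1966.65
```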

What’s next?

Of course, the features we’ve looked at today are not all that Solr offers and will offer in the near future. First of all, the number of functions that we can use in faceting will be extended; for example, we will be able to count unique values. In addition, in Solr 5.2 we will get the ability to nest facets inside other facets and do calculations such as getting the minimum price for each of the categories calculated using terms faceting.

This entry was posted on Saturday, May 30th, 2015 at 11:16 and is filed under About Solr, Solr.