Faceting is one of the ways to categorize the content found during information retrieval. In Solr, it means dividing a set of documents according to certain criteria: the content of individual fields, queries, or ranges of values or dates. In today’s entry I will try to give an overview of what the faceting mechanism can do, both in the currently available Solr 1.4.1 and in upcoming releases.
One of the few sources of information about faceting is the Solr wiki, specifically the page at http://wiki.apache.org/solr/SimpleFacetParameters. This article extends the information available there.
The Solr faceting mechanism can be divided into four basic types:
- field faceting,
- query faceting,
- date faceting,
- range faceting.
To turn the Solr faceting mechanism on, pass the facet parameter with the value true.
Field faceting
The first type of faceting categorizes the documents found by the values of a specified field. With it we can get, for example, the number of documents found in each category or geographical location. Field faceting has a large number of options that configure its behavior. These are the available parameters:
- facet.field – parameter specifying which field will be used to perform faceting. This parameter can be specified multiple times. Remember that adding multiple facet.field parameters to the query can affect performance.
- facet.prefix – restricts faceting results to the terms that begin with the specified prefix. The parameter can be defined for each field specified by the facet.field parameter by using the per-field form f.field_name.facet.prefix. This parameter is a relatively simple way to implement an autocomplete mechanism.
- facet.sort – specifies how to sort faceting results. In Solr versions lower than 1.4, this parameter takes the values true or false, meaning respectively sort by the number of results and sort by index order (for ASCII this means alphabetical sorting). In Solr 1.4 or higher you should use the value count (same as true) or index (same as false). The default is true/count when facet.limit is greater than 0, and false/index otherwise. The parameter can be defined for each field specified by the facet.field parameter.
- facet.limit – parameter specifying how many unique values of faceting results to display. A negative value means no limit. Please note that the larger the limit, the more memory you need and the longer the query takes to execute. The default value is 100. The parameter can be defined for each field specified by the facet.field parameter.
- facet.offset – parameter defining the offset (counted from the first faceting result) at which the presented faceting results start. The default value is 0. This parameter is designed to help implement paging of faceting results. The parameter can be defined for each field specified by the facet.field parameter.
- facet.mincount – parameter specifying the minimum count a value must have to be included in faceting results. The default value is 0. The parameter can be defined for each field specified by the facet.field parameter.
- facet.missing – parameter specifying whether, in addition to standard faceting results, number of documents without a value in the specified field should be included. This parameter can take values of true or false. The default parameter value is false. The parameter can be defined for each field specified by the facet.field parameter.
- facet.method – parameter introduced in Solr 1.4. It takes the value enum or fc and specifies the method used to calculate faceting. Setting it to enum results in using term enumeration: Solr walks over all the terms of the field and counts the matching documents for each. This method is most efficient for fields with a small number of unique terms. The second method, fc, is the standard one: it iterates over all documents in the result set and counts the terms they contain. The default value is fc for all fields not based on the Boolean type. The parameter can be defined for each field specified by the facet.field parameter.
- facet.enum.cache.minDf – despite its strange-sounding name, this parameter simply sets the minimum document frequency (the number of documents matching a single term) at which the filterCache is used for that term when faceting with the enum method. The default value is 0, which means the filterCache is used for all terms; a higher value keeps rarely occurring terms out of the cache, saving memory at the cost of some speed.
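To make the interplay of facet.prefix, facet.mincount, facet.sort, facet.offset and facet.limit more concrete, here is a small Python sketch that applies the same steps to a plain dictionary of term counts. It is only a local illustration, not Solr code: the function select_facets and the sample counts are made up for this entry, and breaking count ties by term order is an assumption of the sketch.

```python
def select_facets(counts, prefix="", sort="count", mincount=0, offset=0, limit=100):
    """Apply field-faceting style parameters to a {term: count} dict."""
    # facet.prefix: keep only the terms that start with the prefix
    items = [(term, n) for term, n in counts.items() if term.startswith(prefix)]
    # facet.mincount: drop terms whose count is below the minimum
    items = [(term, n) for term, n in items if n >= mincount]
    if sort == "count":
        # highest count first; ties broken by term order (assumption of this sketch)
        items.sort(key=lambda pair: (-pair[1], pair[0]))
    else:
        # "index" order: plain term order
        items.sort(key=lambda pair: pair[0])
    # facet.offset: skip the first results (useful for paging)
    items = items[offset:]
    # facet.limit: a negative value means no limit
    if limit >= 0:
        items = items[:limit]
    return items


counts = {"books": 12, "music": 7, "movies": 7, "maps": 0, "games": 3}

print(select_facets(counts, sort="count", mincount=1, limit=2))
# [('books', 12), ('movies', 7)]
print(select_facets(counts, prefix="m", sort="index"))
# [('maps', 0), ('movies', 7), ('music', 7)]
```

Paging through the results then comes down to calling the function with a growing offset and a fixed limit.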
These are the parameters of field faceting. For most of them I wrote that their values can be defined for each field specified by the facet.field parameter. What does that look like? Suppose we have a query like this:
q=solr&facet=true&facet.field=category&facet.field=location
It is a simple query for the term ‘solr’ with the faceting mechanism turned on. There are two facet fields defined, category and location. Let’s say that we would like to have 200 facet results for the category field sorted by count and 50 facet results for the location field sorted alphabetically. Per-field values use the form f.field_name.facet.parameter_name, so we add the following fragment to the query shown above:
f.category.facet.limit=200&f.category.facet.sort=count&f.location.facet.limit=50&f.location.facet.sort=index
As shown, we can easily modify the behavior of the facet mechanism for individual facet fields.
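If you build such queries in code, the per-field overrides are easy to generate. The sketch below uses only the Python standard library; the helper name field_facet_params is my own invention, and the override names follow Solr's f.field_name.facet.parameter_name form.

```python
from urllib.parse import urlencode


def field_facet_params(query, fields):
    """Build a query string with facet.field entries and per-field overrides.

    `fields` maps a field name to a dict of overrides, e.g. {"limit": 200}.
    """
    params = [("q", query), ("facet", "true")]
    for field, overrides in fields.items():
        params.append(("facet.field", field))
        for name, value in overrides.items():
            # per-field form: f.<field>.facet.<parameter>
            params.append((f"f.{field}.facet.{name}", str(value)))
    return urlencode(params)


qs = field_facet_params(
    "solr",
    {
        "category": {"limit": 200, "sort": "count"},
        "location": {"limit": 50, "sort": "index"},
    },
)
print(qs)
```

The resulting string can be appended to the select handler URL of your Solr instance.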
Query faceting
This type of faceting is based on a single parameter, facet.query, to which we pass a query. The query must be constructed so that the standard Lucene query parser can understand it. An example use is a query for a price group, which could look like:
facet.query=price:[0+TO+100]
Note, however, that each added facet.query parameter is another query to Lucene, which means performance loss. Many facet.query parameters in a query can be painful to Solr.
There is one more thing worth mentioning when talking about query faceting: you can define your own parser to parse the facet.query parameter value. To use your own parser called, for example, myParser, the parameter passed to Solr should look like this:
facet.query={!myParser}aaa
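The price example from this section usually ends up as several facet.query parameters, one per price group. Here is a minimal Python sketch of generating them; the helper price_bucket_queries and the chosen price edges are assumptions for illustration only.

```python
from urllib.parse import urlencode


def price_bucket_queries(field, edges):
    """Turn consecutive edges into Lucene range queries like price:[0 TO 100]."""
    # Square brackets are inclusive on both ends, so a document priced exactly
    # at a shared edge (100 here) is counted in two neighbouring groups.
    return [f"{field}:[{low} TO {high}]" for low, high in zip(edges, edges[1:])]


params = [("q", "solr"), ("facet", "true")]
params += [("facet.query", fq) for fq in price_bucket_queries("price", [0, 100, 200])]

# urlencode escapes the colon and brackets and turns spaces into '+'
print(urlencode(params))
```

Keep in mind the performance note above: every generated facet.query is one more query for Lucene to execute.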
Date faceting
New faceting functionality introduced in Solr 1.3. Date faceting allows you to calculate faceting results including all the intricacies of processing dates. Please note that date faceting can only be used with fields based on the type solr.DateField. Now let’s get on with the parameters associated with date faceting:
- facet.date – like facet.field parameter, this parameter is used to identify fields where dates faceting should be used. As in the case of facet.field parameter you can specify this parameter several times to allow date faceting on many fields in one query.
- facet.date.start – parameter specifying the lower limit of date on which the faceting calculation should be started. This parameter can be defined for each field specified by the facet.date parameter. This parameter is mandatory when using facet.date and should be defined for each facet.date parameter.
- facet.date.end – parameter defining the upper limit of the date, on which the faceting calculation should be ended. This parameter can be defined for each field specified by the facet.date parameter. This parameter is mandatory when using facet.date and should be defined for each facet.date parameter.
- facet.date.gap – parameter specifying date compartments to be generated for the defined boundaries. This parameter is mandatory when using facet.date and should be defined for each facet.date parameter. The parameter can be defined for each field specified by the facet.date parameter.
- facet.date.hardend – parameter taking the values true and false, telling Solr what to do when facet.date.gap does not divide the span between facet.date.start and facet.date.end evenly. If we set this parameter to true, the last compartment ends exactly at facet.date.end, so it can be smaller than the rest of the compartments. If we set it to false (the default value), the last compartment keeps the full gap width, so it can extend past the boundary defined by facet.date.end. The parameter can be defined for each field specified by the facet.date parameter.
- facet.date.other – parameter specifying what values besides the standard ones (ranges) should be added to results of date faceting. The parameter can be defined for each field specified by the facet.date parameter. The parameter can take following values:
  - before – in addition to the standard date faceting results, there will be one more – number of documents with a date before the one defined in the facet.date.start parameter,
  - after – in addition to the standard date faceting results, there will be one more – number of documents with the date after the one defined in the facet.date.end parameter,
  - between – in addition to the standard date faceting results, there will be one more – number of documents with the date between facet.date.start and facet.date.end parameters,
  - all – a shortcut to define all the above,
  - none – none of the additional results will be added to date faceting results.
- facet.date.include – parameter that will be introduced in Solr 4.0. It controls whether the boundaries of the compartments defined by the start, end and gap are included in (closed) or excluded from (open) each compartment. The parameter will accept the following values:
  - lower – each of the resulting compartments will contain its lower limit,
  - upper – each of the resulting compartments will contain its upper limit,
  - edge – the first and the last compartment will include their outer boundaries, that is, the lower limit for the first compartment and the upper limit for the last one,
  - outer – the compartments defined by the before and after values of the facet.date.other parameter will contain their boundaries, even if other compartments already contain these boundaries,
  - all – a shortcut that turns on all of the above options.
That is how we can modify the behavior of date faceting. Now, an example of using this kind of faceting:
q=solr&facet=true&facet.date=addDate&facet.date.start=NOW/DAY-30DAYS&facet.date.end=NOW/DAY%2B30DAYS&facet.date.gap=%2B1DAY
What does the above query do? We turn the faceting mechanism on and define date faceting for the addDate field. We want compartments covering the period from 30 days before today (NOW/DAY-30DAYS) to 30 days after today (NOW/DAY+30DAYS), each one day wide. Note that the plus sign has to be URL-encoded as %2B in the query.
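The way start, end, gap and hardend interact is easier to see on a small simulation. The Python sketch below computes the compartment boundaries locally; it is not what Solr runs internally, the function date_buckets is made up for this entry, and the hardend behavior reflects my reading of the Solr documentation (true cuts the last compartment at the end date, false lets it run a full gap past it).

```python
from datetime import datetime, timedelta


def date_buckets(start, end, gap, hardend=False):
    """Return (lower, upper) boundaries of the compartments between start and end."""
    buckets = []
    low = start
    while low < end:
        high = low + gap
        if hardend and high > end:
            # hardend=true: the last compartment stops exactly at `end`
            high = end
        buckets.append((low, high))
        low = high
    return buckets


start = datetime(2010, 1, 1)
end = datetime(2010, 1, 8)
gap = timedelta(days=3)  # 7 days do not divide evenly into 3-day compartments

soft = date_buckets(start, end, gap)                # hardend=false (default)
hard = date_buckets(start, end, gap, hardend=True)  # hardend=true

print(soft[-1])  # last compartment: Jan 7 to Jan 10, past the end date
print(hard[-1])  # last compartment: Jan 7 to Jan 8, shorter than the gap
```

In both cases we get three compartments; only the upper boundary of the last one differs.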
Range faceting
This functionality will be available in Solr 3.1. If you want to test it right now, both the trunk and the 3.x branch already have it implemented. Range faceting is an extension of date faceting and works similarly: as a result we get a list of compartments constructed automatically from the parameters. Here is the list of parameters that define range faceting behavior:
- facet.range – like facet.field parameter, this parameter is used to identify fields where range faceting should be used. As in the case of facet.field parameter you can specify this parameter several times to allow range faceting on many fields in one query.
- facet.range.start – parameter specifying the lower limit of range on which the faceting calculation should be started. This parameter can be defined for each field specified by the facet.range parameter. This parameter is mandatory when using facet.range and should be defined for each facet.range parameter.
- facet.range.end – parameter defining the upper limit of the range, on which the faceting calculation should be ended. This parameter can be defined for each field specified by the facet.range parameter. This parameter is mandatory when using facet.range and should be defined for each facet.range parameter.
- facet.range.gap – parameter specifying range compartments to be generated for the defined boundaries. This parameter is mandatory when using facet.range and should be defined for each facet.range parameter. The parameter can be defined for each field specified by the facet.range parameter.
- facet.range.hardend – parameter taking the values true and false, telling Solr what to do when facet.range.gap does not divide the span between facet.range.start and facet.range.end evenly. If we set this parameter to true, the last compartment ends exactly at facet.range.end, so it can be smaller than the rest of the compartments. If we set it to false (the default value), the last compartment keeps the full gap width, so it can extend past the boundary defined by facet.range.end. The parameter can be defined for each field specified by the facet.range parameter.
- facet.range.other – parameter specifying what values besides the standard ones (ranges) should be added to results of range faceting. The parameter can be defined for each field specified by the facet.range parameter. The parameter can take following values:
  - before – in addition to the standard range faceting results, there will be one more – number of documents with values lower than the one defined in the facet.range.start parameter,
  - after – in addition to the standard range faceting results, there will be one more – number of documents with values higher than the one defined in the facet.range.end parameter,
  - between – in addition to the standard range faceting results, there will be one more – number of documents with values between facet.range.start and facet.range.end parameters,
  - all – a shortcut to define all the above,
  - none – none of the additional results will be added to range faceting results.
- facet.range.include – parameter controlling whether the boundaries of the compartments defined by the start, end and gap are included in (closed) or excluded from (open) each compartment. The parameter will accept the following values:
  - lower – each of the resulting compartments will contain its lower limit,
  - upper – each of the resulting compartments will contain its upper limit,
  - edge – the first and the last compartment will include their outer boundaries, that is, the lower limit for the first compartment and the upper limit for the last one,
  - outer – the compartments defined by the before and after values of the facet.range.other parameter will contain their boundaries, even if other compartments already contain these boundaries,
  - all – a shortcut that turns on all of the above options.
As you can see, the range faceting parameters are almost identical to those of date faceting, and so is the behavior. An example query using range faceting may look like this:
q=solr&facet=true&facet.range=price&facet.range.start=0&facet.range.end=1000&facet.range.gap=100
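To show what kind of result such a query produces, here is a local Python simulation of range faceting over a handful of prices. It is only a sketch under stated assumptions: the function range_facet and the sample prices are invented, each compartment is treated as including its lower limit and excluding its upper one, and the gap is chosen so that it divides the range evenly (so hardend does not matter).

```python
def range_facet(values, start, end, gap, other=()):
    """Count values per compartment, plus optional before/between/after counts."""
    counts = {}
    low = start
    while low < end:
        high = low + gap
        # assumption: lower limit included, upper limit excluded
        counts[low] = sum(1 for v in values if low <= v < high)
        low = high

    if "all" in other:
        other = ("before", "between", "after")
    extras = {}
    if "before" in other:
        extras["before"] = sum(1 for v in values if v < start)
    if "between" in other:
        extras["between"] = sum(1 for v in values if start <= v < end)
    if "after" in other:
        extras["after"] = sum(1 for v in values if v >= end)
    return counts, extras


prices = [5, 99, 100, 250, 999, 1000, 1500, -3]
counts, extras = range_facet(prices, start=0, end=1000, gap=100, other=("all",))

print(counts[0], counts[100], counts[900])  # 2 1 1
print(extras)  # {'before': 1, 'between': 5, 'after': 2}
```

Each key of counts is the lower limit of a compartment, so for the query above we get ten compartments: 0, 100, 200 and so on up to 900.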
So, we went through all of the types of faceting. But that’s not all. Users of Solr version 1.4 and higher have the opportunity to use the so-called LocalParams.
LocalParams and faceting
Suppose we have the following requirement. We have a query that returns search results for the term ‘solr’ with two filters defined, one for the category and one for the region the document comes from. In addition to the search results we want to enable navigation through regions and categories, but we do not want them to depend on each other. That is, we want to let the user navigate through the regions for the term ‘solr’ without being limited to the selected category, and vice versa. To do it in Solr version 1.3 or earlier, we would send the following two queries:
q=solr&fq=category:search&fq=region:poland
q=solr&facet=true&facet.field=category&facet.field=region
We need two queries because the first one has to return the narrowed search results, while the faceting results must not be narrowed by the filters. In Solr 1.4 or higher we can shorten this to one query by tagging the filters and excluding the tagged filters from faceting. First we change the first query as follows:
q=solr&fq={!tag=categoryFQ}category:search&fq={!tag=regionFQ}region:poland
For now, the search results will not change. We added tags to the filters in the above query so we can later exclude them in faceting. Then we modify the second query as follows:
q=solr&facet=true&facet.field={!ex=categoryFQ,regionFQ}category&facet.field={!ex=categoryFQ,regionFQ}region
So far the faceting results will not change. We added exclusions to the facet.field parameters, so filters named categoryFQ and regionFQ will not be taken into consideration when calculating faceting results.
Then we combine the two modified queries into one, which should look as follows:
q=solr&fq={!tag=categoryFQ}category:search&fq={!tag=regionFQ}region:poland&facet=true&facet.field={!ex=categoryFQ,regionFQ}category&facet.field={!ex=categoryFQ,regionFQ}region
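When the filters come from user selections, the tags and exclusions can be generated instead of typed by hand. Below is a minimal Python sketch of that idea; the helper multiselect_params and the convention of naming each tag after its field plus FQ are assumptions taken from the example above, not anything required by Solr.

```python
from urllib.parse import urlencode


def multiselect_params(query, filters):
    """Build parameters with tagged filters and facet fields that exclude them.

    `filters` maps a field name to the selected value.
    """
    tags = {field: f"{field}FQ" for field in filters}
    params = [("q", query)]
    for field, value in filters.items():
        # tag each filter so it can be excluded later: {!tag=categoryFQ}category:search
        params.append(("fq", f"{{!tag={tags[field]}}}{field}:{value}"))
    params.append(("facet", "true"))
    excluded = ",".join(tags.values())
    for field in filters:
        # ignore all tagged filters when counting this facet
        params.append(("facet.field", f"{{!ex={excluded}}}{field}"))
    return params


params = multiselect_params("solr", {"category": "search", "region": "poland"})
for name, value in params:
    print(f"{name}={value}")

# URL-encoded form, ready to append to the request URL
print(urlencode(params))
```

The readable parameters printed first match the combined query above; the last line is the same query with the braces and colons escaped.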
I’ll write more about LocalParams in future entries.
A few words at the end
I hope this article gave you an overview of what Solr faceting can do, in earlier versions of Solr, in the current one, and in the releases coming in the near future.