“Car sale application” – Result Grouping, let’s group some search results (part 6)

In today’s post we will add new functionality to our car sale application: grouping of search results. Imagine a user who searches for “audi a4” advertisements and wants the results grouped by the car’s year of production, with 2-3 results in every group. And how about grouping by ranges, for example mileage ranges? Today we accept the challenge.

New functionality request parameters description

Result grouping has been available since Solr 3.3. Let’s get to know the request parameters we will need:

  • group – turns result grouping on or off
  • group.field – the name of the field used to group search results. We have to make sure that the field used for grouping (year of production in our case) is single-valued and has the string/text type
  • group.query – a query whose matching documents form a single group; passing several such queries lets us group results by ranges, for example mileage ranges
  • group.limit – the number of results to return for each group

These four basic parameters allow us to achieve what we want.
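To see how these parameters fit together in a request, here is a minimal Python sketch that assembles them into a query string. The helper name grouping_params is made up for this illustration, and the sketch only builds the string; it does not contact a Solr server.

```python
from urllib.parse import urlencode

def grouping_params(query, field, limit):
    """Return the request parameters for a field-grouped search.

    Hypothetical helper for illustration only; it is not part of Solr.
    """
    return [
        ("q", query),
        ("group", "true"),            # turn result grouping on
        ("group.field", field),       # single-valued string/text field
        ("group.limit", str(limit)),  # documents returned per group
    ]

query_string = urlencode(grouping_params("audi a4", "year_group", 2))
print(query_string)
# q=audi+a4&group=true&group.field=year_group&group.limit=2
```

The resulting string can be appended after the “?” of the select handler URL of your own Solr instance.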

schema.xml changes

Changes to schema.xml may be needed to make sure that the group field is of the proper type (“string” or “text”). We would like to group our search results by the “year” field, so let’s recall how its definition looks right now:

<field name="year" type="tint" indexed="true" stored="true" required="true" />

The field is of integer type. To be able to group results by year, we create another field, let’s call it “year_group”, which has the string type:

<field name="year_group" type="string" indexed="true" stored="false" />

and copy the content of the “year” field to the new “year_group” field:

<copyField source="year" dest="year_group"/>

That’s practically all the changes we need to make in our schema.xml configuration file.
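If you want to double-check such a change before reindexing, a small script can confirm that the group field has a string-like type and is fed by a copyField. The sketch below is only an illustration: it wraps the three definitions from this post in a made-up schema root element so that they parse as one document, and the check_group_field helper is hypothetical.

```python
import xml.etree.ElementTree as ET

# The three definitions from this post, wrapped in a <schema> root
# (an assumption made here so the fragment parses as a single document).
schema_fragment = """
<schema>
  <field name="year" type="tint" indexed="true" stored="true" required="true" />
  <field name="year_group" type="string" indexed="true" stored="false" />
  <copyField source="year" dest="year_group"/>
</schema>
"""

# Type names this post treats as suitable for grouping.
GROUPABLE_TYPES = {"string", "text"}

def check_group_field(schema_xml, group_field):
    """Return (type_is_groupable, copyField sources feeding group_field)."""
    root = ET.fromstring(schema_xml.strip())
    types = {f.get("name"): f.get("type") for f in root.iter("field")}
    sources = [c.get("source") for c in root.iter("copyField")
               if c.get("dest") == group_field]
    return types.get(group_field) in GROUPABLE_TYPES, sources

type_ok, sources = check_group_field(schema_fragment, "year_group")
print(type_ok, sources)  # True ['year']
```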

Some sample data

Let’s now create some sample data to test the new functionality. We assume that we have five Audi A4 advertisements. Two of them are from 2002, another two from 2003 and the last one from 2006. Additionally, one of them has a mileage below 100 000 km, three have a mileage between 100 000 km and 199 999 km, and the last one has a mileage over 200 000 km:

<add>
   <doc>
      <field name="id">1</field>
      <field name="make">Audi</field>
      <field name="model">A4</field>
      <field name="year">2002</field>
      <field name="price">22700</field>
      <field name="engine_size">1900</field>
      <field name="mileage">197000</field>
      <field name="colour">green</field>
      <field name="damaged">false</field>
      <field name="city">Koszalin</field>
      <field name="loc">54.12,16.11</field>
   </doc>
   <doc>
      <field name="id">2</field>
      <field name="make">Audi</field>
      <field name="model">A4</field>
      <field name="year">2003</field>
      <field name="price">27800</field>
      <field name="engine_size">1900</field>
      <field name="mileage">220000</field>
      <field name="colour">black</field>
      <field name="damaged">false</field>
      <field name="city">Bialystok</field>
      <field name="loc">53.08,23.09</field>
   </doc>
   <doc>
      <field name="id">3</field>
      <field name="make">Audi</field>
      <field name="model">A4</field>
      <field name="year">2002</field>
      <field name="price">21300</field>
      <field name="engine_size">1900</field>
      <field name="mileage">125000</field>
      <field name="colour">black</field>
      <field name="damaged">false</field>
      <field name="city">Szczecin</field>
      <field name="loc">53.25,14.35</field>
   </doc>
   <doc>
      <field name="id">4</field>
      <field name="make">Audi</field>
      <field name="model">A4</field>
      <field name="year">2003</field>
      <field name="price">30300</field>
      <field name="engine_size">1900</field>
      <field name="mileage">150000</field>
      <field name="colour">red</field>
      <field name="damaged">false</field>
      <field name="city">Gdansk</field>
      <field name="loc">54.21,18.40</field>
   </doc>
   <doc>
      <field name="id">5</field>
      <field name="make">Audi</field>
      <field name="model">A4</field>
      <field name="year">2006</field>
      <field name="price">32100</field>
      <field name="engine_size">1900</field>
      <field name="mileage">9900</field>
      <field name="colour">red</field>
      <field name="damaged">false</field>
      <field name="city">Swidnik</field>
      <field name="loc">52.15,21.00</field>
   </doc>
</add>
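Before sending any query, we can work out by hand which groups to expect. The following Python sketch does that offline for the five sample documents; it does not talk to Solr, and the function names and range labels are made up for this illustration.

```python
from collections import defaultdict

# (id, year, mileage) copied from the five sample documents above
cars = [
    (1, 2002, 197000),
    (2, 2003, 220000),
    (3, 2002, 125000),
    (4, 2003, 150000),
    (5, 2006, 9900),
]

def mileage_range(mileage):
    """Map a mileage in km to one of the three ranges used in this post."""
    if mileage < 100000:
        return "0-99999"
    if mileage < 200000:
        return "100000-199999"
    return "200000+"

def group_ids(rows, key):
    """Group document ids by key(year, mileage), keeping input order."""
    groups = defaultdict(list)
    for doc_id, year, mileage in rows:
        groups[key(year, mileage)].append(doc_id)
    return dict(groups)

by_year = group_ids(cars, lambda year, mileage: year)
by_mileage = group_ids(cars, lambda year, mileage: mileage_range(mileage))
print(by_year)     # {2002: [1, 3], 2003: [2, 4], 2006: [5]}
print(by_mileage)  # {'100000-199999': [1, 3, 4], '200000+': [2], '0-99999': [5]}
```

These are the groupings the Solr responses should reproduce once the documents are indexed.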

Let’s create queries

Using the parameters described at the beginning of the article, we create the “audi a4” query, which returns search results grouped by the year of production:

?q=audi+a4&group=true&group.field=year_group&group.limit=2&fl=id,mileage,make,model,year

As we can see, we have limited the results in every group to at most 2. In the response we would like only those fields that help us clearly identify the documents: id, mileage, make, model and year. The response looks as follows:

<lst name="grouped">
  <lst name="year_group">
    <int name="matches">5</int>
    <arr name="groups">
      <lst>
        <str name="groupValue">2002</str>
        <result name="doclist" numFound="2" start="0">
          <doc>
            <str name="id">1</str>
            <str name="make">Audi</str>
            <int name="mileage">197000</int>
            <str name="model">A4</str>
            <int name="year">2002</int>
          </doc>
          <doc>
            <str name="id">3</str>
            <str name="make">Audi</str>
            <int name="mileage">125000</int>
            <str name="model">A4</str>
            <int name="year">2002</int>
          </doc>
        </result>
      </lst>
      <lst>
        <str name="groupValue">2003</str>
        <result name="doclist" numFound="2" start="0">
          <doc>
            <str name="id">2</str>
            <str name="make">Audi</str>
            <int name="mileage">220000</int>
            <str name="model">A4</str>
            <int name="year">2003</int>
          </doc>
          <doc>
            <str name="id">4</str>
            <str name="make">Audi</str>
            <int name="mileage">150000</int>
            <str name="model">A4</str>
            <int name="year">2003</int>
          </doc>
        </result>
      </lst>
      <lst>
        <str name="groupValue">2006</str>
        <result name="doclist" numFound="1" start="0">
          <doc>
            <str name="id">5</str>
            <str name="make">Audi</str>
            <int name="mileage">9900</int>
            <str name="model">A4</str>
            <int name="year">2006</int>
          </doc>
        </result>
      </lst>
    </arr>
  </lst>
</lst>

Let’s analyse the response. We have 5 matches:

<int name="matches">5</int>

The response has been split into 3 independent groups:

  1. <str name="groupValue">2002</str>

    where we have two (numFound="2") cars from 2002

  2. <str name="groupValue">2003</str>

    where we have two (numFound="2") cars from 2003

  3. <str name="groupValue">2006</str>

    where we have one (numFound="1") car from 2006

That’s correct!
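The same check can be done programmatically. Below is a minimal Python sketch that reads a field-grouped response and reports numFound and the document ids for each group. The embedded XML is a trimmed copy of the response above (only the id field is kept), and summarize_groups is a hypothetical helper, not a Solr client API.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the grouped response shown above (documents reduced to id).
response_xml = """
<lst name="grouped">
  <lst name="year_group">
    <int name="matches">5</int>
    <arr name="groups">
      <lst>
        <str name="groupValue">2002</str>
        <result name="doclist" numFound="2" start="0">
          <doc><str name="id">1</str></doc>
          <doc><str name="id">3</str></doc>
        </result>
      </lst>
      <lst>
        <str name="groupValue">2003</str>
        <result name="doclist" numFound="2" start="0">
          <doc><str name="id">2</str></doc>
          <doc><str name="id">4</str></doc>
        </result>
      </lst>
      <lst>
        <str name="groupValue">2006</str>
        <result name="doclist" numFound="1" start="0">
          <doc><str name="id">5</str></doc>
        </result>
      </lst>
    </arr>
  </lst>
</lst>
"""

def summarize_groups(xml_text, field):
    """Map each groupValue to (numFound, [document ids])."""
    root = ET.fromstring(xml_text.strip())
    field_node = root.find(f".//lst[@name='{field}']")
    summary = {}
    for group in field_node.find("arr[@name='groups']"):
        value = group.find("str[@name='groupValue']").text
        result = group.find("result[@name='doclist']")
        ids = [doc.find("str[@name='id']").text for doc in result.findall("doc")]
        summary[value] = (int(result.get("numFound")), ids)
    return summary

summary = summarize_groups(response_xml, "year_group")
print(summary)
# {'2002': (2, ['1', '3']), '2003': (2, ['2', '4']), '2006': (1, ['5'])}
```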

Now let’s create a query that groups our search results by mileage ranges. We assume that we have 3 ranges:

  1. <0km ; 99999km>
  2. <100000km ; 199999km>
  3. <200000km ; * >

Query:

?q=audi+a4&group=true&group.query=mileage:[0+TO+99999]&group.query=mileage:[100000+TO+199999]&group.query=mileage:[200000+TO+*]&group.limit=3&fl=id,mileage,make,model,year
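Since group.query is repeated once per range, a query like this is easier to generate than to type. The Python sketch below builds the same query string from a list of ranges; the variable names are illustrative, and the safe argument only keeps the range syntax readable instead of percent-encoding it.

```python
from urllib.parse import urlencode

# The three mileage ranges from this post, in Solr range syntax.
mileage_ranges = ["[0 TO 99999]", "[100000 TO 199999]", "[200000 TO *]"]

params = [("q", "audi a4"), ("group", "true")]
# One group.query parameter per range; a list of pairs allows repeated keys.
params += [("group.query", f"mileage:{r}") for r in mileage_ranges]
params += [("group.limit", "3"), ("fl", "id,mileage,make,model,year")]

# Leave ':', '[', ']', '*' and ',' unescaped so the result matches the query above.
query_string = urlencode(params, safe=":[]*,")
print("?" + query_string)
```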

and the response:

<lst name="grouped">
  <lst name="mileage:[0 TO 99999]">
    <int name="matches">5</int>
    <result name="doclist" numFound="1" start="0">
      <doc>
        <str name="id">5</str>
        <str name="make">Audi</str>
        <int name="mileage">9900</int>
        <str name="model">A4</str>
        <int name="year">2006</int>
      </doc>
    </result>
  </lst>
  <lst name="mileage:[100000 TO 199999]">
    <int name="matches">5</int>
    <result name="doclist" numFound="3" start="0">
      <doc>
        <str name="id">1</str>
        <str name="make">Audi</str>
        <int name="mileage">197000</int>
        <str name="model">A4</str>
        <int name="year">2002</int>
      </doc>
      <doc>
        <str name="id">3</str>
        <str name="make">Audi</str>
        <int name="mileage">125000</int>
        <str name="model">A4</str>
        <int name="year">2002</int>
      </doc>
      <doc>
        <str name="id">4</str>
        <str name="make">Audi</str>
        <int name="mileage">150000</int>
        <str name="model">A4</str>
        <int name="year">2003</int>
      </doc>
    </result>
  </lst>
  <lst name="mileage:[200000 TO *]">
    <int name="matches">5</int>
    <result name="doclist" numFound="1" start="0">
      <doc>
        <str name="id">2</str>
        <str name="make">Audi</str>
        <int name="mileage">220000</int>
        <str name="model">A4</str>
        <int name="year">2003</int>
      </doc>
    </result>
  </lst>
</lst>

Again we have 5 matches. Every group reports the same number, because “matches” counts the documents matching the main query, not the documents in the group. In the first group there is a car with a mileage of 9900 km, in the second group there are cars with mileages of 197000 km, 125000 km and 150000 km, and finally in the third group there is a car with a mileage of 220000 km. We achieved what we wanted. Mission accomplished.
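Note that the response for group.query has a slightly different shape than the one for group.field: each query gets its own named entry with the document list directly inside it. A minimal Python sketch for reading it could look like this; the XML is a trimmed copy of the response above (only the mileage field is kept) and mileages_per_query is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the group.query response above (documents reduced to mileage).
response_xml = """
<lst name="grouped">
  <lst name="mileage:[0 TO 99999]">
    <int name="matches">5</int>
    <result name="doclist" numFound="1" start="0">
      <doc><int name="mileage">9900</int></doc>
    </result>
  </lst>
  <lst name="mileage:[100000 TO 199999]">
    <int name="matches">5</int>
    <result name="doclist" numFound="3" start="0">
      <doc><int name="mileage">197000</int></doc>
      <doc><int name="mileage">125000</int></doc>
      <doc><int name="mileage">150000</int></doc>
    </result>
  </lst>
  <lst name="mileage:[200000 TO *]">
    <int name="matches">5</int>
    <result name="doclist" numFound="1" start="0">
      <doc><int name="mileage">220000</int></doc>
    </result>
  </lst>
</lst>
"""

def mileages_per_query(xml_text):
    """Map each group.query string to the mileages of its documents."""
    root = ET.fromstring(xml_text.strip())
    # Accept both the bare "grouped" element and a full response wrapping it.
    if root.get("name") == "grouped":
        grouped = root
    else:
        grouped = root.find(".//lst[@name='grouped']")
    summary = {}
    for group in grouped.findall("lst"):
        result = group.find("result[@name='doclist']")
        summary[group.get("name")] = [
            int(doc.find("int[@name='mileage']").text)
            for doc in result.findall("doc")
        ]
    return summary

groups = mileages_per_query(response_xml)
print(groups)
```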

The end

Yet another piece of functionality, this time search result grouping, has been added to our car sale application. We will surely see what the users’ opinions will be 🙂


This entry was posted on Monday, July 4th, 2011 at 08:59 and is filed under About Solr.

3 Responses to ““Car sale application” – Result Grouping, let’s group some search results (part 6)”

  1. Rih Says:

    Can the last query example be grouped (further) by a field (e.g. year)?

  2. rA Says:

    You mean something like a subgroup within a group? No, unfortunately it cannot (yet, hopefully :)). But within a group you can, for example, sort the documents using the group.sort parameter:
    ?q=audi+a4&group=true&group.query=mileage:[0+TO+99999]&group.query=mileage:[100000+TO+199999]&group.query=mileage:[200000+TO+*]&group.limit=3&fl=id,mileage,make,model,year&group.sort=year+desc
    In the response, the documents within every group should be sorted by year in descending order.

  3. Rih Says:

    I see. I was thinking of using the said approach as a nested grouping workaround.