“Car sale application” – Spatial Search, adding location data (part 3)

The number of announcements in our database has grown so large that our website users have started asking for another way to filter the search results and another way to sort them. We need to add functionality that lets us work with the location data of the cars.

Requirements specification

We would like to add two new features:

  1. Filtering the results to display only those announcements located no farther than x kilometres from a given place, where x = 50, 100, 200, 500 or 1000 km.
  2. Sorting the results by the distance between a given place and the car’s location.

To meet these requirements, we need to use the Solr functionality called “Spatial Search”, which is available in the Solr distribution from version 3.1. The changes involve modifying the schema.xml file and the input data, where we have to add location information for every car. Finally, we will create the proper requests.

Schema.xml changes

  1. New field type definitions:
    • the first definition is nothing more than another numerical type:
      <fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
    • the second definition uses the “solr.LatLonType” class, which lets us index location data using a dynamic field with the “_coordinate” suffix:
      <fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>
  2. New field definitions:
    • a field that stores the name of the city related to each car:
      <field name="city" type="string" indexed="true" stored="true" />
    • the “loc” field, used to index the location data:
      <field name="loc" type="location" indexed="true" stored="false"/>
    • the dynamic field used internally to hold the values provided by the “loc” field:
      <dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="false"/>

Input data analysis

To show how to modify the input data, let’s take five announcements from the following cities:

  1. Koszalin
    • latitude: 54.12
    • longitude: 16.11
  2. Białystok
    • latitude: 53.08
    • longitude: 23.09
  3. Szczecin
    • latitude: 53.25
    • longitude: 14.35
  4. Gdańsk
    • latitude: 54.21
    • longitude: 18.40
  5. Warszawa
    • latitude: 52.15
    • longitude: 21.00

We provide the location data by entering the latitude and longitude, separated by a comma, in the “loc” field. Our data might look like this:

<add>
   <doc>
      <field name="id">1</field>
      <field name="make">Audi</field>
      <field name="model">80</field>
      <field name="year">2008</field>
      <field name="price">9774</field>
      <field name="engine_size">2000</field>
      <field name="mileage">92467</field>
      <field name="colour">green</field>
      <field name="damaged">false</field>
      <field name="city">Koszalin</field>
      <field name="loc">54.12,16.11</field>
   </doc>
   <doc>
      <field name="id">2</field>
      <field name="make">Audi</field>
      <field name="model">A8</field>
      <field name="year">2009</field>
      <field name="price">9078</field>
      <field name="engine_size">1000</field>
      <field name="mileage">31369</field>
      <field name="colour">black</field>
      <field name="damaged">false</field>
      <field name="city">Białystok</field>
      <field name="loc">53.08,23.09</field>
   </doc>
   <doc>
      <field name="id">3</field>
      <field name="make">Audi</field>
      <field name="model">TT</field>
      <field name="year">1997</field>
      <field name="price">1109</field>
      <field name="engine_size">1299</field>
      <field name="mileage">116987</field>
      <field name="colour">silver</field>
      <field name="damaged">true</field>
      <field name="city">Szczecin</field>
      <field name="loc">53.25,14.35</field>
   </doc>
   <doc>
      <field name="id">4</field>
      <field name="make">BMW</field>
      <field name="model">Seria 7</field>
      <field name="year">2007</field>
      <field name="price">140000</field>
      <field name="engine_size">3000</field>
      <field name="mileage">418000</field>
      <field name="colour">green</field>
      <field name="damaged">false</field>
      <field name="city">Gdańsk</field>
      <field name="loc">54.21,18.40</field>
   </doc>
   <doc>
      <field name="id">5</field>
      <field name="make">Chevrolet</field>
      <field name="model">TrailBlazer</field>
      <field name="year">2007</field>
      <field name="price">140000</field>
      <field name="engine_size">3000</field>
      <field name="mileage">418000</field>
      <field name="colour">green</field>
      <field name="damaged">false</field>
      <field name="city">Warszawa</field>
      <field name="loc">52.15,21.00</field>
   </doc>
</add>
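
If the announcements come from another system with separate latitude and longitude values, the update message can be generated rather than written by hand. The following is only a minimal sketch: the `build_add_xml` helper and the `lat`/`lon` keys are hypothetical names chosen for illustration, and only a few of the fields from the sample above are included.

```python
import xml.etree.ElementTree as ET


def build_add_xml(cars):
    """Build a Solr <add> update message from a list of car dicts.

    Each dict is assumed to carry hypothetical "lat" and "lon" keys,
    which are merged into the single "loc" field.
    """
    add = ET.Element("add")
    for car in cars:
        doc = ET.SubElement(add, "doc")
        for name, value in car.items():
            if name in ("lat", "lon"):
                continue  # coordinates are written to "loc" below
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(value)
        loc = ET.SubElement(doc, "field", name="loc")
        # The "loc" field expects "latitude,longitude", in that order.
        loc.text = f"{car['lat']},{car['lon']}"
    return ET.tostring(add, encoding="unicode")


cars = [
    {"id": 1, "make": "Audi", "model": "80", "city": "Koszalin",
     "lat": 54.12, "lon": 16.11},
    {"id": 2, "make": "Audi", "model": "A8", "city": "Białystok",
     "lat": 53.08, "lon": 23.09},
]

xml_payload = build_add_xml(cars)
print(xml_payload)
```

The resulting string can be posted to the Solr update handler in the same way as the hand-written file.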

Let’s create queries

We have our location data in the index, so all we need now is to create queries that satisfy our needs. Let’s imagine that we are searching for announcements from Białystok, which is about 200 km from Warszawa, about 400 km from Gdańsk, about 550 km from Koszalin and about 650 km from Szczecin.
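
To get a feel for the distances involved, the great-circle distance between the sample coordinates can be computed with the haversine formula. This is a sketch for illustration only: the Earth radius constant is an assumed mean value, and the straight-line figures it produces come out somewhat lower than the rounded distances quoted above, although the ordering of the cities is the same.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Coordinates taken from the sample data above (latitude, longitude).
CITIES = {
    "Koszalin": (54.12, 16.11),
    "Białystok": (53.08, 23.09),
    "Szczecin": (53.25, 14.35),
    "Gdańsk": (54.21, 18.40),
    "Warszawa": (52.15, 21.00),
}

origin = CITIES["Białystok"]
distances = {
    city: haversine_km(*origin, *coords) for city, coords in CITIES.items()
}

# Print the cities from nearest to farthest.
for city, km in sorted(distances.items(), key=lambda item: item[1]):
    print(f"{city}: {km:.0f} km")
```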

To fulfil the first requirement, we add a special filter query to our request:

...&fq={!geofilt sfield=loc}&pt=53.08,23.09&d=50

where:

  • sfield – the name of the field in which our location data is indexed.
  • pt – the location of the starting point; in our case it is Białystok.
  • d – the distance, in kilometres, used to narrow the search results. The values 50, 100, 200, 500 and 1000 cover all our needs.
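
In an application, these parameters would usually be assembled in code and URL-encoded before being sent. Below is a minimal sketch of that step; the `geofilt_params` helper and the `ALLOWED_DISTANCES_KM` constant are hypothetical names, and restricting `d` to the five values from the requirements is our own validation, not something Solr enforces.

```python
from urllib.parse import parse_qs, urlencode

# Distances offered to the user, taken from the requirements.
ALLOWED_DISTANCES_KM = (50, 100, 200, 500, 1000)


def geofilt_params(lat, lon, distance_km, sfield="loc"):
    """Return request parameters that filter results by distance."""
    if distance_km not in ALLOWED_DISTANCES_KM:
        raise ValueError(f"unsupported distance: {distance_km}")
    return {
        "q": "*:*",
        "fq": f"{{!geofilt sfield={sfield}}}",  # -> {!geofilt sfield=loc}
        "pt": f"{lat},{lon}",                   # latitude,longitude
        "d": str(distance_km),
    }


params = geofilt_params(53.08, 23.09, 200)
query_string = urlencode(params)  # escapes braces, spaces and commas
print(query_string)
```

The encoded string can be appended to the select handler URL of the Solr instance.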

Example:

  1. Query:
    q=*:*&fq={!geofilt sfield=loc}&pt=53.08,23.09&d=200
  2. Search results:
    <result name="response" numFound="2" start="0">
       <doc>
          <str name="city">Białystok</str>
          <str name="colour">black</str>
          <bool name="damaged">false</bool>
          <int name="engine_size">1000</int>
          <str name="id">2</str>
          <str name="make">Audi</str>
          <int name="mileage">31369</int>
          <str name="model">A8</str>
          <float name="price">9078.0</float>
          <int name="year">2009</int>
       </doc>
       <doc>
          <str name="city">Warszawa</str>
          <str name="colour">green</str>
          <bool name="damaged">false</bool>
          <int name="engine_size">3000</int>
          <str name="id">5</str>
          <str name="make">Chevrolet</str>
          <int name="mileage">418000</int>
          <str name="model">TrailBlazer</str>
          <float name="price">140000.0</float>
          <int name="year">2007</int>
       </doc>
    </result>

That’s great: we don’t get any announcements from Koszalin, Gdańsk or Szczecin, as these cities are located farther than 200 km from Białystok.

To fulfil the second requirement, we sort the search results using the geodist function. The query looks like this:

...&sfield=loc&pt=53.08,23.09&sort=geodist()+asc

An example of sorting the search results by distance, starting from Białystok:

  1. Query:
    q=*:*&sfield=loc&pt=53.08,23.09&sort=geodist()+asc
  2. Search results:
    <result name="response" numFound="5" start="0">
       <doc>
          <str name="city">Białystok</str>
          <str name="colour">black</str>
          <bool name="damaged">false</bool>
          <int name="engine_size">1000</int>
          <str name="id">2</str>
          <str name="make">Audi</str>
          <int name="mileage">31369</int>
          <str name="model">A8</str>
          <float name="price">9078.0</float>
          <int name="year">2009</int>
       </doc>
       <doc>
          <str name="city">Warszawa</str>
          <str name="colour">green</str>
          <bool name="damaged">false</bool>
          <int name="engine_size">3000</int>
          <str name="id">5</str>
          <str name="make">Chevrolet</str>
          <int name="mileage">418000</int>
          <str name="model">TrailBlazer</str>
          <float name="price">140000.0</float>
          <int name="year">2007</int>
       </doc>
       <doc>
          <str name="city">Gdańsk</str>
          <str name="colour">green</str>
          <bool name="damaged">false</bool>
          <int name="engine_size">3000</int>
          <str name="id">4</str>
          <str name="make">BMW</str>
          <int name="mileage">418000</int>
          <str name="model">Seria 7</str>
          <float name="price">140000.0</float>
          <int name="year">2007</int>
       </doc>
       <doc>
          <str name="city">Koszalin</str>
          <str name="colour">green</str>
          <bool name="damaged">false</bool>
          <int name="engine_size">2000</int>
          <str name="id">1</str>
          <str name="make">Audi</str>
          <int name="mileage">92467</int>
          <str name="model">80</str>
          <float name="price">9774.0</float>
          <int name="year">2008</int>
       </doc>
       <doc>
          <str name="city">Szczecin</str>
          <str name="colour">silver</str>
          <bool name="damaged">true</bool>
          <int name="engine_size">1299</int>
          <str name="id">3</str>
          <str name="make">Audi</str>
          <int name="mileage">116987</int>
          <str name="model">TT</str>
          <float name="price">1109.0</float>
          <int name="year">1997</int>
       </doc>
    </result>

That’s correct! Mission accomplished.
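
Filtering and sorting can also be combined in one request, because geofilt and geodist() can both read the sfield and pt request parameters. The sketch below assembles such a request; the `nearby_sorted_params` helper is a hypothetical name used only for illustration.

```python
from urllib.parse import parse_qs, urlencode


def nearby_sorted_params(lat, lon, distance_km, sfield="loc", order="asc"):
    """Return parameters that filter by distance and sort by it as well."""
    if order not in ("asc", "desc"):
        raise ValueError(f"unsupported sort order: {order}")
    return {
        "q": "*:*",
        "fq": "{!geofilt}",          # takes sfield, pt and d from below
        "sfield": sfield,
        "pt": f"{lat},{lon}",        # latitude,longitude
        "d": str(distance_km),
        "sort": f"geodist() {order}",
    }


# Cars within 200 km of Białystok, nearest first.
params = nearby_sorted_params(53.08, 23.09, 200)
query_string = urlencode(params)
print(query_string)
```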

The end

Once again we have met our website users’ expectations. This time we have added functionality that allows our users to filter and sort the search results using location and distance data. Full success!


This entry was posted on Monday, March 14th, 2011 at 09:40.