The number of announcements in our database has grown so large that our website users have started asking for another way to filter and sort search results. We need to add functionality that lets us work with location data related to the cars.
Requirements specification
We would like to add two new features:
- Filtering the results to display only announcements located no farther than x kilometres from a given place, where x is 50, 100, 200, 500 or 1000.
- Sorting the results by the distance between a given place and the car’s location.
To meet these requirements, we need to use the Solr feature called “Spatial Search”, which is available in Solr distributions from version 3.1 onwards. The changes involve modifying the schema.xml file and the input data, where we have to add location information for every car. Finally, we will create the appropriate requests.
Schema.xml changes
- New field type definitions:
- the first definition is simply another numeric type:
<fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
- the second definition uses the “solr.LatLonType” class, which allows us to index location data using a dynamic field with the suffix “_coordinate”:
<fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>
- New field definitions:
- a field that stores the name of the city related to each car:
<field name="city" type="string" indexed="true" stored="true" />
- the “loc” field, which will be used to index location data:
<field name="loc" type="location" indexed="true" stored="false"/>
- the dynamic field used internally to hold the information provided by the “loc” field:
<dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="false"/>
Input data analysis
To show how to modify the input data, let’s take five announcements from the following cities:
- Koszalin
- latitude: 54.12
- longitude: 16.11
- Białystok
- latitude: 53.08
- longitude: 23.09
- Szczecin
- latitude: 53.25
- longitude: 14.35
- Gdańsk
- latitude: 54.21
- longitude: 18.40
- Warszawa
- latitude: 52.15
- longitude: 21.00
We provide the location data by entering the latitude and longitude, separated by a comma, in the “loc” field. Our data might look like this:
<add>
  <doc>
    <field name="id">1</field>
    <field name="make">Audi</field>
    <field name="model">80</field>
    <field name="year">2008</field>
    <field name="price">9774</field>
    <field name="engine_size">2000</field>
    <field name="mileage">92467</field>
    <field name="colour">green</field>
    <field name="damaged">false</field>
    <field name="city">Koszalin</field>
    <field name="loc">54.12,16.11</field>
  </doc>
  <doc>
    <field name="id">2</field>
    <field name="make">Audi</field>
    <field name="model">A8</field>
    <field name="year">2009</field>
    <field name="price">9078</field>
    <field name="engine_size">1000</field>
    <field name="mileage">31369</field>
    <field name="colour">black</field>
    <field name="damaged">false</field>
    <field name="city">Białystok</field>
    <field name="loc">53.08,23.09</field>
  </doc>
  <doc>
    <field name="id">3</field>
    <field name="make">Audi</field>
    <field name="model">TT</field>
    <field name="year">1997</field>
    <field name="price">1109</field>
    <field name="engine_size">1299</field>
    <field name="mileage">116987</field>
    <field name="colour">silver</field>
    <field name="damaged">true</field>
    <field name="city">Szczecin</field>
    <field name="loc">53.25,14.35</field>
  </doc>
  <doc>
    <field name="id">4</field>
    <field name="make">BMW</field>
    <field name="model">Seria 7</field>
    <field name="year">2007</field>
    <field name="price">140000</field>
    <field name="engine_size">3000</field>
    <field name="mileage">418000</field>
    <field name="colour">green</field>
    <field name="damaged">false</field>
    <field name="city">Gdańsk</field>
    <field name="loc">54.21,18.40</field>
  </doc>
  <doc>
    <field name="id">5</field>
    <field name="make">Chevrolet</field>
    <field name="model">TrailBlazer</field>
    <field name="year">2007</field>
    <field name="price">140000</field>
    <field name="engine_size">3000</field>
    <field name="mileage">418000</field>
    <field name="colour">green</field>
    <field name="damaged">false</field>
    <field name="city">Warszawa</field>
    <field name="loc">52.15,21.00</field>
  </doc>
</add>
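An update message like the one above could also be generated from application data instead of being written by hand. The following is only a minimal sketch in Python, assuming the cars are available as plain dictionaries; the function name `build_add_xml` and the `lat`/`lon` keys are hypothetical and not part of Solr. It shows how the two coordinates are joined into the single comma-separated “loc” value.

```python
import xml.etree.ElementTree as ET


def build_add_xml(cars):
    """Build a Solr <add> update message from a list of car dictionaries.

    The "lat" and "lon" keys are an assumption of this sketch; they are
    merged into the single "loc" field expected by the schema.
    """
    add = ET.Element("add")
    for car in cars:
        doc = ET.SubElement(add, "doc")
        for name, value in car.items():
            if name in ("lat", "lon"):
                continue  # written once as "loc" below
            field = ET.SubElement(doc, "field", {"name": name})
            field.text = str(value)
        loc = ET.SubElement(doc, "field", {"name": "loc"})
        loc.text = f"{car['lat']},{car['lon']}"  # latitude first, then longitude
    return ET.tostring(add, encoding="unicode")


# Two of the sample announcements, with a reduced set of fields.
cars = [
    {"id": 1, "make": "Audi", "model": "80", "city": "Koszalin", "lat": 54.12, "lon": 16.11},
    {"id": 2, "make": "Audi", "model": "A8", "city": "Białystok", "lat": 53.08, "lon": 23.09},
]

xml_payload = build_add_xml(cars)
print(xml_payload)
```

Building the message with an XML library rather than string concatenation means that special characters in values such as the model name are escaped automatically.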
Let’s create queries
We have our location data in the index, so all we need now is to create queries that satisfy our needs. Let’s imagine that we are searching for announcements while in Białystok, which is located about 200 km from Warszawa, about 400 km from Gdańsk, about 550 km from Koszalin and about 650 km from Szczecin.
To fulfil the first requirement, we add a special filter query to our request:
...&fq={!geofilt sfield=loc}&pt=53.08,23.09&d=50
where:
- sfield – the name of the field in which our location data is indexed.
- pt – the location of the starting point, which is Białystok in our case.
- d – the distance, in kilometres, used to narrow the search results. The values 50, 100, 200, 500 and 1000 cover all our needs.
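The three parameters can be assembled in application code before the request is sent. Below is a minimal sketch in Python using only the standard library; the helper name `geofilt_params` and the constant `ALLOWED_RADII_KM` are hypothetical, and restricting the radius to the five values from the requirements is a choice of this sketch, not something Solr enforces.

```python
from urllib.parse import parse_qs, urlencode

# The radii listed in the requirements specification (kilometres).
ALLOWED_RADII_KM = (50, 100, 200, 500, 1000)


def geofilt_params(lat, lon, radius_km, sfield="loc"):
    """Return the request parameters for a distance-filtered search."""
    if radius_km not in ALLOWED_RADII_KM:
        raise ValueError(f"radius must be one of {ALLOWED_RADII_KM}")
    return {
        "q": "*:*",
        "fq": "{!geofilt sfield=%s}" % sfield,  # filter on the location field
        "pt": f"{lat},{lon}",                   # starting point: latitude,longitude
        "d": str(radius_km),                    # maximum distance
    }


params = geofilt_params(53.08, 23.09, 200)
query_string = urlencode(params)  # percent-encodes braces, spaces and commas
print(query_string)

# Decoding the string again recovers the original parameter values.
decoded = parse_qs(query_string)
print(decoded["fq"][0])
```

Letting `urlencode` handle the escaping avoids problems with the braces and the space inside the `fq` value when the query string is appended to the Solr select URL.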
Example:
- Query:
q=*:*&fq={!geofilt sfield=loc}&pt=53.08,23.09&d=200
- Search results:
<result name="response" numFound="2" start="0">
  <doc>
    <str name="city">Białystok</str>
    <str name="colour">black</str>
    <bool name="damaged">false</bool>
    <int name="engine_size">1000</int>
    <str name="id">2</str>
    <str name="make">Audi</str>
    <int name="mileage">31369</int>
    <str name="model">A8</str>
    <float name="price">9078.0</float>
    <int name="year">2009</int>
  </doc>
  <doc>
    <str name="city">Warszawa</str>
    <str name="colour">green</str>
    <bool name="damaged">false</bool>
    <int name="engine_size">3000</int>
    <str name="id">5</str>
    <str name="make">Chevrolet</str>
    <int name="mileage">418000</int>
    <str name="model">TrailBlazer</str>
    <float name="price">140000.0</float>
    <int name="year">2007</int>
  </doc>
</result>
That’s great: there are no announcements from Koszalin, Gdańsk or Szczecin, as these cities are located farther than 200 km from Białystok.
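The filter result can be sanity-checked outside Solr by computing straight-line distances from the sample coordinates. The sketch below uses the haversine great-circle formula with a mean Earth radius of 6371 km; both are assumptions of this sketch, and the figures it produces are straight-line distances, so they may differ from the approximate distances quoted earlier.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch

# Coordinates from the input data section: (latitude, longitude).
CITIES = {
    "Koszalin": (54.12, 16.11),
    "Białystok": (53.08, 23.09),
    "Szczecin": (53.25, 14.35),
    "Gdańsk": (54.21, 18.40),
    "Warszawa": (52.15, 21.00),
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


origin = CITIES["Białystok"]

# Keep only the cities within 200 km of the starting point.
within_200 = sorted(
    name
    for name, (lat, lon) in CITIES.items()
    if haversine_km(origin[0], origin[1], lat, lon) <= 200
)
print(within_200)
```

With these coordinates, only Białystok and Warszawa fall within the 200 km radius, which agrees with the two documents returned by the query.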
To fulfil the second requirement, we sort the search results using the geodist function, in ascending order so that the nearest announcements come first. The query would look like this:
...&sfield=loc&pt=53.08,23.09&sort=geodist()+asc
An example of sorting the search results by distance from Białystok:
- Query:
q=*:*&sfield=loc&pt=53.08,23.09&sort=geodist()+asc
- Search results:
<result name="response" numFound="5" start="0">
  <doc>
    <str name="city">Białystok</str>
    <str name="colour">black</str>
    <bool name="damaged">false</bool>
    <int name="engine_size">1000</int>
    <str name="id">2</str>
    <str name="make">Audi</str>
    <int name="mileage">31369</int>
    <str name="model">A8</str>
    <float name="price">9078.0</float>
    <int name="year">2009</int>
  </doc>
  <doc>
    <str name="city">Warszawa</str>
    <str name="colour">green</str>
    <bool name="damaged">false</bool>
    <int name="engine_size">3000</int>
    <str name="id">5</str>
    <str name="make">Chevrolet</str>
    <int name="mileage">418000</int>
    <str name="model">TrailBlazer</str>
    <float name="price">140000.0</float>
    <int name="year">2007</int>
  </doc>
  <doc>
    <str name="city">Gdańsk</str>
    <str name="colour">green</str>
    <bool name="damaged">false</bool>
    <int name="engine_size">3000</int>
    <str name="id">4</str>
    <str name="make">BMW</str>
    <int name="mileage">418000</int>
    <str name="model">Seria 7</str>
    <float name="price">140000.0</float>
    <int name="year">2007</int>
  </doc>
  <doc>
    <str name="city">Koszalin</str>
    <str name="colour">green</str>
    <bool name="damaged">false</bool>
    <int name="engine_size">2000</int>
    <str name="id">1</str>
    <str name="make">Audi</str>
    <int name="mileage">92467</int>
    <str name="model">80</str>
    <float name="price">9774.0</float>
    <int name="year">2008</int>
  </doc>
  <doc>
    <str name="city">Szczecin</str>
    <str name="colour">silver</str>
    <bool name="damaged">true</bool>
    <int name="engine_size">1299</int>
    <str name="id">3</str>
    <str name="make">Audi</str>
    <int name="mileage">116987</int>
    <str name="model">TT</str>
    <float name="price">1109.0</float>
    <int name="year">1997</int>
  </doc>
</result>
That’s correct! Mission accomplished.
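Rather than checking the order by eye, a client could read the city names out of the response in document order. The sketch below parses a trimmed copy of the result element with Python’s standard XML library; the sample string keeps only the “city” and “id” fields, and the function name `cities_in_order` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sorted response: only "city" and "id" are kept.
SAMPLE_RESPONSE = """
<result name="response" numFound="5" start="0">
  <doc><str name="city">Białystok</str><str name="id">2</str></doc>
  <doc><str name="city">Warszawa</str><str name="id">5</str></doc>
  <doc><str name="city">Gdańsk</str><str name="id">4</str></doc>
  <doc><str name="city">Koszalin</str><str name="id">1</str></doc>
  <doc><str name="city">Szczecin</str><str name="id">3</str></doc>
</result>
"""


def cities_in_order(result_xml):
    """Return the value of the "city" field of each <doc>, in response order."""
    root = ET.fromstring(result_xml.strip())
    return [doc.findtext("str[@name='city']") for doc in root.iter("doc")]


order = cities_in_order(SAMPLE_RESPONSE)
print(order)
```

The list starts with Białystok, the starting point itself, and ends with Szczecin, the farthest city, which is the ascending order requested in the query.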
The end
Once more we have lived up to our website users’ expectations. This time we have added features that allow our users to filter and sort the search results using location and distance data. Full success!