Recently, my attention caught the functionality described in the SOLR-2272 Jira ticket – functionality of SQL JOIN implemented in Solr. In today’s entry I take a look at this functionality.
Preparation
The first thing we need to do is download the code from the main branch (trunk) of Lucene/Solr SVN repository and add the the patch attached to the SOLR-2272. After building Solr we can proceed to testing.
Index structure
For testing purposes, I decided to define the following index structure:
<fields> <field name="id" type="string" indexed="true" stored="true" required="true" /> <field name="name" type="text" indexed="true" stored="true"/> <field name="color" type="string" indexed="true" stored="true"/> <field name="parent" type="string" indexed="true" stored="true"/> </fields>
The structure is rather simple. The id field is responsible for storing a unique identifier of the document. The name field holds the name of the document, the color is the color of the product, and the parent field is the identifier for the parent document.
Test data
File with the sample data is as follows:
<add> <doc> <field name="id">1</field> <field name="name">First document</field> <field name="color">Red</field> </doc> <doc> <field name="id">2</field> <field name="name">Second document</field> <field name="color">Yellow</field> </doc> <doc> <field name="id">3</field> <field name="name">Third document</field> <field name="color">Red</field> <field name="parent">1</field> </doc> <doc> <field name="id">4</field> <field name="name">Fourth document</field> <field name="color">Yellow</field> <field name="parent">1</field> </doc> </add>
Query example
For the short tests I conducted I’ve chosen a fairly simple case. I wanted to get documents that are the parents of documents with the value Yellow in the color field. Nothing fancy, at least in the meaning of SQL. To execute this query in the standard Solr we would have to make at least two queries – first to get all documents with the appropriate value in the color field, and then their parent documents. And how it looks when talking about patched Solr 4.0 ? Take a look at the following query:
http://localhost:8983/solr/select?q={!join from=parent to=id}color:Yellow
Response for the above query looks like this:
<?xml version="1.0" encoding="UTF-8"?> <response> <lst name="responseHeader"> <int name="status">0</int> <int name="QTime">1</int> <lst name="params"> <str name="q">{!join from=parent to=id}color:Yellow</str> </lst> </lst> <result name="response" numFound="1" start="0"> <doc> <str name="color">Red</str> <str name="id">1</str> <str name="name">First document</str> </doc> </result> </response>
As you can see we got the results we wanted. But how does that query work ? As you can see in the example, apart from the standard query for a document with the given value in the field (color:Yellow) I’ve added a strange looking fragment {!join from=parent to=id}. This fragment tells Solr to show not the documents with the given field value, but those, which have the value in the id field the same as the parent field of the document that match the given query.
Verification
Let’s see if the result we got are the right ones. First let’s fetch the documents that have Yellow value in the color field. To do that, we send the following query to Solr:
http://localhost:8983/solr/select?q=color:Yellow
The response was as follows:
<?xml version="1.0" encoding="UTF-8"?> <response> <lst name="responseHeader"> <int name="status">0</int> <int name="QTime">2</int> <lst name="params"> <str name="q">color:Yellow</str> </lst> </lst> <result name="response" numFound="2" start="0"> <doc> <str name="color">Yellow</str> <str name="id">2</str> <str name="name">Second document</str> </doc> <doc> <str name="color">Yellow</str> <str name="id">4</str> <str name="name">Fourth document</str> <str name="parent">1</str> </doc> </result> </response>
So we got one document which have a value in the parent field – this value is 1. Document with id field equals to 1 is the following one:
<?xml version="1.0" encoding="UTF-8"?> <response> <lst name="responseHeader"> <int name="status">0</int> <int name="QTime">4</int> <lst name="params"> <str name="indent">true</str> <str name="q">id:1</str> </lst> </lst> <result name="response" numFound="1" start="0"> <doc> <str name="color">Red</str> <str name="id">1</str> <str name="name">First document</str> </doc> </result> </response>
As you can see, the JOIN query worked correctly 😉
A few words at the end
Keep in mind that at the moment the functionality is under development and may change. The important this is that with Solr version 4.0 we will be able to enjoy the functionality similar to SQL JOIN.