Waiting for 4.0: SOLR-2272 – Solr and JOIN functionality

Recently, my attention caught the functionality described in the SOLR-2272 Jira ticket – functionality of SQL JOIN implemented in Solr. In today’s entry I take a look at this functionality.

Preparation

The first thing we need to do is download the code from the main branch (trunk) of Lucene/Solr SVN repository and add the the patch attached to the SOLR-2272. After building Solr we can proceed to testing.

Index structure

For testing purposes, I decided to define the following index structure:

<fields>
  <field name="id" type="string" indexed="true" stored="true" required="true" />
  <field name="name" type="text" indexed="true" stored="true"/>
  <field name="color" type="string" indexed="true" stored="true"/>
  <field name="parent" type="string" indexed="true" stored="true"/>
</fields>

The structure is rather simple. The id field is responsible for storing a unique identifier of the document. The name field holds the name of the document, the color is the color of the product, and the parent field is the identifier for the parent document.

Test data

File with the sample data is as follows:

<add>
 <doc>
  <field name="id">1</field>
  <field name="name">First document</field>
  <field name="color">Red</field>
 </doc>
 <doc>
  <field name="id">2</field>
  <field name="name">Second document</field>
  <field name="color">Yellow</field>
 </doc>
 <doc>
  <field name="id">3</field>
  <field name="name">Third document</field>
  <field name="color">Red</field>
  <field name="parent">1</field>
 </doc>
 <doc>
  <field name="id">4</field>
  <field name="name">Fourth document</field>
  <field name="color">Yellow</field>
  <field name="parent">1</field>
 </doc>
</add>

Query example

For the short tests I conducted I’ve chosen a fairly simple case. I wanted to get documents that are the parents of documents with the value Yellow in the color field. Nothing fancy, at least in the meaning of SQL. To execute this query in the standard Solr we would have to make at least two queries – first to get all documents with the appropriate value in the color field, and then their parent documents. And how it looks when talking about patched Solr 4.0 ? Take a look at the following query:

http://localhost:8983/solr/select?q={!join from=parent to=id}color:Yellow

Response for the above query looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">1</int>
  <lst name="params">
    <str name="q">{!join from=parent to=id}color:Yellow</str>
  </lst>
</lst>
<result name="response" numFound="1" start="0">
  <doc>
    <str name="color">Red</str>
    <str name="id">1</str>
    <str name="name">First document</str>
  </doc>
</result>
</response>

As you can see we got the results we wanted. But how does that query work ? As you can see in the example, apart from the standard query for a document with the given value in the field (color:Yellow) I’ve added a strange looking fragment {!join from=parent to=id}. This fragment tells Solr to show not the documents with the given field value, but those, which have the value in the id field the same as the parent field of the document that match the given query.

Verification

Let’s see if the result we got are the right ones. First let’s fetch the documents that have Yellow value in the color field. To do that, we send the following query to Solr:

http://localhost:8983/solr/select?q=color:Yellow

The response was as follows:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">2</int>
  <lst name="params">
    <str name="q">color:Yellow</str>
  </lst>
</lst>
<result name="response" numFound="2" start="0">
  <doc>
    <str name="color">Yellow</str>
    <str name="id">2</str>
    <str name="name">Second document</str>
  </doc>
  <doc>
    <str name="color">Yellow</str>
    <str name="id">4</str>
    <str name="name">Fourth document</str>
    <str name="parent">1</str>
  </doc>
</result>
</response>

So we got one document which have a value in the parent field – this value is 1. Document with id field equals to 1 is the following one:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">4</int>
  <lst name="params">
    <str name="indent">true</str>
    <str name="q">id:1</str>
  </lst>
</lst>
<result name="response" numFound="1" start="0">
  <doc>
    <str name="color">Red</str>
    <str name="id">1</str>
    <str name="name">First document</str>
  </doc>
</result>
</response>

As you can see, the JOIN query worked correctly 😉

A few words at the end

Keep in mind that at the moment the functionality is under development and may change. The important this is that with Solr version 4.0 we will be able to enjoy the functionality similar to SQL JOIN.

This post is also available in: Polish

This entry was posted on Monday, February 21st, 2011 at 09:21 and is filed under About Solr. You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or trackback from your own site.

14 Responses to “Waiting for 4.0: SOLR-2272 – Solr and JOIN functionality”

  1. krishna Says:

    very very bad

  2. krishna Says:

    sorry very very very bad

  3. gr0 Says:

    Thank you krishna for a very constructive comment, it provides so much information that this post will only be better with such comments.

    Cheers 🙂

  4. Chris Says:

    Question:

    – Do the joining fields have to have different field names? Or can they have the same name?

    What if I am pulling 3 different tables from a RDBMS into one Solr core:

    Table#1: Customers (parent table)
    Table#2: Addresses (child table with foreign key)
    Table#3: Phones (child table with foreign key)

    With there being a ONE to MANY relationship between both Customers and Addresses, and also, Customers and Phones.

    When I pull them into Solr I cannot normalizr them because a given customer can have many addresses and many phones.

    Logically, I need to query in Solr for Customers who:
    – have an address in a given state
    – a phone in a given area code
    – who are female

    Syntactically, I thought it would look like:

    http://localhost:8983/solr/customers/select/?q={!join from=customer_id to=customer_id}address_State:Maine AND phone_area_code:212 AND customer_gender:female

  5. gr0 Says:

    If I were You I would flatten that data. I think that logically customer can have one gender, many addresses and many phones. You wrote, that You need to check the area of phone number and state. I think that the use of multivalued fields should be OK (don’t know Your data specifics though, so I can be mistaken). You would then be able to run a single, simple query without join.

  6. abhijit Says:

    How can we apply join on multiple indexes..? These two indexes are having different document structure.

  7. Tariq Says:

    Question:

    Entity #1: Employee [id,firstname, lastname, List]
    Entity #2: Address [addressid, city,state, country]

    How to specify @Field annotation in entity for List addresslist variable in Employee entity?

    How to define for addresslist in schema.xml.

    Can it be multivalued or a custom java field type ?

  8. gr0 Says:

    You may want to have your data flattened or indexed as two separate object types.

  9. Tariq Says:

    Thanks for your reply. You mean to say When i need a Employee object graph by name I need to do a individual search for all the child elements separately and then i need to create the Employee with Child elements (Lists…).

  10. snehalata Says:

    Hi,
    I want to join two core and display result.But core as collection1 containg english data and HindiCorecollection containg hindi data.

  11. Lokesh Says:

    Our requirement is to join on multiple documents (more than two) in a single Solr query. Can anyone suggest the way of doing this.

  12. gr0 Says:

    If I understand you correctly, than you can’t do that in Solr, at least currently. If you want, please write more about your requirement and what your data looks like. Maybe we will be able to think of something.

  13. yks Says:

    We have a customer table as mentioned below and want to fetch all customers that have firstname as “fstn” and lastname as “lstn” using Join. Can you please help me understand how we can do that?

    Please note there can be multiple customers with firstname as “fstn” and lastname as “lstn” either as combination or just firstname or lat name.

    CustomerDetails:
    COLUMNS: fName,fValue,CustomerID
    Row1:
    fName:firstname
    fValue:fstn
    CustomerID:1
    Row2:
    fName:lastname
    fValue:lstn
    CustomerID:1
    Row3:
    fName:firstname
    fValue:xyx
    CustomerID:2

  14. gr0 Says:

    I such case I would rather go for flatting the customer data. Is that possible?