Waiting for 4.0: SOLR-2272 – Solr and JOIN functionality

Recently, my attention caught the functionality described in the SOLR-2272 Jira ticket – functionality of SQL JOIN implemented in Solr. In today’s entry I take a look at this functionality.

Preparation

The first thing we need to do is download the code from the main branch (trunk) of Lucene/Solr SVN repository and add the the patch attached to the SOLR-2272. After building Solr we can proceed to testing.

Index structure

For testing purposes, I decided to define the following index structure:

The structure is rather simple. The id field is responsible for storing a unique identifier of the document. The name field holds the name of the document, the color is the color of the product, and the parent field is the identifier for the parent document.

Test data

File with the sample data is as follows:

Query example

For the short tests I conducted I’ve chosen a fairly simple case. I wanted to get documents that are the parents of documents with the value Yellow in the color field. Nothing fancy, at least in the meaning of SQL. To execute this query in the standard Solr we would have to make at least two queries – first to get all documents with the appropriate value in the color field, and then their parent documents. And how it looks when talking about patched Solr 4.0 ? Take a look at the following query:

Response for the above query looks like this:

As you can see we got the results we wanted. But how does that query work ? As you can see in the example, apart from the standard query for a document with the given value in the field (color:Yellow) I’ve added a strange looking fragment {!join from=parent to=id}. This fragment tells Solr to show not the documents with the given field value, but those, which have the value in the id field the same as the parent field of the document that match the given query.

Verification

Let’s see if the result we got are the right ones. First let’s fetch the documents that have Yellow value in the color field. To do that, we send the following query to Solr:

The response was as follows:

So we got one document which have a value in the parent field – this value is 1. Document with id field equals to 1 is the following one:

As you can see, the JOIN query worked correctly 😉

A few words at the end

Keep in mind that at the moment the functionality is under development and may change. The important this is that with Solr version 4.0 we will be able to enjoy the functionality similar to SQL JOIN.

14 thoughts on “Waiting for 4.0: SOLR-2272 – Solr and JOIN functionality

  • 17 June 2011 at 08:17
    Permalink

    sorry very very very bad

    Reply
  • 18 June 2011 at 08:12
    Permalink

    Thank you krishna for a very constructive comment, it provides so much information that this post will only be better with such comments.

    Cheers 🙂

    Reply
  • 10 March 2012 at 22:46
    Permalink

    Question:

    – Do the joining fields have to have different field names? Or can they have the same name?

    What if I am pulling 3 different tables from a RDBMS into one Solr core:

    Table#1: Customers (parent table)
    Table#2: Addresses (child table with foreign key)
    Table#3: Phones (child table with foreign key)

    With there being a ONE to MANY relationship between both Customers and Addresses, and also, Customers and Phones.

    When I pull them into Solr I cannot normalizr them because a given customer can have many addresses and many phones.

    Logically, I need to query in Solr for Customers who:
    – have an address in a given state
    – a phone in a given area code
    – who are female

    Syntactically, I thought it would look like:

    http://localhost:8983/solr/customers/select/?q={!join from=customer_id to=customer_id}address_State:Maine AND phone_area_code:212 AND customer_gender:female

    Reply
    • 11 March 2012 at 19:47
      Permalink

      If I were You I would flatten that data. I think that logically customer can have one gender, many addresses and many phones. You wrote, that You need to check the area of phone number and state. I think that the use of multivalued fields should be OK (don’t know Your data specifics though, so I can be mistaken). You would then be able to run a single, simple query without join.

      Reply
  • 23 October 2012 at 05:00
    Permalink

    How can we apply join on multiple indexes..? These two indexes are having different document structure.

    Reply
  • 23 October 2012 at 09:44
    Permalink

    Question:

    Entity #1: Employee [id,firstname, lastname, List]
    Entity #2: Address [addressid, city,state, country]

    How to specify @Field annotation in entity for List addresslist variable in Employee entity?

    How to define for addresslist in schema.xml.

    Can it be multivalued or a custom java field type ?

    Reply
    • 28 October 2012 at 12:56
      Permalink

      You may want to have your data flattened or indexed as two separate object types.

      Reply
  • 29 October 2012 at 08:40
    Permalink

    Thanks for your reply. You mean to say When i need a Employee object graph by name I need to do a individual search for all the child elements separately and then i need to create the Employee with Child elements (Lists…).

    Reply
  • 27 September 2013 at 07:45
    Permalink

    Hi,
    I want to join two core and display result.But core as collection1 containg english data and HindiCorecollection containg hindi data.

    Reply
  • 18 December 2013 at 13:17
    Permalink

    Our requirement is to join on multiple documents (more than two) in a single Solr query. Can anyone suggest the way of doing this.

    Reply
    • 18 December 2013 at 13:38
      Permalink

      If I understand you correctly, than you can’t do that in Solr, at least currently. If you want, please write more about your requirement and what your data looks like. Maybe we will be able to think of something.

      Reply
  • 18 February 2014 at 11:06
    Permalink

    We have a customer table as mentioned below and want to fetch all customers that have firstname as “fstn” and lastname as “lstn” using Join. Can you please help me understand how we can do that?

    Please note there can be multiple customers with firstname as “fstn” and lastname as “lstn” either as combination or just firstname or lat name.

    CustomerDetails:
    COLUMNS: fName,fValue,CustomerID
    Row1:
    fName:firstname
    fValue:fstn
    CustomerID:1
    Row2:
    fName:lastname
    fValue:lstn
    CustomerID:1
    Row3:
    fName:firstname
    fValue:xyx
    CustomerID:2

    Reply
    • 18 February 2014 at 22:04
      Permalink

      I such case I would rather go for flatting the customer data. Is that possible?

      Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.