Apache Solr – Splitting Its Ways From Lucene Code Base

So it happened. After the discussion started by Dawid Weiss and the vote on Apache Solr becoming a top-level project, it is clear – Lucene and Solr are going to part ways, at least when it comes to sharing the same repository and release cycles. I think it is time to consider what that means for us as users and as solr.pl 🙂

What Did the Voting Look Like?

Some of you may be interested in what the voting looked like. In total, 48 people cast votes – committers, PMC members, and community members, with the PMC members’ votes marked as binding. 40 people voted for the split and 8 against it, with 33 binding votes in favor and 4 binding votes against. I would say it wasn’t a close call; a clear majority of the voters were in favor of the split.

What Does It Really Mean?

From now on, things will be moving forward slowly. The vote that passed means that Lucene and Solr will part ways. Lucene will remain the library that we all use and like, and it will move forward at its own pace. Solr will still use Lucene as its full-text search library, but it may choose either to incorporate changes as soon as they are introduced in Lucene or to wait.

There are two main JIRA issues for those of you who are interested. The first one, SOLR-14497, is about making Apache Solr an Apache Software Foundation Top-Level Project. Of course, that doesn’t only mean separating the code base and the website. The PMC needs to be chosen, the current committers have to express their commitment, and decisions have to be made about the mailing lists, continuous integration, and build servers, and that is only the beginning. As you can see, there is a long way ahead of Apache Solr and Lucene before the split is finalized. Once that is done, the cleanup of the Lucene code base will happen, and LUCENE-9375 is the JIRA issue to look at if you are interested in what needs to be done and what will be done.

The Future for Both Projects

What does the split mean for both projects? For sure it will result in lower complexity. Each project will have its own source tree, smaller than the current combined one. Each will have its own tests, and development will proceed independently. Solr will not block changes in Lucene, and a developer who wants to introduce a Lucene-level change will not have to make the corresponding change at the Solr level. Some people know Lucene better and others know Solr better. A smaller code base also means a smaller set of unit tests, which leads to quicker execution. That should support faster development, as developers will be able to test things more quickly and efficiently.

But I wanted to say one thing, and maybe I should have said it upfront – Lucene and Solr have been separate for a long, long time. They only shared the same repository, and even there they lived in two separate directories. Of course, a change in the Lucene code forced changes in Solr in some cases, but that has its pros and cons. As for the rest – the mailing lists were already separate, the release binaries were already separate though tightly coupled when it comes to versions, and the JIRA issues are separate. What’s more, a lot of patches touching both projects were prepared as a separate patch for Lucene and a separate patch for Solr.

The Future for Lucene

In my opinion, Lucene will benefit from the change. It will not have to take Solr into account when making changes. There are features, like the old and deprecated numeric fields, that were already removed from Lucene while still being supported in Solr. That example was brought up in the mailing list discussion as one of the things that developers had to struggle with while working on two tightly coupled projects.

Smaller, less tightly coupled code means more flexibility, and a smaller code base with fewer tests means quicker and easier development. Not needing to adjust Solr while working only on Lucene is also a benefit, at least from the library perspective.

The Future for Solr

I know, things are not black and white and there are shades of gray. We may be worried that Solr will fall behind Lucene and not keep up the pace. Yes, that could happen, but it could already be happening today. Instead of updating Solr, people could just move things around. We can see that people care, and that won’t change just because the projects were decoupled from each other. Of course, Solr will be given less attention by the people solely focused on Lucene development, but should they be forced to update Solr when they are not fully familiar with it and are not keen on working with it? I think the answer is no.

Can Solr keep up the same pace of development as it has now? I think the answer is yes. Solr can be set up to use a SNAPSHOT version of Lucene and be ready for release soon after a new Lucene version comes out. All that depends on the developers and how they approach the changes. What’s more, I think that working against a SNAPSHOT version of Lucene may be better, because Solr can switch to a given development version once the API for a new or updated Lucene feature is in a stable state. That will make it easier for a Solr developer to develop new features. The changes can also be incorporated at Solr’s own pace, which means that they don’t have to be rushed and can be polished and tested even better than they are now.

The Future for Solr.pl

Well, for us, it doesn’t change much, at least when it comes to the content that we produce. We will still be writing tutorials, quick looks, and release information about Solr, that’s for sure. We also plan on tracking and publishing information about Lucene releases. We think it will be helpful to keep track of the features that will sooner or later be incorporated into the Solr code base.

The Summary

I try to see things in a bright light. I think we should think about the Lucene and Solr split as something that can help both projects: a smaller code base, faster tests, easier development, dedicated people. Let’s support both projects, as we, Lucene and Solr users, have an interest in keeping them both alive, and so do the developers who spend their time working on them. Hopefully, some time after the publication of this post, we will be able to get back to it and say: “Yes, things went the right way!” I’m keeping my fingers crossed for that and I think you should as well, even if there are questions that we don’t have answers for at the moment.
