Solr 8 – ByteBuffersDirectory – quick look

Rafał Kuć — Mon, 15 Apr 2019 14:07:46 +0000

One of the new features introduced in the recently released Solr 8.0 is a new implementation of the Directory interface, one that will replace the non-scalable RAMDirectory. The new implementation, called ByteBuffersDirectory, is intended for small, short-lived data that is held only in memory. Let’s have a quick look at the potential use cases, advantages and drawbacks of this new implementation.

Configuration

Simplest things first: let’s start with configuration. This time the situation is very simple. The only thing we need to take care of is choosing the proper DirectoryFactory implementation in the solrconfig.xml file, that is, pointing the class attribute of the directoryFactory element at solr.ByteBuffersDirectoryFactory.

One thing we need to remember though is the lockType setting. At the time of writing, the only possible value for lockType when ByteBuffersDirectoryFactory is used is single. No other lock type is supported.
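To make the two settings concrete, here is a minimal sketch that rewrites a solrconfig.xml fragment so it uses ByteBuffersDirectoryFactory. The sample configuration and the helper name use_byte_buffers_directory are made up for illustration, and the lock type is set to single, the value that ByteBuffersDirectoryFactory accepts in Solr 8. A real solrconfig.xml contains many more elements, so treat this as an illustration of which elements change rather than as a migration tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed stand-in for a real solrconfig.xml.
SAMPLE_CONFIG = """<config>
  <directoryFactory name="DirectoryFactory" class="solr.NRTCachingDirectoryFactory"/>
  <indexConfig>
    <lockType>native</lockType>
  </indexConfig>
</config>"""


def use_byte_buffers_directory(config_xml: str) -> str:
    """Return the config with the in-memory directory factory and lockType 'single'."""
    root = ET.fromstring(config_xml)

    # Point the directoryFactory element at the new implementation.
    factory = root.find("directoryFactory")
    if factory is None:
        factory = ET.SubElement(root, "directoryFactory", {"name": "DirectoryFactory"})
    factory.set("class", "solr.ByteBuffersDirectoryFactory")

    # Make sure indexConfig/lockType exists and holds the supported value.
    index_config = root.find("indexConfig")
    if index_config is None:
        index_config = ET.SubElement(root, "indexConfig")
    lock_type = index_config.find("lockType")
    if lock_type is None:
        lock_type = ET.SubElement(index_config, "lockType")
    lock_type.text = "single"

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(use_byte_buffers_directory(SAMPLE_CONFIG))
```

Only the class attribute and the lockType value change; everything else in the configuration is left as it was.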

Possible Usage

The new implementation of the Directory interface will replace RAMDirectory, an implementation that has been present in Lucene and Solr for a long time. However, keep in mind that using RAMDirectory is not recommended because of its scalability and multi-threading issues.

Theoretically, we should use the in-memory Directory implementation for small, short-lived Lucene indices. This allows us to avoid disk usage for writes and reads, for potentially higher indexing throughput and lower query latency.

For example, you can use the newly introduced Directory implementation for frequently updated data, such as product counts, or for data that is needed only once, e.g. reports built from multiple data sources.

You need to remember though that, similarly to RAMDirectory, ByteBuffersDirectory doesn’t save data to disk. This means that you need to re-index your data after each Solr restart or failure.

Advantages

The main advantage, as we already mentioned, is that data indexed into a core or collection that uses ByteBuffersDirectory will not use disk resources. Reads and writes go straight to memory, which can result in higher indexing throughput and lower query latency. However, keep in mind that similar behavior can be achieved with a Directory implementation that relies on mmap calls to the operating system’s I/O subsystem, for example MMapDirectory. That implementation shares the operating system’s I/O cache and re-uses data that the operating system has already put in memory. The second advantage, especially when compared to RAMDirectory, is that ByteBuffersDirectory fully supports multi-threading, so we won’t run into issues with multi-threaded indexing and data access.

Drawbacks

Of course not everything is bright and shining. There are also drawbacks to using ByteBuffersDirectory, and we should mention three of them here. First of all, the data will be volatile: every restart will require re-indexing. This is one of the principles of that Directory implementation, but it is worth keeping in mind. The second thing is heap memory. If the data is stored on the heap, it means more heap memory usage and more work for the garbage collector, which can lead to lower performance. Finally, we are limited by the amount of physical memory and the memory assigned to the JVM heap. This means that the cores and collections that use ByteBuffersDirectory shouldn’t be too large.

Migration to Solr 8

Rafał Kuć — Mon, 18 Mar 2019 15:07:07 +0000

With the recent release of Solr 8.0 you may be wondering if it is worth migrating to the new version and how to do it. Is it possible to upgrade your cluster without major downtime? Is it possible to upgrade to the new version using rolling restarts? We will try to answer those questions in this blog post.

Which version to migrate from

The first thing we should mention is the set of changes to LIR (leader-initiated recovery) and the recovery algorithm introduced in Solr 7.3. Because of those changes it is impossible to use rolling restarts when migrating from a version older than 7.3. In such cases you need a full cluster restart.
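The version rule above can be expressed as a small check. This is only a sketch: the function names are hypothetical, and it assumes plain numeric version strings such as 7.3.1. Versions are compared as tuples of integers because a plain string comparison would sort 7.10 before 7.3.

```python
def parse_version(version: str) -> tuple:
    """Turn '7.3.1' into (7, 3, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def can_use_rolling_restart(current_version: str) -> bool:
    """True when the running Solr version is 7.3 or newer."""
    return parse_version(current_version) >= (7, 3)


if __name__ == "__main__":
    for version in ("6.6.5", "7.2.1", "7.3.0", "7.10.0"):
        plan = "rolling restarts" if can_use_rolling_restart(version) else "full cluster restart"
        print(f"{version}: {plan}")
```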

We should also remember that the Apache Lucene library in version 8.0, and thus Solr 8.0, is not able to read data created with Solr 5.x. Because of this you will either have to go through each major version up to 8.x and rewrite your data on each of them, or just create a new cluster and re-index your data.

Migrating from Solr 7.3 and newer

If you are migrating from Solr 7.3 or newer, you can upgrade your cluster to 8.0.0 without a full cluster restart, using only rolling restarts. This is what the procedure should look like:

1. Prepare the cluster for the upgrade: create a data backup, a ZooKeeper data backup and so on.
2. Turn off one Solr instance.
3. Update the turned-off instance to 8.0.
4. Add the -Dsolr.http1=true parameter to the SOLR_OPTS property in the solr.in.sh configuration file.
5. Launch the updated Solr instance.
6. Wait for the cluster to stabilize (data may need to be replicated, so it may take a while).
7. Start the upgrade of the next Solr instance, so begin again with point 2).
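The step above that adds -Dsolr.http1=true to SOLR_OPTS can be scripted. Below is a minimal sketch that works on the text of solr.in.sh; the helper names are hypothetical, and it assumes the flag is managed as a single dedicated line in the usual SOLR_OPTS="$SOLR_OPTS ..." form. The removal helper is the counterpart needed later, once every node runs Solr 8.

```python
HTTP1_FLAG = "-Dsolr.http1=true"
HTTP1_LINE = f'SOLR_OPTS="$SOLR_OPTS {HTTP1_FLAG}"'


def add_http1_flag(solr_in_sh: str) -> str:
    """Append the flag line unless an active (uncommented) line already sets it."""
    for line in solr_in_sh.splitlines():
        if HTTP1_FLAG in line and not line.lstrip().startswith("#"):
            return solr_in_sh
    # Keep the appended line on its own line even if the file lacks a final newline.
    separator = "" if solr_in_sh.endswith("\n") or not solr_in_sh else "\n"
    return f"{solr_in_sh}{separator}{HTTP1_LINE}\n"


def remove_http1_flag(solr_in_sh: str) -> str:
    """Drop the line added by add_http1_flag; all other lines are kept."""
    kept = [line for line in solr_in_sh.splitlines() if line.strip() != HTTP1_LINE]
    return "\n".join(kept) + ("\n" if kept else "")


if __name__ == "__main__":
    original = 'SOLR_HEAP="2g"\n'
    with_flag = add_http1_flag(original)
    print(with_flag, end="")
    print(remove_http1_flag(with_flag), end="")
```

Both helpers are idempotent, so running them twice on the same file content does no harm.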

Once all the instances are upgraded, we need to do one more round of node-by-node restarts. This time we need to do the following for each instance belonging to our cluster:

1. Turn off a single Solr instance.
2. Remove the -Dsolr.http1 parameter from the SOLR_OPTS property in the solr.in.sh configuration file.
3. Start the modified Solr instance.
4. Wait for the cluster to stabilize.
5. Repeat from point 1) for the next instance.
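The two rounds described above share the same shape: stop a node, change it, start it, wait, move on. The sketch below captures only that ordering. All the callables (stop, upgrade_with_http1, drop_http1, start, wait_until_stable) are hypothetical placeholders that you would have to implement for your own environment, and the backup step is assumed to have been done beforehand.

```python
from typing import Callable, Iterable


def rolling_round(
    nodes: Iterable[str],
    stop: Callable[[str], None],
    change: Callable[[str], None],
    start: Callable[[str], None],
    wait_until_stable: Callable[[], None],
) -> None:
    """Run one round: stop, change, start, then wait, one node at a time."""
    for node in nodes:
        stop(node)
        change(node)
        start(node)
        # Do not touch the next node before the cluster has settled.
        wait_until_stable()


def upgrade_cluster(nodes, stop, upgrade_with_http1, drop_http1, start, wait_until_stable):
    """Round one upgrades every node with HTTP/1 forced; round two drops the flag."""
    nodes = list(nodes)
    rolling_round(nodes, stop, upgrade_with_http1, start, wait_until_stable)
    rolling_round(nodes, stop, drop_http1, start, wait_until_stable)


if __name__ == "__main__":
    # Dry run that only prints the order of operations.
    upgrade_cluster(
        ["solr-1", "solr-2"],
        stop=lambda n: print("stop", n),
        upgrade_with_http1=lambda n: print("upgrade to 8.0 and add -Dsolr.http1=true on", n),
        drop_http1=lambda n: print("remove -Dsolr.http1 on", n),
        start=lambda n: print("start", n),
        wait_until_stable=lambda: print("wait for the cluster to stabilize"),
    )
```

Passing the operations in as callables keeps the ordering logic testable without a running cluster; the second round starts only after the first has finished on every node.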

Additional restarts round – but why?

You may be asking yourself why we do the additional, second round of restarts once we have all the nodes upgraded to version 8. The reason lies in the foundation of the changes that were introduced in the newest version of Solr. With Solr 8.0.0 the internal communication between the nodes is done using the HTTP/2 protocol. The use of that protocol, and the changes made to introduce it, resulted in more robust and more efficient inter-node communication. However, the downside is that the changes are not compatible with earlier Solr versions. That’s why, when we have both 7.x and 8.0 nodes in the cluster, we need to be sure that they can talk to each other. Once all the nodes are running the newest version of Solr we can finally stop forcing backwards compatibility and fully use the latest and greatest features introduced in Solr 8.0. This is why we first add the -Dsolr.http1=true parameter and then remove it.

Should I be afraid of the upgrade?

Each migration comes with a degree of risk and possible complications. Because of that it is crucial to plan and prepare. We shouldn’t do the upgrade during peak hours or at a time when we expect a higher load on the cluster. We should consider whether it is possible to minimize or completely turn off indexing during the upgrade, as it will shorten the time needed for each instance restart. It is also crucial to be able to roll back if something goes wrong, though it is not always possible, especially if you have a large amount of data. You should think about it all and prepare. Good luck with your upgrades.
