Solr 8 – ByteBuffersDirectory – quick look

One of the new features introduced in the recently released Solr 8.0 is a new implementation of the Directory interface – one that will replace the poorly scalable RAMDirectory. The new implementation, called ByteBuffersDirectory, is dedicated to small, short-lived data that is held only in memory. Let’s have a quick look at the potential use cases, advantages and drawbacks of this new implementation.

Configuration

Simplest things first – let’s start with the configuration. This time the situation is very simple. The only thing we need to take care of is pointing Solr to the proper DirectoryFactory implementation in the solrconfig.xml file, as shown in the example below.
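A minimal sketch of what this could look like in solrconfig.xml, assuming the solr.ByteBuffersDirectoryFactory class that ships with Solr 8 (the rest of the file is omitted):

  <directoryFactory name="DirectoryFactory"
                    class="solr.ByteBuffersDirectoryFactory"/>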

One thing we need to remember, though, is the lockType setting. At the time of writing, the only possible value for lockType when ByteBuffersDirectoryFactory is used is single. No other lock type is supported.
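For completeness, a sketch of the matching lockType entry, assuming the usual solrconfig.xml layout where lockType lives inside the indexConfig section:

  <indexConfig>
    <lockType>single</lockType>
  </indexConfig>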

Possible Usage

The new implementation of the Directory interface will replace RAMDirectory – an implementation that has been present in Lucene and Solr for a long time now. However, keep in mind that using RAMDirectory has not been recommended for a while because of its issues.

Theoretically, we should use the in-memory Directory implementation for small, short-lived Lucene indices. This allows us to avoid disk usage for writes and reads, giving us potentially higher indexing throughput and lower query latency.

For example, you can use the newly introduced Directory implementation for keeping frequently updated data, such as the number of products, or for data that is needed only once – for example, reports coming from multiple data sources.

You need to remember, though, that similarly to RAMDirectory, ByteBuffersDirectory doesn’t save data to disk. This means that you need to re-index your data after each Solr restart or failure.

Advantages

The main advantage, which we already mentioned, is that data indexed to a core/collection using ByteBuffersDirectory will not use disk resources. That means that data writes and reads will be blazingly fast, which can result in higher indexing throughput and lower query latency. Keep in mind, however, that similar behavior can be achieved by leaning on the I/O subsystem of the operating system via mmap calls, for example when using MMapDirectory. That Directory implementation shares the I/O cache of the operating system and will re-use data that the operating system has already put in memory.

The second advantage, especially when compared to RAMDirectory, is that ByteBuffersDirectory fully supports multi-threading, so we won’t have issues with multi-threaded data indexing and access.
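For comparison, a minimal sketch of how the mmap-based alternative mentioned above could be configured – assuming the standard solr.MMapDirectoryFactory that ships with Solr:

  <directoryFactory name="DirectoryFactory"
                    class="solr.MMapDirectoryFactory"/>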

Drawbacks

Of course, not everything is bright and shining – there are also drawbacks to using ByteBuffersDirectory, and we should mention three things here. First of all, when using this directory, the data will be volatile – every restart will require re-indexing the data. This is one of the principles of that Directory implementation, but it is worth remembering. The second thing is heap memory. If the data is stored on the heap, it means more heap memory usage and more work for the garbage collector, which can lead to lower performance. Finally, we are partially limited by the amount of physical memory and the memory assigned to the JVM heap. This means that the cores/collections using ByteBuffersDirectory shouldn’t be too large.
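If the index data ends up on the JVM heap, it competes with everything else running in Solr for that memory, so it may be worth sizing the heap explicitly when experimenting with ByteBuffersDirectory. A hedged example using the standard bin/solr script – the 2g value here is just an illustration, not a recommendation:

  bin/solr start -m 2g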
