Distributed IDF

When Lucene and Solr search through data, each matching document is assigned a score calculated from the statistics of the query terms. In SolrCloud, when the data in a collection is distributed among multiple shards, the inverse document frequency (IDF) calculation is no longer exact. The problem is that each shard stores its term statistics locally and doesn’t share them with other shards during query execution. Can we get a more precise IDF calculation? Let’s see what we can do about it.
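To make the problem concrete, here is a minimal sketch comparing per-shard IDF with IDF computed from combined statistics. It assumes the BM25-style formula idf = ln(1 + (N - n + 0.5) / (n + 0.5)), where N is the number of documents and n is the number of documents containing the term. The shard names and counts are hypothetical and chosen only for illustration.

```python
import math

def bm25_idf(doc_freq, doc_count):
    """BM25-style IDF: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

# Hypothetical shards: (documents in shard, documents containing the term).
shards = {
    "shard1": (1000, 5),    # term is rare on this shard
    "shard2": (1000, 200),  # term is common on this shard
}

# IDF as each shard would compute it from its local statistics only.
local_idf = {name: bm25_idf(df, n) for name, (n, df) in shards.items()}

# IDF computed from statistics summed across all shards.
total_docs = sum(n for n, _ in shards.values())
total_df = sum(df for _, df in shards.values())
global_idf = bm25_idf(total_df, total_docs)

for name, value in local_idf.items():
    print(f"{name}: local idf = {value:.3f}")
print(f"global idf = {global_idf:.3f}")
```

With uneven term distribution like this, the same term gets a noticeably different weight on each shard, so scores of documents coming from different shards are not directly comparable.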

Solr 8 – ByteBuffersDirectory – quick look

One of the new features introduced in the recently released Solr 8.0 is a new implementation of the Directory interface, one that will replace RAMDirectory, which does not scale well. The new implementation, called ByteBuffersDirectory, is intended for small, short-lived data that is held only in memory. Let’s have a quick look at the potential use cases, advantages, and drawbacks of this new implementation.
