{"id":455,"date":"2012-04-30T23:45:48","date_gmt":"2012-04-30T21:45:48","guid":{"rendered":"http:\/\/sematext.solr.pl\/?p=455"},"modified":"2020-11-11T23:46:18","modified_gmt":"2020-11-11T22:46:18","slug":"solr-4-0-directsolrspellchecker","status":"publish","type":"post","link":"https:\/\/solr.pl\/en\/2012\/04\/30\/solr-4-0-directsolrspellchecker\/","title":{"rendered":"Solr 4.0: DirectSolrSpellChecker"},"content":{"rendered":"<p>One of the new features that will be introduced with Solr 4.0 is a new SpellChecker implementation that doesn&#8217;t require its own index. I decided to take a quick look at it and share my thoughts.<\/p>\n\n\n<!--more-->\n\n\n<h3>What We Have Today<\/h3>\n<p>As of today (<a href=\"http:\/\/solr.pl\/en\/2012\/04\/12\/apache-lucene-and-solr-3-6\/\" target=\"_blank\" rel=\"noopener noreferrer\">Solr 3.6<\/a>) we can use the following SpellChecker implementations:<\/p>\n<ul>\n<li><em>org.apache.solr.spelling.IndexBasedSpellChecker<\/em><\/li>\n<li><em>org.apache.solr.spelling.FileBasedSpellChecker<\/em><\/li>\n<\/ul>\n<p>With the upcoming Solr 4.0, we will get a new implementation:<\/p>\n<ul>\n<li><em>org.apache.solr.spelling.DirectSolrSpellChecker<\/em><\/li>\n<\/ul>\n<h3>Current Problems<\/h3>\n<p>In most of the cases I worked with, the main problem with <em>IndexBasedSpellChecker<\/em> was the need to rebuild its index. In some cases the rebuild took a long time and it wasn&#8217;t possible to rebuild that index after every <em>commit<\/em>, which for some was a big issue. Of course, that wasn&#8217;t a problem with <em>FileBasedSpellChecker<\/em>, but in my case it was used only as a support mechanism for the <em>IndexBasedSpellChecker<\/em>.<\/p>\n<h3>Configuration<\/h3>\n<p><em>DirectSolrSpellChecker<\/em> configuration is similar to the one you are used to today in Solr 3. Of course, there are some additional parameters. 
Below you can find a sample configuration:\n<\/p>\n<pre class=\"brush:xml\">&lt;searchComponent name=\"spellcheck\" class=\"solr.SpellCheckComponent\"&gt;\n  &lt;str name=\"queryAnalyzerFieldType\"&gt;textTitle&lt;\/str&gt;\n  &lt;lst name=\"spellchecker\"&gt;\n    &lt;str name=\"name\"&gt;default&lt;\/str&gt;\n    &lt;str name=\"field\"&gt;title&lt;\/str&gt;\n    &lt;str name=\"classname\"&gt;solr.DirectSolrSpellChecker&lt;\/str&gt;\n    &lt;str name=\"distanceMeasure\"&gt;internal&lt;\/str&gt;\n    &lt;float name=\"accuracy\"&gt;0.7&lt;\/float&gt;\n    &lt;int name=\"maxEdits\"&gt;2&lt;\/int&gt;\n    &lt;int name=\"minPrefix\"&gt;1&lt;\/int&gt;\n    &lt;int name=\"maxInspections\"&gt;5&lt;\/int&gt;\n    &lt;int name=\"minQueryLength\"&gt;4&lt;\/int&gt;\n    &lt;float name=\"maxQueryFrequency\"&gt;0.01&lt;\/float&gt;\n    &lt;float name=\"thresholdTokenFrequency\"&gt;.01&lt;\/float&gt;\n  &lt;\/lst&gt;\n&lt;\/searchComponent&gt;<\/pre>\n<p>And the meaning of each parameter:<\/p>\n<ul>\n<li><em>queryAnalyzerFieldType<\/em> &#8211; name of the field type whose analyzer will be used to analyze the SpellChecker query.<\/li>\n<li><em>field<\/em> &#8211; field whose contents will be used to build SpellChecker results.<\/li>\n<li><em>classname<\/em> &#8211; SpellChecker implementation class.<\/li>\n<li><em>distanceMeasure<\/em> &#8211; algorithm used to calculate the distance between terms; in our case we use the default one (Levenshtein&#8217;s).<\/li>\n<li><em>accuracy<\/em> &#8211; minimum accuracy a suggestion must reach to be counted as a proper one.<\/li>\n<li><em>maxEdits<\/em> &#8211; maximum number of edits (changes) allowed during term enumeration. 
This property can be set to <em>1<\/em> or <em>2<\/em>.<\/li>\n<li><em>minPrefix<\/em> &#8211; minimum length of the common prefix required during term enumeration.<\/li>\n<li><em>maxInspections<\/em> &#8211; maximum number of checks performed for each suggestion.<\/li>\n<li><em>minQueryLength<\/em> &#8211; minimum length a query word must have to be taken into consideration for suggestions.<\/li>\n<li><em>maxQueryFrequency<\/em> &#8211; maximum percentage of documents in which a word can appear and still be considered for correction (a value of <em>0.01<\/em> means <em>1%<\/em>).<\/li>\n<li><em>thresholdTokenFrequency<\/em> &#8211; minimum percentage of documents in which a suggestion has to appear in order to be considered a proper one (a value of <em>.01<\/em> means <em>1%<\/em>).<\/li>\n<\/ul>\n<p>The above configuration attributes show that <em>DirectSolrSpellChecker<\/em> gives us a high degree of control over its behavior.<\/p>\n<h3>Usage<\/h3>\n<p><em>DirectSolrSpellChecker<\/em> is no different from other SpellChecker implementations when it comes to using it. As with the previous implementations, you can configure Solr to add SpellChecker results to every query response, or configure a new handler and decide when to query it for results. We wrote about how to use SpellChecker in the past &#8211; in the &#8220;<a href=\"http:\/\/solr.pl\/en\/2011\/05\/23\/%E2%80%9Ccar-sale-application%E2%80%9D-%E2%80%93-spellcheckcomponent-%E2%80%93-did-you-really-mean-that-part-5\/\">Car sale application<\/a>&#8221; example.<\/p>\n<h3>What Can We Expect?<\/h3>\n<p>According to the information in JIRA issue <a href=\"https:\/\/issues.apache.org\/jira\/browse\/LUCENE-2507\">LUCENE-2507<\/a>, <em>DirectSolrSpellChecker<\/em> will not only remove the need for a separate index, but will also improve the quality of suggestions. 
From what you can see in the mentioned JIRA issue, <em>DirectSolrSpellChecker<\/em> works better than the previous implementations, although it&#8217;s slightly slower. I think that won&#8217;t be an issue as long as you don&#8217;t use SpellChecker with every query.<\/p>","protected":false},"excerpt":{"rendered":"<p>One of the new features that will be introduced with Solr 4.0 is a new SpellChecker implementation that doesn&#8217;t require its own index. I decided to take a quick look at it and share my thoughts.<\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":"","_links_to":"","_links_to_target":""},"categories":[27],"tags":[],"class_list":["post-455","post","type-post","status-publish","format-standard","hentry","category-solr-en"],"_links":{"self":[{"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/posts\/455","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/comments?post=455"}],"version-history":[{"count":1,"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/posts\/455\/revisions"}],"predecessor-version":[{"id":456,"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/posts\/455\/revisions\/456"}],"wp:attachment":[{"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/media?parent=455"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/categories?post=455"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/tags?post=455"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}