{"id":274,"date":"2011-06-13T20:48:24","date_gmt":"2011-06-13T18:48:24","guid":{"rendered":"http:\/\/sematext.solr.pl\/?p=274"},"modified":"2020-11-11T20:48:53","modified_gmt":"2020-11-11T19:48:53","slug":"solr-3-1-fastvectorhighlighting","status":"publish","type":"post","link":"https:\/\/solr.pl\/en\/2011\/06\/13\/solr-3-1-fastvectorhighlighting\/","title":{"rendered":"Solr 3.1: FastVectorHighlighting"},"content":{"rendered":"<p>One of the many new features that Lucene and Solr 3.1 bring is <em>FastVectorHighlighting<\/em>, which the change notes describe as nothing less than improved highlighting functionality. The current highlighting mechanism is not particularly fast, and it can sometimes kill your Solr instance when dealing with a large amount of data or very long text fields. I thought it would be worthwhile to test the performance of the new functionality.<\/p>\n\n\n<!--more-->\n\n\n<h3>A few words at the beginning<\/h3>\n<p>First, some information about the capabilities of the new Lucene highlighter:<\/p>\n<ul>\n<li>supports N-gram based fields<\/li>\n<li>requires Java 5 or higher<\/li>\n<li>takes boosts into account when scoring text fragments<\/li>\n<li>is very fast for large documents<\/li>\n<\/ul>\n<p>It is also worth noting that the current highlighter is marked as <em>Deprecated<\/em> according to the <a href=\"https:\/\/issues.apache.org\/jira\/browse\/SOLR-1696\" target=\"_blank\" rel=\"noopener noreferrer\">SOLR-1696<\/a> Jira issue.<\/p>\n<h3>How was the test performed?<\/h3>\n<p>For testing purposes I used an index containing approximately 1.2 million documents (I indexed the Polish Wikipedia, only the latest changes). For each of the following searches I highlighted on one of the biggest fields, once with the old highlighter (<em>hl.useFastVectorHighlighter=false<\/em>) and once with the new one (<em>hl.useFastVectorHighlighter=true<\/em>). 
Tests were performed with the caches turned off. The table contains response times, each being the average of 10 sequential queries, excluding the largest and smallest values. Solr was restarted after each query. Below are the results of this simple test:<\/p>\n[table \u201c8\u201d not found \/]<br \/>\n\n<p>Although the test is very simple, it shows a pattern: <em>FastVectorHighlighter<\/em> is faster than the current highlighter.<\/p>\n<p>As for the quality of the highlighted fragments, I couldn&#8217;t see any major differences, although this specific data set is not well suited to such observations.<\/p>\n<h3>One thing to remember<\/h3>\n<p>Please note that <em>FastVectorHighlighter<\/em> requires the field it works on to be properly defined. The following attributes must be set on the field: term vectors (<em>termVectors=&quot;true&quot;<\/em>), term positions (<em>termPositions=&quot;true&quot;<\/em>) and term offsets (<em>termOffsets=&quot;true&quot;<\/em>). Otherwise, the old mechanism will continue to be used.<\/p>\n<h3>To sum up<\/h3>\n<p>Please remember that the performed test was not a detailed performance test of the new highlighting method. It was just a simulation of an environment that may resemble some production environments. However, based on the test, we can expect the new highlighting method to be faster than the old one.<\/p>","protected":false},"excerpt":{"rendered":"<p>One of the many new features that Lucene and Solr 3.1 brings is FastVectorHighlighting &#8211; as the change notes say nothing less than the improved functionality of highlighting. 
Currently the highlighting mechanism is not too fast, sometimes it could kill<\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":"","_links_to":"","_links_to_target":""},"categories":[27],"tags":[357],"class_list":["post-274","post","type-post","status-publish","format-standard","hentry","category-solr-en","tag-vector-2"],"_links":{"self":[{"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/posts\/274","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/comments?post=274"}],"version-history":[{"count":1,"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/posts\/274\/revisions"}],"predecessor-version":[{"id":275,"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/posts\/274\/revisions\/275"}],"wp:attachment":[{"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/media?parent=274"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/categories?post=274"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/tags?post=274"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}