{"id":264,"date":"2011-05-02T20:44:43","date_gmt":"2011-05-02T18:44:43","guid":{"rendered":"http:\/\/sematext.solr.pl\/?p=264"},"modified":"2020-11-11T20:45:14","modified_gmt":"2020-11-11T19:45:14","slug":"solr-filters-keepwordfilter","status":"publish","type":"post","link":"https:\/\/solr.pl\/en\/2011\/05\/02\/solr-filters-keepwordfilter\/","title":{"rendered":"Solr filters: KeepWordFilter"},"content":{"rendered":"<p>This time I decided to look at one of the unusual filters available in the standard distribution of Solr. The first one in my hands is a filter called <em>KeepWordFilter<\/em>.<\/p>\n\n\n<!--more-->\n\n\n<h3>Let&#8217;s start<\/h3>\n<p>First, a few words about what this filter does. As the name suggests, the main purpose of this filter is to keep only selected words. More specifically, it does the opposite of the filter called <em>StopFilter<\/em>: instead of removing the listed words from the token stream, it removes every token that is not on the list. So how does this filter work? I&#8217;ll get to that in a moment &#8211; let&#8217;s start with the definition of the field type in the <em>schema.xml<\/em> file:\n<\/p>\n<pre class=\"brush:xml\">&lt;fieldtype name=\"keepwords\" class=\"solr.TextField\"&gt;\n   &lt;analyzer&gt;\n      &lt;tokenizer class=\"solr.WhitespaceTokenizerFactory\"\/&gt;\n      &lt;filter class=\"solr.KeepWordFilterFactory\" words=\"words.txt\" ignoreCase=\"true\"\/&gt;\n   &lt;\/analyzer&gt;\n&lt;\/fieldtype&gt;<\/pre>\n<p>As shown in the above definition, in addition to the standard <em>class<\/em> attribute, the filter takes two more attributes:<\/p>\n<ul>\n<li><em>words<\/em> &#8211; the name of the file containing the list of words to keep<\/li>\n<li><em>ignoreCase<\/em> &#8211; <em>true<\/em> | <em>false<\/em> value indicating whether letter case should be ignored when matching tokens against the list.<\/li>\n<\/ul>\n<h3>File contents<\/h3>\n<p>Let&#8217;s assume that the <em>words.txt<\/em> file contains the following words:\n<\/p>\n<pre>ala\nma\nkota<\/pre>\n<p>If you would like to index the phrase 
&#8220;Ala ma kota, a kot ma Al\u0119&#8221;, the following tokens will be written into the index: &#8220;ala&#8221;, &#8220;ma&#8221;, &#8220;kota&#8221;, &#8220;ma&#8221;, because only those terms are defined in the <em>words.txt<\/em> file. This is clearly visible in the Solr administration panel:<\/p>\n<p><a href=\"http:\/\/solr.pl\/wp-content\/uploads\/2011\/04\/keepwords.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1198\" title=\"keepwords\" src=\"http:\/\/solr.pl\/wp-content\/uploads\/2011\/04\/keepwords.png\" alt=\"\" width=\"626\" height=\"493\"><\/a><\/p>\n<h3>A few words at the end<\/h3>\n<p>Although I have never used this filter myself, it seems to me a good choice when you need to store the values of enumerated types, or more generally when you are interested in a finite (or even better, small and known in advance) list of values, such as categories, and filtering the information at the application level is impossible or very difficult.<\/p>","protected":false},"excerpt":{"rendered":"<p>This time I decided to look at one of the unusual filters available in the standard distribution of Solr. 
The first one in my hands is a filter called KeepWordFilter.<\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":"","_links_to":"","_links_to_target":""},"categories":[27],"tags":[181,347,346,164,348],"class_list":["post-264","post","type-post","status-publish","format-standard","hentry","category-solr-en","tag-filter","tag-keep-2","tag-keepwordfilter-2","tag-solr-2","tag-word-2"],"_links":{"self":[{"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/posts\/264","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/comments?post=264"}],"version-history":[{"count":1,"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/posts\/264\/revisions"}],"predecessor-version":[{"id":265,"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/posts\/264\/revisions\/265"}],"wp:attachment":[{"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/media?parent=264"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/categories?post=264"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/tags?post=264"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}