{"id":457,"date":"2012-05-14T23:46:23","date_gmt":"2012-05-14T21:46:23","guid":{"rendered":"http:\/\/sematext.solr.pl\/?p=457"},"modified":"2020-11-11T23:46:57","modified_gmt":"2020-11-11T22:46:57","slug":"developing-your-own-solr-filter","status":"publish","type":"post","link":"https:\/\/solr.pl\/en\/2012\/05\/14\/developing-your-own-solr-filter\/","title":{"rendered":"Developing Your Own Solr Filter"},"content":{"rendered":"<p>Sometimes the out-of-the-box functionality of Lucene and Solr is not enough. When that time comes, we need to extend what Lucene and Solr give us and create our own plugin. In today&#8217;s post I&#8217;ll try to show how to develop a custom filter and use it in Solr.<\/p>\n\n\n<!--more-->\n\n\n<h3>Solr Version<\/h3>\n<p>The following code is based on <strong>Solr 3.6<\/strong>. We will publish an updated version of this post that matches Solr 4.0 after its release.<\/p>\n<h3>Assumptions<\/h3>\n<p>Let&#8217;s assume that we need a filter that reverses every word in a given field. So, if the input is &#8220;solr.pl&#8221;, the output would be &#8220;lp.rlos&#8221;. It&#8217;s not the hardest example, but for the purpose of this entry it will be enough. One more thing &#8211; I decided to omit describing how to set up your IDE, compile your code, build the jar and so on. We will focus only on the code.<\/p>\n<h3>Additional Information<\/h3>\n<p>The code presented in this post was created using the Solr <a href=\"http:\/\/solr.pl\/en\/2012\/04\/12\/apache-lucene-and-solr-3-6\/\">3.6<\/a> libraries, although you shouldn&#8217;t have many problems compiling it with Solr 4 binaries. Keep in mind, though, that some slight modifications may be needed (in case something changes before the Solr 4.0 release).<\/p>\n<h3>What We Need<\/h3>\n<p>In order for Solr to be able to use our filter, we need two classes. The first class is the actual filter implementation, which is responsible for the actual logic. 
The second class is the filter factory, which is responsible for creating instances of the filter. Let&#8217;s get it done then.<\/p>\n<h3>Filter<\/h3>\n<p>In order to implement our filter we will extend the <em>TokenFilter<\/em> class from the <em>org.apache.lucene.analysis<\/em> package and override the <em>incrementToken<\/em> method. This method returns a <em>boolean<\/em> value &#8211; if a token is still available for processing in the token stream, this method should return <em>true<\/em>; if the token stream has no more tokens to analyze, it should return <em>false<\/em>. The implementation should look like the one below:\n<\/p>\n<pre class=\"brush:java\">package pl.solr.analysis;\n\nimport java.io.IOException;\n\nimport org.apache.lucene.analysis.TokenFilter;\nimport org.apache.lucene.analysis.TokenStream;\nimport org.apache.lucene.analysis.tokenattributes.CharTermAttribute;\n\npublic final class ReverseFilter extends TokenFilter {\n  private CharTermAttribute charTermAttr;\n\n  protected ReverseFilter(TokenStream ts) {\n    super(ts);\n    this.charTermAttr = addAttribute(CharTermAttribute.class);\n  }\n\n  @Override\n  public boolean incrementToken() throws IOException {\n    if (!input.incrementToken()) {\n      return false;\n    }\n\n    int length = charTermAttr.length();\n    char[] buffer = charTermAttr.buffer();\n    char[] newBuffer = new char[length];\n    for (int i = 0; i &lt; length; i++) {\n      newBuffer[i] = buffer[length - 1 - i];\n    }\n    charTermAttr.setEmpty();\n    charTermAttr.copyBuffer(newBuffer, 0, newBuffer.length);\n    return true;\n  }\n}<\/pre>\n<h3>Description of the Above Implementation<\/h3>\n<p>A few words about some of the lines of code in the above implementation:<\/p>\n<ul>\n<li><em>Line 9<\/em> &#8211; a class that extends <em>TokenFilter<\/em> and will be used as a filter should be marked as <em>final<\/em> (Lucene requirement).<\/li>\n<li><em>Line 10<\/em> &#8211; 
token stream attribute, which allows us to get and modify the text contents of the term. If we wanted, our filter could use more than a single stream attribute, for example one for getting and changing the position in the token stream, or one for the payload. A list of <em>Attribute<\/em> interface implementations can be found in the Lucene API (e.g. <a href=\"http:\/\/lucene.apache.org\/core\/3_6_0\/api\/all\/org\/apache\/lucene\/util\/Attribute.html\" target=\"_blank\" rel=\"noopener noreferrer\">http:\/\/lucene.apache.org\/core\/3_6_0\/api\/all\/org\/apache\/lucene\/util\/Attribute.html<\/a>).<\/li>\n<li><em>Lines 12 &#8211; 15<\/em> &#8211; constructor which takes a token stream as an argument and then adds (<em>line 14<\/em>) the appropriate token stream attribute.<\/li>\n<li><em>Lines 18 &#8211; 32<\/em> &#8211; <em>incrementToken<\/em> method implementation.<\/li>\n<li><em>Lines 19 &#8211; 21<\/em> &#8211; check if a token is available for processing. If not, return <em>false<\/em>.<\/li>\n<li><em>Line 23<\/em> &#8211; get the length of the term text that we want to reverse.<\/li>\n<li><em>Line 24<\/em> &#8211; get the buffer holding the word we want to reverse. Term text is stored as a <em>char<\/em> array, so it is best to use it directly rather than construct a <em>String<\/em> object.<\/li>\n<li><em>Lines 25 &#8211; 28<\/em> &#8211; create a new buffer and fill it with the reversed contents of the original one.<\/li>\n<li><em>Line 29<\/em> &#8211; clear the original buffer (needed in case of using <em>append<\/em> methods).<\/li>\n<li><em>Line 30<\/em> &#8211; copy the changes we made to the buffer of the token stream attribute.<\/li>\n<li><em>Line 31<\/em> &#8211; return <em>true<\/em> in order to inform that there is a token available for further processing.<\/li>\n<\/ul>\n<h3>Filter Factory<\/h3>\n<p>As I wrote earlier, in order for Solr to be able to use our filter, we need to implement a filter factory class. 
Because we don&#8217;t have any special configuration values, the factory implementation is very simple. We will extend the <em>BaseTokenFilterFactory<\/em> class from the <em>org.apache.solr.analysis<\/em> package. The implementation can look like the following:\n<\/p>\n<pre class=\"brush:java\">package pl.solr.analysis;\n\nimport org.apache.lucene.analysis.TokenStream;\nimport org.apache.solr.analysis.BaseTokenFilterFactory;\n\npublic class ReverseFilterFactory extends BaseTokenFilterFactory {\n  @Override\n  public TokenStream create(TokenStream ts) {\n    return new ReverseFilter(ts);\n  }\n}<\/pre>\n<p>As you can see, the filter factory implementation is simple &#8211; we only needed to override a single <em>create<\/em> method, in which we instantiate our filter and return it.<\/p>\n<h3>Configuration<\/h3>\n<p>After compiling the code and building the jar file, we copy the jar to a directory where Solr will be able to see it. We can do this by creating the <em>lib<\/em> directory in the Solr home directory and then adding the following entry to the <em>solrconfig.xml<\/em> file:\n<\/p>\n<pre class=\"brush:xml\">&lt;lib dir=\"..\/lib\/\" regex=\".*\\.jar\" \/&gt;<\/pre>\n<p>Then we change the <em>schema.xml<\/em> file and add a new field type that will use our filter:\n<\/p>\n<pre class=\"brush:xml\">&lt;fieldType name=\"text_reversed\" class=\"solr.TextField\"&gt;\n  &lt;analyzer&gt;\n    &lt;tokenizer class=\"solr.WhitespaceTokenizerFactory\"\/&gt;\n    &lt;filter class=\"pl.solr.analysis.ReverseFilterFactory\" \/&gt;\n  &lt;\/analyzer&gt;\n&lt;\/fieldType&gt;<\/pre>\n<p>It is worth noting that the <em>class<\/em> attribute of the <em>filter<\/em> tag takes the fully qualified class name of the factory we created, not of the filter itself. 
It is important to remember this; otherwise Solr will throw errors.<\/p>\n<h3>Does It Work?<\/h3>\n<p>To show that it works, here is a screenshot of the Solr administration panel:<\/p>\n<p><a href=\"http:\/\/solr.pl\/wp-content\/uploads\/2012\/05\/ReverseFilter1.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone  wp-image-2369\" title=\"ReverseFilter\" src=\"http:\/\/solr.pl\/wp-content\/uploads\/2012\/05\/ReverseFilter1.png\" alt=\"\" width=\"555\" height=\"211\"><\/a><\/p>\n<h3>To Sum Up<\/h3>\n<p>As you can see from the above example, creating your own filter is not complicated. Of course, the idea of the filter was very simple and thus its implementation was simple too. I hope this post will be helpful when the time comes that you need to create your own filter for Solr.<\/p>","protected":false},"excerpt":{"rendered":"<p>Sometimes the out-of-the-box functionality of Lucene and Solr is not enough. When that time comes, we need to extend what Lucene and Solr give us and create our own plugin. 
In today&#8217;s post I&#8217;ll try to show how to<\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":"","_links_to":"","_links_to_target":""},"categories":[27],"tags":[486,181,164],"class_list":["post-457","post","type-post","status-publish","format-standard","hentry","category-solr-en","tag-develop","tag-filter","tag-solr-2"],"_links":{"self":[{"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/posts\/457","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/comments?post=457"}],"version-history":[{"count":1,"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/posts\/457\/revisions"}],"predecessor-version":[{"id":458,"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/posts\/457\/revisions\/458"}],"wp:attachment":[{"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/media?parent=457"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/categories?post=457"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/tags?post=457"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}