{"id":372,"date":"2011-10-10T21:50:25","date_gmt":"2011-10-10T19:50:25","guid":{"rendered":"http:\/\/sematext.solr.pl\/?p=372"},"modified":"2020-11-11T21:51:09","modified_gmt":"2020-11-11T20:51:09","slug":"car-sale-application-solr-reversedwildcardfilter-lets-optimize-wildcard-queries-part-8","status":"publish","type":"post","link":"https:\/\/solr.pl\/en\/2011\/10\/10\/car-sale-application-solr-reversedwildcardfilter-lets-optimize-wildcard-queries-part-8\/","title":{"rendered":"\u201cCar sale application\u201d \u2013 solr.ReversedWildcardFilter \u2013 let&#8217;s optimize wildcard queries (part 8)"},"content":{"rendered":"<p>\u201cCar sale application\u201d users have started to use wildcard queries more and more often, which forced us to think about optimizing them. solr.ReversedWildcardFilter comes to the rescue.<\/p>\n\n\n<!--more-->\n\n\n<h3>solr.ReversedWildcardFilter<\/h3>\n<p>The solr.ReversedWildcardFilter filter emits additional, reversed tokens at index time, so that a leading wildcard query can be run as a much faster trailing wildcard query against the reversed tokens. The filter supports the following init arguments:<\/p>\n<ul>\n<li><em>withOriginal<\/em> &#8211; if true, both the original and the reversed tokens are produced at the same positions. If false, only the reversed tokens are produced.<\/li>\n<li><em>maxPosAsterisk<\/em> &#8211; maximum position (1-based) of the asterisk wildcard (&#8216;*&#8217;) that triggers reversal of the query term. 
An asterisk at a position higher than this value does not trigger reversal of the query term.<\/li>\n<li><em>maxPosQuestion<\/em> &#8211; maximum position (1-based) of the question mark wildcard (&#8216;?&#8217;) that triggers reversal of the query term.<\/li>\n<li><em>maxFractionAsterisk<\/em> &#8211; additional parameter that triggers reversal if the asterisk (&#8216;*&#8217;) position is less than this fraction of the query token length.<\/li>\n<li><em>minTrailing<\/em> &#8211; minimum number of trailing characters in the query token after the last wildcard character. For good performance this should be set to a value larger than 1.<\/li>\n<\/ul>\n<h3>schema.xml changes<\/h3>\n<p>The new filter is added to the \u201ctext\u201d field type:\n<\/p>\n<pre class=\"brush:xml\">&lt;fieldType name=\"text\" class=\"solr.TextField\"\n\tpositionIncrementGap=\"100\"&gt;\n\t&lt;analyzer type=\"index\"&gt;\n\t\t&lt;tokenizer class=\"solr.WhitespaceTokenizerFactory\" \/&gt;\n\t\t&lt;filter class=\"solr.PatternReplaceFilterFactory\" pattern=\"'\"\n\t\t\treplacement=\"\" replace=\"all\" \/&gt;\n\t\t&lt;filter class=\"solr.WordDelimiterFilterFactory\"\n\t\t\tgenerateWordParts=\"1\" generateNumberParts=\"1\" catenateWords=\"1\"\n\t\t\tstemEnglishPossessive=\"0\" \/&gt;\n\t\t&lt;filter class=\"solr.LowerCaseFilterFactory\" \/&gt;\n\t\t<strong>&lt;filter class=\"solr.ReversedWildcardFilterFactory\" \/&gt;<\/strong>\n\t&lt;\/analyzer&gt;\n\t&lt;analyzer type=\"query\"&gt;\n\t\t&lt;tokenizer class=\"solr.WhitespaceTokenizerFactory\" \/&gt;\n\t\t&lt;filter class=\"solr.PatternReplaceFilterFactory\" pattern=\"'\"\n\t\t\treplacement=\"\" replace=\"all\" \/&gt;\n\t\t&lt;filter class=\"solr.WordDelimiterFilterFactory\"\n\t\t\tgenerateWordParts=\"1\" generateNumberParts=\"1\" catenateWords=\"1\"\n\t\t\tstemEnglishPossessive=\"0\" \/&gt;\n\t\t&lt;filter class=\"solr.LowerCaseFilterFactory\" \/&gt;\n\t&lt;\/analyzer&gt;\n&lt;\/fieldType&gt;<\/pre>\n<p>The solr.ReversedWildcardFilterFactory filter 
is added only to the index analyzer. We do not define any arguments in the filter definition, because we would like to use the default configuration, which is:<\/p>\n<ul>\n<li><em>withOriginal<\/em> &#8211; \u201ctrue\u201d (the original tokens are produced alongside the reversed ones)<\/li>\n<li><em>maxPosAsterisk<\/em> &#8211; 2<\/li>\n<li><em>maxPosQuestion<\/em> &#8211; 1<\/li>\n<li><em>maxFractionAsterisk<\/em> &#8211; 0.0f (disabled)<\/li>\n<li><em>minTrailing<\/em> &#8211; 2<\/li>\n<\/ul>\n<h3>Sample data<\/h3>\n<p>Let&#8217;s index some sample data:\n<\/p>\n<pre class=\"brush:xml\">&lt;add&gt;\n  &lt;doc&gt;\n    &lt;field name=\"id\"&gt;1&lt;\/field&gt;\n    &lt;field name=\"make\"&gt;Lancia&lt;\/field&gt;\n    &lt;field name=\"model\"&gt;Delta&lt;\/field&gt;\n    ...\n  &lt;\/doc&gt;\n  &lt;doc&gt;\n    &lt;field name=\"id\"&gt;2&lt;\/field&gt;\n    &lt;field name=\"make\"&gt;Land Rover&lt;\/field&gt;\n    &lt;field name=\"model\"&gt;Defender&lt;\/field&gt;\n    ...\n  &lt;\/doc&gt;\n  &lt;doc&gt;\n    &lt;field name=\"id\"&gt;3&lt;\/field&gt;\n    &lt;field name=\"make\"&gt;Acura&lt;\/field&gt;\n    &lt;field name=\"model\"&gt;MDX&lt;\/field&gt;\n    ...\n  &lt;\/doc&gt;\n  &lt;doc&gt;\n    &lt;field name=\"id\"&gt;4&lt;\/field&gt;\n    &lt;field name=\"make\"&gt;Acura&lt;\/field&gt;\n    &lt;field name=\"model\"&gt;RDX&lt;\/field&gt;\n    ...\n  &lt;\/doc&gt;\n  &lt;doc&gt;\n    &lt;field name=\"id\"&gt;5&lt;\/field&gt;\n    &lt;field name=\"make\"&gt;Acura&lt;\/field&gt;\n    &lt;field name=\"model\"&gt;RSX&lt;\/field&gt;\n    ...\n  &lt;\/doc&gt;\n&lt;\/add&gt;<\/pre>\n<h3>Let&#8217;s create queries<\/h3>\n<p>Recall that the default search field is the \u201ccontent\u201d field, which contains, among others, the values of the \u201cmake\u201d and \u201cmodel\u201d fields. To analyse the query results and the behaviour of the solr.ReversedWildcardFilter filter, we will set the \u201cstored\u201d attribute of the \u201ccontent\u201d field to \u201ctrue\u201d. 
We will also add the debugQuery query argument, which will allow us to find out, which tokens are used in the query processing (original or reversed).<\/p>\n<ol>\n<li>?q=lan*&amp;fl=id,content&amp;debugQuery=on\n<pre class=\"brush:xml\">&lt;result name=\"response\" numFound=\"2\" start=\"0\"&gt;\n  &lt;doc&gt;\n    &lt;arr name=\"content\"&gt;\n      &lt;str&gt;Lancia&lt;\/str&gt;\n      &lt;str&gt;Delta&lt;\/str&gt;\n      &lt;str&gt;2002&lt;\/str&gt;\n    &lt;\/arr&gt;\n    &lt;str name=\"id\"&gt;1&lt;\/str&gt;\n  &lt;\/doc&gt;\n  &lt;doc&gt;\n    &lt;arr name=\"content\"&gt;\n      &lt;str&gt;Land Rover&lt;\/str&gt;\n      &lt;str&gt;Defender&lt;\/str&gt;\n      &lt;str&gt;2002&lt;\/str&gt;\n    &lt;\/arr&gt;\n    &lt;str name=\"id\"&gt;2&lt;\/str&gt;\n  &lt;\/doc&gt;\n&lt;\/result&gt;\n&lt;lst name=\"debug\"&gt;\n  &lt;str name=\"rawquerystring\"&gt;lan*&lt;\/str&gt;\n  &lt;str name=\"querystring\"&gt;lan*&lt;\/str&gt;\n  &lt;str name=\"parsedquery\"&gt;content:lan*&lt;\/str&gt;\n  &lt;str name=\"parsedquery_toString\"&gt;content:lan*&lt;\/str&gt;\n  ...\n&lt;\/lst&gt;<\/pre>\n<p>We have used asterisk wildcard (&#8216;*&#8217;) at the end of the query (position = 4), so the original tokens were used:\n<\/p>\n<pre class=\"brush:xml\">&lt;str name=\"parsedquery\"&gt;content:lan*&lt;\/str&gt;<\/pre>\n<\/li>\n<li>?q=*dx&amp;fl=id,content&amp;debugQuery=on\n<pre class=\"brush:xml\">&lt;result name=\"response\" numFound=\"2\" start=\"0\"&gt;\n  &lt;doc&gt;\n    &lt;arr name=\"content\"&gt;\n      &lt;str&gt;Acura&lt;\/str&gt;\n      &lt;str&gt;MDX&lt;\/str&gt;\n      &lt;str&gt;2002&lt;\/str&gt;\n    &lt;\/arr&gt;\n    &lt;str name=\"id\"&gt;3&lt;\/str&gt;\n  &lt;\/doc&gt;\n  &lt;doc&gt;\n    &lt;arr name=\"content\"&gt;\n      &lt;str&gt;Acura&lt;\/str&gt;\n      &lt;str&gt;RDX&lt;\/str&gt;\n      &lt;str&gt;2003&lt;\/str&gt;\n    &lt;\/arr&gt;\n    &lt;str name=\"id\"&gt;4&lt;\/str&gt;\n  &lt;\/doc&gt;\n&lt;\/result&gt;\n&lt;lst name=\"debug\"&gt;\n  &lt;str 
name=\"rawquerystring\"&gt;*dx&lt;\/str&gt;\n  &lt;str name=\"querystring\"&gt;*dx&lt;\/str&gt;\n  &lt;str name=\"parsedquery\"&gt;content:#1;xd*&lt;\/str&gt;\n  &lt;str name=\"parsedquery_toString\"&gt;content:#1;xd*&lt;\/str&gt;\n  ...\n&lt;\/lst&gt;<\/pre>\n<p>We used the asterisk wildcard (&#8216;*&#8217;) at the beginning of the query (position = 1, which is within the maxPosAsterisk limit of 2) and there are two trailing characters after the last wildcard. That&#8217;s why the reversed tokens were used:\n<\/p>\n<pre class=\"brush:xml\">&lt;str name=\"parsedquery\"&gt;content:#1;xd*&lt;\/str&gt;<\/pre>\n<p>As we can see, the reversed tokens carry a special marker prefix (shown here as #1;) in order to avoid collisions and false matches with the original tokens.<\/p>\n<\/li>\n<li>?q=r?x&amp;fl=id,content&amp;debugQuery=on\n<pre class=\"brush:xml\">&lt;result name=\"response\" numFound=\"2\" start=\"0\"&gt;\n  &lt;doc&gt;\n    &lt;arr name=\"content\"&gt;\n      &lt;str&gt;Acura&lt;\/str&gt;\n      &lt;str&gt;RDX&lt;\/str&gt;\n      &lt;str&gt;2003&lt;\/str&gt;\n    &lt;\/arr&gt;\n    &lt;str name=\"id\"&gt;4&lt;\/str&gt;\n  &lt;\/doc&gt;\n  &lt;doc&gt;\n    &lt;arr name=\"content\"&gt;\n      &lt;str&gt;Acura&lt;\/str&gt;\n      &lt;str&gt;RSX&lt;\/str&gt;\n      &lt;str&gt;2006&lt;\/str&gt;\n    &lt;\/arr&gt;\n    &lt;str name=\"id\"&gt;5&lt;\/str&gt;\n  &lt;\/doc&gt;\n&lt;\/result&gt;\n&lt;lst name=\"debug\"&gt;\n  &lt;str name=\"rawquerystring\"&gt;r?x&lt;\/str&gt;\n  &lt;str name=\"querystring\"&gt;r?x&lt;\/str&gt;\n  &lt;str name=\"parsedquery\"&gt;content:r?x&lt;\/str&gt;\n  &lt;str name=\"parsedquery_toString\"&gt;content:r?x&lt;\/str&gt;\n  ...\n&lt;\/lst&gt;<\/pre>\n<p>We used the question mark wildcard (&#8216;?&#8217;) at position 2, which is beyond the maxPosQuestion limit of 1, and there is only one trailing character after the wildcard. 
As a result, the original tokens were used:\n<\/p>\n<pre class=\"brush:xml\">&lt;str name=\"parsedquery\"&gt;content:r?x&lt;\/str&gt;<\/pre>\n<\/li>\n<\/ol>\n<h3>The end<\/h3>\n<p>Thanks to the solr.ReversedWildcardFilter filter, we have successfully optimized leading wildcard queries. \u201cCar sale application\u201d users can now use them effectively \ud83d\ude42<\/p>","protected":false},"excerpt":{"rendered":"<p>\u201cCar sale application\u201d users have started to use wildcard queries more and more often, which forced us to think about optimizing them. solr.ReversedWildcardFilter comes to the rescue.<\/p>\n","protected":false},"author":4,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":"","_links_to":"","_links_to_target":""},"categories":[27],"tags":[],"class_list":["post-372","post","type-post","status-publish","format-standard","hentry","category-solr-en"],"_links":{"self":[{"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/posts\/372","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/comments?post=372"}],"version-history":[{"count":1,"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/posts\/372\/revisions"}],"predecessor-version":[{"id":373,"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/posts\/372\/revisions\/373"}],"wp:attachment":[{"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/media?parent=372"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/categories?post=372"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/tags?post=372"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":
true}]}}