{"id":915,"date":"2017-01-29T15:11:18","date_gmt":"2017-01-29T14:11:18","guid":{"rendered":"http:\/\/sematext.solr.pl\/?p=915"},"modified":"2020-11-14T15:11:50","modified_gmt":"2020-11-14T14:11:50","slug":"autocomplete-with-special-characters","status":"publish","type":"post","link":"https:\/\/solr.pl\/en\/2017\/01\/29\/autocomplete-with-special-characters\/","title":{"rendered":"Autocomplete With Special Characters"},"content":{"rendered":"<p>It&#8217;s been a long time since the last real blog post here on <a href=\"http:\/\/solr.pl\">solr.pl<\/a>, but better late than never, right? So, today we will get back to the topic of autocomplete functionality, which can be built with Solr Suggesters, facets or standard queries with the n-gram approach. It is the same functionality, but done with a different approach, depending on the needs and preferences.<\/p>\n\n\n<!--more-->\n\n\n<p>Today we will look at a more sophisticated autocomplete functionality &#8211; one that suggests words with special characters whether the user types them with or without those characters.<\/p>\n<h3>Autocomplete With Special Characters<\/h3>\n<p>Imagine we have documents with only two fields &#8211; <em>id<\/em>, which is the document identifier, and <em>name<\/em>, which is the name of the document and the field on which we would like to run autocomplete. The <em>name<\/em> field can contain language-specific characters, like <em>\u017c<\/em>, <em>\u0107<\/em> or <em>\u0119<\/em> in Polish. We would like to support them for all our users, no matter what type of keyboard they have or what locale they have set in their system.<\/p>\n<p>Let&#8217;s assume we have the following documents:\n<\/p>\n<pre class=\"brush:xml\">{\"id\":1, \"name\":\"Po\u015brednictwo nieruchomo\u015bci\"}\n{\"id\":2, \"name\":\"Posadowienie budynk\u00f3w\"}\n{\"id\":3, \"name\":\"Posocznica\"}<\/pre>\n<p>The names are not important, the use case is. 
We would like to be able to get all three documents when the user enters <em>pos<\/em> into the search box. Is that possible? Yes, so let&#8217;s see how.<\/p>\n<h3>Preparing Collection Structure<\/h3>\n<p>Let&#8217;s start with the <em>schema.xml<\/em> file and the definition of the fields and types. We will completely ignore search here and focus only on autocomplete. We also assume that we want the whole name to be displayed whenever the user input matches any of the terms in the <em>name<\/em> field.<br>\nWe will start with the fields, which look like this:\n<\/p>\n<pre class=\"brush:xml\">&lt;field name=\"id\" type=\"int\" indexed=\"true\" stored=\"true\" required=\"true\" multiValued=\"false\" \/&gt;\n&lt;field name=\"name\" type=\"string\" indexed=\"false\" stored=\"true\" multiValued=\"false\" \/&gt;\n&lt;field name=\"name_ac\" type=\"text_ac\" indexed=\"true\" stored=\"false\" multiValued=\"false\" \/&gt;<\/pre>\n<p>So, we will have our <em>id<\/em> field, which is an integer, and the <em>name<\/em> field, which we will use only for display purposes. The <em>name_ac<\/em> field will be used for the actual autocomplete. To avoid manually filling the <em>name_ac<\/em> field we will use a <em>copyField<\/em> definition in the <em>schema.xml<\/em>:\n<\/p>\n<pre class=\"brush:xml\">&lt;copyField source=\"name\" dest=\"name_ac\" \/&gt;<\/pre>\n<p>The <em>name_ac<\/em> definition will use ngrams for efficient prefix queries and a filter that will fold all the funky, special, language-specific characters into their ASCII equivalents &#8211; the <em>solr.ASCIIFoldingFilterFactory<\/em>. We need to do that both during index and during query time, so that both sides of the analysis work. 
The <em>text_ac<\/em> type definition looks as follows:\n<\/p>\n<pre class=\"brush:xml\">&lt;fieldType name=\"text_ac\" class=\"solr.TextField\" positionIncrementGap=\"100\"&gt;\n&nbsp; &lt;analyzer type=\"index\"&gt;\n&nbsp;&nbsp;&nbsp; &lt;tokenizer class=\"solr.KeywordTokenizerFactory\"\/&gt;\n&nbsp;&nbsp;&nbsp; &lt;filter class=\"solr.LowerCaseFilterFactory\"\/&gt;\n&nbsp;&nbsp;&nbsp; &lt;filter class=\"solr.ASCIIFoldingFilterFactory\"\/&gt;\n&nbsp;&nbsp;&nbsp; &lt;filter class=\"solr.EdgeNGramFilterFactory\" minGramSize=\"1\" maxGramSize=\"100\" \/&gt;\n&nbsp; &lt;\/analyzer&gt;\n&nbsp; &lt;analyzer type=\"query\"&gt;\n&nbsp;&nbsp;&nbsp; &lt;tokenizer class=\"solr.KeywordTokenizerFactory\"\/&gt;\n&nbsp;&nbsp;&nbsp; &lt;filter class=\"solr.LowerCaseFilterFactory\"\/&gt;\n&nbsp;&nbsp;&nbsp; &lt;filter class=\"solr.ASCIIFoldingFilterFactory\"\/&gt;\n&nbsp; &lt;\/analyzer&gt;\n&lt;\/fieldType&gt;<\/pre>\n<p>During index time, we lowercase the whole input, fold language-specific characters into their ASCII equivalents and finally create edge ngrams with a minimum size of 1 and a maximum size of 100, which means that the ngrams start from a single letter and can be up to 100 characters long. During query time, we lowercase the whole input and fold language-specific characters, and that&#8217;s all that is needed.<\/p>\n<h3>Let&#8217;s Test<\/h3>\n<p>To test the changes we will start Solr in SolrCloud mode with a local ZooKeeper by running the following command from the Solr home directory:\n<\/p>\n<pre class=\"brush:xml\">$ bin\/solr start -c<\/pre>\n<p>Next we will upload the configuration that includes all the changes to ZooKeeper by running the following command:\n<\/p>\n<pre class=\"brush:xml\">$ bin\/solr zk upconfig -z localhost:9983 -n autocomplete -d \/home\/config\/autocomplete\/conf<\/pre>\n<p>The only thing you need to care about for that command to work on your local machine is adjusting the directory with the configuration. 
The configuration itself can be downloaded from our GitHub account (<a href=\"http:\/\/autocomplete_with_special_chars\/conf\/\">config download<\/a>).<\/p>\n<p>Now we need to create the collection by running the following command:\n<\/p>\n<pre class=\"brush:xml\">$ curl 'localhost:8983\/solr\/admin\/collections?action=CREATE&amp;name=autocomplete&amp;numShards=1&amp;replicationFactor=1&amp;collection.configName=autocomplete'<\/pre>\n<p>We have everything in place to index our documents, which we can do by running the following command:\n<\/p>\n<pre class=\"brush:xml\">$ curl \"http:\/\/localhost:8983\/solr\/autocomplete\/update?commit=true\" -H 'Content-type:application\/json' -d '[\n&nbsp; {\"id\":1, \"name\":\"Po\u015brednictwo nieruchomo\u015bci\"},\n&nbsp; {\"id\":2, \"name\":\"Posadowienie budynk\u00f3w\"},\n&nbsp; {\"id\":3, \"name\":\"Posocznica\"}\n]'<\/pre>\n<p>So now, let&#8217;s run two queries that should tell us if the above method really works. The first query, with the plain ASCII prefix <em>pos<\/em>, looks as follows:\n<\/p>\n<pre class=\"brush:xml\">http:\/\/localhost:8983\/solr\/autocomplete\/select?q.op=AND&amp;defType=edismax&amp;qf=name_ac&amp;fl=id,name&amp;q=pos<\/pre>\n<p>The second query, with the Polish prefix <em>po\u015b<\/em>, looks as follows:\n<\/p>\n<pre class=\"brush:xml\">http:\/\/localhost:8983\/solr\/autocomplete\/select?q.op=AND&amp;defType=edismax&amp;qf=name_ac&amp;fl=id,name&amp;q=po\u015b<\/pre>\n<p>In both cases we are running a query using the Extended DisMax query parser, we are using <em>AND<\/em> as the default Boolean operator (<em>q.op<\/em> parameter) and we are saying that we want to run the query on the <em>name_ac<\/em> field by using the <em>qf<\/em> parameter. 
We also say that we only want the <em>id<\/em> and <em>name<\/em> fields to be returned in the search results by specifying the <em>fl<\/em> parameter.<\/p>\n<p>The results returned by Solr are identical in both cases and look as follows:\n<\/p>\n<pre class=\"brush:xml\">&lt;?xml version=\"1.0\" encoding=\"UTF-8\"?&gt;\n&lt;response&gt;\n&lt;lst name=\"responseHeader\"&gt;\n&nbsp; &lt;bool name=\"zkConnected\"&gt;true&lt;\/bool&gt;\n&nbsp; &lt;int name=\"status\"&gt;0&lt;\/int&gt;\n&nbsp; &lt;int name=\"QTime\"&gt;0&lt;\/int&gt;\n&nbsp; &lt;lst name=\"params\"&gt;\n&nbsp;&nbsp;&nbsp; &lt;str name=\"q\"&gt;po\u015b&lt;\/str&gt;\n&nbsp;&nbsp;&nbsp; &lt;str name=\"defType\"&gt;edismax&lt;\/str&gt;\n&nbsp;&nbsp;&nbsp; &lt;str name=\"qf\"&gt;name_ac&lt;\/str&gt;\n&nbsp;&nbsp;&nbsp; &lt;str name=\"fl\"&gt;id,name&lt;\/str&gt;\n&nbsp;&nbsp;&nbsp; &lt;str name=\"q.op\"&gt;AND&lt;\/str&gt;\n&nbsp; &lt;\/lst&gt;\n&lt;\/lst&gt;\n&lt;result name=\"response\" numFound=\"3\" start=\"0\"&gt;\n&nbsp; &lt;doc&gt;\n&nbsp;&nbsp;&nbsp; &lt;int name=\"id\"&gt;1&lt;\/int&gt;\n&nbsp;&nbsp;&nbsp; &lt;str name=\"name\"&gt;Po\u015brednictwo nieruchomo\u015bci&lt;\/str&gt;&lt;\/doc&gt;\n&nbsp; &lt;doc&gt;\n&nbsp;&nbsp;&nbsp; &lt;int name=\"id\"&gt;2&lt;\/int&gt;\n&nbsp;&nbsp;&nbsp; &lt;str name=\"name\"&gt;Posadowienie budynk\u00f3w&lt;\/str&gt;&lt;\/doc&gt;\n&nbsp; &lt;doc&gt;\n&nbsp;&nbsp;&nbsp; &lt;int name=\"id\"&gt;3&lt;\/int&gt;\n&nbsp;&nbsp;&nbsp; &lt;str name=\"name\"&gt;Posocznica&lt;\/str&gt;&lt;\/doc&gt;\n&lt;\/result&gt;\n&lt;\/response&gt;<\/pre>\n<p>So the method works \ud83d\ude42<\/p>","protected":false},"excerpt":{"rendered":"<p>It&#8217;s been a long time since the last real blog post here on solr.pl, but better late than never, right? 
So, today we will get back to the topic of autocomplete functionality with Solr Suggesters, facets and standard queries<\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":"","_links_to":"","_links_to_target":""},"categories":[27],"tags":[229,164],"class_list":["post-915","post","type-post","status-publish","format-standard","hentry","category-solr-en","tag-autocomplete","tag-solr-2"],"_links":{"self":[{"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/posts\/915","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/comments?post=915"}],"version-history":[{"count":1,"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/posts\/915\/revisions"}],"predecessor-version":[{"id":916,"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/posts\/915\/revisions\/916"}],"wp:attachment":[{"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/media?parent=915"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/categories?post=915"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/tags?post=915"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}