{"id":258,"date":"2011-04-11T20:38:34","date_gmt":"2011-04-11T18:38:34","guid":{"rendered":"http:\/\/sematext.solr.pl\/?p=258"},"modified":"2020-11-11T20:38:59","modified_gmt":"2020-11-11T19:38:59","slug":"car-sale-application-unicode-collation-sorting-text-in-a-language-sensitive-way-part-4","status":"publish","type":"post","link":"https:\/\/solr.pl\/en\/2011\/04\/11\/car-sale-application-unicode-collation-sorting-text-in-a-language-sensitive-way-part-4\/","title":{"rendered":"&#8220;Car sale application&#8221; \u2013 Unicode Collation, sorting text in a language-sensitive way (part 4)"},"content":{"rendered":"<p>In the <a href=\"http:\/\/solr.pl\/en\/2011\/03\/14\/car-sale-application-\u2013-spatial-search-adding-location-data-part-3\/\" target=\"_blank\" rel=\"noopener noreferrer\">third part<\/a> of our \u201cCar sale\u201d application series, we added location data and information about the city associated with each car. Shortly afterwards, we added the ability to sort on the city field by simply modifying the schema:<\/p>\n\n\n<!--more-->\n\n\n<pre class=\"brush:xml\">&lt;field name=\"city_sort\" type=\"lowercase\" indexed=\"true\" stored=\"false\" \/&gt;\n...\n&lt;copyField source=\"city\" dest=\"city_sort\"\/&gt;<\/pre>\n<p>It turned out that sorting on the city_sort field did not work as expected, because of the Polish diacritic characters (such as \u0141 and \u015a) appearing in the city names. What can we do about it?<\/p>\n<h2>Requirements specification<\/h2>\n<p>Let&#8217;s check whether sorting on the \u201ccity_sort\u201d field really does not work well with Polish characters. 
When we run the query:\n<\/p>\n<pre class=\"brush:xml\">q=*:*&amp;fl=city&amp;sort=city_sort+asc<\/pre>\n<p>we get the following result:\n<\/p>\n<pre class=\"brush:xml\">&lt;result name=\"response\" numFound=\"6\" start=\"0\"&gt;\n   &lt;doc&gt;\n      &lt;str name=\"city\"&gt;Bia\u0142ystok&lt;\/str&gt;\n   &lt;\/doc&gt;\n   &lt;doc&gt;\n      &lt;str name=\"city\"&gt;Koszalin&lt;\/str&gt;\n   &lt;\/doc&gt;\n   &lt;doc&gt;\n      &lt;str name=\"city\"&gt;Szczecin&lt;\/str&gt;\n   &lt;\/doc&gt;\n   &lt;doc&gt;\n      &lt;str name=\"city\"&gt;Warszawa&lt;\/str&gt;\n   &lt;\/doc&gt;\n   &lt;doc&gt;\n      &lt;str name=\"city\"&gt;\u015awidnik&lt;\/str&gt;\n   &lt;\/doc&gt;\n   &lt;doc&gt;\n      &lt;str name=\"city\"&gt;\u0141owicz&lt;\/str&gt;\n   &lt;\/doc&gt;\n&lt;\/result&gt;<\/pre>\n<p>That&#8217;s not what we expected: \u015awidnik and \u0141owicz, which start with Polish letters, end up after Warszawa instead of in their alphabetical positions. We would like to get:\n<\/p>\n<pre class=\"brush:xml\">&lt;result name=\"response\" numFound=\"6\" start=\"0\"&gt;\n   &lt;doc&gt;\n      &lt;str name=\"city\"&gt;Bia\u0142ystok&lt;\/str&gt;\n   &lt;\/doc&gt;\n   &lt;doc&gt;\n      &lt;str name=\"city\"&gt;Koszalin&lt;\/str&gt;\n   &lt;\/doc&gt;\n   &lt;doc&gt;\n      &lt;str name=\"city\"&gt;\u0141owicz&lt;\/str&gt;\n   &lt;\/doc&gt;\n   &lt;doc&gt;\n      &lt;str name=\"city\"&gt;Szczecin&lt;\/str&gt;\n   &lt;\/doc&gt;\n   &lt;doc&gt;\n      &lt;str name=\"city\"&gt;\u015awidnik&lt;\/str&gt;\n   &lt;\/doc&gt;\n   &lt;doc&gt;\n      &lt;str name=\"city\"&gt;Warszawa&lt;\/str&gt;\n   &lt;\/doc&gt;\n&lt;\/result&gt;<\/pre>\n<p>To make sorting work correctly, we will use the \u201csolr.CollationKeyFilter\u201d filter.<\/p>\n<h2>solr.CollationKeyFilter<\/h2>\n<p>The solr.CollationKeyFilter filter is applied at index time and indexes special &#8220;sort keys&#8221; into the sort field. It lets us choose the collator for a given language and country. 
We can also set the collation strength, which determines the minimum level of difference considered significant during comparison. For example:\n<\/p>\n<pre class=\"brush:xml\">&lt;filter class=\"solr.CollationKeyFilterFactory\" language=\"es\" country=\"ES\" strength=\"primary\" \/&gt;<\/pre>\n<p>The example above configures solr.CollationKeyFilterFactory to handle the Spanish language with <a href=\"http:\/\/download.oracle.com\/javase\/1.5.0\/docs\/api\/java\/text\/Collator.html#PRIMARY\" target=\"_blank\" rel=\"noopener noreferrer\">primary<\/a> strength, at which typically only differences between base letters (for example &#8220;a&#8221; vs. &#8220;b&#8221;) are considered significant.<\/p>\n<h2>Schema.xml changes<\/h2>\n<ol>\n<li>New field type definition:\n<ul>\n<pre class=\"brush:xml\">&lt;fieldType name=\"polishLowercase\" class=\"solr.TextField\" positionIncrementGap=\"100\"&gt;\n  &lt;analyzer&gt;\n    &lt;tokenizer class=\"solr.KeywordTokenizerFactory\"\/&gt;\n    &lt;filter class=\"solr.LowerCaseFilterFactory\" \/&gt;\n    &lt;filter class=\"solr.TrimFilterFactory\" \/&gt;\n    &lt;filter class=\"solr.CollationKeyFilterFactory\" language=\"pl\" country=\"PL\" strength=\"primary\" \/&gt;\n  &lt;\/analyzer&gt;\n&lt;\/fieldType&gt;<\/pre>\n<p>As you can see, this is the definition of the existing \u201clowercase\u201d type with solr.CollationKeyFilter added to handle the Polish language. 
We will use this type for fields whose data contains Polish characters.<\/p>\n<\/ul>\n<\/li>\n<li>New \u201ccity_sort\u201d field definition:\n<ul>\n<li>let&#8217;s change the type of the \u201ccity_sort\u201d field to \u201cpolishLowercase\u201d:<\/li>\n<pre class=\"brush:xml\">&lt;field name=\"city_sort\" type=\"polishLowercase\" indexed=\"true\" stored=\"false\" \/&gt;<\/pre>\n<\/ul>\n<\/li>\n<\/ol>\n<h2>Functional tests<\/h2>\n<p>Before checking whether this field type change does what we need, remember that solr.CollationKeyFilter is applied at index time, so all of the data must be re-indexed.<\/p>\n<p>Now let&#8217;s run our test query again:\n<\/p>\n<pre class=\"brush:xml\">q=*:*&amp;fl=city&amp;sort=city_sort+asc<\/pre>\n<p>This time the result is correct:\n<\/p>\n<pre class=\"brush:xml\">&lt;result name=\"response\" numFound=\"6\" start=\"0\"&gt;\n   &lt;doc&gt;\n      &lt;str name=\"city\"&gt;Bia\u0142ystok&lt;\/str&gt;\n   &lt;\/doc&gt;\n   &lt;doc&gt;\n      &lt;str name=\"city\"&gt;Koszalin&lt;\/str&gt;\n   &lt;\/doc&gt;\n   &lt;doc&gt;\n      &lt;str name=\"city\"&gt;\u0141owicz&lt;\/str&gt;\n   &lt;\/doc&gt;\n   &lt;doc&gt;\n      &lt;str name=\"city\"&gt;Szczecin&lt;\/str&gt;\n   &lt;\/doc&gt;\n   &lt;doc&gt;\n      &lt;str name=\"city\"&gt;\u015awidnik&lt;\/str&gt;\n   &lt;\/doc&gt;\n   &lt;doc&gt;\n      &lt;str name=\"city\"&gt;Warszawa&lt;\/str&gt;\n   &lt;\/doc&gt;\n&lt;\/result&gt;<\/pre>\n<h2>The end<\/h2>\n<p>Yet another reported problem has been solved. By adding solr.CollationKeyFilter, we improved sorting on fields containing Polish characters, and the filter fully met our needs. 
Now we can wait for further reports and improvements \ud83d\ude42<\/p>","protected":false},"excerpt":{"rendered":"<p>In the third part of our \u201cCar sale\u201d application series, we added location data and information about the city associated with each car. Shortly afterwards, we added the ability to sort on the city field<\/p>\n","protected":false},"author":4,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":"","_links_to":"","_links_to_target":""},"categories":[27],"tags":[],"class_list":["post-258","post","type-post","status-publish","format-standard","hentry","category-solr-en"],"_links":{"self":[{"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/posts\/258","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/comments?post=258"}],"version-history":[{"count":1,"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/posts\/258\/revisions"}],"predecessor-version":[{"id":259,"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/posts\/258\/revisions\/259"}],"wp:attachment":[{"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/media?parent=258"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/categories?post=258"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/solr.pl\/en\/wp-json\/wp\/v2\/tags?post=258"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}