<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>sorting &#8211; Solr.pl</title>
	<atom:link href="https://solr.pl/en/tag/sorting-2/feed/" rel="self" type="application/rss+xml" />
	<link>https://solr.pl/en/</link>
	<description>All things to be found - Blog related to Apache Solr &#38; Lucene projects - https://solr.apache.org</description>
	<lastBuildDate>Wed, 11 Nov 2020 08:18:23 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9</generator>
	<item>
		<title>Sorting by function value in Solr (SOLR-1297)</title>
		<link>https://solr.pl/en/2011/02/28/sorting-by-function-value-in-solr-solr-1297/</link>
					<comments>https://solr.pl/en/2011/02/28/sorting-by-function-value-in-solr-solr-1297/#respond</comments>
		
		<dc:creator><![CDATA[Rafał Kuć]]></dc:creator>
		<pubDate>Mon, 28 Feb 2011 08:17:46 +0000</pubDate>
				<category><![CDATA[Solr]]></category>
		<category><![CDATA[function]]></category>
		<category><![CDATA[function sorting]]></category>
		<category><![CDATA[function value]]></category>
		<category><![CDATA[solr]]></category>
		<category><![CDATA[SOLR-1297]]></category>
		<category><![CDATA[sorting]]></category>
		<category><![CDATA[sorting by function value]]></category>
		<category><![CDATA[value]]></category>
		<guid isPermaLink="false">http://sematext.solr.pl/?p=217</guid>

					<description><![CDATA[Solr 3.1 and later include a very interesting feature: sorting by the value of a function. What does that give us? Quite a few interesting possibilities. Let&#8217;s start. The first example that comes to mind, perhaps because]]></description>
										<content:encoded><![CDATA[<p>Solr 3.1 and later include a very interesting feature: sorting by the value of a function. What does that give us? Quite a few interesting possibilities.</p>
<p><span id="more-217"></span></p>
<h3>Let&#8217;s start</h3>
<p>The first example that comes to mind, perhaps because of a project I worked on some time ago, is sorting by the distance between two geographical points. Until now, implementing such functionality required changes to Solr (for example, <em>LocalSolr</em> or <em>LocalLucene</em>). With Solr 3.1 and later, you can sort search results by the value returned by a function. For example, Solr provides the <em>dist</em> function, which calculates the distance between two points. One variant of the function accepts five parameters: the algorithm and two pairs of coordinates. To sort search results in ascending order of distance from the point with latitude and longitude 0.0, add the following sort parameter to the Solr query:
</p>
<pre class="brush:xml">...sort=dist(2, geo_x, geo_y, 0, 0) asc</pre>
<p>I suspect that the most commonly used values of the first parameter will be:</p>
<ul>
<li><em>1</em> &#8211; calculation based on the Manhattan metric</li>
<li><em>2</em> &#8211; calculation of Euclidean distance</li>
</ul>
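<p>To make the first parameter more concrete, here is a minimal Python sketch (not Solr code) of what the two values compute, assuming the five-argument form shown above, where the first pair of coordinates is one point and the second pair the other. The helper name <em>dist</em> simply mirrors the Solr function, the <em>q</em> value is a placeholder, and <em>urlencode</em> is used only to show how the sort parameter could be escaped in a request URL.</p>
<pre class="brush:python">from urllib.parse import urlencode


def dist(power, *coords):
    """Distance between two points passed as x1, y1, ..., x2, y2, ...

    power=1 gives the Manhattan distance, power=2 the Euclidean one.
    """
    half = len(coords) // 2
    first, second = coords[:half], coords[half:]
    total = sum(abs(a - b) ** power for a, b in zip(first, second))
    return total ** (1.0 / power)


# Distance of the point (3, 4) from (0, 0) under both metrics.
manhattan = dist(1, 3, 4, 0, 0)
euclidean = dist(2, 3, 4, 0, 0)

# The sort parameter from the example above, escaped for use in a URL.
# The q value is a placeholder, not part of the original example.
query_string = urlencode({
    "q": "*:*",
    "sort": "dist(2, geo_x, geo_y, 0, 0) asc",
})</pre>
<p>For the point (3, 4) the sketch yields 7 with the first parameter set to 1 and 5 with it set to 2.</p>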
<h3>A few words about performance</h3>
<p>So far so good, but how does it look in terms of performance? I ran two simple tests.</p>
<p>For the first test, I indexed 200,000 documents, each consisting of four fields: an identifier (numeric field), a description (a <em>text</em> field) and a location (two numeric fields). To keep the function itself from obscuring the sorting results, I used one of the simplest functions currently available in Solr, the <em>sum</em> function, which adds its two arguments. I compared the query time of the default sort (by <em>score</em>) with that of sorting by function value. The following table shows the results of the test:</p>
<p>[Table: query time, default sort by <em>score</em> vs. sort by function value (table not found)]</p>

<p>The second test compared sorting by a string field with sorting by function value. It was almost identical to the first one: I indexed 200,000 documents (with an additional field, <em>name_sort</em>, of type <em>string</em>) and used the <em>sum</em> function. The following table shows the results of the test:</p>
<p>[Table: query time, sort by <em>string</em> field vs. sort by function value (table not found)]</p>

<p>The tests above show that sorting by function value is much slower than the default sort order (which you&#8217;d expect). Sorting by function value is also slower than sorting by a <em>string</em> field, but the difference is not as significant as in the previous case.</p>
<h3>A few words at the end</h3>
<p>Of course, the tests above only skim the surface of sorting performance with Solr functions, but they do show a clear relationship. Given that in most cases this will not be the default sort method, and that it gives us a really powerful tool, it seems to me that this is a feature worth remembering. It is definitely worth using when the requirements say that we have to sort by a value that depends on both the query and the indexed values, as in the case of sorting by distance from a point specified by the user.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://solr.pl/en/2011/02/28/sorting-by-function-value-in-solr-solr-1297/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Quick look &#8211; IndexSorter</title>
		<link>https://solr.pl/en/2010/10/04/quick-look-indexsorter/</link>
					<comments>https://solr.pl/en/2010/10/04/quick-look-indexsorter/#respond</comments>
		
		<dc:creator><![CDATA[Rafał Kuć]]></dc:creator>
		<pubDate>Mon, 04 Oct 2010 12:14:20 +0000</pubDate>
				<category><![CDATA[Solr]]></category>
		<category><![CDATA[index]]></category>
		<category><![CDATA[index sorter]]></category>
		<category><![CDATA[index sorting]]></category>
		<category><![CDATA[indexsorter]]></category>
		<category><![CDATA[lucene]]></category>
		<category><![CDATA[solr]]></category>
		<category><![CDATA[sorting]]></category>
		<guid isPermaLink="false">http://sematext.solr.pl/?p=82</guid>

					<description><![CDATA[At the Apache Lucene Eurocon 2010 conference, which took place in May this year, Andrzej Białecki talked in his presentation about how to obtain satisfactory search results when using early termination search techniques. Unfortunately, the tool he mentioned was not]]></description>
										<content:encoded><![CDATA[<p>At the Apache Lucene Eurocon 2010 conference, which took place in May this year, Andrzej Białecki talked in his presentation about how to obtain satisfactory search results when using early termination search techniques. Unfortunately, the tool he mentioned was not available in Solr at the time, but that has changed.</p>
<p><span id="more-82"></span></p>
<p>At the time of writing, the tools described here are available only in the <em>branch_3x</em> branch in SVN, but there are plans to migrate this functionality to version 4.x.</p>
<h3>But what is it?</h3>
<p>When a search is terminated after a predetermined time, regardless of how many results have been found, we eventually run into the problem of result quality. Instead of receiving the best results for the query, we get them in random fashion (or at least they may look random). This means that we cannot guarantee that the user of the system gets the best matching results. Of course, we are talking about the situation in which the search is terminated after a predetermined period of time and Solr therefore can&#8217;t gather all the documents that match the query.</p>
<h3>Is it useful for me?</h3>
<p>When might ending a search after a predetermined time be useful? There are many use cases for such a search. Imagine that our deployment is composed of many separate shards, each of which operates on a large amount of data. When making a distributed query, each of the shards in the search system must be queried for relevant documents, and then all results must be gathered and displayed to the end user (of course, this need not be a person; it may be an application). But what if each of the shards needs a very long time to process all search results, and we are, for example, only interested in those added recently (e.g. in the last week)? This is where early termination of the search query comes in, assuming that we are more interested in documents added the day before than in those added two weeks ago.</p>
<h3>How do we achieve it?</h3>
<p>The example above illustrates a case in which we can use a search that is terminated after a specified time. Looking more closely at the search results, however, we run into a problem: to sort search results, Solr must collect them all. So when making a query with a sort parameter like <code>sort=added+desc</code>, each of the shards would have to return all search results for the documents to be sorted correctly. Does this mean that we can&#8217;t use early termination of search? Not really. To help us, Solr provides a tool, IndexSorter, which until now was available only in the Apache Nutch project but was recently committed to Lucene and Solr. With this tool, we can pre-sort the index by the field that we need. Thus, with an index sorted in descending order by the date a document was added, Solr would first get the documents that were added most recently, and we would be able to use early termination.</p>
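<p>The effect of pre-sorting can be illustrated with a small Python simulation. This is only a sketch of the idea, not of Solr internals: the documents are made-up (id, day added) pairs, and stopping after a fixed number of hits stands in for stopping after a fixed amount of time.</p>
<pre class="brush:python">def search(index, matches, limit):
    """Collect hits in index order and stop once `limit` hits are found."""
    hits = []
    for doc in index:
        if matches(doc):
            hits.append(doc)
            if len(hits) == limit:
                break  # early termination: the rest is never visited
    return hits


def match_all(doc):
    """Stand-in for a query that matches every document."""
    return True


# Hypothetical documents: (id, day on which the document was added).
index = [(1, 3), (2, 9), (3, 1), (4, 7), (5, 5)]

# Unsorted index: the hits are simply the first two documents visited.
unsorted_hits = search(index, match_all, 2)

# Index pre-sorted by the day added, newest first: the first two
# documents visited are also the two most recently added ones.
sorted_index = sorted(index, key=lambda doc: doc[1], reverse=True)
sorted_hits = search(sorted_index, match_all, 2)</pre>
<p>With the unsorted index the search stops after documents 1 and 2, which happen to come first; with the pre-sorted index it stops after documents 2 and 4, the two most recently added.</p>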
<h3>Using IndexSorter</h3>
<p>What do you need to do to use the IndexSorter tool? To tell you the truth, it&#8217;s not that complicated. Note, however, that at the time of publication of this entry the tool is only available in <em>branch_3x</em> of the Lucene/Solr project. To sort an index by a field, run the following command from the command line (keeping in mind the location of the <em>lucene-misc-3.1.jar</em> library, which after building the project can be found in the <em>lucene/build/contrib/misc</em> directory):
</p>
<pre class="brush:bash">java IndexSorter SOURCE_DIRECTORY TARGET_DIRECTORY FIELD_NAME</pre>
<p>The parameters mean:</p>
<ul>
<li><em>SOURCE_DIRECTORY</em> &#8211; the directory containing the index that you want to sort,</li>
<li><em>TARGET_DIRECTORY</em> &#8211; the directory where the sorted index will be saved,</li>
<li><em>FIELD_NAME</em> &#8211; the field by which the index will be sorted.</li>
</ul>
<p>If everything goes correctly, you should see something like this:
</p>
<pre class="brush:bash">IndexSorter: done, 896 total milliseconds</pre>
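<p>If the sort has to run as part of a larger script, the command above can be wrapped in a few lines of Python. This is only a sketch under some assumptions: the fully qualified class name <em>org.apache.lucene.index.IndexSorter</em>, the need for a Lucene core JAR on the classpath and all paths and names passed in are not taken from this post and should be checked against your own build.</p>
<pre class="brush:python">import os
import subprocess


def index_sorter_command(jars, source_dir, target_dir, field_name):
    """Build the java command line for IndexSorter."""
    return [
        "java",
        "-cp", os.pathsep.join(jars),
        # Assumption: the package of the class is not given in the post.
        "org.apache.lucene.index.IndexSorter",
        source_dir,
        target_dir,
        field_name,
    ]


def run_index_sorter(jars, source_dir, target_dir, field_name):
    """Run IndexSorter and raise CalledProcessError on a non-zero exit."""
    command = index_sorter_command(jars, source_dir, target_dir, field_name)
    subprocess.run(command, check=True)


# Hypothetical JAR locations, index directories and field name.
command = index_sorter_command(
    ["lucene-core-3.1.jar", "lucene/build/contrib/misc/lucene-misc-3.1.jar"],
    "index",
    "index_sorted",
    "added",
)</pre>
<p>Building the argument list separately from running it makes it easy to print or log the exact command before the index is touched.</p>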
<h3>The end</h3>
<p>In my opinion, Lucene and Solr just got a very interesting feature, which can be used, for example, wherever the amount of data is very large, when response time cannot exceed a certain limit, or when the results beyond the first ones (the first 100 or 1000) are not significant. Anyone interested in the subject of index sorting and early termination techniques should take a look at the presentation titled &#8220;<em>Munching and Crunching: Lucene Index Post-Processing</em>&#8221; (<a href="http://lucene-eurocon.org/slides/Munching-&amp;-crunching-Lucene-index-post-processing-and-applications_Andrzej-Bialecki.pdf" target="_blank" rel="noopener noreferrer">slides</a>), given by Andrzej Białecki at the Lucene Eurocon 2010 conference, in which he discussed these topics.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://solr.pl/en/2010/10/04/quick-look-indexsorter/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
	</channel>
</rss>
