Field collapsing, in other words the grouping of search results, has just been committed to the SVN repository. I decided to take a look at this functionality and see how it works.
6 sins of solrconfig.xml modifications
The solrconfig.xml file is another file that defines the behavior of Solr. Unlike schema.xml, which describes the structure of the index, solrconfig.xml determines the functionality available in Solr. Just as with schema.xml, we can distinguish a number of standard mistakes made by those who implement Solr, and I’m not talking only about people who have little experience with it. To learn about some of those mistakes, I invite you to read the following entry.
Solr: data indexing for fun and profit
Solr is not very friendly to novice users. Preparing a good schema file requires some experience. Assuming that we have prepared the configuration files, what remains is to send our data to the search server and take care of the ability to update it.
5 sins of schema.xml modifications
I made a promise and here it is: the entry on the most common mistakes made when designing a Solr index, that is, when you create or modify the schema.xml file for your system implementation. Feel free to read on 😉
The scope of Solr faceting
Faceting is one of the ways to categorize the content found in the process of information retrieval. In the case of Solr, it is the division of a set of documents on the basis of certain criteria: the content of individual fields, queries, ranges, or dates. In today’s entry I will try to shed some light on the possibilities of the faceting mechanism, both what is currently available in Solr 1.4.1 and what will be available in the future.
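As a rough illustration of the field-based and query-based criteria mentioned above, here is a minimal sketch that builds a Solr faceting request URL. The `facet`, `facet.field`, and `facet.query` parameters are standard Solr request parameters; the base URL, the `category` field, and the `price` range query are assumptions made up for this example, not taken from the entry.

```python
from urllib.parse import urlencode


def build_facet_query(base_url, query, facet_fields=(), facet_queries=()):
    """Build a Solr select URL that asks for facet counts.

    base_url, the field names, and the facet queries are hypothetical;
    adjust them to your own Solr instance and schema.
    """
    params = [
        ("q", query),
        ("rows", "0"),       # return only facet counts, no documents
        ("facet", "true"),   # switch the faceting component on
    ]
    # One facet.field parameter per field whose values should be counted.
    params += [("facet.field", field) for field in facet_fields]
    # One facet.query parameter per arbitrary query to be counted.
    params += [("facet.query", fq) for fq in facet_queries]
    return base_url + "?" + urlencode(params)


# Assumed local Solr instance and example schema fields.
url = build_facet_query(
    "http://localhost:8983/solr/select",
    "*:*",
    facet_fields=["category"],
    facet_queries=["price:[0 TO 100]"],
)
print(url)
```

Setting `rows` to 0 is a design choice for the sketch: it keeps the response limited to the facet counts when the documents themselves are not needed.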
English RSS feed available
Just a quick note. Recently many people have asked about an English RSS feed. We patched one of our plug-ins and here it is: an RSS feed in English. If you want to subscribe to the English RSS feed, go to the following URL: http://solr.pl/feed?lang=en.
What is schema.xml?
One of the configuration files that describe every Solr implementation is the schema.xml file. It describes one of the most important aspects of the implementation: the structure of the index. The information contained in this file allows you to control how Solr behaves when indexing data or when running queries. Schema.xml is not only the structure of the index; it also contains detailed information about data types, which have a large influence on Solr's behavior and are usually neglected. This entry tries to bring some insight into schema.xml.
6 deadly sins in the context of query
In my work with Lucene and Solr I have seen all kinds of queries. With Lucene, the developer usually knows what he or she wants to achieve and uses a more or less optimal solution; with Solr this is not always the case. Solr is a product that could theoretically be used by anyone: a person who knows Java, a person without broad, specialized technical knowledge, or a programmer. Precisely because of that, Solr is easy to run and use, at least when it comes to simple functionality. I suppose that is why not many people bother to read the Solr wiki or at least follow the mailing list. As a result, sooner or later people make mistakes. Those errors arise from various shortcomings: lack of knowledge about Solr, lack of skills, lack of experience, or simply a lack of time and tight deadlines. Today I would like to show some major mistakes made when submitting queries to Solr and how to avoid them.
Search Process
We hereby inaugurate a part of solr.pl that is not tied to a particular search engine but rather to the development and functioning of search-related websites.
Have you ever wondered what makes a site's search engine be considered good? To answer this question we should consider what a typical process of finding the desired information looks like for the customer, and whether there is such a thing as a typical process.
CSVResponseWriter
Solr recently received another small but noteworthy piece of functionality: another response format available in the standard distribution, the CSV response format. I decided to write a short note about it.