Conferences – Solr.pl

Explain 0.9.1 – New version

Marek Rogoziński — Wed, 11 Jan 2012 20:58:24 +0000

After a few weeks (counting Christmas, when people don’t usually analyze their Solr queries ;)) we would like to share some thoughts about the release of explain.solr.pl.

We noticed the following:

The explain tool has gained some attention, but most of the explains are marked as private.
In a relatively large number of explains, only the query was submitted instead of the whole response.
Often the results don’t contain diagnostic information (debugQuery=true).
The unreleased Apache Solr 4.0 is quite popular.
The explain parser did a good job (not counting, of course, the obvious problems with parsing Solr 4.0 explains).

Our thoughts:

We need to work on clarity of the messages.
We need to focus on enabling Solr 4.0 explain parsing.

Today a new version of explain.solr.pl was deployed. The changes are as follows:

Simple analysis of query performance was added.
Displayed messages are now more user friendly.
Explain parser was corrected according to text files generated by Solr 3.5 unit test cases.
Minor GUI enhancements.

Explain.solr.pl beta version available

Rafał Kuć — Fri, 23 Dec 2011 20:57:02 +0000

We are pleased to announce that http://explain.solr.pl is now available to users. Please remember that this is a beta version and will be developed further. If you want to know more details, please read the following blog post.

Design

Currently explain.solr.pl can analyze queries from the 3.x versions of Solr, and it is being prepared to handle Solr 4.0 as well.

Explain visualization

In the current version explain.solr.pl visualizes score values of the given documents.

To add a new explain visualization, run your query with the additional debugQuery=on parameter and then paste the result returned by Solr into explain.solr.pl (if you use a web browser to get the results, please remember to paste the page source). After you hit the “Create explain” button, your explain visualization will be created and you will be taken to its web page. Currently explain.solr.pl supports only results in XML format, so if you use another format (like JSON), please omit the wt parameter or set it to wt=xml.
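The steps above can be sketched in a few lines of Python. This is only an illustration, not part of explain.solr.pl: the Solr address and both function names are assumptions made up for the example, and the check relies on the usual layout of a Solr XML response, where the explain data sits in a "debug" list.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical Solr address; replace it with your own installation.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_debug_url(query, base_url=SOLR_SELECT_URL):
    """Return a select URL that asks Solr for XML output with debug data."""
    params = {"q": query, "debugQuery": "on", "wt": "xml"}
    return base_url + "?" + urlencode(params)


def has_explain_section(xml_text):
    """Return True if the response contains the explain list.

    The list is only present when the query was run with debugQuery
    enabled, so this is a quick check before pasting the response.
    """
    root = ET.fromstring(xml_text)
    return root.find("./lst[@name='debug']/lst[@name='explain']") is not None
```

Running has_explain_section on the text you are about to paste tells you whether the diagnostic information is there at all; if it returns False, the query was most likely run without debugQuery=on.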

It is also possible to verify hits for any document, even one that was not found in the search results (given to Solr with the use of explainOther parameter). It can be useful to check why the given document wasn’t included in the result list.
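As a rough sketch of such a request, assuming a Solr instance at a made-up local address and a hypothetical document identifier, the explainOther parameter can be added next to debugQuery like this (the function name is ours, not part of any tool):

```python
from urllib.parse import urlencode


def build_explain_other_url(query, other_query,
                            base_url="http://localhost:8983/solr/select"):
    """Build a select URL that also explains documents matching other_query.

    base_url is a hypothetical address; other_query is a regular Solr
    query that identifies the document you are curious about.
    """
    params = {
        "q": query,
        "debugQuery": "on",
        "explainOther": other_query,
        "wt": "xml",
    }
    return base_url + "?" + urlencode(params)


# Example: ask why the (hypothetical) document id:doc42 scores the way it does
# for the query "title:solr", even if it is not in the result list.
url = build_explain_other_url("title:solr", "id:doc42")
```

The response to such a request can then be pasted into explain.solr.pl in the same way as a regular debug response.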

Available information

After creating a new explain visualization you have several pieces of information available. Information about the query is shown once for the whole explain, while the rest of the information is shown for each document.

Information about query

In this part of explain.solr.pl you can see:

Unique identifier of your explain visualization.
Information about the explainOther parameter not being used.
The actual query.
Information about the explain visualization being available to the public.

Fields and values

In this part of explain.solr.pl you can see information about the fields and their values that were available in the search results.

Visualization of elements influencing total score value

The following information is visible in this part of the explain.solr.pl:

Percentage influence of each value on the global score value of the document.
Pie chart visualizing the data visible in the left column.
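To give an idea of how such percentages could be derived, here is a minimal Python sketch. It is not the code used by explain.solr.pl. It assumes the Solr 3.x text form of an explain, where each line reads "value = description" and child nodes are indented by two spaces, and it assumes the top node is a "sum of:" node, so that its direct children add up to the total score. The function name is made up for the example.

```python
import re

# One explain line: optional indentation, a number, " = ", a description.
LINE_RE = re.compile(
    r"^(?P<indent> *)(?P<value>-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?) = (?P<desc>.*)$"
)


def top_level_shares(explain_text):
    """Return (description, percent) pairs for the root's direct children.

    Assumes the root is a "sum of:" node; for other node types
    (for example "product of:") the percentages would be meaningless.
    """
    rows = []
    for line in explain_text.splitlines():
        match = LINE_RE.match(line)
        if match:
            rows.append((len(match.group("indent")),
                         float(match.group("value")),
                         match.group("desc")))
    if not rows:
        return []
    root_indent, total, _ = rows[0]
    if total == 0:
        return []
    # Direct children sit exactly one indentation step (two spaces) deeper.
    return [(desc, 100.0 * value / total)
            for indent, value, desc in rows[1:]
            if indent == root_indent + 2]
```

The resulting pairs are exactly the kind of data a pie chart needs: one label and one percentage per score component.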

Getting back to a previously visualized explain

If you wish to get back to an explain you visualized earlier, please go to the following URL: http://explain.solr.pl/explains/EXPLAIN_ID, where EXPLAIN_ID is the unique identifier of your explain. If you don’t remember the unique identifier of your visualized explain, you can find it on the “history” page.

Hiding your explain

If you don’t want your explain information to be available to the public, please uncheck the “I want this explain to be visible on history page” checkbox during explain creation. Please remember that in order to get back to a private explain you will need to remember its unique identifier.

Raising issues

To raise an issue you found in explain.solr.pl, please create a new issue on https://github.com/solrpl/explain or send an e-mail to explain(at)solr(dot)pl.

Request

We also have a request to the community. If you have any notes, encounter an error or would like some additional functionality, please write an e-mail to explain(at)solr(dot)pl. This will allow us to better understand your needs and make explain.solr.pl suit them better.

Something at the end

At the very end we would like to show you what the process of developing http://explain.solr.pl looks like. We used a tool called gsource.

Lucene Eurocon 2011 – day two

Rafał Kuć — Mon, 14 Nov 2011 20:53:41 +0000

In the previous entry, we tried to cover what we saw on the first day of the Lucene Eurocon 2011 conference. However, from our point of view, the second day was the more important one, because that was the day we had our talk. But let’s start from the beginning.

Keynote (Realtime Search at Twitter, Michael Busch)

Day two began with one of the better talks, in my opinion, of Lucene Eurocon 2011. In his presentation titled “Realtime Search at Twitter” (slides, video), Michael Busch showed us what the Twitter team did to build real-time search on top of Lucene so that the whole system could handle Twitter’s daily traffic. The presentation contained a great deal of technical detail, all of it explained in depth. In addition to the information about Lucene, conference participants also got a brief lesson in multithreaded Java programming.

What will come (Lucene Today, Tomorrow & Beyond, Simon Willnauer)

The conference was divided into three parallel sessions, and we chose the presentation titled “Lucene Today, Tomorrow & Beyond” (slides, video), led by Simon Willnauer. The presentation focused on what Lucene is not capable of at this time and what it may be capable of in the future (for example, in version 5). Simon began by presenting the history of Lucene, the people involved in the development of the project and the companies they work for. Then we found out what functionality users can expect from Lucene when version 4.0 is officially released. During the talk, Simon also covered the role of token positions in the token stream, incremental updates and JIT. At the end we were shown part of his vision of Lucene and Solr as modular projects.

A thing about explain in Solr (Understanding and Visualising Solr ‘explain’ Information, Rafał Kuć)

For personal reasons, and because I wanted to prepare for my talk, I decided to skip one of the talks. After lunch, my talk began. The presentation titled “Understanding and Visualising Solr ‘explain’ Information” (slides) was divided into two parts: first, the theory regarding the information returned by Apache Solr, and second, what we have done for http://explain.solr.pl. Let me not comment on my own presentation and focus instead on how I felt after it. Unfortunately, I was not pleased after the talk. Stress ate me alive: I didn’t say everything I wanted to say, I rushed and did not explain what I wanted to. But now, with some experience behind me, I can promise one thing: next time will be better, much better. I am pleased with the positive reception of the talk and the topic itself.

Randomized tests (Randomized Continous Testing: Solr & Lucene Use Case, Dawid Weiss)

After a short rest we went to listen to Dawid Weiss talk about tests that use a random factor, in the presentation titled “Randomized Continous Testing: Solr & Lucene Use Case” (slides). If you are interested in automated testing (and you should be), I definitely recommend the slides, as well as the video as soon as it is available. Dawid started with the theory of where you can add a random factor to tests (all in the context of Lucene and Solr, of course). Then we got a good deal of technical information on how to implement a randomized test (with examples, of course). There was also information on how to test multithreaded code.

How to test your search engine (Better Search Engine Testing, Eric Pugh)

Staying with the topic of testing, we decided to go to the last talk of Lucene Eurocon 2011, titled “Better Search Engine Testing” (slides), led by Eric Pugh. Another interesting presentation, at least from my point of view. Eric started from the fact that search, and particularly good search, is increasingly important in today’s world. What’s more, data storage is cheap, but data management is by no means cheap, and therefore the role of search engines will increase. Eric then turned our attention to the problem of testing search engines: what should be tested and how it can be tested automatically. It’s hard to summarize that talk because of the large amount of information Eric presented.

Committer panel

The last attraction of Lucene Eurocon 2011 was the “Committer panel”, a meeting between conference participants and the people who develop Apache Lucene and Solr every day. After brief introductions, there was a series of questions and answers. If you are interested in how the meeting went, you will have to wait for the video material.

To sum up

From my point of view, the Lucene Eurocon 2011 conference was much better organized than Lucene Eurocon 2010, even though Lucene Eurocon 2010 was very good. You could see that the organizers had gained experience. The talks at the conference were varied: some were strictly technical, while others focused on the user. With three parallel sessions, it was impossible not to find a presentation for yourself. The organizers deserve thanks and congratulations; they did a very good job. I hope that we will meet most of this year’s participants at Lucene Eurocon 2012.

explain.solr.pl: Status

Marek Rogoziński — Fri, 11 Nov 2011 20:52:52 +0000

During the Lucene Eurocon 2011 conference we gave a talk about a tool that will enable you to analyze Solr search results. We promised that the public version of the tool would be released soon. We would like to assure you that we are getting close to that moment. Right now we are focusing on the following things:

rebuilding Lucene explain information analyzer so it is ready for the changes in Lucene 4.0 (per field similarity and flexible similarity)
user interface rebuild along with some changes to the usability
code cleaning and preparation for publishing sources

Fortunately, autumn and winter evenings are just perfect for work. We will try to make http://explain.solr.pl available as soon as possible.

Lucene Eurocon 2011 – day one

Rafał Kuć — Mon, 07 Nov 2011 20:52:18 +0000

As we wrote a few days ago, we are back from this year’s Lucene Eurocon, which took place in Barcelona. Although the videos will be available shortly, we decided to write something about the presentations we attended.

Keynote (Search + Big Data: It’s (still) All About the User, Grant Ingersoll)

The first day started at 8:30, a time of day the Spanish call “night”. This year the conference was opened by Grant Ingersoll with his presentation titled “Search + Big Data: It’s (still) All About the User” (slides, video). Grant reminded us that despite all the superb technologies we deal with every day, we should always focus on the end user. Therefore, we should do everything to ensure that users are satisfied with the applications we create. Grant showed that although developers have more and more technologically advanced tools, everything comes down to the same thing: the user is still the most important. You should definitely keep this in mind.

Keynote #2 (Architecting the Future of Big Data & Search, Eric Baldeschwieler)

The second presentation we attended during the conference was “Architecting the Future of Big Data & Search” by Eric Baldeschwieler (slides, video). During the presentation Eric tried to answer whether Lucene and Hadoop are able to work efficiently within a single system. At the beginning we were given a large dose of information on the Map/Reduce algorithm, Hadoop and HBase. The second part of the presentation was a case study of the Apache Hadoop implementation at Yahoo and of how Map/Reduce helped with the daily tasks of the system. At the very end we were given information about Lucene and Hadoop integration. Lots of interesting information; I recommend watching the video as soon as it is published.

The thing about the index structure (Portable Lucene Index Format & Applications, Andrzej Białecki)

At this point the conference split into three parallel sessions, and we decided to go and listen to Andrzej Białecki and his presentation titled “Portable Lucene Index Format & Applications” (slides, video). Andrzej started with information about the structure of the Apache Lucene index and why backward compatibility is difficult to achieve. Then the presentation turned to its title topic: why a portable Lucene index format is needed, what the challenges are and what the state of the PortableCodec implementation is. Attendees could also see an interesting example of how to use the SimpleTextCodec and how the index data is stored in plain text :) I recommend watching the video to see the whole presentation.

First topic about Solr (Improving Solr’s Update Chain, Jan Høydahl)

We changed conference rooms to listen to Jan Høydahl talk about “Improving Solr’s Update Chain” (slides). During the presentation Jan showed how to define your own components in the Solr update chain. It turns out that a simple modification of the XML file allows you to extend and adapt the indexing process to your own needs. We were also shown how to implement your own update chain components that let you influence the indexing process. A large part of the presentation was a case study of a project implemented for the University of Oslo. At the end of the presentation we were given information about the plans and the functionality that has been donated to the Apache Software Foundation as patches to Solr.

Know what your users are doing (Search Analytics: Business Value & BigData NoSQL Backend, Otis Gospodnetic)

Another presentation focused on what should be one of the most essential things in any search system, the user. It was presented by Otis Gospodnetic under the title “Search Analytics: Business Value & BigData NoSQL Backend” (slides, video). If you are using Solr, you should look at what Sematext has to offer, and what’s more, completely for free (at least for now). During the presentation Otis told the audience that analyzing user behavior and how users use your system is not an option; it is a necessity. Without it, it is impossible to tune and refine search, because we know neither what to change nor where. Additionally, without data about user behavior, we cannot say whether the changes we make are going in the right direction. I highly recommend watching the video.

Solr and Hadoop (Scaling Search at Trovit with Solr & Hadoop, Marc Sturlese)

After a short break we decided to hear about the integration of Solr and Hadoop in the presentation “Scaling Search at Trovit with Solr & Hadoop” (slides), led by Marc Sturlese. The author focused on how they managed to distribute indexing using the Map/Reduce algorithm. In short, the presentation was about how to create smaller indexes fairly efficiently from data stored in HDFS and then combine them into larger indexes (or one large index, depending on your needs). Additionally, Marc showed what they did to take advantage of Solr functionality (data analysis and deduplication) without running Solr itself.

Solr and UIMA (Natural language search in Solr, Tommaso Teofili)

The next presentation we attended was “Natural language search in Solr” (slides), led by Tommaso Teofili. The presentation was about using Apache Solr in conjunction with UIMA and a case study of a real-life project. It included a bit of technical detail, some performance comparisons and information about the need for system learning. An interesting topic and a very interesting presentation. I recommend watching the video as soon as it is published.

Here comes the new (Improved Search with Lucene 4, Robert Muir)

The last presentation scheduled for the first day of Lucene Eurocon 2011 was “Improved Search with Lucene 4” (slides, video), led by Robert Muir. He talked about the changes that await Apache Lucene in version 4.0. We also heard a lot about the performance of Lucene 4.0, changes in the API, NRT and deep paging support. An interesting technical presentation about what we can expect from the upcoming version of Lucene. Anyone interested in this subject should watch the video.

Lightning Talks

Then the conference merged again into a single session called “Lightning Talks”. During the session each presenter had about 7 minutes for their talk. The talk I remember the most (because of the large dose of humor) was “Java 7 and Lucene: the story behind the story” by Uwe Schindler, which in my opinion should rather have been called “Do not use Java 7 for anything”. Sincere congratulations for amusing the audience. The lightning talks session consisted of the following presentations:

“Morphological Analysis and Named Entity Recognition for your Lucene/Solr Search Applications” (slides) led by Christoph Goller
“Java 7 and Lucene: the story behind the story” (slides) led by Uwe Schindler
“Navigating Subdocuments with Solr” (slides) led by Mikhail Khludnev
“Powered by Lucene: IBM Content Analytics with Enterprise Search” (slides) led by Wolfgang Jung
“Solr performance monitoring” led by Otis Gospodnetic
“Searching in more than 140 years newspaper articles” (slides) led by Nicola Provenzano

Stump the chump (Chris Hostetter)

The last session of the day was fun, and it was possible to win a few euros if you asked a good question (100, 50 and 25, as I recall). The whole session was about asking a question that Chris (aka Hoss) would not be able to answer. Summarizing this part does not make sense; I recommend the video, which is available at the following URL: http://www.lucidimagination.com/devzone/events/conferences/ApacheLuceneEurocon2011/apache-lucene-eurocon-2011-stump-chump.

End of the first day

And that was all for the first day of the conference. Some of the participants then went to the FC Barcelona vs FC Viktoria Plzeň match, a few headed for their hotels, and all the rest, including us, went to Shoko Club for some cava and something to eat.

Another Lucene Eurocon is history

Marek Rogoziński — Mon, 31 Oct 2011 20:51:33 +0000

Another Lucene Eurocon is history. Those were two very intense days, where the basic problem was which of the three lectures to go to. Sometimes the choice was very difficult; the only consolation is that this year all the presentations were recorded and will soon be available on the web.

For us the most important part was the second day and the presentation of the results of our work: Understanding & Visualising Solr ‘explain’ Information. The presentation consisted of two parts. The first, theoretical part described what Solr reports about the relevance of a returned document. Unfortunately, this information is not the most readable, especially when the search is done on many fields, often using a dedicated query parser (the screenshot contains an explain fragment describing a hit in only one document (!)).

The situation is made worse by the fact that client questions about document positioning are among the most time-consuming ones during a project.

The second part of the presentation showed what we decided to do about this problem. The idea came after seeing explain.depesz.com, a similar tool for visualizing information provided by the PostgreSQL database.

In retrospect, our explain seems to be a good idea. Although it’s not finished, we are using the tool almost every day. What’s more, the Lucene Eurocon presentation showed us that there is also considerable interest in the community for a tool like this. Thank you for your kind words and the promises of sending us your own modules.

The current version of explain.solr.pl focuses on reading the information generated by Solr 3.x. We are currently working on going public and (somewhat later) on open-sourcing the code.

To sum up, the plans for the near future are quite ambitious :)

First, the chronicler’s duty: we will try to describe how Lucene Eurocon 2011 looked from our perspective. We plan to publish two entries, each dedicated to one day of the conference. That should give you something to read before the official videos from the conference appear.
Opening explain.solr.pl to a wide audience :)
Publication of explain.solr.pl code (github)
Back to more regular posts about Solr.

Keep your fingers crossed :)

Lucene Eurocon 2010

Rafał Kuć — Mon, 12 Jul 2010 09:17:24 +0000

Following the Apache Software Foundation’s announcement of its intention to stop organizing the ApacheCon conference on the old continent, we Europeans were left with no Apache conference near us. But as we all know, nature abhors a vacuum, and so Lucid Imagination, in cooperation with the sponsors, decided to organize the first conference dedicated to Lucene and Solr: Lucene Eurocon. Since we had the pleasure of participating in this conference, we decided to give you a short account of how it went.

The conference was divided into two parts: training and the conference proper. I won’t write about the training part because I did not participate in it. The training sessions were conducted by two committers of the Lucene/Solr project: Erik Hatcher and Grant Ingersoll. More information on the topics addressed in this part of the conference can be found at http://lucene-eurocon.org/training.html.

The Beginning

The conference proper began on Thursday, May 20, with a greeting from Grant Ingersoll, followed by two presentations: “The Search Revolution – How Lucene & Solr Are Changing the World”, led by Lucid Imagination’s CEO, Eric Gries (slides), and “From Publisher To Platform: How The Guardian Used Content, Search, and Open Source To Build a Powerful New Business Model” (slides), presented by Stephen Dunn, the head of the strategy and technology department at The Guardian. Eric Gries focused on the increasing demand for information processing and information retrieval, and of course on the Lucene/Solr project, which can be used in both of the discussed cases. He also talked about the growing interest in people with skills in both projects. I must mention that I envy his ability to hold the audience’s interest like that. I take my hat off to him. The second presentation, led by Stephen Dunn, was a discussion of the new developer platform released by The Guardian, named “The Open Platform”, which allows access to a database of articles published by The Guardian since 1991 (millions of documents and still growing). The author focused on describing the technical details of the implementation, based on the Solr search engine, and the benefits of using their platform.

Then the conference split into two tracks. Since it was possible to be physically present at only one of the two presentations, I will describe only those I attended. For reference, the full agenda is available at http://lucene-eurocon.org/agenda.html.

Tika and the future of Lucene

I decided to go for track one. It started with some interesting and sometimes detailed information about Tika in the presentation called “Text and Metadata Extraction with Apache Tika” (slides), led by Jukka Zitting. He started with a bit of background information, the assumptions and a historical overview of the project. Then came purely technical information, including code fragments showing how to use the framework. Overall, from the developer’s perspective, I think it was a good presentation. Without any break we got a second presentation, led by Uwe Schindler and Simon Willnauer, under the intriguing title “Lucene Forecast: Version, Unicode, Flex and Modules” (slides). I received an enormous amount of information about the future of both Lucene and Solr. The presentation began by explaining the recent moves in the Lucene and Solr projects, namely the development merge. From now on both projects share a common trunk, as well as the committers. Furthermore, the trunk is now where the newest versions of Lucene and Solr are developed. We also heard that version 4.0 of Lucene won’t be backwards compatible. The further information provided by the speakers was no less interesting, such as the plans to port the current faceting mechanism to Lucene. Much time was devoted to discussing the changes associated with full Unicode support (the ICU module of Lucene), and then the speakers moved on to a topic that is very interesting for me, Flexible Indexing. I’ll try to write something more about ICU in the near future.

Below are the titles of two presentations from the second track. Unfortunately, since I did not attend them, I cannot say anything more about them:

“Use of Solr at Trovit, A Leading Search Engine For Classified Ads”, led by Marc Sturlese
“Implementing Solr in Online Media As An Alternative to Commercial Search Products”, led by Bo Raun

Magic of post-processing

After the lunch break I decided not to change tracks and went to see the presentation titled “Munching and Crunching: Lucene Index Post-processing” (slides), led by Andrzej Białecki, and after the whole presentation I think it was the right choice. I won’t write much about the topic he discussed because, in my opinion, the amount of information deserves a separate post. He mainly covered the possibilities of Lucene index post-processing, such as splitting, cleaning, filtering and storing the whole index in RAM. As I said, more on these topics in a separate entry. Meanwhile, on the second track, Peter Kiraly led a presentation titled “Bringing Solr to Drupal: A General, and a Library-Specific Use Case” (slides).

Solr and NLP

I decided to leave the first track and the presentation “Solr Schema Demystified”, led by Uri Boness, in favor of “Integration of Natural Language Processing tools with Solr” (slides), led by Joan Codina-Filbà. I thought the topic should be interesting for those seeking to integrate Solr with linguistic analysis tools, to save the results returned by these tools and use them. The presented information included the use of UIMA, the problems that the developers encountered and how they were resolved. Joan also mentioned how they used Carrot2 in their systems. The presenter also showed the differences between a stem and a lemma, how they determined which part of speech a term is, and how this information was used to classify comments about products as having a positive or negative connotation. All this was discussed in the context of Lucene and Solr, both with and without the use of payloads. It’s a pity that it was discussed only in the context of the Spanish language, but hey, you can’t have everything, right?

Document processing and pipe-line

Since earlier that day I had heard a bit about “The Open Platform” made by The Guardian, I decided not to go to the presentation titled “Solr in the Wild: The Guardian’s Open Content Platform API”, led by Graham Tackley (slides), and instead to listen to a talk led by Max Charas and Karl Jansson titled “Modular Document Processing for Solr/Lucene” (slides). In my opinion, the presentation was more about the capabilities of the sponsor’s products than anything very insightful, especially compared to the previous presentation. That was my impression, but maybe it was the result of information overload and general tiredness.

Solr in IBM

After another short break, two further presentations began. Given the choice between “Make Your Domain Objects Searchable with Hibernate Search” (slides), led by Gustavo Fernandes, and “Social and Network Discover (SaND) over Lucene” (slides), led by Shai Erera, I chose the second one. In short, the presentation concentrated on an application created by IBM to search for information within the company, such as documents, people, parties, etc. The presenter showed how search was implemented (what can be found), how to narrow the results (faceting, narrowing based on date, location or source) and how they presented the relationships between individuals or documents. In addition to search, we were shown a pretty interesting piece of functionality: the graph of relationships (slide 25).

From FAST to Solr

Already tired, I chose the last presentation of the day, “Key Topics When Migrating From FAST to Solr” (slides), led by Jan Høydahl, over “Query by Document: When “More Like This” Is Insufficient” (slides), led by Dusan Omercevic. It was interesting to hear about some solutions that have been implemented in FAST but are not present in the Lucene/Solr projects. Jan Høydahl was an engaging speaker; he showed the differences between the two technologies and how to deal with functionality missing on one side or the other. For a person with no commercial experience with FAST, like me, it was interesting to learn that most data processing in FAST is done at the indexing stage; for example, sorting must be defined at indexing time. Looking at FAST, you can see what is still missing in Solr, such as multilingual fields and a pipeline. In the later part of the presentation we were shown how to migrate from FAST to Solr; highly simplified, of course, but very informative. Overall, a very good presentation in my opinion.

End of the first day

Then, after a 90-minute break, we had five short, less formal presentations. Because of their nature, and because our thoughts were already at the Czech Beer Festival, I decided only to list them:

“Social Media Scheduler based on Solr + Hadoop + Amazon EC2” (slides) led by Pablo Aragón
“Introduction to Collaborative Filtering using Mahout” (slides) led by Frank Scholten
“Enterprise Search meets Enterprise CMS – TYPO3 and Apache Solr” (slides) led by Olivier Dobberkau
“BM25 Scoring for Lucene – From Academia to Industry” (slides) led by Yuval Feinstein
“How We Scaled Solr to 3+ Billion Documents” (slides) led by Jason Rutherglen

Second day

Similar to the first day, the second day began with a short introduction by Grant Ingersoll.

So we start again

Just like the previous day, we started with two presentations: “Software Disruption: How Open Source, Search, Big Data and Cloud Technology are Disrupting IT” (slides), led by Zack Urlocker, and “Solr 1.5 and Beyond” (slides), led by Yonik Seeley. The first presentation, being marketing-oriented, didn’t interest me much, which does not change the fact that it offered interesting facts about the state of open source software development. There were also predictions about the direction of software development and its turn toward processing in the cloud. Zack Urlocker is certainly a speaker with wide knowledge of open source, as one of the fathers of MySQL’s success. The second presentation was, from my point of view, much more interesting because of its technical orientation. Yonik Seeley concentrated on several key themes for the future of Solr: the development merge, the extended DisMax parser, integration with ZooKeeper (so-called Solr Cloud), spatial search, field collapsing and NRT (near real time) search. In my opinion it is worth keeping an eye on the changes in Solr, since the future of the project looks very bright, if only everything that was mentioned during the presentation can be achieved, and we know that not everything was covered.

Grant Ingersoll about relevance

And again the conference split into two tracks. It seemed to me that I would benefit more from Grant Ingersoll’s talk about “Practical Relevance” (slides) than from “Building Multilingual Search Based Applications” (slides) led by Steve Kearn. I was not mistaken. Grant spoke about improving the quality of search results: how to do it, how to analyze logs, what can be drawn from search statistics and what to target during the improvement process. He referred to the need to collect information from users, because they ultimately decide the success or failure of the application. There were also details on how to quickly get satisfactory results by adding phrase boosting. Towards the end of the presentation, Grant talked about advanced methods of influencing relevance, for example by developing your own components responsible for calculating how important a document is. He also referred to the “Open Relevance” project, one of the projects in the Lucene ecosystem.
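To give an idea of what phrase boosting can look like in practice, here is a minimal sketch of my own (it is not taken from Grant’s slides). It builds a Solr request that uses the dismax parser with its pf (phrase fields) parameter, so documents containing the query terms close together score higher. The host, the field names title and body, and the boost values are all assumptions made for illustration.

```python
from urllib.parse import urlencode


def build_phrase_boost_query(user_query, base_url="http://localhost:8983/solr/select"):
    """Return a dismax query URL that boosts documents matching the whole phrase.

    base_url and the field names below are hypothetical; adjust them to your schema.
    """
    params = {
        "q": user_query,
        "defType": "dismax",
        # qf: fields searched term by term (assumed field names).
        "qf": "title^2 body",
        # pf: fields where the whole query, treated as a phrase, earns an extra boost.
        "pf": "title^10 body^5",
        # ps: how many positions apart the terms may be and still count as a phrase.
        "ps": "2",
    }
    return base_url + "?" + urlencode(params)


url = build_phrase_boost_query("apache solr")
print(url)
```

The boost values are arbitrary starting points; the right numbers depend on the data and should be checked against real queries and logs, which was exactly the point of the talk.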

Lucene Connectors Framework

I decided to stay in the same conference room and listen to Karl Wright, who spoke about the Lucene Connectors Framework in a presentation titled “Lucene Connectors Framework: An Introduction” (slides). At the same time conference participants could listen to Karel Braeckman presenting “Unlocking a Broadcaster Video Archive Using Solr” (slides). The presentation about LCF mainly covered the assumptions and architecture of the framework. We learned that the framework is currently migrating from a commercial to an open source form, so at this time a lot of functionality may not work, because some of it was based on commercial libraries, which for obvious reasons cannot be used in an open source project. Even so, the framework will certainly be an interesting and useful tool that supplements Solr and Lucene with capabilities such as reading various data sources (both by PULL and PUSH) and delivering data periodically. A nice thing about the Lucene Connectors Framework is its security model. In my opinion, at this point, the framework should be regarded as a curiosity, but I really hope this will change soon.

Solr + Zookeeper = Solr Cloud

After a lunch break and some free talks with the committers of the Lucene/Solr projects, the afternoon presentation session began. Rested and hungry for information, I went to see Mark Miller talking about “Solr in the Cloud” (slides). At the same time in the next room Tyler Tate and Stefan Olafsson talked about “The Path to Discovery: Facets and the Scent of Information” (slides). As you can guess from the title, the presentation was about the so-called Solr Cloud, which is a distributed farm of Solr instances managed using Zookeeper. The talk began with a presentation of the master – slave architecture and how this architecture looks when the index is distributed among many shards. Then we had a few words about index replication, from shell scripts to the new Java-based mechanism. Mark Miller then moved on to describe the core assumptions that underlie the integration with Zookeeper: centralized configuration, fault tolerance, the ability to automatically remove and add Solr instances, and support for checking the status of each instance. He discussed what has been done so far on the Zookeeper integration, what more needs to be done and what is planned for the future. In addition to matters related to the Zookeeper integration, Mark Miller presented new features in Solr 1.4, e.g. the LoadBalanced Solr Http Server. In conclusion, the presentation was very interesting and showed a further path for the development of Solr.

A moment of grace

While we talked with other participants of the conference and made some contacts, two consecutive presentations took place:

“Neural Networks, Newspapers and Solr – A short tour through extending Solr for real-world text-classification” (slides), led by Sven Maurmann
“Rapid Prototyping” (slides), led by Erik Hatcher

Almost the end

The last presentation I watched at the conference was “Combating Information Overload – Search in Military Information gathering Systems” (slides), given by Alexandra Larsson, Captain of the Swedish Air Force, but at first I went to hear “European Language Analysis with Hunspell” led by Chris Male. After a while I switched to the adjacent conference room and listened to how the Swedish military is using Lucene and Solr in their systems. Because I didn’t hear either of the presentations from beginning to end, I will not write anything about them. I’ll just say that Alexandra Larsson showed some nice examples of how the military is using Solr.

Summary

The last words at Lucene Eurocon 2010 were spoken by Grant Ingersoll. Some books and t-shirts were given away, and there was a lottery for a pass to Lucene Revolution in Boston, something fun for the audience.

To sum up the whole event, I’m happy that I could participate. There were interesting presentations and a sea of ideas and plans, and the community can be sure that Lucene and Solr are mature products that have not rested on their laurels. Despite the growing interest in their present shape, there are people behind them who care, so the projects are not standing still. I hope that I will participate in Lucene Eurocon 2011.
