Lucene Eurocon 2011 – day one

As we wrote a few days ago, we are back from this year's Lucene Eurocon, which took place in Barcelona. Although the videos will be available shortly, we decided to write something about the presentations we attended.

Keynote (Search + Big Data: It’s (still) All About the User, Grant Ingersoll)

The first day started at 8:30, a time of day the Spanish call “night” 🙂 This year the conference was opened by Grant Ingersoll with his presentation titled “Search + Big Data: It’s (still) All About the User” (slides, video). Grant reminded us that, despite all the superb technologies we deal with every day, we should always focus on the end user. Therefore, we should do everything we can to ensure that users are satisfied with the applications we create. Grant showed that, although developers have increasingly advanced tools, everything comes down to the same thing: the user is still what matters most. You should definitely keep this in mind.

Keynote #2 (Architecting the Future of Big Data & Search, Eric Baldeschwieler)

The second presentation we attended was “Architecting the Future of Big Data & Search” by Eric Baldeschwieler (slides, video). During the presentation Eric tried to answer whether Lucene and Hadoop can work together efficiently within a single system. At the beginning we were given a large dose of information on Map/Reduce, Hadoop and HBase. The second part of the presentation was a case study of the Apache Hadoop deployment at Yahoo and of how Map/Reduce helped with the system's daily tasks. At the very end we were given information about Lucene and Hadoop integration. Lots of interesting information; I recommend watching the video as soon as it is published.

The thing about the index structure (Portable Lucene Index Format & Applications, Andrzej Białecki)

At this point the conference split into three parallel sessions, and we decided to listen to Andrzej Białecki's presentation titled “Portable Lucene Index Format & Applications” (slides, video). Andrzej started with information about the structure of the Apache Lucene index and why backward compatibility is difficult to achieve. The presentation then turned to the topic in its title: why a portable Lucene index format is needed, what the challenges are, and what the state of the PortableCodec implementation is. Attendees could also see an interesting example of how to use SimpleTextCodec and how the index data is stored as plain text :) I recommend watching the video to see the whole presentation.

First topic about Solr (Improving Solr’s Update Chain, Jan Høydahl)

We then switched rooms to listen to Jan Høydahl's talk “Improving Solr’s Update Chain” (slides). During the presentation Jan showed how to define your own components in the Solr update chain. It turns out that a simple modification of the XML configuration (solrconfig.xml) lets you extend and adapt the indexing process to your own needs. We were also shown how to implement custom update chain components that let you influence the indexing process. A large part of the presentation was a case study of a project implemented for the University of Oslo. At the end we were given information about future plans and about the functionality that has been donated to the Apache Software Foundation as patches to Solr.

Know what your users are doing (Search Analytics: Business Value & BigData NoSQL Backend, Otis Gospodnetic)

Another presentation, given by Otis Gospodnetic under the title “Search Analytics: Business Value & BigData NoSQL Backend” (slides, video), focused on what should be one of the most essential things in every search system: the user. If you are using Solr, you should look at what Sematext has to offer, and what's more, completely for free (at least for now). During the presentation Otis told the audience that analyzing user behavior, that is, how users actually use your system, is not optional; it is necessary. Without it, it is impossible to tune and refine search, because we would know neither how nor where to do it. Additionally, without data about user behavior we cannot tell whether the changes we make are going in the right direction. I highly recommend watching the video.

Solr and Hadoop (Scaling Search at Trovit with Solr & Hadoop, Marc Sturlese)

After a short break we decided to hear about integrating Solr and Hadoop in the presentation “Scaling Search at Trovit with Solr & Hadoop” (slides), given by Marc Sturlese. Marc focused on how they managed to distribute indexing using Map/Reduce. In short, the presentation was about how to efficiently create smaller indexes from data stored in HDFS and then merge them into larger indexes (or a single large index, depending on your needs). Additionally, Marc showed what they did to take advantage of Solr functionality (data analysis and deduplication) without running Solr itself.

Solr and UIMA (Natural language search in Solr, Tommaso Teofili)

The next presentation we attended was “Natural language search in Solr” (slides), given by Tommaso Teofili. It covered using Apache Solr in conjunction with UIMA, along with a case study of a real-life project. There was a bit of technical detail, some performance comparisons and information about the need for the system to learn. An interesting topic and a very interesting presentation. I recommend watching the video as soon as it is published.

Here comes the new (Improved Search with Lucene 4, Robert Muir)

The last presentation scheduled for the first day of Lucene Eurocon 2011 was “Improved Search with Lucene 4” (slides, video), given by Robert Muir. He talked about the changes coming in Apache Lucene 4.0. We also heard a lot about Lucene 4.0 performance, API changes, NRT (near-real-time search) and deep paging support. It was an interesting technical presentation about what we can expect from the upcoming version of Lucene. Anyone interested in this subject should watch the video.

Lightning Talks

Then the conference came back together for a single session called “Lightning Talks”, in which each presenter had about 7 minutes. The talk I remember most (because of its large dose of humor) was “Java 7 and Lucene: the story behind the story” by Uwe Schindler, which in my opinion should rather have been called “Do not use Java 7 for anything” 🙂 Sincere congratulations on amusing the audience 😉 The lightning talks session consisted of the following presentations:

“Morphological Analysis and Named Entity Recognition for your Lucene/Solr Search Applications” (slides) led by Christoph Goller
“Java 7 and Lucene: the story behind the story” (slides) led by Uwe Schindler
“Navigating Subdocuments with Solr” (slides) led by Mikhail Khludnev
“Powered by Lucene: IBM Content Analytics with Enterprise Search” (slides) led by Wolfgang Jung
“Solr performance monitoring” led by Otis Gospodnetic
“Searching in more than 140 years newspaper articles” (slides) led by Nicola Provenzano

Stump the chump (Chris Hostetter)

The last session of the day was fun, and you could win a few euros if you asked a good question (100, 50 and 25, as I recall). The whole session was about asking questions that Chris (aka Hoss) would not be able to answer. Summarizing this part makes no sense, so I recommend the video, which is available at the following URL: http://www.lucidimagination.com/devzone/events/conferences/ApacheLuceneEurocon2011/apache-lucene-eurocon-2011-stump-chump.

End of the first day

And that was all for the first day of the conference. Some of the participants then went to the FC Barcelona vs FC Viktoria Plzeň match, a small group headed back to their hotels, and all the rest, including us, went to Shoko Club for some cava and something to eat.

Solr.pl