SPA Conference session: Building a Fast Search Application in a Day

One-line description: A hands-on introduction to building a search application in a day
Session format: Long tutorial (330 mins)
Abstract: In this workshop you will build a search application, integrate a number of exciting features into it (e.g. auto-complete, faceting, "did you mean?") and see how you can optimise search to suit your needs.

We will be using Solr/Lucene. Solr joined the Apache incubator in 2006 and since then has rapidly grown in popularity and capability. Solr is a highly scalable, fast, open-source enterprise search platform from the Apache Lucene project.

Participants will learn how to: set up and configure Solr; import data from a SQL DB; and build a search solution. We will look at some of the more advanced features of Solr and how these can be used to enrich the search experience for your users. Also, we will look at performance tuning of queries and how you can use different forms of caching in Solr to improve search speed.
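
As a rough illustration of what a search request with some of these features switched on might look like, the sketch below builds a Solr select URL using only the Java standard library. The `q`, `wt`, `facet`, `facet.field` and `spellcheck` parameters are standard Solr request parameters; the base URL, the `category` field name and the class and method names are assumptions made for the example, and `spellcheck=true` only has an effect if a spellcheck component is configured on the request handler.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds a Solr /select URL with faceting and spellcheck enabled.
class SolrQueryUrl {

    static String build(String baseUrl, String userQuery, String facetField) {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", userQuery);            // the user's search terms
        params.put("wt", "json");              // ask for a JSON response
        params.put("facet", "true");           // enable faceting
        params.put("facet.field", facetField); // field to count facet values for (assumed name)
        params.put("spellcheck", "true");      // needs a spellcheck component on the handler

        StringBuilder url = new StringBuilder(baseUrl).append("/select?");
        boolean first = true;
        for (Map.Entry<String, String> p : params.entrySet()) {
            if (!first) {
                url.append('&');
            }
            url.append(encode(p.getKey())).append('=').append(encode(p.getValue()));
            first = false;
        }
        return url.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed base URL for a Solr instance running under a local Tomcat.
        System.out.println(build("http://localhost:8080/solr", "red shoes", "category"));
    }
}
```

The resulting URL can be pasted into a browser or fetched from any language, which is one reason working without a client library is still feasible.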

The session will be run using Java in Eclipse, but participants are free to work in any language they like. If you wish to work in a language other than Java, it may be best to choose one for which a client library already exists; libraries are available for Java, C#, Ruby, Perl, PHP, Python, Scala and JavaScript.

We have used Solr with Java and C#; in other languages we will not be able to offer as much assistance.
Audience background: Some experience developing web applications.

If you don't want to run off a bootable flash drive, please bring a laptop with Eclipse, MySQL and Tomcat pre-installed.
Benefits of participating:
* Build a search solution from scratch
* Gain an awareness of search problems and how to solve them
* Understand the pros and cons of document DBs
* See how Solr fits into existing architectures
* Explore extended features of Solr and Lucene
Materials provided:
- Dataset
- Bootable flash drive with Ubuntu, Tomcat, Solr and MySQL
- Document DBs vs RDBMS
- Solr search capabilities
- Solving search problems
Time for questions
Then working in pairs or individually:
- Check machines are set up
- Import RDBMS data into Solr
- Configure Solr
- Import data into Solr
- Write an API/App/Web-page to search the dataset
- Tuning Solr search performance
- Implement search suggestions/spelling corrections
- Look at faceting, caching and features such as EdgeNGram filtering
Final half hour to tidy up search products, ask questions and get feedback.
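
To give a feel for why EdgeNGram filtering helps with search suggestions, the sketch below shows the idea in plain Java: each token is expanded into its leading prefixes, so a partial entry such as "so" can match a term indexed as "solr". This is only an illustration of the concept, not Solr's or Lucene's implementation; the class name, method name and gram sizes are assumptions chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: mimics the prefixes an edge n-gram filter would emit for one token.
class EdgeNGrams {

    static List<String> edgeNGrams(String token, int minGram, int maxGram) {
        if (minGram < 1 || maxGram < minGram) {
            throw new IllegalArgumentException("require 1 <= minGram <= maxGram");
        }
        List<String> grams = new ArrayList<>();
        // Never ask for a prefix longer than the token itself.
        int longest = Math.min(maxGram, token.length());
        for (int len = minGram; len <= longest; len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }

    public static void main(String[] args) {
        // Prefixes of "solr" between 1 and 3 characters long.
        System.out.println(edgeNGrams("solr", 1, 3));
    }
}
```

Indexing these prefixes trades a larger index for cheap prefix matching at query time, which is the kind of trade-off the tuning part of the session looks at.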
Detailed timetable:
00:00 - 00:15 Set up machines & data
00:15 - 00:30 Introduction to Solr
00:30 - 00:45 Solr in infrastructure/architecture
00:45 - 01:00 Q&A
01:00 - 01:15 Import data into Solr
01:15 - 02:00 Build Basic Search Product
02:00 - 02:10 Spelling Correction explained
02:10 - 02:30 Add in Spelling correction
02:30 - 02:40 Faceting explained
02:40 - 03:20 Add in Faceting
03:20 - 03:30 Break
03:30 - 04:15 Add in Search suggest/Auto complete
04:15 - 04:30 Tuning caches
04:30 - 05:00 Tuning Solr queries
05:00 - 05:30 Further Q&A and Feedback
Outputs: Search application with:
spelling corrections
1. James Atherton
2. Greg Sochanik