Everyone agrees that MySQL alone isn’t a great solution for text searching. Whenever the need arises, I’ve always recommended jumping ship to sphinx. And sphinx is great – it works, it’s highly customizable, and it’s free. But it doesn’t really come for free, it has overhead. You have to manage your indexer cron jobs, hack up capistrano, and teach your developers how to use it. It also makes managing multiple environments (staging, local development, etc) a more tangled hairball. That’s all expensive in terms of time, management, communication, and complexity.
So consider the tradeoffs carefully. Sphinx is a much better hammer to the nail of text searching. But it takes a lot of work. For a recent project, we decided to revisit the idea of text searching within MySQL. Wat is the best we could do with MySQL alone, and would it be sufficient? It turned out good enough. Here’s what we learned:
InnoDB
You can’t search a text field without a decent index, and you can’t index a text field properly in an InnoDB table. MyISAM will work (and it’s generally faster than InnoDB), if you can handle table level locking. Basically, if you’re doing very, very few writes (INSERT, UPDATE, DELETE) and many of reads (SELECT), you may be able to tolerate this. Migrate to change table types and build an index:
1 2 3 4 |
# change the table type execute "ALTER TABLE `orders` ENGINE = MyISAM;" # create new index execute "CREATE FULLTEXT INDEX my_orders_text_index ON orders (text_field)" |
Google to read a lot more about mysql, MyISAM, and InnoDB.
boolean full-text search
I’ve seen simple_search.rb used before in rails projects. SimpleSearch uses the mysql LIKE string comparison function, which only works for text fields in Innodb tables, which are by definition not text index-able. So while this solution may appear to work, it could be unwise to build a production site using SimpleSearch.
If you are searching google-style (multiple words, fragments, etc), it’s likely you will use mysql’s boolean full-text search. For our app, we gsub’ed in the logic operands and did a named scope all at once:
1 2 |
named_scope :like_text, lambda { |str| {:conditions =>
["match(text_field) AGAINST(? IN BOOLEAN MODE)", str.gsub(/(\S+)/, '+\1*')] } }
|
config
You’ll also want to pay close attention to your MyISAM settings in my.cnf, as well as your stopwords files and word length setting. Example:
1 2 |
ft_min_word_len = 2 ft_stopword_file = |
An empty stopword file means no stopwords. Make sure you make mysql config changes on all of your db servers.
When all is done, you should be able to quickly search using your MyISAM indexed text field using a named scope:1 |
cheesey_sandwichy_orders = Order.like_name("sand chee")
|
Mmmm, tasty.
If Sphinx is required, there are several plugins to help Sphinx and Rails jive. We’ve used ultrasphinx with moderate success, though it’s limited in terms of flexibility. There’s also sphincter (snicker) and acts_as_sphinx.
Leave a Comment