redesigning lclark.edu

Robb Shecter

Try our search. You’ll like it.

This month we’ve started planning how to build the best search ever for Lewis & Clark.  And we’re strongly considering Google Site Search (GSS).  It’s an excellent service, but we’re not sure if it’s the right tool for the job.

I ran a few quick comparisons between GSS and Sphinx on one of my websites — a new online version of the Oregon Revised Statutes.  When I made this search feature, I went with Sphinx because I had a rich object model stored in a SQL database.  It wasn’t much work: excluding look & feel, I implemented the search in a fraction of one day.

Back to the comparison.  Right off the bat, I found a few problems with Google’s results.  (NB: I’m not concerned here with differences in appearance, or the snippets.  I also verified that Google had indexed the pages I’d like it to find.)  Here are my site’s results for “robbery”:

And here are the GSS results:

The problems seen with Google Site Search in this small test

1. A problem of unwanted exclusion:  Robbery in the second degree is missing.  Notice also that GSS shows only one page of results; the rest, which it calls “very similar,” appear only after clicking the link.  Maybe Robbery 2 would appear there.

2. A problem of unwanted inclusion:  (I don’t care about the blog hits; those can be filtered out.)  Notice that 166.715 Definitions appears in the GSS results but not in OregonLaws.org’s.  This page is actually fairly irrelevant to robbery.  So why did Google rank it so high?

Google is solving a different problem than OregonLaws.org.  Google indexes arbitrary web pages, so it can’t know in advance how important each one is.  Its innovation is to look at the number and quality of links to a page, and to treat each link as a “vote” for it.

My theory is that GSS ranks these Definitions pages high because so many other pages on the site link back to them.
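To make the “votes” theory concrete, here is a toy link-based ranking in the spirit of PageRank.  It is a simplified sketch, not how Google Site Search actually ranks results (that isn’t public).  The page names and the link structure are hypothetical: three robbery sections that each link to a shared Definitions page, the pattern I suspect is at work here.

```ruby
# Toy PageRank-style scoring: each link is a "vote" that passes along a
# share of the linking page's own score. Simplified sketch, not Google's
# actual ranking. Page names and links below are hypothetical.
def page_rank(links, damping: 0.85, iterations: 50)
  pages = (links.keys + links.values.flatten).uniq
  n = pages.size.to_f
  rank = pages.to_h { |page| [page, 1.0 / n] }

  iterations.times do
    # Every page starts each round with an equal "random jump" share.
    next_rank = pages.to_h { |page| [page, (1.0 - damping) / n] }
    pages.each do |page|
      outgoing = links.fetch(page, [])
      if outgoing.empty?
        # A page with no outgoing links spreads its score over all pages.
        pages.each { |other| next_rank[other] += damping * rank[page] / n }
      else
        # Otherwise its score is split evenly among the pages it links to.
        share = damping * rank[page] / outgoing.size
        outgoing.each { |target| next_rank[target] += share }
      end
    end
    rank = next_rank
  end
  rank
end

# Hypothetical link structure: every robbery section links to Definitions.
links = {
  "Robbery III" => ["Definitions"],
  "Robbery II"  => ["Definitions", "Robbery III"],
  "Robbery I"   => ["Definitions", "Robbery II"],
  "Definitions" => []
}

ranks = page_rank(links)
ranks.sort_by { |_page, score| -score }.each do |page, score|
  puts format("%-12s %.3f", page, score)
end
```

Even though the Definitions page says little about robbery, it ends up with the highest score simply because every other page points to it, while Robbery I, which nothing links to, ends up last.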

But what about OregonLaws.org’s search?  How does it know to rank the Definition pages so low?  Easy.  When making the site, I know which pieces are important.  I don’t need to look at something as tangentially related as incoming links.  Take a look: here’s the algorithm I used to implement the search for ORS Sections:

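    # Hypothetical model name for the "ORS Section" object type.
    class OrsSection < ActiveRecord::Base
      # Thinking Sphinx index definition
      define_index do
        # Fields included in the full-text index
        indexes title
        indexes body
        indexes number
        indexes annotations
        # Relative weights: a match in the section number counts most
        set_property :field_weights => {
          "number" => 10, "title" => 6, "body" => 3, "annotations" => 2
        }
      end
    end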

This is Ruby on Rails code, using the define_index block from the Thinking Sphinx plugin.  It should be easy to see what’s going on.  My website assembles pages from “objects”.  One object type is “ORS Section”, which has attributes such as title, body, number, and annotations.  It has other attributes too, but I don’t want the search to account for them, so I left those out.  Finally, I’ve set relative weights for these fields, which produce the relevant search results I want.
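Those weights do most of the work.  Sphinx’s default ranker also takes things like phrase proximity and term frequency into account, so the following is only a toy model of the idea, not Sphinx’s formula: each field that contains the query adds its weight to a page’s score.  The two records are hypothetical and their text is made up for illustration.

```ruby
# Toy per-field weighting: every field that contains the query term adds
# its weight to the score. Not Sphinx's real ranking formula.
FIELD_WEIGHTS = { number: 10, title: 6, body: 3, annotations: 2 }.freeze

def weighted_score(doc, query)
  term = query.downcase
  FIELD_WEIGHTS.sum do |field, weight|
    doc.fetch(field, "").to_s.downcase.include?(term) ? weight : 0
  end
end

# Hypothetical, abbreviated section records (body text is made up).
sections = [
  { number: "166.715", title: "Definitions",
    body: "As used in these sections ... robbery ... (made-up excerpt)",
    annotations: "" },
  { number: "164.405", title: "Robbery in the second degree",
    body: "A person commits the crime of robbery ... (made-up excerpt)",
    annotations: "" }
]

ranked = sections.sort_by { |s| -weighted_score(s, "robbery") }
ranked.each do |s|
  puts "#{s[:number]} #{s[:title]}: #{weighted_score(s, 'robbery')}"
end
```

A match in the title (6) plus the body (3) scores 9, while a Definitions page that mentions robbery only in its body scores 3, so the actual robbery section comes first no matter how many pages link to Definitions.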

Epilog: Another small problem

Google offers “Similar pages” links; OregonLaws.org offers “More like this”.  Here are the results when clicking the respective similar/more link under Robbery first degree:

Wavemaker Cloud-based Studio

I’m not exactly sure what I can do with this, but it looks really cool.  (And it is apparently a Wavemaker app itself.)

Looking at Mac text editors (moving on from Emacs)

If this doesn’t qualify as “technical arcana”, what does?

The big thing that’s freed me from using Emacs is that I now do all my development on an Ubuntu VM running in VMware Fusion on my Mac Pro.  I installed the netatalk Ubuntu package, which provides AFP file sharing, and now I can use a native Mac app to edit files.  So I wanted to see why people talk so much about TextMate.  I’ve been using it for a day and am already really enjoying it.  It supports most Emacs keystrokes, but it’s much easier to move around in a large project.

David, on the other hand, tells me about Coda.

I’m doing a little looking around for comparisons; here are a couple of recent on-point posts:

Can an image-upload page be a work of art?

I think so, the more I think about Etsy’s.  I was making an upload page for a summer project of mine (Green Fabric), and the more I thought about how to design the page, the more complicated I found it:  How should everything be explained?  The image sizes, types, how thumbnails are cropped, etc., etc.  But look at Etsy’s page:

I love the simplicity and clarity of the process that’s laid out.  And the text at the very bottom: unbelievable!  No big long header à la “How to get best results when uploading images:” but simply “Tips:”  Beautiful!  Perfect!  People know they’re there to upload images … why needlessly repeat those words?  And the sentence, “We’ll resize everything for you.”  Sublime.  Communicates several positive things simultaneously.  And it saves paragraphs.

With copy that’s this succinct and on-point, it will actually stand a chance of being read.

Wondering about “CSS Frameworks”.

I’ve known about projects like the YUI for a while, but this post at Smashing Magazine has made me consider trying one or two out.  Mostly for the cross-browser advantages: on a couple of projects, I still find myself running into problems with Windows/IE browsers.

“Let’s take a look at the idea behind CSS Frameworks, their advantages and disadvantages, most popular CSS frameworks and dozens of default-stylesheets you can use designing a new web-site from scratch.”

That article has a nice summary of the disadvantages as well as advantages, but this post, “Please do not use CSS frameworks”, goes further:

“At their surface, frameworks seem like a great thing; unfortunately, that’s not the case.”

The biggest drawback I see is the reduction in semantic markup.  YUI-based HTML pages look like a total mess to the uninitiated.  Any insight?  Any strong feelings one way or another?
