4.07.2008

Do we even know what the needle looks like?

Different kinds of searches require different approaches. Koll (1999) identified the following types of searches:

  • A known needle in a known haystack
  • A known needle in an unknown haystack
  • An unknown needle in an unknown haystack
  • Any needle in a haystack
  • The sharpest needle in a haystack
  • Most of the sharpest needles in a haystack
  • All the needles in a haystack
  • Affirmation of no needles in a haystack
  • Things like needles in any haystack
  • Let me know whenever a new needle shows up
  • Where are the haystacks?
  • Needles, haystacks -- whatever

In the case of the unknowns, how can we describe the needle, or even find the haystack?

One of the biggest challenges of information retrieval is knowing what information is out there and what we are missing: the information that we do not even think to look for because we have no idea that it exists. This is the “fundamental paradox of information retrieval” described by Hjerppe: “the need to describe that which you do not know in order to find it” (Hjerppe in Hildreth, 1989, p. 189).

On the Web, term match, rather than concept match, is the norm. Keyword-based retrieval systems work by finding matches, which means that the searcher must guess at the terms used by the documents’ creators. This can prevent us from retrieving similar items if their creators used a different vocabulary, or from finding related items whose terms we cannot know beforehand.
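To make the vocabulary problem concrete, here is a minimal sketch of term matching in Python. The two-document collection, the `tokenize` and `term_match` helpers, and the medical example are all made up for illustration; real search engines are far more elaborate, but the basic failure is the same.

```python
def tokenize(text):
    """Lowercase a string and split it into a set of word tokens."""
    return set(text.lower().split())


def term_match(query, documents):
    """Return the ids of documents that contain every term in the query."""
    query_terms = tokenize(query)
    return [
        doc_id
        for doc_id, text in documents.items()
        if query_terms <= tokenize(text)  # subset test: all query terms present
    ]


# Hypothetical collection: both documents describe the same concept,
# but their creators chose different vocabulary.
documents = {
    "doc1": "heart attack symptoms and treatment",
    "doc2": "myocardial infarction symptoms and treatment",
}

print(term_match("heart attack", documents))           # finds doc1 only
print(term_match("myocardial infarction", documents))  # finds doc2 only
```

A searcher who types "heart attack" never sees the second document, even though it is about the same thing. A concept-matching system would need some extra knowledge, such as a thesaurus or ontology, to connect the two phrasings.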

If the Semantic Web can deliver what it has been teasing us with, search systems in the future will be able to understand the conceptual content of documents. But until that time, we’ll still approach some haystacks with our best guess at what the needle looks like.

References

Hildreth, C. R. (1989). Appropriate user interfaces for subject searching in bibliographic retrieval systems. Bookmark, 48(3), 186-193.

Koll, M. (1999, September 27). Major trends and issues in the information industry. Keynote address at the Association of Information and Dissemination Centers (ASIDIC) Fall Meeting, Baltimore, MD. Retrieved April 6, 2008, from http://www.asidic.org/meetings/newsletters/archive/xml/fal1999n78.xml

3 comments:

Bridget Gay said...

I enjoyed your article.
There is a lot of research being done right now on "bad searches": searches that may produce a great deal of data, but not what the searcher is seeking. Even back in his famous essay "As We May Think," Vannevar Bush talked about information overload, which shows that it was already an issue of concern at that time. I have to give Google's creators credit for attempting to do the best they can with this issue.

Ken said...

Great analogy to bring the problems of IR into perspective.

Stacy Davis said...

I sometimes find myself in an interesting position when I am at the reference desk. For example, an individual will come up to the desk without a clear idea of what type of information they need. It is my job to figure out what they are looking for and which sources would have the best information. And yes, sometimes I do not even know what the needle looks like.