Discussion:
Using Lucene for Semantic search
Chris Wildgoose
2006-07-20 18:19:50 UTC
I have been working with Lucene for some time, and I have an interest in developing a Semantic Search solution. I was looking into extending Lucene for this. I know this would involve some significant re-engineering of the indexing procedure to support assigning words to nodes within an ontology. In addition, the query would need to be modified. Has anyone out there gone down this path?


Chris
karl wettin
2006-07-20 20:00:57 UTC
Post by Chris Wildgoose
I have been working with Lucene for some time, and I have an interest
in developing a Semantic Search solution. I was looking into extending
Lucene for this. I know this would involve some significant
re-engineering of the indexing procedure to support the ability to
assign words to nodes within an ontology. In addition the query would
need to be modified. I was wondering whether anyone out there had gone
down this path?
I'm not sure what you mean; could you develop your paragraph a bit more?
Do you want to index an RDFS (or similar) store? Use Lucene as the
primary storage? Or perhaps you just want to classify your documents
along lots and lots of dimensions? Something else?
Chuck Williams
2006-07-21 05:36:59 UTC
I have built such a system, although not with Lucene at the time. I
doubt you need to modify anything in Lucene to achieve this.

You may want to index words, stems and/or concepts from the ontology.
Concepts from the ontology may relate to words or phrases. Lucene's
token structure is flexible, supporting all of these. E.g., you can
create your own Analyzer that looks up words and phrases in your
ontology and then generates appropriate concept tokens that supplement
the word/stem tokens. Concept tokens can similarly span phrases.
Presuming you want some kind of word sense disambiguation through
context, you can either integrate your model into the Analyzer or create
a separate pre-processor.
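A minimal sketch of the token stream such an Analyzer might emit, written in plain Java so it runs without Lucene. The ontology map, the "concept_" prefix, and the two-word phrase limit are all assumptions for illustration. Each word token advances the position by one; a concept token is stacked at the same position (position increment 0), and a concept matched from a phrase spans several positions (position length > 1).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class ConceptTokenSketch {

    // Mirrors what Lucene tracks per token: term text, position increment
    // (0 = stacked on the previous position) and position length (span).
    record Token(String term, int posInc, int posLen) {
        @Override
        public String toString() {
            return term + "(+" + posInc + ",len=" + posLen + ")";
        }
    }

    // HYPOTHETICAL ontology: word or space-joined phrase -> concept token.
    // The "concept_" prefix is an assumed convention to keep concepts apart from words.
    static final Map<String, String> ONTOLOGY = Map.of(
            "car", "concept_vehicle",
            "automobile", "concept_vehicle",
            "new york", "concept_new_york_city");

    static final int MAX_PHRASE_WORDS = 2; // longest phrase looked up (assumption)

    static List<Token> analyze(String text) {
        String[] words = text.trim().toLowerCase(Locale.ROOT).split("\\s+");
        List<Token> out = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            // The word token itself advances the position by one.
            out.add(new Token(words[i], 1, 1));
            // Phrases starting at this word, longest first: a matching concept is
            // stacked at the same position (increment 0) and spans n positions.
            for (int n = Math.min(MAX_PHRASE_WORDS, words.length - i); n >= 1; n--) {
                String phrase = String.join(" ", Arrays.copyOfRange(words, i, i + n));
                String concept = ONTOLOGY.get(phrase);
                if (concept != null) {
                    out.add(new Token(concept, 0, n));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(analyze("A car in New York"));
        // prints [a(+1,len=1), car(+1,len=1), concept_vehicle(+0,len=1), in(+1,len=1),
        //         new(+1,len=1), concept_new_york_city(+0,len=2), york(+1,len=1)]
    }
}
```

In a real Lucene Analyzer this logic would sit in a custom TokenFilter that sets the term, position increment and position length attributes; recent Lucene versions also ship SynonymGraphFilter, which stacks multi-word synonyms in a similar way.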

The same Analyzer or a variant of it could be used to map the Query into
tokens to search. This would support concept-to-concept searches, useful,
for example, in cross-language search.
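To illustrate the cross-language case, here is a toy sketch in plain Java (no Lucene) where one analysis step reduces both documents and queries to concept tokens, so an English query can match a French document. The bilingual ontology, the "concept_" tokens, and the tiny in-memory inverted index with AND semantics are all assumptions standing in for a real Lucene index.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class ConceptQuerySketch {

    // HYPOTHETICAL bilingual ontology: English and French words share concept IDs.
    static final Map<String, String> ONTOLOGY = Map.of(
            "car", "concept_vehicle",
            "voiture", "concept_vehicle",
            "red", "concept_red",
            "rouge", "concept_red");

    // One analysis step used for BOTH documents and queries; only concept
    // tokens are kept, so matching is concept-to-concept.
    static Set<String> conceptTokens(String text) {
        Set<String> concepts = new LinkedHashSet<>();
        for (String word : text.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            String concept = ONTOLOGY.get(word);
            if (concept != null) {
                concepts.add(concept);
            }
        }
        return concepts;
    }

    // Toy inverted index: concept token -> sorted IDs of documents containing it.
    static Map<String, Set<Integer>> buildIndex(List<String> docs) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String concept : conceptTokens(docs.get(id))) {
                index.computeIfAbsent(concept, k -> new TreeSet<>()).add(id);
            }
        }
        return index;
    }

    // Returns the documents containing every concept found in the query.
    static Set<Integer> search(Map<String, Set<Integer>> index, String query) {
        Set<Integer> hits = null;
        for (String concept : conceptTokens(query)) {
            Set<Integer> postings = index.getOrDefault(concept, Set.of());
            if (hits == null) {
                hits = new TreeSet<>(postings);
            } else {
                hits.retainAll(postings);
            }
        }
        return hits == null ? Set.of() : hits;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("une voiture rouge", "a red bicycle");
        Map<String, Set<Integer>> index = buildIndex(docs);
        System.out.println(search(index, "red car")); // prints [0]
    }
}
```

The English query "red car" finds only the French document, because both sides were reduced to the same concept tokens before matching.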

Word sense disambiguation is generally more difficult in typically short
queries, so there are alternatives worth considering. E.g., you could
expand queries (or index tokens) into the full set of possibilities
(synonym words or concepts). If you have an a priori or contextual
ranking of those possibilities, you can generate boosts in Lucene to
reflect that.
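A minimal sketch of that expansion, assuming a hypothetical table that maps a query word to weighted alternatives (the word itself, synonyms, or concept tokens). It rewrites the query into classic Lucene query-parser syntax, where term^boost sets a clause boost; the weights, the "concept_" tokens, and the choice to make each word's group required (+) are illustrative assumptions, not a recommended scoring scheme.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.StringJoiner;

public class BoostedExpansionSketch {

    // HYPOTHETICAL expansion table: query word -> alternatives with an assumed
    // a priori weight, which is used directly as the boost.
    static final Map<String, Map<String, Double>> EXPANSIONS = new HashMap<>();
    static {
        Map<String, Double> bank = new LinkedHashMap<>(); // keeps insertion order
        bank.put("bank", 1.0);
        bank.put("concept_financial_institution", 0.7);
        bank.put("concept_riverbank", 0.3);
        EXPANSIONS.put("bank", bank);
    }

    // Each expanded word becomes a required (+) group of boosted alternatives
    // joined by OR; words with no expansion are kept as required plain terms.
    static String expand(String query) {
        StringJoiner clauses = new StringJoiner(" ");
        for (String word : query.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            Map<String, Double> alternatives = EXPANSIONS.get(word);
            if (alternatives == null) {
                clauses.add("+" + word);
                continue;
            }
            StringJoiner group = new StringJoiner(" OR ", "(", ")");
            for (Map.Entry<String, Double> alt : alternatives.entrySet()) {
                group.add(alt.getKey() + "^" + alt.getValue());
            }
            clauses.add("+" + group);
        }
        return clauses.toString();
    }

    public static void main(String[] args) {
        System.out.println(expand("bank loan"));
        // prints +(bank^1.0 OR concept_financial_institution^0.7 OR concept_riverbank^0.3) +loan
    }
}
```

The resulting string could be handed to Lucene's classic QueryParser, assuming the field's analyzer leaves the concept tokens intact; the same structure could also be built programmatically with BooleanQuery clauses.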

If all you want is ontological search, those are your hooks. If you want
more sophisticated query transformations, e.g. for natural language Q&A,
you probably want a custom query pre-processor to generate the specific
queries you want.
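As a rough illustration of such a pre-processor, here is a pattern-based sketch that recognises one question template and rewrites it into a structured query, falling back to ordinary keyword search otherwise. The template, the field names "relation" and "object", and the "author_of" value are all hypothetical; a real Q&A system would need far richer parsing.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuestionPreprocessorSketch {

    // HYPOTHETICAL template: "Who wrote/authored X?" (case-insensitive).
    private static final Pattern WHO_WROTE = Pattern.compile(
            "(?i)^\\s*who\\s+(?:wrote|authored)\\s+(.+?)\\s*\\?*\\s*$");

    // Maps a recognised question to a query string over HYPOTHETICAL fields;
    // returns empty so the caller can fall back to plain keyword search.
    static Optional<String> toQuery(String question) {
        Matcher m = WHO_WROTE.matcher(question);
        if (!m.matches()) {
            return Optional.empty();
        }
        String subject = m.group(1).replace("\"", "\\\""); // escape quotes inside the phrase
        return Optional.of("+relation:author_of +object:\"" + subject + "\"");
    }

    public static void main(String[] args) {
        System.out.println(toQuery("Who wrote War and Peace?").orElse("<keyword search>"));
        // prints +relation:author_of +object:"War and Peace"
        System.out.println(toQuery("cheap flights").orElse("<keyword search>"));
        // prints <keyword search>
    }
}
```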

Hope these thoughts are useful,

Chuck