Concept-Based Search
KinetX Aerospace has developed an application that semantically evaluates text so that it can be searched using a
Natural Language (NL) Query. Our application takes a particular set of text files, such as an
author's novels, and produces a database called a Semantic Mart. This mart can be queried to
find fragments of text that match the query's meaning. The results are sorted into small clusters,
each containing a handful of fragments, which are presented using a Fisheye view and a two-dimensional Semantic
Map.
Why Natural Language Queries?
Historically, search engines have used keywords to find documents, returning anything that matches. Given
the explosive growth in information, different techniques have been developed to reduce the millions of
results to a list that is practical to use. For example, some web search engines return
the most popular results first, on the grounds that this maximizes the likelihood of their being
both precise and relevant. A more recent approach proposes adding semantic annotations so that
specialized query engines can retrieve and collate exact data from different sources of information.
In all these cases, retrieval precision requires the author and the person querying to agree on the
keywords. In real life, this seldom happens. It has been found that for any given population of
users, only 20% will agree on the meaning of a keyword. In reality, we each use words in different
ways to describe the world as we perceive it. Because of this, it has been said that words are
understood by the company they keep, and that company changes from person to person.
An alternative approach developed by KinetX Aerospace is to uncover the latent semantics within a document or a
collection of documents belonging to a particular domain. Jane Austen's works (her corpus) provide a simple
example: Jane Austen had her own way of using words to describe and interpret the world she
knew. Natural Language (NL) Queries are semantically matched with phrases and sentences within the
corpus to retrieve fragments of text that are a match in meaning and content. The application
developed by KinetX Aerospace to do this is called kPOOL.
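The idea that words are understood by the company they keep can be illustrated with a small sketch. The text does not describe kPOOL's actual algorithm, so the code below swaps in a simple co-occurrence model as an assumption: each word is represented by the words that share a fragment with it, and a query is matched to fragments by cosine similarity. The function names (`tokenize`, `cooccurrence_vectors`, `embed`, `cosine`, `rank`) and the tiny stopword list are hypothetical, chosen only for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Illustrative stopword list (an assumption, not part of kPOOL).
STOPWORDS = {"the", "a", "an", "and", "of", "on", "in", "to"}

def tokenize(text):
    """Lowercase the text and return its words, minus stopwords."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]

def cooccurrence_vectors(fragments):
    """Map each word to a Counter of the words that share a fragment with it."""
    vectors = defaultdict(Counter)
    for fragment in fragments:
        words = set(tokenize(fragment))
        for word in words:
            for other in words:
                if other != word:
                    vectors[word][other] += 1
    return vectors

def embed(text, vectors):
    """Represent a text by its own words plus the company those words keep."""
    total = Counter()
    for word in tokenize(text):
        total[word] += 1
        total.update(vectors.get(word, {}))
    return total

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rank(query, fragments):
    """Return (score, fragment) pairs, best match first."""
    vectors = cooccurrence_vectors(fragments)
    q = embed(query, vectors)
    scored = [(cosine(q, embed(f, vectors)), f) for f in fragments]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Calling `rank("fortune", fragments)` on a handful of fragments returns every fragment with a score. A fragment that never uses the word "fortune" can still score above zero if it shares context words with fragments that do, which is the behavior the paragraph above describes.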
How Semantic Marts Work
The key to finding the right information is to know where to look, which is why kPOOL organizes information
into Semantic Marts. The terms Warehouse and Mart have been used in the world
of information to make a distinction between the source of information (which is extremely large) and
smaller subsets designed to meet specific needs. For example, the World Wide Web is an Information
Warehouse that cannot be effectively spanned by any one application. As a consequence, search engines
have to place limits on their search space, and with millions of results returned the best answer can never be
assured. Conversely, a Semantic Mart is something like a small portal where someone can focus on a
specific domain such as Jane Austen or Shakespeare.
Using kPOOL, Information Warehouses are automatically broken into diverse Semantic Marts that capture the
different views of the world. If a Semantic Mart has not been chosen, the NL Query first identifies
suitable marts for the user to search. Then the user drills into one or more marts, reviewing
retrieved information and learning in the process. This is like going to a large mall, reading the
floor plan and immediately homing in on the shop that specializes in what you want to buy.
Using Semantic Marts, the growing, diverse world of online information is broken down into chunks that can
be easily searched in depth.
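The step in which an NL Query first identifies suitable marts can be sketched as follows. This is a minimal illustration under stated assumptions, not kPOOL's method: each mart is summarized by its word counts, and plain word overlap (cosine similarity over counts) stands in for the semantic matching the text describes. The names `term_counts`, `cosine`, and `suggest_marts`, and the `top_n` parameter, are hypothetical.

```python
import math
import re
from collections import Counter

def term_counts(text):
    """Count the lowercase words in a text."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def suggest_marts(query, marts, top_n=2):
    """Return the names of up to top_n marts that best match the query.

    `marts` maps a mart name to the list of text fragments it contains.
    Marts with no overlap at all are left out of the suggestions.
    """
    q = term_counts(query)
    scored = sorted(
        ((cosine(q, term_counts(" ".join(fragments))), name)
         for name, fragments in marts.items()),
        reverse=True,
    )
    return [name for score, name in scored[:top_n] if score > 0]
```

In use, the caller would pass a dictionary such as `{"austen": [...], "shakespeare": [...]}` and then run the fragment-level search only inside the marts that come back, mirroring the mall analogy of reading the floor plan before walking to a shop.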
Where kPOOL Can Help
If you are looking for the nearest pizza shop, then use one of the many search engines available on the
web. Or, if you are looking for a house in a specific neighborhood and know the price range and floor
space required, try a search engine that understands semantic tags. However, when you are searching to learn
what you do not know, or when the words are vague or uncertain, kPOOL helps by using the
semantics latent in the text to match the NL Query's meaning. It is often the case that the
retrieved text does not exactly match any of the query's words but matches synonyms or special word use
instead. This type of search accelerates cognitive learning by retrieving meaningful information that
other search engines miss.
Cognitive acceleration requires information retrieval that is both precise and relevant. The use of
semantic searches enables precision. To enable relevance, kPOOL presents the results in clusters, each
containing a handful of text fragments, using the Fisheye method. Each cluster is titled using the text
fragments that best match the meaning of the query. The user can rapidly review each cluster and
expand those that best capture the intended meaning of the query.
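The clustering and titling described above can be sketched in a few lines. The text does not say how kPOOL forms its clusters, so this example assumes a simple greedy scheme: results are visited from best to worst match, each fragment joins the first cluster whose title it resembles, and otherwise starts a new cluster with itself as the title. Word overlap (Jaccard similarity) stands in for semantic closeness, and `render` is only a crude text stand-in for a Fisheye view. All names, the `threshold`, and the `max_size` limit are hypothetical.

```python
import re

def words(text):
    """Return the set of lowercase words in a text."""
    return set(re.findall(r"[a-z']+", text.lower()))

def jaccard(a, b):
    """Share of words two texts have in common (0.0 to 1.0)."""
    wa, wb = words(a), words(b)
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0

def cluster_results(scored, threshold=0.3, max_size=5):
    """Group (score, fragment) results into small titled clusters.

    Fragments are visited best match first, so each cluster's title is
    the member that best matches the query.
    """
    clusters = []
    for score, fragment in sorted(scored, key=lambda pair: pair[0], reverse=True):
        for cluster in clusters:
            if len(cluster["members"]) < max_size and jaccard(cluster["title"], fragment) >= threshold:
                cluster["members"].append(fragment)
                break
        else:
            # No existing cluster fits, so this fragment titles a new one.
            clusters.append({"title": fragment, "members": [fragment]})
    return clusters

def render(clusters, expanded=()):
    """Show every cluster title, with members listed only for expanded clusters."""
    lines = []
    for index, cluster in enumerate(clusters):
        lines.append(f"[{index}] {cluster['title']} ({len(cluster['members'])} fragments)")
        if index in expanded:
            lines.extend("    " + member for member in cluster["members"])
    return "\n".join(lines)
```

A user reviewing the output of `render(clusters)` sees one line per cluster and can pass the indexes of interesting clusters in `expanded` to reveal their fragments, which corresponds to the review-and-expand step in the paragraph above.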
This approach is further aided by providing two-dimensional, hierarchically navigable cluster maps.
Each map locates a cluster's documents based on the closeness of their meaning. These maps are
used when the initial search phrase is either too vague or too specific and the retrieved information is found
to have poor relevance. Using cluster maps, the user finds documents and language that are more
relevant to the intended meaning of the query. As new language is learned, the search can be refined to
retrieve information that is both precise and relevant.
Using kPOOL, cognitive learning becomes a natural part of the search process, going beyond the ordinary
search engine to accelerate the development of knowledge.
To Summarize
kPOOL understands how words are used in different contexts and semantically matches fragments of text within
documents to the meaning of the NL Query to produce precise retrievals. The unique clustering
solution enables semantic maps to be used to improve retrieval relevance. Using these techniques,
cognitive acceleration becomes a natural part of the search process. The information retrieved as a
result is both relevant and precise.