KinetX Aerospace — Providing Visionary, Highly Prized Engineering in Support of Space- and Earth-Based Endeavors

Software Engineering

Concept-Based Search

KinetX Aerospace has developed an application that semantically evaluates text so that it can be searched using a Natural Language (NL) Query.  Our application takes a particular set of text files, such as an author’s novels, and produces a database called a Semantic Mart.  This mart can be queried to find fragments of text that match the query’s meaning.  The results are sorted into small clusters, each containing a handful of fragments, which are presented using a Fisheye view and a two-dimensional Semantic Map.

Why Natural Language Queries?

Historically, search engines have used keywords to find documents, returning anything that matches.  Given the explosive growth in information, different techniques have been developed to reduce the millions of results to a list that is practical to use.  For example, some web search engines return the most popular results first, on the grounds that this maximizes the likelihood of their being both precise and relevant.  A more recent approach proposes adding semantic annotations so that specialized query engines can retrieve and collate exact data from different sources of information.

In all these cases, retrieval precision requires the author and the person querying to agree on the keywords.  In real life, this seldom happens.  It has been found that for any given population of users, only 20% will agree on the meaning of a keyword.  In reality, we each use words in different ways to describe the world as we perceive it.  Because of this, it has been said that words are understood by the company they keep, and that company changes from person to person.

An alternative approach developed by KinetX Aerospace is to uncover the latent semantics within a document or a collection of documents belonging to a particular domain; the corpus of Jane Austen's works provides a simple example.  Jane Austen had her own way of using words to describe and interpret the world she knew.  Natural Language (NL) Queries are semantically matched with phrases and sentences within the corpus to retrieve fragments of text that match in meaning and content.  The application developed by KinetX Aerospace to do this is called kPOOL.
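The idea that words are understood by the company they keep can be illustrated with a small sketch.  This is not kPOOL's implementation, which is not described here; it is a toy co-occurrence model, and the fragments, the stop-word list, and all function names are assumptions made up for illustration.  Each word is represented by the words it shares a fragment with, so a query can match a fragment that contains none of the query's words.

```python
import math
from collections import Counter, defaultdict

# Hypothetical stop-word list and toy fragments (not quotations from any corpus).
STOPWORDS = {"a", "the", "on", "was", "after"}
FRAGMENTS = [
    "a rich man needs a wife",
    "a wealthy man needs a wife",
    "the rich man owns a large estate",
    "rain fell on the garden path",
    "the garden path was muddy after rain",
]

def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOPWORDS]

def build_word_vectors(fragments):
    """Map each word to counts of the other words it shares a fragment with."""
    vectors = defaultdict(Counter)
    for fragment in fragments:
        words = set(tokenize(fragment))
        for word in words:
            for other in words:
                if other != word:
                    vectors[word][other] += 1
    return vectors

def embed(text, word_vectors):
    """Represent a text as the sum of the context vectors of its known words."""
    total = Counter()
    for word in tokenize(text):
        total.update(word_vectors.get(word, {}))
    return total

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def search(query, fragments, word_vectors):
    """Return (score, fragment) pairs, best match first."""
    q = embed(query, word_vectors)
    scored = [(cosine(q, embed(f, word_vectors)), f) for f in fragments]
    return sorted(scored, reverse=True)

if __name__ == "__main__":
    vectors = build_word_vectors(FRAGMENTS)
    for score, fragment in search("wealthy", FRAGMENTS, vectors):
        print(f"{score:.2f}  {fragment}")
```

In this toy corpus the query "wealthy" gives a positive score to the fragment about the rich man's estate, although the two share no words, because "wealthy" and "rich" keep the same company.  The garden fragments score zero.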

How Semantic Marts Work

The key to finding the right information is to know where to look, which is why kPOOL organizes information into Semantic Marts.  The terms “Warehouse” and “Mart” have been used in the world of information to distinguish between the source of information (which is extremely large) and smaller subsets designed to meet specific needs.  For example, the World Wide Web is an Information Warehouse which cannot be effectively spanned by any one application.  As a consequence, search engines have to place limits on their search space, and with millions of results returned the best answer can never be assured.  Conversely, a Semantic Mart is something like a small portal where someone can focus on a specific domain such as Jane Austen or Shakespeare.

Using kPOOL, Information Warehouses are automatically broken into diverse Semantic Marts that capture the different views of the world.  If a Semantic Mart has not been chosen, the NL Query first identifies suitable marts for the user to search.  Then the user drills into one or more marts, reviewing retrieved information and learning in the process.  This is like going to a large mall, reading the floor plan, and immediately homing in on the shop that specializes in what you want to buy.
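The mart-selection step can be sketched as a routing function that scores the query against a summary of each mart before any in-depth search.  This is a minimal sketch under stated assumptions, not kPOOL's method: the mart names, their contents, the stop-word list, and the function names are all hypothetical, and plain word counts stand in for the semantic matching described above to keep the example short.

```python
import math
from collections import Counter

# Hypothetical stop-word list and toy marts, invented for this sketch.
STOPWORDS = {"a", "the", "and", "at", "upon", "who", "after"}
MARTS = {
    "austen": ["a marriage proposal at the ball", "the estate and the inheritance"],
    "shakespeare": ["the king and the crown", "a ghost upon the castle wall"],
    "orbits": ["the spacecraft enters orbit", "trajectory correction burn"],
}

def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOPWORDS]

def profile(texts):
    """Summarize a mart as word counts over all of its fragments."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def suggest_marts(query, marts, top_n=2):
    """Return up to top_n mart names with a non-zero match, best first."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, profile(texts)), name) for name, texts in marts.items()]
    ranked = sorted(scored, reverse=True)
    return [name for score, name in ranked[:top_n] if score > 0]

if __name__ == "__main__":
    print(suggest_marts("who inherits the estate after marriage", MARTS))
    print(suggest_marts("orbit trajectory", MARTS))
```

Keeping only marts with a non-zero score mirrors the floor-plan analogy: the user is shown a short list of places worth visiting and then drills into one of them.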

Using Semantic Marts, the growing, diverse world of on-line information is broken down into chunks that can be easily searched in depth.

Where kPOOL Can Help

If you are looking for the nearest pizza shop, then use one of the many search engines available on the web.  Or, if you are looking for a house in a specific neighborhood and know the price range and floor space required, try a search engine that understands semantic tags.  However, when you are searching to learn what you do not know, or when the words are vague or uncertain, kPOOL uses the semantics latent in the text to match the NL Query’s meaning.  It is often the case that the retrieved text does not exactly match any of the query’s words but matches synonyms or special word use instead.  This type of search accelerates cognitive learning by retrieving meaningful information that other search engines miss.

Cognitive acceleration requires information retrieval that is both precise and relevant.  The use of semantic searches enables precision.  To enable relevance, kPOOL presents the results in clusters, each containing a handful of text fragments, using the Fisheye method.  Each cluster is titled using the text fragments that best match the meaning of the query.  The user can rapidly review each cluster and expand those that best capture the intended meaning of the query.
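The clustering and titling described above can be sketched as follows.  This is an illustrative simplification rather than kPOOL's clustering solution: it groups fragments with a greedy single pass over word-overlap (Jaccard) similarity instead of semantic similarity, and the threshold, the sample fragments, and the function names are assumptions.

```python
def words(text):
    """The set of lowercase words in a text."""
    return set(text.lower().split())

def jaccard(a, b):
    """Shared words divided by total distinct words."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster(fragments, threshold=0.3):
    """Greedy single pass: join the first cluster whose first member is similar enough."""
    clusters = []
    for fragment in fragments:
        for group in clusters:
            if jaccard(words(fragment), words(group[0])) >= threshold:
                group.append(fragment)
                break
        else:
            # No existing cluster was close enough, so start a new one.
            clusters.append([fragment])
    return clusters

def title(group, query):
    """Pick the member that best matches the query as the cluster's title."""
    return max(group, key=lambda f: jaccard(words(f), words(query)))

# Toy retrieved fragments, invented for this sketch.
RESULTS = [
    "the estate passes to a cousin",
    "the estate passes to the eldest son",
    "she refused the proposal of marriage",
    "a proposal of marriage was refused",
]

if __name__ == "__main__":
    query = "who inherits the estate son"
    for group in cluster(RESULTS):
        print(title(group, query), "|", len(group), "fragments")
```

A Fisheye-style display could then show only each cluster's title until the user expands it to see the remaining fragments.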

This approach is further aided by two-dimensional, hierarchically navigable cluster maps.  Each map locates a cluster’s documents based on the closeness of their meaning.  These maps are used when the initial search phrase is either too vague or too specific and the retrieved information is found to have poor relevance.  Using cluster maps, the user finds documents and language that are more relevant to the intended meaning of the query.  As new language is learned, the search can be refined to retrieve information that is both precise and relevant.

Using kPOOL, cognitive learning becomes a natural part of the search process, going beyond the ordinary search engine to accelerate the development of knowledge.

To Summarize

kPOOL understands how words are used in different contexts and semantically matches fragments of text within documents with the meaning of the NL Query to produce precise retrievals.  The unique clustering solution enables semantic maps to be used to improve retrieval relevance.  Using these techniques, cognitive acceleration becomes a natural part of the search process.  The information retrieved as a result is both relevant and precise.
