Thursday, March 20, 2008

TalkMine lets you talk to your documents !!!!

Distributed information sources (DIS) pose problems for the personalized and customized retrieval of information:

· Passive Environments. There is no genuine interaction between user and system: the user pulls information from a passive database and therefore needs to know how to query for relevant information with appropriate keywords. Furthermore, such impersonal interfaces cannot respond to queries in a user-specific fashion because they do not keep user-specific information, or user profiles. The net result is that users must know in advance how to characterize the information they need before pulling it from the environment.
· Idle Structure. Structural relationships between documents, keywords, and information retrieval patterns are not utilized. Different kinds of structural relationships are available, but not typically used, for different DIS, e.g. citation structure in scientific library databases, the link structure in the WWW, the clustering of keyword relationships into different meanings of keywords, temporal patterns of retrieval, etc.
· Fixed Semantics. Keywords are initially provided by document authors (or publishers, librarians, and indexers), and do not necessarily reflect the evolving semantic expectations of users.
· Isolated Information Resources. No relationships are created, and no information is exchanged, among documents and/or keywords in different information resources such as databases, web sites, etc. Each resource is accessed with its own private set of keywords and query language.


The Solution !!!!!

TalkMine is an adaptive recommendation system that is both collaborative and content-based, and it exploits currently untapped sources of information in DIS (distributed information sources). In particular, it integrates information from the usage patterns of groups of users, and also categorizes DIS content or semantics in a manner relevant to those groups. It is currently being developed as a test-bed environment for the Research Library at the Los Alamos National Laboratory, more specifically for its Library without Walls project (1) under the Adaptive Recommendation Project (ARP).

TalkMine categorizes information sources based on how their keywords are organized. For a given information source, the system builds a network of keywords consisting of nodes and edges. Nodes represent the keywords themselves, and edges represent the proximity of each keyword to other keywords, based on the number of documents they share.
By using a conversational approach, the system finds out the user's interests and models them as an evidence set. Using these evidence sets, the user's interest in a particular information source can be quantified.
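
The keyword network described above can be sketched in a few lines of Python. This is only an illustration: the function name keyword_proximity, the sample documents, and the choice to normalize shared documents by the number of documents carrying either keyword are assumptions made here, not details taken from the TalkMine description.

```python
from itertools import combinations

def keyword_proximity(doc_keywords):
    """Build a keyword network from a mapping of document id -> set of keywords.

    Returns a dict mapping each keyword pair (a sorted tuple) to a proximity
    in (0, 1]: documents shared by both keywords / documents carrying either.
    The normalization is an assumption of this sketch.
    """
    # Invert the index: keyword -> set of documents that carry it.
    docs_for = {}
    for doc_id, keywords in doc_keywords.items():
        for kw in keywords:
            docs_for.setdefault(kw, set()).add(doc_id)

    edges = {}
    for a, b in combinations(sorted(docs_for), 2):
        shared = docs_for[a] & docs_for[b]
        if shared:  # only keywords that share a document get an edge
            either = docs_for[a] | docs_for[b]
            edges[(a, b)] = len(shared) / len(either)
    return edges

# Hypothetical sample data for illustration only.
docs = {
    "d1": {"fuzzy", "logic", "retrieval"},
    "d2": {"fuzzy", "retrieval"},
    "d3": {"logic", "agents"},
}
network = keyword_proximity(docs)
```

Here "fuzzy" and "retrieval" always appear together, so they end up as close as two keywords can be, while "agents" and "fuzzy" never share a document and get no edge at all.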

The system works as follows:
· User is presented with all the information sources he has at his disposal.
· User weighs these information sources based on his preference.
· User then inputs an initial keyword.
· System uses this keyword and its proximity to other keywords to build an evidence set that emulates the user's interests in a particular information source. This evidence set is also called the learned set.
· The uncertainty of this evidence set is calculated (fuzziness, nonspecificity, and conflict). If it is below a certain value, the system stops (low uncertainty indicates that the evidence set clearly depicts the user's tastes).
· The user is then presented with another keyword, selected based on its proximity with the first keyword.
· If user expresses interest in this keyword, another evidence set is formed for this keyword, and its union with the earlier learned set is performed to give a new learned set.
· Again, uncertainty for this new learned set is calculated and if less than a threshold value, the system stops.
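
The question-and-answer loop in the steps above can be sketched as follows. This is a deliberately simplified illustration, not the TalkMine implementation: the learned set is modelled as a plain fuzzy set (keyword to membership), only fuzziness is measured (nonspecificity and conflict are left out), and every name here (fuzziness, evidence_for, fuzzy_union, converse, the threshold and max_questions parameters) is hypothetical.

```python
def fuzziness(fuzzy_set):
    """Average fuzziness in [0, 1]: 0 if all memberships are 0 or 1, 1 if all are 0.5."""
    if not fuzzy_set:
        return 1.0
    return sum(1 - abs(2 * m - 1) for m in fuzzy_set.values()) / len(fuzzy_set)

def evidence_for(keyword, proximity):
    """Simplified evidence set: the keyword itself plus its network neighbours."""
    evidence = {keyword: 1.0}
    for (a, b), p in proximity.items():
        if keyword == a:
            evidence[b] = p
        elif keyword == b:
            evidence[a] = p
    return evidence

def fuzzy_union(left, right):
    """Standard fuzzy union: take the larger membership for each keyword."""
    return {k: max(left.get(k, 0.0), right.get(k, 0.0)) for k in left.keys() | right.keys()}

def converse(initial, proximity, is_interested, threshold=0.3, max_questions=10):
    """Ask about nearby keywords until the learned set is certain enough."""
    learned = evidence_for(initial, proximity)
    asked = {initial}
    for _ in range(max_questions):
        if fuzziness(learned) < threshold:
            break  # low uncertainty: the learned set is taken to depict the user's tastes
        candidates = {k: m for k, m in learned.items() if k not in asked}
        if not candidates:
            break
        # Present the not-yet-asked keyword with the highest membership so far.
        next_keyword = max(sorted(candidates), key=candidates.get)
        asked.add(next_keyword)
        if is_interested(next_keyword):
            learned = fuzzy_union(learned, evidence_for(next_keyword, proximity))
    return learned

# Hypothetical proximities and a user who accepts every suggestion.
proximity = {("fuzzy", "retrieval"): 1.0, ("fuzzy", "logic"): 0.5, ("agents", "logic"): 0.5}
learned = converse("fuzzy", proximity, is_interested=lambda kw: True)
```

In this run the user is asked about "retrieval" and then "logic"; after "logic" is accepted the fuzziness drops below the threshold and the loop stops. A real evidence set carries more structure than a single membership value, so treat the stopping rule here as a stand-in.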

Following this approach, the system builds a set of keywords that reflects the user's preferences. The user is finally presented with the documents associated with these keywords. Each document carries multiple keywords, and a network of these keywords represents the document's semantics. The relationship between this keyword network and the learned set defines the relevance of the document to the user.
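
One simple way to picture document relevance is to score each document by how strongly its keywords belong to the learned set. The text does not say how TalkMine actually computes this relationship, so the mean-membership score below, along with the names relevance and rank_documents and the sample data, is purely an assumption for illustration.

```python
def relevance(doc_keywords, learned):
    """Mean membership of a document's keywords in the learned set (0.0 if it has none)."""
    if not doc_keywords:
        return 0.0
    return sum(learned.get(kw, 0.0) for kw in doc_keywords) / len(doc_keywords)

def rank_documents(docs, learned):
    """Return document ids, most relevant first (ties broken by id)."""
    scored = [(relevance(kws, learned), doc_id) for doc_id, kws in docs.items()]
    return [doc_id for _, doc_id in sorted(scored, key=lambda s: (-s[0], s[1]))]

# Hypothetical learned set and documents.
learned = {"fuzzy": 1.0, "retrieval": 1.0, "logic": 0.5}
docs = {
    "d1": {"fuzzy", "retrieval"},
    "d2": {"logic", "agents"},
    "d3": {"fuzzy", "logic"},
}
ranking = rank_documents(docs, learned)
```

With these numbers d1 scores 1.0, d3 scores 0.75, and d2 scores 0.25, so the user would see d1 first.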

The system also uses long-term usage characteristics to refine the keyword network. It associates relatively close keywords with documents that were not previously tagged with those keywords, and it increases or decreases the distances between keywords based on how often they are correlated. In this way, the keyword network is refined to accurately reflect the evolving semantics of its users.
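
The distance adjustment can be pictured as a small nudge applied after each session. The update rule below (move co-selected pairs towards proximity 1, let all other edges decay towards 0, at a fixed rate) is an assumption chosen for simplicity, and adapt_proximity and its parameters are hypothetical names; the source does not specify the actual rule.

```python
def adapt_proximity(proximity, co_selected_pairs, rate=0.1):
    """Return a new proximity map nudged towards observed usage.

    Pairs the user selected together move closer (towards 1.0);
    every other existing pair drifts apart (towards 0.0).
    """
    updated = {}
    for pair, p in proximity.items():
        target = 1.0 if pair in co_selected_pairs else 0.0
        updated[pair] = p + rate * (target - p)
    for pair in co_selected_pairs:
        # Pairs that had no edge yet start with a weak one.
        updated.setdefault(pair, rate)
    return updated

# Hypothetical network and one session in which two pairs were selected together.
proximity = {("fuzzy", "logic"): 0.5, ("agents", "logic"): 0.5}
session_pairs = {("fuzzy", "logic"), ("agents", "fuzzy")}
adapted = adapt_proximity(proximity, session_pairs)
```

After this session "fuzzy" and "logic" sit slightly closer, "agents" and "logic" slightly further apart, and "agents" and "fuzzy" gain a new weak edge. Decaying every unselected edge on every session is aggressive; a gentler variant would only touch keywords that actually appeared in the session.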

System Architecture:

The architecture of TalkMine has both user-side and system-side components. Each user owns a browser (or plug-in to an existing Internet browser), which functions as a consolidated interface to all information resources searched. This individual browser stores user preferences and tracks information retrieval patterns and relationships which it utilizes to adapt to the user. User preferences are stored as a set of local knowledge contexts which the user has constructed while using the system under a set of different interests. These local knowledge contexts store both semantic semi-metric and structural proximity information.

Advantages:
· There is recommendation, as the system proactively pushes relevant documents to users about related topics that they may have been unaware of. This is achieved through the structural and semantic proximity information kept in the distributed memory (section 3.3.1), its integration with user-specific (also structural and semantic) information in the categorization process (section 3.3.2), and finally the document retrieval operations (section 3.3.3).
· There is conversation between users and information resources and among information resources (and indirectly among users), as a mechanism to exchange or cross over knowledge among them is established. As categories are constructed with the question-answering process (section 3.3.2), a list of documents is produced (section 3.3.3) and communicated not only to users but also to information resources that did not contain them, and the semantics of all parties involved are adapted (section 3.3.4).
· There is creativity as new semantic and structural associations are set up by TalkMine. The categorization process brings together knowledge from the different contexts of the information resources. This not only adapts existing local semantics, but combines knowledge not locally available to individual information resources. In this sense, because of the conversation process, information resources gain new knowledge previously unavailable.

The researchers conclude that, through this system, they have established a human-machine symbiosis that can be used for the automatic, adaptive organization of knowledge in DIS.

Links:
1. TalkMine literature: http://informatics.indiana.edu/rocha/dl99.html
2. More on the project and its underlying logic: http://informatics.indiana.edu/rocha/ijhms_pask.html