Selected Projects
One of the biggest problems faced by knowledge workers is information overload. Various information filtering and routing systems have been designed to address this issue. Two key components of these systems are: a) a user query that represents a user’s information need; and b) a ranking function that compares incoming documents against that user query. My dissertation integrates these two components by developing a two-stage model of information seeking. First, I applied a query construction method to infer a user's persistent interest and represent it as a persistent query. Second, I used an inductive learning technique called Genetic Programming to learn a personalized or context-specific ranking function for that persistent query. With the two-stage model used effectively, routing performance improves dramatically over other competitive models.
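The two stages can be illustrated with a minimal sketch. It is not the dissertation's implementation: the averaging-based query construction, the function names, and the sample documents are all assumptions made for illustration, and the Genetic Programming search itself is not shown. The ranking function is passed in as a plain callable, standing in for whatever expression a learner would produce.

```python
from collections import Counter
import math


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberately simple assumption)."""
    return text.lower().split()


def build_persistent_query(relevant_docs):
    """Stage 1 (assumed method): average term counts over documents the user found relevant."""
    totals = Counter()
    for doc in relevant_docs:
        totals.update(tokenize(doc))
    n = len(relevant_docs)
    return {term: count / n for term, count in totals.items()}


def example_ranking_function(query_weight, term_freq, doc_length):
    """Stage 2 stand-in: one hand-written candidate expression, not a learned one."""
    return query_weight * math.log(1 + term_freq) / math.sqrt(doc_length)


def score(query, doc, ranking_function):
    """Sum the ranking function over query terms that occur in the document."""
    tokens = tokenize(doc)
    if not tokens:
        return 0.0
    tf = Counter(tokens)
    return sum(
        ranking_function(weight, tf[term], len(tokens))
        for term, weight in query.items()
        if term in tf
    )


def route(query, incoming_docs, ranking_function):
    """Order incoming documents by descending score against the persistent query."""
    return sorted(
        incoming_docs,
        key=lambda d: score(query, d, ranking_function),
        reverse=True,
    )


# Hypothetical documents, for illustration only.
relevant = [
    "genetic programming for ranking",
    "ranking functions for document routing",
]
incoming = [
    "stock market news today",
    "learning ranking functions with genetic programming",
    "weather report",
]

query = build_persistent_query(relevant)
ranked = route(query, incoming, example_ranking_function)
```

Because the ranking function is an ordinary argument to `route`, a different candidate expression can be swapped in without touching the persistent query, which mirrors the separation between the two stages.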
This line of research analyzes the relationships among documents on the Internet to discover new knowledge. For example, medical documents may be analyzed in an effort to see how the published literature can be used to suggest new treatments for certain illnesses.
I am currently also affiliated with the CLAIR group (headed by Prof. Dragomir R. Radev at the University of Michigan), which is developing advanced text mining techniques for text summarization applications and evaluating the capabilities of search engines to answer natural language questions.
commKnowledge (http://www.commknowledge.ws) is a web-based global knowledge management system that solicits contributions from participants around the world and automatically evaluates and organizes them. Various advanced algorithms and strategies have been deployed to improve the system’s usability and scalability. Once the system is fully functioning, we plan to report on our experience developing it and to conduct usability studies.
I have worked with Prof. Stuart Madnick (MIT) and Prof. Hongjun Lu (NUS and HKUST) on a large project on integrating financial data using data mining techniques. The same data can be represented in different formats in different database systems (for example, Price may be stored in two different currencies), so we proposed data value conversion rules and used robust regression techniques to uncover them.
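As a rough illustration of how robust regression can recover a value conversion rule, the sketch below fits a line between the same prices as stored in two databases. The project description does not say which robust method was used; the Theil-Sen estimator here is a substitute chosen for brevity, and the data and the 1.5 conversion rate are invented for the example.

```python
from itertools import combinations
from statistics import median


def theil_sen(xs, ys):
    """Return (slope, intercept) of a Theil-Sen fit: the median of pairwise slopes."""
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2)
        if x2 != x1
    ]
    if not slopes:
        raise ValueError("need at least two distinct x values")
    slope = median(slopes)
    # The intercept is the median residual once the slope is fixed.
    intercept = median(y - slope * x for x, y in zip(xs, ys))
    return slope, intercept


# Hypothetical records: the same prices in currency A and currency B.
# The last value of price_b simulates a data-entry error.
price_a = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
price_b = [15.0, 30.0, 45.0, 60.0, 75.0, 9.0]

slope, intercept = theil_sen(price_a, price_b)
print(f"price_b ~= {slope:.2f} * price_a + {intercept:.2f}")
```

The single corrupted record does not move the estimate because the fit depends on medians rather than on a sum of squared errors, which is the property that makes robust methods attractive for noisy, independently maintained databases.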