KDD-2010 Program Schedule
The official KDD-2010 Social Networking Interactive Schedule lets you add talks to your conference calendar in Outlook or other popular calendar programs, leave notes, ask presenters questions, and engage in other ways. For a quick look at the schedule, use the summary below.

Saturday, July 24  
9:00AM - 5:00PM Workshop 1: Mining and Learning with Graphs Workshop 2010 (MLG-2010) Jefferson
7:30PM - 9:00PM Registration Foyer, Independence Foyer
Sunday, July 25  
7:30AM - 8:00PM Registration Foyer, Independence Foyer
8:00AM-6:00PM Exhibits Independence Center B
9:00AM - 12:00PM Workshop 1: Mining and Learning with Graphs Workshop 2010 (MLG-2010) Potomac 1
Workshop 2: Large-scale Data Mining: Theory and Applications (LDMTA-2010) Potomac 2
Workshop 3: Useful Patterns (UP) Potomac 3
Workshop 4: Social Media Analytics (SOMA 2010) Potomac 4
Tutorial 1: Large-scale Data Mining: MapReduce and Beyond Regency E
Tutorial 2: New Developments in the Theory of Clustering Regency F
Tutorial 3: Temporal Pattern Mining Potomac 5
Tutorial 4: Learning through Exploration Potomac 6
Tutorial 5: Geometric Tools for Graph Mining of Large Social and Information Networks Tidewater 2
Tutorial 8: Mining Web Search and Browse Logs Regency F
Workshop 5: KDD Cup 2010: Improving Cognitive Models with Educational Data Mining Roosevelt
Workshop 6: 9th International Workshop on Data Mining in Bioinformatics (BIOKDD10) Lincoln
Workshop 7: Tenth International Workshop on Multimedia Data Mining (MDMKDD 2010) Arlington
Workshop 8: The Fourth International Workshop on Data Mining and Audience Intelligence for Online Advertising (ADKDD'10) Prince William
Workshop 9: Human Computation Workshop (HCOMP 2010) Fairfax
10:00AM-10:30AM Coffee Break Foyer, Ballroom 1, AV Wall
12:00PM-2:00PM Lunch Independence Center
2:00PM - 5:30PM Tutorial 7: Introduction to Graphical Models for Data Mining Regency E
Tutorial 6: Privacy-aware Data Mining in Information Networks Kennedy
Tutorial 9: Mining Heterogeneous Information Networks Potomac 5
Tutorial 10: Outlier Detection Techniques Potomac 6
Tutorial 11: Recommender Problems for Web Applications Tidewater 2
Tutorial 12: Indexing and Mining Time Sequences Kennedy
Workshop 10: Intelligence and Security Informatics (ISI-KDD) Roosevelt
Workshop 11: The 4th International Workshop on Knowledge Discovery from Sensor Data (SensorKDD) Lincoln
Workshop 12: The 4th SNA-KDD Workshop on Social Network Mining and Analysis (SNAKDD 2010) Arlington
Workshop 13: Novel Data Stream Pattern Mining Techniques (StreamKDD) Prince William
Workshop 14: Discovering, Summarizing and Using Multiple Clusterings (MultiClust) Fairfax
Workshop 1: Mining and Learning with Graphs Workshop 2010 (MLG-2010) Potomac 1
Workshop 2: Large-scale Data Mining: Theory and Applications (LDMTA-2010) Potomac 2
Workshop 3: Useful Patterns (UP) Potomac 3
Workshop 4: Social Media Analytics (SOMA 2010) Potomac 4
3:00PM-3:30PM Coffee Break AV Wall
6:00PM-6:15PM Opening Remarks Ballroom, Regency EF CTR
6:15PM-6:45PM Award Presentations Ballroom, Regency EF CTR
6:45PM-7:45PM Innovation Award Talk (Christos Faloutsos) Ballroom, Regency EF CTR
Monday, July 26  
7:30AM-8:00PM Registration Independence Foyer
8:00AM-6:00PM Exhibits Independence Center B
7:30AM-9:00AM Continental Breakfast AV Wall
9:00AM-10:00AM Plenary Invited Talk: Data Mining in the Online Services Industry Regency EF CTR
10:00AM-10:30AM Coffee Break AV Wall
10:30AM-10:50AM Mining Medical Data to Improve Patient Outcomes (DMCS 2005 and 2009 Winner) Roosevelt
Grafting-Light: Fast, Incremental Feature Selection and Structure Learning of Markov Random Fields Independence Center A
Mining Advisor-Advisee Relationships from Research Publication Networks Regency E
UP-Growth: An Efficient Algorithm for High Utility Itemset Mining Regency F
Versatile Publishing For Privacy Preservation Potomac 3+4
10:50AM-11:10AM Mining Medical Data to Improve Patient Outcomes (DMCS 2005 and 2009 Winner) Roosevelt
A Scalable Two-Stage Approach for a Class of Dimensionality Reduction Techniques Independence Center A
Estimating Rates of Rare Events with Multiple Hierarchies through Scalable Log-linear Models Regency E
Frequent Regular Itemset Mining Regency F
Privacy-Preserving Outsourcing Support Vector Machines with Random Transformation Potomac 3+4
11:10AM-11:30AM Interactive Data Mining and its Business Applications (Accenture Technology Labs) Roosevelt
An Efficient Algorithm for a Class of Fused Lasso Problems Independence Center A
Mining Uncertain Data with Probabilistic Guarantees Regency F
On the Quality of Inferring Interests From Social Neighbors Potomac 3+4
User Browsing Models: Relevance versus Examination Regency E
11:30AM-11:50AM Modeling with networked data Roosevelt
DUST: A Generalized Notion of Similarity between Uncertain Time Series Potomac 3+4
Mining Top-K Frequent Items in a Data Stream with Flexible Sliding Windows Regency F
Suggesting Friends Using the Implicit Social Graph Regency E
Unsupervised Feature Selection for Multi-Cluster Data Independence Center A
11:40AM-11:50AM Cold Start Link Prediction Potomac 3+4
11:50AM-12:00PM Modeling with networked data Roosevelt
Feature Selection for Support Vector Regression Using Probabilistic Prediction Independence Center A
New Perspectives and Methods in Link Prediction Regency E
Probably the Best Itemsets Regency F
12:40PM-1:40PM Conference Lunch Independence Center
1:00PM-1:20PM KDD Cup Awards Presentation (at the Conference Lunch) Independence Center
1:20PM-1:50PM Dissertation Awards Lectures (at the Conference lunch) Independence Center
2:00PM-2:20PM Discovering Precursors to Aviation Safety Incidents: from Massive Data to Actionable Information Roosevelt
Discovering Significant Relaxed Order-Preserving Submatrices Regency F
Fast Nearest-neighbor Search in Disk-resident Graphs Potomac 3+4
k-Support Anonymity Based on Pseudo Taxonomy for Outsourcing of Frequent Itemset Mining Independence Center A
Learning with Cost Intervals Regency E
2:20PM-2:40PM What's in your (customer's) Wallet? (DMCS 2005 Prize winner, Edelman Prize winner) Roosevelt
Balanced Allocation with Succinct Representation Potomac 3+4
Collusion-Resistant Privacy-Preserving Data Mining Independence Center A
The New Iris Data: Modular Data Generators Regency E
Topic Dynamics: An Alternative Model of "Bursts" in Streams of Topics Regency F
2:40PM-3:00PM Text Mining to Fast-Track Deserving Disability Applicants Roosevelt
Data Mining with Differential Privacy Independence Center A
Extracting Temporal Signatures for Comprehending Systems Biology Models Regency F
Neighbor Query Friendly Compression of Social Networks Potomac 3+4
Why Label when you can Search? Alternatives to Active Learning for Applying Human Resources to Build Classification Models Under Extreme Class Imbalance Regency E
3:00PM-3:20PM (Privacy friendly!) Social Network Targeting for Online Advertising Roosevelt
Discovering Frequent Patterns in Sensitive Data Independence Center A
Negative Correlations in Collaboration: Concepts and Algorithms Regency F
3:10PM-3:20PM Parallel SimRank Computation on Large Graphs with Iterative Aggregation Potomac 3+4
Dynamics of Conversations Potomac 3+4
3:35PM-4:00PM Coffee Break AV Wall
4:00PM-4:20PM Discriminative Topic Modeling based on Manifold Learning Potomac 3+4
Evaluating Online Ad Campaigns in a Pipeline: Causal Models At Scale Independence Center A
Fast Euclidean Minimum Spanning Tree: Algorithm, Analysis, and Applications Regency F
Flexible Constrained Spectral Clustering Regency E
4:20PM-4:40PM A Hierarchical Information Theoretic Technique for the Discovery of Non Linear Alternative Clusterings Regency E
Mining Program Workflow from Interleaved Traces Regency F
Online Multiscale Dynamic Topic Models Potomac 3+4
Overlapping Experiment Infrastructure: More, Better, Faster Experimentation (KDD-2010 Best Application Honorable Mention) Independence Center A
4:30PM-4:40PM Exploitation and Exploration in a Performance based Contextual Advertising System Independence Center A
4:40PM-5:00PM Clustering by Synchronization Regency E
Connecting the Dots Between News Articles (KDD-2010 Best Research Paper Innovative Contribution) Regency F
MineFleet®: An Overview of a Widely Adopted Distributed Vehicle Performance Data Mining System Independence Center A
Topic Models with Power-Law Using Pitman-Yor Process Potomac 3+4
5:00PM-5:10PM Discovering Frequent Subgraphs over Uncertain Graph Databases under Probabilistic Semantics Regency F
Multiple Kernel Learning for Heterogeneous Anomaly Detection: Algorithm and Aviation Safety Case Study Independence Center A
5:00PM-5:20PM The Topic-Perspective Model for Social Tagging Systems Potomac 3+4
Unifying Dependent Clustering and Disparate Clustering for Non-homogeneous Data Regency E
5:10PM-5:20PM Boosting with Structure Information in the Functional Space: an Application to Graph Classification Regency F
5:30PM-7:30PM Poster Reception I Independence Center B
8:00PM-9:30PM Dinner Independence Center B
Tuesday, July 27  
7:30AM-8:00PM Registration Independence Foyer
8:00AM-6:00PM Exhibits Independence Center B
7:30AM-9:00AM Continental Breakfast AV Wall
9:00AM-10:00AM Plenary Talk: Computational Social Science Regency EF CTR
10:00AM-10:30AM Coffee Break AV Wall
10:30AM-10:50AM Combining Predictions for Accurate Recommender Systems Regency E
Discovery of Significant Emerging Trends Potomac 3+4
Learning to Combine Discriminative Classifiers Regency F
Semi-supervised Feature Selection for Graph Classification Independence Center A
10:50AM-11:10AM Data Mining to Predict and Prevent Errors in Health Insurance Claims Processing Potomac 3+4
Fast Online Learning through Offline Initialization for Time-sensitive Recommendation Regency E
Mining Positive and Negative Patterns for Relevance Feature Discovery Regency F
Modeling Relational Events via Latent Classes Independence Center A
11:10AM-11:30AM Document Clustering via Dirichlet Process Mixture Model with Feature Selection Regency F
On Community Outliers and their Efficient Detection in Information Networks Independence Center A
Optimizing Debt Collections Using Constrained Reinforcement Learning (KDD-2010 Best Application Paper) Potomac 3+4
Training and Testing of Recommender Systems on Data Missing Not at Random Regency E
11:30AM-11:40AM Detecting Abnormal Coupled Sequences and Sequence Changes in Group-based Manipulative Trading Behaviors Potomac 3+4
Semantic Relation Extraction With Kernels Over Typed Dependency Trees Regency F
Temporal Recommendation on Graphs via Long- and Short-term Preference Fusion Regency E
11:30AM-11:50AM Redefining Class Definitions using Constraint-Based Clustering: An Application to Remote Sensing of the Earth's Surface Independence Center A
11:40AM-11:50AM Generative Models for Ticket Resolution in Expert Networks Regency E
Latent Aspect Rating Analysis on Review Text Data: A Rating Regression Approach Regency F
12:30PM-1:30PM SIGKDD Business Lunch Independence Center
1:00PM-2:00PM Invited Talk Independence Center
2:35PM-2:55PM Fast Query Execution for Retrieval Models Based on Path-Constrained Random Walks Regency F
Large Linear Classification When Data Cannot Fit In Memory (KDD-2010 Best Research Paper - Technical Contribution) Regency E
PET: A Statistical Model for Popular Events Tracking in Social Communities Independence Center A
2:45PM-4:05PM The Next Generation of Transportation Systems, Greenhouse Emissions, and Data Mining Potomac 3+4
2:55PM-3:15PM Class-Specific Error Bounds for Ensemble Classifiers Regency E
The community-search problem and how to plan a successful cocktail party Independence Center A
Trust Network Inference for Online Rating Data Using Generative Models Regency F
The Next Generation of Transportation Systems, Greenhouse Emissions, and Data Mining (continued 2:45pm - 4:05pm) Potomac 3+4
3:15PM-3:35PM An Energy-Efficient Mobile Recommender System Regency F
Designing Efficient Cascaded Classifiers: Tradeoff between Accuracy and Cost Regency E
Growing a Tree in the Forest: Constructing Folksonomies by Integrating Structured Metadata Independence Center A
The Next Generation of Transportation Systems, Greenhouse Emissions, and Data Mining (continued 2:45pm - 4:05pm) Potomac 3+4
3:35PM-3:45PM A Probabilistic Model for Personalized Tag Prediction Independence Center A
Direct Mining of Discriminative Patterns for Classifying Uncertain Data Regency E
Mixture Models for Learning Low-dimensional Roles in High-dimensional Data Regency F
The Next Generation of Transportation Systems, Greenhouse Emissions, and Data Mining (continued 2:45pm - 4:05pm) Potomac 3+4
3:45PM-3:55PM BioSnowball: Automated Population of Wikis Independence Center A
Ensemble Pruning via Individual Contribution Ordering Regency E
Towards Mobility-based Clustering Regency F
The Next Generation of Transportation Systems, Greenhouse Emissions, and Data Mining (continued 2:45pm - 4:05pm) Potomac 3+4
4:05PM-4:25PM Coffee Break AV Wall
4:25PM-4:45PM Automatic Malware Categorization Using Cluster Ensemble Independence Center A
Combined Regression and Ranking Regency E
Inferring Networks of Diffusion and Influence Regency F
4:45PM-4:55PM Beyond Heuristics: Learning to Classify Vulnerabilities and Predict Exploits Independence Center A
4:45PM-5:05PM Mass Estimation and Its Applications Regency E
Scalable Influence Maximization for Prevalent Viral Marketing in Large-Scale Social Networks Regency F
4:55PM-5:05PM Diagnosing Memory Leaks using Graph Mining on Heap Dumps Independence Center A
5:05PM-5:15PM Community-based Greedy Algorithm for Mining Top-K Influential Nodes in Mobile Social Networks Regency F
5:05PM-5:25PM Multi-Label Learning by Exploiting Label Dependency Regency E
Using Data Mining Techniques to Address Critical Information Exchange Needs in Disaster Affected Public-Private Networks Independence Center A
5:15PM-5:25PM DivRank: the Interplay of Prestige and Diversity in Information Networks Regency E
Social Action Tracking via Noise Tolerant Time-varying Factor Graphs Regency F
5:25PM-5:35PM Finding Effectors in Social Networks Regency F
Tropical Cyclone Event Sequence Similarity Search via Dimensionality Reduction and Metric Learning Independence Center A
5:45PM-6:30PM SIGKDD Transfer Meeting (SIGKDD 2010 / 2011 organizers only) Regency EF CTR
5:45PM-8:00PM Poster Reception II & Demo Session Independence Center B
5:45PM-8:00PM Small Appetizer Buffet Independence Center A
Wednesday, July 28  
7:30AM-9:00PM Registration Independence Foyer
8:00AM-12:30PM Exhibits Independence Center B
7:30AM-9:00AM Continental Breakfast AV Wall
9:00AM-10:00AM Plenary Invited Talk: The quantification of advertising and lessons from building a business based on large scale data mining Regency EF CTR
10:00AM-10:30AM Coffee Break AV Wall
10:30AM-10:50AM An Efficient Causal Discovery Algorithm for Linear Models Regency F
GLS-SOD: A Generalized Local Statistical Approach for Spatial Outlier Detection Regency E
MalStone: Towards a Benchmark for Analytics on Large Data Clouds Independence Center A
Unsupervised Transfer Classification: Application to Text Categorization Potomac 3+4
10:50AM-11:10AM Compressed Fisher Linear Discriminant Analysis: Classification of Randomly Projected Data Regency F
Evolutionary Hierarchical Dirichlet Processes for Multiple Correlated Time-varying Corpora Regency E
Nonnegative Shared Subspace Learning and Its Application to Social Media Retrieval Potomac 3+4
TIARA: A Visual Exploratory Text Analytic System Independence Center A
11:10AM-11:30AM Learning Incoherent Sparse and Low-Rank Patterns from Multiple Tasks Potomac 3+4
Metric Forensics: A Multi-Level Approach for Mining Volatile Graphs Independence Center A
Online Discovery and Maintenance of Time Series Motifs Regency E
Scalable Similarity Search with Optimized Kernel Hashing Regency F
11:30AM-11:40AM Active Learning for Biomedical Citation Screening Independence Center A
11:30AM-11:50AM Mining Periodic Behaviors for Moving Objects Regency E
Multi-Task Learning for Boosting with Application to Web Search Ranking Potomac 3+4
Semi-Supervised Sparse Metric Learning Using Alternating Linearization Optimization Regency F
11:40AM-11:50AM An integrated machine learning approach to stroke prediction Independence Center A
11:50AM-12:00PM Medical Coding Classification by Leveraging Inter-Code Relationships Independence Center A
Transfer Metric Learning by Learning Task Relationships Potomac 3+4
Universal Multi-Dimensional Scaling Regency F
12:10PM-12:30PM Closing Remarks Regency EF CTR

Plenary Invited Talks

Online Services Division Strategy Overview

Qi Lu, President of Online Services Division, Microsoft

Abstract The online services industry is growing rapidly, with the worldwide online ad market projected to grow from $48 billion in 2011 to $67 billion in 2013, of which 47% will come from display advertising and 53% from search advertising. The Online Services Division (OSD) within Microsoft is a leader in the consumer cloud space today, with a strong portfolio of three mutually reinforcing businesses: Search, Portal, and Advertising. They are supported by a shared foundational asset of Intent & Knowledge Stores and a shared technology platform supporting large scale data and high performance systems. MSN (Portal) and Bing (Search) generate the content, traffic, and data that make for an exciting, fertile environment for large scale data mining practice and system development. Our advertisers are thus given more valuable targeting opportunities and better ROI, which in turn provide better economics and usability data and allow for higher quality services for our advertisers and a better experience for our users. The ability to transform data into meaningful, actionable insight is an important source of competitive advantage for OSD. The data mining initiatives within the division continue to strive for excellence around the following goals: actionable insights through deep data analysis, data mining and data modeling at scale and with speed, increased productivity from deployed large scale data systems and tools, improved product and service development and decision making gained from effective measurement and experimentation, and a mature data culture in product teams that makes the above possible. With many technical and data challenges ahead of us, we are committed to using our huge data asset well to understand the needs, intent, and behavior of our users for the purpose of serving them better.

Bio As president of Microsoft's Online Services Division (OSD), Dr. Qi Lu leads the company's search and online advertising efforts. Dr. Lu oversees the OSD Research & Development team, which has responsibility for the evolution of Microsoft's search, portal and advertising services; the Online Audience Business Group; and the Advertiser and Publisher Solutions Business Group. Dr. Lu reports to Microsoft chief executive officer Steve Ballmer. Prior to joining Microsoft, Dr. Lu spent 10 years as a Yahoo! senior executive. His roles included serving as the executive vice president of engineering for the company's Search and Advertising Technology Group, where he oversaw the development of Yahoo!'s Web search and monetization platforms, and as vice president of engineering responsible for the technology development of Yahoo!'s search, e-commerce and local listings of businesses and products. Before joining Yahoo!, Dr. Lu worked as a research staff member at IBM's Almaden Research Center and Carnegie Mellon University and was a faculty member at Fudan University in China. He received his bachelor of science and master of science in computer science from Fudan University and his Ph.D. in computer science from Carnegie Mellon University. Dr. Lu holds 20 U.S. patents.

Computational Social Science

David Jensen, Department of Computer Science, University of Massachusetts Amherst

Abstract Research and applications in knowledge discovery and data mining increasingly address some of the most fundamental questions of social science: What determines the structure and behavior of social networks? What influences consumer and voter preferences? How does participation in social systems affect behaviors such as fraud, technology adoption, or resource allocation? Often for the first time, these questions are being examined by analyzing massive data sets that record the behavior and interactions of individuals in physical and virtual worlds.

A new kind of scientific endeavor - computational social science - is emerging at the intersection of social science and computer science. The field draws from a rich base of existing theory from psychology, sociology, economics, and other social sciences, as well as from the formal languages and algorithms of computer science. The result is an unprecedented opportunity to revolutionize the social sciences, expand the reach and impact of computer science, and enable decision-makers to understand the complex systems and social interactions that we must manage in order to address fundamental challenges of economic welfare, energy production, sustainability, health care, education, and crime.

Computational social science suggests an impressive array of new tasks and technical challenges to researchers and practitioners of KDD. These include modeling complex systems with temporal, spatial, and relational dependence; identifying cause and effect rather than mere association; modeling systems with feedback; and conducting analyses in ways that protect the privacy of individuals. Many of these challenges interact in fundamental ways that are both surprising and encouraging. Together, they point to an exciting new future for knowledge discovery and data mining.

Bio David Jensen is Associate Professor of Computer Science and Director of the Knowledge Discovery Laboratory at the University of Massachusetts Amherst. His current research focuses on causal discovery in relational data, computational social network analysis, fraud detection, and privacy. He serves on the Executive Committee of the ACM Special Interest Group on Knowledge Discovery and Data Mining and on the program committees of the International Conference on Machine Learning and the International Conference on Knowledge Discovery and Data Mining. He is an associate editor of the ACM Transactions on Knowledge Discovery from Data. He serves on DARPA's Information Science and Technology (ISAT) Group. He recently served on a National Research Council panel assessing the research program of the National Institute of Justice. From 1991 to 1995, he served as an analyst with the Office of Technology Assessment, an agency of the United States Congress. He received his doctorate from Washington University in St. Louis in 1992.

The quantification of advertising and lessons from building a business based on large scale data mining

Konrad Feldman, CEO of Quantcast

Abstract As electronic communication, media and commerce increasingly permeate every aspect of modern life, real-time personalization of consumer experience through data-mining becomes practical. Effective classification, prediction and change modeling of consumer interests, behaviors and purchasing habits using machine learning and statistical methods drives efficiency, insights and consumer relevance that were never before possible. The internet has brought on a rapid evolution in advertising. Everything about behavior on the internet can be quantified and responses to behavior can occur in real time. This dynamic interaction with the user has created opportunities to better understand the way in which individuals move from awareness of a product to considering a purchase, through to intent and ultimately a sale for the marketer. When a marketer can answer the question "did those TV ads cause consumers to switch shampoo brands?" they can model behavior change and adjust marketing strategies accordingly. Underpinning this shift in how the world's trillion dollar marketing budget is spent is transactional data on an unprecedented scale, creating new challenges for software that must interpret this stream and make real time decisions tens, even hundreds of thousands of times every second. I will explore advances in modeling media consumption, advertising response and the real-time evaluation of media opportunities through reference to Quantcast, a business launched in September 2006 which today interprets in excess of 10 billion new digital media consumption records every day.
We will examine the challenges of applying machine learning to non-search advertising and in doing so explore the creation of business environments – organization, infrastructure, tools, processes (and costs considerations) – in which scientists can quickly develop new petabyte scale algorithmic approaches, migrate them rapidly to real-time production and deliver fully customized experiences for marketers, publishers and consumers alike.

Bio Konrad Feldman, CEO, co-founded and launched Quantcast in 2006 along with Paul Sutter to transform the effectiveness of online advertising through the use of science and scalable computing. Prior to co-founding Quantcast, Feldman co-founded Searchspace (now Fortent) the leading provider of terrorist financing detection and anti-money laundering software for the world's financial services industry. As CEO of Searchspace's North American business, he established the business in the US and directed its rapid growth to become a market leader. Prior to Searchspace, Feldman was a Research Fellow in the Intelligent Systems Laboratory at University College London. Feldman holds a Bachelor of Science in Computer Science from University College, London.

Industrial Data Mining Case Studies - Invited Talks
Discovering Precursors to Aviation Safety Incidents: from Massive Data to Actionable Information

Ashok Srivastava, Intelligent Data Understanding group, NASA Ames Research Center

Abstract Modern aircraft are producing data at an unprecedented rate with hundreds of parameters being recorded on a second by second basis. The data can be used for studying the condition of the hardware systems of the aircraft and also for studying the complex interactions between the pilot and the aircraft. NASA is developing novel data mining algorithms to detect precursors to aviation safety incidents from these data sources. This talk will cover the theoretical aspects of the algorithms and practical aspects of implementing these techniques to study one of the most complex dynamical systems in the world: the national airspace.

Bio Ashok N. Srivastava, Ph.D. is the Principal Investigator for the Integrated Vehicle Health Management research project at NASA. His current research focuses on the development of data mining algorithms for anomaly detection in massive data streams, kernel methods in machine learning, and text mining algorithms.

Dr. Srivastava is also the leader of the Intelligent Data Understanding group at NASA Ames Research Center. The group performs research and development of advanced machine learning and data mining algorithms in support of NASA missions. He performs data mining research in a number of areas in aviation safety and application domains such as earth sciences to study global climate processes and astrophysics to help characterize the large-scale structure of the universe.

Dr. Srivastava is the author of many research articles in data mining, machine learning, and text mining, and has edited a book on Text Mining: Classification, Clustering, and Applications (with Mehran Sahami, 2009). He is currently editing two more books: Advances in Machine Learning and Data Mining for Astronomy (with Kamal Ali, Michael Way, and Jeff Scargle) and Data Mining in Systems Health Management (with Jiawei Han).

He has won numerous awards including the IEEE Computer Society Technical Achievement Award for "pioneering work in Intelligent Information Systems," the NASA Exceptional Achievement Medal for contributions to state-of-the-art data mining and analysis, the NASA Distinguished Performance Award, several NASA Group Achievement Awards, the IBM Golden Circle Award, and the Department of Education Merit Fellowship.

Modeling with networked data

Francoise Fogelman-Soulie, VP Strategic Business Development, KXEN

Abstract Social Network Analysis has been one of the hottest topics among data mining scientists in the last 5 years. Meanwhile, more recently, companies, especially in Telco, have progressively started using these techniques to improve their predictive models. Through a few case studies, I will present the questions that SNA can address, the methodology we have used and the results which the companies obtained. I will then present other applications (in retail and social network sites), currently being deployed, with the scientific issues they raise.

Bio Francoise Soulie Fogelman is responsible for leading KXEN business development, identifying new business opportunities for KXEN and working with Product development, Sales and Marketing to help promote KXEN's offer. She is also in charge of managing KXEN's University Program. Ms Soulie Fogelman has over 30 years of experience in data mining and CRM from both an academic and a business perspective. Prior to KXEN, she directed the first French research team on Neural Networks at Paris 11 University, where she was a CS Professor. She then co-founded Mimetics, a start-up that processes and sells development environments and optical character recognition (OCR) products and services using neural network technology, and became its Chief Scientific Officer. After that she started the Data Mining and CRM group at Atos Origin and, most recently, she created and managed the CRM Agency for Business & Decision, a French IS company specializing in Business Intelligence and CRM. Ms Soulie Fogelman holds a master’s degree in mathematics from Ecole Normale Superieure and a PhD in Computer Science from University of Grenoble. She has advised more than 20 PhD students on data mining, has authored more than 100 scientific papers and books, and has been an invited speaker at many academic and business events.

Interactive Data Mining and its Business Applications

Rayid Ghani, Researcher, Accenture Technology Labs

Abstract Many practical data mining applications deal with settings where the goal is to help human experts find rare cases that are of interest to them. Fraud Detection, Intrusion Detection, Surveillance for security applications, Information Filtering, and Recommender Systems are some examples of these applications. A common aspect of all of these problems is that they involve users (or experts) in an interactive classification setting, i.e. the experts interact with the results of the data mining system and in turn provide feedback that is valuable for the system. The competing goals of the data mining system are to make these experts more efficient and effective in performing their task and to get feedback that allows it to improve itself over time. In this talk, I will describe this interactive data mining setting, give examples of case studies where this setting applies, and show how data mining techniques help manage this tradeoff to build practical interactive systems that are not only useful but also improve over time.

Text Mining to Fast-Track Deserving Disability Applicants

John F. Elder IV, Chief Scientist, Elder Research, Inc.

Abstract If your health and finances are sufficiently poor, the Social Security Administration will send you taxpayer dollars to help out. But, applying and qualifying can be a long and frustrating process - sometimes taking up to two years! In the meantime, your health and finances are undoubtedly worsening. (Likely the reason half of those appealing a rejection eventually get approved; the lack of timely help ensures their deterioration.) Yet, by mining the important text of the applications, the SSA can identify those most likely to be approved upon analyst review, and put them in a much more efficient fast track - helping all applicants. The solution involves text extraction, token collocation, Bayesian inference, and a new way to combine evidence.

Dr. John Elder heads a data mining consulting team with offices in Charlottesville Virginia, Washington DC, Mountain View California, and Manhasset New York. Founded in 1995, Elder Research, Inc. focuses on investment, commercial and security applications of advanced analytics, including text mining, forecasting, stock selection, image recognition, process optimization, cross-selling, biometrics, drug efficacy, credit scoring, market timing, and fraud detection.

John obtained a BS and MEE in Electrical Engineering from Rice University, and a PhD in Systems Engineering from the University of Virginia, where he’s an adjunct professor teaching Optimization or Data Mining. Prior to 15 years at ERI, he spent 5 years in aerospace defense consulting, 4 heading research at an investment management firm, and 2 in Rice University's Computational & Applied Mathematics department.

Dr. Elder has authored innovative data mining tools, is a frequent keynote speaker, and was co-chair of the 2009 Knowledge Discovery and Data Mining conference, in Paris. John’s courses on analysis techniques -- taught at dozens of universities, companies, and government labs -- are noted for their clarity and effectiveness. Dr. Elder was honored to serve for 5 years on a panel appointed by the President to guide technology for National Security. His book with Bob Nisbet and Gary Miner, Handbook of Statistical Analysis & Data Mining Applications, won the PROSE award for Mathematics in 2009. His book with Giovanni Seni, Ensemble Methods in Data Mining: Improving Accuracy through Combining Predictions, was published in February 2010.

Mining Medical Data to Improve Patient Outcomes

R Bharat Rao, Balaji Krishnapuram, Murat Dundar, Siemens Healthcare

Abstract The last century has seen a massive increase in the accuracy and sensitivity of diagnostic tests: from observing external symptoms, to precise laboratory panels, to complex imaging methods for non-invasive internal examinations, to, in the very near future, the use of genomic and molecular analysis at the bedside. This improved diagnostic accuracy has resulted in an exponential increase in the patient data available to the physician. Furthermore, medical knowledge is continuously growing, with physicians being flooded with an expanding array of new tests, updated clinical guidelines on how to diagnose and treat patients, and evidence-based results from clinical trials. Both these trends – the increase in patient data and medical knowledge – will only intensify, as healthcare transforms into the practice of increasingly personalized medicine.

There is a tremendous opportunity for data mining methods to assist the physician, improve patient care, control costs, and ultimately to save lives. In this talk we will provide an overview of the special challenges faced in launching new healthcare data mining products, and identify a few key takeaways for entrepreneurs who want to create new businesses in this domain. We begin by analyzing the clinical need for products to mine medical images to enable radiologists to identify cancers and other medical conditions in asymptomatic patients, and thus begin treatment as early as possible. The next step is personalized therapy selection, which requires data mining methods to mine different patient data sources, including images, free text, labs, pharmacy, molecular & genomic data. We discuss how to determine the scope and market size for products such as these, and identify the key methodological issues we have tackled. We focus on the clinical, regulatory and marketing challenges that we have had to solve over the last decade, as we have gone from concepts, to deployed products that are used today in thousands of patient encounters worldwide. We conclude by highlighting results that demonstrate the impact of data mining on patient care and improved outcomes.

Bio Dr. R. Bharat Rao is the Director of Knowledge Solutions in the Health Services Division in Siemens Healthcare. Headquartered in Malvern, PA, USA, Knowledge Solutions focuses on developing products and services that (a) help improve patient outcomes by integrating medical knowledge with various parts of a patient record (free text, images, labs, pharmacy, genomics, etc.), and (b) support the increasing drive to personalize medicine.

Dr. Rao received a B.Tech in Electronics Engineering from the Indian Institute of Technology, Madras in 1985, and an M.S. and Ph.D. focusing on machine learning from the Dept. of Electrical Engineering, University of Illinois, Urbana-Champaign, in 1993. He joined Siemens Corporate Research in 1993, and formed the Data Mining group there in 1996. In 2002, he moved to Siemens Healthcare to help found the Computer-Aided Diagnosis & Knowledge Solutions group.

Dr. Rao's research interests include probabilistic inference, machine learning, natural language processing, classification, and graphical models, with a focus on developing decision-support systems that can help physicians improve the quality of patient care. He is particularly interested in the development of novel data mining methods to collectively mine the structured and unstructured parts of a patient record and the automatic integration of medical domain knowledge into the mining process. He has published over 100 papers in peer-reviewed scientific journals and conferences in machine learning and medicine and has filed over 50 patents. In 2005, Siemens honored him with its "Inventor of the Year" award for “outstanding contributions related to improving the technical expertise and the economic success of the company” for developing the REMIND™ (Reliable Extraction and Meaningful Inference from Nonstructured Data) Platform. The REMIND Platform supports both the integration of knowledge into medical decision-support, as well as the discovery of novel medical knowledge to support personalized medicine. He has twice received the IEEE Data Mining Practice Prize for the best deployed industrial and government data mining application in 2005 (for the REMIND Platform) and 2009 (for Computer-Aided Diagnosis applications).

(Privacy-friendly!) Social Network Targeting for On-line Advertising

Foster Provost, Professor, Leonard N. Stern School of Business, New York University

Abstract I will discuss privacy-friendly methods for finding good audiences for on-line display advertising, by extracting quasi-social networks from browser behavior on user-generated content sites. Targeting social-network neighbors resonates well with advertisers, and on-line browsing behavior data counterintuitively can allow the identification of good audiences anonymously. I will discuss methods for extracting quasi-social networks from data on visitations to social media pages. The data are completely anonymous with respect to both browser identity and content. I will introduce measures for computing which browsers are "close" to other browsers that in the past have exhibited brand affinity. Results show that audiences with high brand proximity indeed show substantially higher brand affinity themselves, as well as higher propensity to convert. Time permitting, I also will present additional findings relating to whether the quasi-social network actually embeds a true social network, how to gather appropriate training data, and whether on-line advertising actually is effective. This work was done in collaboration with Michael Barnathan, Brian Dalessandro, Rod Hook, Alan Murray, Claudia Perlich, and Xiaohan Zhang.

Bio Foster Provost is Professor, NEC Faculty Fellow, and Paduano Fellow of Business Ethics (Emeritus) at the NYU Stern School of Business. He is Chief Scientist for Coriolis Ventures, a NYC-based early stage venture and incubation firm. In 2001 he was Program Chair of the KDD Conference, and he just retired as Editor-in-Chief of the journal Machine Learning. His main research interests these days include predictive modeling with (social) network data, and alternative methods for data acquisition for data mining. Foster has applied data mining in practice to applications including on-line advertising, fraud detection, network diagnosis, targeted marketing, counterterrorism, and others. His work has won best paper awards at KDD, IBM Faculty Awards, and a President's Award at NYNEX Science and Technology. Last year his work on social network-based marketing systems won the 2009 INFORMS Design Science Award.

What's in your (customer's) wallet?

Claudia Perlich, Chief Scientist, Media6Degrees

Abstract In 2009 IBM was recognized as a finalist of the INFORMS Edelman competition for its predictive modeling initiative to improve the productivity of its global salesforce, with an estimated business impact of approximately 100 million dollars. The first component implements traditional propensity modeling to identify new sales opportunities and is currently used by over 13,000 sales reps. The second, 'wallet estimation' component is used strategically to allocate sales resources based on validated analytical estimates of revenue opportunity. In this case study we cover the key elements leading to the success, including data integration, data mining and predictive modeling, solution delivery, human-guided model validation, and integration of the business process, and we conclude with an assessment of the bottom-line business impact.

Bio Prior to joining Media6Degrees, Claudia spent five years working at the Data Analytics Research group at the IBM T.J. Watson Research Center, concentrating on research in data analytics and machine learning for complex real-world domains and applications. She has authored over 30 scientific publications and holds multiple patents in the area of machine learning. Claudia has won many data mining competitions, including the prestigious 2007 KDD CUP on movie ratings, the 2008 KDD CUP on breast-cancer detection, and the 2009 KDD CUP on churn and propensity predictions for telecommunication customers. Claudia received her Ph.D. in Information Systems from Stern School of Business, New York University in 2005 and holds a Master of Computer Science from Colorado University.

Gold Sponsor: Microsoft Advertising

Silver Sponsors: Yahoo, SAS, Google, Accenture
