Tutorial Sessions/Invited Talks
All tutorials and invited talks are free to registered attendees of all conferences held at WORLDCOMP'13. Those interested in attending one or more of the tutorials should sign up on site at the conference registration desk in Las Vegas. A complete and current list of WORLDCOMP tutorials can be found here.
In addition to the tutorials at other conferences, DMIN'13 aims to provide a set of tutorials dedicated to data mining topics. The 2007 key tutorial was given by Prof. Eamonn Keogh on Time Series Clustering. The 2008 key tutorial was presented by Mikhail Golovnya (Senior Scientist, Salford Systems, USA) on Advanced Data Mining Methodologies. DMIN'09 provided four tutorials: Prof. Nitesh V. Chawla on Data Mining with Sensitivity to Rare Events and Class Imbalance, Prof. Asim Roy on Autonomous Machine Learning, Dan Steinberg (CEO of Salford Systems) on Advanced Data Mining Methodologies, and Peter Geczy on Emerging Human-Web Interaction Research. DMIN'10 hosted a tutorial by Prof. Vladimir Cherkassky on Advanced Methodologies for Learning with Sparse Data; he was also a keynote speaker (Predictive Data Modeling and the Nature of Scientific Discovery). In 2011, Gary M. Weiss (Fordham University, USA) presented a tutorial on Smart Phone-Based Sensor Data Mining, and Michael Mahoney (Stanford University, USA) gave a tutorial on Geometric Tools for Identifying Structure in Large Social and Information Networks. DMIN'12 hosted a talk by Sofus A. Macskassy (Univ. of Southern California, USA) on Mining Social Media: The Importance of Combining Network and Content, as well as a talk by Haym Hirsh (Rutgers University, USA) on Getting the Most Bang for Your Buck: The Efficient Use of Crowdsourced Labor for Data Annotation; Professor Hirsh was also a WORLDCOMP keynote speaker.
In addition, we hosted tutorials and invited talks by Peter Geczy on Web Mining; Data Mining and Privacy: Water and Fire?; and Data Mining in Organizations.
DMIN'13 will host the following tutorials and invited talks:
Tutorials
Tutorial A
Speaker: Vladimir Cherkassky, Dept. of Electrical & Computer Engineering, University of Minnesota, Minneapolis, USA
Topic/Title: Extensions and Applications of Universum Learning
Date & Time: Wednesday, July 24, 3:20 – 5:20 pm
Location: Cohiba 3
Description:
ABSTRACT:
Most learning methods developed in statistics, machine learning, and pattern recognition assume a standard inductive learning formulation, in which the goal is to estimate a predictive model from finite training data. While this inductive setting is very general, there are several emerging non-standard learning settings that are particularly attractive for data-analytic modeling with sparse high-dimensional data. Such recent non-standard learning approaches include transduction, learning using privileged information, Universum learning, and multi-task learning. This tutorial describes the methodology called Universum learning, or learning through contradiction (Vapnik 1998, 2006). It provides a formal mechanism for incorporating a priori knowledge about the application data into binary classification problems. This knowledge is provided in the form of unlabeled Universum data samples, supplied in addition to the labeled training samples of the standard inductive setting. The Universum samples belong to the same application domain as the training data; however, they do not belong to either class, so they are treated as contradictions under a modified SVM-like Universum formulation. Several recent analytical and empirical studies provide ample evidence that Universum learning can improve generalization performance, especially in very ill-posed, sparse settings.
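For orientation, here is a sketch of a commonly used form of the Universum SVM optimization, written in our own notation rather than the tutorial's: the usual hinge loss is applied to the n labeled samples (x_i, y_i), while an ε-insensitive loss pushes the m unlabeled Universum samples x*_j toward the decision boundary, where they act as contradictions:

$$
\min_{\mathbf{w},\,b,\,\xi,\,\zeta}\ \frac{1}{2}\lVert\mathbf{w}\rVert^{2} \;+\; C\sum_{i=1}^{n}\xi_{i} \;+\; C^{*}\sum_{j=1}^{m}\zeta_{j}
$$
subject to
$$
y_{i}\,(\mathbf{w}\cdot\mathbf{x}_{i}+b)\;\ge\;1-\xi_{i},\qquad \xi_{i}\ge 0,\qquad i=1,\dots,n,
$$
$$
\lvert\mathbf{w}\cdot\mathbf{x}^{*}_{j}+b\rvert\;\le\;\varepsilon+\zeta_{j},\qquad \zeta_{j}\ge 0,\qquad j=1,\dots,m.
$$

Setting C* = 0 recovers the standard soft-margin SVM, so the Universum term acts as an additional data-dependent regularizer encoding the a priori knowledge.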
This tutorial will present an overview of Universum learning for binary classification, along with practical conditions for evaluating the effectiveness of Universum learning relative to standard SVM classifiers (Cherkassky et al., 2011; Cherkassky, 2013). Then I will present an extension of Universum SVM to cost-sensitive classification settings (Dhar and Cherkassky, 2012).
The Universum learning methodology is known only for the classification setting. It is not clear how to extend or modify the idea of learning through contradiction to other types of learning problems, because the notion of 'contradiction' was originally introduced for binary classification (Vapnik 1998). In the second part of this tutorial I will present some recent work on extending the idea of Universum learning to other types of learning problems, such as regression and single-class learning. For these problems, one can also expect to achieve improved generalization performance by incorporating a priori knowledge in the form of additional data samples from the same application domain. I will present new Universum problem settings for regression and single-class learning, along with SVM-like mathematical optimization formulations, and discuss several application examples.
INTENDED AUDIENCE:
Researchers and practitioners interested in understanding advanced learning methodologies and their applications. Participants are expected to have background knowledge of standard Support Vector Machine (SVM) classifiers.
References
- Cherkassky, V. and F. Mulier, Learning from Data, second edition, Wiley, 2007.
- Cherkassky, V., Predictive Learning, VCtextbook.com, 2013.
- Cherkassky, V., Dhar, S., and W. Dai, "Practical Conditions for Effectiveness of the Universum Learning," IEEE Transactions on Neural Networks, vol. 22, no. 8, pp. 1241–1255, 2011.
- Dhar, S. and V. Cherkassky, "Cost-Sensitive Universum-SVM," Proc. ICMLA, 2012.
- Vapnik, V., Statistical Learning Theory, Wiley, 1998.
- Vapnik, V., Empirical Inference Science: Afterword of 2006, Springer, 2006.
Short Bio:
Vladimir Cherkassky is Professor of Electrical and Computer Engineering at the University of Minnesota. He received his Ph.D. in Electrical and Computer Engineering from the University of Texas at Austin in 1985. His current research is on predictive learning from data, and he has co-authored the monograph Learning from Data, published by Wiley in 1998 (first edition) and 2007 (second edition). He served on the Board of Governors of INNS in 1996-1997. He serves or has served on the editorial boards of many journals, including Neural Networks, IEEE Transactions on Neural Networks, Neural Processing Letters, and Natural Computing. He has served on the program committees of major international conferences on artificial neural networks. He was Director of the NATO Advanced Study Institute (ASI) From Statistics to Neural Networks: Theory and Pattern Recognition Applications, held in France in 1993. He has presented numerous tutorials and invited talks on neural networks and predictive learning from data.
Prof. Cherkassky has been active in promoting applications of predictive learning and artificial neural networks since the late 1980s. More recently, he organized and co-chaired several special sessions on Climate Modeling and Earth Sciences Applications at IJCNN 2005-2008.
He was elected Fellow of IEEE in 2007 for 'contributions and leadership in statistical learning and neural networks', and in 2008 he received the A. Richard Newton Breakthrough Research Award from Microsoft Research for the development and application of new learning methodologies.
Tutorial B
Speaker: Alfred Inselberg, School of Mathematical Sciences, Tel Aviv University, Tel Aviv, Israel
Topic/Title: Visualization & Data Mining for High Dimensional Datasets
Date & Time: Tuesday, July 23, 5:00 – 7:00 pm
Location: Cohiba 3
Description:
A dataset with M items has 2^M subsets, any one of which may be the one fulfilling our objectives. With a good data display and interactivity, our fantastic pattern-recognition ability can not only cut great swaths through this combinatorial explosion but also extract insights from the visual patterns. These are the core reasons for data visualization. With parallel coordinates (abbr. ||-cs), the search for relations in multivariate datasets is transformed into a 2-D pattern recognition problem. The foundations are developed, interlaced with applications. Guidelines and strategies for knowledge discovery are illustrated on several real datasets (financial, process control, credit-score, intrusion-detection, etc.), one with hundreds of variables. A geometric classification algorithm is presented and applied to complex datasets. It has low computational complexity and provides the classification rule explicitly and visually. The minimal set of variables (features) required to state the rule is found and ordered by their predictive value. Multivariate relations can be modeled as hypersurfaces and used for decision support. A model of a (real) country's economy reveals sensitivities, the impact of constraints, trade-offs, and economic sectors unknowingly competing for the same resources. An overview of the methodology provides foundational understanding, teaching the patterns corresponding to various multivariate relations. These patterns are robust in the presence of errors, and that is good news for the applications. We stand at the threshold of breaching the gridlock of multidimensional visualization.
The parallel
coordinates methodology has been applied to
collision avoidance and conflict resolution
algorithms for air traffic control (3 USA
patents), computer vision (1 USA patent), data
mining (1 USA patent), optimization, decision
support and elsewhere.
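To make the basic ||-cs construction concrete, here is a minimal sketch using pandas' built-in parallel_coordinates plot on a small hypothetical dataset (our illustration, not Prof. Inselberg's own software): each row of the table becomes a polyline across the parallel axes, and class structure appears as visual bands.

```python
# A minimal sketch of the parallel-coordinates idea using pandas'
# built-in plot (illustrative only; not Prof. Inselberg's software).
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates

# Hypothetical 3-variable dataset with two classes: each row is drawn
# as a polyline across the three parallel axes var1, var2, var3.
df = pd.DataFrame({
    "var1": [1.0, 0.9, 5.1, 5.3],
    "var2": [2.1, 2.0, 0.4, 0.5],
    "var3": [0.3, 0.4, 3.9, 4.2],
    "label": ["A", "A", "B", "B"],
})

parallel_coordinates(df, class_column="label", color=("tab:blue", "tab:red"))
plt.title("N-dimensional points as polylines over N parallel axes")
plt.show()
```

Real applications extend the same display to hundreds of axes, where the interactive querying, axis reordering, and pattern recognition discussed in the tutorial become essential.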
Audience:
The accurate visualization of multidimensional problems and multivariate data unlocks insights into the role of dimensionality. The tutorial is designed to provide such insights for people working on complex problems.
Short Bio:
Alfred Inselberg received a Ph.D. in Mathematics and Physics from the University of Illinois (Champaign-Urbana) and was Research Professor there until 1966. He held research positions at IBM, where he developed a Mathematical Model of the Ear (TIME, Nov. '74), while concurrently holding joint appointments at UCLA and USC, and later at the Technion and Ben Gurion University. Since 1995 he has been Professor at the School of Mathematical Sciences at Tel Aviv University. He was elected Senior Fellow at the San Diego Supercomputing Center in 1996, Distinguished Visiting Professor at Korea University in 2008, and Distinguished Visiting Professor at the National University of Singapore in 2011. Alfred invented and developed the multidimensional system of Parallel Coordinates, for which he received numerous awards and patents (on Air Traffic Control, Collision-Avoidance, Computer Vision, and Data Mining). His textbook, Parallel Coordinates: VISUAL Multidimensional Geometry and its Applications (Springer, 2009), has a full chapter on Data Mining and was acclaimed by, among others, Stephen Hawking.
Invited Talks
Invited Talk A
Speaker: Peter Geczy, National Institute of Advanced Industrial Science and Technology (AIST), Japan
Topic/Title: Big Data = Big Challenges?
Date & Time: Wednesday, July 24, 11:00 am – 12:20 pm
Location: Cohiba 3
Description:
The digital revolution of the past few decades has led to an ever-expanding quantity and diversity of data. Complex systems and devices rapidly generate large amounts of operational data, human interactions with digital environments are recorded in great detail, and sensors in gadgets and mobile devices collect a broad spectrum of data with increasing frequency. We leave a constantly growing trail of digital tracks as we go about our lives. Vast volumes of diverse data present various novel challenges and opportunities. Big data enables us to tackle longstanding complex problems that we were formerly unable to approach. However, it also brings forward new challenges, ranging from technological issues, through processing and data mining, to social and policy implications. We shall explore pertinent interdisciplinary aspects of these emerging initiatives.
Short Bio:
Dr. Peter Geczy is with the National Institute of Advanced Industrial Science and Technology (AIST). He has also held positions at the Institute of Physical and Chemical Research (RIKEN) and the Research Center for Future Technologies. His interdisciplinary scientific interests encompass the domains of data and web mining, human interactions and behavior, social intelligence technologies, privacy, information systems, knowledge management and engineering, artificial intelligence, and adaptable systems. His recent research focus also extends to the spheres of service science, engineering, management, and computing. He has received several awards in recognition of his accomplishments. Dr. Geczy has served on various professional boards and committees, and has been a distinguished speaker in academia and industry.
Invited Talk B
Speaker: Vladimir Cherkassky, Dept. of Electrical & Computer Engineering, University of Minnesota, Minneapolis, USA
Topic/Title: The Problem of Induction: When Karl Popper Meets Big Data
Date & Time: Monday, July 22, 3:20 – 4:40 pm
Location: Cohiba 3
Description:
ABSTRACT:
The main intellectual appeal of 'Big Data' is its promise to generate knowledge from data. This talk will provide a critical evaluation of the popular view 'more data → more knowledge', using both philosophical and technical arguments. In the philosophy of science, data-driven knowledge discovery is known as the problem of induction (or inductive inference). It has been known and studied by scientists and philosophers for ages. In particular, the problems of induction and (classical) knowledge discovery have been thoroughly investigated in the Western philosophy of science. Later, in the 20th century, two different technical methodologies for making mathematically rigorous inferences from data were developed, by Ronald Fisher (classical statistics) and by Vladimir Vapnik (VC-theory). The recent growth of digital data has produced many data-analytic techniques developed by mathematicians/statisticians, engineers, biologists, computer scientists, economists, etc. Yet the current understanding of the important methodological aspects of these data-analytic algorithms (among practitioners and researchers) is very rudimentary or non-existent.
My talk will
expand on:
- The philosophical aspects of data-driven knowledge discovery, e.g., the difference between classical scientific knowledge and modern data-analytic knowledge.
- The difference between classical statistics and predictive (VC-theoretical) methodology. In particular, VC-theoretical methodology is more appropriate for estimating predictive models, and it has a clear philosophical interpretation (which is very different from that of classical statistics). Unfortunately, confusion often arises when machine learning algorithms (that implement the VC-theoretical framework) are presented or interpreted via the classical statistical framework.
- The practical importance of VC-theoretical methodology for data mining applications. These practical aspects include: (a) formalization of application domain requirements, (b) parameter tuning (aka model complexity control), and (c) interpretation of predictive (black-box) models. A minimal sketch of point (b) follows the abstract below.
All the philosophical and methodological points presented in this talk will be illustrated using application examples ranging from image recognition to financial engineering and the life sciences.
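As referenced in the list above, here is a minimal, generic sketch of point (b), model complexity control, via cross-validated tuning of SVM hyperparameters in scikit-learn. This illustrates the common practice, not the speaker's specific methodology; the dataset and parameter grid are placeholders.

```python
# A generic sketch of model complexity control (parameter tuning) for
# an SVM classifier via cross-validation; illustrative values only.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic binary classification data standing in for a real application.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# Model complexity is governed by C (margin softness) and gamma (RBF kernel
# width); 5-fold cross-validation selects the pair with the best estimated
# predictive performance rather than the best fit to the training data.
search = GridSearchCV(
    SVC(kernel="rbf"),
    param_grid={"C": [0.1, 1.0, 10.0, 100.0], "gamma": [1e-3, 1e-2, 1e-1]},
    cv=5,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

The selected complexity, not the raw fit, is what controls generalization, which is the methodological point the talk attributes to the VC-theoretical framework.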
Short Bio:
See the short bio for Vladimir Cherkassky under Tutorial A above.