- Searching for patterns of interest of a particular representational form

Data Mining Tasks

- Classification
- learning a function that maps a data item into one of several predefined classes
- Regression
- learning a function that maps a data item to a real-valued prediction variable
- Clustering
- identifies a finite set of catagories or clusters to describe the data
- -- Probability density estimation
- estimates from data the joint multivariate probability density function of all variables in the dataset
- Summarization
- finds a compact description for a subset of data
- derivation of association rules (summary rules)
- multivariate visualization techniques
- discovery of functional relations between variables
- Dependency Modeling
- finds a model that describes significant dependency between variables
- Change and Deviation Detection
- discovers most significant changes in the data from previously measured or normative values structural: which variables are locally dependent on each other quantitative: strengths of the dependencies

Components of Data Mining Algorithms

- Model Representation
- the language for describing discoverable patterns
- too limited a representation can never produce an accurate model
- what are the representational assumptions of a particular algorithm
- more powerful representations increase the danger of overfitting and resulting in poor predictive accuracy
- more complex representations increase the difficulty of search
- more complex representations increase the difficulth of model interpretation

- Model Estimation
- estimates how well a particular pattern meets the goals of the KDD process
- evaluation of predictive accuracy uses cross-validation
- evaluation of descriptive quality involves predictive accuracy, novelty, utility, and understandability of the fitted model.
- can use both logical and statistical criteria (ie maximum likelihood principle to fit model parms)

- Search Method
- Parameter Search
- find paramters that optimize the model evaluation criteria given the obvserved data and a fixed model representation
- - can be "searchless", closed form for simple problems
- - greedy iterative search common (ie. backprop gradient descent)
- Model Search
- considers different instances of a model from its family of representations
- - iterated loop over parameter searchj
- - use heuristic search techniques

Data Mining Algorithm Families

- Decision Trees and Rules
- univariate splits ==> easy to understand
- simple threshold splits severly restrict functional form and approximation power
- expanding model space to more general expressions improves predictor power, but makes rules harder to comprehend
- use greedy search ot grow and prune tre structure; explores a super-exponential model space
- primarily for predictive (classification, regression) modeling, but cam be used for summary tasks

- Non-Linear Regression and Classification Methods
- fits linear and nonlinear combinations of basis functions (sigmoids, splines, polynomials) to combinations of the input variables
- e.g. FFNN, adaptive spline methods, projection persuit methods
- can be difficult ot interpred results (compare to threshold boundry)

- Example Based Methods
- use representative examples from database to approximate a model
- nearest neighbor classification and regression
- case-based reasoning systems
- requires a well defined distance metric for evaluating distance between examples
- difficult to find metric, especially when variables are catagorical or not of same units
- evaluated by cross-validation estimates
- powerful, but may be difficult to interpret: model implicit in data and not explicitly formulated
- e.g. kernel density estimation, mixture modeling

- Probabilistic Graphical Dependency Models
- specifies probabalistic dependencies underlying a particular model using a graph structure (Pearl 88, Wittacker 90)
- specifies which variables are directly dependent on each other (in simplest model)
- typically categorical, discrete variables. Special case extension to real-valued variables is possible
- Probabalistic expert system
- - experts supply structure and parameters (i.e. conditional probabilities)
- - recent work in AI/Statistics has been on methods to learn both the structure and parameters of the model from data
- Model Evaluation: Bayesian
- Parameter evaluation can be closed form or iterative
- Model Search: greedy hill-climbing
- Can use prior knowledge (i.e. casual relations) to reduce search space
- Graphic form of model lends itslef to human interpretation

- Relational Learning Models -- Inductive Logic Programming
- Uses first-order logic ==> more flexible than decision trees and rules which only use propositional logic
- Easily finds structures such as x=y (unlike decision trees)
- Extra expressive power comes at the price of significant computational demands in terms of search (Deseroski 96)

Issues

- Nature of Data
- dynamic data
- irrelevant data
- missing values
- noise and uncertainty
- missing fields
- Use of domain knowledge
- Privacy and Security