March 24, 1999 Meeting Notes

INTEL MEETING
March 24, 1999
1131 ENG II
12:00-1:00 p.m.

Misuse Detection In Relational DBMS

Christina Chung

Goal

Input

treat$symptom, patient$allergy, treat$diagnosis, drug$drugID, drug$drugName
1, 0, 0, 0, 0
1, 0, 1, 0, 0

First audit record: the user accesses attribute symptom of entity treat in the first SELECT statement.
Second audit record: the user accesses attributes symptom and diagnosis of entity treat in the second SELECT statement.
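Here is a minimal sketch of how the input table could be produced. It assumes each audit record is built from the set of Entity$Attribute names referenced by one SELECT statement, and it does not show how that set is extracted from the SQL text. `ATTRIBUTES` and `encode_record` are hypothetical names chosen for illustration.

```python
# Hypothetical column order, matching the input table above.
ATTRIBUTES = [
    "treat$symptom",
    "patient$allergy",
    "treat$diagnosis",
    "drug$drugID",
    "drug$drugName",
]


def encode_record(accessed: set) -> list:
    """Return one audit record: 1 where the statement accessed the attribute, else 0."""
    unknown = accessed - set(ATTRIBUTES)
    if unknown:
        raise ValueError(f"unknown attributes: {sorted(unknown)}")
    return [1 if attr in accessed else 0 for attr in ATTRIBUTES]


# The two SELECT statements described above.
records = [
    encode_record({"treat$symptom"}),
    encode_record({"treat$symptom", "treat$diagnosis"}),
]
print(records)  # [[1, 0, 0, 0, 0], [1, 0, 1, 0, 0]]
```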

Derive profiles by association rules

treat$symptom & patient$allergy => drug$drugID, sup=20, conf=0.8
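Each audit record is a 0/1 vector, so the support and confidence of a rule reduce to COUNT(*) queries. The sketch below uses Python's built-in sqlite3 module with a hypothetical `audit` table that has one column per attribute, with `$` replaced by `_`, and a handful of toy rows that are not real data. It treats sup as an absolute record count, which is one reading of sup=20; a percentage is also possible.

```python
import sqlite3

# Hypothetical audit table layout: one 0/1 column per Entity$Attribute,
# with '$' replaced by '_' so the names are plain SQL identifiers.
COLUMNS = ["treat_symptom", "patient_allergy", "treat_diagnosis",
           "drug_drugID", "drug_drugName"]


def count_records(conn, cols):
    """COUNT(*) of audit records in which every column in cols equals 1."""
    for c in cols:
        if c not in COLUMNS:  # column names are spliced into the SQL text
            raise ValueError(f"unknown column: {c}")
    sql = "SELECT COUNT(*) FROM audit"
    if cols:
        sql += " WHERE " + " AND ".join(f"{c} = 1" for c in cols)
    return conn.execute(sql).fetchone()[0]


def rule_stats(conn, antecedent, consequent):
    """Return (support count, confidence) of antecedent => consequent."""
    sup = count_records(conn, antecedent + consequent)
    n_antecedent = count_records(conn, antecedent)
    conf = sup / n_antecedent if n_antecedent else 0.0
    return sup, conf


# Toy audit records for illustration only (not real data).
toy_rows = [
    (1, 1, 0, 1, 0),
    (1, 1, 1, 1, 0),
    (1, 1, 0, 0, 1),
    (1, 0, 1, 0, 0),
    (0, 1, 0, 1, 1),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE audit (" + ", ".join(f"{c} INTEGER" for c in COLUMNS) + ")")
conn.executemany("INSERT INTO audit VALUES (?, ?, ?, ?, ?)", toy_rows)

sup, conf = rule_stats(conn, ["treat_symptom", "patient_allergy"], ["drug_drugID"])
print(sup, round(conf, 2))  # 2 0.67
```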

Derive profiles by clustering

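ERDistance(E1$A1, E2$A2) = length of the shortest path between entities E1 and E2 in the E-R diagram

Freq(E1$A1) = number of audit records with E1$A1 = 1 during a session

FreqSim(E1$A1, E2$A2) = Abs(Freq(E1$A1) - Freq(E2$A2)) / Max(Freq(E1$A1), Freq(E2$A2))

Dist(E1$A1, E2$A2) = ERDistance(E1$A1, E2$A2) * FreqSim(E1$A1, E2$A2)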

Thus, attributes that are accessed with similar frequency in a session and are close together in the E-R diagram end up in the same cluster.
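The following sketch implements the four definitions above under two assumptions. The first is a hypothetical E-R diagram in which treat links patient and drug. The second is that attributes are named Entity$Attribute, so the owning entity can be read off the name. ERDistance is computed by breadth-first search, so it counts edges, and two attributes of the same entity get distance 0. FreqSim as defined is 0 for equal frequencies, so it behaves as a dissimilarity. The helper names are illustrative only.

```python
from collections import deque

# Hypothetical E-R diagram: treat links patient and drug (undirected edges).
ER_GRAPH = {
    "patient": {"treat"},
    "treat": {"patient", "drug"},
    "drug": {"treat"},
}
ATTRIBUTES = ["treat$symptom", "patient$allergy", "treat$diagnosis",
              "drug$drugID", "drug$drugName"]


def er_distance(a1, a2, graph=ER_GRAPH):
    """ERDistance: BFS hop count between the entities that own a1 and a2."""
    start, goal = a1.split("$")[0], a2.split("$")[0]
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if node == goal:
            return hops
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return float("inf")  # the two entities are not connected


def freq(session, a):
    """Freq: number of audit records in the session with attribute a = 1."""
    i = ATTRIBUTES.index(a)
    return sum(record[i] for record in session)


def freq_sim(f1, f2):
    """FreqSim: Abs(f1 - f2) / Max(f1, f2); taken as 0 if both are 0."""
    m = max(f1, f2)
    return abs(f1 - f2) / m if m else 0.0


def dist(session, a1, a2):
    """ERDistance * FreqSim; small values favour the same cluster."""
    hops = er_distance(a1, a2)
    if hops == float("inf"):
        return hops  # avoid inf * 0.0 = nan for unconnected entities
    return hops * freq_sim(freq(session, a1), freq(session, a2))


# Toy session of three audit records (illustration only).
session = [[1, 0, 0, 0, 0], [1, 0, 1, 0, 0], [1, 1, 0, 1, 0]]
print(dist(session, "treat$symptom", "drug$drugID"))    # 1 hop * 2/3
print(dist(session, "patient$allergy", "drug$drugID"))  # 2 hops * 0 = 0.0
```

The two factors are multiplied, so a zero in either one makes the pair distance zero. In the toy session, patient$allergy and drug$drugID are two hops apart but are accessed equally often, so their distance is 0.0.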

Problems

Association rules do not consider the distance between attributes, but they are very fast and scale well because they can be implemented in SQL.
Clustering algorithms do not scale well and are computationally expensive, but their distance measure can be adjusted flexibly.
How can association rules and frequent itemsets be extended with a distance measure?
X1 ... Xn => Y1 ... Ym, sup, conf, dist
{I1 ... In}, sup, dist
How can frequent itemsets be combined with clustering? Can clustering be implemented in SQL?
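One hypothetical way to attach a dist to a frequent itemset {I1 ... In} is to take the largest pairwise E-R distance among its attributes. As an itemset grows, support can only fall and this dist can only rise. Both thresholds can therefore prune candidates in a level-wise, Apriori-style search. The entity hop counts, records, and thresholds below are assumptions chosen for illustration.

```python
from itertools import combinations

ATTRIBUTES = ["treat$symptom", "patient$allergy", "treat$diagnosis",
              "drug$drugID", "drug$drugName"]
# Hypothetical hop counts between entities in the E-R diagram.
ENTITY_HOPS = {
    frozenset({"patient", "treat"}): 1,
    frozenset({"treat", "drug"}): 1,
    frozenset({"patient", "drug"}): 2,
}


def itemset_dist(items):
    """Assumed dist of an itemset: largest pairwise E-R distance (0 for one entity)."""
    entities = {a.split("$")[0] for a in items}
    return max((ENTITY_HOPS[frozenset(p)] for p in combinations(entities, 2)),
               default=0)


def frequent_itemsets(records, min_sup, max_dist):
    """Level-wise search returning {itemset: (sup, dist)} meeting both thresholds."""
    def support(items):
        idx = [ATTRIBUTES.index(a) for a in items]
        return sum(all(r[i] for i in idx) for r in records)

    result = {}
    level = [frozenset([a]) for a in ATTRIBUTES]
    while level:
        kept = []
        for s in level:
            sup, d = support(s), itemset_dist(s)
            if sup >= min_sup and d <= max_dist:
                result[s] = (sup, d)
                kept.append(s)
        # Join kept k-itemsets into (k+1)-itemsets. Both tests are anti-monotone,
        # so any itemset that passes has all of its subsets in `kept`.
        level = list({a | b for a, b in combinations(kept, 2)
                      if len(a | b) == len(a) + 1})
    return result


# Toy records (illustration only).
records = [
    [1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [1, 1, 0, 1, 0],
]
itemsets = frequent_itemsets(records, min_sup=2, max_dist=1)
for s, (sup, d) in sorted(itemsets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(s), "sup =", sup, "dist =", d)
```

With max_dist=1, the pair {patient$allergy, drug$drugID} is dropped even though its support is 2, because patient and drug are two hops apart.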

Detect misuse by matching audit records against profiles / policy

Association rules
derive association rules from the audit records (R')
match R' against the rules in the profiles/policy for the corresponding user and role (R)
define a mismatch score from:

# of mismatched rules between R and R'
# of new rules in R' (not in R)
# of missing rules (in R but not in R')
the support, confidence, and distance of each rule can be used to weight its mismatch
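Here is a minimal sketch of the mismatch score. Rules are keyed by (antecedent, consequent) and valued by (sup, conf, dist). The notes leave the details open, so the following are assumptions. A rule counts as mismatched when it appears in both R and R' but its confidence differs by more than a tolerance. Each new or missing rule is weighted by its confidence, and each mismatched rule by the size of its confidence gap. Only the first profile rule comes from the notes; its dist value and all other rules are made up.

```python
def rule(lhs, rhs):
    """Key a rule by its (antecedent, consequent) attribute sets."""
    return (frozenset(lhs), frozenset(rhs))


def mismatch_score(profile, observed, conf_tol=0.2):
    """Compare profile rules R with rules R' mined from audit records.

    Both arguments map rule -> (sup, conf, dist). The weighting is an
    assumption: new and missing rules count with their confidence, and
    mismatched rules count with the size of the confidence gap.
    (sup and dist are carried along but not used in this sketch.)
    """
    new = observed.keys() - profile.keys()
    missing = profile.keys() - observed.keys()
    mismatched = {r for r in profile.keys() & observed.keys()
                  if abs(profile[r][1] - observed[r][1]) > conf_tol}
    score = (sum(observed[r][1] for r in new)
             + sum(profile[r][1] for r in missing)
             + sum(abs(profile[r][1] - observed[r][1]) for r in mismatched))
    return score, {"new": new, "missing": missing, "mismatched": mismatched}


# R: profile rules; only the first one (sup=20, conf=0.8) is from the notes,
# and its dist value is made up.
R = {
    rule({"treat$symptom", "patient$allergy"}, {"drug$drugID"}): (20, 0.8, 2),
    rule({"treat$symptom"}, {"treat$diagnosis"}): (30, 0.9, 0),
}
# R': toy rules mined from one session's audit records.
R_prime = {
    rule({"treat$symptom", "patient$allergy"}, {"drug$drugID"}): (5, 0.4, 2),
    rule({"drug$drugName"}, {"patient$allergy"}): (4, 0.5, 2),
}
score, detail = mismatch_score(R, R_prime)
print(round(score, 2), {k: len(v) for k, v in detail.items()})
# 1.8 {'new': 1, 'missing': 1, 'mismatched': 1}
```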

KL: One of the problems Steven Templeton ran into was defining sessions. A session used to be one login and logout; now people never log out.
KL: Consider looking at other clustering algorithms (AutoClass)
KL: Todd Heberlein looked at a distance measure based on directories: being a certain number of hops from the working directory constituted inappropriate use.
KL: Profiles are first generated independently, then combined. One rule may subsume another.
Object hierarchy: Attributes -> Entity -> Views -> Schemes
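When per-user and per-role profiles are combined, a more general rule can make a more specific one redundant. The sketch assumes one simple subsumption criterion: the two rules share the same consequent and one antecedent is a proper subset of the other. It ignores sup and conf. `subsumes` and `combine_profiles` are hypothetical helpers, not a prescribed method.

```python
def rule(lhs, rhs):
    """Key a rule by its (antecedent, consequent) attribute sets."""
    return (frozenset(lhs), frozenset(rhs))


def subsumes(general, specific):
    """Assumed criterion: same consequent, antecedent a proper subset."""
    (a1, c1), (a2, c2) = general, specific
    return c1 == c2 and a1 < a2


def combine_profiles(*profiles):
    """Union the rule sets, then drop every rule subsumed by another one."""
    merged = set().union(*profiles)
    return {r for r in merged if not any(subsumes(o, r) for o in merged)}


user_profile = {rule({"treat$symptom", "patient$allergy"}, {"drug$drugID"})}
role_profile = {rule({"treat$symptom"}, {"drug$drugID"}),
                rule({"treat$symptom"}, {"treat$diagnosis"})}
combined = combine_profiles(user_profile, role_profile)
for lhs, rhs in sorted(combined, key=lambda r: sorted(r[1])):
    print(sorted(lhs), "=>", sorted(rhs))
```

Here the user's rule is dropped because the role profile already contains the more general rule treat$symptom => drug$drugID.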
Suggested Reading

Machine learning (Russell and Norvig, AI): induction, decision trees, clustering (AutoClass)
Justin Doak's thesis
IDES
Haystack

Intel is interested in configuration.
Think of possible applications for Intel; they may be able to help add to the application.
IBM funding? Misuse detection, fraud management system