Thus, attributes that are accessed with similar frequency in a session
and are closer in the E-R diagram would be formed in the same cluster.
Problems
Association rules do not consider the distance between attributes, but
are very fast and scale well because they can be implemented in SQL.
Clustering algorithms do not scale well and are computationally expensive,
but the distance measure can be adjusted in a flexible manner.
How to extend association rules and frequent itemsets with a distance measure?
X1 ... Xn => Y1 ... Ym, sup, conf, dist
{I1 ... In}, sup, dist
How to combine frequent itemsets with clustering? Use SQL to implement
clustering?
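One possible reading of the extended rule form (sup, conf, dist) is sketched below in Python. This is only an illustration, not an answer proposed in the talk: the helper names `support` and `rule_stats`, the `hops` table of pairwise E-R distances, and the choice of "largest pairwise hop count" as dist are all assumptions.

```python
from itertools import combinations

def support(itemset, sessions):
    """Fraction of sessions that contain every attribute in itemset."""
    itemset = frozenset(itemset)
    return sum(1 for s in sessions if itemset <= s) / len(sessions)

def rule_stats(lhs, rhs, sessions, hops):
    """Return (sup, conf, dist) for the rule lhs => rhs."""
    both = frozenset(lhs) | frozenset(rhs)
    sup = support(both, sessions)
    lhs_sup = support(lhs, sessions)
    conf = sup / lhs_sup if lhs_sup else 0.0
    # Assumption: dist is the largest pairwise hop count among the attributes.
    pairs = combinations(sorted(both), 2)
    dist = max((hops[frozenset(p)] for p in pairs), default=0)
    return sup, conf, dist

# Hypothetical audit sessions (sets of accessed attributes).
sessions = [frozenset(s) for s in (
    {"name", "salary"}, {"name", "salary", "dept"}, {"name", "dept"}, {"dept"})]
# Hypothetical hop counts between attributes in the E-R diagram.
hops = {frozenset(("name", "salary")): 0,
        frozenset(("name", "dept")): 1,
        frozenset(("dept", "salary")): 1}

sup, conf, dist = rule_stats({"name"}, {"salary"}, sessions, hops)
```

Other aggregates (average or sum of pairwise distances) would fit the same shape; the maximum is used here only because it is simple to state.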
Detect misuse by matching audit records against profiles / policy
Association rules
derive association rules from audit records (R’)
match R’ against the rules in the profiles / policy for the corresponding
user and role (R)
define mismatch score by
# of mismatched rules between R and R’
# of new rules in R’ (not in R)
# of rules in R missing from R’
the support, confidence, and distance of the rules can be used to adjust the weight
of the mismatch for each rule
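A minimal Python sketch of such a rule mismatch score follows. It assumes each rule set is a dict keyed by (antecedent, consequent) and that the rule's confidence is used as its weight; the name `rule_mismatch`, the helper `rule`, and the sample rules are hypothetical.

```python
def rule_mismatch(profile, observed):
    """Compare profile rules R with observed rules R'.

    Both arguments map (lhs, rhs) -> weight. Using confidence as the
    weight is an assumption; support or distance could be used instead.
    """
    new = observed.keys() - profile.keys()      # rules only in R'
    missing = profile.keys() - observed.keys()  # rules of R absent from R'
    score = sum(observed[r] for r in new) + sum(profile[r] for r in missing)
    return {"new": len(new), "missing": len(missing), "score": score}

def rule(lhs, rhs):
    """Build a hashable key for the single-attribute rule lhs => rhs."""
    return (frozenset([lhs]), frozenset([rhs]))

# Hypothetical profile (R) and rules derived from audit records (R').
profile = {rule("name", "salary"): 0.9, rule("name", "dept"): 0.5}
observed = {rule("name", "salary"): 0.8, rule("ssn", "salary"): 0.7}

result = rule_mismatch(profile, observed)
```

In this example one rule is new, one is missing, and the weighted score is the sum of their weights; a threshold on the score would then decide whether to flag the session.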
Frequent itemsets / clusters
derive frequent itemsets / clusters (F’) from audit records
match F’ against those in the profiles (F)
define mismatch score by
# of missing items
# of new items
support and dist of F and F’ can be used to adjust the weight of mismatch
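The itemset variant can be sketched the same way. The sketch below assumes each observed itemset is matched to the profile itemset with the fewest differing items and that the observed support is used as the weight; neither choice comes from the talk, and `itemset_mismatch` is a hypothetical name.

```python
def itemset_mismatch(profile, observed):
    """Weighted count of new and missing items between F and F'.

    Both arguments map frozenset(attributes) -> support.
    The profile is assumed to be non-empty.
    """
    total = 0.0
    for items, sup in observed.items():
        # Closest profile itemset: smallest symmetric difference.
        best = min(profile, key=lambda p: len(p ^ items))
        new = items - best       # items seen but not in the profile itemset
        missing = best - items   # items expected but not seen
        total += sup * (len(new) + len(missing))
    return total

# Hypothetical profile itemsets (F) and observed itemsets (F').
profile_sets = {frozenset({"name", "salary"}): 0.6, frozenset({"dept"}): 0.4}
observed_sets = {frozenset({"name", "salary", "ssn"}): 0.5,
                 frozenset({"dept"}): 0.5}

score = itemset_mismatch(profile_sets, observed_sets)
```

Here the only deviation is the extra attribute `ssn` in one observed itemset, so the score equals that itemset's support.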
Questions/Comments
KL: One of the problems Steven Templeton ran into was defining sessions.
It used to be that one login and logout constituted a session; now people
never log out.
KL: Consider looking at other clustering algorithms (AutoClass).
KL: Todd Heberlein looked at a distance measure in terms of directories
- so many hops from the work directory constituted inappropriate use.
KL: First, profiles are generated independently, then combined. One rule
may subsume another.
Object hierarchy - Attributes -> Entity -> Views -> Schemes