Agenda for Misuse Detection Project Meeting: Monday 21-Oct-96, 5-6pm

Technical Paper (0:15) Raymond
    "Statistical Foundations of Audit Trail Analysis
    for the Detection of Computer Misuse", Paul Helman and Gunar Liepins

Milestones (0:15) Julie

Report on collaboration w/ Stanford (0:05) Karl

Debrief meeting with VMTH programmers (0:15) Brant

Med Informatics Survey (0:05) Steven

NSA Project milestones (0:05) Raymond

Topics for next agenda (0:05) Chris
        Chris' paper

Meeting Notes for Misuse Project - 10/21/96

Attendees: Chris, Brant, Raymond, Steven, Julie, Karl

Notes taken by Julie


Statistical Modeling

* N(t): generates a normal activity x at time t
* M(t): generates a misuse activity x at time t
* D(t): determines whether transaction x at time t is normal or misuse
* N, M, D are pairwise independent => no temporal activities
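
A toy simulation of the three processes, as a sketch only. It reads D(t) as
the step that decides, independently of N and M, which process produces the
transaction at time t. The function name, the dict-based distributions, and
the parameter lam (probability that a step is normal) are illustrative
assumptions, not part of the paper's notation as recorded here.

```python
import random

def simulate(n_dist, m_dist, lam, steps, rng):
    """Generate (x, label) pairs, one per time step.

    n_dist, m_dist: dicts mapping a transaction to its probability under
    the normal and misuse processes (hypothetical toy distributions).
    lam: assumed probability that a given step is produced by N.
    """
    log = []
    for _ in range(steps):
        # D picks the source for this step, independently of N and M
        # and of every earlier step (no temporal dependence).
        is_normal = rng.random() < lam
        dist = n_dist if is_normal else m_dist
        x = rng.choices(list(dist), weights=list(dist.values()))[0]
        log.append((x, "normal" if is_normal else "misuse"))
    return log
```

Passing an explicit random.Random instance keeps a run reproducible, e.g.
simulate({"read": 0.9, "write": 0.1}, {"delete": 1.0}, 0.95, 100, random.Random(0)).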

Misuse Detector (MD)
* graded (a continuous spectrum) / binary (0 or 1)

* define error as a weighted sum of overestimation and underestimation
  (i.e., if x is a misuse, what the MD says about it)

Theorem 1: A binary MD minimizes the error if it is defined as
MD(x) = 0 if r(x) <= (a * lambda) / ((1 - lambda) * b), and MD(x) = 1
otherwise, where r(x) = m(x)/n(x), lambda is the a priori probability of
a normal transaction being generated, n(x) is the probability of x being
generated by the normal process, m(x) is the probability of x being
generated by the misuse process, and a and b are the weights on
overestimation and underestimation in the error.
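
The threshold rule of Theorem 1 can be sketched as below. The sketch assumes
r(x) = m(x)/n(x) and that a and b are the weights on overestimation and
underestimation; the function name and argument order are hypothetical.

```python
def binary_md(m_x, n_x, lam, a, b):
    """Return 1 (flag as misuse) or 0 (treat as normal) for one transaction.

    m_x, n_x: probabilities of x under the misuse and normal processes.
    lam: a priori probability that a transaction is normal.
    a, b: assumed weights on overestimation and underestimation.
    """
    threshold = a * lam / ((1.0 - lam) * b)
    # Test m_x <= threshold * n_x instead of m_x / n_x <= threshold,
    # so a transaction with n_x == 0 does not cause a division by zero.
    return 0 if m_x <= threshold * n_x else 1
```

With lam = 0.99 and a = b = 1 the threshold is about 99, so a transaction is
flagged only when it is roughly 99 times more likely under the misuse process
than under the normal one.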

When this information is not available, we fall back on ranking the
transactions according to their degree of suspicion; i.e., MD(x1) <
MD(x2) => x2 is more suspicious than x1.

Define the prioritization penalty/error as the error of misranking two
transactions.

Theorem 2: a graded detector MDg minimizes the prioritization error iff
it is consistent with r, i.e., r(x1) < r(x2) => MDg(x1) < MDg(x2)
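
A small check of the consistency condition in Theorem 2, as a sketch. It
counts every misranked pair equally, which is an assumption: the notes do
not record how the prioritization penalty weights each pair. The function
name is hypothetical.

```python
from itertools import combinations

def misranked_pairs(r_values, md_scores):
    """Count pairs that r orders strictly but the detector does not.

    r_values[i] and md_scores[i] refer to the same transaction.
    A result of 0 means the detector is consistent with r on this sample.
    """
    count = 0
    for i, j in combinations(range(len(r_values)), 2):
        if r_values[i] < r_values[j] and not md_scores[i] < md_scores[j]:
            count += 1
        elif r_values[j] < r_values[i] and not md_scores[j] < md_scores[i]:
            count += 1
    return count
```

Note that a tie in the detector scores counts as a misranking whenever r
separates the two transactions, since the condition requires a strict
inequality.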

We still need to know n(x) and m(x).

E.g., the frequentist estimator of n(x) is the number of times x occurs /
total number of transactions
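
The frequentist estimate can be sketched as below, assuming each
transaction is a hashable tuple of attribute values and that the audit
trail is treated as a sample of the normal process. The function name and
the sample data are illustrative.

```python
from collections import Counter

def estimate_n(transactions):
    """Relative frequency of each distinct transaction in the sample."""
    counts = Counter(transactions)
    total = len(transactions)
    return {x: c / total for x, c in counts.items()}

log = [("alice", "read"), ("alice", "read"),
       ("bob", "write"), ("alice", "write")]
n_hat = estimate_n(log)
# Transactions never observed get no entry; n_hat.get(x, 0.0) treats
# them as having estimated probability zero.
```

Here ("alice", "read") occurs twice in four transactions, so its estimate is
0.5. Any transaction absent from the sample is estimated at zero, which is
why the sample size matters.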

For m(x), use a surrogate function, e.g., the uniform or the independent
model. The uniform model treats every transaction as equally likely; the
independent model treats the distributions of the individual attributes
(L of them) of a transaction as independent, so that m(x) is the product
of the ni(x) for i = 1, ..., L.
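
Both surrogates can be sketched as below. Two assumptions are made here
that the notes do not state: the per-attribute distributions ni are
estimated as relative frequencies from the observed sample, and the uniform
model spreads its mass over the full cross product of the attribute
domains. Function names and data are illustrative.

```python
from collections import Counter
from math import prod

def uniform_m(domain_sizes):
    """Uniform model: each point of the attribute space is equally likely."""
    return 1.0 / prod(domain_sizes)

def independent_m(x, transactions):
    """Independent model: product of per-attribute relative frequencies."""
    total = len(transactions)
    p = 1.0
    for i, value in enumerate(x):
        # Marginal frequency of this value in attribute position i.
        column = Counter(t[i] for t in transactions)
        p *= column[value] / total
    return p

log = [("alice", "read"), ("alice", "read"),
       ("bob", "write"), ("alice", "write")]
```

For the sample above, ("bob", "read") never occurs, yet the independent
model gives it 1/4 * 2/4 = 0.125, so unseen combinations of seen attribute
values still receive nonzero mass.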

Overcome the sample size limitation by transforming the samples. Two
approaches: attribute selection, and value aggregation (partitioning the
attribute domain). The goals are:

1) can still distinguish the N and M processes

2) a good spread of r-values

3) preserve the structure of the space of all transactions, S

4) small mass of unseen transactions

Theorem 3: it is NP-complete to find a subset of the attributes to
project onto that still maintains a certain number of singleton
transactions (=> resembles the S space)
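
Attribute selection by exhaustive search, as a sketch under one reading of
the notes: a singleton is taken to be a transaction that occurs exactly once
in the sample, and the search keeps the k-attribute projection with the most
singletons. The search tries every subset, so it is only usable for a handful
of attributes, in line with the hardness result. All names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def project(transactions, attrs):
    """Keep only the attribute positions listed in attrs."""
    return [tuple(t[i] for i in attrs) for t in transactions]

def singleton_count(transactions):
    """Number of distinct transactions that occur exactly once."""
    counts = Counter(transactions)
    return sum(1 for c in counts.values() if c == 1)

def best_projection(transactions, k):
    """Try every k-subset of attributes; return the one keeping the most
    singletons (the first such subset in lexicographic order on ties)."""
    n_attrs = len(transactions[0])
    return max(combinations(range(n_attrs), k),
               key=lambda attrs: singleton_count(project(transactions, attrs)))

log = [("alice", "read", "tty1"), ("alice", "read", "tty2"),
       ("bob", "write", "tty1"), ("alice", "write", "tty3")]
```

On this sample, dropping the terminal attribute leaves two singletons,
while dropping the operation attribute keeps all four.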

Nonmodel Based Approach

rule-based (simple pattern matching)

Theorem 4: in the presence of nonmaximal rules (rules not covering all
the attributes in a transaction), it is possible that the scoring
function (sf) is not consistent with the ranking function (r)

Meeting with VMTH


NSA Milestones

Items for Next Week's Agenda