Global Guard Meeting - November 24, 1998

3085 ENG II
9:15 – 10:00

In attendance:
Karl Levitt (KL), David Klotz (DK), David O'Brien (DOB), Jeff Rowe (JR), and Steven Templeton (ST)
Todd Heberlein (TH), Chris Wee (CW), and Jason Schatz (JS) arrived near the end.

TOPICS
False positive rates for Wall Street Journal
Statistical Correlation
Direction for the Project

False positive rates for Wall Street Journal

Karl mentioned that the reporter from WSJ was told by industry that there were no false positives
ST: If you consider it in terms of looking for anomalies, then there are no false alarms

Statistical Correlation

JR: Yes/No Type Interrupt Alarm – Yemini’s systems can handle variables/host-name
DOB: Let’s look at other correlations: Statistical Correlation – covariance matrix; countable random variables – how do we model/count them?

Large sampling over time
Time key measuring factor
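Illustrative sketch of the counting and covariance idea above, assuming events are reduced to timestamps per source and counted in fixed-width time windows. The function names, the window size, and the sample data are hypothetical and were not discussed at the meeting.

```python
from math import sqrt


def window_counts(timestamps, window, n_windows):
    """Count events falling in each fixed-width time window."""
    counts = [0] * n_windows
    for t in timestamps:
        idx = int(t // window)
        if 0 <= idx < n_windows:
            counts[idx] += 1
    return counts


def covariance(xs, ys):
    """Sample covariance of two equal-length count series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """Pearson correlation; returns 0.0 if either series is constant."""
    sx = sqrt(covariance(xs, xs))
    sy = sqrt(covariance(ys, ys))
    if sx == 0 or sy == 0:
        return 0.0
    return covariance(xs, ys) / (sx * sy)


def covariance_matrix(series):
    """Covariance matrix as a nested dict keyed by (hypothetical) source name."""
    names = sorted(series)
    return {a: {b: covariance(series[a], series[b]) for b in names} for a in names}


# Hypothetical event timestamps (seconds) seen from two hosts.
host_a = window_counts([1, 2, 11, 21, 22, 23], window=10, n_windows=3)
host_b = window_counts([3, 4, 12, 25, 26, 27], window=10, n_windows=3)
print(host_a, host_b, correlation(host_a, host_b))
```

Larger samples over longer periods give more stable estimates, which is why the window length and the observation time are the key parameters in this sketch.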

KL: Attacks from sites
ST: Concept of Distance Measure/Similar to Clusters

New appropriate metric for distance

Correlate activity – method of distilling activity to binary features
Activity -> Cluster Center; closer to Cluster A than Cluster B
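Illustrative sketch of distilling an activity to binary features and assigning it to the nearest cluster center. The feature tests, thresholds, and cluster centers are hypothetical, and the distance used here (count of differing features) is only a placeholder, since the notes leave the appropriate metric open.

```python
def to_binary_features(activity, feature_tests):
    """Reduce an activity record to a tuple of 0/1 features."""
    return tuple(1 if test(activity) else 0 for test in feature_tests)


def feature_distance(a, b):
    """Placeholder distance: number of positions where the features differ."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


def nearest_cluster(vector, centers):
    """Return the name of the closest center (ties broken by name order)."""
    return min(sorted(centers), key=lambda name: feature_distance(vector, centers[name]))


# Hypothetical feature tests and cluster centers, for illustration only.
FEATURE_TESTS = [
    lambda a: a["failed_logins"] > 3,
    lambda a: a["ports_touched"] > 100,
    lambda a: a["off_hours"],
]
CENTERS = {"A": (1, 0, 1), "B": (0, 1, 0)}

activity = {"failed_logins": 5, "ports_touched": 2, "off_hours": True}
vector = to_binary_features(activity, FEATURE_TESTS)
print(vector, nearest_cluster(vector, CENTERS))
```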

Activities in disparate places – find commonality

Correlating attribute values to normal
Bottom Level – counting

DOB: Stat 131 – stat correlation – will we use any of it?
DK: Packet maximum – Percentage of packets is high (in ratio to packet maximum) from certain hosts
KL: Example: Assume correlation, but not independent events near simultaneous – form of anomaly detection
ST: A -> B correlation; C enters, sniffing traffic
DOB: Codebook approach solution for Global Guard?

KL: Worth a try.

ST: Hamming Distance – why is this an appropriate distance measure?

Answer: unweighted and unbiased
KL: Reduction process – try to eliminate bias and weighting (but not all is eliminated)
DOB: Paper: A Coding Approach to Event Correlation

ST: Closest match calculate distance for your example with every reference?

DOB: Precomputes every combination. For a vector of length 3, enumerate all 8 vectors and their meanings.
DK: Hash table to look-up quickly.
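Illustrative sketch of the precompute-and-look-up idea above, assuming 3-bit codebook vectors and Hamming distance as the matching rule. The codes for P1 and P2 are taken from the table that follows; the function names and the tie handling are assumptions.

```python
from itertools import product

# Codebook vectors for the two problems in the example table.
CODEBOOK = {"P1": (1, 1, 0), "P2": (1, 0, 0)}


def hamming(a, b):
    """Number of positions where two equal-length bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def build_lookup(codebook):
    """Precompute the closest problem(s) for every possible incoming vector."""
    width = len(next(iter(codebook.values())))
    table = {}
    for vector in product((0, 1), repeat=width):
        distances = {name: hamming(vector, code) for name, code in codebook.items()}
        best = min(distances.values())
        # Keep every problem at the minimum distance so ties stay visible.
        table[vector] = sorted(name for name, d in distances.items() if d == best)
    return table


LOOKUP = build_lookup(CODEBOOK)  # a dict, so each look-up is a hash-table hit


def decode(vector, lookup=LOOKUP):
    """Look up the closest problem(s) for an incoming vector."""
    return lookup[tuple(vector)]


print(len(LOOKUP))        # 8 entries for a 3-bit vector
print(decode((0, 0, 0)))  # ['P2']
```

The table has 2^n entries for an n-bit vector, so full enumeration is only practical for short vectors.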

P = Problem	Code Book Vector	Incoming Vector
P1	110	000 -> P2	P1 -> 2	P2 -> 2
P2	100	In -> P1, P2	P1 -> 2	P2 -> 2
	Sparse	Not sparse

ST: Works for small vector
ST: Reed-Solomon – locality of where errors will occur. Burst as opposed to a lot. Arrange features to get things together.

KL: Yemini requires human modeling. With statistical methods, only get profiles
DOB: Correlation – decide whether to group events

DK: Numbers that give you statistical correlation
TH: Ad hoc statistical correlation
KL: Correlation vs. Anticorrelation: People who wear T-shirts in their 20s are millionaires in their 50s.
JR: Correlation without stats
ST: Accounting – move away from inferencing
TH: In a statistical study of power lines and cancer rates, cancer was more correlated with the poverty level than with the power lines

Networks – To run one machine, another machine must be running

Direction for the Project

JR: Symptoms – what extra information do we need?
JR: Yemini Yes/No Interrupt sensors on network is not enough. We need to generate our own interrupts
DOB: Model attacks
JR: Continuous variable may not know ahead of time
KL: Unknown attacks – anomaly detection
TH: Codebook -- flexible signature mechanism?