November 24, 1998
3085 ENG II
9:15 – 10:00
In attendance:
Karl Levitt (KL), David Klotz (DK), David O'Brien (DOB), Jeff Rowe (JR), and Steven Templeton (ST)
Todd Heberlein (TH), Chris Wee (CW), and Jason Schatz (JS) arrived near the end.

False positive rates for Wall Street Journal
Statistical Correlation
Direction for the Project
    1. False positive rates for Wall Street Journal
      1. KL: The reporter from the WSJ was told by industry that there were no false positives
      2. ST: If you consider it in terms of looking for anomalies, then there are no false alarms
    2. Statistical Correlation
      1. JR: Yes/No Type Interrupt Alarm – Yemini’s systems can handle variables/host-name
      2. DOB: Let’s look at other correlations: Statistical Correlation – covariance matrix; countable random variables – how do we model/count them?
        1. Large sampling over time
        2. Time is the key measuring factor
      3. KL: Attacks from sites
      4. ST: Concept of Distance Measure/Similar to Clusters
        1. New appropriate metric for distance
          1. Correlate activity – method of distilling activity to binary features
          2. Activity → Cluster Center; closer to Cluster A than Cluster B
        2. Activities in disparate places – find commonality
          1. Correlating attribute values to normal
          2. Bottom Level – counting
      5. DOB: Stat 131 – stat correlation – will we use any of it?
      6. DK: Packet maximum – Percentage of packets is high (in ratio to packet maximum) from certain hosts
      7. KL: Example: Assume correlation, with non-independent events occurring nearly simultaneously – a form of anomaly detection
      8. ST: A → B correlation; C enters, sniffing traffic
      9. DOB: Codebook approach solution for Global Guard?
        1. KL: Worth a try.
      10. ST: Hamming Distance – why is this an appropriate distance measure?
        1. Answer: unweighted and unbiased
        2. KL: Reduction process – try to eliminate bias and weighting (but not all is eliminated)
        3. DOB: Paper: A Coding Approach to Event Correlation
      11. ST: For the closest match, do you calculate the distance between your example and every reference?
        1. DOB: It precomputes every combination. For a vector of length 3, enumerate the 8 possible vectors and the meaning of each.
        2. DK: Hash table for quick look-up.
        3. P = Problem; Code Book Vector
          Incoming Vector
          P1 110 000 → P2 P1 → 2 P2 → 2
          P2 100 In → P1, P2 P1 → 2 P2 → 2
            Sparse Not sparse

        4. ST: Works for small vector
        5. ST: Reed-Solomon – locality of where errors will occur. Bursts as opposed to many scattered errors. Arrange features so that related ones sit together.
      12. KL: Yemini requires human modeling. With statistical methods, you only get profiles.
      13. DOB: Correlation – decide whether to group events
        1. DK: Numbers that give you statistical correlation
        2. TH: Ad hoc statistical correlation
        3. KL: Correlation vs. Anticorrelation: People who wear T-shirts in their 20s are millionaires in their 50s.
        4. JR: Correlation without stats
        5. ST: Accounting – move away from inferencing
        6. TH: In the statistical study of power lines and cancer, the cancer rate was more correlated with the poverty level than with the power lines
          1. Networks – To run one machine, another machine must be running
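The codebook discussion above (Hamming distance, precomputing every combination, hash-table look-up) can be illustrated with a minimal Python sketch. It is not the method from the Yemini paper or a Global Guard design. The codes P1 = 110 and P2 = 100 are taken as read from the example in item 11; the function names hamming, decode, and precompute are hypothetical.

```python
from itertools import product

# Codebook as read from the example in the minutes (assumption):
# each problem maps to a fixed-width vector of yes/no symptoms.
CODEBOOK = {"P1": "110", "P2": "100"}


def hamming(a: str, b: str) -> int:
    """Unweighted count of positions where two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


def decode(observed: str, codebook: dict) -> tuple:
    """Return (closest problems, all distances) for an incoming vector."""
    distances = {p: hamming(observed, code) for p, code in codebook.items()}
    best = min(distances.values())
    # Ties are all reported rather than broken arbitrarily.
    closest = sorted(p for p, d in distances.items() if d == best)
    return closest, distances


def precompute(codebook: dict, width: int) -> dict:
    """Enumerate all 2**width vectors once so later look-ups are a dict (hash) hit."""
    table = {}
    for bits in product("01", repeat=width):
        vector = "".join(bits)
        table[vector] = decode(vector, codebook)[0]
    return table


TABLE = precompute(CODEBOOK, 3)  # 8 entries for a vector of length 3
```

With this table, TABLE["000"] looks up the closest problem for an all-clear vector without recomputing any distances. As ST noted, this only works for small vectors: the table has 2**width entries.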
    3. Direction for the Project
      1. JR: Symptoms – what extra information do we need?
      2. JR: Yemini Yes/No Interrupt sensors on network is not enough. We need to generate our own interrupts
      3. DOB: Model attacks
      4. JR: Continuous variables – may not be known ahead of time
      5. KL: Unknown attacks – anomaly detection
      6. TH: Codebook – flexible signature mechanism?
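One possible reading of "generate our own interrupts" from continuous variables is to threshold each variable against a statistical profile and emit a yes/no symptom bit. The Python sketch below assumes that reading. The mean-plus-k-standard-deviations rule, the default k = 3, and the names build_profile, symptom_bit, and symptom_vector are illustrative choices, not something decided in the meeting.

```python
from statistics import mean, stdev


def build_profile(history: list) -> tuple:
    """Summarize past measurements of one variable as (mean, sample std dev)."""
    return mean(history), stdev(history)


def symptom_bit(value: float, profile: tuple, k: float = 3.0) -> int:
    """Return 1 (interrupt) if value is more than k std devs from the profile mean."""
    mu, sigma = profile
    return 1 if abs(value - mu) > k * sigma else 0


def symptom_vector(observations: dict, profiles: dict, k: float = 3.0) -> str:
    """Distill continuous observations into a bit string, one bit per variable.

    Variables are ordered by name so the bit positions stay stable.
    """
    return "".join(
        str(symptom_bit(observations[name], profiles[name], k))
        for name in sorted(profiles)
    )


# Hypothetical per-interval counts, for illustration only.
profiles = {
    "logins": build_profile([10, 12, 11, 9, 10, 11]),
    "packets": build_profile([200, 210, 190, 205, 195, 200]),
}
vector = symptom_vector({"logins": 40, "packets": 202}, profiles)
```

The resulting bit string has the same shape as a codebook vector, so it could be fed to a Hamming-distance match. Unknown attacks would still show up only as vectors far from every known code.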