Rationale and Design Goals of DOVES


1. Introduction

Recent compromises of computer systems and programs have made
everyone aware of techniques to violate computer security policies.
For example, many HTML-oriented mailers fail to check the length
of the application attribute of messages. This enables attackers
to execute arbitrary programs on remote systems. From this, they
acquire unauthorized privileges on those systems. Most people do
not realize that these problems are not new. They are as old as
the computer industry itself.

A vulnerability is an error or inconsistency that allows an entity
to gain unauthorized privileges or access or alter data in an
unauthorized manner. The failure to check the length of the
application attribute is a vulnerability in the mailer software.
Techniques such as structured design reviews [Fairley] and robust
programming [153 handout] reduce the number and seriousness of
vulnerabilities. However, these principles guide the creation of
too little software.

Attacks or exploits are actions, or sequences of actions, that
use vulnerabilities to acquire unauthorized access or privileges.
They leave traces, called signatures, composed of information from
various sources and showing views of the actions making up the
attack (or of the effects of such actions). Data making up signatures
includes traces from network connections (such as from tcpdump) or
from information in log files or system status listings.

Most studies propose models, technologies, or procedures to reduce
or eliminate vulnerabilities, or limit their effects. Few studies
have examined the underlying causes of vulnerabilities [PA, RISOS,
Aslam, Krusl], and none focus on the history of a particular
vulnerability or vulnerabilities. The previous studies classify
(or taxonomize) the vulnerabilities, but often inconsistently or
arbitrarily (see for example [Bish & Bailey]).

The UC Davis Vulnerability Project has five goals:

* To describe the vulnerabilities in a form useful to intrusion
  detection mechanisms

* To present techniques for finding these vulnerabilities

* To present techniques to inhibit or eliminate exploitation
  of those vulnerabilities

* To exhibit similarities between instances of vulnerabilities

* To provide a record of known vulnerabilities.

Meeting these goals requires a database of vulnerabilities, attacks,
and signatures. Specifically:

* Typically intrusion detection mechanisms examine signatures
  of attacks. Ideally, intrusion detection systems could take
  the signatures as raw data and generate the signature
  detection mechanism automatically.

* Developing techniques to find, inhibit, and/or eliminate exploitation
  of vulnerabilities requires testing proposed methods. Known
  vulnerabilities and attacks are ideal to validate the methods of
  detection and elimination.

* Inhibiting or eliminating exploitation of known vulnerabilities
  requires fixing, or ameliorating, those vulnerabilities. But is it
  possible to inhibit attackers from exploiting unknown vulnerabilities?

* Developing a measure of similarity among vulnerabilities, and
  making it meaningful, requires test data; the database provides
  this.

* A record of known vulnerabilities provides references for teaching
  computer programming [WECS talk] and computer security (especially
  the flaw hypothesis methodology [FHM] and a posteriori penetration
  testing techniques).

The database is organized into three distinct sections, with
cross-referencing. One section contains vulnerability data, another
attack tools and data, and the third signature data.

1.1 Vulnerability Data

The vulnerability database contains data about vulnerabilities.
This data is experimentally derived (by observation or experience).
It can be used to test hypotheses about classification, analysis
mechanisms, or derivation of other vulnerabilities as well as
theories or models of vulnerability existence, creation, or detection.
It can also be used to provide examples of errors or inconsistencies
in interface and system design for software engineering classes.

The vulnerability database contains details of vulnerabilities on
a number of systems. In many cases, the information is incomplete
(and is updated whenever possible). The information in the database
provides enough detail to classify the technical nature of the
vulnerability according to a variety of schemes. The data falls
into six categories:

1. Reference information describes the cataloguing of the vulnerability
   data.

2. Descriptive information tells about the vulnerability.

3. Exploit information tells about exploits, attacks, and signatures.

4. Repair information describes fixes and detection of attacks using
   this vulnerability.

5. Classification information describes the classification(s) of the
   vulnerability. Multiple classification schemes give multiple entries.

6. Bibliographic information lists sources, relevant papers, history
   of reportage, and related vulnerabilities.

The reference information simply identifies the vulnerability entry
so it can be cross-referenced. This way, other entries can refer
to this one.

Descriptive information presents enough background and detail for
an analyst to understand the nature of the vulnerability and how
and when it arises. It includes the system type, any environmental
conditions and programming conditions that must hold for the
vulnerability to exist, a description of the vulnerability, and
all programs and inputs involved in the vulnerability. The verifiers
or reporters for each configuration are listed.

Exploit information contains information about how to exploit the
vulnerability. This includes attack tools, and may be a pointer
into the exploit database.

Repair information describes how to detect and fix the vulnerability,
both at the system administration level (i.e., without source code)
and at the programmatic level (i.e., with source code). This may
include pointers to vendor patches when appropriate. Ideally, this
section would include instructions in a formal language to automate
detection and repair. (Ideally, vendors would supply this part in
a common formal language.)

Classification schemes organize vulnerability data to exhibit
specific properties. The same vulnerability may be classified in
many different ways using many different schemes. The classification
information section contains the classification of the vulnerability
under various schemes.

Bibliographic information includes who reported it, when and where,
and pointers to related vulnerabilities and advisories from vendors
and/or incident response teams. It also says who validated the
vulnerability.
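
The six categories above can be sketched as a record type. This
is only an illustration of the layout described in this section,
not the project's actual schema: every field name, and the
identifier "V-0001", is a hypothetical choice made for the example.
Optional fields default to empty because entries are often
incomplete.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class VulnerabilityEntry:
    """One vulnerability record; all field names are hypothetical."""
    # 1. Reference information: identifier used for cross-referencing.
    ref_id: str
    # 2. Descriptive information.
    system_type: Optional[str] = None
    description: Optional[str] = None
    programs: List[str] = field(default_factory=list)
    # 3. Exploit information: pointers into the other two sections.
    exploit_refs: List[str] = field(default_factory=list)
    signature_refs: List[str] = field(default_factory=list)
    # 4. Repair information, without and with source code.
    admin_fix: Optional[str] = None
    source_fix: Optional[str] = None
    # 5. Classification information: scheme name -> classification.
    classifications: Dict[str, str] = field(default_factory=dict)
    # 6. Bibliographic information.
    reported_by: Optional[str] = None
    related_refs: List[str] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        """Return the names of fields that are still empty."""
        return [name for name, value in vars(self).items()
                if value in (None, [], {})]


# Hypothetical entry for the mailer example from the introduction.
entry = VulnerabilityEntry(
    ref_id="V-0001",
    description="Mailer does not check the length of the "
                "application attribute of messages.",
)
print(entry.missing_fields())
```

Keeping one classification per scheme in a mapping mirrors the
statement that multiple classification schemes give multiple entries.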

1.2 Exploit Data

The exploit database contains data about exploits, such as attack
tools and methods.  This is useful when detritus of an attack is
discovered. The analyst can locate tools, determine their intended
use and which vulnerabilities they exploit. This aids further
analysis. The analyst can also trace the genealogy of tools, as variants
are in the database.  This information may help understand where
a tool comes from, and how it spreads throughout the attack community.

The data is composed of attack tools and methods for exploiting
vulnerabilities on particular systems. The tools are tied to
particular vulnerabilities in the vulnerability database, and
can be used to test for those vulnerabilities. The distribution
files are also available. Whenever possible, variants are included
[OBrien]. The data falls into five categories:

1. Reference information describes the cataloguing of the
   attack tool or method.

2. Descriptive information tells about the attack tool or
   method.

3. Exploit information tells about vulnerabilities and
   signatures.

4. Classification information describes the classification(s)
   of the exploits. Multiple classification schemes give multiple
   entries.

5. Bibliographic information lists sources, relevant papers,
   history of reportage, and related exploits.

The reference information simply identifies the exploit entry so
it can be cross-referenced. This way, other entries can refer to
this one.

Descriptive information presents enough background and detail for
an analyst to understand how the tool works, any necessary
preconditions, and how to compile and execute it. It includes the
system type, the compiler or interpreter version (if relevant),
libraries, a brief description of the algorithm of the tool, and
all system programs and inputs involved. The verifiers or reporters
for each configuration are listed. If the tool must be changed to
work on a particular system, the original attack tool is kept (for
comparison purposes) and notes about the modification are stored.

Exploit information contains information about the vulnerability
that the tool or method exploits, and the signature that the tool
produces. These may be pointers into the vulnerability and signature
databases.

Classification schemes organize attack tools and methods to exhibit
specific properties.  The same attack tool or method may be classified
in many different ways using many different schemes. The classification
information section contains the classification of the entry under
various schemes.

Bibliographic information includes from where the tool was obtained,
the author (if known), and pointers to related attack tools and
advisories from vendors and/or incident response teams.

1.3 Signature Data

The signature database collects signatures of attack tools and
methods on different systems. It is a resource that maps traces of
attacks into attack tools, methods, and vulnerabilities. An
investigator can collect logs, traces, and other data from an
attack, and use this database to determine what tools may have been
used, what vulnerabilities may have been exploited, and other
information.

The data is composed of signatures collected from different systems.
When possible, pointers to tools and vulnerabilities show what
might have caused the signature. (Note that multiple tools may show
the same traces; for example, the traces captured by a network sniffer
look the same for many port scanners.) The signature data is sanitized
to conceal the particular hosts or sites involved. The data is also
reduced to exclude anything unrelated to the attack. The data falls
into five categories:

1. Reference information describes the cataloguing of the
   signature.

2. Descriptive information describes the data composing the signature,
   possibly with examples.

3. Exploit information tells about attack tools and methods, and
   vulnerabilities.

4. Classification information describes the classification(s) of
   the signatures. Multiple classification schemes give multiple
   entries.

5. Bibliographic information lists sources, relevant papers, history
   of reportage, and related signatures.

The reference information simply identifies the signature entry so
it can be cross-referenced. This way, other entries can refer to
this one.

Descriptive information identifies the system(s) from which the
signature was obtained, and any details about that system (for
example, if certain logging capabilities must be turned on, or if
a specific package - such as Sun's BSM [BSM] - was used). The entry
cross-references other signatures from the same attack but obtained
on different systems. Ideally, the database will use a little
language [Bentley] to describe the signature.  This would allow
intrusion detection systems to translate the signature into their
own internal format. The signatures will be minimal, in the sense
that no unnecessary information will be put into the signature.
However, establishing minimality is a research problem, so errors
will undoubtedly occur here.

Exploit information contains information about the attack tool or
tools that produce the signature. These may be pointers into the
exploit database.

Classification schemes organize signatures to exhibit specific
properties. The same signature may be classified in many different
ways using many different schemes. The classification information
section contains the classification of the entry under various
schemes.

Bibliographic information includes from where the signature was
obtained, the originator (if known), and pointers to related
vulnerabilities, signatures, attack tools and advisories from
vendors and/or incident response teams.
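
The mapping from traces to attack tools and vulnerabilities can
be sketched as a lookup that follows the cross-references between
the three sections. The sketch below is an assumption-laden
illustration: the identifiers, trace names, and dictionary layout
are all hypothetical, and a signature is taken to match when every
one of its traces appears among the observed data.

```python
# Hypothetical stand-ins for the signature and exploit sections.
SIGNATURES = {
    # Several scanners may leave the same traces, so two tools are listed.
    "S-1": {"traces": {"syn-to-many-ports"}, "tools": ["E-1", "E-2"]},
    "S-2": {"traces": {"long-attribute-in-message"}, "tools": ["E-3"]},
    # Incomplete entry: the tool behind this signature is not known.
    "S-3": {"traces": {"unexpected-file-in-dev"}, "tools": []},
}
EXPLOITS = {
    "E-1": {"vulnerabilities": []},
    "E-2": {"vulnerabilities": []},
    "E-3": {"vulnerabilities": ["V-1"]},
}


def explain(observed):
    """Map observed traces to candidate signatures, tools, vulnerabilities."""
    results = []
    for sig_id, sig in sorted(SIGNATURES.items()):
        # A signature is a candidate if all of its traces were observed.
        if sig["traces"] <= observed:
            tools = sig["tools"]
            vulns = sorted({v for t in tools
                            for v in EXPLOITS[t]["vulnerabilities"]})
            results.append((sig_id, tools, vulns))
    return results


# Extra observed data unrelated to the attack does not prevent a match.
print(explain({"long-attribute-in-message", "unrelated-noise"}))
```

Because the result lists every matching signature and every tool
attached to it, the investigator sees all candidates rather than
a single answer, which matches the remark that multiple tools may
show the same traces.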

2. Details of the Database

The goal of the database is to provide a comprehensive facility to
aid analysts in dealing with break-ins and other compromises. This
suggests several requirements:

1. The database must be in a non-proprietary database format that
   can be used on any system. The database should be accessible to
   users of multiple systems (such as DOS, Windows, UNIX systems, and
   the Macintosh).

2. The database must allow searching by arbitrary field. Different
   characteristics of an attack, or attack tool, may be detected, and
   so must be used to look for information in the database.

3. The database must be easily extensible. As investigators and
   researchers gain experience with computer investigations, some data
   may assume unanticipated importance. Creating a new field in each
   record must be quick and simple.

4. The database must handle incomplete information. Many times,
   signatures have no associated attack tools (because they have not
   been found). The effects may not be known. The ability to search
   these records should not be impaired by the missing information.

5. Updating the database must be simple. This allows users to keep
   up to date with the latest known attack tools, methods, signatures,
   and vulnerabilities.

6. The interface that provides search capabilities must function
   on various systems, such as UNIX systems, Windows NT, and Macintoshes.

These requirements lead to classification techniques for attack
tools and attack signatures. The use of SGML will provide the needed
flexibility and interoperability. The next sections discuss these.
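
Requirements 2 through 4 can be illustrated together. The sketch
below assumes entries are held as simple field-to-value mappings
(a hypothetical representation; the field names and sample values
are invented for the example). Any field can be used as a search
key, a new field is added by assignment, and an entry that lacks
a field is simply not matched on that field while remaining
searchable on the others.

```python
ENTRIES = [
    {"id": "S-0001", "kind": "signature", "system": "UNIX",
     "effect": "unexpected file /dev/ttyp"},
    # Incomplete entry: no effect or attack tool is known yet.
    {"id": "S-0002", "kind": "signature", "system": "Windows NT"},
    {"id": "E-0001", "kind": "attack tool", "system": "UNIX",
     "effect": "obtains superuser privileges"},
]


def search(entries, **criteria):
    """Return entries matching every criterion.

    Matching is a case-insensitive substring test. A field missing
    from an entry is treated as empty, so it never raises an error.
    """
    hits = []
    for entry in entries:
        if all(str(wanted).lower() in str(entry.get(name, "")).lower()
               for name, wanted in criteria.items()):
            hits.append(entry)
    return hits


# Search by arbitrary field; the incomplete entry is still found.
signatures = search(ENTRIES, kind="signature")

# Extending one record with a new field needs no schema change.
ENTRIES[1]["log_source"] = "audit trail"
audited = search(ENTRIES, log_source="audit")
```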

2.1 Classifications

Classification schemes are organizations of data for a particular
purpose. As the goal of this database is to locate attack tools
and signatures based on detritus and data gathered from attacks,
the classification scheme for attack tools and signatures will
focus on the effects of the tools and of the signatures. The scheme
will use these characteristics of attacks to provide a search
mechanism. One possibility is a decision tree; another, a vector
of characteristics.

Suppose a version of rootkit is found on a system. (The analysts
might first notice the file /dev/ttyp; using this as an "effect"
would take them to the rootkit entry.) This attack tool is placed
on a system only when the attacker has obtained superuser privileges.
The next step in the analysis of the system would be to determine
how the attacker acquired these privileges. So, the characteristic
in question is "obtaining superuser privileges." If the classification
scheme is a vector of characteristics, one could simply ask that
all attack tools and signatures with that characteristic be reported.
If the classification scheme is based on a decision tree, the tree
would provide a set of questions designed to reduce the possible
attack tools and signatures.
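
Both possibilities can be sketched over the same data. In the
sketch below the tool names and characteristics are hypothetical
placeholders. The vector approach returns every tool having all
of the requested characteristics; the decision tree asks one
yes/no question per characteristic and narrows the candidates at
each step.

```python
# Hypothetical tools, each with its set of characteristics.
TOOLS = {
    "tool-A": {"obtains superuser privileges", "requires local account"},
    "tool-B": {"obtains superuser privileges", "works remotely"},
    "tool-C": {"hides attacker activity"},
}


def with_characteristics(tools, wanted):
    """Vector approach: report all tools having every wanted trait."""
    return sorted(name for name, traits in tools.items()
                  if wanted <= traits)


def build_tree(tools, questions):
    """Decision tree: a node is (question, yes_subtree, no_subtree);
    a leaf is the sorted list of remaining tool names."""
    if not questions or len(tools) <= 1:
        return sorted(tools)
    question, rest = questions[0], questions[1:]
    yes = {n: t for n, t in tools.items() if question in t}
    no = {n: t for n, t in tools.items() if question not in t}
    return (question, build_tree(yes, rest), build_tree(no, rest))


def walk(tree, answers):
    """Follow the analyst's answers down to a list of candidates.
    Unanswered questions are treated as 'no'."""
    while isinstance(tree, tuple):
        question, yes, no = tree
        tree = yes if answers.get(question, False) else no
    return tree


candidates = with_characteristics(TOOLS, {"obtains superuser privileges"})
tree = build_tree(TOOLS, ["obtains superuser privileges", "works remotely"])
narrowed = walk(tree, {"obtains superuser privileges": True,
                       "works remotely": True})
```

The vector query needs no fixed question order, while the tree
guides an analyst who does not know which characteristics matter;
this is one way to weigh the two designs, not a conclusion drawn
in the text.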

Classification requires an agreed-upon vocabulary. Because of the
diversity of terms, two approaches will be explored. The first one
is to develop a thesaurus to map terms to a canonical set. The
second approach is to develop a "little language" to describe
signatures and attack tools and use that to locate relevant entries.
Both approaches have merit.
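
A minimal sketch of the thesaurus approach follows. The synonym
table is hypothetical; only the canonical phrase "obtaining
superuser privileges" is taken from the example above. Terms are
normalized for case and spacing before lookup, and a term with no
entry in the thesaurus is returned in its normalized form.

```python
# Hypothetical thesaurus: variant term -> canonical term.
THESAURUS = {
    "root access": "obtaining superuser privileges",
    "gaining root": "obtaining superuser privileges",
    "superuser compromise": "obtaining superuser privileges",
}


def canonical(term):
    """Map a term to its canonical form, if the thesaurus has one."""
    key = " ".join(term.lower().split())  # fold case, collapse spaces
    return THESAURUS.get(key, key)


print(canonical("  Root   Access "))
```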

2.2 Data Representation

The database will consist of entries using the Standard Generalized
Markup Language, or SGML. SGML is a metalanguage that uses markup
tags, like HTML. Unlike HTML, which has tags that focus on
representation (such as <b> for emboldening), SGML's tags are purely
descriptive. For example, the tag <attack_tool_src_files> means
that what follows is a list of source files for an attack tool; no
formatting information is imparted. Further, the interpretation of
the tag is up to the interpreter. In the context of a database,
the tags will introduce fields and records and identify what each
contains. Because new tags can be created easily, SGML provides
the flexibility needed to augment the database records with new
fields.

All native SGML files are stored in ASCII, and hence can be examined
on almost any computer. Two different translation languages (FOSI
and DSSSL) allow SGML input to be transformed into more conventional
representations, such as HTML, XML, RTF (for Microsoft Word), MIF
(for Adobe's FrameMaker), and ASCII. Tools to perform these
translations are easily available (both commercially and for no
cost). It can also be translated into commands to enter data in
more conventional databases, such as Oracle.

2.3 Putting This All Together

To summarize, each tool and signature will have an entry (record),
and each entry will have many fields (as described above). Each
entry will be a single file of SGML text, and each field will be
delimited by an SGML tag. The entries can be searched by any text
searching tool (as SGML is simply text), although SGML-oriented
tools would be (slightly) more effective.
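
As a rough illustration of searching an entry with an ordinary
text tool, the sketch below pulls tagged fields out of one entry
with a regular expression. It is not a conforming SGML parser (it
ignores attributes, nesting, and tag omission), and apart from
<attack_tool_src_files> the tag names and all field values are
hypothetical.

```python
import re

# Hypothetical entry; only <attack_tool_src_files> comes from the text.
ENTRY = """\
<attack_tool>
<ref_id>E-0001</ref_id>
<description>Illustrative entry only</description>
<attack_tool_src_files>main.c util.c</attack_tool_src_files>
</attack_tool>
"""

# Matches <tag>text</tag> where the text contains no nested tags.
FIELD = re.compile(r"<([A-Za-z_][\w.-]*)>([^<]*)</\1>")


def fields(entry_text):
    """Return a mapping of leaf tag name -> stripped field text."""
    return {tag: value.strip()
            for tag, value in FIELD.findall(entry_text)}


record = fields(ENTRY)
print(record["attack_tool_src_files"])
```

The enclosing <attack_tool> element is skipped because its content
contains other tags; an SGML-oriented tool would handle such
structure properly.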

The research issues are:

1. How do we classify attack tools and attack signatures for easy
   identification by non-experts? What characteristics are most
   important?

2. How do we precisely describe signatures and attack tools? Can
   we develop a formal "little language" to automate the analysis of
   an incident?

3. How can we effectively distribute updates to the database over
   the World Wide Web?  Public key cryptography is one obvious approach,
   but key management raises several issues (especially those of
   protecting cryptographic keys on multi-user machines).

4. To whom is the database to be distributed? Is there any problem
   with it falling into the hands of attackers?

5. How are new signatures and attacks gathered to be entered into
   the database?


References

[Bentley]
	J. L. Bentley, Programming Pearls, Addison-Wesley, Reading, MA
	(1985).
[Fairley]
	R. E. Fairley, Software Engineering Concepts, McGraw-Hill,
	New York (1985).
[153 handout]
	M. Bishop, "Robust Programming," handout for ECS 153,
	Introduction to Computer Security, Department of Computer Science,
	University of California at Davis, Davis, CA (1998).
[PA]
	R. Bisbey II and D. Hollingsworth, "Protection Analysis Project
	Final Report," ISI/RR-78-13, DTIC AD A056816, USC/Information
	Sciences Institute (May, 1978).
[RISOS]
	R. P. Abbott, J. S. Chin, J. E. Donnelley, W. L. Konigsford,
	S. Tokubo, and D. A. Webb, "Security Analysis and Enhancements of
	Computer Operating Systems," NBSIR 76-1041, Institute for Computer
	Sciences and Technology, National Bureau of Standards (Apr. 1976).
[Bish & Bailey]
	Matt Bishop and Dave Bailey, "A Critical Analysis of Vulnerability
	Taxonomies," Technical Report 96-11, Department of Computer
	Science, University of California at Davis (Sep. 1996).
[WECS talk]
	Matt Bishop, "Teaching Computer Security," position paper for the
	Workshop on Education in Computer Security, Monterey, CA (Jan. 1997). 
[OBrien]
	D. O'Brien, "Recognizing and Recovering from Rootkit Attacks,"
	Sys Admin 5(11) (Nov. 1996), pp. 8-20.
[BSM]
	Sun Microsystems, Inc., "Installing, Administering, and Using the
	Basic Security Module," Mountain View, CA (April 1992).
[FHM]
	R. R. Linde, "Operating Systems Penetration," 1975 National Computer
	Conference, AFIPS Conference Proceedings 44, pp. 361-368 (Nov. 1975).