Rationale and Design Goals of DOVES

1. Introduction

Recent compromises of computer systems and programs have made everyone aware of techniques to violate computer security policies. For example, many HTML-oriented mailers fail to check the length of the application attribute of messages. This enables attackers to execute arbitrary programs on remote systems and thereby acquire unauthorized privileges on those systems. Most people do not realize that these problems are not new. They are as old as the computer industry itself.

A vulnerability is an error or inconsistency that allows an entity to gain unauthorized privileges or access, or to alter data in an unauthorized manner. The failure to check the length of the application attribute is a vulnerability in the mailer software. Techniques such as structured design reviews [Fairley] and robust programming [153 handout] reduce the number and seriousness of vulnerabilities. However, too little software is written following these principles.

Attacks or exploits are actions, or sequences of actions, that use vulnerabilities to acquire unauthorized access or privileges. They leave traces, called signatures, composed of information from various sources and showing views of the actions making up the attack (or of the effects of such actions). Data making up signatures includes traces from network connections (such as from tcpdump) and information in log files or system status listings.

Most studies propose models, technologies, or procedures to reduce or eliminate vulnerabilities, or limit their effects. Few studies have examined the underlying causes of vulnerabilities [PA, RISOS, Aslam, Krusl], and none focus on the history of a particular vulnerability or vulnerabilities. The previous studies classify (or taxonomize) the vulnerabilities, but often inconsistently or arbitrarily (see for example [Bish & Bailey]).
The UC Davis Vulnerability Project has five goals:

* To describe the vulnerabilities in a form useful to intrusion detection mechanisms
* To present techniques for finding these vulnerabilities
* To present techniques to inhibit or eliminate exploitation of those vulnerabilities
* To exhibit similarities between instances of vulnerabilities
* To provide a record of known vulnerabilities.

Meeting these goals requires a database of vulnerabilities, attacks, and signatures. Specifically:

* Typically, intrusion detection mechanisms examine signatures of attacks. Ideally, intrusion detection systems could take the signatures as raw data and generate the signature detection mechanism automatically.
* Developing techniques to find, inhibit, and/or eliminate exploitation of vulnerabilities requires testing proposed methods. Known vulnerabilities and attacks are ideal to validate the methods of detection and elimination.
* Inhibiting or eliminating exploitation of known vulnerabilities requires fixing, or ameliorating, those vulnerabilities. But is it possible to inhibit attackers from exploiting unknown vulnerabilities?
* Developing a measure of similarity among vulnerabilities, and making it meaningful, requires test data; the database provides this.
* A record of known vulnerabilities provides references for teaching computer programming [WECS talk] and computer security (especially the flaw hypothesis methodology [FHM] and a posteriori penetration testing techniques).

The database is organized into three distinct sections, with cross-referencing. One section contains vulnerability data, another attack tools and data, and the third signature data.

1.1 Vulnerability Data

The vulnerability database contains data about vulnerabilities. This data is experimentally derived (by observation or experience).
It can be used to test hypotheses about classification, analysis mechanisms, or derivation of other vulnerabilities, as well as theories or models of vulnerability existence, creation, or detection. It can also be used to provide examples of errors or inconsistencies in interface and system design for software engineering classes.

The vulnerability database contains details of vulnerabilities on a number of systems. In many cases, the information is incomplete (and is updated whenever possible). The information in the database provides enough detail to classify the technical nature of the vulnerability according to a variety of schemes. The data falls into six categories:

1. Reference information describes the cataloguing of the vulnerability data.
2. Descriptive information tells about the vulnerability.
3. Exploit information tells about exploits, attacks, and signatures.
4. Repair information describes fixes and detection of attacks using this vulnerability.
5. Classification information describes the classification(s) of the vulnerability. Multiple classification schemes give multiple entries.
6. Bibliographic information lists sources, relevant papers, history of reportage, and related vulnerabilities.

The reference information simply identifies the vulnerability entry so that other entries can cross-reference it.

Descriptive information presents enough background and detail for an analyst to understand the nature of the vulnerability and how and when it arises. It includes the system type, any environmental conditions and programming conditions that must hold for the vulnerability to exist, a description of the vulnerability, and all programs and inputs involved in the vulnerability. The verifiers or reporters for each configuration are listed.

Exploit information describes how to exploit the vulnerability. This includes attack tools, and may be a pointer into the exploit database.
Repair information describes how to detect and fix the vulnerability, both at the system administration level (i.e., without source code) and at the programmatic level (i.e., with source code). This may include pointers to vendor patches when appropriate. Ideally, this section would include instructions in a formal language to automate detection and repair. (Ideally, vendors would supply this part in a common formal language.)

Classification schemes organize vulnerability data to exhibit specific properties. The same vulnerability may be classified in many different ways using many different schemes. The classification information section contains the classification of the vulnerability under various schemes.

Bibliographic information includes who reported the vulnerability, when and where, and pointers to related vulnerabilities and advisories from vendors and/or incident response teams. It also says who validated the vulnerability.

1.2 Exploit Data

The exploit database contains data about exploits, such as attack tools and methods. This is useful when detritus of an attack is discovered. The analyst can locate tools and determine their intended use and which vulnerabilities they exploit. This aids further analysis. The analyst can also trace the genealogy of tools, as variants are in the database. This information may help in understanding where a tool comes from and how it spreads throughout the attack community.

The data is composed of attack tools and methods for exploiting vulnerabilities on particular systems. The tools are tied to particular vulnerabilities in the vulnerability database, and can be used to test for those vulnerabilities. The distribution files are also available. Whenever possible, variants are included [OBrien]. The data falls into five categories:

1. Reference information describes the cataloguing of the attack tool or method.
2. Descriptive information tells about the attack tool or method.
3. Exploit information tells about vulnerabilities and signatures.
4.
Classification information describes the classification(s) of the exploits. Multiple classification schemes give multiple entries.
5. Bibliographic information lists sources, relevant papers, history of reportage, and related exploits.

The reference information simply identifies the exploit entry so that other entries can cross-reference it.

Descriptive information presents enough background and detail for an analyst to understand how the tool works, any necessary preconditions, and how to compile and execute it. It includes the system type, the compiler or interpreter version (if relevant), libraries, a brief description of the algorithm of the tool, and all system programs and inputs involved. The verifiers or reporters for each configuration are listed. If the tool must be changed to work on a particular system, the original attack tool is kept (for comparison purposes) and notes about the modification are stored.

Exploit information describes the vulnerability that the tool or method exploits, and the signature that the tool produces. These may be pointers into the vulnerability and signature databases.

Classification schemes organize attack tools and methods to exhibit specific properties. The same attack tool or method may be classified in many different ways using many different schemes. The classification information section contains the classification of the entry under various schemes.

Bibliographic information includes where the tool was obtained, the author (if known), and pointers to related attack tools and advisories from vendors and/or incident response teams.

1.3 Signature Data

The signature database collects signatures of attack tools and methods on different systems. It is a resource that maps traces of attacks into attack tools, methods, and vulnerabilities.
An investigator can collect logs, traces, and other data from an attack, and use this database to determine what tools may have been used, what vulnerabilities may have been exploited, and other information.

The data is composed of signatures collected from different systems. When possible, pointers to tools and vulnerabilities show what might have caused the signature. (Note that multiple tools may show the same traces; for example, the traces from a network sniffer look the same for any port scanner.) The signature data is sanitized to conceal the particular hosts or sites involved. The data is also reduced to exclude anything unrelated to the attack. The data falls into five categories:

1. Reference information describes the cataloguing of the signature.
2. Descriptive information describes the data composing the signature, possibly with examples.
3. Exploit information tells about attack tools and methods, and vulnerabilities.
4. Classification information describes the classification(s) of the signatures. Multiple classification schemes give multiple entries.
5. Bibliographic information lists sources, relevant papers, history of reportage, and related signatures.

The reference information simply identifies the signature entry so that other entries can cross-reference it.

Descriptive information identifies the system(s) from which the signature was obtained, and any details about that system (for example, if certain logging capabilities must be turned on, or if a specific package - such as Sun's BSM [BSM] - was used). The entry cross-references other signatures from the same attack but obtained on different systems. Ideally, the database will use a little language [Bentley] to describe the signature. This would allow intrusion detection systems to translate the signature into their own internal format. The signatures will be minimal, in the sense that no unnecessary information will be put into the signature.
However, establishing minimality is a research problem, so errors will undoubtedly occur here.

Exploit information describes the attack tool or tools that produce the signature. These may be pointers into the exploit database.

Classification schemes organize signatures to exhibit specific properties. The same signature may be classified in many different ways using many different schemes. The classification information section contains the classification of the entry under various schemes.

Bibliographic information includes where the signature was obtained, the originator (if known), and pointers to related vulnerabilities, signatures, attack tools, and advisories from vendors and/or incident response teams.

2. Details of the Database

The goal of the database is to provide a comprehensive facility to aid analysts in dealing with break-ins and other compromises. This suggests several requirements:

1. The database must be in a non-proprietary database format that can be used on any system. The database should be accessible to users of multiple systems (such as DOS, Windows, UNIX systems, and the Macintosh).
2. The database must allow searching by arbitrary field. Different characteristics of an attack, or attack tool, may be detected, and so must be usable to look for information in the database.
3. The database must be easily extensible. As investigators and researchers gain experience with computer investigations, some data may assume unanticipated importance. Creating a new field in each record must be quick and simple.
4. The database must handle incomplete information. Many times, signatures have no associated attack tools (because they have not been found). The effects may not be known. The ability to search these records should not be impaired by the missing information.
5. Updating the database must be simple. This allows users to keep up to date with the latest known attack tools, methods, signatures, and vulnerabilities.
6.
The interface that provides search capabilities must function on various systems, such as UNIX systems, Windows NT, and Macintoshes.

These requirements lead to classification techniques for attack tools and attack signatures. The use of SGML will provide the needed flexibility and interoperability. The next sections discuss these.

2.1 Classifications

Classification schemes are organizations of data for a particular purpose. As the goal of this database is to locate attack tools and signatures based on detritus and data gathered from attacks, the classification scheme for attack tools and signatures will focus on the effects of the tools and of the signatures. The scheme will use these characteristics of attacks to provide a search mechanism. One possibility is a decision tree; another, a vector of characteristics.

Suppose a version of rootkit is found on a system. (The analysts might first notice the file /dev/ttyp; using this as an "effect" would take them to the rootkit entry.) This attack tool is placed on a system only when the attacker has obtained superuser privileges. The next step in the analysis of the system would be to determine how the attacker acquired these privileges. So, the characteristic in question is "obtaining superuser privileges." If the classification scheme is a vector of characteristics, one could simply ask that all attack tools and signatures with that characteristic be reported. If the classification scheme is based on a decision tree, the tree would provide a set of questions designed to narrow down the possible attack tools and signatures.

Classification requires an agreed-upon vocabulary. Because of the diversity of terms, two approaches will be explored. The first is to develop a thesaurus to map terms to a canonical set. The second is to develop a "little language" to describe signatures and attack tools and use that to locate relevant entries. Both approaches have merit.
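
The vector-of-characteristics search combined with a thesaurus, as described in Section 2.1, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the entry identifiers, the characteristic names, the thesaurus terms, and the function names canonical and find_entries are hypothetical and do not come from the actual database. Entries with no recorded characteristics are tolerated, in keeping with the requirement that incomplete information must not impair searching.

```python
# Hypothetical thesaurus: maps varied analyst terms to one canonical term.
THESAURUS = {
    "root access": "obtains superuser privileges",
    "gains root": "obtains superuser privileges",
    "superuser": "obtains superuser privileges",
    "packet capture": "sniffs network traffic",
}

# Hypothetical entries. Each entry carries a set of canonical
# characteristics (the "vector"). A missing field models incomplete data.
ENTRIES = [
    {"id": "tool-001", "kind": "attack tool",
     "characteristics": {"obtains superuser privileges"}},
    {"id": "sig-001", "kind": "signature",
     "characteristics": {"obtains superuser privileges",
                         "modifies log files"}},
    {"id": "tool-002", "kind": "attack tool",
     "characteristics": {"sniffs network traffic"}},
    {"id": "sig-002", "kind": "signature"},  # characteristics not yet known
]


def canonical(term):
    """Map a free-form term to its canonical form.

    Terms not in the thesaurus pass through unchanged (lowercased).
    """
    key = term.strip().lower()
    return THESAURUS.get(key, key)


def find_entries(term, entries=ENTRIES):
    """Return the ids of all entries that have the named characteristic."""
    wanted = canonical(term)
    return [entry["id"] for entry in entries
            if wanted in entry.get("characteristics", set())]


if __name__ == "__main__":
    # Report every tool and signature tied to gaining superuser privileges.
    print(find_entries("Gains root"))  # ['tool-001', 'sig-001']
```

A set of characteristics was chosen here because it makes the query "report everything with this characteristic" a single membership test; a decision tree would instead ask a sequence of questions to narrow the candidates.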
2.2 Data Representation

The database will consist of entries using the Standard Generalized Markup Language, or SGML. SGML is a metalanguage that uses markup tags, like HTML. Unlike HTML, which has tags that focus on representation (such as <B> for emboldening), SGML's tags are purely descriptive. For example, a tag can indicate that what follows is a list of source files for an attack tool; no formatting information is imparted. Further, the interpretation of the tag is up to the interpreter. In the context of a database, the tags will introduce fields and records and identify what each contains. Because new tags can be created easily, SGML provides the flexibility needed to augment the database records with new fields.

All native SGML files are stored in ASCII, and hence can be examined on almost any computer. Two different translation languages (FOSI and DSSSL) allow SGML input to be transformed into more conventional representations, such as HTML, XML, RTF (for Microsoft Word), MIF (for Adobe's FrameMaker), and ASCII. Tools to perform these translations are easily available (both commercially and for no cost). SGML input can also be translated into commands to enter data in more conventional databases, such as Oracle.

2.3 Putting This All Together

To summarize, each tool and signature will have an entry (record), and each entry will have many fields (as described above). Each entry will be a single file of SGML text, and each field will be delimited by an SGML tag. The entries can be searched by any text searching tool (as SGML is simply text), although SGML-oriented tools would be (slightly) more effective.

The research issues are:

1. How do we classify attack tools and attack signatures for easy identification by non-experts? What characteristics are most important?
2. How do we precisely describe signatures and attack tools? Can we develop a formal "little language" to automate the analysis of an incident?
3.
How can we effectively distribute updates to the database over the World Wide Web? Public key cryptography is one obvious approach, but key management raises several issues (especially those of protecting cryptographic keys on multi-user machines).
4. To whom is the database to be distributed? Is there any problem with it falling into the hands of attackers?
5. How are new signatures and attacks gathered to be entered into the database?

References

[Bentley] J. L. Bentley, Programming Pearls, Addison-Wesley, Reading, MA (1985).

[Fairley] R. E. Fairley, Software Engineering Concepts, McGraw-Hill, New York (1985).

[153 handout] M. Bishop, "Robust Programming," handout for ECS 153, Introduction to Computer Security, Department of Computer Science, University of California at Davis, Davis, CA (1998).

[PA] R. Bisbey II and D. Hollingsworth, "Protection Analysis Project Final Report," ISI/RR-78-13, DTIC AD A056816, USC/Information Sciences Institute (May 1978).

[RISOS] R. P. Abbott, J. S. Chin, J. E. Donnelley, W. L. Konigsford, S. Tokubo, and D. A. Webb, "Security Analysis and Enhancements of Computer Operating Systems," NBSIR 76-1041, Institute for Computer Sciences and Technology, National Bureau of Standards (Apr. 1976).

[Bish & Bailey] M. Bishop and D. Bailey, "A Critical Analysis of Vulnerability Taxonomies," Technical Report 96-11, Department of Computer Science, University of California at Davis (Sep. 1996).

[WECS talk] M. Bishop, "Teaching Computer Security," position paper for the Workshop on Education in Computer Security, Monterey, CA (Jan. 1997).

[OBrien] D. O'Brien, "Recognizing and Recovering from Rootkit Attacks," Sys Admin 5(11) (Nov. 1996), pp. 8-20.

[BSM] Sun Microsystems, Inc., "Installing, Administering, and Using the Basic Security Module," Mountain View, CA (Apr. 1992).

[FHM] R. R. Linde, "Operating Systems Penetration," 1975 National Computer Conference, AFIPS Conference Proceedings 44, pp. 361-368 (Nov. 1975).