OVAL Schema Working Group

1:00-2:30 p.m. EDT, 21 September 2004


Raffael Marty - ArcSight
David Waltermire - Center for Internet Security
Kent Landfield - Citadel
Matthew Wojcik - MITRE
David Proulx - MITRE
Jonathan Baker - MITRE
Andrew Buttner - MITRE
Robert Martin - MITRE
Anton Chuvakin - netForensics
Dan Bezilla - Secure Elements

Meeting Summary

MITRE is in the process of reviewing all schemas, including the OVAL Results and System Characteristics schemas. The SC and OR schemas grew out of development on the Reference Interpreter, and may be too closely tied to that implementation. There's a strong desire for those formats to be more widely useful.

It's definitely possible to get a lot of information from the current structures, but they are clumsy in some ways. Many problems are due to the test ID being carried across the three files: the test ID from an OVAL definition is used in the SC file to identify objects found which correspond to that test, and the OR file uses that same ID to index results.

There are some advantages to the current format: parsing is generally easy, in terms of linking data between the files. Assessment against an SC file consists of going through each definition in the OVAL file, and for each test, looking for that test ID in the SC file to find what matching objects were found on the system. No pattern matching or component resolution is necessary against the contents of an SC file currently. An OR file consists of a true or false for each test ID, and is very easy to parse.
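The test-ID linkage described above can be sketched as follows. The element names, attributes, and IDs here are invented purely for illustration; they are not the actual OVAL schemas of the time.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragments of the three files: the same test ID,
# "tst-100", is the only link between them.
definition = ET.fromstring(
    '<definition id="OVAL123">'
    '<test id="tst-100" path="C:\\app.dll" version="1.0"/>'
    '</definition>')
sc = ET.fromstring(
    '<system_characteristics>'
    '<test id="tst-100"><file path="C:\\app.dll" version="0.9"/></test>'
    '</system_characteristics>')
results = ET.fromstring(
    '<results><test id="tst-100" result="false"/></results>')

# Assessment walks the definition and keys into the other two files by
# test ID; no pattern matching against the SC file is needed.
summary = {}
for test in definition.findall('test'):
    tid = test.get('id')
    found = sc.find(f'test[@id="{tid}"]/file')
    verdict = results.find(f'test[@id="{tid}"]').get('result')
    summary[tid] = (found.get('version'), verdict)
```

This is what makes parsing easy in the current format: every lookup is a direct ID match.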

Another advantage is the limited redundancy between files: information is not replicated between the definition file, the SC file, and the results file. An SC file doesn't contain what vulnerable values are being looked for, but only what was found, and the OR file doesn't have either of those, but only whether the condition matched or not. This means the OR file in particular is fairly lightweight, while still providing a detailed, test-by-test scorecard.

Aside: we do realise that some might want a more bare-bones result format, with a simple list of OVAL-IDs and true/false flags. But the detailed results are seen as a big benefit to the end-user.

Clear disadvantages to the current format exist, though. An SC file may contain significant redundancy, with the same object appearing, with all its data, under many test IDs. This is compounded in the case of tests with pattern matches that match many objects.

Relatedly, because of the unique test IDs, an object which does appear in an SC file (perhaps a historical one) might be missed by a newer test which references it, because that test did not yet exist at data collection time. So test IDs can actually lead to data mismatches.

Also, one cannot have an SC file which holds a "complete" snapshot of a machine without knowing what OVAL definitions will be run against it, because you won't know what test IDs to use to reference the data objects.

Chuvakin: Isn't that last problem a major drawback?

Woj: Yes, we're starting to think so.

Buttner: It depends on the functional requirements for the file. If it's desirable to support that sort of usage, ie holding a snapshot of a machine's configuration in the SC file independent of what definitions might be run against it, then yes. If that's not a requirement, and the goal is only to store data collected for a specific set of OVAL Definitions, then it's not necessarily a problem.

Chuvakin: But if that's not a requirement, I don't see the point of an SC file. It seems that the reason to have the SC file at all is to support that kind of use.

Woj: That's probably true. The SC file does have uses in the current format; there are tests which get re-used by multiple definitions, and the SC file means you don't need to re-evaluate them, etc. But clearly it has some significant limits. Again, the format reflects the development process of the reference interpreter, and we're realising how constraining that is.

Another point to remember is that defining what data to collect is important. It's not feasible to collect the full state of the machine, of course, and we cannot constrain what objects an OVAL definition can refer to. In other words, we can't define a fixed subset of machine state that OVAL needs, because it will eventually be inadequate (and often would contain information unneeded by any definition as well).

The working requirements when we originally designed the format were to contain exactly the data necessary to evaluate a known set of OVAL definitions. It's just become clear that we want to allow wider functionality, and the current format would limit that.

Waltermire: Another disadvantage is that there is no unique object ID in the SC file. If a particular test contains a pattern match, for example in a file path component, there's no unique ID for each of the files which matched. This makes advanced reporting difficult if not impossible: there's no way to link a result in the OR file back to the particular object(s) which actually matched the test.

Baker: That's absolutely right, and it is a problem.

It seems like the right approach is to get rid of the test IDs, and instead have a single entry for each object in the SC file. This would eliminate the redundancy problem. Giving each object an ID which would be unique in the context of the SC file (and its paired OR file) would address the issue Waltermire raised, as well. We've also discussed grouping all data collected for an object under that single entry, even if it's currently used by different test types in the OVAL Definition schema.
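The single-entry, object-ID layout being proposed might look roughly like this sketch; the element names and IDs are invented for illustration, not a schema proposal.

```python
import xml.etree.ElementTree as ET

# One entry per collected object, with an ID unique within this SC file.
# Multiple tests can reference the same entry, so nothing is duplicated.
sc = ET.fromstring(
    '<system_characteristics>'
    '<object id="obj-1" type="file" path="C:\\app.dll" version="0.9"/>'
    '</system_characteristics>')

# Hypothetical test-to-object references, resolved against the SC file.
references = {'tst-100': 'obj-1', 'tst-200': 'obj-1'}
resolved = {tid: sc.find(f'object[@id="{oid}"]')
            for tid, oid in references.items()}
```

Both tests resolve to the same single object entry, which also gives a results file something stable to point back at.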

As a minor enhancement, grouping like objects would also aid readability and ease parsing. Currently tests are not ordered in the SC file.

This proposed format would allow for a "generic snapshot," while still supporting the approach the Reference Interpreter currently uses (ie only collect data on objects referenced by at least one definition in an OVAL file). Any object collected could be used by any test without worrying about the test ID match.

Waltermire: How would we handle specialization of objects? For instance, all files under Windows will share certain characteristics, but different types of files may have different metadata associated with them. How would the schema deal with that?

Baker: Current approach has been to include elements to hold all data that might be associated with an object type; if some data is not available for a specific object, it's just not included. So the Windows file object would include a place to store version information as used by a DLL, and an instance of a text file simply wouldn't include that information.

Woj: It's a good point, though, and we may need to look carefully at the various tests for each object type to be certain we can really group all data together.

The real drawback of this proposed structure is that data retrieval from the SC file actually becomes much harder. Any kind of pattern matching or component resolution which occurs in OVAL definitions will have to be done again to find the appropriate object(s) from the SC file. It essentially means doing the whole data resolution step twice: once on the machine during data collection, and again against the SC file in preparation for doing the analysis.
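That second resolution pass can be sketched as follows; the paths and the pattern are invented for illustration.

```python
import re

# Object entries from a hypothetical object-centric SC file.
collected = [
    'C:\\inetpub\\wwwroot\\default.asp',
    'C:\\inetpub\\wwwroot\\login.asp',
    'C:\\winnt\\notepad.exe',
]

# The same pattern was already evaluated on the live system during data
# collection; it must be evaluated a second time here, against the SC
# file's entries, to find which objects a pattern-match test refers to.
pattern = re.compile(r'.*\.asp$')
matches = [path for path in collected if pattern.match(path)]
```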

Waltermire: A different problem I see with the current structure is that having information split across three different files makes it impossible to use stylesheets to easily produce advanced reporting. XSLT really expects a single input file and a single output. It would be better for stylesheet processing if there was a way to combine all the data into a single file.

Woj: Another way of stating that problem is that it would be helpful for some applications to have all the information available in the OR file: what was looked for, what was found, and whether it matched. But I suspect this is an area where different people will have very different preferences. Some would want a very bare-bones results format, with simple OVAL-ID<->yes/no entries, while others would want everything together, and many will be somewhere in between.

It will be difficult to satisfy everyone. You don't want so little information that the results are barely useful, but you don't want to ship around huge results files or increase the processing overhead by including a lot of information most people won't use.

This moved the discussion on to the topic of the OVAL Results format.

Two Board members shared their experience with either current or planned use of the OR files, and seemed to bear that idea out. Typical initial use would be just looking to identify vulnerabilities found, but both saw potential for using the data to go further, for example to do software inventory type functions. So the current format might have more information than is needed for basic use, but not enough for more advanced applications.

Proulx: One approach might be to structure the OR file with a summary section and a details section. Would this aid parsing and processing?

Marty: What's the disadvantage of having more information in the OR file, or in a combined file? Is it just the size of the file, for transmission and storage, or is it the increased difficulty in parsing? The summary section/details section idea wouldn't address the file size concerns.

Woj: It would allow people to store or transmit only the summary section, if that were all that was required.

Landfield: It depends on what sort of information is in the two sections...if the detailed section is just an expanded version of the summary, it might not be valuable, but if it's different types of information, that's a different story.

Woj: People have definitely asked for a very simple results format, CVE-ID found/not found, and that could potentially be the summary section, and the details could include individual test results, or even more.

Waltermire: Since the summary is just a subset of the detailed results, maybe we should just have two separate results formats, two different files. A downstream application could choose which file it wanted, or the analysis application could be set to generate one or the other, or both.

Buttner: We could expand the standard to include a master file format and a number of transforms which would produce a "Results.A" or "Results.B" file, etc. These would be different subsets of the master file. Downstream apps could support a particular version, and the formats and transforms could be standardized. The analysis app could even just produce the master file, and then run the transforms based on what was wanted.

Woj: My concern with going too far down that path is that it could seriously hinder interoperability between OVAL-compatible applications. You essentially don't have one OR standard any more, but several, and there's no guarantee that two apps would actually be able to work together. The SC and OR formats as standards seem to me to be primarily for interoperability, and if there were too many options, we wouldn't make any progress there.

Waltermire: Also, if you have a real-time application, instead of a batch application, adding additional processing can slow down the real-time capability.

Baker: Seems to me we may need to make the OR schema more flexible, and have some portions of it be optional. Applications could run in a verbose mode or not, and print out various levels of detail based on that.

Martin: Regarding the concern about fracturing the standard, compatibility could be measured against the master format. So a compatible product must be capable of producing the combined file, but could support the subsets as well.

Woj: The problem with that is that it requires some kind of communication protocol between compatible applications, to allow various tools to communicate with each other what they can support, what they prefer, etc. That seems very undesirable.

Waltermire: You'd end up with tools which produce a limited subset for their normal use, or for interoperability with tools from the same vendor, and have maybe the additional ability to produce the full set as an option.

Woj: We've essentially been talking about a few different options. There's the current format. A bare-bones format of true or false for every OVAL-ID, either as a replacement, an option, or perhaps as a summary section. Or some kind of master or index file, that would combine all of the information from the definitions, SC, and results into one.

The master file should actually be straightforward to construct. Particularly if the SC format changes to remove test IDs and moves to an object-centric structure with object IDs, there's very little actual overlap or conflict between the formats. The binary result attribute could be added directly into the structure of the definition file, and the SC information added as a separate section. A results summary section could be added as well. The master file could even be constructed from the three individual files by some sort of transform; it would just be a large amount of data.
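A merge along those lines might be sketched as follows; every element name here is an assumption, not a schema proposal.

```python
import xml.etree.ElementTree as ET

# Hypothetical pieces of the three files.
definition = ET.fromstring('<definition id="OVAL123"/>')
sc = ET.fromstring(
    '<system_characteristics><object id="obj-1"/></system_characteristics>')
results = {'OVAL123': 'true'}

# Fold the binary result directly into the definition, attach the SC
# data as its own section, and add a bare-bones summary section.
master = ET.Element('oval_master')
definition.set('result', results[definition.get('id')])
master.append(definition)
master.append(sc)
summary = ET.SubElement(master, 'summary')
for oval_id, result in results.items():
    ET.SubElement(summary, 'definition', id=oval_id, result=result)
```

Because the three formats barely overlap, the merge is mostly concatenation plus one attribute per definition.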

Waltermire: That approach provides a lot of flexibility, and could address a lot of the concerns on all sides.

Woj: We've had a lot of discussions about some of this at MITRE, and have gone back and forth a lot. For some of these decisions, there are really valid arguments on both sides, and it's hard to see any deciding reasons. Once we have details on proposed changes, we would really benefit from the Board's input in terms of how they would impact potential use cases.

Bezilla: An issue with rolling everything together into one file is that each of the current files has different consumers, so putting everything together would make it harder to deal with just one category.

It seems that regarding the SC file, the concerns are mostly to do with efficiency. Organizing things by object would definitely be more efficient. Having references to tests included with the objects would help. We'd be able to work on efficiencies ourselves, but it would leave the flexibility in the files, so we could leverage the data as we wanted to.

Regarding the results, keeping everything in one file does help with compatibility; letting parts be optional allows us to pick and choose what we want to consume, or produce to share with third parties. I think the objective with the OR format should be compatibility, so transport concerns shouldn't be a big issue. Interoperability is the key with the results.

Schema Revision Process

As mentioned at the Board meeting on 19 August 2004, it's clear that schema revisions will have to be handled more carefully in the future than they have been in the past. As more people use OVAL and build compatible tools or processes, schema changes will have much greater impact, and time will need to be allowed for more careful review and for implementation of new schema versions.

A proposal is to move to a three-phase system. An open discussion phase would start whenever a member of the OVAL community raised concerns about the current accepted schema, or had proposals for new test types, etc. When some consensus is reached that changes are necessary, and details largely worked out, MITRE would publish XSD schema documents defining the proposed new schemas.

These new schemas would then enter a beta-testing phase. During this time, MITRE would work on converting the Reference Interpreters and other tools and processes to the beta schema. Others with OVAL compatible tools or processes would be strongly encouraged to start their conversions as well. The beta period would be used to gain working experience with the new schemas, and hopefully find any issues with the proposals. Comments and change suggestions would be accepted during the beta period. The beta would have to be at least a month, likely even longer, just judging by the amount of work MITRE will need to do to update code and processes.

At the end of the beta period, the schema would be frozen and enter a release candidate phase. Compatible products and services would use this period to implement any changes necessary. This phase would clearly need to last some months.

Marty: Regarding compatibility, if I have a parser for the new version of the results schema, for example, and someone's running an old interpreter that produces the old version, how do we handle that?

Woj: There are clearly a lot of issues with compatibility between versions. It's complicated partly because there are so many different pieces. If there are changes to the definition schema, and they're significant, not only is there a problem with old applications not being able to read definitions in the new format, but MITRE has to convert all of the existing definitions to the new format once the RC period is over.

What do we do with the old format? Do we keep supporting the old version for some amount of time? Are new definitions written in both formats? What if there are new tests in the new version that we can't represent in the old format? Do we just keep however many definitions we had as of the switchover date available in the old format instead?

Similar problems arise with the SC and OR formats, with incompatibilities between versions. That's the idea behind the freeze or release candidate period: to allow people to update their code. Since the files do record what schema version they're compatible with, at least it's possible to key off of that.

Landfield: Related to the work that's involved with supporting a new version, what about the SQL format for OVAL? OVAL is currently supporting both XML and SQL formats, but there are already some things that can't be done in SQL, for example with pattern matches. Is there a reason we're still supporting SQL, rather than just focusing on XML?

Woj: We're essentially still supporting SQL for historical and backwards-compatibility reasons. At the moment, it's also not a lot of work to produce SQL, because it's all done with stylesheets from the XML. When we move to a new version of the schema, it would be a significant amount of work (updating stylesheets and scripts, the SQL interpreters, etc) that we could avoid if we decided to abandon SQL support.

I do have some concerns because the web logs seem to show more downloads for the SQL-based interpreters than the XML ones, but that may well just be from web crawler traffic.

Landfield: I was just wondering if supporting SQL was going to be part of MITRE's burden for the beta phase.

Marty: Why not just pull the plug on SQL and see if anyone complains?

Woj: XML is clearly the focus at this point, and no one has really come to us in person and expressed a preference for SQL, so it's likely that with the next schema version, SQL will go away.

Chuvakin: On another topic, it should be easy to produce an XML conversion routine to convert an old XML format to the new one. That could be a solution for people who are concerned with their agents being unable to read various versions.

Woj: In a lot of cases that should be true, and we should consider it. Whether MITRE has the resources to provide conversion code is unclear.
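A conversion along the lines Chuvakin suggests could be a small transform. The "old" and "new" layouts below are invented purely for illustration; a real converter would also need to deduplicate objects shared by several tests.

```python
import xml.etree.ElementTree as ET

# Invented "old" format: data keyed by test ID.
old = ET.fromstring(
    '<system_characteristics>'
    '<test id="tst-100"><file path="C:\\app.dll"/></test>'
    '</system_characteristics>')

# Invented "new" format: one entry per object, each given its own
# object ID, with the test-ID wrappers stripped away.
new = ET.Element('system_characteristics')
counter = 0
for test in old.findall('test'):
    for obj in list(test):
        counter += 1
        obj.set('id', f'obj-{counter}')
        new.append(obj)
```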

Waltermire: A major problem will be timing OVAL version releases with various vendors' product release cycles, and making sure that the degree of change is kept minimal in each incremental revision. Instead of making 10 changes in each release, it might be better to limit it to two or three, and have more releases. Or have more major changes, but with much more infrequent releases.

Woj: We were actually leaning towards the latter, with more significant changes, but letting them pile up between schema versions, and have new versions rarely. We thought that would be easiest (partly in thinking about the work MITRE has to do).

Definition Schema Version 4

Woj gave a brief introduction to the most sweeping idea for changes in the new version of the Definition Schemas. MITRE has been thinking about separating the object being examined from the attributes of that object which are being tested.

For example, file tests would have an object portion which specified the path of the file(s) to collect information about, and a data section that specifies the test conditions on the attributes of the file(s). This would impact the SC and OR schemas as well, and details will be proposed shortly.
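That separation might look roughly like the following; the element names, the operation value, and the version number are all invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical v4-style file test: the object section says what to
# collect, and the data section states the condition being tested.
test = ET.fromstring(
    '<file_test id="tst-100">'
    '<object><path>C:\\winnt\\system32\\msw3prt.dll</path></object>'
    '<data><version operation="less than">5.0.2195.2</version></data>'
    '</file_test>')

path = test.find('object/path').text
version_check = test.find('data/version')
```

Splitting the two sections means the object portion alone could drive data collection, independent of what conditions any particular test applies.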

More working group teleconferences may be helpful in discussing the changes for version 4, if people could attend.


Page Last Updated: February 07, 2008