On in-model constraints, external constraints, and IDs #2101

aj-stein-gsa · 2025-02-07T05:21:14Z

Apologies for the delay, @iMichaela. Rather than continue in the PR review, I thought it would be better to move this explanation into a separate discussion post.

Regarding #2090, you, as a NIST maintainer, raised some valid concerns, quoted below without further comment:

This means that, unless a GRC tool or platform is leveraging the OSCAL metaschema definitions, nobody else needs those IDs. And IF there are GRC tools/platforms using metaschema definitions, they will ignore the IDs today unless they implement the functionality to use them. Bottom line, this enhancement serves only FedRAMP process today. How can CSPs or agencies consume/benefit from the IDs?

Below, I talk about the benefits of constraint IDs, how software can use them for NIST, FedRAMP, and other use cases, and why they benefit all of us given my desired roadmap for the core OSCAL models.

Refresher

NIST, not just FedRAMP, uses the constraints mechanism to programmatically and declaratively define data requirements. Modern data schema technologies, namely XML Schema 1.1 and JSON Schema Draft 7, support only a subset of completeness and integrity checking requirements, with minimal robustness. Everyone should be using Metaschema, even though its constraints do not serialize into those schema types; it is more robust. Given years of work on this architecture, I thought that was a given with NIST maintainers and did not need discussion, but it bears repeating. Per the Metaschema specification, the various constraint types have a @target and a @test, and error messages, with or without IDs, only meaningfully show up in output when these do not evaluate to true. Additionally, XML Schema and JSON Schema support only simple, path-based targeting in situ within a document model, whereas Metaschema's Metapath expressions for @target and @test are much more expressive (this capability also becomes important later in the explanation regarding IDs). Therefore, there is no case where constraint IDs would ever meaningfully show up in such schemas, as posited below:

If the intention is to propagate the IDs to the XSD and JSON schemas, then the PR is not complete, because the @telos use case described in #2088 is not supported.
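To make the point about @target, @test, and output concrete, here is a minimal Python sketch. It is a toy, not a Metaschema processor: an ElementTree path stands in for the Metapath @target, a Python predicate stands in for the @test expression, and the constraint ID `party-has-type` and the sample document are hypothetical. It shows that a constraint, and therefore its ID, only surfaces in validation output when its test is false for a targeted node; no schema is involved at any point.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ExpectConstraint:
    target: str                          # toy stand-in for @target (ElementTree path, not Metapath)
    test: Callable[[ET.Element], bool]   # toy stand-in for @test (Python predicate, not Metapath)
    message: str
    constraint_id: Optional[str] = None  # the optional @id


def evaluate(doc: ET.Element, constraints: List[ExpectConstraint]) -> List[dict]:
    """Report one violation per targeted node whose test evaluates to false."""
    violations = []
    for c in constraints:
        for node in doc.iterfind(c.target):
            if not c.test(node):
                violations.append({"id": c.constraint_id, "message": c.message})
    return violations


doc = ET.fromstring('<ssp><party type="person"/><party/></ssp>')
rules = [
    ExpectConstraint(
        target="party",
        test=lambda node: node.get("type") is not None,
        message="Each party must declare a type.",
        constraint_id="party-has-type",
    ),
]
# prints [{'id': 'party-has-type', 'message': 'Each party must declare a type.'}]
print(evaluate(doc, rules))
```

Only the second party fails the test, so only one violation (carrying the ID) is emitted; the first party produces no output at all.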

All constraint types have this optional @id attribute. Yes, it does pose a maintenance burden. So why have them, and how do we as a community benefit? I explain those themes below.

Developer Experience

Adding or changing constraints and developer flow

One of the key benefits of giving each constraint an ID is out-of-band communication between developers, as well as easier navigation for a sole developer editing the models. As it stands, you must know which of multiple possible locations in the model, that is, which assembly's <constraint> block, holds one or more constraints. With a stable ID, a developer can easily search for a constraint in an IDE. If multiple developers are cooperating, one can share an ID with another so that both can search for the same constraint. Especially given the open-source OSCAL community, this capability has a lot of value for communication around the models. Again, in-model constraints can be homed in one place but apply everywhere, so stable ID lookup vastly simplifies developer flow and communication. None of this benefit is FedRAMP-specific.
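In an IDE this lookup is just a text search for the ID, but the same lookup is easy to script. A minimal Python sketch, assuming a trimmed, illustrative Metaschema module (a real module carries more required metadata, and the constraint IDs and targets shown are hypothetical): given an ID, it reports which kind of constraint it names, which definition's <constraint> block holds it, and its target.

```python
import xml.etree.ElementTree as ET

# Illustrative, trimmed module; the IDs and targets are hypothetical examples.
MODULE = """<METASCHEMA xmlns="http://csrc.nist.gov/ns/oscal/metaschema/1.0">
  <define-assembly name="system-security-plan">
    <constraint>
      <expect id="ssp-has-import-profile" target="." test="exists(import-profile)"/>
    </constraint>
  </define-assembly>
  <define-assembly name="party">
    <constraint>
      <allowed-values id="party-type-values" target="@type">
        <enum value="person">A person.</enum>
        <enum value="organization">An organization.</enum>
      </allowed-values>
    </constraint>
  </define-assembly>
</METASCHEMA>"""


def local_name(tag: str) -> str:
    # Strip the "{namespace}" prefix that ElementTree adds to qualified tag names.
    return tag.rsplit("}", 1)[-1]


def find_constraint(root: ET.Element, constraint_id: str):
    """Return (constraint kind, owning definition name, target) for an @id, or None."""
    for definition in root:
        for constraint in definition.iter():
            if constraint.get("id") == constraint_id:
                return local_name(constraint.tag), definition.get("name"), constraint.get("target")
    return None


root = ET.fromstring(MODULE)
print(find_constraint(root, "party-type-values"))  # prints ('allowed-values', 'party', '@type')
```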

Collision Detection During Development and Testing

As it pertains to the previous point and the following one, it is in our interest to give each constraint a stable ID during development, with external constraints in mind. Without stable IDs, reliably detecting two similar or identical constraints and determining their precedence becomes very impractical. Our ability to work with the community and let them maintain their own independent constraints is further complicated by the fact that multiple constraint violations cannot be disambiguated without a specific stable identifier.
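A minimal Python sketch of the kind of collision check stable IDs make possible when constraints come from several sources. It assumes constraints have already been loaded into simple records; the source labels and IDs are hypothetical, and @target and @test are compared as plain text, whereas a real tool would need to normalize Metapath expressions before comparing them.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Constraint:
    source: str         # where the constraint is defined (labels here are hypothetical)
    constraint_id: str  # the @id
    target: str         # the @target expression, compared as plain text
    test: str           # the @test expression, compared as plain text


def find_collisions(constraints: List[Constraint]) -> Tuple[Dict, Dict]:
    by_id: Dict[str, List[Constraint]] = defaultdict(list)
    by_rule: Dict[Tuple[str, str], List[Constraint]] = defaultdict(list)
    for c in constraints:
        by_id[c.constraint_id].append(c)
        by_rule[(c.target, c.test)].append(c)
    # The same ID defined more than once: precedence is ambiguous.
    duplicate_ids = {k: v for k, v in by_id.items() if len(v) > 1}
    # The same target and test under different IDs: probably one rule defined twice.
    same_rule = {k: v for k, v in by_rule.items()
                 if len({c.constraint_id for c in v}) > 1}
    return duplicate_ids, same_rule


constraints = [
    Constraint("core", "party-has-type", "party", "@type"),
    Constraint("external-a", "party-has-type", "party", "exists(@type)"),
    Constraint("external-b", "party-type-required", "party", "@type"),
]
dupes, same = find_collisions(constraints)
print(sorted(dupes))  # prints ['party-has-type']
print(sorted(same))   # prints [('party', '@type')]
```

Both checks key on the ID: without one, the first check is impossible and the second can only report that two anonymous rules look alike.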

User Experience

Analysis and filtering of stable IDs

When constraints are violated and a conformant Metaschema processor outputs the default or an optional custom <message/> value, the more metadata about each constraint violation, the better. As it stands, a tool can emit one or more Metapath locations in a document instance, the level, and the message. In the presence of dozens or hundreds of errors, this approach does not allow complete analysis and filtering of error messages. Some filtering is possible, but it is less granular. With stable @ids, one can filter constraint violations during development, ad hoc testing, or production use of a conformant Metaschema processor to handle targeted problems incrementally and not "boil the ocean."
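A minimal Python sketch of the triage this enables, assuming processor output has already been parsed into records with the constraint ID, Metapath location, level, and message; the records and IDs below are hypothetical. Tallying by ID shows which constraints are noisiest, and selecting by ID narrows the work to one problem at a time. Violations of constraints without an @id all collapse into a single bucket that cannot be selected.

```python
from collections import Counter

# Hypothetical processor output: constraint ID (None if the constraint has no @id),
# Metapath location, level, and message.
violations = [
    {"id": "party-has-type", "location": "/system-security-plan/metadata/party[1]",
     "level": "ERROR", "message": "Each party must declare a type."},
    {"id": "party-has-type", "location": "/system-security-plan/metadata/party[2]",
     "level": "ERROR", "message": "Each party must declare a type."},
    {"id": "ssp-has-import-profile", "location": "/system-security-plan",
     "level": "ERROR", "message": "An SSP must import a profile."},
    {"id": None, "location": "/system-security-plan/system-characteristics",
     "level": "WARNING", "message": "A constraint without an @id was violated."},
]


def counts_by_id(violations):
    """Tally violations per constraint ID so the noisiest rules surface first."""
    return Counter(v["id"] or "<no id>" for v in violations)


def select(violations, wanted_ids):
    """Keep only violations of the constraints currently being worked on."""
    return [v for v in violations if v["id"] in wanted_ids]


print(counts_by_id(violations).most_common())
print([v["location"] for v in select(violations, {"ssp-has-import-profile"})])
```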

As it stands today, in the metaschema-framework community implementation of oscal-cli, we provide the @id in command-line output when present and nothing when undefined. When outputting SARIF (an OASIS standard for static analysis results), we cannot provide a stable @id for constraints internal to the OSCAL models because they do not define one, but the SARIF specification requires a "ruleId" (in this instance, an identifier for the violated constraint). We currently generate a UUID when outputting a SARIF file because there is no better approach. Again, constraints can be homed in one place and target many other places, so dynamically computing a "stable" ID from the constraint's location in the model, or by other means, is either infeasible or unlikely to be human-readable.

Because these generated IDs are not stable, IDs exist in some use cases but are not as stable as they could be if we, as a community, required intentionally defined @ids on all constraints as a style rule. This problem is exactly what community members reported to us, separately for NIST RMF and FedRAMP use cases, while using the same tooling.
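A minimal Python sketch of the SARIF behavior described above. It is not the oscal-cli code: it builds a bare-bones SARIF 2.1.0 log by hand, reports every result at level "error" for brevity, and uses a hypothetical tool name and constraint ID. It shows that when a constraint has an @id, the ruleId is stable, and when it does not, a random UUID fallback gives the same violation a different ruleId on every run.

```python
import json
import uuid


def to_sarif_result(violation: dict) -> dict:
    # Use the constraint's @id as the SARIF ruleId when it exists; otherwise fall
    # back to a random UUID, which differs on every run.
    rule_id = violation.get("id") or str(uuid.uuid4())
    return {
        "ruleId": rule_id,
        "level": "error",
        "message": {"text": violation["message"]},
    }


def to_sarif_log(violations: list, tool_name: str) -> dict:
    """Wrap results in a minimal SARIF 2.1.0 log with a single run."""
    return {
        "version": "2.1.0",
        "runs": [{
            "tool": {"driver": {"name": tool_name}},
            "results": [to_sarif_result(v) for v in violations],
        }],
    }


with_id = {"id": "party-has-type", "message": "Each party must declare a type."}
without_id = {"id": None, "message": "A constraint without an @id was violated."}

# The same violation, reported twice, gets two different ruleIds without an @id.
first = to_sarif_result(without_id)["ruleId"]
second = to_sarif_result(without_id)["ruleId"]
print(first == second)
print(json.dumps(to_sarif_log([with_id], "example-validator"), indent=2))
```

Any downstream tool that tracks results across runs by ruleId sees the UUID-identified result as a brand-new rule each time.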

Realistic Testing with Minimal Viable Sample Documents

Related to the above, NIST or any other entity should be able to test document instances against their respective models with the minimal data required. An SSP or POAM is a complex document, and maintaining full documents for testing, just to suppress errors unrelated to the specific scope of a test and its filtered output, adds burden for NIST and FedRAMP maintainers alike.

FedRAMP details the challenges of maintaining whole documents for test harnesses in this decision record; given the number of constraint violations, this approach does not scale without the ability to filter on constraints by stable ID.
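A minimal Python sketch of a scoped test check against a deliberately minimal sample document, assuming processor output has been parsed into records keyed by constraint ID. The helper name and the IDs are hypothetical. Violations outside the test's scope are ignored, so the sample document does not need to be padded out to a full SSP just to silence them; without IDs, the only scoping options would be matching on message text or locations.

```python
def check_scoped(violations, in_scope, expected):
    """Compare only the in-scope constraint IDs against what the test expects.

    violations: processor output as dicts with an "id" key (None if no @id)
    in_scope:   IDs of the constraints this test exercises
    expected:   IDs the minimal sample document is meant to violate
    Returns (missing, unexpected); both empty means the test passes.
    """
    fired = {v["id"] for v in violations if v["id"] in in_scope}
    return expected - fired, fired - expected


# A deliberately minimal SSP trips many unrelated constraints; they are ignored.
output = [
    {"id": "party-has-type"},
    {"id": "ssp-has-import-profile"},
    {"id": "unrelated-rule-1"},
    {"id": None},
]
missing, unexpected = check_scoped(
    output,
    in_scope={"party-has-type", "party-type-values"},
    expected={"party-has-type"},
)
print(missing, unexpected)
```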

Long-term Goals Dependent on Stable IDs

Self-Documenting Constraints like the Models

One of the boons of NIST's architecture for model and documentation maintenance is that it provides inline documentation for the fields, flags, and assemblies of the models across modules, and generates the documentation site directly from those inline documentation fields. We can do the same with constraints. (As it stands, separate from @ids, constraints can optionally support formal-name, description, and remarks fields, and FedRAMP's usage of the underlying Metaschema model supports that now.) By starting with @ids as a beachhead, along with improvements to the NIST XSLT, Java, and other community conformant Metaschema processors, external constraints can be managed identically. This approach could allow for unified or even heterogeneous documents that cross-link constraints and models published by NIST and other community extensions, not just FedRAMP. Based upon the less explicit long-term vision in my reply in this thread (#2101), I updated this description to state that this approach is a desirable long-term goal for the community, and that searchable IDs are a very effective gateway to it, for searching within or across documentation sites. It is something worth calling out.
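A minimal Python sketch of documentation generated from constraint metadata, in the spirit of how the model documentation is generated from inline fields. It is not NIST's XSLT pipeline; it assumes constraints have already been read into simple records, and the IDs, names, and descriptions are hypothetical. The @id serves as the stable, searchable anchor for each entry, and formal-name falls back to the ID when absent.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ConstraintDoc:
    constraint_id: str                  # the @id, used as a stable anchor
    kind: str                           # e.g. "expect" or "allowed-values"
    target: str                         # the @target expression
    formal_name: Optional[str] = None   # optional formal-name
    description: Optional[str] = None   # optional description


def render_index(docs: List[ConstraintDoc]) -> str:
    """Render a plain-text constraint index, sorted by ID for stable output."""
    lines = []
    for d in sorted(docs, key=lambda d: d.constraint_id):
        title = d.formal_name or d.constraint_id
        lines.append(f"{d.constraint_id}: {title} ({d.kind} on {d.target})")
        if d.description:
            lines.append(f"    {d.description}")
    return "\n".join(lines)


docs = [
    ConstraintDoc("party-type-values", "allowed-values", "@type",
                  formal_name="Party Type Values"),
    ConstraintDoc("party-has-type", "expect", ".",
                  formal_name="Party Has Type",
                  description="Each party must declare a type."),
]
print(render_index(docs))
```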

Collision Detection and Filtering Allow Migration to External Constraints

To sum up all the benefits above, the most important long-term benefit of stable IDs is the ability of NIST and the larger community to incrementally externalize all RMF-based constraints, FedRAMP constraints, and all other use-case-specific constraints so they are independent from the core models. Doing so would let all community users, whether they have one use case or several, more effectively pick and choose which sets of constraints and requirements they meet at runtime. If the constraints remain inside each respective model, the only way to extend or customize for a use case is to make pull requests to the NIST models in usnistgov/OSCAL, rebuild tooling, rinse and repeat. Without the ability to filter test outputs and detect collisions as we (and I mean we as a community) externalize constraints, we put that work at further risk and further hinder the long-term goal of a "minimalist core, but flexible requirements per use case" approach that you and others have expressed a desire for. I formalized this goal in #2050 as part of a long-term roadmap, but stable identifiers are a necessary prerequisite to derisk the development process. Otherwise, the community will need to publish derivative models elsewhere and test them thoroughly only in production, without due care in advance. I think we all want to avoid that, and we all share this long-term vision.

Conclusion

I tried to break down the short-term benefits into developer experience, user experience, and long-term capabilities for which this work is a dependency. I hope this explanation clarifies much of the detail and can help you make a decision on accepting the PR.

wandmagic · 2025-02-07T13:01:17Z · Collaborator

I think that adding a formal name field to all constraints would be another great addition. Having identifiers on constraints will help all organizations that use the constraints directly to develop OSCAL content.

iMichaela · 2025-02-07T14:42:45Z · Maintainer

@aj-stein-gsa - Thank you for the explanation. I would like to clarify that NIST (core) OSCAL is defining constraints, but we are not using IDs for the constraints today. NIST generates the OSCAL schemas, and the promise and encouragement from day one of the OSCAL release was that adopters could choose their format of choice. 80% of our adopters prefer JSON, so pushing them to use metaschema (XML format) will do nothing but derail their current adoption process. Encouraging and providing other mechanisms for consuming a minimum number of core constraints, with an ability to enforce domain-specific constraints under specific namespaces, is a better direction coming from NIST, and we are looking forward to working with the community to mature the concept and implement the best approach.
An OSCAL Extension model could cover constraints definitions. At that point, all constraints will be taken out of the models and provided externally - a solution @wendellpiez kept proposing and which I also endorse after discussing it with many global adopters and regulators.
In this context, namespaces for the constraints are also conceptually supported, but it might make more sense to move forward only when they can be consumed by all OSCAL adopters (JSON, YAML, XML). Please note - the order of the formats is based on the community preferences so far.
I would appreciate hearing from JSON OSCAL adopters.

aj-stein-gsa Feb 7, 2025
Author

@aj-stein-gsa - Thank you for the explanation. I would like to clarify that NIST (core) OSCAL is defining constraints but we are not using IDs for the constraints today.

No one uses them at definition time; they are only used in output. I hope that was clear.

NIST generates the OSCAL schemas, and the promise and encouragement from day one of the OSCAL release was that adopters could choose their format of choice. 80% of our adopters prefer JSON, so pushing them to use metaschema (XML format) will do nothing but derail their current adoption process.

I am not sure I understand this point unless we are conflating data format and schema, but I will try to speak to it: Metaschema definitions can be in XML, JSON, or YAML, just like the documents we define and the schemas we output. Historically, they have been written in XML, but we can switch to JSON or YAML at any time. In fact, we are close to releasing an updated version of our FedRAMP website with a search box driven by the constraints file serialized into JSON. When it is public, I can demo it here. So if JSON-first is a consideration, I would say we are on the same page: switching is easy, and derailment is not an issue.

If you want to see all those files: it took me more time to name the files and copy-paste the buffers than to convert them into JSON. Here are the OSCAL SSP Metaschema model module, the FedRAMP external constraints, and the FedRAMP external allowed values, all in JSON.

https://gist.github.com/aj-stein-gsa/10481990ae581c764a7b93e841fb7965

As far as schemas go, constraints are still used to define some of the more basic utility elements that are serialized into XML Schema and JSON Schema. We all use them. Ignoring them means finding an alternative method to encode and serialize them, which will also derail adoption. I am thinking specifically of matches and has-cardinality, for type subclassing and cardinality, respectively. Not managing them properly means maintaining those salient features in downstream schemas by hand, and at that point you might as well throw away much of the OSCAL progress and reinvent alternative solutions. We can discuss this point further with examples if you would like, but it is very important not to miss.

Encouraging and providing other mechanisms for consuming a minimum number of core constraints with an ability to enforce domain specific constraints under specific namespaces is a better direction coming from NIST and we are looking forward to working with the community to mature the concept and implement best approach. An OSCAL Extension model could cover constraints definitions.

Excellent, glad we are on the same page here. Given the explanation above, I hope it becomes clear that a JSON-first transition would benefit vastly from including IDs on constraints (model elements like assemblies, fields, and flags have @names, after all), so this is the one disparity where incremental relocation and migration could be expedited.

At that point, all constraints will be taken out of the models and provided externally - a solution @wendellpiez kept proposing and which I also endorse after discussing it with many global adopters and regulators. In this context, namespaces for the constraints are also conceptually supported, but it might make more sense to move forward only when they can be consumed by all OSCAL adopters (JSON, YAML, XML). Please note - the order of the formats is based on the community preferences so far. I would appreciate hearing from JSON OSCAL adopters.

Unless there are some other non-public community discussions, it would seem my colleagues and I are the most prevalent adopters, and we are JSON-heavy on our team too, even if you see XML artifacts in the repository. I too would like to hear from others, but to return to IDs and the necessity of this explanation: I did all this because of one community adopter thus far. I would like to hear from more, though.

iMichaela · 2025-02-07T14:56:52Z · Maintainer

I am super excited we have consensus on taking the constraints out of the models and better distinguishing the core ones vs. the RMF ones vs. the FedRAMP ones. Unfortunately, we have very different opinions of how to do it, and the global community's perspective will be crucial. NIST is committed to supporting all OSCAL adopters (JSON, YAML, XML), not just XML/Metaschema ones, because this is the promise we made when we launched OSCAL and the reason the NIST team developed Metaschema as the internal means of delivering consistent XML, JSON and YAML schemas from one central set of definitions.

aj-stein-gsa Feb 7, 2025
Author

Unfortunately, we have very different opinions of how to do it, and the global community's perspective will be crucial. NIST is committed to supporting all OSCAL adopters (JSON, YAML, XML), not just XML/Metaschema ones, because this is the promise we made when we launched OSCAL and the reason the NIST team developed Metaschema as the internal means of delivering consistent XML, JSON and YAML schemas from one central set of definitions.

So you want to maintain the same functionality and capabilities across the data formats, just without Metaschema? So far, the community has said nothing on this topic, but I would like to hear more from you and others about it. If, hypothetically, you want NIST to move forward with a different architecture, that is a large undertaking and contrary to the resources you had and have. I think we are very much aligned about the way forward. The serialization format (be it JSON, YAML, or XML) is the easiest part to manage. Can you explain, with specific detail, how we (be it AJ the individual or FedRAMP) do not support all three formats simultaneously? I feel I must be missing something obvious.

iMichaela Feb 7, 2025
Maintainer

Unfortunately, we have very different opinions of how to do it, and the global community's perspective will be crucial. NIST is committed to supporting all OSCAL adopters (JSON, YAML, XML), not just XML/Metaschema ones, because this is the promise we made when we launched OSCAL and the reason the NIST team developed Metaschema as the internal means of delivering consistent XML, JSON and YAML schemas from one central set of definitions.

So you want to maintain the same functionality and capabilities across the data formats, just without Metaschema?

I wrote: "NIST is committed to supporting all OSCAL adopters (JSON, YAML, XML), not just XML/Metaschema ones, because this is the promise we made when we launched OSCAL and the reason the NIST team developed Metaschema as the internal means of delivering consistent XML, JSON and YAML schemas from one central set of definitions." If anyone chooses to use the OSCAL metaschema definitions, we have no reason not to support them, but we will never push the JSON and YAML adopters to shift to the OSCAL metaschema definitions. Their choice speaks loudly to us that XML is not their strength, preference, or supporting technology. It does not matter that I go to XML first. It matters to NIST what the global community needs and wants.

aj-stein-gsa Feb 7, 2025
Author

OK, I will also repeat the part I did not say verbatim last time. It seems your position rests upon a fundamental misunderstanding of the Metaschema specification, the NIST version or any other.

Metaschema is not an XML data format; it is an information modeling format that currently supports all three data formats (XML, JSON, and YAML) and cross-references them. Whether you see it as an internal mechanism or not, that does not change the fact that the constraints hold higher-fidelity data. If anything, if you do not encode those constraints in a machine-readable way and expect others to implement against them, at the very least you want to generate structured or machine-readable documentation so that others can code their own implementations. I will update my original explanation to include one of our other stretch goals: directly generating constraint documentation from the constraints themselves, like the models.

If anyone chooses to use the OSCAL metaschema definitions, we have no reason not to support them, but we will never push the JSON and YAML adopters to shift to the OSCAL metaschema definitions.

Again, this comment is distracting from the main issue. Metaschema definitions are serializable in JSON, YAML, or XML. Did you look at the examples above? It would seem not, but I want to make sure. If you only provide XML Schema and JSON Schema and obscure the Metaschema source data from the community, there is a significant burden to hand-write much of the model documentation that does not exist on the NIST documentation site. How would that work?

Their choice speaks loudly to us that XML is not their strength, preference, or supporting technology. It does not matter that I go to XML first. It matters to NIST what the global community needs and wants.

Again, Metaschema is not "just XML," and I am not sure how to make that clearer. If the global community wants and needs something else, they can speak up, but the community has fallen relatively silent in the last year. I am trying to help build a mechanism for us to work on a shared goal on their behalf. If serializing things to JSON or YAML first would help, I am all for that, but there is a fundamental misunderstanding of what Metaschema does for NIST maintainers and downstream consumers. Perhaps we need to discuss that further.

Is any of this helping to motivate the PR review and move the work forward?

aj-stein-gsa Feb 7, 2025
Author

Well, it would seem from your comment on #2090 that you are going to merge the related change, apart from this technical explanation. I will mark this thread resolved for now.

Answer selected by aj-stein-gsa

iMichaela · 2025-02-07T17:13:37Z · Maintainer

I do not have time to entertain any argument with you over your understanding of my understanding of Metaschema. I stated that it is used to define JSON and XML schemas from a single source which defines OSCAL schemas, and I disagree with your statement that everyone needs to use Metaschema definition files if they want to use OSCAL. THIS IS A DISCUSSION ABOUT CONSTRAINTS' IDs (PERIOD). I would appreciate it if you keep the dialog focused on the technical issues of the topic at hand.
If you want to continue the dialog focused on me - please send a direct email, and I will respond when and if I find it constructive.

aj-stein-gsa Feb 7, 2025
Author

I am confused by the replies and the tone with caps and bold sentences. Where did I say everyone must use it, as opposed to can or may use it? I am asking about the qualifier, not just my statement about using it. I have tried to thoroughly explain why constraint IDs, even if they do not end up in schemas, have material value for near-term and long-term management of information modeling and maintenance of the models by NIST maintainers. Most of my responses were to address feedback about underlying assumptions and to explain that the maintenance cost, with or without IDs, is complicated by downplaying the value of the constraints for maintainers and contributors to the models.

So, to get back to the task at hand: has the explanation actually helped motivate a complete review of IDs and a final decision? If you do not want to receive them, that is fine. We are a community; they can get published elsewhere. I am fine with that approach if it would make you more comfortable. I really try to make the discussion of these topics constructive for you, as a NIST maintainer, so you can understand the burden of decisions. If that is not welcome, that is fine. I wrote a very detailed explanation to aid you in a review at your request, but if most of what I do as a community member here is not seen as constructive, I can take it elsewhere.