Tuesday, July 20, 2010

New Graft Feature: XSD Exports

Graft can now generate XSDs for any model. To try it, visit the export tab of your chosen model.

THE DECISION

We have chosen to generate a ComplexType and an accompanying element of the same name for each class that doesn't represent an xsd primitive type. Each class association is represented within the ComplexType definition as either an element or an attribute, depending on whether the association's type is one of the xsd primitive types. We optionally support a namespace; at this stage, any further external namespaces need to be added to the schema after generation.
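As a rough illustration of the rule above, here is a minimal Python sketch. It is not Graft's implementation. The `build_schema` function, the `XSD_PRIMITIVES` subset and the input format are all hypothetical, and the sketch assumes that primitive-typed associations become attributes while all others become elements. Namespace handling is left out.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)

# Hypothetical subset of the xsd primitive types, for illustration only.
XSD_PRIMITIVES = {"string", "integer", "boolean", "decimal", "date", "dateTime"}


def build_schema(classes):
    """Build an xs:schema from {class name: {association name: type name}}."""
    schema = ET.Element(f"{{{XS}}}schema")
    for class_name, associations in classes.items():
        complex_type = ET.SubElement(schema, f"{{{XS}}}complexType", name=class_name)
        # The sequence is created first so that attributes follow it,
        # which is the order xsd requires inside a complexType.
        sequence = ET.SubElement(complex_type, f"{{{XS}}}sequence")
        for assoc_name, assoc_type in associations.items():
            if assoc_type in XSD_PRIMITIVES:
                # Assumption: primitive-typed associations become attributes.
                ET.SubElement(complex_type, f"{{{XS}}}attribute",
                              name=assoc_name, type=f"xs:{assoc_type}")
            else:
                # Assumption: associations to other classes become elements.
                ET.SubElement(sequence, f"{{{XS}}}element",
                              name=assoc_name, type=assoc_type)
        # The accompanying element of the same name as the ComplexType.
        ET.SubElement(schema, f"{{{XS}}}element", name=class_name, type=class_name)
    return schema


model = {
    "Person": {"name": "string", "address": "Address"},
    "Address": {"street": "string"},
}
print(ET.tostring(build_schema(model), encoding="unicode"))
```

Each class in the hypothetical `model` yields one named ComplexType plus one top-level element of the same name, which is legal in xsd because types and elements live in separate symbol spaces.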

THE DEBATE

XSD generation is a feature that has caused a fair amount of debate at Jodoro. This is because the XSD standard provides many different ways to represent the same concepts.

For example, the Global Justice XML Data Model defines all of its entities and their associations in ComplexTypes and then defines an element of the same name to take each type. Each ComplexType tends to contain a ComplexContent, which in turn may extend an appropriate ComplexType and/or define associations to elements of the defined ComplexType. This model provides maximum flexibility because a valid xml schema could contain any chosen subset of the defined elements. This is particularly useful when many different types of software systems need to communicate consistently with each other.

In contrast, the CellML and FieldML schemas define very few top level elements (in fact one), and tend to fully define associations between ComplexTypes within the ComplexTypes themselves. This allows for a more formally structured approach to "valid" xml structures, which can be helpful in sharing information between very similar systems. Even these two very similarly structured xsds differ: FieldML does not use any namespacing and treats everything as a ComplexType, while CellML uses the "cellml" namespace and defines both ComplexTypes and SimpleTypes.

The Schools Interoperability Framework (SIF) defines an element for each ComplexType, but demonstrates yet another structurally different way to build an xsd: it defines very few named ComplexTypes and creates many anonymous (nameless) ComplexTypes inside their definitions. This directs the consumer's focus to the named ComplexTypes, but makes the ComplexType definitions harder to comprehend (and results in many nameless classes in the domain model). Like CellML, the SIF standard also defines SimpleTypes and attributes, differentiating them from ComplexTypes and elements by whether or not they extend the xsd primitive types.
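The structural differences described above can be made concrete by counting a few features of a schema. The Python sketch below is illustrative only: `summarise_schema` and the tiny `SAMPLE` schema are hypothetical, and `SAMPLE` is not taken from any of the standards mentioned.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema: one top level element with an anonymous ComplexType,
# plus one named ComplexType.
SAMPLE = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="model">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="component" type="Component"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:complexType name="Component">
    <xs:attribute name="name" type="xs:string"/>
  </xs:complexType>
</xs:schema>"""


def summarise_schema(xsd_text):
    """Count top level elements, named/anonymous ComplexTypes and SimpleTypes."""
    root = ET.fromstring(xsd_text)
    named = anonymous = 0
    # iter() walks the whole tree, so nested (anonymous) types are included.
    for complex_type in root.iter(f"{XS}complexType"):
        if complex_type.get("name") is None:
            anonymous += 1
        else:
            named += 1
    return {
        # findall() with a bare tag only matches direct children of the root.
        "top_level_elements": len(root.findall(f"{XS}element")),
        "named_complex_types": named,
        "anonymous_complex_types": anonymous,
        "simple_types": sum(1 for _ in root.iter(f"{XS}simpleType")),
    }


print(summarise_schema(SAMPLE))
```

A schema in the style of the first example would report roughly as many top level elements as named ComplexTypes. A single-root schema reports one top level element, and a schema that leans on nameless types reports a high anonymous count.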

Beyond these structural differences, we also needed to contemplate whether and how we support concepts such as enumerations. The issue for us is that this concept blurs the border between meta-data and data. Technically, each enumeration value is one of the valid instances of the enumeration type. It is tempting (and very common) to define enumerations within xsd schemas, particularly when the values are unlikely to change. However, we would argue that, even if the values won't change, a better approach is to define the enumeration as a code of type string or integer, and to store and maintain the valid values outside of the schema. This provides a cleaner separation between structure and business rules.
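To sketch what keeping the valid values outside the schema might look like: the schema would declare the field as a plain string code, and the consuming application would check it against a separately maintained list. Everything named below (`VALID_CODES`, `OrderStatus`, `is_valid_code`) is hypothetical and only illustrates the idea.

```python
# Hypothetical code list, maintained outside the schema (for example in a
# database table or a configuration file). In the xsd the field would simply
# be declared with type xs:string.
VALID_CODES = {
    "OrderStatus": {"NEW", "SHIPPED", "CANCELLED"},
}


def is_valid_code(code_list, value):
    """Return True if value is currently an accepted code in code_list."""
    return value in VALID_CODES.get(code_list, set())


print(is_valid_code("OrderStatus", "SHIPPED"))
print(is_valid_code("OrderStatus", "LOST"))

# Adding a new value is a data change and leaves the schema untouched.
VALID_CODES["OrderStatus"].add("RETURNED")
print(is_valid_code("OrderStatus", "RETURNED"))
```

The trade-off is that a generic xsd validator no longer rejects unknown codes, so the check has to be made wherever the business rules are enforced.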

We also discussed and debated many other commonly used xsd concepts such as "pattern", "maxLength", "union", "choice", "key", "any" and "all". At this stage we have chosen to leave all of these out, both because they define concepts that we do not currently represent explicitly within our domain modelling tool, and because we feel that, like enumerations, many of their uses are really business rules and arguably shouldn't be defined in the schema. If you have suggestions or issues with our current approach, please email me or support@jodoro.com.

Doug - @douglasenglish .
