ChiMu  
 

MONDO for SGML Developers

v0.1 [mlf-971118]

Overview

This document provides an introduction to MONDO for people familiar with SGML. It will focus on what MONDO does in terms of SGML and how MONDO differs from SGML. This document is supplemental to the MONDO architecture and specification documents, and it should be read in conjunction with them.

MONDO has three major subsystems: the DomainModel, the ObjectBuilder, and the ObjectEncoder. Conceptually, these correspond in SGML to the GroveModel, the GroveBuilder, and the GroveWriter. The following sections compare and contrast MONDO’s DomainModel with SGML’s GroveModel, and MONDO’s ObjectBuilder with SGML’s Parser and GroveBuilder.


DomainModels, DTDs, and Groves

Both MONDO and SGML have models of the information they work with. The core differences are the generality of those models and what type of information they can usefully work with.

SGML actually has two models. The first is the DTD itself. This describes and constrains the possible elements in a document and their relationships. The second is the Grove model. This provides an abstract data model for both DTDs and their instances. The Grove formalizes how SGML applications can be written to work with SGML document information. Both of these models are document-oriented in that certain characteristics of document-oriented information are assumed within the model. The most salient examples are the pure containment hierarchy among elements and having only one attribute (‘content’) allowed to have complex information as its value. When dealing in the domain of documents, these restrictions are not particularly inconvenient.

MONDO uses general object-oriented information models instead of a domain-specific model. Object-oriented information models can be very general, expressive, and understandable. This allows them to model many types of information equally well, which has resulted in an abundance of good analysis work, patterns, and specific models built with them. MONDO wants to model both document-oriented information models and more general models, and OO modeling works very well for those needs.

MONDO uses a very abstract information model: the only assumption is that the domain model will be based on objects and that MONDO is building (or encoding) an instance of that domain model. The domain model could be identical to SGML’s simple grove model or it could be composed of many associations and sophisticated semantics. MONDO provides interfaces for working with a general object model and in no way restricts the sophistication of the model. MONDO’s information model is any object model and MONDO can take advantage of all the tools and techniques for analyzing, designing, and implementing those models.

Why a more general model?

The most important reason MONDO uses a more general model is to better support working with information. Many types of information are difficult to encode and process under the document-oriented restrictions in SGML. At a minimum the encodings are contorted and unnatural, so both the encoding stage (e.g. human entry) and the processing stage require extra work that better modeling capabilities would make unnecessary. This becomes especially noticeable as complexity (breadth and depth of knowledge) goes up, but it can also be shown with simpler examples.

Assume we have a concept of a Date and we can create Date objects for a specific date through:

<Date iso="1997-11-03">

Next we decide we want to create Periods which are between two dates. In an OO model this should be as simple as:

<Period start=<Date ...> end=<Date ...> >

The important part isn’t the syntax, but the easy composition of smaller pieces and simpler models into larger and more sophisticated ones. Because SGML is document-oriented and presentation-model focused it makes these more sophisticated models harder to build.
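
As a minimal sketch of that composition in an object-oriented language (Python is used here purely for illustration; the class and field names are assumptions and not part of MONDO), a Period is simply an object whose two parameters are Date objects:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Date:
    """Hypothetical stand-in for the document's Date concept."""
    iso: str  # e.g. "1997-11-03"


@dataclass(frozen=True)
class Period:
    """A Period is composed from two Date objects."""
    start: Date
    end: Date


# Rough equivalent of <Period start=<Date ...> end=<Date ...> >
period = Period(start=Date(iso="1997-11-03"), end=Date(iso="1997-11-10"))
```

The larger model reuses the smaller one unchanged, which is the point the syntax above is meant to convey.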

Presentation Models vs. Domain Models

SGML has been crucial to enabling document-oriented applications to progress away from processing-oriented markup to information markup. MONDO tries to continue this progress by going beyond SGML’s focus on a Presentation Model to a full information Domain Model.

A Presentation Model describes how information looks and how you can modify those looks. A Domain Model describes the semantics of the information itself: the objects, operations, associations, and rules that are independent of any particular view. Presentation Models need to be derived from the Domain Model so that a particular view does not restrict the potential uses for the information. Conversely Domain Models must satisfy the requirements for what information is needed by all the different presentation models.

Domain models have a larger focus and tighter informational constraints. For example, a domain model must be concerned with normalization: any piece of information should be stored in exactly one place. This allows changes to the information without the risk of inconsistency (such as a Company having an out-of-date name in a particular context). Presentation models are concerned with the needs and restrictions of presenting information, which must be traded off against the purely informational needs.

An example of a presentation model is a Form. The model behind a Form will order and consolidate information from many different places so it is easy to present to the user and easy for the user to interact with. This is certainly not processing-oriented: we are only describing the information necessary for the Form, which can then be used by multiple applications behind the scenes. But the actual information model is not the Form’s model. For example, the Form may pull together multiple fields for a person’s information, a description of the item to be purchased, and the salesperson’s name. The domain model might have a PurchaseOrder that is associated with a Person, the item, and the salesperson. Most of the fields in the Form are really views onto information derived from these associations, and none of these fields are owned by the PurchaseOrder itself.
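
The Form example can be sketched roughly as follows. This is only an illustration in Python: the class names, field names, and the purchase_form_fields function are assumptions made for the sketch, not part of MONDO.

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    address: str


@dataclass
class Item:
    description: str


@dataclass
class PurchaseOrder:
    """Domain object: holds associations, not copies of the fields."""
    buyer: Person
    item: Item
    salesperson: Person


def purchase_form_fields(order: PurchaseOrder) -> dict:
    """Presentation model: a flat view derived from the associations."""
    return {
        "buyer_name": order.buyer.name,
        "buyer_address": order.buyer.address,
        "item_description": order.item.description,
        "salesperson_name": order.salesperson.name,
    }


buyer = Person("Ada Example", "1 Sample Street")
order = PurchaseOrder(buyer, Item("Widget"), Person("Sam Seller", "2 Sample Street"))

# The name lives in one place, so a change shows up in every derived view.
buyer.name = "Ada Q. Example"
form = purchase_form_fields(order)
```

Because the Form's fields are computed from the associations, the PurchaseOrder never holds a second copy of the buyer's name that could go out of date.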

Available Techniques and Tools

A tremendous benefit of using an OO information model is all the tools and techniques available for working with these types of systems. These include modeling tools, analysis patterns, presentation frameworks, databases, CORBA, and OO programming languages. Although SGML has produced a number of excellent tools, it cannot take advantage of mainstream tools without SGML-specific customization. Having a more limited information model causes needless segregation.

Validation

Because MONDO focuses on the information model over the specific encoding, validation must occur during the building stage. Only the DomainModel can know if data satisfies the model’s semantic rules, and any preliminary checking by the PresentationModel only imposes presentation-specific constraints. Such constraints can be both overly restrictive and under-restrictive. As an example of the first, if the previously mentioned Period must have ‘start’ and ‘end’ parameters of Type Date, any objects that satisfy that Type can be used. It does not matter what "tags" they use:

<Period start=<Date ...> end=<Date ...> >

or

<Period start=<TimePoint> end=<Reference> >

as long as the constructed object qualifies as a Type of Date. That is all the information model requires but presentation restrictions (e.g. from an SGML DTD) could limit what tags were used without understanding the implications.

As the other, under-restrictive, example, a Period constructor of:

<Period start=<Date iso="1997-11-05"> end=<Date iso="1494-11-05"> >

is probably wrong in spite of satisfying the simple presentation restriction of using <Date> tags.
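
A minimal sketch of build-stage validation, covering both cases, might look like the following. It is written in Python for illustration only; the Date and TimePoint classes, the as_date method, and build_period are hypothetical names, and the rule that a Period may not end before it starts is assumed from the example above.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class Date:
    """Hypothetical Date type wrapping an ISO calendar date."""
    iso: str

    def as_date(self) -> date:
        return date.fromisoformat(self.iso)


@dataclass(frozen=True)
class TimePoint:
    """Hypothetical precise date and time; usable wherever a Date is."""
    iso: str

    def as_date(self) -> date:
        return datetime.fromisoformat(self.iso).date()


def build_period(start, end):
    """Build a (start, end) pair, enforcing the domain rules at build time."""
    for name, value in (("start", start), ("end", end)):
        # Type rule: anything that can act as a date qualifies, whatever its tag.
        if not hasattr(value, "as_date"):
            raise TypeError(f"{name} does not qualify as a Date")
    # Semantic rule: a tag-level check cannot express this.
    if end.as_date() < start.as_date():
        raise ValueError("Period ends before it starts")
    return (start, end)
```

Here a TimePoint is accepted for either parameter, while two perfectly well-tagged Dates in the wrong order are rejected.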

Summary

MONDO uses very general and descriptive object-oriented information models instead of the document-oriented DTD and grove models. This provides more expressive capabilities, focuses on the information instead of the presentation, and allows the information rules to be placed within the domain model instead of the presentation restrictions. By generalizing the model, MONDO can have more potential capabilities and also immediately have more high-quality tools to support those abilities.

Recipes and Architectural Forms

MONDO uses the term recipe to describe the instructions for building information. All the information that is placed into the DomainModel by MONDO is the result of building recipes. By formalizing recipes we separate the encoding of information (e.g. whether it is human readable and how to parse it) from what information is in the encoding and how to use that information to construct the knowledge in a form we want to work with.

MONDO’s recipes can be viewed as an SGML architecture with only two core tags:

<!ELEMENT   BuildObject - - (Parameter)*>
<!ELEMENT   Parameter   - - (BuildObject)>

<!ATTLIST   BuildObject
    type    CDATA       #REQUIRED
>

<!ATTLIST   Parameter
    name    CDATA       #REQUIRED
>

So all documents reduce to the form:

<BuildObject type="">
    <Parameter name="">
        <BuildObject type="">
            <Parameter name="">
    <Parameter name="">
        <BuildObject type="">

As an architectural form, the MONDO recipe structure is encoded as attributes:

<!ATTLIST anObjectElement
    mondoElementType (BuildObject | Parameter)      #FIXED BuildObject
    mondoTypeName    CDATA       "TypeName"
>
<!ATTLIST aParameterElement
    mondoElementType (BuildObject | Parameter)      #FIXED Parameter
    mondoParameterName    CDATA       "ParameterName"
>

And with actual tags (where a lowercase element is a parameter) we would have the following encoding and the corresponding recipe:

<Period>
    <start>
         <Date>
    <end>
         <Date>

[Figure not shown: the recipe corresponding to this encoding]

This separation of Objects from Parameters provides a structure that makes general information representation easier. We can consistently reuse Object tags in many different contexts and the Parameter tags specify the relationships among the information. This information can then go into the build process and be used in creating very flexible and descriptive models.
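
As a rough sketch of how such a recipe could drive a build process (Python, for illustration only; the Recipe class, the build function, and the constructors table are assumptions and do not describe MONDO's actual interfaces):

```python
from dataclasses import dataclass, field


@dataclass
class Recipe:
    """BuildObject: a type name plus named parameters.

    Each parameter value is either another Recipe or a plain string.
    """
    type_name: str
    parameters: dict = field(default_factory=dict)


def build(recipe, constructors):
    """Recursively build the parameters, then call the constructor for the type."""
    if isinstance(recipe, str):  # primitive value, passed through unchanged
        return recipe
    kwargs = {name: build(value, constructors)
              for name, value in recipe.parameters.items()}
    return constructors[recipe.type_name](**kwargs)


# Hypothetical domain constructors; plain tuples stand in for domain objects.
constructors = {
    "Date": lambda iso: ("Date", iso),
    "Period": lambda start, end: ("Period", start, end),
}

recipe = Recipe("Period", {
    "start": Recipe("Date", {"iso": "1997-11-03"}),
    "end": Recipe("Date", {"iso": "1997-11-10"}),
})
period = build(recipe, constructors)
```

The Date recipe is reused unchanged for both parameters; only the parameter names say what role each Date plays in the Period.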

Primitive Data Types

Although the BuildObject and Parameter tags are sufficient for encoding very general information they are inconvenient for two common types of information: Strings of characters and Lists of Objects. Because these types are so common we will probably want element types for them:

<!ELEMENT   BuildObject - - (Parameter)*>
<!ELEMENT   BuildList   - - (BuildObject)*>
<!ELEMENT   BuildString - - (#PCDATA)*>

<!ELEMENT   Parameter   - - (BuildObject | BuildList | BuildString)>

This allows us to encode the following more easily:

<List>
    <Holiday>
         <name><String>New Years</String>
         <days><Period><start><Date><end><Date>
    <Holiday>
         <name><String>Summer</String>
         <days><Period><start><Date><end><Date>

Attributes as Parameters

Notice that

    <Holiday>
         <name><String>New Years</String>

looks very similar to:

    <Holiday name="New Years">

For MONDO’s builder they should be treated identically. MONDO unifies attribute values with "parameterized" ‘content’ so it does not have to worry about how information was encoded, just what that information means.
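
The unification can be sketched as below, assuming well-formed XML input (with explicit end tags) and Python's standard xml.etree.ElementTree parser. The to_recipe function and the tuple it returns are illustrative assumptions, not MONDO's builder.

```python
import xml.etree.ElementTree as ET


def to_recipe(element):
    """Turn an object element into (type_name, {parameter_name: value})."""
    # Attributes are parameters whose values are plain strings.
    parameters = dict(element.attrib)
    # Child elements are parameters; their single child is the value.
    for param in element:
        children = list(param)
        if not children:
            raise ValueError(f"parameter {param.tag!r} has no value")
        value = children[0]
        if value.tag == "String":
            parameters[param.tag] = value.text or ""
        else:
            parameters[param.tag] = to_recipe(value)
    return (element.tag, parameters)


as_content = ET.fromstring(
    "<Holiday><name><String>New Years</String></name></Holiday>")
as_attribute = ET.fromstring('<Holiday name="New Years"/>')
```

Both encodings reduce to the same recipe, so everything after this step can ignore how the name was written down.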

Content Models in terms of Parameters

By separating Objects from Parameters we have dramatically simplified the possible content models available in an SGML document. Because content models are the main description of the information model in SGML, this simplification seems like it would cause information to be lost. Generally that is not the case and the recipe will contain the same amount of information either in the original form or in a newer form (possibly with a new tag). Another possibility is that the SGML DTD described presentation/input restrictions that had nothing to do with the actual information. MONDO recipes and DomainModels rarely contain presentation restrictions and so they would be removed without losing the information itself.

Content models are composed of tokens, connectors (‘&’ ‘|’ ‘,’), groupings, and occurrences (‘?’ ‘*’ ‘+’). These combined in different ways can describe very different information models. The following sections will give default translations to the MONDO recipe form of the information for each of these types of connectors and occurrences. This will provide an example of how the encodings are related at the information level.

And ‘&’

The SGML and (‘&’) connector (or the desire to use it in XML where it is not available) is frequently the sign of a parameter type of relationship and will remain in that form with MONDO. In the example:

<Period>
    <start><Date>
    <end><Date>

The ordering of the parameters is irrelevant. It may be convenient to consider ‘start’ to be first, but no less information is available if the order is reversed.

<Period>
    <end><Date>
    <start><Date>

In MONDO recipes, parameter order is never significant. This makes the ‘&’ connector the natural choice, but it does not prevent an arbitrary ordering from being required (possibly for user consistency). You may need to add parameter tags to identify the roles of the and-ed elements in the construction of the containing object/element.
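
One way to picture order-independence is to collect parameters by name rather than by position. The sketch below assumes well-formed XML and uses Python's standard xml.etree.ElementTree; parameters_of is a hypothetical helper, not part of MONDO.

```python
import xml.etree.ElementTree as ET


def parameters_of(xml_text):
    """Map each parameter name to the tag of its value, discarding document order."""
    element = ET.fromstring(xml_text)
    return {param.tag: param[0].tag for param in element}


start_first = parameters_of(
    "<Period><start><Date/></start><end><TimePoint/></end></Period>")
end_first = parameters_of(
    "<Period><end><TimePoint/></end><start><Date/></start></Period>")
```

The two mappings compare equal, because the parameter tags, not their positions, carry the roles.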

Or ‘|’

The or (‘|’) connector can be among parameters or among objects. Among parameters it is usually just a simplification: instead of using an and connector with optional indicators, you specify an or relationship that allows a parameter to be specified multiple times, but expect each parameter to be specified only once.

Among objects, the or connector is frequently the sign of multiple data types being available for a given parameter. Or said differently, it is the sign of an abstract data type that encompasses multiple more-concrete data types. For example, if we had both a <Date> and a <TimePoint> (where a TimePoint is a precise Date and Time), we could want <Period> to work for both:

<Period>
    <start><TimePoint>
    <end><TimePoint>

MONDO resolves type validation at the Build stage, not the encoding stage. This allows different recipes that build the same type of object to be interchanged without having them all be explicitly enumerated. SGML is more restrictive at the encoding stage (unless used with ANY content), so frequently you will have to provide abstract type entities that enumerate (and or together) all the concrete elements.

<!ENTITY % Type.Date    "Date | TimePoint | DateReference">

<!ELEMENT Period                   - - (start & end)>
<!ELEMENT start                    - - (%Type.Date)>
<!ELEMENT end                      - - (%Type.Date)>

Seq ‘,’

A seq (‘,’) connector could either be an arbitrary choice over an and connector for user consistency, or it may be part of a complex content model that has an unnamed internal object. In the first case, the seq connector can still be used, but it is among parameters as with the and connector, and the ordering will be ignored by MONDO. In the second case, it would be better to replace the unnamed object with an explicit one. For example, in the DocBook DTD we have forms like:

((((%component.gp;)+, RefEntry*) | RefEntry+), (%nav.gp;)*)

And for MONDO you would normally want to explicitly represent the inner groups so the information is more fully described and workable within the DomainModel.

Opt (‘?’)

The opt (‘?’) occurrence indicator is usually a sign of an and connector applied to parameters. Parameters are frequently optional because the builder can construct the object with a default value if a parameter is missing, or because the builder may construct completely different objects depending on which parameters are present. An example of the latter would be <Date> and <Date iso="1997-10-13">. A recipe for a <Date> without any parameters would indicate building today’s date as opposed to a specified date.
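
The <Date> case might be sketched as a builder whose parameter is optional. This is an illustrative Python function with a hypothetical name, build_date, and it uses the standard library's date type in place of a real domain object.

```python
from datetime import date
from typing import Optional


def build_date(iso: Optional[str] = None) -> date:
    """Build a date; with no 'iso' parameter, fall back to today's date."""
    if iso is None:
        return date.today()
    return date.fromisoformat(iso)
```

Calling build_date() corresponds to <Date>, and build_date("1997-10-13") corresponds to <Date iso="1997-10-13">.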

Rep and Plus (‘*’ ‘+’)

The rep (‘*’) and plus (‘+’) occurrence indicators are part of lists.

Full conversion example

As a full conversion example, consider the DocBook element for Bibliography:

<!ELEMENT Bibliography - - (DocInfo?, (Title, TitleAbbrev?)?,
      (%component.gp;)*, (BiblioDiv+ | BiblioEntry+)) >

This might be converted into the following Object and Parameter separations:

<!ELEMENT Bibliography - - (info? & title? & pretext? & entries)>

<!ELEMENT info - - (DocInfo)>
<!ELEMENT title - - (%Title.Types;)>
<!ELEMENT pretext - - (ComponentList)>
<!ELEMENT entries - - (%Type.BiblioItemLists;)>

<!ENTITY % Type.BiblioItemLists "BiblioDivList | BiblioEntryList">
<!ELEMENT BiblioDivList - - (BiblioDiv+)>
<!ELEMENT BiblioEntryList - - (BiblioEntry+)>

<!ELEMENT ComponentList - - (%component.gp;)*>

Obviously this expanded the original specification quite a bit. Most of this expansion was the result of implicit information in the DTD that an application would have to make explicit to work with the document. For example, a "BiblioDiv+" is a list of BiblioDiv items. Nothing in the DTD requires that interpretation (e.g. the first BiblioDiv could just as well be attached to the preceding Title element), but we should probably not leave it to an application to guess. The Object and Parameter form is much more explicit about information, which requires more elements when viewing any local structure.

Viewed in a larger context, the Object and Parameter form may not be as significantly expanded. The ‘title’, ‘info’, and ‘pretext’ elements could be reused in multiple contexts, which would leave only the ‘entries’ parameter and the explicit ‘Lists’ as new additions. Finally, the ‘&’ connectors are optional and could still be written as ‘,’ connectors to force an order on the entry (or for XML).

<!ELEMENT Bibliography - - (info? , title? , pretext? , entries)>

Summary

MONDO’s parsing and building process can be considered an SGML application with a very simple architectural form. This architectural form separates elements that specify to build objects from the elements which describe parameters (a.k.a. ingredients) required for building objects. This organizes and more explicitly documents the information model. It allows reusing Object tags in many different contexts and uses the Parameter tags to specify the relationships needed among the objects during the building process.

This document exposes some of the concepts of MONDO from an SGML perspective; it does not describe how MONDO actually works with SGML/XML as the parser/encoder front end. MONDO has to work with existing SGML documents, which requires functionality that is not described here. For example, MONDO uses a default model for SGML documents where each object has a ‘content’ attribute. Similarly, an SGML element has to have the same content model wherever it is used, which leads to conflicts with simple parameter names. The full description of how MONDO uses SGML/XML as a front end parser is in the architecture and design documents.

Conclusion

MONDO is a small but significant continuation of the progress from processing-oriented markup to information modeling that SGML made happen. MONDO continues this progress by:

  • Using very general and descriptive object-oriented information models
  • Focusing on domain rules instead of presentation restrictions
  • Separating information "recipes" from the encoding used to represent them

MONDO might best be described in relation to SGML as converting Charles Goldfarb’s quote to:

"Markup should describe information rather than specify the processing to be performed on it."

Note that the HTML version of this document does not have footnotes. Please see the PDF version for the references.

 