What’s this thing called CMIS? Part 2
Domain Model I, Repositories, Types and Properties
In the first part of the series we learned about the general structure of the standard, its idea and purpose. Now it is time to take a closer look at the core of the standard: the CMIS domain model. The domain model describes the services, their methods and the fundamental data structures.
Older articles in this series are:
Part 1: Overview, domain model and bindings
When you use CMIS, the first object you have to deal with is always a repository. A repository is the topmost object in CMIS and can be seen as a data store. One CMIS implementation can maintain multiple repositories. Different repositories can contain different kinds of data, or they can contain data stored in different physical locations. Different repositories may also have different capabilities (for example because they are optimized for a specific need). A repository can even be a virtual entity unifying different physical repositories under one umbrella. All this is up to the vendor and CMIS implementer; the standard makes no assumption about how one repository differs from another. But each CMIS implementation must have at least one repository (identified by a repository id). Repositories are accessed in CMIS using the RepositoryService.
A client always needs a URL to access a CMIS server. For the WS (web service) binding you need a URL for each CMIS (web) service. For the AtomPub binding you need only one URL, which identifies the AtomPub service document. With AtomPub you can easily get started using just a browser. For example, if you set up OpenCMIS from the Apache Chemistry project, you get a default repository for testing purposes that you can deploy as a .war application in a servlet container like Tomcat. (To be precise, there are two different servers: one that acts on a file system and one that is just an in-memory server.) If you build OpenCMIS you will find a file chemistry-opencmis-server-inmemory-0.1-incubating-SNAPSHOT.war. For easier handling we will rename it to opencmis.war and copy it to the webapps directory of the Tomcat installation.
Now open a browser and type in the address field:
If everything is okay, the browser will show a dialog box asking what to do with the data of type application/atomsvc+xml. Just save the response to a file named opencmis.xml and open it in a text editor. You will see a standard XML document containing an entry for each repository (in this case only one). This XML contains a number of links that you can use to access the information in this repository and to make the other CMIS calls. If you would like to continue along this path, there is an excellent tutorial by Jeff Potts: Getting Started with CMIS.
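To give an idea of what a client does with the saved service document, here is a minimal Python sketch that lists the repositories (AtomPub workspaces) and their collection links. The XML namespaces are those of the CMIS 1.0 AtomPub binding; the embedded sample document is hand-written and heavily abbreviated, and its repository id, title and URL are made up for illustration. A real response from your server will contain much more.

```python
import xml.etree.ElementTree as ET

# Namespaces used by the CMIS 1.0 AtomPub binding.
NS = {
    "app": "http://www.w3.org/2007/app",
    "atom": "http://www.w3.org/2005/Atom",
    "cmis": "http://docs.oasis-open.org/ns/cmis/core/200908/",
    "cmisra": "http://docs.oasis-open.org/ns/cmis/restatom/200908/",
}

# Hand-written, abbreviated sample. All ids, titles and URLs are invented.
SAMPLE = """
<app:service xmlns:app="http://www.w3.org/2007/app"
             xmlns:atom="http://www.w3.org/2005/Atom"
             xmlns:cmis="http://docs.oasis-open.org/ns/cmis/core/200908/"
             xmlns:cmisra="http://docs.oasis-open.org/ns/cmis/restatom/200908/">
  <app:workspace>
    <atom:title>Example Repository</atom:title>
    <app:collection href="http://example.invalid/atom/A1/children?id=100">
      <atom:title>Root Collection</atom:title>
      <cmisra:collectionType>root</cmisra:collectionType>
    </app:collection>
    <cmisra:repositoryInfo>
      <cmis:repositoryId>A1</cmis:repositoryId>
      <cmis:rootFolderId>100</cmis:rootFolderId>
    </cmisra:repositoryInfo>
  </app:workspace>
</app:service>
"""


def list_repositories(service_xml):
    """Return one dict per workspace (repository) in a service document."""
    root = ET.fromstring(service_xml)
    repositories = []
    for workspace in root.findall("app:workspace", NS):
        info = workspace.find("cmisra:repositoryInfo", NS)
        # Map each collection type (e.g. "root") to its link.
        collections = {
            c.findtext("cmisra:collectionType", namespaces=NS): c.get("href")
            for c in workspace.findall("app:collection", NS)
        }
        repositories.append({
            "id": info.findtext("cmis:repositoryId", namespaces=NS),
            "title": workspace.findtext("atom:title", namespaces=NS),
            "collections": collections,
        })
    return repositories


for repo in list_repositories(SAMPLE):
    print(repo["id"], repo["title"], repo["collections"])
```

To try it on your own server, read the saved opencmis.xml file and pass its content to list_repositories instead of SAMPLE.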
If you want to test the web service binding, enter the following URL in your browser:
You will get the WSDL for the RepositoryService as the response. You can import this WSDL into a tool of your choice, which will typically generate a client stub for your target platform.
Great, you have now made your first CMIS calls. Let’s look a bit deeper at what is available!
You can also ask a repository about its capabilities. Not all repositories provide the same functionality, but a CMIS client should be able to discover whether it can interact with a given repository. CMIS supports a wide range of potentially very different repositories (even a file system can be mapped to CMIS), but many clients require a minimum set of functionality to operate. We will explain the supported capabilities when we discuss the corresponding functionality in the next sections.
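A client-side capability check might look like the following Python sketch. The capability names and the values for capabilityQuery and capabilityContentStreamUpdatability are taken from the CMIS specification; the concrete values of this repository and the helper function missing_capabilities are invented for illustration.

```python
# Capabilities as a client might hold them after reading the repository info.
# The values are invented for a hypothetical repository.
repository_capabilities = {
    "capabilityMultifiling": False,
    "capabilityUnfiling": False,
    "capabilityQuery": "metadataonly",
    "capabilityContentStreamUpdatability": "pwconly",
}


def missing_capabilities(capabilities, required):
    """Return the names of required capabilities the repository lacks.

    `required` maps a capability name to the set of acceptable values.
    """
    return sorted(
        name
        for name, accepted in required.items()
        if capabilities.get(name) not in accepted
    )


# What a hypothetical client needs in order to operate.
client_needs = {
    "capabilityQuery": {"metadataonly", "bothseparate", "bothcombined"},
    "capabilityMultifiling": {True},
}

print(missing_capabilities(repository_capabilities, client_needs))
```

A client can run such a check once after connecting and either refuse to work with the repository or switch off the features that depend on the missing capabilities.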
Now we have access to a repository. If we want to store millions of documents there and retrieve them efficiently later, we need some structure to manage all this information. Most repositories use types as one mechanism to structure this information. If you look at the local hard disk of your computer you see the familiar hierarchical file system consisting of files and folders. A typical ECM system provides more mechanisms to differentiate between various kinds of files and folders. Files and folders (and there are even more kinds of objects) have a type. A type indicates what kind of object something in the repository is, and it defines properties that can be filled with values. You already know some properties from the file system, like the creation date or the time of the last modification. These system properties are supported by CMIS as well. In addition, a user (or in most cases some kind of administrator) can assign user-defined properties to an object.
Think about an email, for example: an email has a subject, a sender and a list of recipients. An invoice in an ERP system might have an invoice number, an associated order number, an amount and a customer number. Because all emails are similar and all invoices have the same structure, we can define a type email and a type invoice. Such a type definition can enforce constraints. For example, we can define that every email must have a sender and may have a subject. For the invoice we may be even more restrictive and define that it must have an invoice number and that this number must not be shorter than a minimum length and must not exceed a maximum length. Each property has a type and a set of possible additional constraints. CMIS supports the following property types: string, boolean, decimal, integer, datetime, uri, id and html.
Each property type has a set of possible constraints. Each type definition contains the actual constraints for its properties. In addition, a property can be marked as mandatory (it must be provided by a client) and as read-only or modifiable. The queryable attribute on a property indicates whether you can use it in a query. CMIS also allows you to require that a property value is one element of a fixed or an extensible list of predefined values (for example, a property color must have a value of red, green, yellow, blue or black). These values can even be hierarchical, as in fruits/apples, cherries, strawberries; vegetables/tomatoes, cucumbers, beans. A property can be single-valued or multi-valued (for example, an author property of a type book may allow multiple authors). Each property and each type has an id, a name and a description. This list is not complete, and property and type definitions are more complex than described here, but you should now have an impression of what you can do with types and properties in repositories. Types can enforce certain rules on the data in your repository, and they also provide a means to find objects in the repository efficiently (which we cover later). For further details about types and properties take a look at the CMIS specification.
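The constraints described above can be made concrete with a small Python model. This is only a sketch of the idea, not the CMIS data structure: the class PropertyDefinition, its field names and the error messages are invented here, and a plain Python type stands in for a CMIS property type.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PropertyDefinition:
    id: str
    value_type: type                  # simplified stand-in for a CMIS property type
    required: bool = False            # "mandatory" in the text above
    multi_valued: bool = False
    choices: Optional[list] = None    # predefined values, if any
    open_choice: bool = False         # True: the list of choices is extensible
    max_length: Optional[int] = None  # only checked for strings

    def validate(self, value):
        """Return a list of constraint violations (empty if the value is fine)."""
        if value is None:
            return [f"{self.id}: a value is required"] if self.required else []
        if self.multi_valued != isinstance(value, list):
            return [f"{self.id}: wrong cardinality"]
        errors = []
        for item in (value if self.multi_valued else [value]):
            if not isinstance(item, self.value_type):
                errors.append(f"{self.id}: {item!r} has the wrong type")
            elif (self.choices is not None and not self.open_choice
                  and item not in self.choices):
                errors.append(f"{self.id}: {item!r} is not an allowed choice")
            elif (self.max_length is not None and isinstance(item, str)
                  and len(item) > self.max_length):
                errors.append(f"{self.id}: {item!r} is too long")
        return errors


color = PropertyDefinition(
    "color", str, choices=["red", "green", "yellow", "blue", "black"])
authors = PropertyDefinition("authors", str, required=True, multi_valued=True)

print(color.validate("red"))
print(color.validate("pink"))
print(authors.validate(["Alice", "Bob"]))
print(authors.validate(None))
```

In a real repository these checks happen on the server; a client reads the property definitions to find out in advance which values will be accepted.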
It is important to note that the current version of CMIS does not allow you to create type definitions from a client. You have to use either the vendor-specific API of your repository or an administrative user interface to create a type definition. CMIS does, however, expose all the types that currently exist in a repository. The methods to do this are part of the RepositoryService (for example getTypeDefinition). Each type definition has a unique id within the repository, and each property definition has a unique id within the type definition.
CMIS also has some predefined types: cmis:document for documents and cmis:folder for folders. These types are generic and do not have any additional properties (except the standard properties that exist on every type). Other predefined (optional) types like cmis:relationship and cmis:policy will be covered in one of the next articles. The cmis: prefix indicates that these type definitions are part of the standard. This namespace is reserved for future extensions of the standard and must not be used for any other purpose.
Imagine you have stored thousands of invoices in your repository and now the government introduces a new law that requires each invoice to include a tax number. No problem: you just add a new property “TaxNumber”. But on the one hand you would like to make the new property mandatory because it is required by law; on the other hand you already have thousands of invoices from the past without such a number. You could create a new, independent type definition for the new invoices, but this makes things complicated when you search for invoices: you suddenly have to deal with two invoice types instead of one, which is confusing for end users. For situations like this (and many others) it is convenient that you can derive a type from an existing type. So you can introduce a type “InvoiceWithTaxNo” that inherits everything from “Invoice” but adds a mandatory property for the tax number. Because an object of type InvoiceWithTaxNo behaves exactly like an Invoice, a query for invoices can include the subtypes. This mechanism is called type inheritance, and CMIS supports it. The structure describing a type definition indicates its parent type. The topmost parent is called the base type and is always one of the predefined types cmis:document, cmis:folder, cmis:relationship or cmis:policy.
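The invoice example can be sketched in a few lines of Python. The class TypeDefinition and its methods are invented for this illustration and are much simpler than the type definitions a CMIS repository returns; the standard properties of cmis:document are omitted.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TypeDefinition:
    id: str
    parent: Optional["TypeDefinition"] = None
    # Properties introduced by this type: property id -> mandatory?
    own_properties: dict = field(default_factory=dict)

    def all_properties(self):
        """Own properties plus everything inherited from the parent chain."""
        inherited = self.parent.all_properties() if self.parent else {}
        return {**inherited, **self.own_properties}

    def base_type(self):
        """Walk up the parent chain to the topmost type."""
        current = self
        while current.parent is not None:
            current = current.parent
        return current

    def is_a(self, other):
        """True if this type is `other` or is derived from it."""
        current = self
        while current is not None:
            if current is other:
                return True
            current = current.parent
        return False


document = TypeDefinition("cmis:document")  # standard properties omitted
invoice = TypeDefinition("Invoice", document, {"InvoiceNumber": True})
invoice_with_tax_no = TypeDefinition(
    "InvoiceWithTaxNo", invoice, {"TaxNumber": True})

# A "query" for invoices that includes the subtypes.
objects = [("inv-1", invoice), ("inv-2", invoice_with_tax_no), ("doc-1", document)]
print([object_id for object_id, t in objects if t.is_a(invoice)])
print(invoice_with_tax_no.all_properties())
print(invoice_with_tax_no.base_type().id)
```

Old invoices keep the type Invoice and stay valid without a tax number, while new invoices of type InvoiceWithTaxNo must carry one, and both are found by a search for invoices.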
Folders are a second mechanism to structure information in a repository. Files and folders are a very common structure known from file systems. Each folder in a repository has exactly one parent folder, with the exception of the root folder. The id of the root folder can be retrieved with getRepositoryInfo(). The NavigationService contains methods to browse the folder hierarchy. Unlike folders in a file system, folders in CMIS are typed (see above) and therefore have properties. A folder can also impose additional constraints, for example allow only objects of certain types as children. CMIS does not require that all children in a folder have a unique name; it is up to the implementation whether to enforce this constraint.
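The two folder rules mentioned above (exactly one parent except for the root, and an optional restriction on the types of children) can be illustrated with a small Python sketch. The class Folder and its methods are invented for this example and do not correspond to a CMIS API.

```python
class Folder:
    """A folder with exactly one parent, or no parent for the root folder."""

    def __init__(self, name, parent=None, allowed_child_types=None):
        self.name = name
        self.parent = parent
        # None means that children of any type are accepted.
        self.allowed_child_types = allowed_child_types
        self.children = []
        if parent is not None:
            parent.add_child(self, "cmis:folder")

    def add_child(self, child, type_id):
        """File a child here, enforcing the folder's type restriction."""
        if (self.allowed_child_types is not None
                and type_id not in self.allowed_child_types):
            raise ValueError(f"{type_id} is not allowed in {self.path()}")
        self.children.append(child)

    def path(self):
        """Build the path by walking up to the root folder."""
        if self.parent is None:
            return "/"
        names = []
        folder = self
        while folder.parent is not None:
            names.append(folder.name)
            folder = folder.parent
        return "/" + "/".join(reversed(names))


root = Folder("")
archive = Folder("archive", root)
invoices = Folder("invoices", archive, allowed_child_types={"Invoice"})

print(invoices.path())
invoices.add_child("inv-1", "Invoice")
try:
    invoices.add_child("memo-1", "cmis:document")
except ValueError as error:
    print(error)
```

Note that this sketch, like CMIS itself, does not check whether the names of the children are unique.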
Documents are the main entity managed by CMIS. Each document has a type. Whether a document must be contained in a folder or not is implementation-specific (unfiling support). In contrast to a folder, a document can be contained in more than one folder. Again, this is an optional feature of an implementation (multi-filing support). Whether multi-filing or unfiling is supported can be determined from the capabilities returned by a getRepositoryInfo() call. A document can have content, but this is not a requirement. The document type indicates whether content can be attached or is mandatory (contentStreamAllowed attribute). The document type also indicates whether a document can exist in multiple versions (versionable attribute). Repositories differ in how the content of an existing document can be changed. This information can be retrieved from the capabilities returned by a getRepositoryInfo() call (capabilityContentStreamUpdatability).
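As a small illustration of the contentStreamAllowed rule, the following Python sketch checks whether a document may be created with or without content. The three values "notallowed", "allowed" and "required" are those defined by the CMIS specification; the function check_content and its messages are invented for this example.

```python
CONTENT_STREAM_ALLOWED_VALUES = ("notallowed", "allowed", "required")


def check_content(content_stream_allowed, has_content):
    """Return an error message, or None if the document is acceptable.

    `content_stream_allowed` is the setting of the document's type,
    `has_content` tells whether the client supplied a content stream.
    """
    if content_stream_allowed not in CONTENT_STREAM_ALLOWED_VALUES:
        raise ValueError(f"unknown setting: {content_stream_allowed!r}")
    if content_stream_allowed == "required" and not has_content:
        return "this type requires a content stream"
    if content_stream_allowed == "notallowed" and has_content:
        return "this type does not allow a content stream"
    return None


print(check_content("allowed", has_content=False))
print(check_content("required", has_content=False))
print(check_content("notallowed", has_content=True))
```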
Some repositories also support more than one content stream per document. This can be used for more complex documents (e.g. different chapters), multiple scanned pages or other additional information like signatures, annotations, etc. This is currently not supported by CMIS (there is one exception which we will cover in the rendition chapter). A content stream has a stream id, a MIME type and a length.
Documents and folders together with other objects are created, accessed, manipulated and deleted with the ObjectService in CMIS.