What’s this thing called CMIS?
The CMIS standard is now official; many companies, including all leading vendors of ECM systems, have announced support for it. A lot of information is circulating about the standard: who supports it, how it will influence the ECM industry, potential use cases and so on. But there is very little information available about what the standard is, what you can do with it and how it works technically. There is the official specification, of course, but that is not the right level for a first impression, and not everyone is willing to read through its 236 pages. So I guess it is time for an article that focuses on the technical aspects without digging too deeply into the details of wire protocols or source code (we might use them here and there for clarification, but they are not the focus). This first part starts a series that gives an overview of CMIS. Hopefully, by the end you will have an idea of what you can do with CMIS and whether it makes sense for you to buy into it.
Enjoy it, and of course I am interested in getting feedback. Here is an overview of the parts (subject to change):
Part 1: Overview, domain model and bindings
Part 2: Domain model part 1, Repository, Types, and Properties
Part 3: Domain model part 2, Versions, Relationships, Query
Part 4: Domain model part 3, Renditions, Permissions, ACLs, Policies
Part 5: Extensibility and Protocol differences, Compatibility
Part 6: Summary, Future Enhancements and Outlook
Part 1: Overview, Parts of the standard, model and protocols
CMIS is about interoperability between ECM systems. So let’s first take a step back and look at what an ECM system is. Wikipedia defines it as “strategies, methods and tools used to capture, manage, store, preserve, and deliver content and documents related to an organization and its processes”. What does this mean? We often differentiate between structured and unstructured information. Structured information consists of data that occurs repeatedly in very similar or identical forms and keeps its structure over long periods of time. When structured data is processed on paper, forms are often used. Examples are addresses and postal codes, an inventory management system that organizes storage locations by capability and physical location, or a catalog of goods with articles, article numbers, availability, prices and so on. Such data is often accessed by queries that occur over and over again in the same form. Structured information can be handled efficiently in relational databases, and over decades software has been built to manage all kinds of structured information.
Many business processes, however, deal with data that is much less structured: data contained in documents, images, voice and video. Data in the form of natural language, pixels, audio samples or video frames is much harder to process. It needs to be stored efficiently and should still be findable and retrievable after years. ECM systems deal with processing and managing this kind of information. A relational database alone is often not sufficient for it. ECM is about managing your content. CMIS tries to standardize storing, retrieving and finding content and therefore allows information to be exchanged between different ECM repositories.
Because the content lacks structure, ECM repositories follow very different approaches to achieve this goal. Often they specialize in only a few aspects, scenarios or data types that they can handle efficiently. Many special features exist that only make sense in a specific area, which results in a very heterogeneous landscape of systems. Standardizing anything in such an environment is a challenge. You need to find the right balance: on the one hand, as many systems as possible should be able to fulfill the standard. On the other hand, you want to cover as much functionality as possible so that the standard is useful in a wide range of business cases and scenarios. CMIS tries to focus on the common parts that exist in most of the systems available today. When it was doubtful whether a feature made sense, or when most systems could not implement it, the TC (the OASIS Technical Committee) rejected it in favor of keeping the standard simple and easier to implement. So it is likely that you will miss one feature or another of your favorite system.
What all systems have in common is the ability to bring structure to the information through metadata that accompanies the content. They help you find the information by various means, and they try to control who can see and modify it. All of this is part of the CMIS domain model, which is the core of the standard. We will take a closer look at the domain model in the next parts of the series.
The domain model describes the objects and methods and what you can do with them. It does not describe how you technically get access to them and perform actions on them. There are several possible ways to approach this: you could describe an API in a programming language, you could describe a file format, or you could agree on a network protocol for exchanging information. You could even omit this completely and declare it outside the scope of the standard. Each approach has its strengths and weaknesses, and there is no simple answer to which is best. CMIS chose to standardize at the protocol level and calls the protocol mappings bindings.
The reason is that the TC regarded interoperability between as many systems as possible as fundamental. In today’s world there is much more agreement on network protocols than on APIs. An API is always tied to a platform and a programming language, but ECM systems are available on very different platforms and are implemented in a wide range of programming languages. It was essential that the Java and Microsoft platforms be covered adequately, as they are the most important platforms today. The downside of this approach is that it is much easier for a programmer to program against an API than to deal with all the details of a network protocol. If more than one protocol needs to be supported, things get significantly harder.
In the current release CMIS defines two bindings: the RESTful AtomPub binding and a SOAP binding for web services. The TC chose two bindings with different characteristics in order to cover a wide range of systems. The SOAP binding uses the widely adopted web service standards as its transport mechanism. Web services are supported on many different platforms, have been available for a long time and come with good tooling. On top of the basic protocol they also offer a wide range of advanced functionality such as encryption, transaction handling and more. But web services are also criticized because the standard is complex, the protocol is not very efficient on the wire and it can hardly be used without tool support. To offer an alternative, the CMIS TC decided to standardize a second protocol that follows a different (more RESTful) paradigm. It is based on the Atom Publishing Protocol (AtomPub or APP), which the IETF defines in RFC 5023 and which builds on the Atom Syndication Format of RFC 4287. AtomPub transfers XML over HTTP, so simple CMIS requests can even be handled with a web browser.
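To make the AtomPub binding a little more concrete, here is a minimal sketch in Python of what a client does with the first document it fetches, the AtomPub service document. The sample XML is hand-written for illustration, and the repository name and URLs are made up. The `app` and `atom` namespaces come from the AtomPub and Atom specifications; the `cmisra:collectionType` element reflects my reading of the CMIS 1.0 AtomPub binding and should be checked against the specification. A real client would obtain the document with an HTTP GET instead of reading a string constant.

```python
import xml.etree.ElementTree as ET

# "app" and "atom" are defined by AtomPub and Atom; "cmisra" is assumed to be
# the CMIS 1.0 REST/Atom extension namespace.
NS = {
    "app": "http://www.w3.org/2007/app",
    "atom": "http://www.w3.org/2005/Atom",
    "cmisra": "http://docs.oasis-open.org/ns/cmis/restatom/200908/",
}

# Hand-written sample; the repository name and URLs are invented.
SAMPLE_SERVICE_DOCUMENT = """\
<app:service xmlns:app="http://www.w3.org/2007/app"
             xmlns:atom="http://www.w3.org/2005/Atom"
             xmlns:cmisra="http://docs.oasis-open.org/ns/cmis/restatom/200908/">
  <app:workspace>
    <atom:title>Example Repository</atom:title>
    <app:collection href="http://example.com/cmis/repo1/children/root">
      <atom:title>Root Collection</atom:title>
      <cmisra:collectionType>root</cmisra:collectionType>
    </app:collection>
    <app:collection href="http://example.com/cmis/repo1/query">
      <atom:title>Query Collection</atom:title>
      <cmisra:collectionType>query</cmisra:collectionType>
    </app:collection>
  </app:workspace>
</app:service>
"""


def list_collections(service_xml):
    """Return (workspace title, collection type, href) for each collection."""
    root = ET.fromstring(service_xml)
    result = []
    for workspace in root.findall("app:workspace", NS):
        title = workspace.findtext("atom:title", default="", namespaces=NS)
        for collection in workspace.findall("app:collection", NS):
            kind = collection.findtext(
                "cmisra:collectionType", default="", namespaces=NS
            )
            result.append((title, kind, collection.get("href")))
    return result


if __name__ == "__main__":
    for title, kind, href in list_collections(SAMPLE_SERVICE_DOCUMENT):
        print(f"{title}: {kind} -> {href}")
```

Run as a script, this prints one line per collection with the workspace title, the collection type and the URL a client would follow next. Everything a client needs is discovered from links in the documents it receives, which is the RESTful character of this binding.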
Fully CMIS-compliant repositories have to support both bindings. From what we can see today (which is pretty early, of course), the AtomPub protocol seems to be the more popular one.
Having a library that offers interfaces, classes and methods can make implementing the standard much easier. It can guarantee conformance at the wire level, provide familiar data types and method conventions and, most importantly, support both bindings transparently to the programmer (if it is designed in the right way). Over time we can expect a number of libraries for different platforms. Libraries have been announced or are already available for Android, iPhone, Java, Python and PHP, for example. The most complete functionality today is available in Java in the Apache Chemistry/OpenCMIS project. Chemistry supports client and server for both SOAP and AtomPub and covers the full scope of the specification.
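The idea of hiding the binding behind a library can be sketched as follows. This is not the API of Apache Chemistry or of any other real library: every class and function name here is hypothetical, and the two bindings return canned strings instead of talking to a server. The point is only that application code is written once against an interface, and the binding is chosen by configuration.

```python
from abc import ABC, abstractmethod


class Binding(ABC):
    """Hypothetical interface a client library could put in front of the wire protocols."""

    @abstractmethod
    def get_repository_name(self, repository_id: str) -> str:
        """Return a display name for the given repository."""


class AtomPubBinding(Binding):
    def get_repository_name(self, repository_id: str) -> str:
        # A real implementation would GET the AtomPub service document
        # and read the repository information from the returned XML.
        return f"{repository_id} (via AtomPub)"


class SoapBinding(Binding):
    def get_repository_name(self, repository_id: str) -> str:
        # A real implementation would send a SOAP request to the
        # repository's web service endpoint.
        return f"{repository_id} (via SOAP)"


# Illustrative registry: binding name -> implementation class.
BINDINGS = {"atompub": AtomPubBinding, "soap": SoapBinding}


def open_binding(kind: str) -> Binding:
    """Pick a binding by name; calling code never sees protocol details."""
    try:
        return BINDINGS[kind]()
    except KeyError:
        raise ValueError(f"unknown binding: {kind!r}") from None


def describe(binding: Binding, repository_id: str) -> str:
    # Application logic is identical for both bindings.
    return "Connected to " + binding.get_repository_name(repository_id)


if __name__ == "__main__":
    for kind in ("atompub", "soap"):
        print(describe(open_binding(kind), "repo1"))
```

Switching from one protocol to the other then means changing a single configuration value (`"atompub"` or `"soap"` in this sketch) while `describe` stays untouched.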
CMIS is not the first attempt at a standard in the ECM area. Other initiatives such as JSR 170, JSR 283 and WebDAV, and older ones such as DMA and ODMA, are either related to CMIS or overlap with it. In the remainder of this series we may look here and there at how they compare, but an in-depth comparison is out of scope.
This introduction should have given you a first overview of the CMIS standard. Let’s look forward to the next part, which introduces the domain model and gives you a handle on more concrete things you can do with CMIS.
OASIS CMIS Technical Committee: http://www.oasis-open.org/committees/cmis/
CMIS 1.0 Specification: http://www.oasis-open.org/committees/download.php/36486/CMIS-cd07.zip
Atom Syndication Format (RFC 4287), on which AtomPub (RFC 5023) is based: http://www.ietf.org/rfc/rfc4287.txt
Web Services: http://www.w3.org/standards/webofservices
Apache Chemistry: http://incubator.apache.org/chemistry/