Data Types

It's all about data

Data

Every application is, in one way or another, concerned with data: showing, persisting, transforming, validating, interpreting, interacting… Everything from a video codec to a stock exchange is data-related in some way.

As such, data in all its forms is also at the core of Nabu. There are two particularly important concepts:

  • data types: we use data modeling to describe what the data should look like. Do we have strings or numbers, lists or singular values?
  • services: a service is anything that you can give data to and which (upon execution) returns data.

There is a lot to be said about each of them, and in this article we'll dig a bit into data types.

Data Modeling

There are many tools and standards for data modeling. In most programming environments the deliverable of such a tool is handed to developers, who then generate code from it, either via some tool or manually. In Java this generally means developers transform the data model into Java beans.
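To make this concrete, here is a minimal sketch of what such a hand-generated Java bean might look like for a model entry with an id and a department (the class and field names are illustrative, not taken from any particular model):

```java
// A hypothetical "Employee" entry from a data model, hand-translated
// into a conventional Java bean: private fields plus getters and setters.
public class Employee {
    private Integer id;        // modeled as an integer in the design tool
    private String department; // modeled as a string

    public Integer getId() { return id; }
    public void setId(Integer id) { this.id = id; }

    public String getDepartment() { return department; }
    public void setDepartment(String department) { this.department = department; }
}
```

Every change in the design tool now has to be manually mirrored into classes like this, which is exactly where model and code start to drift apart.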

This code generation often involves approximations, as the programming language does not fully support all the concepts available in the design language. In most cases updates to the model cannot be cleanly regenerated into the code, which means the code becomes the true master of the data model: it is the only truth of what is actually running in production.

In Nabu we focus on supporting the design language itself, making the deliverable the single source of truth. To allow us to do this, we have created a types API that layers over existing formats for data modeling. This allows us to support java beans as you would find in a traditional programming environment, but also XML Schema, UML, JSON Schema…
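The actual Nabu types API is not shown here; purely as a mental model, an API that layers a common abstraction over different type sources could look roughly like this (all names are hypothetical):

```java
// Hypothetical sketch (not the real Nabu types API): a minimal common
// abstraction that different sources (Java beans, XML Schema, UML,
// JSON Schema...) could each implement.
interface DataType {
    String getName();     // e.g. "string", "integer", or a complex type name
    boolean isList();     // is this a repeated value?
    boolean isOptional(); // may this value be absent?
}

// One possible backing implementation, e.g. for a field read from an XML Schema.
class SimpleType implements DataType {
    private final String name;
    private final boolean list;
    private final boolean optional;

    SimpleType(String name, boolean list, boolean optional) {
        this.name = name;
        this.list = list;
        this.optional = optional;
    }

    public String getName() { return name; }
    public boolean isList() { return list; }
    public boolean isOptional() { return optional; }
}
```

The point of such a layer is that everything downstream (editors, validation, serialization) only talks to the abstraction, so where the type came from stops mattering.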

Types API Diagram

The visual representation of a type in Nabu is always the same, no matter what the source is. UML, XSD, etc. are generally not editable in Nabu Developer, as they are assumed to be the end deliverable of an external design tool.

If you want to model data in Nabu itself, we usually use structures. The structure implementation of the types API is a fully fledged alternative to other modeling tools and can be edited in Nabu Developer.

An example of a structure definition:

Employee Structure

The icons in front of the fields tell you the data type: for instance, "id" is an integer, "department" is a string, "profile" is a nested structure, and "jobs" is also a nested structure but a list, visually differentiated by the same icon being repeated. The green dot in front of "profile" and some other fields means they are optional; everything else is mandatory.

Most data types have additional properties where you can fine-tune your definition, add validations or manage other metadata. For example, "started" and "stopped" are dates, and in the properties you can further define the specificity of the date (timestamp, day, month,…) and other attributes like the timezone, time period…

Date Properties

We often use UML at the core of our applications. Here is an example of what the XMI file of a UML model looks like in Nabu Developer:

UML Example

The only difference from a structure is that the UML model is read-only in Nabu Developer.

Fun fact: Nabu (like many modeling tools) supports single inheritance, but it allows you to inherit cross-type; for example, you could extend a Java bean with a structure.

For more information, check out this introduction video on structure editing:

Flexible Typing

In the programming world we make a clear distinction between "design time", when you are actually building the application, and "run time", when you are done designing it and it is actually running.

Some programming languages focus on "design time guarantees" which means the application's behavior is as predictable as possible during the design phase whereas others focus on getting to runtime as fast as possible and testing the behavior while the application is already running.

One of the biggest differences between these two approaches is how you work with data definitions. If you go for design time guarantees, you want very clearly defined data types with an expected behavior. If you go for runtime validation, you want the runtime to try and figure out the best way to deal with any given scenario.

As a concrete example, suppose you have some logic that simply does a + b. The result of this operation depends heavily on what a and b are. If they are numbers, we would expect mathematical addition. If they are strings, we might expect string concatenation. If they are objects, we might expect some operator overloading to happen.

If you go for design time guarantees, you want to specify at design time what a and b should be, so the operation being executed is always predictable. If you go for runtime validation, you want the runtime to do whatever is best given arbitrary a and b.
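In a statically typed language like Java, the declared types pin down at design time what "+" means; a quick illustration:

```java
// With declared types, the meaning of "+" is fixed at design time:
// each method can only ever do one thing.
class PlusExample {
    static int addNumbers(int a, int b) {
        return a + b; // always mathematical addition
    }

    static String concat(String a, String b) {
        return a + b; // always string concatenation
    }
}
```

Without the type declarations, a single a + b would carry all of these meanings at once, and only the runtime values would decide which one you get.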

We believe there are a few very big drawbacks to runtime validation:

  • predictability: being able to predict what an application will do is paramount to understanding the edge cases that might arise, the problems that people might encounter, and the security holes that might form. If in the above example you define a and b to be integer numbers, there are exactly two things that can happen: mathematical addition or an error. If however you don't define the types, we might also get string concatenation, operator overloading, or a number combined with a string, with variable results. The number of happy paths has gone from one to many. Multiply this across the code base of an entire application and the possible outcomes grow exponentially, making the application as a whole increasingly unpredictable.

  • readability: an important part of understanding an application is understanding the intent of those who built it; the line between a bug and a feature is often the intent behind it. The readability of an application is limited by the specificity available in the platform you built it with. Without concrete data types, how deep do you have to dig to figure out what that a + b is actually doing? Look at all the call sites? Maybe they are parameterized as well, so recursively check their call sites? Understanding a part of an application should not require understanding the whole.

  • efficiency: by moving all validation to the runtime, we add an additional step to the process of building an application, which is much less efficient than simply making the application more predictable up front.

We believe that strong design time guarantees are especially important in larger applications, but we try to balance that with ease of use. For example, when you drag a line from b to a (see image below), Nabu Developer will, while dragging, calculate whether it can automatically convert one into the other at runtime, and if so allow you to create the line. You get a design time guarantee with automatic runtime conversion.
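The actual conversion logic in Nabu is not shown here; as a mental model, the drag-time check amounts to something like the following sketch (the method and the set of supported conversions are hypothetical):

```java
// Hypothetical sketch of the idea behind drag-time compatibility checks:
// decide at design time whether a runtime conversion between two types
// is known, and only then allow the mapping line to be drawn.
class ConversionCheck {
    // Can a value of 'from' be automatically turned into a 'to'?
    static boolean canConvert(Class<?> from, Class<?> to) {
        if (to.isAssignableFrom(from)) return true;                   // direct assignment
        if (from == String.class && to == Integer.class) return true; // parseable text
        if (from == Integer.class && to == String.class) return true; // printable number
        return false;                                                 // no known conversion
    }
}
```

The important property is that the answer is computed before the application ever runs, so the mapping either exists and works, or cannot be drawn at all.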

Mapping Example

Realizing, however, that the world is imperfect and some use cases are simply complex, we have extensive support for working with data that is not predefined, data that is incorrect (e.g. counterparties not following the spec), changing definitions of a data instance while running…

Validation

There are degrees of correctness when it comes to data. We generally distinguish between the data type itself and additional validations. For example, you could say the variables a and b in the above example are integer numbers. This already makes the logic very transparent, but maybe you want to go further and require that a is always a positive number and that b is never larger than 10.
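Using those example constraints, such facet-style validation (loosely modeled on XML Schema facets like minExclusive and maxInclusive; the class and messages are illustrative) might be sketched as:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of validation on top of the type itself: the values are
// already known to be integers, and we additionally check the facets
// from the example (a > 0, b <= 10). Names and messages are illustrative.
class RangeValidator {
    static List<String> validate(int a, int b) {
        List<String> violations = new ArrayList<>();
        if (a <= 0) violations.add("a must be a positive number");
        if (b > 10) violations.add("b must not be larger than 10");
        return violations; // empty list means the data is valid
    }
}
```

Note that the type check (is it an integer at all?) and the facet check (is it in range?) are separate layers: the second only makes sense once the first has passed.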

Specificity is a good thing: it allows you to define more narrowly exactly what you want to happen. It's also important to remember this general design rule: loosening restrictions over time is very easy, but adding new restrictions later is usually impossible without breaking existing consumers.

To this end, Nabu supports a wide range of validation options, mostly stemming from XML Schema, though a few have been added from other type systems as well.

There is a ton more to be said about data types and how we handle them in Nabu, but I hope this serves as a high-level introduction to our take on data modeling.

June 16, 2019
Alexander Verbruggen