October 22, 2023
by Alexander Verbruggen

Query syntax

Blox (the visual orchestration language in the Nabu Platform) and Glue (the scripting language) share the same custom execution engine.

I drew inspiration from XPath and added a query syntax similar to the one found in XPath. To explain how it works, I will start with Glue, where you can see the full syntax. Blox hides part of that syntax to make it easier to use.

Basic Glue querying

Assume we have a basic data set of dogs:

dogs = [
    {
        "name": "Sparky",
        "age": 5,
        "breed": "bulldog"
    }, {
        "name": "Charlie",
        "age": 3,
        "breed": "beagle"
    }, {
        "name": "Byron",
        "age": 8,
        "breed": "cane-corso"
    }
]

In pretty much any programming language, you can perform indexed access on an array to fetch a particular item. For instance, let's get the second dog (the engine uses 0-based indexes):

charlie = dogs[1]
{
    "name": "Charlie",
    "age": 3,
    "breed": "beagle"
}

But what if you want all dogs that are at least 5 years old? Most languages require you to write a filter function and apply it to the array in some way. In the Nabu execution engine, you can use an XPath-like query:

dogsOfInterest = dogs[age >= 5]
[
    {
        "name": "Sparky",
        "age": 5,
        "breed": "bulldog"
    }, {
        "name": "Byron",
        "age": 8,
        "breed": "cane-corso"
    }
]

Internally, the engine loops over all dogs and keeps those that satisfy the age restriction. You can also filter on multiple fields. For instance, suppose we are only interested in bulldogs:

dogsOfInterest = dogs[age >= 5 && breed == "bulldog"]
[
    {
        "name": "Sparky",
        "age": 5,
        "breed": "bulldog"
    }
]
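As a rough analogy for readers coming from general-purpose languages, the indexed access and the two filters above correspond to the Python below. The list-of-dicts representation is an assumption of this sketch; it illustrates the semantics, not how the Nabu engine is implemented.

```python
# The dogs dataset from above as a list of dicts (an assumption of this sketch;
# the engine works on its own typed data model).
dogs = [
    {"name": "Sparky", "age": 5, "breed": "bulldog"},
    {"name": "Charlie", "age": 3, "breed": "beagle"},
    {"name": "Byron", "age": 8, "breed": "cane-corso"},
]

# Glue: dogs[1] -> 0-based indexed access returns a single item
charlie = dogs[1]

# Glue: dogs[age >= 5] -> loop over every dog and keep those matching the predicate
older_dogs = [dog for dog in dogs if dog["age"] >= 5]

# Glue: dogs[age >= 5 && breed == "bulldog"] -> && combines both conditions
older_bulldogs = [dog for dog in dogs if dog["age"] >= 5 and dog["breed"] == "bulldog"]

print(charlie["name"])                      # Charlie
print([d["name"] for d in older_dogs])      # ['Sparky', 'Byron']
print([d["name"] for d in older_bulldogs])  # ['Sparky']
```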

Nested queries

Suppose we have a slightly more complex data structure with nested arrays:

owners = [
    {
        "name": "John Smith",
        "country": "be",
        "age": 7,
        "dogs": [
            {
                "name": "Sparky",
                "age": 5,
                "breed": "bulldog"
            },
            {
                "name": "Charlie",
                "age": 3,
                "breed": "beagle"
            }
        ]
    }, {
        "name": "Jack Smooth",
        "country": "nl",
        "age": 7,
        "dogs": [
            {
                "name": "Byron",
                "age": 8,
                "breed": "cane-corso"
            }
        ]
    }
]

If we just want a list of all dogs, we can do this:

allDogs = owners/dogs
[
    {
        "name": "Sparky",
        "age": 5,
        "breed": "bulldog"
    }, {
        "name": "Charlie",
        "age": 3,
        "breed": "beagle"
    }, {
        "name": "Byron",
        "age": 8,
        "breed": "cane-corso"
    }
]

This will return a flat list of dogs that looks exactly like our first dataset. Note that we use the same / separator as XPath for object access rather than the . syntax found in most programming languages.
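The path step owners/dogs can be pictured as a map-and-flatten over the owners list. The helper step below is hypothetical and only models the flattening behaviour described above; it is not part of any Nabu API.

```python
owners = [
    {"name": "John Smith", "country": "be", "age": 7, "dogs": [
        {"name": "Sparky", "age": 5, "breed": "bulldog"},
        {"name": "Charlie", "age": 3, "breed": "beagle"},
    ]},
    {"name": "Jack Smooth", "country": "nl", "age": 7, "dogs": [
        {"name": "Byron", "age": 8, "breed": "cane-corso"},
    ]},
]


def step(items, field):
    """Hypothetical model of a "/field" step whose field holds a list:
    the per-item lists are concatenated into one flat result."""
    result = []
    for item in items:
        result.extend(item[field])
    return result


all_dogs = step(owners, "dogs")       # Glue: owners/dogs
print([d["name"] for d in all_dogs])  # ['Sparky', 'Charlie', 'Byron']
```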

We could also apply the same query as we did before:

dogsOfInterest = owners/dogs[age >= 5 && breed == "bulldog"]
[
    {
        "name": "Sparky",
        "age": 5,
        "breed": "bulldog"
    }
]

Indexed access on nested arrays works, but perhaps not in the way you might expect:

dogs = owners/dogs[0]
[
    {
        "name": "Sparky",
        "age": 5,
        "breed": "bulldog"
    }, {
        "name": "Byron",
        "age": 8,
        "breed": "cane-corso"
    }
]

The result may surprise you because there are two ways to interpret that query:

  • combine all dogs for all owners and give me the first one
  • take the first dog of each owner and combine those

The engine takes the second approach.
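The two interpretations differ only in the order of flattening and indexing. In Python terms (again a model of the behaviour, using a trimmed-down version of the owners data), the engine's choice corresponds to first_per_owner:

```python
owners = [
    {"name": "John Smith", "dogs": [{"name": "Sparky"}, {"name": "Charlie"}]},
    {"name": "Jack Smooth", "dogs": [{"name": "Byron"}]},
]

# Interpretation 1 (not what the engine does): flatten all dogs, then take [0].
first_overall = [dog for owner in owners for dog in owner["dogs"]][0]

# Interpretation 2 (what the engine does for owners/dogs[0]): apply [0] to each
# owner's own dogs list, then combine the results into one list.
# Every owner in this dataset has at least one dog, so [0] never fails here.
first_per_owner = [owner["dogs"][0] for owner in owners]

print(first_overall["name"])                 # Sparky
print([d["name"] for d in first_per_owner])  # ['Sparky', 'Byron']
```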

As a final example, suppose we want all beagles whose owners live in Belgium:

belgianBeagles = owners[country == "be"]/dogs[breed == "beagle"]
[
    {
        "name": "Charlie",
        "age": 3,
        "breed": "beagle"
    }
]
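Each bracket filters the list at its own level before the next path step is applied. A Python model of owners[country == "be"]/dogs[breed == "beagle"], with the data trimmed to the fields that matter, might look like this:

```python
owners = [
    {"country": "be", "dogs": [
        {"name": "Sparky", "breed": "bulldog"},
        {"name": "Charlie", "breed": "beagle"},
    ]},
    {"country": "nl", "dogs": [
        {"name": "Byron", "breed": "cane-corso"},
    ]},
]

# owners[country == "be"]: filter at the owner level first
belgian_owners = [owner for owner in owners if owner["country"] == "be"]

# /dogs[breed == "beagle"]: flatten the remaining owners' dogs, testing each one
belgian_beagles = [
    dog
    for owner in belgian_owners
    for dog in owner["dogs"]
    if dog["breed"] == "beagle"
]

print([d["name"] for d in belgian_beagles])  # ['Charlie']
```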

Variable context

In the queries we have run so far, we have used variables that exist at the level we are querying. When we execute this query:

dogsOfInterest = dogs[age >= 5]

Here, age refers to the age property of each dog object as the engine iterates over the list.

However, sometimes you may want to access variables that exist outside the scope you are currently querying. For instance, suppose you want an age filter for dogs, but you want the target age to be a variable that is determined elsewhere:

# determine the target age in some fancy way
age = 5
dogsOfInterest = dogs[age >= /age]
[
    {
        "name": "Sparky",
        "age": 5,
        "breed": "bulldog"
    }, {
        "name": "Byron",
        "age": 8,
        "breed": "cane-corso"
    }
]

By starting the second variable with a /, you tell the engine not to look in the dog object but to jump to the root of the execution context to find the value. This is called absolute variable access; without a leading slash, it is called relative variable access.
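One way to picture the difference is a resolver that receives both the object being filtered and the root context. The resolve function below is a hypothetical sketch of that lookup rule, not the engine's code:

```python
def resolve(name, current, root):
    """Hypothetical lookup rule: '/x' reads x from the root context,
    a bare 'x' reads x from the object currently being filtered."""
    if name.startswith("/"):
        return root[name[1:]]
    return current[name]


root = {
    "age": 5,  # the target age, determined "in some fancy way" at the root
    "dogs": [
        {"name": "Sparky", "age": 5, "breed": "bulldog"},
        {"name": "Charlie", "age": 3, "breed": "beagle"},
        {"name": "Byron", "age": 8, "breed": "cane-corso"},
    ],
}

# Glue: dogs[age >= /age]
dogs_of_interest = [
    dog for dog in root["dogs"]
    if resolve("age", dog, root) >= resolve("/age", dog, root)
]

print([d["name"] for d in dogs_of_interest])  # ['Sparky', 'Byron']
```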

With relative access, we can also reach the parent context by jumping up with ../. Because each dog is only one context deeper than the root context, the following yields the same result:

age = 5
dogsOfInterest = dogs[age >= ../age]
[
    {
        "name": "Sparky",
        "age": 5,
        "breed": "bulldog"
    }, {
        "name": "Byron",
        "age": 8,
        "breed": "cane-corso"
    }
]

Suppose, however, that we want to find all dogs that are at least as old as their owners. We could do this:

age = 5
dogsOfInterest = owners/dogs[age >= ../age]
[
    {
        "name": "Byron",
        "age": 8,
        "breed": "cane-corso"
    }
]

In this case, ../ no longer points to the root: the direct parent of each dog is its owner, who has an age of their own. We can still reach the root by jumping up two levels:

age = 5
dogsOfInterest = owners/dogs[age >= ../../age]
[
    {
        "name": "Sparky",
        "age": 5,
        "breed": "bulldog"
    }, {
        "name": "Byron",
        "age": 8,
        "breed": "cane-corso"
    }
]

Of course, if you want to access the root, absolute variable access is the better choice: /age always points to the root, no matter how deeply the query is nested.
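The ../ rule can be modelled as a stack of contexts, from the root down to the object being filtered, where every ../ moves one entry up. The resolve function and the stack layout below are assumptions made for illustration; they reproduce the ../ and ../../ results above but are not taken from the engine:

```python
def resolve(path, stack):
    """Hypothetical resolver over a context stack [root, ..., current].

    '/x'     -> x in the root context (stack[0])
    '../x'   -> x one level up; each extra '../' climbs one more level
    'x'      -> x in the current context (stack[-1])
    """
    if path.startswith("/"):
        return stack[0][path[1:]]
    levels = 0
    while path.startswith("../"):
        levels += 1
        path = path[len("../"):]
    return stack[-1 - levels][path]


root = {
    "age": 5,
    "owners": [
        {"name": "John Smith", "age": 7, "dogs": [
            {"name": "Sparky", "age": 5},
            {"name": "Charlie", "age": 3},
        ]},
        {"name": "Jack Smooth", "age": 7, "dogs": [
            {"name": "Byron", "age": 8},
        ]},
    ],
}


def dogs_where_age_at_least(threshold_path):
    """Model of the Glue query owners/dogs[age >= <threshold_path>]."""
    result = []
    for owner in root["owners"]:
        for dog in owner["dogs"]:
            stack = [root, owner, dog]  # the dog's parent context is its owner
            if resolve("age", stack) >= resolve(threshold_path, stack):
                result.append(dog)
    return result


print([d["name"] for d in dogs_where_age_at_least("../age")])     # ['Byron']
print([d["name"] for d in dogs_where_age_at_least("../../age")])  # ['Sparky', 'Byron']
print([d["name"] for d in dogs_where_age_at_least("/age")])       # ['Sparky', 'Byron']
```

Note that "/age" gives the same answer as "../../age" here, but unlike the relative form it would keep pointing at the root if the query were nested one level deeper.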

Querying in Blox

Blox uses the same execution engine but takes a more graphical approach. Even in this graphical environment, however, you can use the same query syntax. For example, suppose I just draw this line:

In the top right, you can see that Nabu Developer automatically inserted an index 0 for owners. It doesn't know (yet) what you want to do, so it makes a few assumptions based on the circumstances:

  • you are linking a list to a list, so it assumes you want the full list
  • you are linking from an element inside another list (owners), so it defaults to the first iteration

On the left is the result from the service; on the right is the persisted XML of the Blox service:

Line result

<from>owners[0]/dogs</from> <to>output/dogs</to>

We can remove the index altogether:

No index result

<from>owners/dogs</from> <to>output/dogs</to>

If we select the line and reindex it (shortcut F8), Nabu inserts indexes for all lists involved, including target lists. We can then adapt those indexes for more complex filtering:

As you can see, you can apply the same type of filtering, but in Blox you fill in each index separately rather than typing the full query by hand.

When we run it:

Reindexed result

<from>owners[country == "be"]/dogs[age >= /age]</from> <to>output/dogs</to>
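The persisted from element contains the same path syntax as Glue. As an illustration of how such a path decomposes, here is a hypothetical tokenizer that splits it into steps and predicates. It may only split on / outside brackets; otherwise the absolute reference /age inside the predicate would be mistaken for a path step. The actual Nabu parser is not described in this article, so treat this purely as a sketch:

```python
def split_path(query):
    """Hypothetical tokenizer: split a query path on '/' at bracket depth 0
    only, ignoring brackets and slashes inside string literals
    (escape sequences are not handled)."""
    steps, current = [], []
    depth, in_string = 0, False
    for ch in query:
        if ch == '"':
            in_string = not in_string
        elif not in_string:
            if ch == "[":
                depth += 1
            elif ch == "]":
                depth -= 1
            elif ch == "/" and depth == 0:
                steps.append("".join(current))
                current = []
                continue  # the separator itself is not part of any step
        current.append(ch)
    steps.append("".join(current))
    return steps


def parse_step(step):
    """Split 'field[predicate]' into (field, predicate); predicate is None
    when the step has no index."""
    field, bracket, rest = step.partition("[")
    return field, (rest[:-1] if bracket else None)


query = 'owners[country == "be"]/dogs[age >= /age]'
print([parse_step(s) for s in split_path(query)])
# [('owners', 'country == "be"'), ('dogs', 'age >= /age')]
```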

Two additional things you might have noticed:

  • the line itself turns blue once you add a query to it, to visually alert the user that a non-numeric index is set somewhere on the line
  • two tiny blue indicators on the line show the multiplicity of the source and the target, which is either 1 or *

The multiplicity gives a quick indication of what the line will do at runtime. For instance, mapping a list (indicated by *) to a singular element is actually allowed, provided the list contains exactly one element at runtime.

This may be intentional, or it may be accidental.
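At runtime, a * to 1 mapping amounts to a check like the one below. The function name is hypothetical, and raising an error when the list does not hold exactly one element is an assumption of this sketch; the article only states that a single-element list is accepted:

```python
def to_single(values):
    """Hypothetical runtime check for mapping a list (*) onto a single element (1)."""
    if len(values) != 1:
        # Assumption: anything other than exactly one element is treated as an error.
        raise ValueError(f"expected exactly one element, got {len(values)}")
    return values[0]


beagles = [{"name": "Charlie", "age": 3, "breed": "beagle"}]
print(to_single(beagles)["name"])  # Charlie
```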

Validation

Blox validates queries, making sure that you reference fields that actually exist and only perform operations on types that support them.

If we accidentally mistype a variable name, Blox highlights the step as containing an error:

Differences with actual XPath

There are some key differences in the implementation of this engine versus what you might expect from XPath.

1) In XPath, positional access is 1-based; in this engine, array access is 0-based. To access the first element, you write dogs[1] in XPath and dogs[0] in this engine.

2) XPath allows combining different node sets into a single result set (for example with the | union operator). The engine does not support this, because design-time guarantees are very important for the Nabu Platform: it must be able to determine what a query returns at all times.

3) The operators used in this engine are more in line with mainstream programming languages than with XPath: for example, && rather than XPath's and, and == rather than =. The engine also includes some unique operators and accessors, but those are beyond the scope of this article.

4) In XPath, a particular query does not have a guaranteed result multiplicity. This engine takes a very predictable approach: a query always returns a list, even when that list is empty or contains exactly one element.
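Points 1 and 4 are easy to see in a small Python model: plain indexing is 0-based and returns one item, while a query returns a list regardless of how many elements match. The query helper is hypothetical and stands in for any bracketed filter:

```python
dogs = [
    {"name": "Sparky", "age": 5, "breed": "bulldog"},
    {"name": "Charlie", "age": 3, "breed": "beagle"},
    {"name": "Byron", "age": 8, "breed": "cane-corso"},
]


def query(items, predicate):
    """Hypothetical filter: always returns a list, never a bare item or None."""
    return [item for item in items if predicate(item)]


first_dog = dogs[0]  # 0-based: XPath would address the first node as [1]
no_match = query(dogs, lambda d: d["age"] > 10)
one_match = query(dogs, lambda d: d["breed"] == "beagle")

print(first_dog["name"])               # Sparky
print(no_match)                        # []
print([d["name"] for d in one_match])  # ['Charlie']
```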

Advanced Glue querying

Because Glue is a scripting language, it expects an audience that is slightly more familiar with programming. That is why it allows some more advanced features, like calling functions and lambdas inside queries:

filter = lambda(age, age >= 5)
dogsOfInterest = owners/dogs[filter(age)]
[
    {
        "name": "Sparky",
        "age": 5,
        "breed": "bulldog"
    }, {
        "name": "Byron",
        "age": 8,
        "breed": "cane-corso"
    }
]
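The Glue lambda plays the same role as a Python lambda used inside a comprehension. In the sketch below the lambda is renamed to age_filter so it does not shadow Python's built-in filter, and the data is trimmed to the dogs' names and ages:

```python
owners = [
    {"name": "John Smith", "dogs": [{"name": "Sparky", "age": 5}, {"name": "Charlie", "age": 3}]},
    {"name": "Jack Smooth", "dogs": [{"name": "Byron", "age": 8}]},
]

# Glue: filter = lambda(age, age >= 5)
age_filter = lambda age: age >= 5  # mirrors the Glue lambda; PEP 8 would prefer a def

# Glue: owners/dogs[filter(age)] -> the lambda is called once per dog,
# receiving that dog's age field as its argument.
dogs_of_interest = [
    dog for owner in owners for dog in owner["dogs"] if age_filter(dog["age"])
]

print([d["name"] for d in dogs_of_interest])  # ['Sparky', 'Byron']
```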

You can access the entire object you are currently evaluating by using $this:

stage = lambda(dog, when(dog/age >= 8, "senior", "adult"))
dogsOfInterest = owners/dogs[stage($this) == "adult"]
[
    {
        "name": "Sparky",
        "age": 5,
        "breed": "bulldog"
    }, {
        "name": "Charlie",
        "age": 3,
        "breed": "beagle"
    }
]
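$this corresponds to the loop variable itself: the whole dog object is passed to stage rather than a single field. In the Python model below, Glue's when(condition, a, b) is rendered as a conditional expression, which matches the results shown above:

```python
owners = [
    {"name": "John Smith", "dogs": [{"name": "Sparky", "age": 5}, {"name": "Charlie", "age": 3}]},
    {"name": "Jack Smooth", "dogs": [{"name": "Byron", "age": 8}]},
]


def stage(dog):
    # Glue: when(dog/age >= 8, "senior", "adult")
    return "senior" if dog["age"] >= 8 else "adult"


# Glue: owners/dogs[stage($this) == "adult"] -> $this is the current dog object
adult_dogs = [dog for owner in owners for dog in owner["dogs"] if stage(dog) == "adult"]

print([d["name"] for d in adult_dogs])  # ['Sparky', 'Charlie']
```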