November 02, 2023
by Alexander Verbruggen

Introduction

Series are at the heart of glue v2, which is why it comes with a number of tools to manipulate them.

Every series is an Iterable that can provide a sequence of data points on demand. You can think of it as a list or an array, but where a list or an array needs to be fully defined beforehand, an iterable does not.

Series are generally lazily evaluated, which means their elements are only resolved when they are requested. This lazy behavior can exist at two different levels:

  • a series only requests elements up to the point where they are needed and not beyond. This allows for concepts like infinite series. Note that not all series are infinite.
  • a series only calculates elements when they are actually needed. This allows for things like resolving computationally expensive elements in parallel. Note that not all elements are lazy.
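If you are more familiar with Python, its generators behave much like the first level of laziness described above. The sketch below is a Python analogue, not glue code: `naturals` is a hypothetical helper that yields an infinite sequence and records every value it produces, so you can see that nothing beyond the requested elements is ever generated.

```python
from itertools import islice

# Records every value the generator actually produces.
produced = []

def naturals():
    # Hypothetical helper: an infinite series of natural numbers.
    n = 0
    while True:
        produced.append(n)
        yield n
        n += 1

# Request only the first three elements; the generator is not run beyond that.
first_three = list(islice(naturals(), 3))
print(first_three)  # [0, 1, 2]
print(produced)     # [0, 1, 2]
```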

Basics

Let's start simple though. In glue I prefer not to add new syntax unless it is absolutely necessary. This means creating and manipulating series is done with functions.

You can create a basic series:

simple = series(1, 2, 3)

If you want to reverse the series so it contains [3, 2, 1], you could do:

reversed = reverse(simple)

Getting the first element and the last element of a series is equally easy:

first = first(simple)
last = last(simple)

Note that last has to run over the entire series, so it does not work on infinite series. It will, however, not execute lazy elements along the way.

You can access a specific entry in a series by its index. Note that series are 0-based:

simple = series(1, 2, 3)
echo(simple[1])
2

You can also get the size of a series:

simple = series(1, 2, 3)
size = size(simple)
echo(size)
3
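To make the cost of each of these operations concrete, here is a rough Python analogue over a plain iterable. The helper names `first`, `last`, `nth` and `size` are hypothetical and only mirror the glue functions above. Note how `last` and `size` must consume the whole iterable, while `first` and `nth` stop as soon as they have what they need.

```python
from collections import deque
from itertools import islice

def first(iterable):
    # Only the first element is requested.
    return next(iter(iterable), None)

def last(iterable):
    # Runs over the whole iterable; it would never finish on an infinite one.
    tail = deque(iterable, maxlen=1)
    return tail[0] if tail else None

def nth(iterable, index):
    # 0-based; stops resolving once the requested index is reached.
    return next(islice(iterable, index, None), None)

def size(iterable):
    # Counts the elements, which also requires a full pass.
    return sum(1 for _ in iterable)

simple = (1, 2, 3)
print(list(reversed(simple)))  # [3, 2, 1]
print(first(simple), last(simple), nth(simple, 1), size(simple))  # 1 3 2 3
```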

Infinite series

Infinite series can be created in a number of ways, but a common one is a generator. For example, let's generate the Fibonacci sequence:

fibonacci = generate(lambda(t2: 0, t1: 1, t2 + t1))

Because the series is infinite, this piece of code will run forever, printing out numbers:

for (item : fibonacci)
    echo(item)

Note that each subsequent number is only calculated when its context requires it; in this case, when the for loop asks for the next item.

You can use indexed access on an infinite series; it will not be resolved beyond the requested index:

fibonacci = generate(lambda(t2: 0, t1: 1, t2 + t1))
echo(fibonacci[5])
5
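A comparable Fibonacci series in Python is a generator that keeps the previous two values. This is only an analogue of the glue `generate` call above (the `fibonacci` and `element_at` names are hypothetical); indexing with `islice` only pulls elements up to the requested position.

```python
from itertools import islice

def fibonacci():
    # Produces the same values as the glue generator: 0, 1, 1, 2, 3, 5, ...
    t2, t1 = 0, 1
    while True:
        yield t2
        t2, t1 = t1, t2 + t1

def element_at(iterable, index):
    # Only resolves the series up to the requested (0-based) index.
    return next(islice(iterable, index, None))

print(element_at(fibonacci(), 5))  # 5
```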

As long as glue can determine that the result is an integer, you can also use an expression as the index without risking evaluation of the rest of the infinite series:

fibonacci = generate(lambda(t2: 0, t1: 1, t2 + t1))
a = 5
echo(fibonacci[/a + 4 - (2 * 2)])
5

Series functions

One of the design goals of glue is to provide a limited set of powerful reusable functions rather than a large set of very specific functions. Check out the classic utilities article to see how we can use these to build the functions you might find in other languages.

Limit/Offset

Suppose you only want to print out the first 10 elements of the Fibonacci sequence; you could do:

for (item : limit(10, fibonacci))
    echo(item)
0
1
1
2
3
5
8
13
21
34

If we wanted to start at a certain offset, we could do:

fibonacciWithOffset = offset(10, fibonacci)
for (item : limit(10, fibonacciWithOffset))
    echo(item)
55
89
144
233
377
610
987
1597
2584
4181

The offset can also be negative, in which case it removes items from the end:

series = series("a", "b", "c", "d")

for (value : offset(-2, series))
    echo(value)
a
b
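The limit and offset behavior above can be approximated in Python with `itertools.islice`. This is a sketch under assumptions, not glue's implementation: `limit` and `offset` are hypothetical helpers, and the negative offset here is handled by buffering the last n elements, so in this sketch it still works lazily (with a delay of n elements) even on an infinite series.

```python
from collections import deque
from itertools import islice

def fibonacci():
    t2, t1 = 0, 1
    while True:
        yield t2
        t2, t1 = t1, t2 + t1

def limit(n, iterable):
    # Take at most n elements.
    return islice(iterable, n)

def _drop_last(n, iterable):
    # An element is only emitted once we know at least n elements follow it.
    buffer = deque()
    for item in iterable:
        buffer.append(item)
        if len(buffer) > n:
            yield buffer.popleft()

def offset(n, iterable):
    if n >= 0:
        # Skip the first n elements.
        return islice(iterable, n, None)
    # Negative offset: drop the last -n elements.
    return _drop_last(-n, iterable)

print(list(limit(10, offset(10, fibonacci()))))
# [55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181]
print(list(offset(-2, ["a", "b", "c", "d"])))  # ['a', 'b']
```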

From/To

You can select a start point and an end point based on a lambda. For example, let's select the Fibonacci numbers between 10 and 10000:

fibonacci = generate(lambda(t2: 0, t1: 1, t2 + t1))
series = from(lambda(x, x > 10), fibonacci)
series = to(lambda(x, x > 10000), series)
echo(series)
[13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765]
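Judging by the output above, `from` includes the first matching element while `to` stops before the first matching one (10946 is not printed). A Python sketch of that behavior uses `dropwhile` and `takewhile`; the `from_` and `to` helpers are hypothetical (`from` is a reserved word in Python), and the inclusive/exclusive semantics are inferred from the example rather than documented.

```python
from itertools import dropwhile, takewhile

def fibonacci():
    t2, t1 = 0, 1
    while True:
        yield t2
        t2, t1 = t1, t2 + t1

def from_(predicate, iterable):
    # Start at the first element for which the predicate holds (inclusive).
    return dropwhile(lambda x: not predicate(x), iterable)

def to(predicate, iterable):
    # Stop before the first element for which the predicate holds (exclusive).
    return takewhile(lambda x: not predicate(x), iterable)

result = list(to(lambda x: x > 10000, from_(lambda x: x > 10, fibonacci())))
print(result)
# [13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765]
```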

Merge

You can merge multiple series into one; the elements of each series follow those of the previous one:

fibonacci = generate(lambda(t2: 0, t1: 1, t2 + t1))
merged = merge(
    limit(5, fibonacci),
    reverse(limit(5, fibonacci)))

for (value : merged)
    echo(value)
0
1
1
2
3
3
2
1
1
0
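In Python the closest equivalent is `itertools.chain`, which concatenates iterables lazily. In this sketch the limited series is materialized first because `reversed` needs the full sequence, much like reversing in glue requires resolving the limited series.

```python
from itertools import chain, islice

def fibonacci():
    t2, t1 = 0, 1
    while True:
        yield t2
        t2, t1 = t1, t2 + t1

first_five = list(islice(fibonacci(), 5))
# chain yields all of first_five, then all of the reversed copy.
merged = list(chain(first_five, reversed(first_five)))
print(merged)  # [0, 1, 1, 2, 3, 3, 2, 1, 1, 0]
```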

Repeat

You can repeat a series. Note that the repeated series is infinite:

series = series(1, 2, 3)
repeated = repeat(series)
echo(limit(9, repeated))
1
2
3
1
2
3
1
2
3
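Python's `itertools.cycle` gives the same infinite repetition, and a limit is again needed to take a finite slice of it:

```python
from itertools import cycle, islice

repeated = cycle([1, 2, 3])  # infinite: 1, 2, 3, 1, 2, 3, ...
first_nine = list(islice(repeated, 9))
print(first_nine)  # [1, 2, 3, 1, 2, 3, 1, 2, 3]
```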

Sort

You can sort a series, but to do so the series must be fully resolved. As such, sort cannot be used on infinite series.

For example, let's sort this list of integers in descending order:

series = series(1, 2, 3, 4)
sorted = sort(lambda(a, b, b - a), series)
echo(sorted)
4
3
2
1
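The comparator `b - a` follows the usual convention: a negative result places `a` before `b`, so larger values come first. Python accepts the same style of comparator through `functools.cmp_to_key`; like glue's sort, `sorted` has to materialize the whole input.

```python
from functools import cmp_to_key

values = [1, 2, 3, 4]
# Same comparator as the glue example: negative means a goes before b.
sorted_values = sorted(values, key=cmp_to_key(lambda a, b: b - a))
print(sorted_values)  # [4, 3, 2, 1]
```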

Filter

Suppose we want to take the first 100 items of the Fibonacci sequence and keep only those between 25 and 50:

fibonacci = generate(lambda(t2: 0, t1: 1, t2 + t1))
filtered = filter(lambda(x, x >= 25 && x < 50), limit(100, fibonacci))

for (value : filtered)
    echo(value)
34

Because the engine has native support for querying, you can also write this:

filtered = limit(100, fibonacci)[$this >= 25 && $this < 50]

One advantage of the filter function is that you can easily access the index when it is part of your filter logic:

series = repeat(1, 2)
series = limit(10, series)
series = filter(lambda(x, i, i % 2 != 0), series)
echo(series)
[2, 2, 2, 2, 2]
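A Python sketch of both filters above, using a hypothetical `filter_indexed` helper whose predicate receives the value and its 0-based index, similar to the two-argument glue lambda. It is built on `enumerate` and is lazy, just like Python's built-in `filter`.

```python
from itertools import cycle, islice

def fibonacci():
    t2, t1 = 0, 1
    while True:
        yield t2
        t2, t1 = t1, t2 + t1

def filter_indexed(predicate, iterable):
    # Hypothetical helper: the predicate receives (value, index).
    return (x for i, x in enumerate(iterable) if predicate(x, i))

# Value-based filter: Fibonacci numbers in [25, 50) among the first 100.
in_range = list(filter_indexed(lambda x, i: 25 <= x < 50, islice(fibonacci(), 100)))
print(in_range)  # [34]

# Index-based filter: keep only the elements at odd positions.
odd_positions = list(filter_indexed(lambda x, i: i % 2 != 0, islice(cycle([1, 2]), 10)))
print(odd_positions)  # [2, 2, 2, 2, 2]
```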

Derive

You can derive a new series from one or more input series. For example, let's create the element-wise product of the Fibonacci and Lucas series:

fibonacci = generate(lambda(t2: 0, t1: 1, t2 + t1))
lucas = generate(lambda(t2: 2, t1: 1, t2 + t1))

multiple = derive(lambda(x, y, x * y), fibonacci, lucas)

for (value : limit(10, multiple))
    echo(value)
0
1
3
8
21
55
144
377
987
2584
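Python's built-in `map` accepts several iterables and combines them element by element, lazily, which makes it a reasonable stand-in for `derive`. The `recurrence` helper is hypothetical; it produces both the Fibonacci and the Lucas series from different starting values.

```python
from itertools import islice
from operator import mul

def recurrence(t2, t1):
    # Hypothetical helper: yields t2, t1, t2 + t1, ... like the glue generators.
    while True:
        yield t2
        t2, t1 = t1, t2 + t1

fibonacci = recurrence(0, 1)  # 0, 1, 1, 2, 3, ...
lucas = recurrence(2, 1)      # 2, 1, 3, 4, 7, ...

# map() with two iterables multiplies matching elements, one pair at a time.
multiple = map(mul, fibonacci, lucas)
first_ten = list(islice(multiple, 10))
print(first_ten)  # [0, 1, 3, 8, 21, 55, 144, 377, 987, 2584]
```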

Derive can be used to build powerful derivations of series. However, for basic derivations you can also apply arithmetic directly to series. This yields the same result:

multiple = fibonacci * lucas

You can use most classic operators on series; note that the resulting series are also lazily evaluated. For example, let's add 1 to each element of a series:

simple = series(1, 2, 3) + 1
echo(simple)
[2, 3, 4]

This also works for string concatenation, for example:

strings = series("a", "b", "c") + "d"
echo(strings)
[ad, bd, cd]

When building more complex calculations, the operators follow standard precedence:

simple = series(1, 2, 3)
result = simple - simple * 2

for (value : result)
    echo(value)
-1
-2
-3
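One way to get operators like these in Python is operator overloading. The sketch below is a hypothetical `Series` wrapper, not glue's design: every operator returns a new lazy series that combines elements pairwise, and a scalar operand is repeated so that it lines up with every element. The series stores a factory rather than an iterator so that `simple` can be used twice in one expression.

```python
import operator
from itertools import repeat

class Series:
    """Hypothetical wrapper: arithmetic is applied element-wise and lazily."""

    def __init__(self, source):
        # source is a zero-argument callable returning a fresh iterator.
        self._source = source

    @classmethod
    def of(cls, *values):
        return cls(lambda: iter(values))

    def __iter__(self):
        return self._source()

    def _combine(self, other, op):
        def source():
            # A scalar operand is repeated to match every element.
            right = iter(other) if isinstance(other, Series) else repeat(other)
            return map(op, iter(self), right)
        return Series(source)

    def __add__(self, other):
        return self._combine(other, operator.add)

    def __sub__(self, other):
        return self._combine(other, operator.sub)

    def __mul__(self, other):
        return self._combine(other, operator.mul)

simple = Series.of(1, 2, 3)
print(list(simple + 1))                      # [2, 3, 4]
print(list(Series.of("a", "b", "c") + "d"))  # ['ad', 'bd', 'cd']
print(list(simple - simple * 2))             # [-1, -2, -3]
```

Python's own precedence rules evaluate `simple * 2` before the subtraction, which matches the glue result above.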

Aggregate

Derivations create new series from existing ones, where each derived element is based on a single element from each input series. Aggregation, however, combines multiple items.

This means that while derivations can be fully lazy, aggregations cannot: each aggregated element depends on all the elements before it having been processed in order, so elements cannot be calculated independently.

Sum

Suppose we want an aggregate series that returns the running sum of another series at each position:

fibonacci = generate(lambda(t2: 0, t1: 1, t2 + t1))
sum = aggregate(lambda(current: 0, new, current + new), fibonacci)

for (value : limit(10, sum))
    echo(value)
0
1
2
4
7
12
20
33
54
88
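Python's `itertools.accumulate` computes exactly this kind of running aggregate. It still produces values on demand, so it works on an infinite input, but each value depends on all the values before it, as described above.

```python
from itertools import accumulate, islice

def fibonacci():
    t2, t1 = 0, 1
    while True:
        yield t2
        t2, t1 = t1, t2 + t1

# Running total; the default combining function is addition.
running_sum = accumulate(fibonacci())
first_ten = list(islice(running_sum, 10))
print(first_ten)  # [0, 1, 2, 4, 7, 12, 20, 33, 54, 88]
```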

Average

You can combine aggregation with other functions to create more complex calculations. For example, a naive average could be calculated like this:

fibonacci = generate(lambda(t2: 0, t1: 1, t2 + t1))
sum = aggregate(lambda(current: 0, new, current + new), fibonacci)
counter = generate(lambda(x: 1, x + 1))

average = sum / counter

echo(limit(10, average))
0
0
0
1
1
2
2
4
6
8
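The output above appears to use integer division (7 / 5 gives 1, for example), since both series contain integers. The Python sketch below reproduces those values with `//`; it is an analogue of the glue code, with `count(1)` playing the role of the `counter` generator.

```python
from itertools import accumulate, count, islice

def fibonacci():
    t2, t1 = 0, 1
    while True:
        yield t2
        t2, t1 = t1, t2 + t1

running_sum = accumulate(fibonacci())
counter = count(1)  # 1, 2, 3, ...
# Integer division (//) matches the truncated values in the glue output.
average = map(lambda s, c: s // c, running_sum, counter)
first_ten = list(islice(average, 10))
print(first_ten)  # [0, 0, 0, 1, 1, 2, 2, 4, 6, 8]
```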

Min

You can calculate the minimum of a series using the aggregate function:

series = series(3, 2, 1, 2, 4, 3)
# a series that holds, at each position, the minimum seen so far
min = aggregate(lambda(current, new, when(current == null || new < current, new, current)), series)
# the minimum of the entire series
echo(last(min))
1

Max

You can calculate the maximum of a series in the same way:

series = series(3, 2, 1, 2, 4, 3)
# a series that holds, at each position, the maximum seen so far
max = aggregate(lambda(current, new, when(current == null || new > current, new, current)), series)
# the maximum of the entire series
echo(last(max))
4
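In Python, `accumulate` also accepts a two-argument function, so passing `min` or `max` produces the same "best so far" series; the last element is the overall result. Because the first element seeds the accumulation, no null check is needed in this sketch.

```python
from itertools import accumulate

values = [3, 2, 1, 2, 4, 3]
running_min = list(accumulate(values, min))  # minimum seen so far
running_max = list(accumulate(values, max))  # maximum seen so far
print(running_min, running_min[-1])  # [3, 2, 1, 1, 1, 1] 1
print(running_max, running_max[-1])  # [3, 3, 3, 3, 4, 4] 4
```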

Explode

Explosion is yet another way to create a new series from an existing one. Here, however, each element of the input can produce zero or more elements in the new series. For example:

series = explode(lambda(a, series(a, -a)), 1, 2, 3)

echo(series)
[1, -1, 2, -2, 3, -3]

By producing no elements for a value, you can also remove it from the result:

series = explode(lambda(a, when(a != 1, series(a, -a))), 1, 2, 3)

echo(series)
[2, -2, 3, -3]
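Explosion corresponds to what many languages call a flat map. A Python sketch, with `explode` as a hypothetical helper built on `itertools.chain.from_iterable`: each element maps to an iterable, which may be empty, and the results are flattened lazily.

```python
from itertools import chain

def explode(function, iterable):
    # Hypothetical helper: each element maps to zero or more elements,
    # and the results are flattened lazily into a single iterator.
    return chain.from_iterable(function(x) for x in iterable)

pairs = list(explode(lambda a: (a, -a), [1, 2, 3]))
print(pairs)  # [1, -1, 2, -2, 3, -3]

# Returning an empty tuple drops the element entirely.
without_one = list(explode(lambda a: (a, -a) if a != 1 else (), [1, 2, 3]))
print(without_one)  # [2, -2, 3, -3]
```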

Unique

You can generate a series where each element is unique:

series = series(1, 2, 3, 3, 2, 4, 5)
echo(unique(series))
1
2
3
4
5
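A lazy version of `unique` only has to remember the elements it has already emitted. This Python sketch keeps the order of first occurrence, matching the output above; the `unique` name is hypothetical, and this version assumes the elements are hashable.

```python
def unique(iterable):
    # Lazily yields each element the first time it is seen.
    seen = set()
    for item in iterable:
        if item not in seen:
            seen.add(item)
            yield item

deduplicated = list(unique([1, 2, 3, 3, 2, 4, 5]))
print(deduplicated)  # [1, 2, 3, 4, 5]
```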

Position

You can find the position of the first element that matches a lambda; the series is only resolved until that position is found. For example, let's find the position of the element 2:

series = series(1, 2, 3, 4, 5, 2)
echo(position(lambda(x, x == 2), series))
1

Subsequent

If you need subsequent positions, there are several ways to get them. The easiest is to add a second parameter to your lambda, which will contain the current index:

series = series(1, 2, 3, 4, 5, 2)

firstIndex = position(lambda(x, x == 2), series)
secondIndex = position(lambda(x, index, x == 2 && index > firstIndex), series)

echo("Indexes: " + firstIndex + ", " + secondIndex)
Indexes: 1, 5

Alternatively you can use offsets:

series = series(1, 2, 3, 4, 5, 2)

finder = lambda(x, x == 2)
firstIndex = position(finder, series)
secondIndex = position(finder, offset(firstIndex + 1, series))

echo("Indexes: " + firstIndex + ", " + (firstIndex + 1 + secondIndex))

This will echo the same result as the above.

Advanced

If you need to find many positions, it can be easier to use a lambda that generates lambdas:

series = series(1, 2, 3, 4, 5, 2)
positionFinder = lambda(valueToFind, currentIndex: -1,
    lambda(x, index, x == valueToFind && index > currentIndex))

firstIndex = position(positionFinder(2), series)
secondIndex = position(positionFinder(2, firstIndex), series)

echo("Indexes: " + firstIndex + ", " + secondIndex)
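The same pattern translates naturally to Python closures. In this sketch, `position` and `position_finder` are hypothetical helpers: the predicate receives the value and its index, `position` stops consuming the iterable at the first match, and `position_finder` returns a new predicate bound to the value and the starting index.

```python
def position(predicate, iterable):
    # Stops at the first match; returns None if nothing matches.
    return next((i for i, x in enumerate(iterable) if predicate(x, i)), None)

def position_finder(value_to_find, current_index=-1):
    # Returns a predicate matching value_to_find after current_index.
    return lambda x, index: x == value_to_find and index > current_index

values = [1, 2, 3, 4, 5, 2]
first_index = position(position_finder(2), values)
second_index = position(position_finder(2, first_index), values)
print(f"Indexes: {first_index}, {second_index}")  # Indexes: 1, 5
```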

Last Position

If you want the last position of a match, the series has to be fully resolved anyway. This, combined with the (in my experience) rare need for such a method, means there is no dedicated function. There are, however, ways to get the last position:

series = series(1, 2, 3, 4, 5, 2)
lastIndex = size(series) - 1 - position(lambda(x, x == 2), reverse(series))

echo("Last index: " + lastIndex)
Last index: 5
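The arithmetic works because an index i in the reversed series corresponds to index size - 1 - i in the original. The same approach in Python:

```python
values = [1, 2, 3, 4, 5, 2]
# Find the first match in the reversed list...
reverse_index = next(i for i, x in enumerate(reversed(values)) if x == 2)
# ...and map that index back onto the original order.
last_index = len(values) - 1 - reverse_index
print(f"Last index: {last_index}")  # Last index: 5
```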

Contains

There is no contains function because there is an operator that checks for containment:

series = series(1, 2, 3)
# Check if 1 is in the series
echo(1 ? series)
# Check if 1 is not in the series
echo(1 !? series)
# Check if 4 is in the series
echo(4 ? series)
# Check if 4 is not in the series
echo(4 !? series)

This prints out:

true
false
false
true

Grouping

You can also group elements together using arbitrary keys. For example, suppose we want to create two groups of people: those under 30 and those 30 or over:

series = series(
    structure(name: "John", age: 25),
    structure(name: "Bob", age: 30),
    structure(name: "Jim", age: 35))

echo(group(lambda(person, person/age >= 30), series))
{false=[{name=John, age=25}], true=[{name=Bob, age=30}, {name=Jim, age=35}]}

In this case the key is a boolean, but we could also opt for more expressive keys:

grouped = group(lambda(person, when(person/age >= 30, "old", "young")), series)
echo(keys(grouped))
[young, old]

You can also access the grouped result directly:

series = series(
    structure(name: "John", age: 25),
    structure(name: "Bob", age: 30),
    structure(name: "Jim", age: 35))
grouped = group(lambda(person, when(person/age >= 30, "old", "young")), series)
echo(grouped["young"])
[{name=John, age=25}]

An element can even belong to multiple groups if the lambda returns a series of keys:

series = series(
    structure(name: "John", age: 25),
    structure(name: "Bob", age: 30),
    structure(name: "Jim", age: 35))
echo(group(lambda(person, series(person/age, person/age - 5)), series))
{35=[{name=Jim, age=35}], 20=[{name=John, age=25}], 25=[{name=John, age=25}, {name=Bob, age=30}], 30=[{name=Bob, age=30}, {name=Jim, age=35}]}
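A rough Python equivalent of `group` collects elements into lists keyed by whatever the key function returns. In this sketch, `group` is a hypothetical helper that treats a returned list as "add this element to every one of these groups", mirroring the multi-key example above. The key order of the resulting dict may differ from the order glue prints.

```python
from collections import defaultdict

def group(key_function, iterable):
    # Hypothetical helper: a list of keys puts the element in every group.
    groups = defaultdict(list)
    for item in iterable:
        keys = key_function(item)
        if not isinstance(keys, list):
            keys = [keys]
        for key in keys:
            groups[key].append(item)
    return dict(groups)

people = [
    {"name": "John", "age": 25},
    {"name": "Bob", "age": 30},
    {"name": "Jim", "age": 35},
]
by_age_group = group(lambda p: "old" if p["age"] >= 30 else "young", people)
print(by_age_group["young"])  # [{'name': 'John', 'age': 25}]

overlapping = group(lambda p: [p["age"], p["age"] - 5], people)
print(overlapping[25])  # [{'name': 'John', 'age': 25}, {'name': 'Bob', 'age': 30}]
```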

Maps

You can take one or more series of values and turn them into maps. The leading arguments are the keys, and each series provides the values for one map:

result = map("firstName", "lastName",
    series("John", "Smith"),
    series("Jim", "Smith"),
    series("Bob", "Smith"))
echo(result)
[{firstName=John, lastName=Smith}, {firstName=Jim, lastName=Smith}, {firstName=Bob, lastName=Smith}]

You can then use the maps directly in an expression, which is evaluated for each map in the series:

echo("Person is: " + result/firstName + " " + result/lastName)
[Person is: John Smith, Person is: Jim Smith, Person is: Bob Smith]
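In Python the same result takes two explicit steps: zip the keys with each row to build the maps, then loop over the maps to build the strings. glue applies the expression to each map implicitly; this sketch spells the loop out.

```python
keys = ("firstName", "lastName")
rows = [("John", "Smith"), ("Jim", "Smith"), ("Bob", "Smith")]

# Each row becomes one map, pairing the keys with that row's values.
result = [dict(zip(keys, row)) for row in rows]

# The expression is applied to every map in turn.
greetings = ["Person is: " + p["firstName"] + " " + p["lastName"] for p in result]
print(greetings)
# ['Person is: John Smith', 'Person is: Jim Smith', 'Person is: Bob Smith']
```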