Encode to Json

In your ELARA Solutions, you may want to encode data from an ELARA type to a common data-interchange format. This is typically useful when you want to export data from your running Solution on the ELARA Platform.

ELARA supports several Pipeline Operations that transform collection data to a BlobType value encoded in a particular file format. These Operations include .toCsv(), .toJsonLines() and .toXlsx().

In this tutorial, you will use several of these Operations in practical exercises. You will

  • encode tabular collection data to CSV format, and
  • encode nested collection data to JSON Lines format.

In this tutorial, you will continue to use the data/sales.jsonl sales transactions dataset introduced in the previous module.

The starting point of code for this tutorial is the endpoint of the previous tutorial. The code is available in the StackBlitz instance below for convenience.

If you wish to use the above code, simply download the Project, navigate to the Project directory and run npm install to set up the Project for development.

Encode to CSV Format

In the previous tutorial, you defined a Pipeline Daily Difference in Revenue, which generates a Datastream of tabular structure, i.e. a DictType value with StringType keys and StructType values without any nested collections.

In this tutorial exercise, you will transform the output Datastream of your Daily Difference in Revenue Pipeline to a BlobType Datastream of CSV format. To do this, you will use the .toCsv() method of PipelineBuilder().

First, initialise a new PipelineBuilder() instance in your src/my_tutorial.ts Asset. Set the target Datastream as the output Datastream of your Daily Difference in Revenue Pipeline, defined in the previous lesson. Then call the .toCsv() method of PipelineBuilder():

import { ..., PipelineBuilder, Template } from "@elaraai/core"

...

const select_exercise_one = new PipelineBuilder("Daily Difference in Revenue")
    .from(offset_exercise_one.outputStream())
    .select({
        ...
    })

const encode_exercise_one = new PipelineBuilder("Daily Difference in Revenue.CSV")
    .from(select_exercise_one.outputStream())
    .toCsv()

export default Template(
    ...
    encode_exercise_one
)

As its first argument, .toCsv() expects a TypeScript object containing the configuration of the Operation. The configuration properties include:

  • selections, where you provide the fields to be translated to columns in the CSV-formatted output,
  • skip_n, where you define how many rows of data to skip,
  • delimiter, where you define the CSV delimiter, and
  • null_str, where you define a string value for how null values are encoded.
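To make the effect of each property concrete, here is a minimal sketch in plain TypeScript of how such a configuration could drive a CSV encoding. This is an illustration only, not ELARA's implementation; the encodeCsv function and Row type are hypothetical, written to mirror the option names above.

```typescript
// Hypothetical sketch of CSV encoding driven by a .toCsv()-style configuration.
type Row = { date: string; dailyChangeInRevenue: number | null };

function encodeCsv(
    rows: Row[],
    options: {
        // field name -> accessor, becomes a column in the output
        selections: Record<string, (row: Row) => string | number | null>;
        skip_n: bigint;     // number of input rows to skip before encoding
        delimiter: string;  // separator placed between columns
        null_str: string;   // text emitted in place of null values
    }
): string {
    const headers = Object.keys(options.selections);
    const body = rows
        .slice(Number(options.skip_n))
        .map(row =>
            headers
                .map(header => {
                    const value = options.selections[header](row);
                    return value === null ? options.null_str : String(value);
                })
                .join(options.delimiter)
        );
    return [headers.join(options.delimiter), ...body].join("\n");
}

const csv = encodeCsv(
    [
        { date: "2022-01-01", dailyChangeInRevenue: 12.5 },
        { date: "2022-01-02", dailyChangeInRevenue: null },
    ],
    {
        selections: {
            Date: row => row.date,
            "Daily Change In Revenue": row => row.dailyChangeInRevenue,
        },
        skip_n: 0n,
        delimiter: ",",
        null_str: "",
    }
);
// csv === "Date,Daily Change In Revenue\n2022-01-01,12.5\n2022-01-02,"
```

Note how the null value in the second row is rendered as null_str (here, an empty string), leaving the column position intact.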

In your .toCsv() method call, provide a configuration with:

  • selections including fields for your input Datastream's date and dailyChangeInRevenue fields,
  • skip_n as 0n,
  • delimiter as ",", and
  • null_str as an empty string ("").

Your Pipeline definition should read:

...

const encode_exercise_one = new PipelineBuilder("Daily Difference in Revenue.CSV")
    .from(select_exercise_one.outputStream())
    .toCsv({
        selections: {
            Date: fields => fields.date,
            "Daily Change In Revenue": fields => fields.dailyChangeInRevenue
        },
        skip_n: 0n,
        delimiter: ",",
        null_str: ""
    })

...

You've now defined a .toCsv() Operation. Launch a Solution and test your Pipeline's output Datastream "Pipeline.Daily Difference in Revenue.CSV" yourself to view the expected result. For convenience, a Project containing the code changes above is available in the StackBlitz instance below, ready to install and launch.

To read the output Datastream "Pipeline.Daily Difference in Revenue.CSV", you can use the edk stream get command, remembering to use the --output "bytes" option to allow you to read the BlobType output Datastream.

Encode to JSON Lines Format

In the previous tutorial, you defined a Pipeline Daily Difference in Revenue by Product Code, which generates a Datastream of tabular structure with a nested collection, i.e. a DictType nested collection dailyChangeInRevenuePerProductCode.

In this tutorial exercise, you will transform the output Datastream of your Daily Difference in Revenue by Product Code Pipeline to a BlobType Datastream of JSON Lines format. To do this, you will use the .toJsonLines() method of PipelineBuilder().

First, initialise a new PipelineBuilder() instance in your src/my_tutorial.ts Asset. Set the target Datastream as the output Datastream of your Daily Difference in Revenue by Product Code Pipeline, defined in the previous lesson. Then call the .toJsonLines() method of PipelineBuilder():

import { ..., PipelineBuilder, Template } from "@elaraai/core"

...

const select_exercise_two = new PipelineBuilder("Daily Difference in Revenue by Product Code")
    .from(parse_products.outputStream())
    ...
    .select({
        ...
    })

const encode_exercise_two = new PipelineBuilder("Daily Difference in Revenue by Product Code.JSON")
    .from(select_exercise_two.outputStream())
    .toJsonLines()

export default Template(
    ...
    encode_exercise_two
)

As its first argument, .toJsonLines() expects a TypeScript object containing the configuration of the Operation. There is only one configuration option available: selections, where you provide the fields to be translated to JSON-object properties on each line of your output JSON Lines file.

Define a selections property in your .toJsonLines() configuration, and define the JSON-object properties to be mapped:

  • map the date field. This time, ensure you use the Print() Expression to convert the DateTimeType field to StringType, as .toJsonLines() does not support DateTimeType parsing.
  • map the dailyChangeInRevenuePerProductCode field. This field is a DictType field, which .toJsonLines() does not handle. However, .toJsonLines() does handle ArrayType and StructType fields, so you can convert dailyChangeInRevenuePerProductCode to an array of StructType values using the ToArray() and Struct() Expressions from EDK Core. .toJsonLines() translates this to an array of JSON objects.
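The two mappings above can be pictured in plain TypeScript. This is an illustration only, not ELARA's implementation; the encodeJsonLines function and SalesRecord type are hypothetical, using the native Date and Map types to stand in for DateTimeType and DictType values.

```typescript
// Hypothetical sketch of JSON Lines encoding: one JSON object per input record,
// with the date printed to a string and the dictionary flattened to an array.
type SalesRecord = {
    date: Date;
    dailyChangeInRevenuePerProductCode: Map<string, number>;
};

function encodeJsonLines(records: SalesRecord[]): string {
    return records
        .map(record =>
            JSON.stringify({
                // mirrors the Print() Expression: DateTimeType -> StringType
                date: record.date.toISOString(),
                // mirrors the ToArray() and Struct() Expressions:
                // each dictionary entry becomes a { productCode, revenueDifference } object
                dailyChangeInRevenuePerProductCode: [
                    ...record.dailyChangeInRevenuePerProductCode,
                ].map(([key, value]) => ({
                    productCode: key,
                    revenueDifference: value,
                })),
            })
        )
        .join("\n");
}

const jsonl = encodeJsonLines([
    {
        date: new Date("2022-01-01T00:00:00Z"),
        dailyChangeInRevenuePerProductCode: new Map([
            ["P1", 10.5],
            ["P2", -3],
        ]),
    },
]);
```

Each record produces one line of output, with the nested dictionary rendered as an array of objects rather than a JSON map.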

Your Pipeline definition should read:

...

const encode_exercise_two = new PipelineBuilder("Daily Difference in Revenue by Product Code.JSON")
    .from(select_exercise_two.outputStream())
    .toJsonLines({
        selections: {
            date: fields => Print(fields.date),
            dailyChangeInRevenuePerProductCode: fields => ToArray(
                fields.dailyChangeInRevenuePerProductCode,
                (value, key) => Struct({
                    productCode: key,
                    revenueDifference: value
                })
            )
        }
    })

...

You've now defined a .toJsonLines() Operation. Launch a Solution and test your Pipeline's output Datastream "Pipeline.Daily Difference in Revenue by Product Code.JSON" yourself to view the expected result. For convenience, a Project containing the code changes above is available in the StackBlitz instance below, ready to install and launch.

To read the output Datastream "Pipeline.Daily Difference in Revenue by Product Code.JSON", you can run edk stream get, remembering to use the --output "bytes" option to allow you to read the BlobType output Datastream.

Next steps

In this tutorial, you learnt how to encode collection data to standard data-interchange formats returned in a BlobType Datastream.

In the next module, you will learn how to export BlobType data to external systems using Datasinks.