Encode to JSON
In your ELARA Solutions, you may want to encode data from an ELARA type to a common data-interchange format. This is typically useful when you want to export data from your running Solution on ELARA Platform.
ELARA supports several Pipeline Operations that transform collection data to a BlobType value encoded in a particular file format. These Operations include .toCsv(), .toJsonLines() and .toXlsx(). In this tutorial, you will use several of these Operations in practical exercises. You will:
- encode tabular collection data to CSV format, and
- encode nested collection data to JSON Lines format.
In this tutorial, you will continue to use the data/sales.jsonl sales transactions dataset introduced previously. The starting point of code for this tutorial is the endpoint of the previous tutorial. The code is available in the StackBlitz instance below for convenience.
If you wish to use the above code, simply download the Project, navigate to the Project directory and run npm install to set up the Project for development.
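The setup steps above can be sketched as a shell session (the directory name my-tutorial is illustrative; use whatever directory the downloaded Project unpacks to):

```shell
# Navigate into the downloaded Project directory (name may differ)
cd my-tutorial
# Install the Project's dependencies, including @elaraai/core
npm install
```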
Encode to CSV Format
In the previous tutorial, you defined a Pipeline Daily Difference in Revenue, which generates a Datastream of tabular structure - i.e. a DictType value with StringType keys and StructType values without any nested collections.
In this tutorial exercise, you will transform the output Datastream of your Daily Difference in Revenue Pipeline to a BlobType Datastream of CSV format. To do this, you will use the .toCsv() method of PipelineBuilder().
First, initialise a new PipelineBuilder() instance in your src/my_tutorial.ts Asset. Set the target Datastream as the output Datastream of your Daily Difference in Revenue Pipeline, defined in the previous lesson. Then call the .toCsv() method of PipelineBuilder():
```typescript
import { ..., PipelineBuilder, Template } from "@elaraai/core"

...

const select_exercise_one = new PipelineBuilder("Daily Difference in Revenue")
    .from(offset_exercise_one.outputStream())
    .select({
        ...
    })

const encode_exercise_one = new PipelineBuilder("Daily Difference in Revenue.CSV")
    .from(select_exercise_one.outputStream())
    .toCsv()

export default Template(
    ...
    encode_exercise_one
)
```
As its first argument, .toCsv() expects a TypeScript object containing the configuration of the Operation. The configuration properties include:
- selections, where you provide the fields to be translated to columns in the CSV-formatted output,
- skip_n, where you define how many rows of data to skip,
- delimiter, where you define the CSV delimiter, and
- null_str, where you define a string value for how null values are encoded.
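To see what these options mean in practice, here is a minimal plain-TypeScript sketch of the same encoding semantics. It is an illustration only, not the ELARA implementation: the `Row` type, the `encodeCsv` function and its signature are hypothetical names invented for this example.

```typescript
// A row of already-selected tabular data; null models a missing value.
type Row = Record<string, string | number | null>;

// Illustrative stand-in for the semantics of the .toCsv() configuration.
function encodeCsv(
    rows: Row[],
    columns: string[],
    opts: { skip_n: number; delimiter: string; null_str: string }
): string {
    const body = rows
        .slice(opts.skip_n) // skip_n: number of leading rows to drop
        .map(row =>
            columns
                // null_str: how null values are written out
                .map(col => (row[col] === null ? opts.null_str : String(row[col])))
                // delimiter: the separator between column values
                .join(opts.delimiter)
        );
    // Header row first, then one line per data row.
    return [columns.join(opts.delimiter), ...body].join("\n");
}

const csv = encodeCsv(
    [
        { Date: "2022-10-12", "Daily Change In Revenue": 42.5 },
        { Date: "2022-10-13", "Daily Change In Revenue": null },
    ],
    ["Date", "Daily Change In Revenue"],
    { skip_n: 0, delimiter: ",", null_str: "" }
);
// csv:
// Date,Daily Change In Revenue
// 2022-10-12,42.5
// 2022-10-13,
```

Note how the null value in the second row becomes an empty field because null_str is the empty string.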
In your .toCsv() method call, provide a configuration with:
- selections including fields for your input Datastream's date and dailyChangeInRevenue fields,
- skip_n as 0n,
- delimiter as ",", and
- null_str as an empty string ("").
Your Pipeline definition should read:
```typescript
...

const encode_exercise_one = new PipelineBuilder("Daily Difference in Revenue.CSV")
    .from(select_exercise_one.outputStream())
    .toCsv({
        selections: {
            Date: fields => fields.date,
            "Daily Change In Revenue": fields => fields.dailyChangeInRevenue
        },
        skip_n: 0n,
        delimiter: ",",
        null_str: ""
    })

...
```
You've now defined a .toCsv() Operation. Launch a Solution and test your Pipeline's output Datastream "Pipeline.Daily Difference in Revenue.CSV" yourself to view the expected result. For convenience, a Project containing the code changes above is available in the StackBlitz instance below, ready to install and launch.
To read the output Datastream "Pipeline.Daily Difference in Revenue.CSV", you can use the edk stream get command, remembering to use the --output "bytes" option to allow you to read the BlobType output Datastream.
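A command along these lines should read the encoded Datastream (the positional stream-name argument and output redirection are assumptions; check edk stream get --help for the exact syntax in your EDK version):

```shell
# Read the BlobType Datastream as raw bytes and save it to a local file
edk stream get "Pipeline.Daily Difference in Revenue.CSV" --output "bytes" > daily_difference.csv
```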
Encode to JSON Lines Format
In the previous tutorial, you defined a Pipeline Daily Difference in Revenue by Product Code, which generates a Datastream of tabular structure with a nested collection - i.e. it contains a nested DictType collection dailyChangeInRevenuePerProductCode.
In this tutorial exercise, you will transform the output Datastream of your Daily Difference in Revenue by Product Code Pipeline to a BlobType Datastream of JSON Lines format. To do this, you will use the .toJsonLines() method of PipelineBuilder().
First, initialise a new PipelineBuilder() instance in your src/my_tutorial.ts Asset. Set the target Datastream as the output Datastream of your Daily Difference in Revenue by Product Code Pipeline, defined in the previous lesson. Then call the .toJsonLines() method of PipelineBuilder():
```typescript
import { ..., PipelineBuilder, Template } from "@elaraai/core"

...

const select_exercise_two = new PipelineBuilder("Daily Difference in Revenue by Product Code")
    .from(parse_products.outputStream())
    ...
    .select({
        ...
    })

const encode_exercise_two = new PipelineBuilder("Daily Difference in Revenue by Product Code.JSON")
    .from(select_exercise_two.outputStream())
    .toJsonLines()

export default Template(
    ...
    encode_exercise_two
)
```
As its first argument, .toJsonLines() expects a TypeScript object containing the configuration of the Operation. There is only one configuration option available: selections, where you provide the fields to be translated to JSON-object properties on each line of your output JSON Lines file.
Define a selections property in your .toJsonLines() configuration, and define the JSON-object properties to be mapped:
- map the date field. This time, ensure you use the Print() Expression to convert the DateTimeType field to StringType, as .toJsonLines() does not support DateTimeType encoding.
- map the dailyChangeInRevenuePerProductCode field. This field is a DictType field, which .toJsonLines() does not handle. However, .toJsonLines() does handle ArrayType and StructType fields, so you can convert dailyChangeInRevenuePerProductCode to an array of StructType values using the ToArray() and Struct() Expressions from EDK Core. .toJsonLines() translates this to an array of JSON objects.
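The dict-to-array conversion can be pictured with a plain-TypeScript analogue, where a Map stands in for a DictType value. This is an illustration of the data shape only - the variable names are invented for this example, and the real conversion is done declaratively with the ToArray() and Struct() Expressions:

```typescript
// A DictType value modelled as a Map: product code -> revenue difference.
const dailyChangeInRevenuePerProductCode = new Map<string, number>([
    ["PR001", 12.5],
    ["PR002", -3.0],
]);

// Analogue of (value, key) => Struct({ productCode: key, revenueDifference: value }):
// each dict entry becomes a StructType-like object in an array.
const asArrayOfStructs = [...dailyChangeInRevenuePerProductCode].map(
    ([key, value]) => ({ productCode: key, revenueDifference: value })
);

// One JSON Lines record, with the nested dict now a JSON array of objects.
const line = JSON.stringify({
    date: "2022-10-12",
    dailyChangeInRevenuePerProductCode: asArrayOfStructs,
});
// line:
// {"date":"2022-10-12","dailyChangeInRevenuePerProductCode":
//   [{"productCode":"PR001","revenueDifference":12.5},
//    {"productCode":"PR002","revenueDifference":-3}]}
```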
Your Pipeline definition should read:
```typescript
...

const encode_exercise_two = new PipelineBuilder("Daily Difference in Revenue by Product Code.JSON")
    .from(select_exercise_two.outputStream())
    .toJsonLines({
        selections: {
            date: fields => Print(fields.date),
            dailyChangeInRevenuePerProductCode: fields => ToArray(
                fields.dailyChangeInRevenuePerProductCode,
                (value, key) => Struct({
                    productCode: key,
                    revenueDifference: value
                })
            )
        }
    })

...
```
You've now defined a .toJsonLines() Operation. Launch a Solution and test your Pipeline's output Datastream "Pipeline.Daily Difference in Revenue by Product Code.JSON" yourself to view the expected result. For convenience, a Project containing the code changes above is available in the StackBlitz instance below, ready to install and launch.
To read the output Datastream "Pipeline.Daily Difference in Revenue by Product Code.JSON", you can run edk stream get, remembering to use the --output "bytes" option to allow you to read the BlobType output Datastream.
Next steps
In this tutorial, you learnt how to encode collection data to standard data-interchange formats returned in a BlobType Datastream. In the next module, you will learn how to export BlobType data to external systems using Datasinks.