
    Wendelin - HowTo Transform Data

    HowTo Transform Data
    • Last Update:2020-08-14

    Agenda

    • Create Data Operation
    • Create Data Product
    • Create Transformation Script
    • Create Data Transformation
    • See Data Analysis
    • See Data Array

     

    This tutorial teaches how to transform raw data and write it to Data Arrays.

    Before starting this tutorial, make sure you have read and completed the following tutorial and have data in your Wendelin instance:

     

    Data Operation

    Open your Wendelin dashboard. 

    In Modules click on Data Operations Module

    Add Data Operation

    Click on Add to add a new Data Operation.

    Add Data Operation

    Click Proceed to continue.

    Fill the Form

    Fill in the form to create the Data Operation.

    Title - we name it Convert Raw Environment Data to Array

    Reference - data-operation-convert-raw-data-to-array

    Script ID - DataAnalysisLine_convertEnvironmentDataStreamToArray : this script will do the actual transformation. It doesn't exist yet; we will create it later in this tutorial.

    Item Types - Data Array : we choose Data Array because we will convert the raw data to a Data Array

    Save

    Click Save to save the changes.

    Validate

    Click on Validate on the left side panel to validate the Data Operation.

    Confirm Validation

    Click Proceed to confirm validation.

    Data Product

     

    Next we need to create a new Data Product, which will be the output Data Product of the Transformation.

     

    Create Data Product

     

    Create a new Data Product as described in the HowTo Create Data Product tutorial, with the following values:

    Title -  Environment Raw Array

    Reference - environment-raw-array

    Item Types

    • Data Stream
    • Progress Indicator
    • Data Array

     

     

    Portal Callables

    After the Data Product is created and validated, navigate to the Portal Callables page by clicking on Callables on the left side panel.

    There we will create and store a Python script that will transform raw data to arrays.

     

    Add Transformation Script

    Click on Add button to add a new script.

    Add Transformation Script Cont.

    Choose Python Script as Document Type and click on Create Document to create an empty Python script.

    Fill The Form

    Define ID, Title and Reference of your script.

    We name it DataAnalysisLine_convertEnvironmentDataStreamToArray, the same Script ID we entered in the Data Operation.

    Then click Save to save the changes.

    Parameters

    Next we define the parameters we will give to our script. 

    in_stream - the input Data Stream where raw data is stored

    out_array  - the output Data Array where data will be stored. 

    At the end click Save to save the changes.

    Transformation Script

    We write the script in the text area at the bottom of the page.

    Transformation Script Cont.

    
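    import pandas as pd

    # in_stream and out_array are indexed by item type name.
    progress_indicator = in_stream["Progress Indicator"]
    in_data_stream = in_stream["Data Stream"]
    out_data_array = out_array["Data Array"]

    # Read at most chunk_size bytes per run, starting where the last run stopped.
    chunk_size = 20 * 10**6
    start = progress_indicator.getIntOffsetIndex()
    end = min(start+chunk_size, in_data_stream.getSize())

    # Unpack the msgpack records of this chunk; the call also returns the new end offset.
    unpacked, end = in_data_stream.readMsgpackChunkList(start, end)
    f = in_data_stream.extractDateTime
    # For each record o, o[0] becomes the index (date) and o[1] holds the values.
    df = pd.DataFrame((dict(**o[1]) for o in unpacked), dtype="float64", index=(f(o[0]) for o in unpacked))

    # Nothing to do if no record was unpacked.
    if df.shape[0] == 0:
      return

    df.index.name = "date"

    # Convert the data frame to a record array, keeping the date index as a field.
    ndarray = df.to_records(convert_datetime64=False)

    # Create the output array on first use, then append the new records.
    zbigarray = out_data_array.getArray()
    if zbigarray is None:
      zbigarray = out_data_array.initArray(shape=(0,), dtype=ndarray.dtype.fields)

    zbigarray.append(ndarray)

    # Remember how far we have read so the next run continues from there.
    if end > start:
      progress_indicator.setIntOffsetIndex(end)

    # tell caller to create new activity after processing if we did not reach end of stream
    if end < in_data_stream.getSize():
      return 1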
    
    

    This script simply writes raw data to a Data Array. More sophisticated examples, such as resampling data or making predictions, are shown in the following tutorials.

    First we get the input Data Stream and read one chunk of data at a time.

    As the data is in msgpack format, we unpack it with the readMsgpackChunkList method of the Wendelin Data Stream, which is based on the Unpacker of msgpack.
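
    To illustrate the unpacking step outside Wendelin, the sketch below uses the msgpack Python package directly. It assumes each record is a [timestamp, {field: value}] pair, which is what the script expects when it reads o[0] and o[1]. The field names and sample values are invented for illustration, and the in-memory bytes only stand in for a Data Stream.

```python
import msgpack

# Hypothetical sample records: [timestamp, {field: value}] pairs.
records = [
    [1597363200, {"temperature": 21.5, "humidity": 40.0}],
    [1597363260, {"temperature": 21.7, "humidity": 39.5}],
]

# A msgpack stream is simply the packed records written back to back.
raw = b"".join(msgpack.packb(r) for r in records)

# Unpacker decodes a byte stream incrementally: feed() accepts any slice,
# even one that ends in the middle of a record.
unpacker = msgpack.Unpacker(raw=False)
unpacked = []
for offset in range(0, len(raw), 16):  # feed 16 bytes at a time
    unpacker.feed(raw[offset:offset + 16])
    for obj in unpacker:  # yields only records that are complete so far
        unpacked.append(obj)

timestamps = [o[0] for o in unpacked]
fields = [o[1] for o in unpacked]
```

    Feeding small slices shows why a streaming unpacker suits chunked reads: a record split across two chunks is simply completed by the next feed.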

    Next we create a data frame using the pandas DataFrame class, with the record dates as index.

    Then we convert it to an ndarray (a NumPy record array) and append it to the zbigarray.
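
    The DataFrame and ndarray steps can be tried outside Wendelin with a sketch like the following. It assumes the same [timestamp, {field: value}] record layout with invented sample values. Here pd.to_datetime stands in for the Data Stream's extractDateTime helper and np.concatenate stands in for zbigarray.append, since a ZBigArray is only available inside a Wendelin instance.

```python
import numpy as np
import pandas as pd

# Hypothetical unpacked records: [timestamp, {field: value}] pairs.
unpacked = [
    [1597363200, {"temperature": 21.5, "humidity": 40.0}],
    [1597363260, {"temperature": 21.7, "humidity": 39.5}],
]

# Build the data frame: one row per record, indexed by date.
index = pd.to_datetime([o[0] for o in unpacked], unit="s")
df = pd.DataFrame([o[1] for o in unpacked], dtype="float64", index=index)
df.index.name = "date"

# to_records() turns the frame into a record array; the named index
# becomes the first field ("date"), followed by one field per column.
ndarray = np.asarray(df.to_records())

# Stand-in for the output array: start empty with the same dtype,
# then append the new records.
stored = np.empty((0,), dtype=ndarray.dtype)
stored = np.concatenate([stored, ndarray])
```

    The record dtype is what lets a single array hold the date next to the float columns, so each field can later be read by name, for example stored["temperature"].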

    We use the Progress Indicator to keep track of how much raw data we have already processed.
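
    The role of the Progress Indicator can be illustrated with a small pure-Python sketch. The FakeProgressIndicator class and the process_chunk function are hypothetical stand-ins written for this example; only the method names getIntOffsetIndex and setIntOffsetIndex come from the script. Each call handles at most one chunk, stores the new offset, and returns 1 while data remains, which mirrors how the script asks its caller to schedule another run.

```python
class FakeProgressIndicator:
    """Hypothetical stand-in that only remembers a byte offset."""

    def __init__(self):
        self._offset = 0

    def getIntOffsetIndex(self):
        return self._offset

    def setIntOffsetIndex(self, offset):
        self._offset = offset


def process_chunk(data, progress_indicator, sink, chunk_size):
    """Process one chunk of `data`; return 1 if more data remains."""
    start = progress_indicator.getIntOffsetIndex()
    end = min(start + chunk_size, len(data))
    if end > start:
        sink.append(data[start:end])
        progress_indicator.setIntOffsetIndex(end)
    if end < len(data):
        return 1
    return None


data = b"0123456789"
indicator = FakeProgressIndicator()
chunks = []
calls = 0
# Keep calling until the function stops asking for another run.
while True:
    calls += 1
    if process_chunk(data, indicator, chunks, chunk_size=4) != 1:
        break
```

    Because the offset is stored after every chunk, an interrupted run can resume from the last saved position instead of starting over. In the real script the end offset is additionally adjusted by readMsgpackChunkList, which this sketch does not model.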

    Data Transformations

    After your transformation script is ready, it's time to create a Data Transformation.

    On Modules page click on Data Transformations.

    Add Data Transformations

    Click on Add to add a new Data Transformation.

    Create Data Transformation

    Click on Create Document to continue.

    Fill The Form

    Choose a descriptive title and reference. For example:

    Title : Convert Environment Raw Data

    Reference : convert-environment-raw-data

    For Initial Product we choose the Data Product we created in the HowTo Create Data Product tutorial.

    At the end click Save to save the changes. 

    Data Transformation Lines

    Click on Add button to add a Data Transformation Line.

     

    Create Operation Line

     

    Choose Data Transformation Operation Line for Document Type and click on Create Document. 

    Fill The Form

    Fill the form.

    For Data Operation enter the name of the Data Operation that we created at the beginning of this tutorial - Convert Raw Environment Data to Array.

    At the end click on Save to save the changes. 

    Back To Data Transformation

    Once the required fields are filled in, click Save to save the changes, then go back to the Data Transformations view by clicking on the upper panel.

    Data Transformation Lines

    Add another Data Transformation Line.

    Create Transformation Line

    Now for Document Type choose Data Transformation Resource Line and click on Create Document.

    Fill The Form

    Fill the form.

    For Data Product enter the name of the Data Product that we created in the HowTo Create Data Product tutorial.

    After filling in the Data Product name, click Save to save the intermediate changes.

    After saving, a new field called Item Types will appear.

    Fill The Form Cont.

    Continue filling the form as shown on the screenshot

    At the end don't forget to save the changes.

    Add Output Line

    Head back to the Data Transformation and add one more Data Transformation Line.

    Create Transformation Line

    Once again for Document Type choose Data Transformation Resource Line and click on Create Document.

    Fill The Form

    Fill the form.

    For Data Product enter the name of the Data Product that we created earlier in this tutorial - Environment Raw Array.

    After filling in the Data Product name, click Save to save the intermediate changes.

    After saving, a new field called Item Types will appear.

    Fill The Form Cont.

    Continue filling the form as shown on the screenshot

    At the end don't forget to save the changes.

    Data Transformation

    Head back to Data Transformation.

    Now you can see that we have 3 Data Transformation Lines.

    The first line defines what operation will be done. 

    The second line defines on what the operation will be done - the input.

    And the third line defines the output.

    Note the Quantity column of the Transformation Lines. Make sure the values are the same as shown on the screenshot.

    The very last step is to validate the Data Transformation. 

    Click on Validate on the left side panel to validate the Data Transformation.

    Confirm Validation

    Click on Validate to confirm the Validation.

    Data Analysis

    Go to Data Analyses module.

    Data Analysis

    After a few minutes, a new Data Analysis called Convert Environment Raw Data will appear in the Data Analysis Module.

    Data Array Module

    Go to Data Array Module to see the data.  

    Data Array

    Here we can see the newly created Data Array called Convert Environment Raw Data.

    Click on it to navigate to the array.

    Data Array

    Click on Preview to see the data.

    Note that it might take a few minutes (fewer than 10) until the data appears.

    Data Array

    Data is here!