    HowTo ingest data with ebulk into Wendelin

    A tutorial showing how to ingest data into Wendelin using ebulk.
    • Last Update: 2023-05-15
    • Version: 002
    • Language: en

    Agenda

    • Install ebulk / Java 8

    • Configure ebulk client

    • Get example data set

    • Ingest data into Wendelin

    • Check that your data was ingested on the Wendelin side

    This tutorial teaches you how to ingest data into the Wendelin platform using ebulk.

    To follow it, you need a running Wendelin instance, and you must know its URL and the username and password used to access it.

    No additional configuration is needed on the Wendelin side, as it comes preconfigured.

    To learn how to install Wendelin, read wendelin-HowTo.Install.Wendelin.Standalone.

    Note that the proposed data lake functionality must have been selected during installation.

    Install ebulk / Java 8

    
    root@debian: ~$ apt-get install software-properties-common
    root@debian: ~$ gpg --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys B38DB8D777BB9026
    root@debian: ~$ gpg --output /usr/share/keyrings/ebulk-ebulk-ppa-archive-keyring.gpg --export B38DB8D777BB9026
    root@debian: ~$ echo "deb [signed-by=/usr/share/keyrings/ebulk-ebulk-ppa-archive-keyring.gpg] http://ppa.launchpad.net/ebulk/ebulk-ppa/ubuntu xenial main" > /etc/apt/sources.list.d/ebulk-ebulk-ppa.list 
    root@debian: ~$ apt-get update
    root@debian: ~$ apt-get install ebulk
    
    

    The ebulk tool is a wrapper for Embulk, an open-source bulk data loader that helps transfer data between various databases, storage systems, file formats, and cloud services. It supports any kind of input file format, parallel and distributed execution to handle big data sets, transaction control to guarantee all-or-nothing file transfer, and resuming of interrupted operations.

    Ebulk is as easy to use as git: transferring big data takes only a few commands.

    Run the commands shown above to install it.

    More information about Ebulk can be found here.

    Note that ebulk relies on Embulk, which requires Java 8 to be installed. Installing Java 8 on your platform is outside the scope of this tutorial, but old Java 8 binaries for Debian can be downloaded from http://snapshot.debian.org/package/openjdk-8/
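
    Since Embulk depends on Java 8, it can help to check which Java runtime is on the PATH before going further. The following is a minimal sketch, not part of ebulk: the helper name is_java8 is made up for illustration, and the check assumes that Java 8 runtimes report their version as 1.8.x.

```shell
# Succeeds if the given `java -version` output reports a 1.8.x (Java 8) runtime.
is_java8() {
    printf '%s\n' "$1" | grep -q 'version "1\.8\.'
}

# `java -version` writes to stderr, hence the 2>&1 redirection.
if command -v java >/dev/null 2>&1; then
    if is_java8 "$(java -version 2>&1)"; then
        echo "Java 8 found"
    else
        echo "Java is installed, but it does not report version 1.8"
    fi
else
    echo "java not found in PATH"
fi
```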

    Configure ebulk client

    
    ivan@debian: ~$ ebulk set-data-lake-url
    
    ivan@debian: ~$ ebulk store-credentials
    
    

    Before this step, you need to know your Wendelin instance's URL, username and password.

    [1] If you installed Wendelin by following the HowTo Install Wendelin Standalone tutorial, the URL will have the following format: https://<ip_v4>/erp5

    [2] Otherwise, if you followed the HowTo Install Wendelin on Webrunner tutorial, the URL will look something like this: https://softinstXXXXX.host.vifib.net/erp5/

    Similarly for the username and password:

    If you installed Wendelin using [1], you can find your username and password with the erp5-show -s command.

    If you used [2], you can find this information in the Connection Information section of the webrunner.
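
    Before entering the URL into ebulk, you may want to confirm that it is well formed and that the instance answers. The sketch below is only an illustration: wendelin_url and check_wendelin are hypothetical helper names, 192.0.2.10 is a placeholder address, and the -k option assumes the instance uses a self-signed certificate.

```shell
# Build the ERP5 base URL of a standalone instance from its IPv4 address.
wendelin_url() {
    printf 'https://%s/erp5\n' "$1"
}

# Print the HTTP status code returned by the instance.
# -k skips TLS certificate verification (assumes a self-signed certificate).
check_wendelin() {
    curl -k -s -o /dev/null -w '%{http_code}\n' "$1"
}

# Example with a placeholder address:
# check_wendelin "$(wendelin_url 192.0.2.10)"
```

    If no status code comes back, the URL is probably wrong or the instance is not reachable from your machine.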

     

    Example Data Set

     

    
    wget http://www.imageemotion.org/testImages_artphoto.zip
    
    unzip testImages_artphoto.zip
    
    
    

    If you want to upload data to Wendelin but do not have a suitable test data set, you can download one from here or with the wget command shown above.

    After downloading the test data set, unzip it.
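
    A quick way to confirm that the archive was unpacked is to count the files it produced. This is a minimal sketch: count_files is a made-up helper, and the directory name testImages_artphoto is an assumption, so use whatever directory unzip actually created.

```shell
# Count the regular files below a directory, recursively.
count_files() {
    find "$1" -type f | wc -l | tr -d ' '
}

# Hypothetical directory name; use whatever `unzip` actually created.
DATASET_DIR="testImages_artphoto"
if [ -d "$DATASET_DIR" ]; then
    echo "$DATASET_DIR contains $(count_files "$DATASET_DIR") files"
else
    echo "$DATASET_DIR not found; check the unzip output"
fi
```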

    Ebulk Push

    
    ivan@debian: ~$ ebulk init <Your_BIG_Data_Set>
    
    ivan@debian: ~$ ebulk push <Your_BIG_Data_Set>
    
    

    The first command prepares your folder by creating ebulk's metadata files inside it.

    The second command then pushes the data to Wendelin.
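
    The two commands can be chained so that the push only runs when the initialization succeeded. The wrapper below is a sketch under assumptions: ingest_dataset is a hypothetical helper, not an ebulk feature, and the checks for a missing or empty directory are an extra precaution added here.

```shell
# Run `ebulk init` and then `ebulk push` on a data set directory,
# refusing to start if the directory is missing or empty.
ingest_dataset() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "error: $dir is not a directory" >&2
        return 1
    fi
    if [ -z "$(ls -A "$dir")" ]; then
        echo "error: $dir is empty" >&2
        return 1
    fi
    # Push only if init succeeded.
    ebulk init "$dir" && ebulk push "$dir"
}

# Usage, with a hypothetical directory name:
# ingest_dataset testImages_artphoto
```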

     

    Check Data

    If you installed Wendelin by following the HowTo Install Wendelin Standalone tutorial with the "data lake" functionality selected for installation, a default data lake web interface is available at this URL:

    https://<ip_v4>/erp5/web_site_module/default_wendelin_data_lake/

    Otherwise, if you followed the HowTo Install Wendelin on Webrunner tutorial, the URL will look something like this:

    https://softinstXXXXX.host.vifib.net/erp5/web_site_module/default_wendelin_data_lake/

    Click on Data Sets in the left side panel. Your newly uploaded data set should be listed there.