Blog do projektu Open Source JavaHotel

niedziela, 29 listopada 2015

Power BI, Java and IBM InfoSphere Streams

Introduction
Power BI is part of Microsoft Azure package enabling real-time analytics and visualization. Power BI can be fed by data from multiple sources. Very interesting feature is REST API interface, thereby allowing integration with any tool or application that supports REST calls.
IBM InfoSphere Streams
Although Streams does not contain any direct support for REST interface, it can be easily implemented by com.ibm.streamsx.inet toolkit.
Authentication using com.ibm.streamsx.inet toolkit
To start working with Power BI REST API, we have to authenticate and receive authentication token used later for other REST API calls. Authentication can be done through HTTPPost operator.

use com.ibm.streamsx.inet.http::HTTPPost ;
use com.ibm.streamsx.inet.http::* ;
use com.ibm.streamsx.inet.http::HTTPGetStream ;

boolean responseOk(HTTPResponse r)
{
 return(r.errorMessage == "OK" && r.responseCode == 200) ;
}

composite BiTest
{
 type
  INPUT = tuple<rstring username, rstring password, rstring client_id,
   rstring grant_type, rstring resource> ;
 graph
  (stream<HTTPResponse> HTTPPost_1_out0) as HTTPPost_1 = HTTPPost(Custom_2_out0
   as InputStreamName0)
  {
   param
    url : "https://login.windows.net/common/oauth2/token" ;
  }

  (stream<INPUT> Custom_2_out0) as Custom_2 = Custom()
  {
   logic
    onProcess :
    {
     println("Submit") ;
     submit({ username = "szczodry@szczerbek.onmicrosoft.com", password =
      "Szczuja123", client_id = "22dfcddc-d5b4-4e8a-8358-1658fabdad0b",
      grant_type = "password", resource =
      "https://analysis.windows.net/powerbi/api" }, Custom_2_out0) ;
     println("Done") ;
    }

  }

  (stream<rstring token> Custom_3_out0) as Custom_3 = Custom(HTTPPost_1_out0 as
   S)
  {
   logic
    onTuple S :
    {
     println(S) ;
     if(! responseOk(S))
     {
      println("Error") ;
      submit(Sys.FinalMarker, Custom_3_out0) ;
     }

     rstring token = getJSONField(S.data, "access_token") ;
     submit({ token = token }, Custom_3_out0) ;
    }

  }
}
To extract authentication token from JSON response, a simple Java function is used
/* Generated by Streams Studio: November 6, 2015 12:16:07 AM GMT+01:00 */
package application;


import java.io.IOException;

import com.ibm.json.java.JSONObject;
import com.ibm.streams.function.model.Function;

/**
 * Class for implementing SPL Java native function. 
 */
public class GetJSONFieldImpl  {

    @Function(namespace="application", name="getJSONField", description="", stateful=false)
    public static String getJSONField(String JSON,String field) throws IOException {
     JSONObject obj = JSONObject.parse(JSON);
     return (String) obj.get(field);
    }
    
}
Using HTTPost operator and wrapper Java function to extract JSON data we can implement all POST Power BI REST API calls.
So far so good, but what about GET method to receive, for instance, a list of all data sets in Power BI dashboard or DELETE method to remove all rows from Power BI table?
Unfortunately, I was unable to make GET REST API call through  HTTPGetStream operator. This is because the HTTPGetStream operator is designed to provide a constant flow of input data, it is not designed for a single request-response action. Although it is possible to make use of this operator, I found it very artificial and decided to go the other way. Needless to say, HTTP DELETE method is not available at all.
Next approach, Java callouts
Prior to InfoSphere Streams application, I implemented a small Java utility class for accessing Power BI REST API calls. To avoid any complications, Java class is designed as a set of static stateless methods, without class instances or local variables. It is the client responsibility to store and pass Power BI authentication token. This way Java methods can be executed in parallel, which is essential for Streams development.
Source code for Java class is available here. The only dependency is GSON library and Apache HttpComponents. Usage examples are available here.
Power BI table schema definition
Table schema is defined as a Map<String, String> table schema. Map key is column name and value is column data type. Data types available are enumerated in PowerBI.java.

/** PowerBI column types */
 public static final String INT64_TYPE = "Int64";
 public static final String STRING_TYPE = "string";
 public static final String DOUBLE_TYPE = "Double";
 public static final String BOOL_TYPE = "bool";
 public static final String DATETIME_TYPE = "DateTime";
Using Java map is very convenient, the only disadvantage we cannot control the order of columns. It is ordered by column names alphabetically.
List of rows to be uploaded to Power BI table
A single row cell is defined by TableValue class. The table constructor determines the cell types. Example for integer value:

public TableValue(long intvalue) {
   this.intvalue = intvalue;
   vType = valueType.isint;
   stringvalue = null;
   doublevalue = 0;
   boolvalue = false;
   timeValue = null;
   timeS = null;
  }
Null value is supported as well. One row is specified by Map<String, TableValue>. The list of rows, chunk of data to be loaded, is defined by list of rows List<Map<String, TableValue>> 
List of methods
  • getAuthToken Authentication and procuring the authentication token
  • getDataSets Get list of datasets
  • getDataSetId Get data set id of a particular dataset
  • getDataSetTables Get list of tables of a particular dataset
  • createDataSet Create data set if does not exist and create a table belonging to this dataset
  • addTableRows Push data into the table
  • clearTable Remove content from the table
  • updateTableSchema Update schema of existing table. Important: this REST API call seems not working
  • checkTableDataSet Check if dataset and table exist, creates if necessary
Remarks
  • Only one table per dataset is created.
  • Columns in table schema is ordered alphabetically
  • Update table schema is not working
Next steps
Create Streams operator for pushing data into Power BI table.