Blog do projektu Open Source JavaHotel

czwartek, 29 grudnia 2016

IBM InfoSphere Streams, BigSql and HBase

Some time ago I created Java methods to load data into HBase using a format understandable by BigSql. Now it is high time to move ahead and create an IBM InfoSphere operator making use of this solution.
The Streams solution is available here. The short description is added here.
JConvert operator
The JConvert operator does not load data into HBase itself; it is meant to precede the HBASEPut operator.
JConvert accepts one or more input streams, and every input stream should have a corresponding output stream. It simply encodes every attribute in the input stream into a blob (binary) attribute in the output stream. The binary value is later loaded into the HBase table by the HBASEPut operator.
A very important factor is coordinating the JConvert input stream with the target HBase/BigSql table. Neither JConvert nor HBASEPut can verify that; if attribute and column types do not match, BigSql will not read the table properly. The conversion rules are explained here.
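The per-attribute conversion idea can be sketched in plain Java (a hypothetical helper illustrating the principle, not the operator's actual code; the authoritative conversion rules are the ones linked above):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: every typed input attribute is turned into a byte
// array before it is handed to HBASEPut. The concrete byte layouts expected
// by BigSql differ per column type; these two mappings are illustrative only.
public class AttributeEncoder {

    // A string attribute becomes its UTF-8 bytes.
    public static byte[] encode(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    // A numeric attribute becomes a fixed-width big-endian byte sequence.
    public static byte[] encode(int value) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(value).array();
    }
}
```

The output stream then carries only blob attributes, so HBASEPut stores exactly the bytes the encoder produced.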
The TestHBaseN application is used for testing; it also contains a lot of usage examples.
A simple BigSql/HBase loading scenario:

On the left there is the producer, then JConvert translates all input attributes into binary format, and the HBASEPut operator loads the binaries into the HBase table.
More details about TestHBaseN.

poniedziałek, 26 grudnia 2016

Oracle -> DB2 migration, useful Java program

Some time ago I created a set of useful awk scripts to ease Oracle to DB2 migration. But facing the same task again, I decided to rewrite it as a Java program. The main disadvantage of the previous solution is that awk analyzes the input file line by line. So, for example, if a CREATE OR REPLACE clause is split across two or more lines, the problem becomes very complicated.
Java program
The Java program is available as a ready-to-use Java package or as an Eclipse project to be cloned and updated.
The Java solution and a brief description are available as an open source project here. The "two lines problem" is resolved by means of a simple Tokenizer. It decomposes the input file into separate entities across line boundaries: words, special characters like (, ), ; etc. Then the Extractor can pick up words one after another and recognize the beginning and end of particular SQL objects. The Tokenizer is also reused during source file fixing.
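A minimal tokenizer of this kind can be sketched as follows (a simplified illustration, not the project's actual Tokenizer class):

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of a tokenizer that splits SQL source into words and
// special characters, ignoring line boundaries entirely.
public class SimpleTokenizer {

    public static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (char c : source.toCharArray()) {
            if (Character.isLetterOrDigit(c) || c == '_') {
                word.append(c);                    // part of an identifier or keyword
            } else {
                if (word.length() > 0) {           // flush the pending word
                    tokens.add(word.toString());
                    word.setLength(0);
                }
                if (!Character.isWhitespace(c)) {
                    tokens.add(String.valueOf(c)); // special character: ( ) ; , etc.
                }
            }
        }
        if (word.length() > 0) tokens.add(word.toString());
        return tokens;
    }
}
```

With such a tokenizer, a CREATE OR REPLACE clause split across lines still arrives as three consecutive tokens, so the extractor never has to care where the line breaks were.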
Oracle Compatibility Mode in DB2 allows executing PL/SQL almost out of the box. Nevertheless, some common adjustments should be applied. A typical example is VARCHAR2(32767) or VARCHAR2(MAX). In DB2, MAX is not supported and the upper boundary for VARCHAR2 is 32672. If there are only several occurrences, it is no problem to fix them manually, but it could be a challenge in the case of hundreds or thousands of them.
So after splitting the input Oracle source file into separate SQL objects, a set of adjustments is executed over them. Every adjustment is a single Java class implementing the IFix interface. Current adjustments are stored in the org.migration.fix.impl package. Every class should be registered in MainExtract.
This way, assuming that the project is cloned as an Eclipse project, the adjustments can be easily extended.
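The VARCHAR2 adjustment mentioned above could look roughly like this (a sketch in the spirit of an IFix implementation; the project's actual interface signature may differ):

```java
// Hypothetical sketch of a single adjustment class: replace VARCHAR2(32767)
// and VARCHAR2(MAX), which DB2 rejects, with the DB2 maximum VARCHAR2(32672).
public class FixVarchar2Bound {

    public static String fix(String sql) {
        // Case-insensitive match, tolerant of whitespace inside the parentheses.
        return sql.replaceAll("(?i)VARCHAR2\\s*\\(\\s*(?:32767|MAX)\\s*\\)",
                              "VARCHAR2(32672)");
    }
}
```

Because every adjustment is a separate class, a new problem found in the Oracle source translates into one new class plus its JUnit test.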
DB2 comparison
A part of the solution not covered by the previous awk implementation is the DB2 comparison tool. It compares the list of Oracle objects against the list of objects migrated and deployed into the DB2 database. In the case of hundreds or thousands of objects, it is easy to miss something.
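At its core such a comparison is a set difference over object names; a minimal sketch (assuming the two object lists are already extracted as plain name collections):

```java
import java.util.Set;
import java.util.TreeSet;

// Sketch of the comparison idea: report Oracle objects that never made it
// into the DB2 database.
public class MigrationDiff {

    public static Set<String> missingInDb2(Set<String> oracleObjects,
                                           Set<String> db2Objects) {
        Set<String> missing = new TreeSet<>(oracleObjects); // sorted report
        missing.removeAll(db2Objects);
        return missing;
    }
}
```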
I used this solution during a real migration and found it very useful. It was worth spending several days to implement it. The most important part is the feature allowing easy extension of the existing set of source file adjustments. I was able to quickly implement the next adjustment (together with a JUnit test) immediately after finding the next problem in the source Oracle code.

sobota, 26 listopada 2016

Moving to Heroku

Finally, I decided to abandon Google App Engine and move on. The main reasons for giving up this platform are:
  • Lack of support for Java 8; only Java 7 is available.
  • The "whitelist"/blacklist. It sets a boundary and the software cannot grow. For instance: I'd like to add Scala support, but Scala cannot be executed inside Google App Engine because it uses forbidden packages. And there is no way to bypass this restriction.
I came to the conclusion that being bound to this platform reduces my capability to grow. The glory of supporting one more environment does not outweigh the losses.
In searching for alternatives, my finger stopped at Heroku.
I spent some time refactoring the application and removing redundant stuff, and now the application is ready. I'm using a free Heroku account, so it can take some time until the application shows up.
Eclipse projects
I added instructions on how to set up the Eclipse projects and run the sample application locally.
Tomcat deployment
I created a build.xml file, so by running the ant command a .war file ready for Tomcat is built. More information: the data source should be defined beforehand. The application was tested with Derby and PostgreSQL databases.
Heroku deployment
The same .war can be deployed as a Heroku Tomcat application and run against a Heroku PostgreSQL database. More information:
Heroku, Tomcat
Although Heroku supports a Tomcat container, I discovered that I was unable to read an environment variable from the META-INF/context.xml file. It returned null all the time. So I was forced to split the Heroku code from the standard Tomcat code to retrieve some configuration settings.
The Heroku version is recognized by discovering the webapp-runner.jar file on the classpath.
  if (isJarOnPath(classPath, "webapp-runner.jar"))
   return ContainerType.HEROKU;
Then, by virtue of Guice, it is easy to inject the proper configuration properties, particularly those related to the data source definition.
  IJythonUIServerProperties getServerProperties(IGetResourceJNDI getJNDI, IReadResourceFactory iFactory,
      IReadMultiResourceFactory mFactory, IGetEnvVariable iGet) {
    if (ContainerInfo.getContainerType() == ContainerType.HEROKU)
      return new HerokuServerProperties(iFactory, mFactory);
    return new ServerPropertiesEnv(getJNDI, iFactory, mFactory, iGet);
  }

I also moved to GWT 2.8 and the latest gwt-polymer. But finally, I can say: "Welcome to the Java 8 world".

poniedziałek, 31 października 2016

Ubuntu 14.04, cups

I spent several hours trying to remove an unnecessary printer on Ubuntu 14.04. Unfortunately, I was unable to pass the authentication required for cups administration tasks. It does not matter whether the web UI or Ubuntu System Preferences is used; both use the same authentication mechanism. I spent a lot of time adding and removing lpadmin users and restarting the cups service.
The log file /var/log/cups/error_log constantly reported the same message:

E [31/Oct/2016:13:35:52 +0100] [Client 8] pam_authenticate() returned 3 (Error in service module)
E [31/Oct/2016:13:42:09 +0100] [Client 8] pam_authenticate() returned 3 (Error in service module)
E [31/Oct/2016:13:43:18 +0100] [Client 17] pam_authenticate() returned 3 (Error in service module)
E [31/Oct/2016:13:54:13 +0100] [Client 17] pam_authenticate() returned 3 (Error in service module)
E [31/Oct/2016:14:01:28 +0100] [Client 17] pam_authenticate() returned 3 (Error in service module)
E [31/Oct/2016:14:04:33 +0100] [Client 17] pam_authenticate() returned 3 (Error in service module)
E [31/Oct/2016:14:07:31 +0100] [Client 19] pam_authenticate() returned 3 (Error in service module)

More talkative were /var/log/syslog and /var/log/auth.log:

Oct 31 10:50:18 sb-ThinkPad-W540 kernel: [ 3104.189523] type=1400 audit(1477907418.556:75): apparmor="DENIED" operation="signal" profile="/usr/sbin/cupsd" pid=7913 comm="cupsd" requested_mask="send" denied_mask="send" signal=term peer="unconfined"
The solution is described here. The AppArmor protection service is badly configured and blocks any attempt by cupsd to authenticate the lpadmin user.
The workaround is to disable AppArmor temporarily for cupsd:

sudo aa-complain cupsd
then execute all administrative tasks and, when done, harden security again:

sudo aa-enforce cupsd
I'm not very happy with that, but I finally removed the disliked printer from my desktop.

New features in Jython MVP framework, more Polymer

I implemented the next bunch of Polymer web components from the GWT Polymer showcase. Newly implemented components: Header Panel, Icon Button, Item, Input, Material, Menu, Progress, Radio Button, Radio Group, Ripple, Spinner, Tabs.
The demo is launched from here, the source code is here.
Example screenshots

I also implemented the ui:style element with CSS class name obfuscation. This way we can have a "private" CSS class name without the risk that a "local" class name will collide with other CSS class definitions. Also, the framework does not allow using misspelled class names.
An example of a UIBinder file with a ui:style element.
Pending problems
  • Popup dialogs do not work in the Firefox browser
  • Not all web components render well, for instance: Tabs
  • The implementation of the Progress component is still waiting for an action for the "Start" button
Next steps
Continue with the Paper components. Having the ui:style element implemented makes copying and pasting UIBinder files from the GWT Polymer showcase easier.

czwartek, 29 września 2016

IBM InfoSphere Streams, Big SQL and HBase

Big SQL can run over HBase. IBM InfoSphere Streams does not have any means to load data directly into Big SQL. Although it is possible to use the general-purpose Database Toolkit (DB2), running an INSERT statement for bulk data loading is very inefficient.
Another idea is to define a Big SQL table as an HBase table (CREATE HBASE TABLE) and load data directly into the underlying HBase table. But HBase, unlike Hive, is a schemaless database; everything is stored as a sequence of bytes. So the content of the HBase table should be stored using a format supported by Big SQL.
IBM InfoSphere Streams provides an HBase toolkit, but the HBASEPut operator cannot be used directly. For instance, a string value stored by the HBASEPut operator is not a valid CHAR column in terms of Big SQL; it should be pre-formatted into a binary format beforehand.
The solution is to develop an additional Streams operator that encodes all SPL types into a binary (blob) format, with the resulting binary stream consumed by the HBASEPut operator.
Instead of (example):

  stream<rstring key, tuple<rstring title, rstring author_fname,
     rstring author_lname, rstring year, rstring rating> bookData> toHBASE = Functor(bookStream)
  {
    output
      toHBASE : key = title + ":" + year, bookData = bookStream ;
  }

  // Now put it into HBASE.  We don't specify a columnQualifier and the attribute
  // given by valueAttrName is a tuple, so it treats the attribute names in that
  // tuple as columnQualifiers, and the attribute values as the values.
  () as putsink = HBASEPut(toHBASE)
  {
    param
      rowAttrName : "key" ;
      tableName : "streamsSample_books" ;
      staticColumnFamily : "all" ;
      valueAttrName : "bookData" ;
  }

with the encoding operator in the middle, the flow becomes:

  stream<rstring key, tuple<rstring title, rstring author_fname,
     rstring author_lname, rstring year, rstring rating> bookData> toEncode = Functor(bookStream)
  {
    output
      toEncode : key = title + ":" + year, bookData = bookStream ;
  }

  // Encode every attribute to a blob in the Big SQL binary format.
  stream<blob key, tuple<blob title, blob author_fname, blob author_lname,
     blob year, blob rating> bookData> toHBASE = HBaseEncode(toEncode)
  {
  }

  // Now put it into HBASE.  We don't specify a columnQualifier and the attribute
  // given by valueAttrName is a tuple, so it treats the attribute names in that
  // tuple as columnQualifiers, and the attribute values as the values.
  () as putsink = HBASEPut(toHBASE)
  {
    param
      rowAttrName : "key" ;
      tableName : "streamsSample_books" ;
      staticColumnFamily : "all" ;
      valueAttrName : "bookData" ;
  }
The HBaseEncode operator is to be implemented as a Java operator. Before developing it, I decided to create a simple Java project encoding a subset of Java types into the binary format accepted by Big SQL.
The Big SQL HBase data format is described here, so it looked like a simple coding task. But unfortunately, the description is not accurate, so I was bogged down by unexpected problems. For instance: a NULL value is marked by the 0x01 value, not 0x00. Also, the source code of the Hive SerDe is not very useful, because Big SQL encoding diverges from it in many points.
So I ended up loading data through the Big SQL INSERT command and analyzing the binary content of the underlying HBase table, trying to guess the proper binary format.
Java project
The Java project is available here. It consists of several subprojects.
HBaseBigSql (Javadoc) will be used directly by the Streams operator. It does not have any dependencies. The Big SQL types supported are specified by the enum BIGSQLTYPE. The most important class is the one containing the result of the painstaking process of revealing the HBase binary format for all Big SQL types.
The HBasePutGet (Javadoc) subproject contains several supporting classes to put data into an HBase table. It has an HBase client dependency.
TestH contains the JUnit tests. The test case is very simple: CREATE HBASE TABLE, load data into the underlying HBase table, get the data back through a Big SQL SELECT statement, and compare the results, checking whether the data stored in HBase equals the data retrieved by Big SQL.
Possible extensions
  • The solution only writes data to HBase in the Big SQL format. The opposite task would be to code methods to read data from an HBase table.
  • Compound indexes and columns are not supported.
Next step
Develop IBM InfoSphere Streams HBaseEncode operator.

poniedziałek, 29 sierpnia 2016

Jednolity Plik Kontrolny, Standard Audit File

Jednolity Plik Kontrolny (JPK), the Standard Audit File, is a new requirement of the Polish Ministry of Finance to improve efficiency in tax control and to narrow the space for tax evasion.
The protocol for sending financial data to the tax office via a REST gateway is described here (in Polish).
The taxpayer should send tax reports using the specified format and procedure. Unfortunately, although the protocol comprises well-known encryption and signing methods, the Ministry of Finance does not share any code sample showing how to accomplish the task.
In order to alleviate the burden, because creating everything from scratch could be a painstaking process, I decided to develop a simple open source project in Java covering the whole area of encoding and sending data via the REST protocol. The source code is available here.
The project can be utilized in two ways: as an API for Java solutions (Javadoc will be available soon), or as a command line tool for non-Java solutions.
The project requires preparing a property file containing a set of parameters. A sample property file is available here. The following parameters should be defined:
  • conf Directory containing a set of artifacts necessary to run the solution. Sample directory is available here.
    • JPKMFTest-klucz publiczny do szyfrowania.pem : the public key used to encrypt the symmetric key
    • initupload-enveloped-pattern.xml : a pattern for creating the InitUpload XML file used to initiate the data transmission. The file contains a number of position markers to be replaced by current values. I found this solution more applicable than creating the XML file on the fly.
    • log.conf : JUL logging configuration. The FileHandler is added automatically; only the ConsoleLogger should be defined here.
    • : Certificate used to authorize access to the public gateway.
  • workdir : a working directory keeping temporary data between different phases of the data transmission. It is also the place where the log file is kept. This directory is cleaned at the beginning of the first phase, so it is the responsibility of the user to back up this directory.
  • publickey : the name of the file with the public key in the conf directory.
  • cert : the name of the certificate file (X.509) in the conf directory.
  • url : the URL to send the InitUpload.xml file, transfer initialization.
  • finish : the URL to signal the transmission completion.
  • get : the URL to receive the UPO (Urzędowe Potwierdzenie Odbioru), the Official Receipt Confirmation.
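Reading such a parameter file needs nothing beyond java.util.Properties; a minimal sketch using the key names listed above (the helper class itself is hypothetical, not part of the project):

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Sketch: load the solution's parameter file and read the keys described above.
public class JPKConfig {

    // Load parameters from a file on disk.
    public static Properties load(String path) throws IOException {
        Properties p = new Properties();
        try (FileInputStream in = new FileInputStream(path)) {
            p.load(in);
        }
        return p;
    }

    // Load parameters from an in-memory string (handy for tests).
    public static Properties parse(String text) throws IOException {
        Properties p = new Properties();
        p.load(new StringReader(text));
        return p;
    }
}
```

A caller would then read `workdir`, `url`, `finish`, `get`, and the remaining keys with `getProperty`.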
Solution structure
The solution is developed as several static API methods and corresponding main methods for the command line application.
Sending financial data to the gateway comprises several steps, described in the manual.
1. Preparing initial data
During this step the InitUpload.xml file is created, and the input financial data are zipped and encrypted. Preparing InitUpload.xml requires several steps, like generating the symmetric key and computing the MD5 and SHA-256 hashes of the symmetric key and the input data. The procedure is described in comments embedded in the source code.
API method : JPK.Prepare method.
Command line: Transform.main method.
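The cryptographic part of this step uses only standard JCA primitives; a rough sketch (the key size and cipher modes here are illustrative assumptions — the authoritative values are in the ministry's specification and in the project's source comments):

```java
import java.security.Key;
import java.security.MessageDigest;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

// Sketch of the "prepare" phase: generate a symmetric key, encrypt the
// zipped input data with it, hash the inputs, and encrypt the symmetric
// key with the gateway's RSA public key. Sizes and modes are assumptions.
public class PrepareSketch {

    public static SecretKey newSymmetricKey() throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(256);                                          // assumed key size
        return kg.generateKey();
    }

    public static byte[] encryptData(SecretKey key, byte[] zippedData) throws Exception {
        Cipher c = Cipher.getInstance("AES/CBC/PKCS5Padding"); // assumed mode
        c.init(Cipher.ENCRYPT_MODE, key);                      // IV via c.getIV()
        return c.doFinal(zippedData);
    }

    public static byte[] encryptKey(Key rsaPublicKey, SecretKey symKey) throws Exception {
        Cipher c = Cipher.getInstance("RSA/ECB/PKCS1Padding"); // assumed padding
        c.init(Cipher.ENCRYPT_MODE, rsaPublicKey);
        return c.doFinal(symKey.getEncoded());
    }

    public static byte[] sha256(byte[] data) throws Exception {
        return MessageDigest.getInstance("SHA-256").digest(data);
    }
}
```

The computed hashes and the encrypted key are then substituted for the position markers in the InitUpload pattern file.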
2. Signing Initupload.xml
This step should be done manually. There are several public certification authorities in Poland, and each one provides its own application for signing documents.
3. Uploading data to gateway
This step uses InitUploadSigned, PutBlob and FinishUpload REST API methods.
API method: UPLOAD.upload method.
Command line: Upload.main method.
4. Receiving UPO, Official Receipt Confirmation
This step uses the Status REST API method. The UPO becomes available for download only after some time, so the method should be launched at intervals until the UPO is received.
API method : UPLOAD.getUpo method.
Command line: GetUPO.main method.
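The retry loop behind this step can be sketched generically (the Supplier here is a placeholder standing in for the project's Status REST call, which is the real source of the UPO):

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Generic sketch of the UPO polling loop: call the status endpoint, wait,
// and try again until the receipt is available or the attempt budget is
// exhausted. statusCall returns null while the UPO is not ready yet.
public class UpoPoller {

    public static String waitForUpo(Supplier<String> statusCall, int maxAttempts,
                                    long delaySeconds) throws InterruptedException {
        for (int i = 0; i < maxAttempts; i++) {
            String upo = statusCall.get();
            if (upo != null) return upo;         // receipt received
            TimeUnit.SECONDS.sleep(delaySeconds);
        }
        return null;                             // UPO not received in time
    }
}
```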