
Application Lifecycle Management


R Binding for OMS

The R binding is an OMS module that allows R scripts to run as OMS components. In this way, R OMS components can be integrated into larger multi-language modeling solutions.

Because it is still under testing, the R binding is temporarily implemented only in a forked OMS repository, which is available to registered users.

Intro

Rserve

The R binding is based on the Rserve back-end developed by Simon Urbanek, accessed through REngine (a generic Java interface to R).

Because it is designed on a client-server concept, Rserve supports remote connections, authentication and file transfer. One Rserve instance can serve multiple clients (applications) simultaneously, keeping a separate data space for each one. Because R is not thread-safe, Rserve manages the synchronization of multiple concurrent connections. These features allow distributed computing on multiple machines and CPUs. Furthermore, Rserve communicates with applications in a binary format, which gains speed while minimizing the amount of transferred data.

Requirements

The R binding is part of the OMS Docker image. To process R OMS-compliant components, you need to install Docker and Git following this procedure.

R OMS compliant components

How to modify an R script to make it an OMS-compliant component

A standard R script needs to be slightly modified before it can be parsed by the R binding module.

First, adjust the R script following these simple rules:

  1. Identify the inputs and outputs of the R component;
  2. Declare the inputs and outputs at the very start of the R script;
  3. Assign the result to each output variable with the double-arrow assignment operator <<-;
  4. If the script IS NOT split into functions, wrap it into a main function, leaving the input/output declarations and the package loading outside of it;
  5. If the script IS split into functions, identify the main function that has to be executed;
  6. Comment out every print() or show() command, because Rserve does not handle them at the moment;
  7. Functions cannot be declared inside the call of another function. For example, a <- compute(a, function(x) { x*3 }) does not work: declare the function first, then pass it to the called function (see Listing 1).
library(raster)

# @Execute
process <- function() {

    # ...R script...

    # Declare the function first...
    simpleComputation <- function(x) {
        x * 3
    }

    # ...then pass it by name to the called function
    a <- compute(a, simpleComputation)

    # ...R script...

}

Listing 1: Example about how to declare and call a nested function
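Rules 6 and 7 are easy to miss in a long script. The Java sketch below is a hypothetical pre-check: RScriptLint is an illustrative name and is not part of the R binding or of OMS. It flags uncommented print()/show() calls and functions written directly inside another call. It works line by line with regular expressions, so it is deliberately naive: it ignores string literals and calls that span several lines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical line-based pre-check for rules 6 and 7 (not part of the R binding). */
final class RScriptLint {

    // print( or show( as a whole word, so e.g. "myprint(" is not flagged
    private static final Pattern PRINT_OR_SHOW = Pattern.compile("\\b(print|show)\\s*\\(");
    // "function(" right after "(" or ",", i.e. declared inside another call's arguments
    private static final Pattern INLINE_FUNCTION = Pattern.compile("[(,]\\s*function\\s*\\(");

    private RScriptLint() {}

    /** Returns one message per problem found; line numbers are 1-based. */
    static List<String> check(String script) {
        List<String> problems = new ArrayList<>();
        String[] lines = script.split("\\R", -1);
        for (int i = 0; i < lines.length; i++) {
            String code = stripComment(lines[i]);
            if (PRINT_OR_SHOW.matcher(code).find()) {
                problems.add("line " + (i + 1) + ": print()/show() is not handled by Rserve, comment it out");
            }
            if (INLINE_FUNCTION.matcher(code).find()) {
                problems.add("line " + (i + 1) + ": declare the function first, then pass it by name");
            }
        }
        return problems;
    }

    // Drops everything from the first '#' on (naive: a '#' inside a string is also treated as a comment).
    private static String stripComment(String line) {
        int hash = line.indexOf('#');
        return hash >= 0 ? line.substring(0, hash) : line;
    }
}
```

Applied to the pattern of Listing 1, nothing is reported. The inline form a <- compute(a, function(x) { x*3 }) is flagged.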

Second, add OMS annotations to the modified script:

  1. Because R variables have no declared data type, the otherwise optional field of the @In and @Out annotations is REQUIRED. This field specifies which Java data type the variable corresponds to (e.g. # @In("GridCoverage2D"), see the examples below);
  2. The @Execute annotation must be placed above the signature of the main function;
  3. No annotation is required for the other inner functions, nor for the scripts called from the main one.
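The annotation rules can be illustrated by how a build step might read them. The hypothetical Java sketch below does this; RComponentHeader is an illustrative name, not the binding's actual parser. It pairs each # @In("...") or # @Out("...") comment with the variable declared on the next non-blank line. It then takes the function defined right after # @Execute as the main function. It assumes one annotation per line, as in the examples on this page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical reader for the comment annotations of an R OMS component. */
final class RComponentHeader {

    /** One annotated variable, e.g. direction "In", javaType "GridCoverage2D", name "inMap1". */
    static final class Field {
        final String direction;
        final String javaType;
        final String name;

        Field(String direction, String javaType, String name) {
            this.direction = direction;
            this.javaType = javaType;
            this.name = name;
        }
    }

    // # @In("GridCoverage2D")  or  # @Out("List")
    private static final Pattern IN_OUT = Pattern.compile("^#\\s*@(In|Out)\\(\"([^\"]+)\"\\)\\s*$");
    private static final Pattern EXECUTE = Pattern.compile("^#\\s*@Execute\\s*$");
    // Variable declaration, with or without a trailing ';'
    private static final Pattern VARIABLE = Pattern.compile("^([A-Za-z.][A-Za-z0-9._]*)\\s*;?\\s*$");
    // name <- function(
    private static final Pattern FUNCTION = Pattern.compile("^([A-Za-z.][A-Za-z0-9._]*)\\s*<-\\s*function\\s*\\(");

    final List<Field> fields = new ArrayList<>();
    String executeFunction; // null if no @Execute annotation was found

    static RComponentHeader parse(String script) {
        RComponentHeader header = new RComponentHeader();
        String[] lines = script.split("\\R");
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i].trim();
            Matcher inOut = IN_OUT.matcher(line);
            if (inOut.matches()) {
                Matcher var = VARIABLE.matcher(nextNonBlank(lines, i));
                if (!var.matches()) {
                    throw new IllegalArgumentException("annotation on line " + (i + 1) + " is not followed by a variable");
                }
                header.fields.add(new Field(inOut.group(1), inOut.group(2), var.group(1)));
            } else if (EXECUTE.matcher(line).matches()) {
                Matcher fn = FUNCTION.matcher(nextNonBlank(lines, i));
                if (!fn.find()) {
                    throw new IllegalArgumentException("@Execute on line " + (i + 1) + " is not followed by a function");
                }
                header.executeFunction = fn.group(1);
            }
        }
        return header;
    }

    // First non-blank line after index i, trimmed; "" if there is none.
    private static String nextNonBlank(String[] lines, int i) {
        for (int j = i + 1; j < lines.length; j++) {
            if (!lines[j].trim().isEmpty()) {
                return lines[j].trim();
            }
        }
        return "";
    }
}
```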

Available data types

The bi-directional data transfer between Java and R supports the following Java standard data types:

  • int
  • double
  • String
  • int[]
  • double[]
  • String[]

Moreover, it supports the transfer of some objects, mapped in the following table:

Java-side               R-side
GridCoverage2D          Raster
CoverageStack           RasterStack
List<GridCoverage2D>    list() of Raster
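The supported types can be summarized as a lookup, for example to reject an unsupported annotation type before building. This is a hypothetical sketch: TransferableTypes is an illustrative name, and the R-side names are copied verbatim from the table. "List" is the spelling used in the annotations of the examples on this page for List<GridCoverage2D>.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical lookup of the types the R binding can transfer, as listed on this page. */
final class TransferableTypes {

    // Java standard data types supported by the bidirectional transfer
    static final Set<String> STANDARD = Set.of("int", "double", "String", "int[]", "double[]", "String[]");

    // Java-side object type -> R-side counterpart, as in the table
    static final Map<String, String> OBJECTS = Map.of(
            "GridCoverage2D", "Raster",
            "CoverageStack", "RasterStack",
            "List", "list() of Raster"); // written "List" in annotations, i.e. List<GridCoverage2D>

    private TransferableTypes() {}

    static boolean isSupported(String annotationType) {
        return STANDARD.contains(annotationType) || OBJECTS.containsKey(annotationType);
    }

    /** R-side counterpart of an object type; empty for standard or unsupported types. */
    static Optional<String> rSideOf(String annotationType) {
        return Optional.ofNullable(OBJECTS.get(annotationType));
    }
}
```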

To implement the object transfer, the raster package is used on the R side, while the GridCoverage2D class from GeoTools is fully integrated on the Java side. The CoverageStack class was implemented specifically for this purpose. Like the J2R and R2J classes, which manage the bidirectional data transfer, CoverageStack is automatically generated into the src/ directory of the OMS project when both of the following conditions hold:

  • the source code of the OMS project is built with the ant all command;
  • the build system finds, in the source path, an R component involving a Raster object or a RasterStack object.

If an R component is found in the source path but has no Raster or RasterStack inputs or outputs, the CoverageStack, J2R and R2J classes are not generated. In this case, only the Java wrapper for the R component is generated.
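The generation rule above can be restated as a small decision function. This is a hypothetical sketch, not the build system's code: GenerationRule and RComponent are illustrative names. A List annotation is treated as involving Raster objects, per the table. The sketch also assumes that wrappers come from the same ant all build and that the shared classes are generated once per project.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical restatement of when the helper classes are generated. */
final class GenerationRule {

    // Annotation types whose R-side counterpart involves Raster or RasterStack objects
    private static final Set<String> RASTER_TYPES = Set.of("GridCoverage2D", "CoverageStack", "List");

    private GenerationRule() {}

    /**
     * @param builtWithAntAll whether the OMS project is built with "ant all"
     * @param components      the R components found in the source path
     * @return the classes this rule says should be generated
     */
    static List<String> classesToGenerate(boolean builtWithAntAll, List<RComponent> components) {
        List<String> classes = new ArrayList<>();
        if (!builtWithAntAll) {
            return classes; // assumption: nothing is generated outside "ant all"
        }
        boolean rasterIo = false;
        for (RComponent c : components) {
            classes.add(c.name + " (Java wrapper)"); // every R component gets a Java wrapper
            for (String type : c.ioTypes) {
                rasterIo |= RASTER_TYPES.contains(type);
            }
        }
        if (rasterIo) {
            // Shared helpers, added once for the whole project (assumption)
            classes.add("CoverageStack");
            classes.add("J2R");
            classes.add("R2J");
        }
        return classes;
    }

    /** Minimal description of an R component: its name and the types of its @In/@Out annotations. */
    static final class RComponent {
        final String name;
        final List<String> ioTypes;

        RComponent(String name, List<String> ioTypes) {
            this.name = name;
            this.ioTypes = ioTypes;
        }
    }
}
```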

NA check

When double values are transferred, Java's Double.NaN is correctly converted into R's NA, and R's NA is converted back into Double.NaN.
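The NaN/NA mapping can be sketched at the bit level, because R encodes NA_real_ as a NaN whose low 32 bits are 1954 (0x7A2). To our knowledge, REngine's REXPDouble.NA uses the same bit pattern. The sketch is an illustration under that assumption, not the binding's code, and NaMapping is a hypothetical name. It keeps the R-side values as raw long bits so that the NA payload is never altered by floating-point moves.

```java
/** Hypothetical illustration of the Double.NaN <-> NA mapping for double values. */
final class NaMapping {

    /** Bit pattern of R's NA_real_: a NaN whose low word is 1954 (0x7A2). */
    static final long R_NA_BITS = 0x7FF00000000007A2L;

    private NaMapping() {}

    /** Java -> R: every NaN becomes NA; other values keep their exact bits. */
    static long[] toR(double[] values) {
        long[] bits = new long[values.length];
        for (int i = 0; i < values.length; i++) {
            bits[i] = Double.isNaN(values[i]) ? R_NA_BITS : Double.doubleToRawLongBits(values[i]);
        }
        return bits;
    }

    /** R -> Java: NA (and any other NaN payload) becomes Double.NaN. */
    static double[] fromR(long[] bits) {
        double[] values = new double[bits.length];
        for (int i = 0; i < bits.length; i++) {
            double d = Double.longBitsToDouble(bits[i]);
            values[i] = Double.isNaN(d) ? Double.NaN : d;
        }
        return values;
    }
}
```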

The Main Example

This basic example shows how to write three R OMS-compliant components and how to connect them. The example code is available to registered users. It assumes that you already have one Java component that reads rasters and another one that writes them.

Flow chart

The modeling solution chains two Java readers, three R components and two Java writers (Java components: reader1, reader2, writer1, writer2; R components: stack, transform, split):

reader1, reader2 -> stack -> transform -> split -> writer1, writer2

stack.R

The first R component takes two input maps and returns a RasterStack.

library(raster)

# @In("GridCoverage2D")
inMap1;

# @In("GridCoverage2D")
inMap2;

# @Out("CoverageStack")
outStack;

# @Execute
process <- function() {

    outStack <<- stack(x=c(inMap1,inMap2))

}

Listing 2: First component stack.R

transform.R

The second R component replaces the NA values with 0 in each map of the RasterStack and returns a list of Rasters.

library(raster)

# @In("CoverageStack")
rasterStack

# @Out("List")
rasterList

# @Execute
process <- function() {

    rasterList <<- transform(rasterStack)

}

# Returns a list with one Raster per layer, with NA cells set to 0
transform <- function(rasterStack) {
    tmpList <- list()
    for (layer in seq_len(nlayers(rasterStack))) {
        map <- rasterStack[[layer]]
        map[is.na(map)] <- 0
        tmpList[[layer]] <- map
    }
    return(tmpList)
}

Listing 3: Second component transform.R

split.R

The third component splits the input list into two Rasters.

library(raster)

# @In("List")
rasterList

# @Out("GridCoverage2D")
raster1

# @Out("GridCoverage2D")
raster2

# @Execute
process <- function() {

    raster1 <<- rasterList[[1]]
    raster2 <<- rasterList[[2]]

}

Listing 4: Third component split.R

The simulation file (.sim) exercises the components above:

import static oms3.SimBuilder.instance as OMS3

def home = oms_prj

/*
 * Seven connected components: two Java readers feed the R components
 * stack -> transform -> split, whose two outputs are saved by two Java writers.
 */
OMS3.sim {

    resource "${home}/lib"
    build()

    model {
        components {
            "reader1"     "edu.colostate.engr.alm.RasterReader"
            "reader2"     "edu.colostate.engr.alm.RasterReader"
            "stack"         "edu.colostate.engr.alm.stack"
            "transform"  "edu.colostate.engr.alm.transform"
            "split"          "edu.colostate.engr.alm.split"
            "writer1"     "edu.colostate.engr.alm.RasterWriter"
            "writer2"     "edu.colostate.engr.alm.RasterWriter"
        }
        connect {
            // "component.outField" "component.inField"
            "reader1.gc" "stack.inMap1"
            "reader2.gc" "stack.inMap2"

            "stack.outStack" "transform.rasterStack"

            "transform.rasterList" "split.rasterList"

            "split.raster1" "writer1.gc"
            "split.raster2" "writer2.gc"
        }
        parameter {
            // feed the beginning of the pipeline!
            "reader1.file" "path/to/inFile1.tif"
            "reader2.file" "path/to/inFile2.tif"

            // feed the end of the pipeline!
            "writer1.file" "path/to/outFile1.tif"
            "writer2.file" "path/to/outFile2.tif"

        }
    }
}

Listing 5: The simulation file.
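The connect block fully determines which component depends on which. The hypothetical Java sketch below derives one valid execution order from the "component.field" pairs using Kahn's topological sort; ConnectOrder is an illustrative name. OMS performs its own scheduling and may run independent components, such as the two readers, in parallel. The sketch only illustrates the dependencies and rejects a wiring that contains a cycle.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical: derives an execution order from "component.field" connect pairs (Kahn's algorithm). */
final class ConnectOrder {

    private ConnectOrder() {}

    /** Each pair is {from, to}, e.g. {"reader1.gc", "stack.inMap1"}. */
    static List<String> order(List<String[]> connections) {
        Map<String, Set<String>> successors = new LinkedHashMap<>();
        Map<String, Integer> inDegree = new LinkedHashMap<>();
        for (String[] c : connections) {
            String from = component(c[0]);
            String to = component(c[1]);
            successors.computeIfAbsent(from, k -> new LinkedHashSet<>());
            successors.computeIfAbsent(to, k -> new LinkedHashSet<>());
            inDegree.putIfAbsent(from, 0);
            inDegree.putIfAbsent(to, 0);
            if (successors.get(from).add(to)) { // count each component-to-component edge once
                inDegree.merge(to, 1, Integer::sum);
            }
        }
        // Start with the components that have no incoming connection
        Deque<String> ready = new ArrayDeque<>();
        inDegree.forEach((name, degree) -> { if (degree == 0) ready.add(name); });
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String name = ready.poll();
            order.add(name);
            for (String next : successors.get(name)) {
                if (inDegree.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next);
                }
            }
        }
        if (order.size() != inDegree.size()) {
            throw new IllegalStateException("connections contain a cycle");
        }
        return order;
    }

    // "stack.inMap1" -> "stack"
    private static String component(String endpoint) {
        return endpoint.substring(0, endpoint.indexOf('.'));
    }
}
```

Applied to the six connections of Listing 5, the sketch returns reader1, reader2, stack, transform, split, writer1, writer2.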

Other examples

Other examples are available to registered users in two further repositories: one gathers examples and documentation about the usage of the standard data types, the other about the usage of Raster and RasterStack objects.

@TODO

  • Extensive testing on each operating system (memory issues remain on Windows 7);
  • Rserve cannot handle print() and show() commands;
  • R2J & J2R: check robustness (e.g. the CRS and projection of maps transferred from/to Lists/Stacks must be the same for each map);
  • The input and output of the RUG model must read and write every raster format. Right now, GeoTIFF is the only one available;
  • Report the line of the R component that raises an error;
  • Redesign R2J: a better implementation of the R2J class is required for future developments;
  • Is there a better way to manage memory on the R side for parallel loops?;
  • @In/@Out: the same variable annotated as both @In and @Out with different data types does not work because of the Java wrapper;
  • Test multithreaded requests to a single Rserve instance: this seems to work, but more tests are required;
  • The Stop simulation button does not kill the Rserve service and the R threads. Furthermore, if an R component fails, an R component running in parallel does not stop.