Importing data into HBase 1.2.7 with Kettle 7.0 (test)

Source: Internet  Editor: 程序博客网  Time: 2024/05/17 06:06

Get started



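    Each Kettle release is paired with a Hadoop version; Kettle 7.0 targets Hadoop 2.4 by default. The matching Hadoop shims can be found under data-integration\plugins\pentaho-big-data-plugin\hadoop-configurations. Although the shim is for 2.4, I was also able to connect to a Hadoop 2.7 cluster with it.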
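
To see which shims an installation actually ships, the subfolders of hadoop-configurations can be listed. The following is a minimal sketch in plain Java, not part of Kettle itself; the class name `ShimLister` and the relative default path are assumptions made for illustration, so pass the real path of your own installation as the first argument.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: lists the shim folders under hadoop-configurations.
class ShimLister {

    // Returns the names of all subdirectories of configsDir, sorted alphabetically.
    static List<String> listShims(Path configsDir) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(configsDir)) {
            for (Path entry : entries) {
                // Each shim lives in its own folder; plain files are skipped.
                if (Files.isDirectory(entry)) {
                    names.add(entry.getFileName().toString());
                }
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default, relative to the folder that contains data-integration.
        String defaultDir =
            "data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations";
        Path dir = Paths.get(args.length > 0 ? args[0] : defaultDir);
        for (String name : listShims(dir)) {
            System.out.println(name);
        }
    }
}
```

The folder names printed this way are the shim identifiers available to the big data plugin.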





3. Using a Java script in Kettle


    import java.util.UUID;

    public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException {
      if (first) {
        first = false;

        /* TODO: Your code here. (Using info fields)

        FieldHelper infoField = get(Fields.Info, "info_field_name");

        RowSet infoStream = findInfoRowSet("info_stream_tag");

        Object[] infoRow = null;

        int infoRowCount = 0;

        // Read all rows from info step before calling getRow() method, which returns first row from any
        // input rowset. As rowMeta for info and input steps varies getRow() can lead to errors.
        while((infoRow = getRowFrom(infoStream)) != null){

          // do something with info data
          infoRowCount++;
        }
        */
      }

      Object[] r = getRow();

      if (r == null) {
        setOutputDone();
        return false;
      }

      // It is always safest to call createOutputRow() to ensure that your output row's Object[] is large
      // enough to handle any new fields you are creating in this step.
      r = createOutputRow(r, data.outputRowMeta.size());

      /* TODO: Your code here. (See Sample)

      // Get the value from an input field
      String foobar = get(Fields.In, "a_fieldname").getString(r);

      foobar += "bar";

      // Set a value in a new output field
      get(Fields.Out, "output_fieldname").setValue(r, foobar);

      */

      // Generate a UUID and cut out its four dashes, leaving 32 hex characters.
      String s = UUID.randomUUID().toString();
      s = s.substring(0, 8) + s.substring(9, 13) + s.substring(14, 18) + s.substring(19, 23) + s.substring(24);

      // Log the UUID together with the Region and Province fields of the current row.
      String foobar = get(Fields.In, "Region").getString(r);
      logBasic(s + ":log>>>" + foobar + get(Fields.In, "Province").getString(r));

      // Send the row on to the next step.
      putRow(data.outputRowMeta, r);

      return true;
    }

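
The substring arithmetic in the script relies on the dashes of a UUID string sitting at indexes 8, 13, 18 and 23. The standalone sketch below, which runs outside Kettle, reproduces that slicing and compares it with the shorter `String.replace("-", "")`; the class name `CompactUuid` and its method names are made up for this illustration.

```java
import java.util.UUID;

// Hypothetical demo class: two ways to remove the dashes from a UUID string.
class CompactUuid {

    // Same slicing as in the step code: skip the dashes at indexes 8, 13, 18 and 23.
    static String bySubstring(String s) {
        return s.substring(0, 8) + s.substring(9, 13) + s.substring(14, 18)
            + s.substring(19, 23) + s.substring(24);
    }

    // Shorter alternative: drop every dash regardless of position.
    static String byReplace(String s) {
        return s.replace("-", "");
    }

    public static void main(String[] args) {
        String uuid = UUID.randomUUID().toString();
        System.out.println("original:  " + uuid);
        System.out.println("substring: " + bySubstring(uuid));
        System.out.println("replace:   " + byReplace(uuid));
        System.out.println("identical: " + bySubstring(uuid).equals(byReplace(uuid)));
    }
}
```

For the canonical 36-character form returned by `UUID.toString()`, both methods give the same 32-character result, so the `replace` variant could be substituted in the step if preferred.
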
Note: to use other JARs, first copy them into Kettle's lib directory, {kettle_home}\data-integration\lib, and then import their classes in the code.
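
One way to confirm that a copied JAR is actually visible is to try loading one of its classes by name. This is only a sketch under assumptions: the helper `ClasspathCheck` is hypothetical, `com.example.SomeLibraryClass` is a placeholder for a class from your own JAR, and it is assumed that Kettle has to be restarted before it picks up new files in lib.

```java
// Hypothetical helper: reports whether a class can be loaded from the current classpath.
class ClasspathCheck {

    // Returns true if the named class is found, false otherwise.
    static boolean isClassAvailable(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // The class is missing or could not be linked.
            return false;
        }
    }

    public static void main(String[] args) {
        // Placeholder name; replace it with a class from the JAR you copied.
        String name = args.length > 0 ? args[0] : "com.example.SomeLibraryClass";
        System.out.println(name + " available: " + isClassAvailable(name));
    }
}
```

The same `Class.forName` call can be placed temporarily inside the step code and its result written with `logBasic` to run the check from within Kettle.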

