R Sys.setenv

Source: Internet | Published by: seo帝国管理系统 | Editor: 程序博客网 | Time: 2024/06/06 02:21

 

 

http://stackoverflow.com/questions/17583846/failed-to-remotely-execute-r-script-which-loads-library-rhdfs

Failed to remotely execute R script which loads library “rhdfs”


I'm working on a project using R-Hadoop, and got this problem.

I'm using JSch in Java to SSH into a remote Hadoop pseudo-cluster. Here is the part of the Java code that creates the connection:

    /* Create a connection instance */
    Connection conn = new Connection(hostname);

    /* Now connect */
    conn.connect();

    /* Authenticate */
    boolean isAuthenticated = conn.authenticateWithPassword(username, password);
    if (isAuthenticated == false)
        throw new IOException("Authentication failed.");

    /* Create a session */
    Session sess = conn.openSession();
    //sess.execCommand("uname -a && date && uptime && who");
    sess.execCommand("Rscript -e 'args1 <- \"Dell\"; args2 <- 1; source(\"/usr/local/R/mytest.R\")'");
    //sess.execCommand("ls");
    sess.waitForCondition(ChannelCondition.TIMEOUT, 50);

I tried several simple R scripts, and my code worked fine. But when it comes to R-Hadoop, the R script stops running. If I run Rscript -e 'args1 <- "Dell"; args2 <- 1; source("/usr/local/R/mytest.R")' directly on the remote server, everything works fine.

Here is what I got after taking Hong Ooi's suggestion. Instead of using Rscript, I used the following command:

    sess.execCommand("R CMD BATCH --no-save --no-restore '--args args1=\"Dell\" args2=1' /usr/local/R/mytest.R /usr/local/R/whathappened.txt");

In whathappened.txt, I got the following error:

    > args=(commandArgs(TRUE))
    > for(i in 1:length(args)){
    +      eval(parse(text=args[[i]]))
    + }
    > source("/usr/local/R/main.R")
    > main(args1,args2)
    Loading required package: rJava
    Error : .onLoad failed in loadNamespace() for 'rhdfs', details:
      call: fun(libname, pkgname)
      error: Environment variable HADOOP_CMD must be set before loading package rhdfs
    Error: package/namespace load failed for ‘rhdfs’
    Execution halted

Well, now the problem is much clearer. Unfortunately, I'm pretty new to Linux and have no idea how to solve this.

What error message(s) do you get with RHadoop? Are they Java or R errors? – Hong Ooi, Jul 11 '13 at 4:53
 
@HongOoi The R script runs in the background on the remote server, so the command-line interface there shows nothing and I can't tell what actually happened. Even if I add cat("blabla") to the R script, I don't get any printed output on the remote server. So I used a trick: generating txt files with names like "Inside xxx function" to see how far the script gets. It turns out that it stops every time it tries to execute library("whatever"). – Hao Huang, Jul 11 '13 at 17:02
 
You can use sink to redirect output to a file. That might help you diagnose what's going on. – Hong Ooi, Jul 11 '13 at 17:05
 
@HongOoi Thanks for your advice! Check my question update; it shows more information. But I'm very new to Linux, and I really don't know how to handle namespace-related problems. – Hao Huang, Jul 11 '13 at 18:05

2 Answers

Answer (2 votes)

Well, I solved this problem like this:

    sess.execCommand("source /etc/profile; R CMD BATCH --no-save --no-restore '--args args1=\"Dell\" args2=1' /usr/local/R/mytest.R /usr/local/R/whathappened.txt");

The problem was caused by the environment. A command executed over SSH on the remote Hadoop cluster runs in a different environment from an interactive login on that server: it is started from a non-interactive, non-login shell, which does not read /etc/profile, so variables like $HADOOP_CMD are not set. There are multiple ways to make the SSH session pick up the environment variables.

In my method, "source /etc/profile" loads the environment variables into the SSH session before R starts.
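The effect can be reproduced locally without SSH. The following is a minimal sketch, assuming a POSIX /bin/sh: `env -i` starts a shell with an empty environment (a rough stand-in for the remote command's environment), and a temporary file plays the role of /etc/profile. The Hadoop path in it is a hypothetical placeholder, not a value from the question.

```shell
#!/bin/sh
# Temporary stand-in for /etc/profile; the HADOOP_CMD value is hypothetical.
profile=$(mktemp)
echo 'export HADOOP_CMD=/usr/local/hadoop/bin/hadoop' > "$profile"

# 1. A shell with an empty environment: HADOOP_CMD is not defined.
before=$(env -i /bin/sh -c 'echo "${HADOOP_CMD:-unset}"')

# 2. The same shell, but sourcing the profile file first
#    (". file" is the POSIX spelling of "source file").
after=$(env -i /bin/sh -c '. "$1"; echo "${HADOOP_CMD:-unset}"' sh "$profile")

echo "without profile: $before"
echo "with profile:    $after"

rm -f "$profile"
```

The first line printed reports `unset` and the second reports the path from the profile file, which mirrors why prefixing the remote command with `source /etc/profile;` lets rhdfs find HADOOP_CMD.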

Accepted answer (2 votes)

Well, I just found another solution by myself:

Instead of worrying about the environment from outside the Hadoop cluster, you can set the environment variables in the R script itself, before loading the packages:

    Sys.setenv(HADOOP_HOME="put your HADOOP_HOME path here")
    Sys.setenv(HADOOP_CMD="put your HADOOP_CMD path here")
    library(rmr2)
    library(rhdfs)