java.net.URISyntaxException: Relative path in absolute URI



I did some digging in the latest Spark documentation and noticed a configuration setting that I hadn't seen before:

spark.sql.warehouse.dir

So I went ahead and added this setting when I set up my SparkSession:

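from pyspark.sql import SparkSession

# Point the warehouse at an absolute file:/// URI so Spark does not build one from the current directory
spark = SparkSession.builder \
    .master('local[*]') \
    .appName('My App') \
    .config('spark.sql.warehouse.dir', 'file:///C:/path/to/my/') \
    .getOrCreate()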

That seems to set the working directory, and then I can just feed my filename directly into the csv reader:

df = spark.read \
    .format('csv') \
    .option('header', 'true') \
    .load('file.csv', schema=mySchema)

Once I set the spark warehouse, Spark was able to locate all of my files and my app finishes successfully now. The amazing thing is that it runs about 20 times faster than it did in Spark 1.6. So they really have done some very impressive work optimizing their SQL engine. Spark it up!
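
If you don't want to hard-code the file:///C:/... string, here is a minimal sketch that builds the URI from a local directory instead. The helper names warehouse_uri and build_session and the default directory name are my own, purely for illustration; the only Spark setting involved is the spark.sql.warehouse.dir key from above. I have not tested this on every platform, so treat it as a starting point.

```python
import os
from pathlib import Path


def warehouse_uri(directory="spark-warehouse"):
    """Return an absolute file:/// URI for a local directory (hypothetical helper)."""
    # abspath() anchors a relative name at the current working directory;
    # as_uri() then adds the file:/// prefix and percent-encodes spaces etc.
    return Path(os.path.abspath(directory)).as_uri()


def build_session(app_name="My App"):
    """Create a local SparkSession whose warehouse dir is an absolute URI."""
    # Imported here so warehouse_uri() can be tried without Spark installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .master("local[*]")
        .appName(app_name)
        .config("spark.sql.warehouse.dir", warehouse_uri())
        .getOrCreate()
    )
```

Calling build_session() in place of the builder chain above should give the same kind of session, with the warehouse directory resolved against wherever the script is started from.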



If this article does not solve your problem, see the reposted article below.


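While running the example code
I ran into this error:
Relative path in absolute URI
It means that a relative path has appeared inside an absolute URI.
According to the reference below:
http://stackoverflow.com/questions/38669206/spark-2-0-relative-path-in-absolute-uri-spark-warehouse
when building the SparkSession, pass one extra path setting, spark.sql.warehouse.dir,
because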
pyspark.sql.utils.IllegalArgumentException: 'java.net.URISyntaxException: Relative path in absolute URI: file:D:/software/spark-2.0.0-bin-hadoop2.7/examples/src/main/python/ml/spark-warehouse'
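shows that Spark is actually reading spark.sql.warehouse.dir from under the current directory.
Setting it explicitly presumably turns it straight into an absolute path.
After that, you also need to copy the entire data folder into the current ml folder,
so that the original relative paths in the example programs do not have to be changed,
because I found that ../ does not get you from the current execution path to the configured data path.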
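
To see why the URI in the error message is rejected, here is a small Python sketch of the rule as I understand it: a URI that has a scheme (file:) must have a path beginning with "/", and file:D:/... does not. The function name has_relative_path is made up for illustration, and the check only imitates the Java-side validation; it is not Spark's or Java's own code.

```python
from urllib.parse import urlsplit


def has_relative_path(uri):
    """True if the URI names a scheme but its path does not start with '/'."""
    parts = urlsplit(uri)
    return bool(parts.scheme) and bool(parts.path) and not parts.path.startswith("/")


# The URI from the error message: scheme "file", path "D:/software/..."
bad = "file:D:/software/spark-2.0.0-bin-hadoop2.7/examples/src/main/python/ml/spark-warehouse"
# The same location written with the file:/// prefix: path "/D:/software/..."
good = "file:///D:/software/spark-2.0.0-bin-hadoop2.7/examples/src/main/python/ml/spark-warehouse"

print(has_relative_path(bad))   # True
print(has_relative_path(good))  # False
```

This is why writing the setting with the file:/// prefix, as in the answer above, gets past the error.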