Using Spark with CarbonData
1. Deployment
Download the source code and build it.
Edit the configuration files.
Note: CarbonData 1.1.1 does not support Spark 2.2; using the two together raises errors.
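A rough sketch of the build-and-deploy step, assuming a Maven build of the CarbonData source tree (the profile and version flags follow the project's documented pattern, and the carbonlib path is illustrative; check the README of your release before running):

```shell
# Build CarbonData from source, skipping tests.
# Spark 2.1 profile, since CarbonData 1.1.1 does not support Spark 2.2.
mvn -DskipTests -Pspark-2.1 -Dspark.version=2.1.0 clean package

# Copy the assembled shaded jar into the Spark deployment's carbonlib
# directory (directory name and jar path are illustrative).
mkdir -p $SPARK_HOME/carbonlib
cp assembly/target/scala-2.11/carbondata_2.11-1.1.1-shade-hadoop2.7.2.jar $SPARK_HOME/carbonlib/
```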
2. Startup:
spark-shell --jars carbonlib/carbondata_2.11-1.1.1-shade-hadoop2.7.2.jar
3. Usage
3.1 Create a session
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.CarbonSession._

val carbon = SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession("hdfs://127.0.0.1:9000/user/carbon/carbonstore")
carbon.sql("show tables")
3.2 Create a table
carbon.sql("CREATE TABLE dtwave_dev.carbon_tablename_new (name String, PhoneNumber String) STORED BY 'carbondata'")
carbon.sql("insert into table dtwave_dev.carbon_tablename_new select * from dtwave_dev.spark_test")
carbon.sql("select * from dtwave_dev.carbon_tablename_new").show
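CREATE TABLE also accepts CarbonData-specific TBLPROPERTIES. A minimal sketch, assuming the DICTIONARY_INCLUDE property from the CarbonData 1.x DDL documentation (the table name is illustrative; verify the property names against the docs for your release):

```scala
// Sketch: a CarbonData table that forces dictionary encoding on a column.
// DICTIONARY_INCLUDE is a CarbonData 1.x table property; check the DDL
// documentation for your version before relying on it.
carbon.sql("""
  CREATE TABLE dtwave_dev.carbon_tablename_props (
    name String,
    PhoneNumber String
  )
  STORED BY 'carbondata'
  TBLPROPERTIES ('DICTIONARY_INCLUDE'='PhoneNumber')
""")
```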
3.3 Loading data:
SQL:
CREATE TABLE tablename (name String, PhoneNumber String) STORED BY 'carbondata' TBLPROPERTIES (...)
LOAD DATA [LOCAL] INPATH 'folder path' [OVERWRITE] INTO TABLE tablename OPTIONS(...)
INSERT INTO TABLE tablename select_statement1 FROM table1;
DataFrame:
df.write.format("carbondata").option("tableName", "t1").mode(SaveMode.Overwrite).save()
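An end-to-end sketch of the DataFrame path, run inside the spark-shell session created in 3.1 (the sample rows and the table name t1 are illustrative):

```scala
import org.apache.spark.sql.SaveMode
// 'carbon' is the CarbonSession created in 3.1; its implicits give us toDF.
import carbon.implicits._

// Build a small DataFrame with the same schema as the examples above.
val df = Seq(("1", "2"), ("3", "4")).toDF("name", "PhoneNumber")

// Write it out as a CarbonData table, replacing any existing data.
df.write.format("carbondata").option("tableName", "t1").mode(SaveMode.Overwrite).save()

// Read it back through SQL.
carbon.sql("select * from t1").show
```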
3.4 Queries:
SELECT project_list FROM t1 WHERE cond_list
GROUP BY columns
ORDER BY columns
3.5 Update and delete
Update one column:
UPDATE table1 A
SET A.REVENUE = A.REVENUE - 10
WHERE A.PRODUCT = 'phone'
Update two columns:
UPDATE table1 A
SET (A.PRODUCT, A.REVENUE) =
(SELECT PRODUCT, REVENUE
FROM table2 B
WHERE B.CITY = A.CITY AND B.BROKER = A.BROKER)
WHERE A.DATE BETWEEN '2017-01-01' AND '2017-01-31'
3.6 Insert
scala> carbon.sql("insert into table dtwave_dev.carbon_tablename_new select * from dtwave_dev.carbon_tablename_new")
17/09/20 16:51:56 AUDIT rdd.CarbonDataRDDFactory$: [hulbdeMacBook-Air.local][hulb][Thread-1]Data load request has been received for table dtwave_dev.carbon_tablename_new
17/09/20 16:51:56 WARN util.CarbonDataProcessorUtil: [Executor task launch worker-7][partitionID:new;queryID:164589401701872] sort scope is set to LOCAL_SORT
17/09/20 16:51:56 WARN util.CarbonDataProcessorUtil: [Executor task launch worker-7][partitionID:new;queryID:164589401701872] batch sort size is set to 0
17/09/20 16:51:56 WARN util.CarbonDataProcessorUtil: [Executor task launch worker-7][partitionID:new;queryID:164589401701872] sort scope is set to LOCAL_SORT
17/09/20 16:53:28 AUDIT rdd.CarbonDataRDDFactory$: [hulbdeMacBook-Air.local][hulb][Thread-1]Data load is successful for dtwave_dev.carbon_tablename_new

carbon.sql("select count(1) from dtwave_dev.carbon_tablename_new").show
3.7 Delete
DELETE FROM table1 A WHERE A.CUSTOMERID = '123'

scala> carbon.sql("select * from dtwave_dev.carbon_tablename_new").show
+----+-----------+
|name|PhoneNumber|
+----+-----------+
|   1|          2|
|   3|          4|
+----+-----------+

scala> carbon.sql("delete from dtwave_dev.carbon_tablename_new a WHERE a.name='1'")
17/09/20 14:51:43 AUDIT command.ProjectForDeleteCommand: [hulbdeMacBook-Air.local][hulb][Thread-1] Delete data request has been received for dtwave_dev.carbon_tablename_new.
17/09/20 14:51:47 AUDIT command.deleteExecution$: [hulbdeMacBook-Air.local][hulb][Thread-1]Delete data operation is successful for dtwave_dev.carbon_tablename_new
res1: org.apache.spark.sql.DataFrame = []

scala> carbon.sql("select * from dtwave_dev.carbon_tablename_new").show
+----+-----------+
|name|PhoneNumber|
+----+-----------+
|   3|          4|
+----+-----------+

More update examples:
carbon.sql("update dtwave_dev.carbon_tablename_new A SET (A.name) = A.name WHERE A.PhoneNumber = '4'")
carbon.sql("UPDATE dtwave_dev.carbon_tablename_new a SET (a.name, a.PhoneNumber) = ( SELECT '5' as name ,'6' from dtwave_dev.carbon_tablename_new b)")
carbon.sql("UPDATE dtwave_dev.carbon_tablename_new a SET (a.name, a.PhoneNumber) = ( SELECT '5' as name ,'6' as PhoneNumber)")
4. Corresponding files on HDFS
4.1 HDFS directory:
/user/carbon/carbonstore/dtwave_dev/carbon_tablename_new
drwxr-xr-x hulb supergroup 0 B 0 0 B Fact
drwxr-xr-x hulb supergroup 0 B 0 0 B Metadata
4.2 Data files
/user/carbon/carbonstore/dtwave_dev/carbon_tablename_new/Fact/Part0
Permission Owner Group Size Replication Block Size Name
drwxr-xr-x hulb supergroup 0 B 0 0 B Segment_0
drwxr-xr-x hulb supergroup 0 B 0 0 B Segment_1
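Each load or insert creates a new Segment_N directory under Fact/Part0. The same segments can be listed from SQL; a sketch, assuming the SHOW SEGMENTS syntax from the CarbonData 1.x data-management documentation:

```scala
// List the table's segments (one per load/insert); each row corresponds
// to a Segment_N directory under Fact/Part0 on HDFS.
carbon.sql("SHOW SEGMENTS FOR TABLE dtwave_dev.carbon_tablename_new").show
```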
4.3 Metadata files
/user/carbon/carbonstore/dtwave_dev/carbon_tablename_new/Metadata
Permission Owner Group Size Replication Block Size Name
-rw-r--r-- hulb supergroup 16 B 2 128 MB 62acf472-3574-434e-a53f-f45901dff949.dict
-rw-r--r-- hulb supergroup 11 B 2 128 MB 62acf472-3574-434e-a53f-f45901dff949.dictmeta
-rw-r--r-- hulb supergroup 11 B 2 128 MB 62acf472-3574-434e-a53f-f45901dff949_16.sortindex
-rw-r--r-- hulb supergroup 16 B 2 128 MB c5c7949a-a437-41d1-8f47-a7a81e68c4ba.dict
-rw-r--r-- hulb supergroup 11 B 2 128 MB c5c7949a-a437-41d1-8f47-a7a81e68c4ba.dictmeta
-rw-r--r-- hulb supergroup 11 B 2 128 MB c5c7949a-a437-41d1-8f47-a7a81e68c4ba_16.sortindex
-rw-r--r-- hulb supergroup 387 B 2 128 MB schema
-rw-r--r-- hulb supergroup 7.37 KB 2 128 MB tablestatus
-rw-r--r-- hulb supergroup 243 B 2 128 MB tableupdatestatus-1505890299461