Configuration Reference for Connecting Spark to Cassandra

Source: Internet · Editor: 程序博客网 (Programmer Blog Network) · Date: 2024/06/08 16:39

Cassandra Authentication Parameters

All parameters should be prefixed with spark.cassandra.

  • auth.conf.factory (default: DefaultAuthConfFactory): Name of a Scala module or class implementing AuthConfFactory, providing custom authentication configuration
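Because these are ordinary Spark configuration properties, they can be set on a SparkConf before any connector call. A minimal sketch, assuming PySpark and the spark-cassandra-connector jar are available; the username/password property names are an assumption based on what the connector's default auth factory reads, not something stated above:

```python
# Hedged sketch: requires pyspark plus the spark-cassandra-connector jar on
# the classpath. Property names carry the spark.cassandra. prefix explicitly.
from pyspark import SparkConf

conf = (
    SparkConf()
    .setAppName("cassandra-auth-example")
    # default factory shown for illustration; replace with a custom class name
    .set("spark.cassandra.auth.conf.factory", "DefaultAuthConfFactory")
    # assumption: credentials read by the default factory
    .set("spark.cassandra.auth.username", "cassandra")
    .set("spark.cassandra.auth.password", "cassandra")
)
```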

Cassandra Connection Parameters

All parameters should be prefixed with spark.cassandra.

  • connection.compression (default: none): Compression to use (LZ4, SNAPPY or NONE)
  • connection.factory (default: DefaultConnectionFactory): Name of a Scala module or class implementing CassandraConnectionFactory, providing connections to the Cassandra cluster
  • connection.host (default: localhost): Contact point to connect to the Cassandra cluster
  • connection.keep_alive_ms (default: 250): Period of time to keep unused connections open
  • connection.local_dc (default: None): The local DC to connect to (other nodes will be ignored)
  • connection.port (default: 9042): Cassandra native connection port
  • connection.reconnection_delay_ms.max (default: 60000): Maximum period of time to wait before reconnecting to a dead node
  • connection.reconnection_delay_ms.min (default: 1000): Minimum period of time to wait before reconnecting to a dead node
  • connection.timeout_ms (default: 5000): Maximum period of time to attempt connecting to a node
  • query.retry.count (default: 10): Number of times to retry a timed-out query
  • query.retry.delay (default: 4 * 1.5): The delay between subsequent retries (can be constant, like 1000; linearly increasing, like 1000+100; or exponential, like 1000*2)
  • read.timeout_ms (default: 120000): Maximum period of time to wait for a read to return
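The three forms accepted by query.retry.delay produce very different backoff behaviour. A pure-Python illustration of the resulting delay sequences; `retry_delays` is a hypothetical helper mimicking the documented syntax, not the connector's own parser:

```python
# Illustration only: mimics the three documented retry-delay forms for
# query.retry.delay (constant "1000", linear "1000+100", exponential "1000*2").
def retry_delays(spec: str, retries: int):
    """Return the delay in ms before each of `retries` retry attempts."""
    if "+" in spec:                       # linear: base + increment per retry
        base, inc = (int(x) for x in spec.split("+"))
        return [base + inc * i for i in range(retries)]
    if "*" in spec:                       # exponential: base * factor^retry
        base, factor = spec.split("*")
        return [int(float(base) * float(factor) ** i) for i in range(retries)]
    return [int(spec)] * retries          # constant delay

print(retry_delays("1000", 3))      # [1000, 1000, 1000]
print(retry_delays("1000+100", 3))  # [1000, 1100, 1200]
print(retry_delays("1000*2", 3))    # [1000, 2000, 4000]
```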

Cassandra DataFrame Source Parameters

All parameters should be prefixed with spark.cassandra.

  • table.size.in.bytes (default: None): Used internally by DataFrames; will be updated in a future release to retrieve the size from C*. Can be set manually for now

Cassandra SQL Context Options

All parameters should be prefixed with spark.cassandra.

  • sql.cluster (default: default): Sets the default cluster to inherit configuration from
  • sql.keyspace (default: None): Sets the default keyspace
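A short hedged sketch of setting these SQL defaults when building a session, so SQL queries resolve against Cassandra without naming the keyspace every time; the keyspace name here is a placeholder:

```python
# Hedged sketch: requires pyspark plus the spark-cassandra-connector jar.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("cassandra-sql-example")
    .config("spark.cassandra.sql.cluster", "default")
    .config("spark.cassandra.sql.keyspace", "my_keyspace")  # placeholder keyspace
    .getOrCreate()
)
```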

Cassandra SSL Connection Options

All parameters should be prefixed with spark.cassandra.

  • connection.ssl.enabled (default: false): Enable secure connection to Cassandra cluster
  • connection.ssl.enabledAlgorithms (default: Set(TLS_RSA_WITH_AES_128_CBC_SHA, TLS_RSA_WITH_AES_256_CBC_SHA)): SSL cipher suites
  • connection.ssl.protocol (default: TLS): SSL protocol
  • connection.ssl.trustStore.password (default: None): Trust store password
  • connection.ssl.trustStore.path (default: None): Path for the trust store being used
  • connection.ssl.trustStore.type (default: JKS): Trust store type
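Enabling SSL without a trust store is an easy misconfiguration, since the trust-store properties default to None. A hypothetical pre-flight check over these properties (pure-Python illustration, not connector code):

```python
# Illustration: a hypothetical sanity check over the SSL properties above,
# using the documented defaults. Not part of the connector itself.
SSL_DEFAULTS = {
    "spark.cassandra.connection.ssl.enabled": "false",
    "spark.cassandra.connection.ssl.protocol": "TLS",
    "spark.cassandra.connection.ssl.trustStore.type": "JKS",
}

def check_ssl(conf: dict) -> list:
    """Return a list of problems found in an SSL configuration dict."""
    merged = {**SSL_DEFAULTS, **conf}
    problems = []
    if merged["spark.cassandra.connection.ssl.enabled"] == "true":
        if "spark.cassandra.connection.ssl.trustStore.path" not in merged:
            problems.append("ssl enabled but no trustStore.path set")
        if "spark.cassandra.connection.ssl.trustStore.password" not in merged:
            problems.append("ssl enabled but no trustStore.password set")
    return problems

print(check_ssl({"spark.cassandra.connection.ssl.enabled": "true"}))
```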

Read Tuning Parameters

All parameters should be prefixed with spark.cassandra.

  • input.consistency.level (default: LOCAL_ONE): Consistency level to use when reading
  • input.fetch.size_in_rows (default: 1000): Number of CQL rows fetched per driver request
  • input.metrics (default: true): Sets whether to record connector-specific metrics on read
  • input.split.size_in_mb (default: 64): Approximate amount of data to be fetched into a single Spark partition
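Since input.split.size_in_mb bounds how much Cassandra data lands in one Spark partition, the rough partition count of a full-table scan can be estimated from the table size. A back-of-the-envelope illustration, not connector code:

```python
# Illustration: rough Spark partition count for a scan, driven by the
# documented default input.split.size_in_mb = 64. The connector's actual
# split planning is more involved; this is only the headline arithmetic.
import math

def estimated_partitions(table_size_mb: float, split_size_in_mb: int = 64) -> int:
    """Approximate number of Spark partitions for a full-table scan."""
    return max(1, math.ceil(table_size_mb / split_size_in_mb))

print(estimated_partitions(1000))      # ceil(1000 / 64) = 16
print(estimated_partitions(1000, 32))  # ceil(1000 / 32) = 32
```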

Write Tuning Parameters

All parameters should be prefixed with spark.cassandra.

  • output.batch.grouping.buffer.size (default: 1000): How many batches per single Spark task can be stored in memory before sending to Cassandra
  • output.batch.grouping.key (default: partition): Determines how insert statements are grouped into batches. Available values are:
      • none: a batch may contain any statements
      • replica_set: a batch may contain only statements to be written to the same replica set
      • partition: a batch may contain only statements for rows sharing the same partition key value
  • output.batch.size.bytes (default: 1024): Maximum total size of the batch in bytes. Overridden by spark.cassandra.output.batch.size.rows
  • output.batch.size.rows (default: None): Number of rows per single batch. The default is 'auto', which means the connector will adjust the number of rows based on the amount of data in each row
  • output.concurrent.writes (default: 5): Maximum number of batches executed in parallel by a single Spark task
  • output.consistency.level (default: LOCAL_ONE): Consistency level for writing
  • output.metrics (default: true): Sets whether to record connector-specific metrics on write
  • output.throughput_mb_per_sec (default: 2.147483647E9, floating point allowed): Maximum write throughput allowed per single core in MB/s. On long (8+ hour) runs, limit this to around 70% of the maximum throughput observed on a smaller job, for stability