Cassandra in Spring

I have recently been learning Cassandra, and found that while installation and configuration tutorials abound, very little has been written about actual usage; what does exist is limited to very simple examples, or to EasyCassandra-based approaches that could not be taken further. The article below is a rare find, so I am reposting it here in the hope that it helps others.

Reposted from: http://middlewaresnippets.blogspot.com/2015/02/cassandra-in-spring.html



Cassandra in Spring

In this post we set up Apache Cassandra. In order to do this in a consistent manner across multiple hosts (using a tarball), we create templates. After we have set up a Cassandra cluster, we use Spring Data to create Java clients.

Introduction

Cassandra is designed to handle big data workloads across multiple nodes with no single point of failure. Its architecture is based on the understanding that system and hardware failures can and do occur. Cassandra addresses the problem of failures by employing a peer-to-peer distributed system across homogeneous nodes where data is distributed among all nodes in the cluster. A commit log on each node captures write activity to ensure data durability. Data is then indexed and written to an in-memory structure, called a memtable, which resembles a write-back cache. Once the memory structure is full, the data is written to disk in an SSTable (Sorted String Table) data file. All writes are automatically partitioned and replicated throughout the cluster. Using a process called compaction, Cassandra periodically consolidates SSTables, discarding obsolete data and tombstones (an indicator that data was deleted). More information can be read in the write path to compaction.

Cassandra uses a storage structure similar to a Log-Structured Merge Tree (The Log-Structured Merge-Tree (LSM-Tree)), unlike a typical relational database that uses a B-Tree (B-Tree Visualization). Cassandra avoids reading before writing. Read-before-write, especially in a large distributed system, can produce stalls in read performance and other problems. Reading before writing also corrupts caches and increases IO requirements. To avoid a read-before-write condition, the storage engine groups the inserts/updates to be made, and sequentially writes only the updated parts of a row in append mode. Cassandra never re-writes or re-reads existing data, and never overwrites rows in place. More information can be read in the database internals.

Install and configure Cassandra

Choosing appropriate hardware depends on selecting the right balance of the following resources: memory, CPU, disks, number of nodes, and network:
  • Memory: the more memory a Cassandra node has, the better the read performance. More RAM also allows memory tables (memtables - a table-specific in-memory data structure that resembles a write-back cache) to hold more recently written data. Larger memtables lead to fewer SSTables being flushed to disk and fewer files to scan during a read. The ideal amount of RAM depends on the anticipated size of the hot data.
  • CPU: insert-heavy workloads are CPU-bound in Cassandra before becoming memory-bound (all writes go to the commit log - a file to which Cassandra appends changed data for recovery in the event of a hardware failure).
  • Disk: disk space depends on usage, so it is important to understand the mechanism. Cassandra writes data to disk when appending data to the commit log for durability and when flushing the memtable to SSTable data files for persistent storage. The commit log has a different access pattern (read/write ratio) than the pattern for accessing data from SSTables. SSTables are periodically compacted. Compaction improves performance by merging and rewriting data and discarding old data. Depending on the type of compaction strategy and the size of the compactions, compaction increases disk utilization, so we should leave an adequate amount of free disk space available on the node.
  • Network: since Cassandra is a distributed data store, it puts load on the network to handle read/write requests and replication of data across nodes. Be sure that the network can handle traffic between nodes without bottlenecks. Cassandra routes requests to replicas that are geographically closest to the coordinator node.
We set up our environment using the steps described in the post Notes on Virtualizing Java EE Systems. The document Selecting hardware for enterprise implementations provides more information on selecting hardware resources. Implementation or design patterns that are ineffective and/or counterproductive in Cassandra production installations are described in the document Anti-patterns in Cassandra.

Understanding the performance characteristics of the Cassandra cluster is critical to diagnosing issues and planning capacity. Cassandra exposes a number of statistics and management operations via JMX. During operation, Cassandra outputs information and statistics that can be monitored using JMX-compliant tools, for example,
  • the nodetool command-line utility (see the example below);
  • the DataStax OpsCenter management console; or
  • JMX clients such as JConsole, Java VisualVM, and Java Mission Control. More information can be found in the post Monitoring with JMX.
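As a quick illustration (a sketch that reuses the host and JMX port configured later in this post), nodetool can, for example, report the thread pool statistics of a node over JMX:

    [weblogic@machine1 bin]$ ./nodetool --host machine1.com --port 7199 tpstats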
In order to make the installation and configuration of Cassandra consistent across different hosts when using a tarball installation, we create a template. To this end, we parameterize the following configuration files:
  • conf/cassandra.yaml, the main configuration file.
  • conf/cassandra-env.sh, which contains Java Virtual Machine configuration settings.
  • bin/cassandra.in.sh, which sets up environment variables such as CLASSPATH and JAVA_HOME.
For the cassandra.yaml file we have
# The name of the cluster. This is mainly used to prevent machines in
# one logical cluster from joining another.
cluster_name: '<%= @CLUSTER_NAME %>'

# This defines the number of tokens randomly assigned to this node on the ring
# The more tokens, relative to other nodes, the larger the proportion of data
# that this node will store. You probably want all nodes to have the same number
# of tokens assuming they have equal hardware capability.
#
# If you leave this unspecified, Cassandra will use the default of 1 token for legacy compatibility,
# and will use the initial_token as described below.
#
# Specifying initial_token will override this setting on the node's initial start,
# on subsequent starts, this setting will apply even if initial token is set.
#
# If you already have a cluster with 1 token per node, and wish to migrate to
# multiple tokens per node, see http://wiki.apache.org/cassandra/Operations
num_tokens: <%= @NUM_TOKENS %>

...

# any class that implements the SeedProvider interface and has a
# constructor that takes a Map<String, String> of parameters will do.
seed_provider:
    # Addresses of hosts that are deemed contact points.
    # Cassandra nodes use this list of hosts to find each other and learn
    # the topology of the ring. You must change this if you are running
    # multiple nodes!
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          # seeds is actually a comma-delimited list of addresses.
          # Ex: "<ip1>,<ip2>,<ip3>"
          - seeds: "<%= @SEED_ADDRESSES %>"

...

# TCP port, for commands and data
storage_port: <%= @STORAGE_PORT %>

# SSL port, for encrypted communication. Unused unless enabled in
# encryption_options
ssl_storage_port: <%= @SSL_STORAGE_PORT %>

# Address or interface to bind to and tell other Cassandra nodes to connect to.
# You _must_ change this if you want multiple nodes to be able to communicate!
#
# Set listen_address OR listen_interface, not both. Interfaces must correspond
# to a single address, IP aliasing is not supported.
#
# Leaving it blank leaves it up to InetAddress.getLocalHost(). This
# will always do the Right Thing _if_ the node is properly configured
# (hostname, name resolution, etc), and the Right Thing is to use the
# address associated with the hostname (it might not be).
#
# Setting listen_address to 0.0.0.0 is always wrong.
listen_address: <%= @LISTEN_ADDRESS %>
# listen_interface: eth0

...

# Whether to start the native transport server.
# Please note that the address on which the native transport is bound is the
# same as the rpc_address. The port however is different and specified below.
start_native_transport: true
# port for the CQL native transport to listen for clients on
native_transport_port: <%= @NATIVE_TRANSPORT_PORT %>

...

# The address or interface to bind the Thrift RPC service and native transport
# server to.
#
# Set rpc_address OR rpc_interface, not both. Interfaces must correspond
# to a single address, IP aliasing is not supported.
#
# Leaving rpc_address blank has the same effect as on listen_address
# (i.e. it will be based on the configured hostname of the node).
#
# Note that unlike listen_address, you can specify 0.0.0.0, but you must also
# set broadcast_rpc_address to a value other than 0.0.0.0.
rpc_address: <%= @RPC_ADDRESS %>
# rpc_interface: eth1

# port for Thrift to listen for clients on
rpc_port: <%= @RPC_PORT %>

# RPC address to broadcast to drivers and other Cassandra nodes. This cannot
# be set to 0.0.0.0. If left blank, this will be set to the value of
# rpc_address. If rpc_address is set to 0.0.0.0, broadcast_rpc_address must
# be set.
broadcast_rpc_address: <%= @BROADCAST_RPC_ADDRESS %>

...

# endpoint_snitch -- Set this to a class that implements
# IEndpointSnitch. The snitch has two functions:
# - it teaches Cassandra enough about your network topology to route
#   requests efficiently
# - it allows Cassandra to spread replicas around your cluster to avoid
#   correlated failures. It does this by grouping machines into
#   "datacenters" and "racks." Cassandra will do its best not to have
#   more than one replica on the same "rack" (which may not actually
#   be a physical location)
#
# IF YOU CHANGE THE SNITCH AFTER DATA IS INSERTED INTO THE CLUSTER,
# YOU MUST RUN A FULL REPAIR, SINCE THE SNITCH AFFECTS WHERE REPLICAS
# ARE PLACED.
#
# Out of the box, Cassandra provides
#  - SimpleSnitch:
#    Treats Strategy order as proximity. This can improve cache
#    locality when disabling read repair. Only appropriate for
#    single-datacenter deployments.
#  - GossipingPropertyFileSnitch
#    This should be your go-to snitch for production use. The rack
#    and datacenter for the local node are defined in
#    cassandra-rackdc.properties and propagated to other nodes via
#    gossip. If cassandra-topology.properties exists, it is used as a
#    fallback, allowing migration from the PropertyFileSnitch.
#  - PropertyFileSnitch:
#    Proximity is determined by rack and data center, which are
#    explicitly configured in cassandra-topology.properties.
#  - Ec2Snitch:
#    Appropriate for EC2 deployments in a single Region. Loads Region
#    and Availability Zone information from the EC2 API. The Region is
#    treated as the datacenter, and the Availability Zone as the rack.
#    Only private IPs are used, so this will not work across multiple
#    Regions.
#  - Ec2MultiRegionSnitch:
#    Uses public IPs as broadcast_address to allow cross-region
#    connectivity. (Thus, you should set seed addresses to the public
#    IP as well.) You will need to open the storage_port or
#    ssl_storage_port on the public IP firewall. (For intra-Region
#    traffic, Cassandra will switch to the private IP after
#    establishing a connection.)
#  - RackInferringSnitch:
#    Proximity is determined by rack and data center, which are
#    assumed to correspond to the 3rd and 2nd octet of each node's IP
#    address, respectively. Unless this happens to match your
#    deployment conventions, this is best used as an example of
#    writing a custom Snitch class and is provided in that spirit.
#
# You can use a custom Snitch by setting this to the full class name
# of the snitch, which will be assumed to be on your classpath.
endpoint_snitch: <%= @ENDPOINT_SNITCH %>
As we will be running on Java 8, we use the Garbage First Collector and TieredCompilation (which is the default in Java 8). More information on how TieredCompilation works and how to tune the Garbage First Collector can be found in the post Java Virtual Machine Code Generation and Optimization. For the cassandra-env.sh file we have (the options that are not shown are disabled)
MAX_HEAP_SIZE=<%= @HEAP_SIZE %>
HEAP_NEWSIZE=<%= @NURSERY_SIZE %>

# Specifies the default port over which Cassandra will be available for JMX connections.
JMX_PORT=<%= @JMX_PORT %>

JVM_OPTS="$JVM_OPTS -Xms${MAX_HEAP_SIZE}"
JVM_OPTS="$JVM_OPTS -Xmx${MAX_HEAP_SIZE}"

# Larger interned string table, for gossip's benefit (CASSANDRA-6410)
JVM_OPTS="$JVM_OPTS -XX:StringTableSize=1000003"

# GC tuning options
JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=<%= @PAUSE_TIME_GOAL_MILLIS %>"
# Settings to be able to take flight recordings (optional, needs the Java SE Advanced license in production environments)
JVM_OPTS="$JVM_OPTS -XX:+UnlockCommercialFeatures -XX:+FlightRecorder -Dname=<%= @SERVER_NAME %>"

JVM_OPTS="$JVM_OPTS -Djava.net.preferIPv4Stack=true"
JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.port=$JMX_PORT"
JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.rmi.port=$JMX_PORT"
JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.ssl=false"
JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.authenticate=false"
Note that we have also disabled the new generation setting (-Xmn). As we are working with a pause time goal, we must avoid explicitly setting the new generation size, as doing so overrides the pause time goal. The experimental flags G1NewSizePercent (default 5%) and G1MaxNewSizePercent (default 60%) can be used to set a minimum and a maximum for the new generation size, respectively. Values for experimental flags can be changed by enabling UnlockExperimentalVMOptions, as sketched below. To see the default values of command-line options (such as the thread stack size), refer to the command-line options reference. For information on the use of thread priorities, refer to the document Map Thread priorities to system thread/process priorities.
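As an illustration (not part of the template above), bounding the G1 new generation with these experimental flags could look as follows in cassandra-env.sh; the percentages shown are simply the documented defaults:

    # experimental options must be unlocked before they can be set
    JVM_OPTS="$JVM_OPTS -XX:+UnlockExperimentalVMOptions"
    # lower and upper bound for the new generation, as a percentage of the heap
    JVM_OPTS="$JVM_OPTS -XX:G1NewSizePercent=5"
    JVM_OPTS="$JVM_OPTS -XX:G1MaxNewSizePercent=60"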

In the cassandra.in.sh file we only change the JAVA_HOME variable (the rest remains unchanged)
...
# JAVA_HOME can optionally be set here
JAVA_HOME="<%= @JAVA_HOME_DIRECTORY %>"
...
To install and configure Cassandra on a particular host, we can use
#!/bin/sh

# directory where the tar file is located
SOFTWARE_DIRECTORY="/u01/software/apache/cassandra/2.1.2"

# user and group that will be the owner of Apache Cassandra
USER_NAME="weblogic"
GROUP_NAME="javainstall"

# directory where Apache Cassandra will be installed
CASSANDRA_ROOT="/u01/app/apache/cassandra2.1.2"
# the name of the cluster
CLUSTER_NAME="TestCluster"
# the name of the cassandra instance (only used for flight recording purposes)
SERVER_NAME="cassandra"
# the number of tokens randomly assigned to the node on the ring
NUM_TOKENS="256"
# address to bind to and tell other Cassandra nodes to connect to
LISTEN_ADDRESS=$(hostname)
# address to bind the Thrift RPC service and native transport server to
RPC_ADDRESS="0.0.0.0"
# if rpc_address is set to 0.0.0.0, broadcast_rpc_address must be set to a value other than 0.0.0.0
BROADCAST_RPC_ADDRESS=$(hostname)
# comma-delimited list of hosts that are deemed contact points (this list is used to find nodes and learn the topology of the ring)
SEED_ADDRESSES="machine1.com"
# endpoint snitch, the snitch has two functions:
# - teach Cassandra about the network topology to route requests efficiently
# - allow Cassandra to spread replicas around the cluster to avoid correlated failures
ENDPOINT_SNITCH="SimpleSnitch"
# storage port
STORAGE_PORT="7080"
# ssl storage port
SSL_STORAGE_PORT="7443"
# native transport server port
NATIVE_TRANSPORT_PORT="9042"
# port for Thrift to listen for clients on
RPC_PORT="9160"

# Java settings
# directory such that JAVA_HOME_DIRECTORY/bin contains the java executable
JAVA_HOME_DIRECTORY="/u01/app/oracle/weblogic12.1.3/jdk1.8.0_31"
# heap size (Xmx and Xms are set to equal sizes)
HEAP_SIZE="1024m"
# nursery size (Xmn)
NURSERY_SIZE="256m"
# as we are using the garbage first collector give a goal for the pause times
PAUSE_TIME_GOAL_MILLIS="200"
# port over which JMX connections are accepted
JMX_PORT="7199"

extract_template() {
    echo "EXTRACTING TEMPLATE"
    # the apache-cassandra-2.1.2.tar.gz contains the following directory structure
    # /CASSANDRA_ROOT
    #   /bin
    #     cassandra (used to start apache cassandra)
    #     cassandra.in.sh (sets environment variables, such as CASSANDRA_HOME and JAVA_HOME)
    #   /conf
    #     cassandra.yaml (apache cassandra storage configuration)
    #     cassandra-env.sh (sets the JVM options)
    #   /data (contains data files)
    #   /lib (contains libraries)

    if [ ! -d "${CASSANDRA_ROOT}" ]; then
        mkdir -p ${CASSANDRA_ROOT}
    fi

    tar xf ${SOFTWARE_DIRECTORY}/apache-cassandra-2.1.2.tar.gz -C ${CASSANDRA_ROOT}
    ROOT_CASSANDRA_ROOT=$(echo ${CASSANDRA_ROOT} | cut -d "/" -f2)
    chown -R ${USER_NAME}:${GROUP_NAME} /${ROOT_CASSANDRA_ROOT}
}

edit_files() {
    echo "EDITING cassandra.in.sh"
    sed -i -e '/<%= @JAVA_HOME_DIRECTORY %>/ s;<%= @JAVA_HOME_DIRECTORY %>;'${JAVA_HOME_DIRECTORY}';g' ${CASSANDRA_ROOT}/bin/cassandra.in.sh

    echo "EDITING cassandra.yaml"
    sed -i -e '/<%= @CLUSTER_NAME %>/ s;<%= @CLUSTER_NAME %>;'${CLUSTER_NAME}';g' \
           -e '/<%= @NUM_TOKENS %>/ s;<%= @NUM_TOKENS %>;'${NUM_TOKENS}';g' \
           -e '/<%= @SEED_ADDRESSES %>/ s;<%= @SEED_ADDRESSES %>;'${SEED_ADDRESSES}';g' \
           -e '/<%= @STORAGE_PORT %>/ s;<%= @STORAGE_PORT %>;'${STORAGE_PORT}';g' \
           -e '/<%= @SSL_STORAGE_PORT %>/ s;<%= @SSL_STORAGE_PORT %>;'${SSL_STORAGE_PORT}';g' \
           -e '/<%= @LISTEN_ADDRESS %>/ s;<%= @LISTEN_ADDRESS %>;'${LISTEN_ADDRESS}';g' \
           -e '/<%= @NATIVE_TRANSPORT_PORT %>/ s;<%= @NATIVE_TRANSPORT_PORT %>;'${NATIVE_TRANSPORT_PORT}';g' \
           -e '/<%= @RPC_ADDRESS %>/ s;<%= @RPC_ADDRESS %>;'${RPC_ADDRESS}';g' \
           -e '/<%= @BROADCAST_RPC_ADDRESS %>/ s;<%= @BROADCAST_RPC_ADDRESS %>;'${BROADCAST_RPC_ADDRESS}';g' \
           -e '/<%= @RPC_PORT %>/ s;<%= @RPC_PORT %>;'${RPC_PORT}';g' \
           -e '/<%= @ENDPOINT_SNITCH %>/ s;<%= @ENDPOINT_SNITCH %>;'${ENDPOINT_SNITCH}';g' ${CASSANDRA_ROOT}/conf/cassandra.yaml

    echo "EDITING cassandra-env.sh"
    sed -i -e '/<%= @HEAP_SIZE %>/ s;<%= @HEAP_SIZE %>;'${HEAP_SIZE}';g' \
           -e '/<%= @NURSERY_SIZE %>/ s;<%= @NURSERY_SIZE %>;'${NURSERY_SIZE}';g' \
           -e '/<%= @JMX_PORT %>/ s;<%= @JMX_PORT %>;'${JMX_PORT}';g' \
           -e '/<%= @PAUSE_TIME_GOAL_MILLIS %>/ s;<%= @PAUSE_TIME_GOAL_MILLIS %>;'${PAUSE_TIME_GOAL_MILLIS}';g' \
           -e '/<%= @SERVER_NAME %>/ s;<%= @SERVER_NAME %>;'${SERVER_NAME}';g' ${CASSANDRA_ROOT}/conf/cassandra-env.sh
}

create_boot_script() {
    echo "CREATING BOOT SCRIPT"
    touch /etc/rc.d/init.d/cassandra
    chmod u+x /etc/rc.d/init.d/cassandra

    echo '#!/bin/sh
#
# chkconfig: - 95 20
#
# description: controls the Cassandra runtime lifecycle
# processname: cassandra
#

# Source function library.
. /etc/rc.d/init.d/functions

RETVAL=0
SERVICE="cassandra"
USER="'${USER_NAME}'"
CASSANDRA_HOME="'${CASSANDRA_ROOT}'"
LOCK_FILE="/var/lock/subsys/${SERVICE}"

start() {
    echo "Starting Cassandra"
    su - ${USER} -c "${CASSANDRA_HOME}/bin/cassandra -p ${CASSANDRA_HOME}/cassandra.pid" >/dev/null 2>/dev/null &
    RETVAL=$?
    [ $RETVAL -eq 0 ] && success || failure
    echo
    [ $RETVAL -eq 0 ] && touch ${LOCK_FILE}
    return $RETVAL
}

stop() {
    echo "Stopping Cassandra"
    su - ${USER} -c "kill -TERM $(cat ${CASSANDRA_HOME}/cassandra.pid)"
    RETVAL=$?
    [ $RETVAL -eq 0 ] && success || failure
    echo
    [ $RETVAL -eq 0 ] && rm -f ${LOCK_FILE}
    return $RETVAL
}

check() {
    echo "Checking Cassandra"
    PID=$(pgrep -of org.apache.cassandra.service.CassandraDaemon)
    RETVAL=$?
    if [ $RETVAL -eq 0 ]; then
        echo "Cassandra is running on this host with PID=$PID"
        netstat -anp | grep $PID
    else
        echo "Cassandra is not running on this host"
    fi
    return $RETVAL
}

case "$1" in
    start)
        start
        ;;
    stop)
        stop
        ;;
    restart)
        check
        stop
        start
        check
        ;;
    check)
        check
        ;;
    *)
        echo $"Usage: $0 {start|stop|restart|check}"
        exit 1
esac

exit $?' > /etc/rc.d/init.d/cassandra

    chkconfig --add cassandra
    chkconfig cassandra on
}

extract_template

edit_files

create_boot_script
The last step in the script creates a boot script, such that Cassandra starts when the host is started.
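Once the boot script is installed, it can be driven like any other SysV service; for example (the output shown is illustrative, produced by the check() function defined above):

    [root@machine1 ~]# service cassandra start
    [root@machine1 ~]# service cassandra check
    Checking Cassandra
    Cassandra is running on this host with PID=2783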

Set-up data model

Cassandra's data model is a partitioned row store with tunable consistency. Rows are organized into tables; the first component of a table's primary key is the partition key; within a partition, rows are clustered by the remaining columns of the key. Other columns can be indexed separately from the primary key. Tables can be created, dropped, and altered at runtime without blocking updates and queries.
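For example (an illustrative table, not the one used below), a compound primary key separates the partition key from the clustering columns:

    -- user_id is the partition key; rows within a partition are clustered by posted_at
    CREATE TABLE timeline (
        user_id   int,
        posted_at timestamp,
        body      text,
        PRIMARY KEY (user_id, posted_at)
    );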

A keyspace is the CQL counterpart to an SQL database, but a little different. The Cassandra keyspace is a namespace that defines how data is replicated on nodes. Typically, a cluster has one keyspace per application. Replication is controlled on a per-keyspace basis, so data that has different replication requirements typically resides in different keyspaces. Keyspaces are not designed to be used as a significant map layer within the data model; they are designed to control data replication for a set of tables.

Cassandra stores replicas on multiple nodes to ensure reliability and fault tolerance. A replication strategy determines the nodes where replicas are placed. The total number of replicas across the cluster is referred to as the replication factor. A replication factor of 1 means that there is only one copy of each row on one node. A replication factor of 2 means two copies of each row, where each copy is on a different node. All replicas are equally important; there is no primary or master replica. As a general rule, the replication factor should not exceed the number of nodes in the cluster. Note that we can increase the replication factor and add the desired number of nodes later. Two replication strategies are available:
  • SimpleStrategy, for use with a single data center only. In the case of more than one data center, use the NetworkTopologyStrategy.
  • NetworkTopologyStrategy, recommended for most deployments because it is much easier to expand to multiple data centers when necessary (see the sketch below).
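As an illustrative sketch (the keyspace and data center names are assumptions; in practice they must match the data center names reported by the snitch), a NetworkTopologyStrategy keyspace specifies a replication factor per data center:

    -- three replicas in DC1, two in DC2
    CREATE KEYSPACE IF NOT EXISTS test_multi_dc
        WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 2};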
To create a keyspace and a table (table properties) we can use the cqlsh command-line utility (the CQL reference can be found here)
[weblogic@machine1 bin]$ ./cqlsh machine1.com 9042
Connected to TestCluster at machine1.com:9042.
[cqlsh 5.0.1 | Cassandra 2.1.2 | CQL spec 3.2.0 | Native protocol v3]
Use HELP for help.

# create keyspace
cqlsh> CREATE KEYSPACE IF NOT EXISTS test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2};

# create table
cqlsh> CREATE TABLE IF NOT EXISTS test.persons ( sofinummer int PRIMARY KEY , naam text ) WITH compression = {'sstable_compression': 'LZ4Compressor' } AND compaction = {'class': 'LeveledCompactionStrategy'};

# check created keyspace
cqlsh> SELECT * FROM system.schema_keyspaces;

 keyspace_name | durable_writes | strategy_class                              | strategy_options
---------------+----------------+---------------------------------------------+----------------------------
          test |           True | org.apache.cassandra.locator.SimpleStrategy | {"replication_factor":"2"}
        system |           True | org.apache.cassandra.locator.LocalStrategy  | {}
 system_traces |           True | org.apache.cassandra.locator.SimpleStrategy | {"replication_factor":"2"}

(3 rows)

# check created table
cqlsh> SELECT * FROM system.schema_columnfamilies WHERE columnfamily_name = 'persons' ALLOW FILTERING;

 keyspace_name | columnfamily_name | bloom_filter_fp_chance | caching | cf_id | column_aliases | comment | compaction_strategy_class | compaction_strategy_options | comparator | compression_parameters | default_time_to_live | default_validator | dropped_columns | gc_grace_seconds | index_interval | is_dense | key_aliases | key_validator | local_read_repair_chance | max_compaction_threshold | max_index_interval | memtable_flush_period_in_ms | min_compaction_threshold | min_index_interval | read_repair_chance | speculative_retry | subcomparator | type | value_alias
---------------+-------------------+------------------------+---------------------------------------------+--------------------------------------+----------------+---------+--------------------------------------------------------------+-----------------------------+-----------------------------------------------------------------------------------------+--------------------------------------------------------------------------+----------------------+-------------------------------------------+-----------------+------------------+----------------+----------+----------------+-------------------------------------------+--------------------------+--------------------------+--------------------+-----------------------------+--------------------------+--------------------+--------------------+-------------------+---------------+----------+-------------
 test | persons | 0.1 | {"keys":"ALL", "rows_per_partition":"NONE"} | 296a2de0-a6ea-11e4-a8b4-b97fe3ba0f40 | [] | | org.apache.cassandra.db.compaction.LeveledCompactionStrategy | {} | org.apache.cassandra.db.marshal.CompositeType(org.apache.cassandra.db.marshal.UTF8Type) | {"sstable_compression":"org.apache.cassandra.io.compress.LZ4Compressor"} | 0 | org.apache.cassandra.db.marshal.BytesType | null | 864000 | null | False | ["sofinummer"] | org.apache.cassandra.db.marshal.Int32Type | 0.1 | 32 | 2048 | 0 | 4 | 128 | 0 | 99.0PERCENTILE | null | Standard | null

(1 rows)

cqlsh> SELECT * FROM system.schema_columns WHERE columnfamily_name = 'persons' ALLOW FILTERING;

 keyspace_name | columnfamily_name | column_name | component_index | index_name | index_options | index_type | type          | validator
---------------+-------------------+-------------+-----------------+------------+---------------+------------+---------------+-------------------------------------------
          test |           persons |        naam |               0 |       null |          null |       null |       regular |  org.apache.cassandra.db.marshal.UTF8Type
          test |           persons |  sofinummer |            null |       null |          null |       null | partition_key | org.apache.cassandra.db.marshal.Int32Type

(2 rows)

# to check the cluster we can use nodetool (nodetool help status provides more information)
[weblogic@machine1 bin]$ ./nodetool --host machine1.com --port 7199 status test
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load       Tokens  Owns (effective)  Host ID                               Rack
UN  192.168.231.110  77.89 KB   256     100.0%            6f563cc6-3e2f-4089-b28c-bdb58959c214  rack1
UN  192.168.231.100  104.46 KB  256     100.0%            0eddc3ea-6692-4c62-9135-67ef37d655b5  rack1

Test

To test the set-up we use a Java client. For the Java client to be able to connect, we need the Cassandra Java Driver. The driver uses the binary protocol (start_native_transport: true, native_transport_port: 9042, and rpc_address: <hostname reachable from the client> must be set in the cassandra.yaml file). The document Writing your first client describes a step-by-step example.

The driver discovers the nodes that constitute a cluster by querying the contact points used in building the Cluster object. After this, it is up to the cluster's load balancing policy to keep track of node events. If a Cassandra node fails or becomes unreachable, the Java driver automatically and transparently tries other nodes in the cluster and schedules reconnections to the dead nodes in the background. More information on the driver can be found in the driver reference.
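These behaviors can also be made explicit when the Cluster is built. Below is a minimal sketch using policies that ship with driver 2.1; the host names and delay values are assumptions, not settings taken from this post:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
    import com.datastax.driver.core.policies.ExponentialReconnectionPolicy;
    import com.datastax.driver.core.policies.TokenAwarePolicy;

    public class ClusterFactory {
        public static Cluster build() {
            return Cluster.builder()
                    .addContactPoints("machine1.com", "machine2.com")
                    // prefer replicas that own the token, round-robin within the local data center
                    .withLoadBalancingPolicy(new TokenAwarePolicy(new DCAwareRoundRobinPolicy()))
                    // retry dead nodes with exponentially growing delays (1 s base, 60 s maximum)
                    .withReconnectionPolicy(new ExponentialReconnectionPolicy(1000L, 60000L))
                    .build();
        }
    }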

As an example, we can use the following
// obtain a Cluster and a Session instance, and make those available for future use
package model.utils;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ProtocolOptions;
import com.datastax.driver.core.Session;

import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class CassandraUtil {

    private static final Cluster cluster;
    private static final Session session;

    private static final String[] HOST_NAMES = {"machine1.com", "machine2.com"};
    private static final Integer PORT = ProtocolOptions.DEFAULT_PORT;
    private static final String KEYSPACE = "test";

    private CassandraUtil() {
    }

    static {
        try {
            List<InetSocketAddress> addresses = new ArrayList<>();
            for (String host : HOST_NAMES) {
                InetSocketAddress address = new InetSocketAddress(host, PORT);
                addresses.add(address);
            }

            System.out.println("initializing");
            long start_time = System.nanoTime();

            //cluster = Cluster.builder().addContactPoints(HOST_NAMES).build();
            cluster = Cluster.builder().addContactPointsWithPorts(addresses).build();
            session = cluster.connect(KEYSPACE);

            long end_time = System.nanoTime();
            System.out.println("done initializing");
            long total_time = TimeUnit.NANOSECONDS.toMillis(end_time - start_time);
            System.out.println("initializing took: " + total_time + " ms.");
        } catch (Throwable ex) {
            ex.printStackTrace();
            throw new ExceptionInInitializerError(ex);
        }
    }

    public static Cluster getCluster() {
        return cluster;
    }

    public static Session getSession() {
        return session;
    }
}
// entity representing our table (used by Spring Data)
package model.entities;

import org.springframework.data.cassandra.mapping.PrimaryKey;
import org.springframework.data.cassandra.mapping.Table;

@Table("persons")
public class Person implements Comparable<Person> {

    @PrimaryKey
    private Integer sofinummer;

    private String naam;

    public Integer getSofinummer() {
        return sofinummer;
    }

    public void setSofinummer(Integer sofinummer) {
        this.sofinummer = sofinummer;
    }

    public String getNaam() {
        return naam;
    }

    public void setNaam(String naam) {
        this.naam = naam;
    }

    @Override
    public boolean equals(Object object) {
        if (this == object) {
            return true;
        }

        if (object == null) {
            return false;
        }

        if (!(object instanceof Person)) {
            return false;
        }

        Person person = (Person) object;
        return getSofinummer().equals(person.getSofinummer());
    }

    @Override
    public int hashCode() {
        return getSofinummer().hashCode();
    }

    @Override
    public String toString() {
        return getNaam() + " " + getSofinummer();
    }

    @Override
    public int compareTo(Person other) {
        return this.getNaam().compareTo(other.getNaam());
    }
}
package test;

import com.datastax.driver.core.*;
import com.datastax.driver.core.querybuilder.QueryBuilder;
import com.datastax.driver.core.querybuilder.Select;
import model.entities.Person;
import model.utils.CassandraUtil;
import org.springframework.data.cassandra.core.CassandraOperations;
import org.springframework.data.cassandra.core.CassandraTemplate;

import java.util.List;

public class Test {

    public static void main(String[] args) {
        try {
            Test test = new Test();
            test.getMetaData();
            test.useCQL();
            test.useSpring();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            CassandraUtil.getSession().close();
            CassandraUtil.getCluster().close();
        }
    }

    public void getMetaData() {
        Configuration configuration = CassandraUtil.getCluster().getConfiguration();
        Metadata metadata = CassandraUtil.getCluster().getMetadata();
        System.out.println("Connected to cluster: " + metadata.getClusterName());
        for (Host host : metadata.getAllHosts()) {
            System.out.println("- host: " + host.getDatacenter() + ", " +
                    host.getRack() + ", " +
                    host.getAddress() + ", " +
                    configuration.getPolicies().getLoadBalancingPolicy().distance(host));
        }

        Metrics metrics = CassandraUtil.getCluster().getMetrics();
        System.out.println("- connected to hosts: " + metrics.getConnectedToHosts().getValue());
        System.out.println("- open connections: " + metrics.getOpenConnections().getValue());

        TableMetadata.Options options = metadata.getKeyspace("test").getTable("persons").getOptions();
        System.out.println("Table options:");
        System.out.println("- " + options.getCaching());
        System.out.println("- " + options.getCompaction());
        System.out.println("- " + options.getCompression());
    }

    public void useCQL() {
        CassandraUtil.getSession().execute("INSERT INTO test.persons (sofinummer, naam) VALUES (123456789, 'John Zorn');");
        CassandraUtil.getSession().execute("INSERT INTO test.persons (sofinummer, naam) VALUES (987654321, 'Frank Zappa');");

        ResultSet results = CassandraUtil.getSession().execute("SELECT * FROM test.persons");
        for (Row row : results) {
            System.out.println(row.getInt("sofinummer") + ", " + row.getString("naam"));
        }
    }

    public void useSpring() {
        Person johnzorn = new Person();
        johnzorn.setSofinummer(123456789);
        johnzorn.setNaam("John Zorn");

        Person frankzappa = new Person();
        frankzappa.setSofinummer(987654321);
        frankzappa.setNaam("Frank Zappa");

        CassandraOperations operations = new CassandraTemplate(CassandraUtil.getSession());
        operations.insert(johnzorn);
        operations.insert(frankzappa);

        Select select = QueryBuilder.select().from("test", "persons");
        List<Person> persons = operations.select(select, Person.class);
        persons.forEach(System.out::println);
    }
}
in which we have also used Spring Data, and in particular Spring Data Cassandra. In order to run the example above, we need the following .jar files on the class path:
aopalliance-1.0.jar
cassandra-driver-core-2.1.4.jar
commons-logging-1.1.3.jar
guava-14.0.1.jar
metrics-core-3.0.2.jar
slf4j-api-1.7.5.jar
spring-aop-4.1.4.RELEASE.jar
spring-beans-4.1.4.RELEASE.jar
spring-context-4.1.4.RELEASE.jar
spring-core-4.1.4.RELEASE.jar
spring-cql-1.1.2.RELEASE.jar
spring-data-cassandra-1.1.2.RELEASE.jar
spring-data-commons-1.9.2.RELEASE.jar
spring-expression-4.1.4.RELEASE.jar
spring-tx-4.1.4.RELEASE.jar
The Cassandra Driver (and related .jar files) can be obtained here. The Spring .jar files can be obtained from the Spring Repository. We can also obtain the .jar files using Maven; for example, when using IntelliJ IDEA, click File, Project Structure and choose Libraries; click + and choose From Maven. In the Download Library from Maven Repository screen, enter the artifact id (spring-data-cassandra), press search, select the version to be used (org.springframework.data:spring-data-cassandra:1.1.2.RELEASE), optionally choose to download the library to a project directory, and click OK.
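In a Maven build, the equivalent is a single dependency (a sketch; spring-data-cassandra pulls in spring-cql, spring-data-commons, and the Spring modules transitively, but the driver version it resolves may differ from the 2.1.4 listed above, in which case pin cassandra-driver-core explicitly):

    <dependency>
        <groupId>org.springframework.data</groupId>
        <artifactId>spring-data-cassandra</artifactId>
        <version>1.1.2.RELEASE</version>
    </dependency>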

When the example is run, we get the following output
initializing
...
done initializing
initializing took: 745 ms.
Connected to cluster: TestCluster
- host: datacenter1, rack1, machine2.com/192.168.231.110, LOCAL
- host: datacenter1, rack1, machine1.com/192.168.231.100, LOCAL
- connected to hosts: 2
- open connections: 3
Table options:
- {keys=ALL, rows_per_partition=NONE}
- {class=org.apache.cassandra.db.compaction.LeveledCompactionStrategy}
- {sstable_compression=org.apache.cassandra.io.compress.LZ4Compressor}
123456789, John Zorn
987654321, Frank Zappa
John Zorn 123456789
Frank Zappa 987654321

Spring

Spring offers the very handy CassandraOperations API, which we can use to create a generic DAO approach as was done in the post Fun with Spring. The Spring Data Commons module offers so-called repositories. The central interface in the Spring Data repository abstraction is the Repository interface. It takes the domain class to manage as well as the id type of the domain class as type arguments. This interface acts primarily as a marker interface to capture the types to work with and to help discover interfaces that extend this one. The CrudRepository interface provides CRUD functionality for the entity class that is being managed. To implement the CrudRepository we can use something like
package model.logic;

import org.springframework.cassandra.core.util.CollectionUtils;
import org.springframework.data.cassandra.core.CassandraOperations;
import org.springframework.data.repository.CrudRepository;

import java.io.Serializable;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

public abstract class GenericCassandraRepository<T, ID extends Serializable> implements CrudRepository<T, ID> {

    private Class<T> persistentClass;
    private CassandraOperations operations;

    public GenericCassandraRepository(CassandraOperations operations) {
        Type type = getClass().getGenericSuperclass();
        if (type instanceof ParameterizedType) {
            ParameterizedType parameterizedType = (ParameterizedType) type;
            setPersistentClass((Class<T>) parameterizedType.getActualTypeArguments()[0]);
        } else {
            System.out.println("Not an instance of parameterized type: " + type);
        }

        setOperations(operations);
    }

    public Class<T> getPersistentClass() {
        return persistentClass;
    }

    public void setPersistentClass(Class<T> persistentClass) {
        this.persistentClass = persistentClass;
    }

    public CassandraOperations getOperations() {
        return operations;
    }

    public void setOperations(CassandraOperations operations) {
        this.operations = operations;
    }

    @Override
    public <S extends T> S save(S entity) {
        return getOperations().insert(entity);
    }

    @Override
    public <S extends T> Iterable<S> save(Iterable<S> entities) {
        return getOperations().insert(CollectionUtils.toList(entities));
    }

    @Override
    public T findOne(ID id) {
        return getOperations().selectOneById(getPersistentClass(), id);
    }

    @Override
    public boolean exists(ID id) {
        return getOperations().exists(getPersistentClass(), id);
    }

    @Override
    public Iterable<T> findAll() {
        return getOperations().selectAll(getPersistentClass());
    }

    @Override
    public Iterable<T> findAll(Iterable<ID> ids) {
        return getOperations().selectBySimpleIds(getPersistentClass(), ids);
    }

    @Override
    public long count() {
        return getOperations().count(getPersistentClass());
    }

    @Override
    public void delete(ID id) {
        getOperations().deleteById(getPersistentClass(), id);
    }

    @Override
    public void delete(T entity) {
        getOperations().delete(entity);
    }

    @Override
    public void delete(Iterable<? extends T> entities) {
        getOperations().delete(CollectionUtils.toList(entities));
    }

    @Override
    public void deleteAll() {
        getOperations().deleteAll(getPersistentClass());
    }
}
In the constructor we use generics to retrieve information about the persistent entity class, i.e., to find the class of the T type argument. If we look at the JavaDocs, the following can be expected (a minimal sketch follows the list):
  • If the superclass is a parameterized type, the Type object returned must accurately reflect the actual type parameters used in the source code. The parameterized type representing the superclass is created if it had not been created before.
  • A parameterized type is created the first time it is needed by a reflective method, as specified in this package. When a parameterized type p is created, the generic type declaration that p instantiates is resolved, and all type arguments of p are created recursively.
  • A type variable is created the first time it is needed by a reflective method, as specified in this package. If a type variable t is referenced by a type (i.e., class, interface or annotation type) T, and T is declared by the n-th enclosing class of T, then the creation of t requires the resolution of the i-th enclosing class of T, for i = 0 to n, inclusive.
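A minimal sketch of what the constructor relies on (the Repo and PersonRepo classes are hypothetical): the actual type argument is only recoverable when a concrete subclass fixes it in source code.

    import model.entities.Person;

    import java.lang.reflect.ParameterizedType;
    import java.lang.reflect.Type;

    public class TypeArgDemo {

        static abstract class Repo<T> {
        }

        // the type argument Person is fixed at compile time in this declaration
        static class PersonRepo extends Repo<Person> {
        }

        public static void main(String[] args) {
            // the generic superclass of PersonRepo is the ParameterizedType Repo<Person>
            Type type = PersonRepo.class.getGenericSuperclass();
            Class<?> entity = (Class<?>) ((ParameterizedType) type).getActualTypeArguments()[0];
            System.out.println(entity); // prints: class model.entities.Person
            // a raw subclass (class RawRepo extends Repo) would yield a plain Class instead,
            // which is why GenericCassandraRepository guards with instanceof ParameterizedType
        }
    }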
The CrudRepository interface can be extended for particular entities, for example,
package model.logic;

import model.entities.Person;
import org.springframework.data.repository.CrudRepository;

public interface PersonRepository extends CrudRepository<Person, Integer> {
    void sortAllPersons();
}
with the corresponding implementation:
package model.logic;

import model.entities.Person;
import org.springframework.cassandra.core.util.CollectionUtils;
import org.springframework.data.cassandra.core.CassandraOperations;

import java.util.List;
import java.util.concurrent.TimeUnit;

public class PersonRepositoryImpl extends GenericCassandraRepository<Person, Integer> implements PersonRepository {

    public PersonRepositoryImpl(CassandraOperations operations) {
        super(operations);
    }

    @Override
    public void sortAllPersons() {
        List<Person> persons = CollectionUtils.toList(super.findAll());

        long sequential_start_time = System.nanoTime();
        persons.stream().sorted().count();
        long sequential_end_time = System.nanoTime();
        long sequential_total_time = TimeUnit.NANOSECONDS.toMillis(sequential_end_time - sequential_start_time);
        System.out.println("sequential sort took: " + sequential_total_time + " ms.");

        long parallel_start_time = System.nanoTime();
        persons.parallelStream().sorted().count();
        long parallel_end_time = System.nanoTime();
        long parallel_total_time = TimeUnit.NANOSECONDS.toMillis(parallel_end_time - parallel_start_time);
        System.out.println("parallel sort took: " + parallel_total_time + " ms.");
    }
}

Using Spring configuration

We can also configure Cassandra (create a Cluster and Session instance, create a CassandraOperations instance, and inject that into our PersonRepository) by using Spring configuration, for example,
# cassandra.properties
cassandra.contactpoints=machine1.com,machine2.com
cassandra.port=9042
cassandra.keyspace=test

<!-- spring configuration -->
<beans:beans xmlns:beans="http://www.springframework.org/schema/beans"
             xmlns:cassandra="http://www.springframework.org/schema/data/cassandra"
             xmlns:context="http://www.springframework.org/schema/context"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
                                 http://www.springframework.org/schema/data/cassandra http://www.springframework.org/schema/data/cassandra/spring-cassandra.xsd
                                 http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context.xsd">

    <context:property-placeholder location="classpath:cassandra.properties"/>

    <cassandra:cluster id="cluster" contact-points="${cassandra.contactpoints}" port="${cassandra.port}"/>

    <cassandra:session id="session" cluster-ref="cluster" keyspace-name="${cassandra.keyspace}"/>

    <cassandra:mapping entity-base-packages="model.entities"/>
    <cassandra:converter/>

    <cassandra:template id="operations" session-ref="session"/>

    <beans:bean id="personrepository" class="model.logic.PersonRepositoryImpl">
        <beans:constructor-arg ref="operations"/>
    </beans:bean>
</beans:beans>
// SpringUtil class used to obtain instances from the Spring ApplicationContext
package model.util;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;
import model.logic.PersonRepository;
import org.springframework.context.ApplicationContext;
import org.springframework.context.support.ClassPathXmlApplicationContext;

import java.util.concurrent.TimeUnit;

public class SpringUtil {

    private static ApplicationContext applicationContext;

    private SpringUtil() {
    }

    static {
        System.out.println("initializing");
        long start_time = System.nanoTime();

        applicationContext = new ClassPathXmlApplicationContext("spring-config.xml");

        long end_time = System.nanoTime();
        System.out.println("done initializing");
        long total_time = TimeUnit.NANOSECONDS.toMillis(end_time - start_time);
        System.out.println("initializing took: " + total_time + " ms.");
    }

    public static PersonRepository getPersonRepository() {
        return applicationContext.getBean("personrepository", PersonRepository.class);
    }

    public static Cluster getCluster() {
        return applicationContext.getBean("cluster", Cluster.class);
    }

    public static Session getSession() {
        return applicationContext.getBean("session", Session.class);
    }
}
This can be used in the following manner
package test;

import model.entities.Person;
import model.logic.PersonRepository;
import model.util.SpringUtil;

import java.util.Random;

public class Test {

    private Random generator = new Random();

    public static void main(String[] args) {
        Test test = new Test();
        try {
            test.doSomeTest(SpringUtil.getPersonRepository());
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            SpringUtil.getSession().close();
            SpringUtil.getCluster().close();
        }
    }

    private void doSomeTest(PersonRepository repository) {
        for (int i = 0; i < 10; i++) {
            // generate random data
            Person person = createPerson();
            // insert or update a person
            repository.save(person);
            if (generator.nextDouble() < 0.001) {
                // remove a person
                repository.delete(generateSofinummer());
                // count all persons, then sort them
                System.out.println("number of persons: " + repository.count());
                repository.sortAllPersons();
            } else {
                // find a person by ID
                repository.findOne(generateSofinummer());
            }
        }
    }

    private Person createPerson() {
        Person person = new Person();
        person.setSofinummer(generateSofinummer());
        person.setNaam(Long.toString(Math.abs(generator.nextLong()), 36));
        return person;
    }

    private Integer generateSofinummer() {
        return generator.nextInt(100000);
    }
}
When the example is run, we get the following output
initializing
...
done initializing
initializing took: 6511 ms.
Initialization using the Spring configuration is much slower than the direct approach used in the CassandraUtil class, so in the test below we stick to the CassandraUtil approach.

Test

To test it all we can use something like
package test;

import model.entities.Person;
import model.logic.PersonRepository;
import model.logic.PersonRepositoryImpl;
import model.utils.CassandraUtil;
import org.springframework.data.cassandra.core.CassandraOperations;
import org.springframework.data.cassandra.core.CassandraTemplate;

import java.util.Random;

public class LoadTest {

    private Random generator = new Random();

    public static void main(String[] args) {
        LoadTest test = new LoadTest();
        try {
            CassandraOperations operations = new CassandraTemplate(CassandraUtil.getSession());
            PersonRepository repository = new PersonRepositoryImpl(operations);
            test.doRandomReadWriteTest(repository);
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            CassandraUtil.getSession().close();
            CassandraUtil.getCluster().close();
        }
    }

    private void doRandomReadWriteTest(PersonRepository repository) {
        while (true) {
            // generate random data
            Person person = createPerson();
            // insert or update a person
            repository.save(person);
            if (generator.nextDouble() < 0.001) {
                // remove a person
                repository.delete(generateSofinummer());
                // count all persons, then sort them
                System.out.println("number of persons: " + repository.count());
                repository.sortAllPersons();
            } else {
                // find a person by ID
                repository.findOne(generateSofinummer());
            }
        }
    }

    private Person createPerson() {
        Person person = new Person();
        person.setSofinummer(generateSofinummer());
        person.setNaam(Long.toString(Math.abs(generator.nextLong()), 36));
        return person;
    }

    private Integer generateSofinummer() {
        return generator.nextInt(100000);
    }
}
We also start a flight recording on the Cassandra nodes, in order to see how the JVM is doing. To create flight recordings we use the script presented in the post Java Virtual Machine Code Generation and Optimization (this is also the reason why we set the extra parameter -Dname=<%= @SERVER_NAME %> in the cassandra-env.sh file, such that we can easily find the process id of the Cassandra process by using the provided name), i.e.,
[weblogic@machine1 monitor]$ ./FlightRecording.sh
Provide an <ACTION> and a <SERVER_NAME>
Usage FlightRecording.sh <ACTION> <SERVER_NAME>, <ACTION> must be one of {start|stop|dump|check|clean}

[weblogic@machine1 monitor]$ ./FlightRecording.sh start cassandra
Starting Flight Recording for server: cassandra with PID 2783
2783:
Started recording 1. The result will be written to: /home/weblogic/cassandra-29-01-2015_16:39:44.jfr

[weblogic@machine1 monitor]$ ./FlightRecording.sh check cassandra
2783:
Recording: recording=1 name="cassandra" duration=20m filename="/home/weblogic/cassandra-29-01-2015_16:39:44.jfr" compress=false (running)
When the test is run, the following output is observed (nice to see what parallel sorting can do)
number of persons: 1377
sequential sort took: 11 ms.
parallel sort took: 12 ms.
number of persons: 1694
sequential sort took: 3 ms.
parallel sort took: 4 ms.
number of persons: 4916
sequential sort took: 8 ms.
parallel sort took: 9 ms.
...
number of persons: 10128
sequential sort took: 6 ms.
parallel sort took: 6 ms.
...
number of persons: 20018
sequential sort took: 13 ms.
parallel sort took: 6 ms.
...
number of persons: 30236
sequential sort took: 17 ms.
parallel sort took: 6 ms.
...
number of persons: 40829
sequential sort took: 20 ms.
parallel sort took: 7 ms.
...
number of persons: 51534
sequential sort took: 26 ms.
parallel sort took: 9 ms.
...
number of persons: 60884
sequential sort took: 36 ms.
parallel sort took: 13 ms.
...
number of persons: 70080
sequential sort took: 51 ms.
parallel sort took: 17 ms.
...
number of persons: 80022
sequential sort took: 60 ms.
parallel sort took: 21 ms.
...
number of persons: 90167
sequential sort took: 56 ms.
parallel sort took: 20 ms.
...
number of persons: 95031
sequential sort took: 66 ms.
parallel sort took: 22 ms.
...
number of persons: 99390
sequential sort took: 59 ms.
parallel sort took: 19 ms.
Monitoring results. To set up an MBean Browser we have to provide a JMX URL, i.e., service:jmx:rmi:///jndi/rmi://<HOSTNAME>:<JMX_PORT>/jmxrmi (here, for example, service:jmx:rmi:///jndi/rmi://machine1.com:7199/jmxrmi). The following shows snapshots from the JMX Console in Java Mission Control.
[Screenshots: JMX Console MBean Browser, Memory, and Threads tabs]
On Linux we can use lsof to get insight into the open files, and sar to obtain statistics related to the disk.
# list open files
[weblogic@machine1 cassandra2.1.2]$ lsof -i -a -p 2783
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
java 2783 weblogic 53u IPv4 14637 0t0 TCP *:7199 (LISTEN)
java 2783 weblogic 54u IPv4 14640 0t0 TCP *:44168 (LISTEN)
java 2783 weblogic 59u IPv4 18410 0t0 TCP machine1.com:35531->machine2.com:empowerid (ESTABLISHED)
java 2783 weblogic 60u IPv4 18411 0t0 TCP machine1.com:empowerid->machine1.com:39417 (ESTABLISHED)
java 2783 weblogic 61r IPv6 19642 0t0 TCP machine1.com:9042->192.168.231.1:53436 (ESTABLISHED)
java 2783 weblogic 75r IPv4 19619 0t0 TCP machine1.com:39418->machine1.com:empowerid (ESTABLISHED)
java 2783 weblogic 88u IPv4 14894 0t0 TCP machine1.com:empowerid (LISTEN)
java 2783 weblogic 90u IPv6 15012 0t0 TCP machine1.com:9042 (LISTEN)
java 2783 weblogic 91u IPv4 15014 0t0 TCP machine1.com:apani1 (LISTEN)
java 2783 weblogic 92u IPv4 18409 0t0 TCP machine1.com:empowerid->machine2.com:57281 (ESTABLISHED)
java 2783 weblogic 95u IPv4 19603 0t0 TCP machine1.com:35536->machine2.com:empowerid (ESTABLISHED)
java 2783 weblogic 96u IPv4 19606 0t0 TCP machine1.com:39417->machine1.com:empowerid (ESTABLISHED)
java 2783 weblogic 97u IPv4 19607 0t0 TCP machine1.com:empowerid->machine1.com:39418 (ESTABLISHED)
java 2783 weblogic 99u IPv4 19620 0t0 TCP machine1.com:empowerid->machine2.com:57291 (ESTABLISHED)

# i/o statistics
[weblogic@machine1 cassandra2.1.2]$ sar -b
Linux 2.6.32-504.3.3.el6.x86_64 (machine1.com) 01/29/2015 _x86_64_ (4 CPU)
04:30:01 PM tps rtps wtps bread/s bwrtn/s
04:40:01 PM 4.43 1.82 2.61 70.90 29.22
04:50:01 PM 18.95 0.04 18.91 1.82 279.62
05:00:01 PM 31.86 0.45 31.41 132.36 478.27
Average: 18.15 0.79 17.36 68.57 257.92

# individual block device i/o
[weblogic@machine1 cassandra2.1.2]$ sar -p -d
Linux 2.6.32-504.3.3.el6.x86_64 (machine1.com) 01/29/2015 _x86_64_ (4 CPU)
04:30:01 PM DEV tps rd_sec/s wr_sec/s avgrq-sz avgqu-sz await svctm %util
04:40:01 PM sda 1.57 35.45 14.67 31.86 0.03 16.66 8.14 1.28
04:40:01 PM vg_machine1-lv_root 2.86 35.45 14.55 17.49 0.07 24.30 4.48 1.28
04:40:01 PM vg_machine1-lv_swap 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
04:50:01 PM sda 1.45 0.91 139.81 96.75 0.03 23.39 13.28 1.93
04:50:01 PM vg_machine1-lv_root 17.49 0.91 139.81 8.04 0.84 48.16 1.10 1.93
04:50:01 PM vg_machine1-lv_swap 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
05:00:01 PM sda 1.74 66.18 239.14 175.20 0.11 62.79 15.40 2.68
05:00:01 PM vg_machine1-lv_root 30.12 66.18 239.14 10.14 8.58 284.98 0.89 2.68
05:00:01 PM vg_machine1-lv_swap 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Average: sda 1.59 34.29 128.98 102.68 0.06 35.24 12.28 1.95
Average: vg_machine1-lv_root 16.56 34.29 128.94 9.86 3.11 187.95 1.18 1.95
Average: vg_machine1-lv_swap 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

# swap space
[weblogic@machine1 cassandra2.1.2]$ sar -S
Linux 2.6.32-504.3.3.el6.x86_64 (machine1.com) 01/29/2015 _x86_64_ (4 CPU)
04:30:01 PM kbswpfree kbswpused %swpused kbswpcad %swpcad
04:40:01 PM 4194300 0 0.00 0 0.00
04:50:01 PM 4194300 0 0.00 0 0.00
05:00:01 PM 4194300 0 0.00 0 0.00
Average: 4194300 0 0.00 0 0.00

# CPU and memory usage by the JVM
[weblogic@machine1 cassandra2.1.2]$ ps -p 2783 -o %cpu,%mem,cmd
%CPU %MEM CMD
39.8 26.3 /u01/app/oracle/weblogic12.1.3/jdk1.8.0_25/bin/java -Xms1024m -Xmx1024m -XX:StringTableSize=1000003 -XX:+UseG1GC

# run queue and load average
[weblogic@machine1 cassandra2.1.2]$ sar -q
Linux 2.6.32-504.3.3.el6.x86_64 (machine1.com) 01/29/2015 _x86_64_ (4 CPU)
04:30:01 PM runq-sz plist-sz ldavg-1 ldavg-5 ldavg-15
04:40:01 PM 0 379 0.09 0.10 0.10
04:50:01 PM 0 382 0.00 0.04 0.06
05:00:01 PM 0 383 0.29 0.11 0.06
05:10:01 PM 0 379 0.00 0.04 0.05
Average: 0 381 0.10 0.07 0.07

# after we shut down machine1.com, we check the status of the cluster
[weblogic@machine2 bin]$ ./nodetool --host machine2.com --port 7199 status test
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load       Tokens  Owns (effective)  Host ID                               Rack
UN  192.168.231.110  136.14 KB  256     100.0%            6f563cc6-3e2f-4089-b28c-bdb58959c214  rack1
DN  192.168.231.100  141.25 KB  256     100.0%            0eddc3ea-6692-4c62-9135-67ef37d655b5  rack1

[weblogic@machine2 bin]$ ./cqlsh machine2.com 9042
Connected to TestCluster at machine2.com:9042.
[cqlsh 5.0.1 | Cassandra 2.1.2 | CQL spec 3.2.0 | Native protocol v3]
Use HELP for help.
cqlsh> SELECT count(*) FROM test.persons;

 count
-------
 99399

(1 rows)
From the Flight Recording, we can obtain detailed information on how the Java Virtual Machine is doing. The General, Overview tab provides a first peek (the JVM Information tab provides information about the Java Virtual Machine settings).
[Screenshots: Flight Recorder General, Overview and JVM Information tabs]
The Memory, Garbage Collections tab provides information regarding the individual garbage collections (such as pause times)
[Screenshot: Flight Recorder Memory, Garbage Collections tab]
The Memory, Allocations tab shows information about object allocations
[Screenshots: Flight Recorder Memory, Allocations tabs]
The Code, Overview tab shows where the hot spots are.
[Screenshot: Flight Recorder Code, Overview tab]
The Code, Compilations tab gives insight into compilation times.
[Screenshot: Flight Recorder Code, Compilations tab]
The Events, Graph tab gives insight into how the threads are doing. But first we obtain the top 5 threads from the Threads, Hot Threads tab.
[Screenshots: Flight Recorder Threads, Hot Threads tab and Events, Graph tabs]
Great stuff! As a last remark: SANs were designed to solve problems that Cassandra does not have; Cassandra was designed from the start for commodity hardware.

References

[1] Cassandra Documentation.
[2] Spring Data Cassandra - Reference Documentation.