Learning the Cassandra Database


http://wayneshawn.github.io/2015/04/07/Cassandra-get-started/


Online resources

Cassandra Getting Started

  • 2010-07-15 Distributed Key-Value Storage Systems: Getting Started with Cassandra (分布式 Key-Value 存储系统:Cassandra 入门)
  • 2015-03-25 Apache Cassandra Wiki
    - DataStax Documentation
    - Cassandra 2.x Chinese tutorial series (blog)

Python Cassandra-driver

  • cassandra-driver 2.5.0

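Single-node Cassandra walkthrough

1. Start Cassandra

If the environment variables are not set, change into Cassandra's bin directory first:
[root@server1 bin]# ./cassandra -f
With -f, Cassandra stays in the foreground; without it, Cassandra runs as a daemon process.

2. Connect to the local Cassandra with cqlsh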

[root@server1 bin]# ./cqlsh
Connected to Test Cluster at localhost:9160.
[cqlsh 4.1.1 | Cassandra 2.0.13 | CQL spec 3.1.1 | Thrift protocol 19.39.0]
Use HELP for help.
cqlsh> CREATE KEYSPACE mykeyspace WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 1};
cqlsh> use mykeyspace ;
cqlsh:mykeyspace> create table users( name text primary key, age int, email text );
cqlsh:mykeyspace> insert into users(name, age, email) values('wayne', 21, 'leon_sin@126.com');
cqlsh:mykeyspace> insert into users(name, age, email) values('kerr', 22, 'singleon@126.com');
cqlsh:mykeyspace> select * from users;

 name   | age | email
--------+-----+-------------------
   kerr |  22 |  singleon@126.com
 lambda |  20 | 227089@qq.com
  wayne |  21 |  leon@126.com

CQL stands for Cassandra Query Language.

3. Example using cassandra-driver: cassandraDriverTest.py

from cassandra.cluster import Cluster

# Connect to the local node and select the keyspace created above.
cluster = Cluster()
session = cluster.connect('mykeyspace')

# 1. You should use %s for all types of arguments.
# 2. The second argument should be a sequence; a one-element tuple is written ('blah',).
session.execute('INSERT INTO users(name, age, email) VALUES(%s, %s, %s)', ('shawn', 21, 'shawn@163.com'))

rows = session.execute('SELECT name, age, email FROM users')
for (name, age, email) in rows:
    # Formatted this way, the line prints the same under Python 2 and Python 3.
    print("%s %s %s" % (name, age, email))
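The second comment in that program is easy to trip over: parentheses alone do not make a tuple, the trailing comma does. Here is a minimal sketch in plain Python (no Cassandra needed) that shows the difference. The helper as_params is a made-up name for illustration and is not part of cassandra-driver.

```python
def as_params(value):
    """Wrap a single value in a tuple unless it is already a tuple or list.

    Hypothetical helper for illustration; not part of cassandra-driver.
    """
    if isinstance(value, (tuple, list)):
        return tuple(value)
    return (value,)


# Parentheses alone do not create a tuple; the trailing comma does.
not_a_tuple = ('blah')
one_tuple = ('blah',)

print(type(not_a_tuple).__name__)  # str
print(type(one_tuple).__name__)    # tuple
print(as_params('blah'))           # ('blah',)
```

Passing ('blah') as the second argument of session.execute would hand the driver a plain string rather than a one-element sequence, which is what the comment warns about.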

4. Stop the Cassandra process

Use ps -ef | grep cassandra to find its process ID, then kill it.
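The same lookup can be scripted. The sketch below only parses text in the usual ps -ef column layout (UID PID PPID C STIME TTY TIME CMD) and skips the grep process itself. The sample output and the name find_pids are assumptions for illustration, not taken from a real machine.

```python
def find_pids(ps_output, keyword="cassandra"):
    """Return PIDs from `ps -ef` style text whose command contains `keyword`."""
    pids = []
    for line in ps_output.splitlines():
        # Split into at most 8 fields so the command keeps its spaces.
        fields = line.split(None, 7)
        if len(fields) < 8 or not fields[1].isdigit():
            continue  # header line or something we cannot parse
        command = fields[7]
        if keyword in command and not command.startswith("grep"):
            pids.append(int(fields[1]))
    return pids


# Hypothetical sample output, for illustration only.
SAMPLE = """\
UID        PID  PPID  C STIME TTY          TIME CMD
root      2301     1  5 10:02 pts/0    00:00:41 java -ea -cp /opt/cassandra/lib org.apache.cassandra.service.CassandraDaemon
root      2417  2290  0 10:15 pts/0    00:00:00 grep cassandra
"""

print(find_pids(SAMPLE))  # [2301]
```

On a real machine the text could come from subprocess.check_output(['ps', '-ef']), and each PID could then be passed to os.kill with signal.SIGTERM.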

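A simple two-node Cassandra cluster

References
- Initializing a multiple node cluster (single data center)
- A simple Cassandra cluster configuration (简单配置cassandra集群)

0. Test environment

VMware 9.0.2, CentOS 6.5 (64-bit), Cassandra 2.0.13

1. Assume Cassandra is already installed on the following machines

node0 192.168.56.100 (seed)
node1 192.168.56.201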

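2. Adjust the firewall settings, or simply turn the firewall off

On CentOS, run setup to open the configuration interface, where the firewall can be turned off.

3. Stop the Cassandra process and clear its data

$ ps -ef | grep cassandra
$ kill pid
$ rm -rf /var/lib/cassandra/data/system/*

4. Edit /conf/cassandra.yaml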

node0:

seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "192.168.56.100"
listen_address: 192.168.56.100
rpc_address: 0.0.0.0
endpoint_snitch: GossipingPropertyFileSnitch

node1:

seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "192.168.56.100"
listen_address: 192.168.56.201
rpc_address: 0.0.0.0
endpoint_snitch: GossipingPropertyFileSnitch
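The two fragments differ only in listen_address. Assuming that really is the only per-node difference, both can be rendered from one template, which makes it harder to end up with mismatched seeds. This is just a sketch: TEMPLATE and render_node_settings are names made up for illustration, and the output is meant to be pasted into cassandra.yaml by hand.

```python
# Settings shared by every node; only the placeholders vary.
TEMPLATE = """\
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "{seeds}"
listen_address: {listen_address}
rpc_address: 0.0.0.0
endpoint_snitch: GossipingPropertyFileSnitch
"""


def render_node_settings(listen_address, seeds):
    """Return the cassandra.yaml fragment for one node.

    `seeds` is a list of IP strings; cassandra.yaml expects them
    comma-separated inside a single quoted string.
    """
    return TEMPLATE.format(seeds=",".join(seeds), listen_address=listen_address)


seeds = ["192.168.56.100"]
for address in ("192.168.56.100", "192.168.56.201"):
    print("# node with listen_address " + address)
    print(render_node_settings(address, seeds))
```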

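5. Edit /conf/cassandra-rackdc.properties

For example:

# indicate the rack and dc for this node
dc=DC1
rack=RAC1

6. Start Cassandra

In my setup, node0's hostname is master and node1's is slave1. The names come from a Hadoop cluster tutorial that I originally followed when setting up the machines. For a Cassandra cluster built on VMware, what matters is having two virtual machines that can ping each other.
Start Cassandra on node0 first:
[root@master bin]# ./cassandra

Then start Cassandra on node1:
[root@slave1 bin]# ./cassandra

7. Check that the ring is up

Every node listed should be in state UN (Up/Normal).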

[root@master bin]# ./nodetool status
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens  Owns (effective)  Host ID                               Rack
UN  192.168.56.201  74.89 KB   256     100.0%            e6121751-682e-4833-8de7-718eac08e718  RAC1
UN  192.168.56.100  105.21 KB  256     100.0%            a153a679-5add-4995-adbf-
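If you would rather not check the UN column by eye, the status text can be parsed. The sketch below assumes the column layout shown in the output above (a two-letter status/state code followed by the address). The names node_states and all_up_normal are made up for illustration.

```python
def node_states(status_output):
    """Map each node address to its two-letter status/state code (e.g. 'UN')."""
    states = {}
    for line in status_output.splitlines():
        fields = line.split()
        # Node rows start with a code such as UN, DN, UJ, UL or UM.
        if (len(fields) >= 2 and len(fields[0]) == 2
                and fields[0][0] in "UD" and fields[0][1] in "NLJM"):
            states[fields[1]] = fields[0]
    return states


def all_up_normal(status_output):
    """True only if at least one node is listed and every node is UN."""
    states = node_states(status_output)
    return bool(states) and all(code == "UN" for code in states.values())


# Rows copied from the nodetool output in this post (second Host ID is cut off there too).
SAMPLE = """\
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens  Owns (effective)  Host ID                               Rack
UN  192.168.56.201  74.89 KB   256     100.0%            e6121751-682e-4833-8de7-718eac08e718  RAC1
UN  192.168.56.100  105.21 KB  256     100.0%            a153a679-5add-4995-adbf-
"""

print(node_states(SAMPLE))
print(all_up_normal(SAMPLE))  # True
```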

8. Test

During the earlier single-node test I had already inserted 4 records into the users table in mykeyspace.
Now we insert a fifth record from node0.

[root@master bin]# ./cqlsh
Connected to Test Cluster at localhost:9160.
[cqlsh 4.1.1 | Cassandra 2.0.13 | CQL spec 3.1.1 | Thrift protocol 19.39.0]
Use HELP for help.
cqlsh> use mykeyspace ;
cqlsh:mykeyspace> select * from users;

 name   | age | email
--------+-----+-------------------
   kerr |  22 |  singleon@126.com
 lambda |  20 | 2270891001@qq.com
  wayne |  21 |  leon_sin@126.com
  shawn |  21 |     shawn@163.com

(4 rows)

cqlsh:mykeyspace> insert into users(name, age, email) values('slave', 40, 'zwxx@126.com');
cqlsh:mykeyspace> select * from users;

 name   | age | email
--------+-----+-------------------
  slave |  40 |      zwxx@126.com
   kerr |  22 |  singleon@126.com
 lambda |  20 | 2270891001@qq.com
  wayne |  21 |  leon_sin@126.com
  shawn |  21 |     shawn@163.com

(5 rows)

cqlsh:mykeyspace>

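Next we query from node1. node1 was created from master with VMware's clone feature and then modified, so it also started out with 4 records in the users table. Now we check whether one more record has appeared.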

[root@slave1 bin]# ./cassandra-cli -h 192.168.56.201
Connected to: "Test Cluster" on 192.168.56.201/9160
Welcome to Cassandra CLI version 2.0.13

The CLI is deprecated and will be removed in Cassandra 3.0.  Consider migrating to cqlsh.
CQL is fully backwards compatible with Thrift data; see http://www.datastax.com/dev/blog/thrift-to-cql3

Type 'help;' or '?' for help.
Type 'quit;' or 'exit;' to quit.

[default@mykeyspace]
[default@mykeyspace] list users;
Using default limit of 100
Using default cell limit of 100
-------------------
RowKey: slave
=> (name=, value=, timestamp=1428733896613000)
=> (name=age, value=00000028, timestamp=1428733896613000)
=> (name=email, value=7a777878403132362e636f6d, timestamp=1428733896613000)
-------------------
RowKey: kerr
=> (name=, value=, timestamp=1428733672723000)
=> (name=age, value=00000016, timestamp=1428733672723000)
=> (name=email, value=73696e676c656f6e403132362e636f6d, timestamp=1428733672723000)
-------------------
RowKey: lambda
=> (name=, value=, timestamp=1428414359621000)
=> (name=age, value=00000014, timestamp=1428414359621000)
=> (name=email, value=323237303839313030314071712e636f6d, timestamp=1428414359621000)
-------------------
RowKey: wayne
=> (name=, value=, timestamp=1428733660801000)
=> (name=age, value=00000015, timestamp=1428733660801000)
=> (name=email, value=6c656f6e5f73696e403132362e636f6d, timestamp=1428733660801000)
-------------------
RowKey: shawn
=> (name=, value=, timestamp=1428417278072000)
=> (name=age, value=00000015, timestamp=1428417278072000)
=> (name=email, value=736861776e403136332e636f6d, timestamp=1428417278072000)

5 Rows Returned.
Elapsed time: 572 msec(s).
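cassandra-cli prints column values as raw hex, so it is not obvious that the 'slave' row really holds age 40 and zwxx@126.com. A small sketch for decoding them, assuming the int column is a 4-byte big-endian signed integer, the text column is UTF-8, and the timestamp is microseconds since the Unix epoch. The function names are made up for illustration.

```python
from datetime import datetime, timezone


def decode_int(hex_value):
    """Decode a CQL int shown as hex (assumed 4-byte big-endian, signed)."""
    return int.from_bytes(bytes.fromhex(hex_value), byteorder="big", signed=True)


def decode_text(hex_value):
    """Decode a CQL text value shown as hex (assumed UTF-8)."""
    return bytes.fromhex(hex_value).decode("utf-8")


def decode_timestamp(micros):
    """Convert a write timestamp (assumed microseconds since epoch) to UTC."""
    return datetime.fromtimestamp(micros // 1000000, tz=timezone.utc)


# Values copied from the 'slave' row in the listing above.
print(decode_int("00000028"))                       # 40
print(decode_text("7a777878403132362e636f6d"))      # zwxx@126.com
print(decode_timestamp(1428733896613000).date())    # 2015-04-11
```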

Running cassandraDriverTest.py also shows the newly added record 'slave':

[Kerr@slave1 ~]$ python cassandraDriverTest.py
slave 40 zwxx@126.com
kerr 22 singleon@126.com
lambda 20 2270891001@qq.com
wayne 21 leon_sin@126.com
shawn 21 shawn@163.com

Address settings in a multi-node Cassandra cluster

Scenario: a 3-node Cassandra cluster with IPs 172.16.37.17, 172.16.37.18 and 172.16.37.19 (the seed is 172.16.37.18). Cassandra is started only on .18 and .19. Can a cassandra-driver client on node .17 connect to the database and run queries? (The nodes can all ping each other.)

Configuration 1

IP 172.16.37.18
seeds: "172.16.37.18"
listen_address: c37b18
rpc_address: localhost
endpoint_snitch: GossipingPropertyFileSnitch

IP 172.16.37.17
seeds: "172.16.37.18"
listen_address: c37b17
rpc_address: localhost
endpoint_snitch: GossipingPropertyFileSnitch

IP 172.16.37.19
seeds: "172.16.37.18"
listen_address: c37b19
rpc_address: localhost
endpoint_snitch: GossipingPropertyFileSnitch

cassandra-driver test program on node .17

from cassandra.cluster import Cluster

# Contact points are the two nodes that are actually running Cassandra.
cluster = Cluster(['c37b18', 'c37b19'])
session = cluster.connect('lsflog')
res = session.execute('SELECT * FROM jcleanlog')
print(res)

Result:
    session = cluster.connect('lsflog')
  File "/usr/lib/python2.6/site-packages/cassandra_driver-2.5.0-py2.6.egg/cassandra/cluster.py", line 756, in connect
    self.control_connection.connect()
  File "/usr/lib/python2.6/site-packages/cassandra_driver-2.5.0-py2.6.egg/cassandra/cluster.py", line 1867, in connect
    self._set_new_connection(self._reconnect_internal())
  File "/usr/lib/python2.6/site-packages/cassandra_driver-2.5.0-py2.6.egg/cassandra/cluster.py", line 1902, in _reconnect_internal
    raise NoHostAvailable("Unable to connect to any servers", errors)
cassandra.cluster.NoHostAvailable: ('Unable to connect to any servers', {'c37b18': error(111, "Tried connecting to [('172.16.37.18', 9042)]. Last error: Connection refused"), 'c37b19': error(111, "Tried connecting to [('172.16.37.19', 9042)]. Last error: Connection refused")})

With rpc_address: localhost, each node binds its client ports (9160 and 9042) to the loopback interface only, so connections arriving from another host are refused.
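"Connection refused" on port 9042 can be checked without the driver at all. Below is a minimal sketch of a plain TCP reachability test. The name can_connect is made up, and the demo only talks to a throwaway listener on the loopback interface so that it runs anywhere.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name resolution failures.
        return False


# Demo against a temporary local listener (not a Cassandra node).
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
listener.listen(1)
demo_port = listener.getsockname()[1]

print(can_connect("127.0.0.1", demo_port))  # listener is up
listener.close()
print(can_connect("127.0.0.1", demo_port))  # listener is gone
```

From node .17, calling can_connect('172.16.37.18', 9042) would be the equivalent check for the scenario here.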

Background (added 2015-05-13)

broadcast_rpc_address

  • The broadcast_rpc_address should be an IP address that drivers/clients can connect to.
  • RPC address to broadcast to drivers and other Cassandra nodes. This cannot be set to 0.0.0.0. If left blank, this will be set to the value of rpc_address. If rpc_address is set to 0.0.0.0, broadcast_rpc_address must be set. (from /conf/cassandra.yaml)
  • If broadcast_rpc_address is not set, it defaults to the configured rpc_address.

rpc_address

  • unset:
    Resolves the address using the hostname configuration of the node. If left unset, the hostname must resolve to the IP address of this node using /etc/hostname, /etc/hosts, or DNS.
  • 0.0.0.0:
    Listens on all configured interfaces, but you must set the broadcast_rpc_address to a value other than 0.0.0.0.
  • IP address
  • hostname
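The rules quoted above for rpc_address and broadcast_rpc_address fit in a few lines. This sketch only restates those rules in code; it is not how Cassandra itself validates the settings, and effective_broadcast_rpc_address is a name made up for illustration.

```python
WILDCARD = "0.0.0.0"


def effective_broadcast_rpc_address(rpc_address, broadcast_rpc_address=None):
    """Return the address that would be advertised to drivers.

    Encodes the quoted rules: the broadcast address may not be 0.0.0.0,
    it defaults to rpc_address when blank, and it is mandatory when
    rpc_address is 0.0.0.0.
    """
    if broadcast_rpc_address == WILDCARD:
        raise ValueError("broadcast_rpc_address cannot be 0.0.0.0")
    if not broadcast_rpc_address:
        if rpc_address == WILDCARD:
            raise ValueError("rpc_address is 0.0.0.0, so broadcast_rpc_address must be set")
        return rpc_address
    return broadcast_rpc_address


print(effective_broadcast_rpc_address("localhost"))                # localhost
print(effective_broadcast_rpc_address("0.0.0.0", "172.16.37.18"))  # 172.16.37.18
```

The first call corresponds to a node that advertises localhost to clients, which is of no use to a driver running on a different machine.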

Ports used by Cassandra

  • 7199 - JMX (was 8080 pre Cassandra 0.8.xx)
  • 7000 - Internode communication (not used if TLS enabled)
  • 7001 - TLS Internode communication (used if TLS enabled)
  • 9160 - Thrift client API
  • 9042 - CQL native transport port

Using nodetool

  • Running ./nodetool -h <node2-ip> from node1 fails with Connection refused.
    So far I can only run ./nodetool status on a node where Cassandra itself is running.
    For example, from node .17 with -h 172.16.37.18 I get: Failed to connect to '172.16.37.18:7199' - ConnectException: 'Connection refused'.
    Notably, when run on node .18 itself:
  • ./nodetool status works
  • ./nodetool -h 172.16.37.18 status fails with Connection refused
  • ./nodetool -h localhost status works
    This seems to be related to the JMX settings.
    stackoverflow problem1
    /conf/cassandra-env.sh contains the following:
    # jmx: metrics and administration interface
    #
    # add this if you're having trouble connecting:
    # JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname=<public name>"
    #
    # see
    # https://blogs.oracle.com/jmxetc/entry/troubleshooting_connection_problems_in_jconsole
    # for more on configuring JMX through firewalls, etc. (Short version:
    # get it working with no firewall first.)
    #
    # Cassandra ships with JMX accessible *only* from localhost.
    # To enable remote JMX connections, uncomment lines below
    # with authentication and/or ssl enabled. See https://wiki.apache.org/cassandra/JmxSecurity
    #
    LOCAL_JMX=yes

    if [ "$LOCAL_JMX" = "yes" ]; then
      JVM_OPTS="$JVM_OPTS -Dcassandra.jmx.local.port=$JMX_PORT -XX:+DisableExplicitGC"
    else
      JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.port=$JMX_PORT"
      JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.rmi.port=$JMX_PORT"
      JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.ssl=false"
      JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.authenticate=true"
      JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.password.file=/etc/cassandra/jmxremote.password"
    fi

Note the comment above: JMX is accessible *only* from localhost. I tried commenting out LOCAL_JMX=yes and also commenting out the lines that require authentication, but it still failed with: Error: Password file not found: /etc/cassandra/jmxremote.password
I still need to read more of the JMX documentation.

Configuration 2

IP 172.16.37.18
seeds: "172.16.37.18"
listen_address: c37b18
rpc_address: 0.0.0.0
broadcast_rpc_address: 172.16.37.18
endpoint_snitch: GossipingPropertyFileSnitch

IP 172.16.37.17
seeds: "172.16.37.18"
listen_address: c37b17
rpc_address: 0.0.0.0
broadcast_rpc_address: 172.16.37.17
endpoint_snitch: GossipingPropertyFileSnitch

IP 172.16.37.19
seeds: "172.16.37.18"
listen_address: c37b19
rpc_address: 0.0.0.0
broadcast_rpc_address: 172.16.37.19
endpoint_snitch: GossipingPropertyFileSnitch

Output of the cassandra-driver test program on node .17:
[Row(job_id=1, event_time=2, idx=0)]
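By default cassandra-driver returns each row as a namedtuple, which is why the output prints as Row(...). The sketch below uses collections.namedtuple as a stand-in for the driver's row type, so it runs without a cluster. The field names are taken from the output above.

```python
from collections import namedtuple

# Stand-in for the driver's row type; no Cassandra connection is involved.
Row = namedtuple("Row", ["job_id", "event_time", "idx"])
res = [Row(job_id=1, event_time=2, idx=0)]

for row in res:
    # Fields can be read by name...
    print(row.job_id, row.event_time, row.idx)
    # ...or unpacked by position, as cassandraDriverTest.py does.
    job_id, event_time, idx = row
```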
