MemSQL Translation: Day One

Source: Internet. Editor: 程序博客网. Time: 2024/06/04 20:10

Original article: http://docs.memsql.com/docs/memsql-faq


MemSQL Documentation

MemSQL is the database platform for real-time analytics. Querying is done through standard SQL drivers and syntax, leveraging a broad ecosystem of drivers and applications. Read the links below to get familiar with MemSQL:


How MemSQL Works

MemSQL is a distributed, relational database that handles mixed transactions and real-time analytics at scale. It is accessible through standard SQL drivers and syntax and supports a broad ecosystem of drivers and applications.

MemSQL has a two-tiered architecture that provides high throughput. It is a distributed system that can scale horizontally on commodity hardware, and it is compatible with other technologies in the modern data processing ecosystem (e.g. orchestration platforms, developer IDEs, and BI tools). It features an in-memory rowstore and an on-disk columnstore. It also features Streamliner, a tool that can efficiently stream data into the MemSQL rowstore and columnstore.

https://files.readme.io/AKsH4rgbTNatcOwfHcv3_memsql-architecture.png


More detail about MemSQL is given in the sections below.


Two-tiered Architecture

MemSQL has a two-tiered, clustered architecture. Each instance of MemSQL is called a "node" and runs identical software. The only difference is the role each node is configured to play.
Aggregators are the interface to database clients and applications. Aggregators run SQL queries across the cluster and aggregate results.
Leaves store and process data.

https://files.readme.io/Yf4td3yTjG0VtOQIqkgz_data%20loading%20and%20queries.png

High Throughput
MemSQL is designed to enable high throughput on concurrent workloads. A distributed query optimizer evenly divides the processing workload to maximize the efficiency of CPU usage. Queries are compiled to machine code and cached to expedite subsequent executions. Rather than cache the results of the query, MemSQL caches a compiled query plan to provide the most efficient execution path. The compiled query plan does not pre-specify values for the parameters, which allows MemSQL to substitute the values upon request, enabling subsequent queries of the same structure to run quickly, even with different parameter values. Moreover, due to MemSQL's use of MVCC (multi-version concurrency control) and lock-free data structures (see http://preshing.com/20120612/an-introduction-to-lock-free-programming/), data remains highly accessible, even amidst a high volume of concurrent reads and writes.

Highly Scalable
MemSQL is a highly scalable distributed system. The cluster can be scaled out at any time to provide increased storage capacity and processing power. Sharding is done automatically, and the cluster re-balances data and workload distribution. Data is highly available, and nodes can go down with negligible effect on performance.
In addition to being fast, consistent, and scalable, MemSQL is also durable. Transactions are committed to disk as logs and periodically compressed as snapshots of the entire database. If any node goes down, it can restart using one of these logs.

Highly Compatible
MemSQL is an ODBC-compatible database. It is wire protocol compatible with MySQL, so applications that use a MySQL driver can connect to and use MemSQL transparently. MemSQL supports a subset of the MySQL syntax, plus extensions to support advanced features not in MySQL such as Distributed SQL, Geospatial, and JSON.

In-Memory and On-Disk Storage
MemSQL supports storing and processing data using a completely in-memory rowstore or a disk-backed columnstore. The MemSQL in-memory rowstore is best for optimum performance in transactional workloads. The MemSQL columnstore is best for cost-effective storage of large amounts of historical data for real-time analytics. A combination of the MemSQL rowstore and columnstore engines allows merging of real-time and historical data in a single query.

Tight Spark Integration for Real-Time Data Streaming
MemSQL has tight Apache Spark integration, giving MemSQL users a simple way to create and manage real-time data pipelines. Users can install Apache Spark with one click in MemSQL Ops so they can create custom data extractors and transformers for streaming real-time data into MemSQL.


MemSQL FAQ


This page addresses some frequently asked questions about MemSQL.

General


Is MemSQL a storage engine for MySQL?

No. MemSQL is a standalone database that is compatible with the MySQL client. MemSQL includes its own storage engine and SQL-based execution engine built around lock-free data structures and machine code generation.


Is MemSQL a row-based or column-based store?

MemSQL provides both in-memory row-based and on-disk column-based stores.

The in-memory row store:
Works best for mixed transactional and analytical workloads
Provides low latency and highly concurrent reads and writes of individual rows, as well as sophisticated analytical SQL queries
Supports PRIMARY and UNIQUE keys
Supports geospatial indexes
Has longer recovery times (as the entire table needs to be loaded into memory)

The on-disk column store:
Works best for analytical workloads
Allows tables larger than the amount of available RAM in the cluster
Uses compression (which lowers disk usage and accelerates replication)
Provides fast and efficient scans of large datasets
Provides sorted columnstore indexes
Is optimized for batch UPDATE and DELETE queries
Requires more expensive query compilation (as compared to the row store)

How does MemSQL's in-memory lock-free storage engine work?
MemSQL's storage engine uses multi-version concurrency control with lock-free skip lists and lock-free hash tables, which allow highly concurrent reads and writes at very high throughput. Reads in MemSQL are never blocked, but updates to the same row can conflict on logical locks.

What is the advantage of MemSQL over traditional databases like Oracle, SQL Server or MySQL with data in a “ramdisk” or large buffer pools?
Two common techniques for leveraging large amounts of memory in traditional databases are storing data files on a “ramdisk” and running a disk-based storage engine with a large buffer pool. For example, MySQL performs much better with InnoDB configured to use a large buffer pool.
While running an existing storage engine like InnoDB in memory can alleviate some of the bottlenecks involved with disk, MemSQL has four distinguishing memory-optimized features that enable it to perform significantly better than disk-based storage engines running in memory:
MemSQL is a distributed scale-out system. MemSQL scales to thousands of machines on commodity hardware.
No buffer pool. Traditional databases manage a global buffer pool since they assume that the dataset can’t fit into memory. The buffer pool is a resource shared across all databases and all tables, which creates significant contention.
Lock-free data structures. MemSQL uses memory-optimized, lock-free skip lists and hash tables as its indexes. Unlike B+ Trees, these data structures are designed from the ground up to be fast in memory.
Code generation. Lock-free data structures are so fast that dynamic SQL interpretation quickly becomes the limiting factor for query execution. With code generation, MemSQL compiles SQL down to native code for maximum performance.


What is the advantage of MemSQL over other distributed databases?
Full SQL. MemSQL supports full SQL and transactional semantics.
Storage pyramid. MemSQL combines row and column store engines tuned for memory and flash storage.
Scales on commodity hardware. MemSQL doesn’t require exotic hardware and can run on premises or in the cloud.
Enterprise ready. MemSQL supports a large number of enterprise security and manageability features.


What is MemSQL not for?
MemSQL excels at real-time and high-throughput query use cases. It is a great general-purpose database for running both transactional and analytic workloads. However, there are use cases that MemSQL is not designed for. Some of these are listed below:
Object store. MemSQL is not designed to be a blob store or "data lake". It is designed for high-value data that is structured or semi-structured and ready to query. MemSQL has open-source connectors for integrating with a variety of great object stores, including Amazon S3 and Hadoop File System (HDFS). See How To Load Data Into MemSQL (http://docs.memsql.com/docs/how-to-load-data-into-memsql) for more information.
Running on low-end hardware. MemSQL is not designed to run on "micro instances", mobile phones or other low-powered computers. It is designed to run on machines with at least 4 cores and 8GB of RAM. The easiest way to run MemSQL for development is to use the MemSQL Quick Start Docker Container; see Quick Start with Docker (http://docs.memsql.com/docs/quick-start-with-docker).
In-process database. MemSQL is not run as a library or in-process with an application. MemSQL is a distributed database which runs in separate processes from the application, and applications connect to MemSQL via a client driver.
Serializable transactions. MemSQL supports extremely fast, distributed "READ-COMMITTED" transactions, but it is not suitable for applications which require "SERIALIZABLE" transactions.
Full-text search. MemSQL does not have built-in full-text search capabilities. MemSQL supports basic search queries with the LIKE and REGEXP operations and is compatible with full-text search technologies like Sphinx and ElasticSearch/Lucene/Solr, which can connect to MySQL-compatible databases.


Why does CREATE TABLE take so long in MemSQL?
In order to speed up the compilation of queries in MemSQL, CREATE TABLE precompiles the code used to access and update table indexes. This is a one-time operation for each unique table schema, and the compiled table code is cached on disk for future use.

Why do MemSQL queries typically run faster the second time they are executed?

Traditional relational database management systems interpret SQL queries the same way interpreters for languages like Python and Ruby run programs. The first time a MemSQL server encounters a given query shape, it will optimize and compile the query for future invocations. This incurs overhead which does not depend on the amount of data to be processed, but rather on the complexity of the query. The process of code generation involves extracting parameters from the query and then transforming the normalized query into a MemSQL-specific intermediate representation tailored to the system. Subsequent requests with the same shape can reuse this plan to complete both quickly and consistently. Starting with MemSQL 5, MemSQL embeds an industrial compiler (LLVM) for code generation, leading to fast query performance even the first time a query is run.
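The compile-once, reuse-later behavior described above can be sketched as a small cache keyed by query shape. This is only an illustration: the PlanCache class, its shape normalizer, and the string standing in for a compiled plan are hypothetical and say nothing about how MemSQL implements its plan cache.

```python
import re


class PlanCache:
    """Toy plan cache keyed by query shape (illustrative, not MemSQL's code)."""

    def __init__(self):
        self.plans = {}
        self.compilations = 0

    @staticmethod
    def shape(sql):
        # Strip string literals first, then standalone integers, so queries
        # that differ only in constants map to the same cache key.
        sql = re.sub(r"'[^']*'", "^", sql)
        return re.sub(r"\b\d+\b", "@", sql)

    def _compile(self, shape):
        # Stand-in for the expensive optimize-and-compile step.
        self.compilations += 1
        return "plan for: " + shape

    def get_plan(self, sql):
        key = self.shape(sql)
        if key not in self.plans:
            self.plans[key] = self._compile(key)
        return self.plans[key]


cache = PlanCache()
cache.get_plan("SELECT * FROM foo WHERE id=22")       # compiles
cache.get_plan("SELECT * FROM foo WHERE id=23")       # same shape, reuses the plan
cache.get_plan("SELECT * FROM foo WHERE name='bar'")  # new shape, compiles
print(cache.compilations)  # 2
```

The cost of the first call depends on the query text only, which matches the point that compilation overhead tracks query complexity rather than data volume.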

How much can you change a query before it needs to be recompiled?
If you only change an integer or string constant in a query, it will not require recompilation.
MemSQL strips out numeric and string parameters from a query and attaches the resulting string to a compiled plan. This string is referred to as a parameterized query. For example, SELECT * FROM foo WHERE id=22 AND name='bar' is parameterized to SELECT * FROM foo WHERE id=@ AND name=^.
You can list the distinct parameterized queries corresponding to all executed queries by running SHOW PLANCACHE.
The one exception to this rule is constants in the projection clause without an alias. These constants are compiled directly into the plan's assembly code for performance reasons. For example, SELECT id + 1, id + 2 AS z FROM foo is converted to SELECT id + 1, id + @ AS z FROM foo.
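A rough sketch of the parameterization step, assuming integer and single-quoted string constants only. The parameterize function and its regular expression are hypothetical; the sketch does not model the projection-clause exception described above, and it is not MemSQL's actual parser.

```python
import re

# A single pass over the query: a quoted string becomes ^ and a standalone
# integer becomes @. Matching strings first keeps digits inside a string
# literal from being replaced separately.
_LITERAL = re.compile(r"'(?:[^'\\]|\\.)*'|\b\d+\b")


def parameterize(sql):
    def marker(match):
        return "^" if match.group(0).startswith("'") else "@"

    return _LITERAL.sub(marker, sql)


print(parameterize("SELECT * FROM foo WHERE id=22 AND name='bar'"))
# SELECT * FROM foo WHERE id=@ AND name=^
```

Two queries that produce the same parameterized string would share one compiled plan, which is why changing only a constant does not trigger recompilation.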

Durability
What is the durability guaranteed by MemSQL?
MemSQL provides several options which control tradeoffs between performance and durability (see Durability and Recovery, http://docs.memsql.com/docs/using-durability-and-recovery). In its most durable state, MemSQL will not lose any transactions which have been acknowledged. Many users, however, find it useful to risk a bounded amount of data loss in exchange for greatly improved latencies.


Can I configure MemSQL to be fully durable?
Yes. You can get full durability at the cost of increased query latency by setting transaction_buffer=0 in memsql.cnf.


Does being in-memory mean that MemSQL will lose all data upon system failure or restart?

No. Unlike traditional relational database management systems, MemSQL uses RAM as the primary storage for data. However, MemSQL continuously backs up data to disk with transaction logs and periodic snapshots. These features can be tuned all the way from synchronous durability (every write transaction is recorded on disk before the transaction completes) to purely in-memory durability (maximum sustained throughput on writes).
On restart, MemSQL uses the snapshot and log files to recover its state to what it was before shutting down. Because the recovery process is parallelized across CPUs, the bottleneck in this process is the sequential hard drive speed.
See Using Durability and Recovery (http://docs.memsql.com/docs/using-durability-and-recovery) for more information.
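The general idea of recovering from a snapshot plus a transaction log can be sketched as follows. The recover function, the dictionary state, and the (op, key, value) log format are assumptions made for illustration; MemSQL's on-disk formats and recovery procedure are not described in this FAQ.

```python
def recover(snapshot, log):
    """Rebuild state from a snapshot plus the log entries written after it."""
    state = dict(snapshot)      # start from the last full snapshot
    for op, key, value in log:  # replay logged operations in order
        if op == "set":
            state[key] = value
        elif op == "delete":
            state.pop(key, None)
    return state


snapshot = {"a": 1, "b": 2}
log = [("set", "b", 3), ("set", "c", 4), ("delete", "a", None)]
print(recover(snapshot, log))  # {'b': 3, 'c': 4}
```

Both the snapshot and the log are read front to back, which is consistent with the answer's point that sequential disk speed is the bottleneck during recovery.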

If MemSQL writes data to disk, how can it be faster than disk-based databases?
Traditional relational database management systems use disk as the primary storage for data and memory as a cache. Managing this caching layer adds bookkeeping overhead and contention, thus reducing throughput and concurrency. These constraints result in random read and write I/O, which puts significant pressure on both rotational and solid state disks.
On the other hand, MemSQL stores data primarily in memory and backs it up to disk in a compact format. As a result, MemSQL uses only sequential I/O and the transaction log size is significantly smaller. This I/O pattern is optimized for both rotational and solid state disks. Furthermore, reads in MemSQL can use memory-optimized lock-free skip lists and hash tables that cannot be managed in a buffer pool.

What isolation levels does MemSQL provide?
MemSQL provides the "READ COMMITTED" isolation level. This guarantees that no transaction will read any uncommitted data from another transaction. Furthermore, once a change is observed in one transaction, it will be visible to all future transactions.
Unlike the "REPEATABLE READ" or "SNAPSHOT" isolation level, the "READ COMMITTED" isolation level does not guarantee that a row will remain the same for every read query in a given transaction. Applications that use MemSQL should take this into account.
Even though regular transactions use the "READ COMMITTED" isolation level, backups created using the BACKUP command use the "SNAPSHOT" isolation level.
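A toy model of the behavior described above may help: under READ COMMITTED a reader never sees uncommitted writes, but two reads of the same row inside one transaction can return different values. The Store and Transaction classes are hypothetical and have nothing to do with MemSQL's MVCC implementation.

```python
class Store:
    """Toy key-value store that exposes only committed data."""

    def __init__(self):
        self.committed = {}

    def begin(self):
        return Transaction(self)


class Transaction:
    def __init__(self, store):
        self.store = store
        self.pending = {}

    def read(self, key):
        # Own writes first, otherwise the latest committed value at read time.
        if key in self.pending:
            return self.pending[key]
        return self.store.committed.get(key)

    def write(self, key, value):
        self.pending[key] = value  # invisible to other transactions until commit

    def commit(self):
        self.store.committed.update(self.pending)
        self.pending = {}


store = Store()
setup = store.begin()
setup.write("x", 1)
setup.commit()

reader, writer = store.begin(), store.begin()
first = reader.read("x")   # 1
writer.write("x", 2)
dirty = reader.read("x")   # still 1: the uncommitted write is not visible
writer.commit()
second = reader.read("x")  # 2: the same transaction now sees a different value
```

An application that needs both reads to agree has to read the row once and hold on to the value, since the isolation level does not provide that guarantee.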


On which Linux distribution does MemSQL run best?
MemSQL is developed and tested most extensively on Ubuntu 14.04 and CentOS 6.4. See System Requirements (http://docs.memsql.com/docs/system-requirements) for the full list of Linux distributions that are officially supported.


How much disk space should I allocate for MemSQL?
MemSQL uses disk for three types of storage:
Snapshot and log files that back up row store data. You should allocate about as much space on disk for this purpose as there is memory on your machine.
Compressed columnstore data files that contain column store data in MemSQL.
Object files that are the result of code generation. This includes code generated for Data Definition Language (DDL) queries like CREATE TABLE and ALTER TABLE and for Data Manipulation Language (DML) queries like INSERT, UPDATE, DELETE and SELECT. On average, these usually require about 0.1 MB per unique plan.
Therefore, you should allocate roughly the amount of memory on your machine + space for compressed column store data + 0.1 MB for each plan. Note that the exact disk requirements will vary with the application, so it is advisable (and usually cheap) to allocate some extra disk space.
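The sizing rule above is simple arithmetic and can be written as a small helper. The function name and the example figures are made up for illustration; only the formula (memory + compressed columnstore data + 0.1 MB per plan) comes from the text.

```python
PLAN_SIZE_MB = 0.1  # average object-file size per unique plan, as stated above


def estimate_disk_mb(memory_mb, columnstore_mb, unique_plans):
    """Rough disk estimate: memory + compressed columnstore + 0.1 MB per plan."""
    return memory_mb + columnstore_mb + PLAN_SIZE_MB * unique_plans


# Hypothetical machine: 64 GB of RAM, 200 GB of compressed columnstore data,
# and 5000 unique plans.
print(estimate_disk_mb(64 * 1024, 200 * 1024, 5000))
```

The result is a lower bound under these assumptions; as the answer notes, it is worth allocating extra space on top of it.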

What happens if I run out of memory?

If the amount of memory used by row store tables (Alloc_table_memory from SHOW STATUS EXTENDED) is greater than the maximum_table_memory global variable (from SHOW GLOBAL VARIABLES), MemSQL will refuse to start new write queries (INSERT, UPDATE and LOAD DATA). Note that DELETE queries are not affected by this limit.
If a currently running query runs out of memory, it will roll back and notify the client of the error. See Memory Management (http://docs.memsql.com/docs/memory-management) for more information.
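The admission rule described above can be expressed as a single predicate. The admits function and the WRITE_QUERIES set are hypothetical names; the sketch only restates the rule from the text and does not query a real server for Alloc_table_memory or maximum_table_memory.

```python
# Query types that the text says are refused once the limit is exceeded.
WRITE_QUERIES = {"INSERT", "UPDATE", "LOAD DATA"}


def admits(query_type, alloc_table_memory, maximum_table_memory):
    """Return True if a new query of this type would be allowed to start."""
    over_limit = alloc_table_memory > maximum_table_memory
    return not (over_limit and query_type.upper() in WRITE_QUERIES)


print(admits("INSERT", alloc_table_memory=10, maximum_table_memory=8))  # False
print(admits("DELETE", alloc_table_memory=10, maximum_table_memory=8))  # True
```

DELETE stays allowed because it is the way to free table memory and bring usage back under the limit.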

What happens if I run out of disk space?
If the amount of available disk space (in the <MEMSQL HOME>/data directory) is less than the minimal_disk_space global variable (from SHOW GLOBAL VARIABLES), MemSQL will refuse to start new write queries (INSERT, UPDATE and LOAD DATA). Note that DELETE queries are not affected by this limit, and the database will remain online for reads.
If a currently running write query exhausts the available disk space before making its changes durable, it will wait until more disk space becomes available before continuing. Queries may appear to "hang" when this happens. To determine how many queries and background threads are waiting for disk space, run SHOW STATUS EXTENDED LIKE 'Threads_waiting_for_disk_space'.

How does MemSQL shard tables?
Every distributed table (except reference tables, which are replicated in whole on each "leaf" node) has a SHARD KEY that specifies which columns of a row to hash to determine which partition the row should reside in. When rows are inserted into a sharded table, they are hashed by the table’s shard key and sent to the leaf carrying the corresponding partition. This technique is commonly referred to as hash-based partitioning. You can choose how to shard each table by specifying its SHARD KEY as part of the CREATE TABLE statement. See Distributed SQL (http://docs.memsql.com/docs/distributed-sql) for more details.
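Hash-based partitioning as described above can be sketched in a few lines. The partition_for function, the column names, and the use of CRC32 are all assumptions for illustration; the FAQ does not say which hash function MemSQL uses or how partitions map to leaves.

```python
import zlib


def partition_for(row, shard_key, num_partitions):
    """Map a row to a partition by hashing its shard-key column values."""
    # Join the shard-key values with a separator so ("ab", "c") and
    # ("a", "bc") do not produce the same bytes.
    key_bytes = "\x1f".join(str(row[col]) for col in shard_key).encode("utf-8")
    return zlib.crc32(key_bytes) % num_partitions


rows = [
    {"user_id": 1, "page": "/a"},
    {"user_id": 2, "page": "/b"},
    {"user_id": 1, "page": "/c"},
]
for row in rows:
    print(row, "-> partition", partition_for(row, ("user_id",), 8))
```

Rows with the same shard-key value always land in the same partition, which is what lets joins and lookups on that key avoid moving data between leaves.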

What are aggregator and leaf nodes?
MemSQL stores and computes data on leaf nodes. You can linearly scale both storage and computational power by adding more leaf nodes. Clients query an aggregator node, which in turn queries one or more leaf nodes to collect the rows required to execute the query. Multiple aggregator nodes perform the same functions with respect to executing Data Manipulation Language (DML) queries and allow clients to load-balance queries across the aggregators. Leaf nodes should not be queried directly except for maintenance purposes in exceptional situations.

What is a "master aggregator"?
The Master Aggregator is an aggregator responsible for executing DDL and clustering operations (e.g. ADD LEAF ... or CREATE TABLE...).

What happens if the master aggregator crashes?
If the Master Aggregator becomes unresponsive, clients can continue to execute DML queries (e.g. INSERT and SELECT) against the other aggregators, but DDL and clustering operations cannot be performed until the master aggregator is revived or another aggregator is "promoted" to be the master aggregator.

How many aggregator and leaf nodes do I need?
MemSQL stores data in leaf nodes, so you need enough leaf nodes to store all your data in memory. If you are replicating data (redundancy level 2), you need twice as many leaf nodes.
The recommended number of aggregators depends on your use case. If, for instance, your cluster is being used for more than one type of workload (for example, it is the backend for a web application and is also being queried by analysts), it is probably best to have multiple aggregators, or pools of aggregators, for these separate workloads. Aside from distribution of workload, the most significant factor to consider is network bandwidth. As a rule of thumb, clusters with 50 nodes or fewer should have about a 5:1 leaf to aggregator ratio. Clusters with more than 50 nodes can have closer to a 10:1 leaf to aggregator ratio. Note that you can also add nodes to a cluster to tune performance after it is up and running.
The appropriate ratio of aggregators to leaves also depends on the type of workload running. Transactional workloads that run many small queries or queries that involve only a single partition require more aggregators, since those queries interact with one aggregator and one leaf. Analytical workloads, especially those involving distributed joins, require fewer aggregators because almost all the work is performed on the leaves.
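The rule of thumb above can be turned into a quick calculation. The suggest_layout function is hypothetical, and the rounding is a choice made for this sketch; the text gives only approximate ratios, and the workload-specific adjustments it describes are not modeled.

```python
def suggest_layout(total_nodes):
    """Split a node count into (aggregators, leaves) using the rule of thumb."""
    if total_nodes < 2:
        raise ValueError("need at least one aggregator and one leaf")
    # About 5:1 leaves per aggregator up to 50 nodes, closer to 10:1 above that.
    ratio = 5 if total_nodes <= 50 else 10
    aggregators = max(1, round(total_nodes / (ratio + 1)))
    return aggregators, total_nodes - aggregators


print(suggest_layout(12))   # (2, 10)
print(suggest_layout(110))  # (10, 100)
```

A transactional workload with many single-partition queries would push the aggregator count up from this baseline, and a join-heavy analytical workload would push it down.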

Can I JOIN multiple sharded tables in a query?
Yes. MemSQL supports advanced join capabilities and will automatically redistribute data as necessary to complete a query. MemSQL can also take advantage of collocated data across shard keys and reference tables to reduce data movement. See Distributed Joins (http://docs.memsql.com/docs/distributed-sql#section-distributed-joins).

Can I optimize a distributed join involving a small, static table?
Yes, a small table which does not change frequently can be made into a reference table, which is replicated to all the leaf nodes. This ensures that the table does not need to be moved when joined against, at the cost of using more memory. See Distributed SQL (http://docs.memsql.com/docs/distributed-sql).

Why do I get a UNIQUE KEY error?
MemSQL does not support a unique key unless the unique key contains the shard key. See http://docs.memsql.com/docs/distributed-sql#section-shard-keys

How can I backup a MemSQL database?
MemSQL supports consistent, online, cluster-wide BACKUP and RESTORE operations that, unlike mysqldump, do not require blocking write operations on the database. See the Backing up and Restoring MemSQL section (http://docs.memsql.com/docs/backing-up-and-restoring-data) for more information.

How can I import data from MySQL, Postgres, MS-SQL etc?
See How To Load Data Into MemSQL (http://docs.memsql.com/docs/loading-data-into-memsql).

How can I easily copy a table?
You can use CREATE TABLE dest AS SELECT * FROM source. See CREATE TABLE.
Or, create a new empty table using the schema of the original table from SHOW CREATE TABLE source, and copy data from the source table into the new table using INSERT INTO dest SELECT * FROM source.
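Both copy approaches can be tried without a MemSQL cluster. The sketch below uses Python's built-in sqlite3 module purely as a stand-in, which is an assumption of this example: SQLite accepts the same two statements, but it has no SHOW CREATE TABLE, so the schema for the second approach is written out by hand, and the table names dest and dest2 are illustrative.

```python
import sqlite3

# SQLite is only a stand-in here so the statements can be run locally.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO source VALUES (?, ?)", [(1, "a"), (2, "b")])

# Option 1: create and fill the copy in one statement.
conn.execute("CREATE TABLE dest AS SELECT * FROM source")

# Option 2: create an empty table with the original schema, then copy the rows.
conn.execute("CREATE TABLE dest2 (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO dest2 SELECT * FROM source")

copied = conn.execute("SELECT * FROM dest ORDER BY id").fetchall()
copied2 = conn.execute("SELECT * FROM dest2 ORDER BY id").fetchall()
print(copied, copied2)
```

In SQLite at least, the first form does not carry over the primary key definition, which is one reason to prefer the second form when the copy must keep the original schema.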

How can I easily copy a database?
There are two options:
Use replication. Run REPLICATE DATABASE dest_db FROM user@host:port/src_db, and after it fully synchronizes run STOP REPLICATING dest_db. See Replication.
BACKUP the database and RESTORE it on the same (or a different) cluster under a different name.

Where are the important data files (recovery log, binary logs, snapshots, data files etc)?
Run SHOW STATUS EXTENDED LIKE '%_directory' to get the full paths.

How are MemSQL and Apache Spark related?
MemSQL and Apache Spark are both distributed, in-memory technologies. MemSQL is a SQL database, while Spark is a general computation framework. MemSQL has tight integration with Apache Spark through its integrated MemSQL Spark solution offerings. Specifically, within MemSQL Ops, users can deploy a Spark cluster colocated with a MemSQL cluster, and leverage MemSQL and Spark functionality together. For instance, with MemSQL and Spark clusters deployed, users can use MemSQL Streamliner to quickly configure, run, and manage real-time data pipelines: extracting data from real-time sources such as Kafka, transforming the data within Spark, and finally loading the data into MemSQL.

What are the differences between MemSQL and Spark SQL?
Spark SQL treats datasets (RDDs) as immutable; there is currently no concept of an INSERT, UPDATE, or DELETE. You could express these concepts as a transformation, but this operation returns a new RDD rather than updating the dataset in place. In contrast, MemSQL is an operational database with full transactional semantics.
MemSQL supports updatable relational database indexes. The closest analogue in Spark is IndexRDD, which is currently under development and provides updatable key/value indexes.

How do MemSQL and Spark software interact with each other?
Manually through the MemSQL Spark Connector:
The MemSQL Spark Connector is an open source library that can be added as a dependency for any Spark application. Under the hood, it creates a mapping between MemSQL database partitions and Spark RDD partitions. It takes advantage of both systems’ distributed architectures to load data in parallel. The connector includes the MemSQLRDD class, which allows the user to create a Spark RDD from the result of a SQL query in MemSQL, and the saveToMemSQL function, which makes it easy to write data into MemSQL after processing in Spark.

Through MemSQL Ops:
MemSQL Ops can deploy a Spark cluster and link it to MemSQL by leveraging the MemSQL Spark Connector and the MemSQL Spark Interface under the hood. The MemSQL Spark Interface is a Spark application that serves as the interface for MemSQL Ops to create and manage real-time data pipelines within Spark.

Does MemSQL Streamliner support "exactly once" semantics when consuming data from Kafka?
Yes. Streamliner has very precise "at least once" semantics when consuming data from Kafka. MemSQL leverages its transactional nature to provide stronger semantics than what Kafka and Spark can offer out of the box, with much higher performance and precision (i.e. the "at least once" window of repeated values is minimal). That said, if a user really wants "exactly-once" semantics, they can achieve it by loading into the row store with a unique key.
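The way a unique key turns at-least-once delivery into an exactly-once result can be sketched with a dictionary standing in for a table. The load function and the event_id field are hypothetical, and the sketch assumes a replayed record overwrites the existing row; how Streamliner and MemSQL handle the duplicate in practice is not specified here.

```python
def load(table, batch):
    """Insert records keyed by a unique id; a replay overwrites, never duplicates."""
    for record in batch:
        table[record["event_id"]] = record


table = {}
batch = [{"event_id": 1, "v": "a"}, {"event_id": 2, "v": "b"}]
load(table, batch)
load(table, batch)  # at-least-once delivery replays the same batch
print(len(table))   # 2
```

The load is idempotent, so however many times a batch is redelivered, each event appears in the table once.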

What happens if SQL push down fails?
The MemSQL Connector takes a best-effort approach to query push down. While Spark is preparing the query for execution, the MemSQL push down strategy attempts to push down every subtree, starting with the entire query. If anything fails, the tree is simply left as is and Spark handles executing the unsupported section of the tree.

How can I check to see if a query is pushed down?
Every DataFrame has a method called .explain which will print the final plan before execution. If the first element in that plan is a MemSQLPhysicalRDD, then the DataFrame has been fully pushed down.

What are the index types MemSQL supports?
The in-memory row store supports skip lists, hash tables and geospatial indexes. The on-disk column store supports ordered columnstore indexes.


















