MongoDB Architecture (Indexes)

Source: Internet  Published: 获取网页数据  Editor: 程序博客网  Time: 2024/06/07 03:30

Major differences from RDBMS
As described in the original 2012 article, MongoDB differs from an RDBMS (relational database management system) in the following ways:

  • Unlike an RDBMS record, which is "flat" (a fixed number of fields of simple data types), the basic unit of MongoDB is the "document", which is nested and can contain multi-valued fields (arrays, hashes).
  • Unlike an RDBMS, where every record stored in a table must conform to the table schema, documents of any structure can be stored in the same collection.
  • There is no "join" operation in the query. Data is encouraged to be organized in a more denormalized manner, and more of the burden of ensuring data consistency is pushed to the application developers.
  • There is no concept of a "transaction" in MongoDB. Atomicity is guaranteed only at the document level (no partial update of a document will occur).
  • There is no concept of "isolation": any data read by one client may have its value modified by another concurrent client.

By leaving out some of the features that a classical RDBMS provides, MongoDB can be more lightweight and more scalable when processing big data.
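
To make the first two points concrete, here is a minimal sketch in plain JavaScript (the language of the MongoDB shell). The documents, field names and the `cityOf` helper are all made up for illustration, and a plain array stands in for a collection.

```javascript
// Hypothetical "person" documents; every field name here is illustrative.
const dave = {
  firstname: "Dave",
  lastname: "Ho",
  address: { city: "San Jose", zip: "95110" }, // nested sub-document
  phones: ["555-0100", "555-0101"],            // multi-valued field (array)
};

// A differently shaped document can sit in the same collection.
const ricky = { firstname: "Ricky", lastname: "Ho", languages: ["Java", "Ruby"] };

// A plain array models the collection: no schema is enforced on its members.
const personCollection = [dave, ricky];

// Denormalized data is read without a join: the address travels with the person.
function cityOf(person) {
  return person.address ? person.address.city : undefined;
}

console.log(personCollection.map(cityOf)); // [ 'San Jose', undefined ]
```

Because nothing forces `ricky` to carry an `address`, the application code has to cope with the missing field, which is one small instance of the consistency burden moving to the developer.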

 

 

Query processing
MongoDB is a document-oriented database. In this model, data is organized as JSON documents and stored in collections. A collection can be thought of as the equivalent of a table, and a document as the equivalent of a record, in the RDBMS world.

# create a doc and save into a collection
> p = {firstname:"Dave", lastname:"Ho"}
> db.person.save(p)
> db.person.insert({firstname:"Ricky", lastname:"Ho"})

# Show all docs within a collection
> db.person.find()

# Iterate result using cursor
> var c = db.person.find()
> p1 = c.next()
> p2 = c.next()

To specify search criteria, provide an example document containing the fields to match against.

 

 

> p3 = db.person.findOne({lastname:"Ho"})

Notice that in the query, the value portion must be determined before the query is made (in other words, it cannot be based on other attributes of the document). For example, given a collection of "Person" documents, it is not possible to express a query that returns every person whose weight is more than 10 times their height.
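
The difference between the two kinds of condition can be sketched in plain JavaScript. This is only an in-memory model, not MongoDB's matcher: `matchesExample`, the sample data and the `weight`/`height` fields are assumptions made for the example, and only top-level equality is handled.

```javascript
// Returns true when every field in `example` equals the same field in `doc`.
// Only top-level equality is modeled; query operators are out of scope here.
function matchesExample(doc, example) {
  return Object.keys(example).every((field) => doc[field] === example[field]);
}

// Hypothetical data; units for weight and height are deliberately unspecified.
const persons = [
  { firstname: "Dave", lastname: "Ho", weight: 700, height: 60 },
  { firstname: "Ricky", lastname: "Ho", weight: 150, height: 68 },
  { firstname: "Ann", lastname: "Lee", weight: 130, height: 64 },
];

// Query by example: the value "Ho" is fixed before the query runs.
const hos = persons.filter((p) => matchesExample(p, { lastname: "Ho" }));

// A cross-field condition has no fixed value to put in the example document,
// so in this sketch the application filters the fetched documents itself.
const heavy = persons.filter((p) => p.weight > 10 * p.height);

console.log(hos.map((p) => p.firstname));   // [ 'Dave', 'Ricky' ]
console.log(heavy.map((p) => p.firstname)); // [ 'Dave' ]
```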

 

An index can be used to speed up a query. In MongoDB, an index is stored as a B-tree structure (so range queries are automatically supported). Since the document itself is a tree, the index can be specified as a path that drills into deeply nested levels inside the document.

# To build an index for a collection
> db.person.ensureIndex({firstname:1})

# To show all existing indexes
> db.person.getIndexes()

# To remove an index
> db.person.dropIndex({firstname:1})

# An index can be built on a path of the doc.
> db.person.ensureIndex({"address.city":1})

# A composite key can be used to build an index
> db.person.ensureIndex({lastname:1, firstname:1})
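
Why an ordered index on a path gives range queries for free can be illustrated with a small in-memory model. This is a sketch under loud assumptions: a sorted array with binary search stands in for the B-tree, documents without the indexed path are simply skipped, and `getPath`, `buildPathIndex`, `lowerBound` and `rangeQuery` are hypothetical helpers, not MongoDB functions.

```javascript
// Follow a dotted path such as "address.city" into a nested document.
function getPath(doc, path) {
  return path
    .split(".")
    .reduce((node, key) => (node == null ? undefined : node[key]), doc);
}

// Build sorted { key, doc } entries; the sorted array stands in for the B-tree.
// Simplification: documents that lack the path are left out of the index.
function buildPathIndex(docs, path) {
  return docs
    .map((doc) => ({ key: getPath(doc, path), doc }))
    .filter((entry) => entry.key !== undefined)
    .sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

// Binary search for the position of the first entry whose key is >= target.
function lowerBound(index, target) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid].key < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Range query: all docs with min <= key < max form one contiguous slice.
function rangeQuery(index, min, max) {
  return index
    .slice(lowerBound(index, min), lowerBound(index, max))
    .map((entry) => entry.doc);
}

const residents = [
  { name: "d", address: { city: "Denver" } },
  { name: "a", address: { city: "Austin" } },
  { name: "x" }, // no address: skipped by this sketch
  { name: "c", address: { city: "Chicago" } },
  { name: "b", address: { city: "Boston" } },
];
const cityIndex = buildPathIndex(residents, "address.city");
console.log(rangeQuery(cityIndex, "B", "D").map((d) => d.address.city));
// [ 'Boston', 'Chicago' ]
```

Because the entries are kept in key order, the range is answered with two searches and a slice rather than a scan of every document.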



An index can also be built on a multi-valued attribute such as an array. In this case, each element in the array will have a separate node in the B-tree.

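
The "one node per array element" idea can be sketched as follows. The sorted array again stands in for the B-tree, and `indexEntries`, `buildMultikeyIndex`, `lookupIds` and the `tags` field are assumptions made for illustration only.

```javascript
// Expand one document into index entries: one entry per array element,
// or a single entry when the field holds a scalar value.
function indexEntries(doc, field) {
  const value = doc[field];
  const keys = Array.isArray(value) ? value : [value];
  return keys.map((key) => ({ key, id: doc._id }));
}

// Collect the entries of all documents and keep them in key order.
function buildMultikeyIndex(docs, field) {
  return docs
    .flatMap((doc) => indexEntries(doc, field))
    .sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

// Find the ids of all documents that have `key` among their values.
// (A linear filter keeps the sketch short; an ordered index would seek.)
function lookupIds(index, key) {
  return index.filter((entry) => entry.key === key).map((entry) => entry.id);
}

const articles = [
  { _id: 1, tags: ["db", "nosql"] },
  { _id: 2, tags: ["db"] },
  { _id: 3, tags: "index" },
];
const tagIndex = buildMultikeyIndex(articles, "tags");

console.log(tagIndex.length);            // 4 entries for 3 documents
console.log(lookupIds(tagIndex, "db"));  // [ 1, 2 ]
```

One consequence visible even in this toy version: a document with a long array contributes many index entries, so such an index grows faster than the collection does.
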
An index can be built in either offline foreground mode or online background mode. Foreground mode proceeds much faster, but the DB cannot be accessed while the index is being built. If the system is running as a replica set (described below), it is recommended to rotate each member DB offline in turn and build the index in the foreground.

When a query has multiple selection criteria, MongoDB attempts to use the single best index to select a candidate set, then iterates sequentially through the candidates to evaluate the other criteria.
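
The "one index, then filter" strategy can be modeled in a few lines. This is a sketch, not MongoDB's executor: a `Map` stands in for the index, only equality criteria are handled, and `buildIndex` and `findWithOneIndex` are hypothetical names.

```javascript
// Hypothetical index: maps each value of `field` to the docs that carry it.
function buildIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    if (!index.has(doc[field])) index.set(doc[field], []);
    index.get(doc[field]).push(doc);
  }
  return index;
}

// Step 1: use the single index to fetch the candidate set.
// Step 2: walk the candidates one by one and test the remaining criteria.
function findWithOneIndex(index, indexedField, query) {
  const candidates = index.get(query[indexedField]) || [];
  const remaining = Object.keys(query).filter((f) => f !== indexedField);
  return candidates.filter((doc) => remaining.every((f) => doc[f] === query[f]));
}

const members = [
  { firstname: "Dave", lastname: "Ho" },
  { firstname: "Ricky", lastname: "Ho" },
  { firstname: "Ricky", lastname: "Lee" },
];
const lastnameIndex = buildIndex(members, "lastname");

const found = findWithOneIndex(lastnameIndex, "lastname", {
  lastname: "Ho",
  firstname: "Ricky",
});
console.log(found); // [ { firstname: 'Ricky', lastname: 'Ho' } ]
```

In this model the cost of step 2 grows with the size of the candidate set, which is why the choice of the single index matters.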

When multiple indexes are available for a collection, MongoDB handles a query for the first time by creating multiple execution plans (one for each available index) and letting them take turns executing (each within a certain number of ticks) until the fastest plan finishes. The result of the fastest executor is returned, and the system remembers the index it used. Subsequent queries use the remembered index until a certain number of updates have happened in the collection; the system then repeats the process to work out which index is best at that time.
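
The take-turns race and the remembered winner can be sketched with JavaScript generators. Everything here is an assumption made for illustration: `makePlan`, `racePlans`, `PlanCache`, the `costPerDoc` knob that makes one plan slower, the index-style plan names, and the write threshold are not MongoDB internals.

```javascript
// A plan is modeled as a generator: each `yield` is one tick of work and the
// generator's return value is the finished result. `costPerDoc` is a made-up
// knob that simulates a plan doing more work per document.
function* makePlan(name, docs, predicate, costPerDoc) {
  const results = [];
  for (const doc of docs) {
    for (let tick = 0; tick < costPerDoc; tick++) {
      yield;
    }
    if (predicate(doc)) results.push(doc);
  }
  return { name, results };
}

// Round-robin: each plan gets `ticksPerTurn` steps per turn; first to finish wins.
function racePlans(plans, ticksPerTurn) {
  if (plans.length === 0) throw new Error("no plans to race");
  for (;;) {
    for (const plan of plans) {
      for (let t = 0; t < ticksPerTurn; t++) {
        const step = plan.next();
        if (step.done) return step.value;
      }
    }
  }
}

// Remembers the winning plan until `maxWrites` writes have been recorded.
class PlanCache {
  constructor(maxWrites) {
    this.maxWrites = maxWrites;
    this.writes = 0;
    this.winner = null;
  }
  recordWrite() {
    this.writes += 1;
    if (this.writes >= this.maxWrites) {
      this.writes = 0;
      this.winner = null; // force a fresh race on the next query
    }
  }
  winnerFor(makePlans) {
    if (this.winner === null) {
      this.winner = racePlans(makePlans(), 2).name;
    }
    return this.winner;
  }
}

const raceDocs = [{ lastname: "Ho" }, { lastname: "Lee" }, { lastname: "Ho" }];
const isHo = (doc) => doc.lastname === "Ho";
const makePlans = () => [
  makePlan("firstname_1", raceDocs, isHo, 3), // slower plan
  makePlan("lastname_1", raceDocs, isHo, 1),  // faster plan
];

const planCache = new PlanCache(2);
console.log(planCache.winnerFor(makePlans)); // lastname_1
```

After two calls to `planCache.recordWrite()` the remembered winner is dropped, so the next `winnerFor` call races the plans again, mirroring the re-evaluation described above.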

 

Since only one index will be used, it is important to look at the search and sort criteria of the query and build additional composite indexes that match the query better. Maintaining an index is not free: the index must be updated whenever docs are created, deleted, or updated, which adds overhead to those write operations. To maintain an optimal balance, we need to periodically measure the effectiveness of each index (e.g. the read/write ratio) and delete the less efficient ones.
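
How a composite key can serve both a search criterion and a sort order at once is sketched below. A sorted array stands in for the index, and `compareByFields` and the sample data are assumptions for illustration; a real index would seek to the matching range rather than filter linearly.

```javascript
// Compare two documents field by field, in the given order (all ascending).
function compareByFields(fields) {
  return (a, b) => {
    for (const field of fields) {
      if (a[field] < b[field]) return -1;
      if (a[field] > b[field]) return 1;
    }
    return 0;
  };
}

const staff = [
  { firstname: "Ricky", lastname: "Ho" },
  { firstname: "Ann", lastname: "Lee" },
  { firstname: "Dave", lastname: "Ho" },
  { firstname: "Bob", lastname: "Ho" },
];

// Model of an index on {lastname: 1, firstname: 1}.
const compositeIndex = [...staff].sort(compareByFields(["lastname", "firstname"]));

// Entries that share a lastname are adjacent and already ordered by firstname,
// so "lastname = Ho, sorted by firstname" needs no extra sort step.
const hosByFirstname = compositeIndex
  .filter((p) => p.lastname === "Ho")
  .map((p) => p.firstname);

console.log(hosByFirstname); // [ 'Bob', 'Dave', 'Ricky' ]
```

The field order matters in this model: the same array is not ordered by `firstname` overall, so it would not help a query that only sorts on `firstname`.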

 

Original article (English): http://horicky.blogspot.jp/2012/04/mongodb-architecture.html

 
