MongoDB Source Code Analysis (5): Query Processing, Part 2 - Database Loading in mongod


        The previous article followed the client as it sent a query request. This one picks up on the server side and follows the handling from the moment the server starts to respond up to the point where the database is correctly loaded. The main steps are reading the database into memory and authenticating the user.

        mongod handles client requests in MyMessageHandler::process in mongo/db/db.cpp, which calls assembleResponse to produce the response. That function is where the analysis starts. The code is long, so side branches and unrelated parts have been removed.

    void assembleResponse( Message &m, DbResponse &dbresponse, const HostAndPort& remote ) {
        if ( op == dbQuery ) {
            if( strstr(ns, ".$cmd") ) {
                isCommand = true;
                opwrite(m); // write the diagnostic log; the level defaults to 0 (off) and is enabled at startup
                            // with --diaglog x: 0 = off; 1 = writes, 2 = reads, 3 = both, 7 = log a few reads, and all writes
                if( strstr(ns, ".$cmd.sys.") ) {
                    if( strstr(ns, "$cmd.sys.inprog") ) {
                        inProgCmd(m, dbresponse); // command reporting the operations currently in progress
                        return;
                    }
                    if( strstr(ns, "$cmd.sys.killop") ) {
                        killOp(m, dbresponse); // terminate the current operation
                        return;
                    }
                    if( strstr(ns, "$cmd.sys.unlock") ) {
                        unlockFsync(ns, m, dbresponse);
                        return;
                    }
                }
            }
            else {
                opread(m);
            }
        }
        else if( op == dbGetMore ) {
            opread(m);
        }
        else {
            opwrite(m);
        }
        long long logThreshold = cmdLine.slowMS; // startup parameter, default 100ms; adjustable with --slowms.
        bool shouldLog = logLevel >= 1;          // when an operation exceeds this threshold and --profile is 1 or 2,
                                                 // mongod records it (1 = record only operations slower than slowMS,
                                                 // 2 = record all operations)
        if ( op == dbQuery ) {
            if ( handlePossibleShardedMessage( m , &dbresponse ) ) // sharding-related; covered in a later article
                return;
            receivedQuery(c , dbresponse, m ); // the real query entry point
        }
        else if ( op == dbGetMore ) { // data was already queried; this is the entry point for fetching more of it
            if ( ! receivedGetMore(dbresponse, m, currentOp) )
                shouldLog = true;
        }
        if ( op == dbKillCursors ) {
            currentOp.ensureStarted();
            logThreshold = 10;
            receivedKillCursors(m);
        }
        else if ( op == dbInsert ) { // insert entry point
            receivedInsert(m, currentOp);
        }
        else if ( op == dbUpdate ) { // update entry point
            receivedUpdate(m, currentOp);
        }
        else if ( op == dbDelete ) { // delete entry point
            receivedDelete(m, currentOp);
        }
        if ( currentOp.shouldDBProfile( debug.executionTime ) ) {
            // performance profiling is on: the operation is recorded either because --profile 2 was set
            // at startup (record everything), or because --profile 1 was set and the operation exceeded slowMS
            else { // the if-branch removed here simply skips recording when the lock cannot be acquired
                Lock::DBWrite lk( currentOp.getNS() ); // record the operation by inserting a document into xxx.system.profile
                if ( dbHolder()._isLoaded( nsToDatabase( currentOp.getNS() ) , dbpath ) ) {
                    Client::Context cx( currentOp.getNS(), dbpath, false );
                    profile(c , currentOp );
                }
            }
        }
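
For reference, the op values that assembleResponse dispatches on are the wire-protocol operation codes carried in every message header. Below is a minimal standalone sketch of those constants (the numeric values are from the MongoDB wire protocol of this era, not an excerpt of db.cpp):

    #include <cstdio>

    // Wire-protocol opcodes; assembleResponse branches on these to choose
    // read vs. write bookkeeping and the handler entry point.
    enum Operations {
        opReply       = 1,    // server reply to a client request
        dbUpdate      = 2001, // OP_UPDATE    -> receivedUpdate
        dbInsert      = 2002, // OP_INSERT    -> receivedInsert
        dbQuery       = 2004, // OP_QUERY     -> receivedQuery
        dbGetMore     = 2005, // OP_GET_MORE  -> receivedGetMore
        dbDelete      = 2006, // OP_DELETE    -> receivedDelete
        dbKillCursors = 2007  // OP_KILL_CURSORS -> receivedKillCursors
    };

    int main() {
        std::printf("a find() arrives as opcode %d\n", dbQuery);
        return 0;
    }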

Moving on to receivedQuery: it parses the received data, calls runQuery to do the actual query work, and handles any exception that runQuery throws. Going straight into runQuery:

    string runQuery(Message& m, QueryMessage& q, CurOp& curop, Message &result) {
        shared_ptr<ParsedQuery> pq_shared( new ParsedQuery(q) );
        if ( pq.couldBeCommand() ) { // this means the request is a command; commands are not analyzed here, see
            BSONObjBuilder cmdResBuf; // http://www.cnblogs.com/daizhj/archive/2011/04/29/mongos_command_source_code.html
            if ( runCommands(ns, jsobj, curop, bb, cmdResBuf, false, queryOptions) ){}
        bool explain = pq.isExplain(); // true if the client used db.coll.find().explain(), false otherwise
        BSONObj order = pq.getOrder();
        BSONObj query = pq.getFilter();
        // Run a simple id query.
        if ( ! (explain || pq.showDiskLoc()) && isSimpleIdQuery( query ) && !pq.hasOption( QueryOption_CursorTailable ) ) {
            if ( queryIdHack( ns, query, pq, curop, result ) ) { // optimized path for _id queries
                return "";
            }
        }
        bool hasRetried = false;
        while ( 1 ) {
            // ReadContext is the protagonist of this article: the first time it locks a database
            // it also performs the database load
            Client::ReadContext ctx( ns , dbpath ); // read locks
            replVerifyReadsOk(&pq); // remember that secondaries in a replica set reject queries by default? that is enforced here
            BSONObj oldPlan;
            if ( ! hasRetried && explain && ! pq.hasIndexSpecifier() ) {
                scoped_ptr<MultiPlanScanner> mps( MultiPlanScanner::make( ns, query, order ) );
                oldPlan = mps->cachedPlanExplainSummary();
            }
            // this is where the real query happens; its internals are complex and are covered in the next article
            return queryWithQueryOptimizer( queryOptions, ns, jsobj, curop, query, order,
                                            pq_shared, oldPlan, shardingVersionAtStart,
                                            pgfs, npfe, result );
            }
        }
    }
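
The couldBeCommand() branch exists because, in this protocol, commands are just queries addressed to the pseudo-collection "<db>.$cmd", which is also why assembleResponse checks strstr(ns, ".$cmd"). A tiny standalone sketch of that naming convention (commandNamespace is a hypothetical helper, not mongod code):

    #include <iostream>
    #include <string>

    // Hypothetical helper: the namespace a command for database `db` is sent to.
    std::string commandNamespace(const std::string& db) {
        return db + ".$cmd";
    }

    int main() {
        // e.g. { isMaster: 1 } travels as an OP_QUERY against "admin.$cmd"
        std::cout << commandNamespace("admin") << std::endl; // prints admin.$cmd
        return 0;
    }

Back to the main flow: runQuery constructs a Client::ReadContext, shown next.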
    Client::ReadContext::ReadContext(const string& ns, string path, bool doauth ) {
        {
            lk.reset( new Lock::DBRead(ns) ); // database read lock; mongodb's locking mechanism is not covered in this article
            Database *db = dbHolder().get(ns, path);
            if( db ) { // obviously null the first time this database is touched
                c.reset( new Context(path, ns, db, doauth) );
                return;
            }
        }
        if( Lock::isW() ) { // already holding the global write lock
            DEV RARELY log() << "write locked on ReadContext construction " << ns << endl;
            c.reset( new Context(ns, path, doauth) );
        }
        else if( !Lock::nested() ) {
            lk.reset(0);
            {
                Lock::GlobalWrite w; // take the global write lock; this is where the database is actually loaded
                Context c(ns, path, doauth);
            }
            // db could be closed at this interim point -- that is ok, we will throw, and don't mind throwing.
            lk.reset( new Lock::DBRead(ns) );
            c.reset( new Context(ns, path, doauth) );
        }
    }
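
The constructor above follows a "check, upgrade, re-check" pattern: probe the holder under a read lock, and only take the global write lock when the database still has to be loaded. Here is a minimal generic sketch of that pattern using standard C++ primitives (std::shared_mutex and a std::map stand in for mongod's lock manager and dbHolder(); this is not the real locking code):

    #include <map>
    #include <mutex>
    #include <shared_mutex>
    #include <string>

    struct Database { /* ... */ };

    std::shared_mutex g_lock;                       // stands in for Lock::DBRead / Lock::GlobalWrite
    std::map<std::string, Database*> g_dbs;         // stands in for dbHolder()

    Database* getOrLoad(const std::string& name) {
        {
            std::shared_lock<std::shared_mutex> r(g_lock);  // read lock: the common, already-loaded case
            auto it = g_dbs.find(name);
            if (it != g_dbs.end())
                return it->second;
        }
        std::unique_lock<std::shared_mutex> w(g_lock);      // write lock: load at most once
        Database*& slot = g_dbs[name];
        if (!slot)                                          // re-check: another thread may have loaded it meanwhile
            slot = new Database();                          // leaked here for brevity; a sketch only
        return slot;
    }

mongod's real version is more involved (it drops the DBRead lock entirely before taking GlobalWrite and accepts that the database may be closed again in between), but the shape is the same. The Context constructed inside ReadContext is what triggers the load: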
    Client::Context::Context(const string& ns, string path , bool doauth, bool doVersion ) :
        _client( currentClient.get() ),
        _oldContext( _client->_context ),
        _path( path ),
        _justCreated(false), // set for real in finishInit
        _doVersion(doVersion),
        _ns( ns ),
        _db(0)
    {
        _finishInit( doauth );
    }
Next, the _finishInit function:

    void Client::Context::_finishInit( bool doauth ) {
        _db = dbHolderUnchecked().getOrCreate( _ns , _path , _justCreated ); // load or create the database
        checkNsAccess( doauth, writeLocked ? 1 : 0 ); // authentication check
    }
    Database* DatabaseHolder::getOrCreate( const string& ns , const string& path , bool& justCreated ) {
        string dbname = _todb( ns ); // converts a string like "test.coll" into "test"
        {
            SimpleMutex::scoped_lock lk(_m);
            Lock::assertAtLeastReadLocked(ns);
            DBs& m = _paths[path]; // look for an already-loaded database under the configured path and return it directly
            {
                DBs::iterator i = m.find(dbname);
                if( i != m.end() ) {
                    justCreated = false;
                    return i->second;
                }
            }
        Database *db = new Database( dbname.c_str() , justCreated , path ); // the actual loading of the data
        {
            SimpleMutex::scoped_lock lk(_m); // once the database is loaded, record it under its path
            DBs& m = _paths[path];
            verify( m[dbname] == 0 );
            m[dbname] = db;
            _size++;
        }
        return db;
    }
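
The _todb call at the top mirrors what nsToDatabase does elsewhere: a full namespace is "<database>.<collection>", and the database name is simply everything before the first dot. A self-contained sketch of that conversion (nsToDb is a stand-in name, not the mongod function):

    #include <cassert>
    #include <string>

    std::string nsToDb(const std::string& ns) {
        std::string::size_type dot = ns.find('.');
        return dot == std::string::npos ? ns : ns.substr(0, dot);
    }

    int main() {
        assert(nsToDb("test.coll") == "test");
        assert(nsToDb("test.system.profile") == "test"); // only the first '.' matters
        return 0;
    }

Now the Database constructor that getOrCreate calls: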
    Database::Database(const char *nm, bool& newDb, const string& _path )
        : name(nm), path(_path), namespaceIndex( path, name ),
          profileName(name + ".system.profile")
    {
        try {
            newDb = !namespaceIndex.exists(); // newDb is true when the xxx.ns file does not exist yet,
                                              // i.e. the database has not been created
            // If already exists, open.  Otherwise behave as if empty until
            // there's a write, then open.
            if (!newDb) {
                namespaceIndex.init(); // load the xxx.ns file
                if( _openAllFiles )
                    openAllFiles(); // map all the data files: xxx.0, xxx.1, xxx.2 and so on
            }
            magic = 781231;
    }
Next is NamespaceIndex::init: if the index has not been initialized yet it calls _init to do so, otherwise it does nothing. Going straight to NamespaceIndex::_init:

    NOINLINE_DECL void NamespaceIndex::_init() {
        unsigned long long len = 0;
        boost::filesystem::path nsPath = path(); // xxx.ns
        string pathString = nsPath.string();
        void *p = 0;
        if( boost::filesystem::exists(nsPath) ) { // if the file exists, memory-map it
            if( f.open(pathString, true) ) { // f is a MongoMMF object
                len = f.length();
                if ( len % (1024*1024) != 0 ) {
                    log() << "bad .ns file: " << pathString << endl;
                    uassert( 10079 ,  "bad .ns file length, cannot open database", len % (1024*1024) == 0 );
                }
                p = f.getView(); // pointer to the mapped file
            }
        }
        else {
            // use lenForNewNsFiles, we are making a new database
            massert( 10343, "bad lenForNewNsFiles", lenForNewNsFiles >= 1024*1024 );
            maybeMkdir();
            unsigned long long l = lenForNewNsFiles; // create the .ns file; the default size is 16MB and can be set
            if( f.create(pathString, l, true) ) {    // with --nssize (in MB), which only affects newly created databases
                getDur().createdFile(pathString, l); // always a new file
                len = l;
                verify( len == lenForNewNsFiles );
                p = f.getView();
            }
        }
        verify( len <= 0x7fffffff );
        ht = new HashTable<Namespace,NamespaceDetails>(p, (int) len, "namespace index");
        if( checkNsFilesOnLoad )
            ht->iterAll(namespaceOnLoadCallback);
    }
Next, the MongoMMF::open flow:

    bool MongoMMF::open(string fname, bool sequentialHint) {
        LOG(3) << "mmf open " << fname << endl;
        setPath(fname);
        _view_write = mapWithOptions(fname.c_str(), sequentialHint ? SEQUENTIAL : 0); // the actual mapping happens here
        return finishOpening();
    }
    bool MongoMMF::finishOpening() {
        if( _view_write ) {
            if( cmdLine.dur ) { // with journaling enabled, create an additional private map;
                                // the journal will be analyzed in a separate article
                _view_private = createPrivateMap();
                if( _view_private == 0 ) {
                    msgasserted(13636, str::stream() << "file " << filename() << " open/create failed in createPrivateMap (look in log for more information)");
                }
                privateViews.add(_view_private, this); // note that testIntent builds use this, even though it points to view_write then...
            }
            else {
                _view_private = _view_write;
            }
            return true;
        }
        return false;
    }
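
Under the hood, mapWithOptions and createPrivateMap boil down to memory-mapping the same file twice on POSIX systems: a shared view that the OS flushes back to disk, and, when journaling is on, a private copy-on-write view that writes land in first. A simplified POSIX sketch of those two mappings (mongod's MemoryMappedFile additionally handles Windows, alignment hints and the journal/msync machinery):

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    // Map one file twice, the way MongoMMF keeps _view_write and _view_private.
    bool mapBothViews(const char* path, void** sharedView, void** privateView) {
        int fd = open(path, O_RDWR);
        if (fd < 0) return false;
        struct stat st;
        if (fstat(fd, &st) != 0) { close(fd); return false; }
        // shared view: stores reach the file (this is what f.getView() returns)
        *sharedView  = mmap(nullptr, st.st_size, PROT_READ | PROT_WRITE, MAP_SHARED,  fd, 0);
        // private view: copy-on-write, used as the write target when journaling is enabled
        *privateView = mmap(nullptr, st.st_size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
        close(fd); // the mappings remain valid after the descriptor is closed
        return *sharedView != MAP_FAILED && *privateView != MAP_FAILED;
    }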
Back in NamespaceIndex::_init:

        ht = new HashTable<Namespace,NamespaceDetails>(p, (int) len, "namespace index");
It is worth pausing on the NamespaceDetails structure here. Every collection corresponds to one NamespaceDetails, whose role is described as follows (from the comment above the NamespaceDetails definition):

NamespaceDetails : this is the "header" for a collection that has all its details.
       It's in the .ns file and this is a memory mapped region (thus the pack pragma above).

    class NamespaceDetails {
    public:
        enum { NIndexesMax = 64, NIndexesExtra = 30, NIndexesBase  = 10 };
        /*-------- data fields, as present on disk : */
        DiskLoc firstExtent; // the first extent; mongodb's storage layout is discussed in detail when analyzing inserts
        DiskLoc lastExtent;  // the last extent
        /* NOTE: capped collections v1 override the meaning of deletedList.
                 deletedList[0] points to a list of free records (DeletedRecord's) for all extents in
                 the capped namespace.
                 deletedList[1] points to the last record in the prev extent.  When the "current extent"
                 changes, this value is updated.  !deletedList[1].isValid() when this value is not
                 yet computed.
        */
        DiskLoc deletedList[Buckets];
        // ofs 168 (8 byte aligned)
        struct Stats {
            // datasize and nrecords MUST Be adjacent code assumes!
            long long datasize; // this includes padding, but not record headers
            long long nrecords;
        } stats;
        int lastExtentSize;
        int nIndexes;
    private:
        // ofs 192
        IndexDetails _indexes[NIndexesBase]; // the first 10 indexes are stored here; if a collection has more than 10,
                                             // the rest live in an $extra block whose address is kept in extraOffset below
        // ofs 352 (16 byte aligned)
        int _isCapped;                         // there is wasted space here if I'm right (ERH)
        int _maxDocsInCapped;                  // max # of objects for a capped table.  TODO: should this be 64 bit?
        double _paddingFactor;                 // 1.0 = no padding.
        // ofs 386 (16)
        int _systemFlags; // things that the system sets/cares about
    public:
        DiskLoc capExtent;
        DiskLoc capFirstNewRecord;
        unsigned short dataFileVersion;       // NamespaceDetails version.  So we can do backward compatibility in the future. See filever.h
        unsigned short indexFileVersion;
        unsigned long long multiKeyIndexBits;
    private:
        // ofs 400 (16)
        unsigned long long reservedA;
        long long extraOffset;                // where the $extra info is located (bytes relative to this)
    public:
        int indexBuildInProgress;             // 1 if in prog
    private:
        int _userFlags;
        char reserved[72];
        /*-------- end data 496 bytes */
    };
From this it is clear that the .ns file holds the header information for every collection, including where the collection starts and ends and where its indexes live.
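
This header is also what bounds how many collections and indexes a database can have: each hash-table entry in the .ns file is one Namespace key (128 bytes) plus one NamespaceDetails (the 496 bytes above) plus a hash field, roughly 628 bytes in total. A quick back-of-the-envelope calculation using the 16MB default mentioned in _init (a sketch; the usable count is lower because the hash table is never filled completely):

    #include <cstdio>

    int main() {
        const long long nsFileBytes  = 16LL * 1024 * 1024; // default .ns size (--nssize 16)
        const int       bytesPerSlot = 128 + 496 + 4;      // Namespace key + NamespaceDetails + hash
        std::printf("slots in a 16MB .ns file: %lld\n", nsFileBytes / bytesPerSlot); // ~26,700
        // in practice roughly 24,000 namespaces (collections plus their indexes) fit,
        // since the hash table needs free slots to stay efficient
        return 0;
    }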

With _init finished, go back up to the Database::Database() constructor:

                if( _openAllFiles )
                    openAllFiles(); // maps every data file (xx.0, xx.1, ...) and records the mappings, in the same way
                                    // xx.ns was mapped; with journaling enabled two views are kept per file.
                                    // Not analyzed further here; dig in yourself if you are interested.
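
For context on what openAllFiles maps: the ordinary data files follow a doubling allocation scheme. Under the defaults of this era, <db>.0 is allocated at 64MB and each subsequent file doubles in size up to a ceiling of roughly 2GB (the --smallfiles option shrinks these numbers). A small sketch of that progression (the constants are the documented defaults, not values read from the source above):

    #include <algorithm>
    #include <cstdio>

    int main() {
        long long size = 64LL * 1024 * 1024;                // <db>.0 starts at 64MB
        const long long cap = 2LL * 1024 * 1024 * 1024;     // ~2GB ceiling (the source caps slightly below 2GB)
        for (int n = 0; n < 7; ++n) {
            std::printf("<db>.%d -> %lld MB\n", n, size / (1024 * 1024));
            size = std::min(size * 2, cap);                 // double until the cap is reached
        }
        return 0;
    }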
At this point the database mapping work is complete. Going back up to Client::Context::_finishInit, the next step is the permission check checkNsAccess, which ultimately calls the function below. It returns true when the request is authorized and false otherwise; a false return makes mongod send an "unauthorized" message to the client, and the client's operation fails.

    bool AuthenticationInfo::_isAuthorized(const string& dbname, Auth::Level level) const {
        if ( noauth ) { // set to false with --auth (or explicitly to true with --noauth);
                        // it defaults to true, i.e. authentication is off by default
            return true;
        }
        {
            scoped_spinlock lk(_lock);
            // check whether this connection has authenticated against dbname; the credentials were
            // cached when the client authenticated against the server after connecting
            if ( _isAuthorizedSingle_inlock( dbname , level ) )
                return true;
            if ( _isAuthorizedSingle_inlock( "admin" , level ) )
                return true;
            if ( _isAuthorizedSingle_inlock( "local" , level ) )
                return true;
        }
        // if none of the above checks passed, fall back to the localhost exception
        // (_isLocalHostAndLocalHostIsAuthorizedForAll): is this a local connection?
        return _isAuthorizedSpecialChecks( dbname );
    }
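
To summarize the decision above in one place, here is a condensed, standalone sketch of the same cascade (a simplification, not the real AuthenticationInfo class; localhostException stands in for whatever _isAuthorizedSpecialChecks decides):

    #include <cstdio>
    #include <set>
    #include <string>

    // Condensed model of the authorization cascade (a sketch only).
    bool isAuthorized(bool noauth,
                      const std::set<std::string>& authenticatedDbs,
                      const std::string& dbname,
                      bool localhostException) {
        if (noauth)
            return true;                                  // auth not enabled on this mongod
        if (authenticatedDbs.count(dbname) ||
            authenticatedDbs.count("admin") ||
            authenticatedDbs.count("local"))
            return true;                                  // credentials cached on this connection
        return localhostException;                        // the _isAuthorizedSpecialChecks fallback
    }

    int main() {
        std::set<std::string> creds = { "admin" };
        std::printf("%d\n", isAuthorized(false, creds, "test", false)); // 1: admin credentials grant access to test
        return 0;
    }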


       That is it for this article. The main goal was to clarify the execution flow from mongod receiving a client request through to loading the database, and in particular to understand the role of the .ns file and the mapping of the ordinary data files (xx.0, xx.1, ...). The next article continues with the handling of the query itself.


Original article: http://blog.csdn.net/yhjj0108/article/details/8255968

Author: yhjj0108 (杨浩)
















