The Linux filesystem from the kernel's point of view, and how commands like mount work (unfinished)

Source: Internet  Editor: 程序博客网  Time: 2024/06/05 15:52

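I have been playing with embedded Linux programming lately, and there were parts of the filesystem I did not really understand, such as how mount works. Being a bit obsessive, it kept nagging at me. Since I had read a few kernel books and had some background, I went straight to the relevant kernel code and traced through it, and also read some articles by more experienced people. I now have what I would call a reasonably deep understanding of the Linux filesystem.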

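Now to the point. Start with what a file is under Linux. In principle a file is just a pile of binary data, laid out in some format, stored on non-volatile memory, a disk, or a similar device. It is "information" that physically exists on hardware. Early disks have several platters coated with a magnetic medium and several read/write heads carrying coils. As the medium moves past a head, the changing magnetic field induces a current in the coil (there are separate read and write heads). Because the remanent magnetization of the medium can point in one of two directions, the induced current also has one of two directions, and the two directions stand for 0 and 1. Binary information is thus stored on the platters layer by layer. Each platter surface is divided by circles of different radii into tracks (rings), and a sector is just one of the equal parts a track is divided into.

That is roughly how a file sits on the hardware. Inside Linux, to make it convenient to manage all kinds of files on all kinds of filesystems (different filesystems, such as Ext3 and FAT32, organize and lay out the storage space differently), the kernel uses the VFS (virtual filesystem switch). The VFS manages files on demand: it only manages filesystems that have been mounted. At boot the kernel automatically mounts, at the root directory, the filesystem packed into the kernel image (this is decided when the kernel is built). If that sounds convoluted, take an example: a filesystem can itself be a single file stored on disk, and that file describes the layout of a storage space (that is, it contains files and folders).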

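In other words it is an image file, as shown in the figure. Until you mount it, the VFS will not "unfold" it (it would not know which directory entry to unfold it under anyway; directory entries are covered later, for now just think of one as a directory). Suppose we want to mount this filesystem image at part0. Before mounting, first look at what the part0 directory contains under the current filesystem.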

After mounting the filesystem image with mount (I analyze this process from the kernel's side later):

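You will see that the contents of the directory have changed. key.txt is in fact still on disk, but it does not belong to the filesystem that is now visible at that path, so you can no longer "see" it there. Run umount and the file comes back (the mount covering the directory is removed, so lookups reach the previous filesystem again).

A file name is not what the kernel uses to identify a file. The name lives in a directory entry, which the kernel caches in memory as a dentry object, while the file itself is represented by an inode (index node) object. Everything is an object: you never operate on a file directly, you reach it through a struct pointer and operate through that. Every "real" file has an inode, every inode has an inode number, and the kernel identifies a file by that number (within its filesystem), not by its name. In other words, the file's content and the information about it are kept apart, and the kernel manages files through the information it attaches to them.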
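To make "the kernel identifies a file by its inode number, not by its name" a little more concrete, here is a minimal user-space sketch. It only uses the POSIX `stat()` call; the helper names `same_file` and `print_identity` are made up for this illustration. The pair (`st_dev`, `st_ino`) is what user space gets to see of "which filesystem" and "which inode".

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: returns 1 if both paths resolve to the same inode
 * on the same filesystem, 0 if they differ, -1 if either stat() fails. */
int same_file(const char *a, const char *b)
{
    struct stat sa, sb;

    assert(a != NULL && b != NULL);
    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
        return -1;
    /* An inode number is only unique inside one filesystem, so the
     * device number has to match as well. */
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}

/* Hypothetical helper: prints the (device, inode number) pair of a path.
 * Returns 0 on success, -1 if stat() fails. */
int print_identity(const char *path)
{
    struct stat st;

    assert(path != NULL);
    if (stat(path, &st) != 0) {
        perror(path);
        return -1;
    }
    printf("%s: dev=%llu ino=%llu\n", path,
           (unsigned long long)st.st_dev, (unsigned long long)st.st_ino);
    return 0;
}
```

Calling `print_identity("part0")` before and after the mount in the example above would be one way to see that the same path name ends up at a different device and inode once another filesystem covers it.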

The VFS objects related to files are roughly the following:

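1. The filesystem object, called super_block in the kernel. Every mounted filesystem has a super block object. One filesystem type can have many mounted instances and therefore many super block objects; mounting the same device at a second mount point normally reuses the existing super block. The kernel source is below.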

struct super_block {
        struct list_head        s_list;         /* Keep this first. The kernel keeps all super blocks on one doubly linked list (highlighted in the original) */
        dev_t                   s_dev;          /* search index; _not_ kdev_t. Device identifier */
        unsigned char           s_blocksize_bits; /* block size as a bit count (s_blocksize == 1 << s_blocksize_bits); anyone who has used dd has met block-sized I/O */
        unsigned long           s_blocksize;    /* block size in bytes */
        loff_t                  s_maxbytes;     /* Max file size */
        struct file_system_type *s_type;        /* filesystem type of this super block */
        const struct super_operations   *s_op;  /* super block methods */
        const struct dquot_operations   *dq_op; /* disk quota methods */
        const struct quotactl_ops       *s_qcop;
        const struct export_operations *s_export_op;
        unsigned long           s_flags;        /* mount flags */
        unsigned long           s_iflags;       /* internal SB_I_* flags */
        unsigned long           s_magic;
        struct dentry           *s_root;        /* dentry of this filesystem's root directory */
        struct rw_semaphore     s_umount;       /* unmount semaphore */
        int                     s_count;        /* super block reference count */
        atomic_t                s_active;
#ifdef CONFIG_SECURITY
        void                    *s_security;    /* security module data */
#endif
        const struct xattr_handler **s_xattr;

        const struct fscrypt_operations *s_cop;

        struct hlist_bl_head    s_anon;         /* anonymous dentries for (nfs) exporting */
        struct list_head        s_mounts;       /* list of mounts; _not_ for fs use */
        struct block_device     *s_bdev;
        struct backing_dev_info *s_bdi;
        struct mtd_info         *s_mtd;
        struct hlist_node       s_instances;
        unsigned int            s_quota_types;  /* Bitmask of supported quota types */
        struct quota_info       s_dquot;        /* Diskquota specific options */

        struct sb_writers       s_writers;

        char s_id[32];                          /* Informational name */
        u8 s_uuid[16];                          /* UUID */

        void                    *s_fs_info;     /* Filesystem private info */
        unsigned int            s_max_links;
        fmode_t                 s_mode;

        /* Granularity of c/m/atime in ns.
           Cannot be worse than a second */
        u32                s_time_gran;

        /*
         * The next field is for VFS *only*. No filesystems have any business
         * even looking at it. You had been warned.
         */
        struct mutex s_vfs_rename_mutex;        /* Kludge */

        /*
         * Filesystem subtype.  If non-empty the filesystem type field
         * in /proc/mounts will be "type.subtype"
         */
        char *s_subtype;

        /*
         * Saved mount options for lazy filesystems using
         * generic_show_options()
         */
        char __rcu *s_options;
        const struct dentry_operations *s_d_op; /* default d_op for dentries (dentry methods) */

        /*
         * Saved pool identifier for cleancache (-1 means none)
         */
        int cleancache_poolid;

        struct shrinker s_shrink;       /* per-sb shrinker handle */

        /* Number of inodes with nlink == 0 but still referenced */
        atomic_long_t s_remove_count;

        /* Being remounted read-only */
        int s_readonly_remount;

        /* AIO completions deferred from interrupt context */
        struct workqueue_struct *s_dio_done_wq;
        struct hlist_head s_pins;

        /*
         * Keep the lru lists last in the structure so they always sit on their
         * own individual cachelines.
         */
        struct list_lru         s_dentry_lru ____cacheline_aligned_in_smp;
        struct list_lru         s_inode_lru ____cacheline_aligned_in_smp;
        struct rcu_head         rcu;
        struct work_struct      destroy_work;

        struct mutex            s_sync_lock;    /* sync serialisation lock (super block mutex) */

        /*
         * Indicates how deep in a filesystem stack this SB is
         */
        int s_stack_depth;

        /* s_inode_list_lock protects s_inodes */
        spinlock_t              s_inode_list_lock ____cacheline_aligned_in_smp;
        struct list_head        s_inodes;       /* all inodes. Head of the list of every inode under this super block (highlighted in the original) */
};

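I will ignore the more advanced fields today (to be honest I have not studied them much :)). Looking only at the highlighted fields, s_list and s_inodes, you should be able to picture a rough data structure model. All super block objects are chained together on a doubly linked list that the kernel maintains, and each super block ties together all the files beneath it through s_inodes. The filesystem type, reached through the s_type pointer, is a bit like a class: with mount we create an instance of that class (a super block) and attach it under some directory entry. The field

struct dentry           *s_root

points to the dentry of the mounted filesystem's own root directory, which is what becomes visible at the mount point. The operations on a super block are likewise packed into the struct of function pointers that s_op points to, a bit like the methods of a Java object; I will not discuss them here. As you read on you will find that the inode has a field pointing to the super block it belongs to, and so does the dentry. The pointers close into a circle, and that is what makes this file management scheme work so neatly.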
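The "closed circle" of pointers can be sketched in a few lines of user-space C. This is a toy model, not kernel code: the `toy_*` types and functions are invented for illustration, and a plain singly linked list stands in for the kernel's `list_head` on `s_inodes`.

```c
#include <assert.h>
#include <stddef.h>

struct toy_sb;

struct toy_inode {
    unsigned long      i_ino;
    struct toy_sb     *i_sb;     /* back pointer, like inode->i_sb */
    struct toy_inode  *next;     /* stands in for the s_inodes list linkage */
};

struct toy_dentry {
    const char        *d_name;
    struct toy_inode  *d_inode;  /* like dentry->d_inode */
    struct toy_sb     *d_sb;     /* like dentry->d_sb */
};

struct toy_sb {
    const char        *fs_type;  /* stands in for s_type */
    struct toy_dentry *s_root;   /* root dentry of this filesystem */
    struct toy_inode  *s_inodes; /* head of this filesystem's inode list */
};

/* Put an inode on the super block's list and set its back pointer. */
void toy_add_inode(struct toy_sb *sb, struct toy_inode *inode, unsigned long ino)
{
    assert(sb != NULL && inode != NULL);
    inode->i_ino = ino;
    inode->i_sb = sb;
    inode->next = sb->s_inodes;
    sb->s_inodes = inode;
}

/* Make `root` the root dentry of `sb`, backed by `inode`. */
void toy_set_root(struct toy_sb *sb, struct toy_dentry *root, struct toy_inode *inode)
{
    assert(sb != NULL && root != NULL && inode != NULL);
    root->d_name = "/";
    root->d_inode = inode;
    root->d_sb = sb;
    sb->s_root = root;
}

/* Walk the inode list, the way s_inodes lets the kernel reach every inode. */
size_t toy_count_inodes(const struct toy_sb *sb)
{
    size_t n = 0;
    for (const struct toy_inode *i = sb->s_inodes; i != NULL; i = i->next)
        n++;
    return n;
}

/* 1 if super block -> root dentry -> inode -> super block closes the loop. */
int toy_loop_closes(const struct toy_sb *sb)
{
    return sb->s_root != NULL
        && sb->s_root->d_sb == sb
        && sb->s_root->d_inode != NULL
        && sb->s_root->d_inode->i_sb == sb;
}
```

Starting from any one of the three objects you can reach the other two, which is the property the paragraph above describes.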

2. The file's index node object, called inode in the kernel

struct inode {
        umode_t                 i_mode;         /* file type and access permissions */
        unsigned short          i_opflags;      /* flags caching properties of i_op */
        kuid_t                  i_uid;          /* user id of the owner */
        kgid_t                  i_gid;          /* group id of the owner */
        unsigned int            i_flags;        /* filesystem flags */

#ifdef CONFIG_FS_POSIX_ACL
        struct posix_acl        *i_acl;
        struct posix_acl        *i_default_acl;
#endif

        const struct inode_operations   *i_op;  /* inode operations */
        struct super_block      *i_sb;          /* super block that the file behind this inode belongs to (highlighted in the original) */
        struct address_space    *i_mapping;     /* address mapping */

#ifdef CONFIG_SECURITY
        void                    *i_security;    /* security module data */
#endif

        /* Stat data, not accessed from path walking */
        unsigned long           i_ino;
        /*
         * Filesystems may only read i_nlink directly.  They shall use the
         * following functions for modification:
         *
         *    (set|clear|inc|drop)_nlink
         *    inode_(inc|dec)_link_count
         */
        union {
                const unsigned int i_nlink;     /* number of hard links */
                unsigned int __i_nlink;
        };
        dev_t                   i_rdev;
        loff_t                  i_size;         /* file size in bytes */
        struct timespec         i_atime;        /* last access time */
        struct timespec         i_mtime;        /* last modification time */
        struct timespec         i_ctime;        /* last time the inode itself changed */
        spinlock_t              i_lock; /* i_blocks, i_bytes, maybe i_size (spinlock) */
        unsigned short          i_bytes;        /* bytes used in the last block */
        unsigned int            i_blkbits;
        blkcnt_t                i_blocks;

#ifdef __NEED_I_SIZE_ORDERED
        seqcount_t              i_size_seqcount;
#endif

        /* Misc */
        unsigned long           i_state;
        struct rw_semaphore     i_rwsem;

        unsigned long           dirtied_when;   /* jiffies of first dirtying */
        unsigned long           dirtied_time_when;

        struct hlist_node       i_hash;         /* hash list linkage */
        struct list_head        i_io_list;      /* backing dev IO list */
#ifdef CONFIG_CGROUP_WRITEBACK
        struct bdi_writeback    *i_wb;          /* the associated cgroup wb */

        /* foreign inode detection, see wbc_detach_inode() */
        int                     i_wb_frn_winner;
        u16                     i_wb_frn_avg_time;
        u16                     i_wb_frn_history;
#endif
        struct list_head        i_lru;          /* inode LRU list */
        struct list_head        i_sb_list;
        union {
                struct hlist_head       i_dentry;
                struct rcu_head         i_rcu;
        };
        u64                     i_version;      /* version number */
        atomic_t                i_count;
        atomic_t                i_dio_count;
        atomic_t                i_writecount;
#ifdef CONFIG_IMA
        atomic_t                i_readcount; /* struct files open RO */
#endif
        const struct file_operations    *i_fop; /* former ->i_op->default_file_ops */
        struct file_lock_context        *i_flctx;
        struct address_space    i_data;
        struct list_head        i_devices;
        union {
                struct pipe_inode_info  *i_pipe;        /* pipe information, when the inode is a pipe */
                struct block_device     *i_bdev;
                struct cdev             *i_cdev;
                char                    *i_link;
                unsigned                i_dir_seq;
        };

        __u32                   i_generation;

#ifdef CONFIG_FSNOTIFY
        __u32                   i_fsnotify_mask; /* all events this inode cares about */
        struct hlist_head       i_fsnotify_marks;
#endif

#if IS_ENABLED(CONFIG_FS_ENCRYPTION)
        struct fscrypt_info     *i_crypt_info;
#endif

        void                    *i_private; /* fs or device private pointer */
};

As the kernel code above shows, the inode holds a lot of information about a file, and much of it can be displayed with ls.

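As the screenshot above shows, what ls -l prints is really the values of some fields of the inode struct. Here I created a hard link to an img image file, and you can see that the inode numbers of the two are identical: both names point to the same file on disk, only the names differ. The second column of ls -l shows the file's hard link count. It is loosely like two differently named Java references to one object. Now think back to the first example: why did key.txt disappear? That is a kind of encapsulation by the filesystem's super block object. The filesystem comes first and the file second; you can only reach a file through a filesystem. It is a parent and child relationship. In the inode,

struct super_block      *i_sb

points to the super block of the filesystem the file belongs to (i_sb... a name that sticks). This guarantees the independence of each filesystem: it does not acknowledge anything that is not its own child, even when the directory entry is the same. As with the super block, I will not go through the inode methods; see the kernel source for the details.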
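The hard link experiment can also be reproduced from a program instead of the shell. The sketch below is only an illustration under a few assumptions: it uses a scratch file under `/tmp` (assumed writable and on a filesystem that supports hard links), and `hard_link_demo` is a made-up name. It relies only on the POSIX calls `mkstemp()`, `link()`, `stat()` and `unlink()`.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Creates a temporary file, gives it a second name with link(), and checks
 * that both names share one inode. Returns the link count reported for the
 * second name, or -1 on any error or if the inodes do not match. */
long hard_link_demo(void)
{
    char orig[] = "/tmp/vfs_demo_XXXXXX";   /* assumed scratch location */
    char alias[64];
    struct stat a, b;
    long result = -1;

    int fd = mkstemp(orig);                 /* new file with a single name */
    if (fd < 0)
        return -1;
    close(fd);

    int n = snprintf(alias, sizeof alias, "%s.lnk", orig);
    assert(n > 0 && (size_t)n < sizeof alias);

    if (link(orig, alias) == 0) {           /* second name for the same inode */
        if (stat(orig, &a) == 0 && stat(alias, &b) == 0 &&
            a.st_dev == b.st_dev && a.st_ino == b.st_ino)
            result = (long)b.st_nlink;      /* what ls -l shows in column two */
        unlink(alias);
    }
    unlink(orig);                           /* last name gone, inode can be freed */
    return result;
}
```

On an ordinary Linux filesystem the function should return 2: one inode, two names.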

3. The directory entry object (dentry). The kernel creates this object to make filesystems easier to manage.

struct dentry {
        /* RCU lookup touched fields */
        unsigned int d_flags;           /* protected by d_lock */
        seqcount_t d_seq;               /* per dentry seqlock */
        struct hlist_bl_node d_hash;    /* lookup hash list */
        struct dentry *d_parent;        /* parent directory */
        struct qstr d_name;
        struct inode *d_inode;          /* Where the name belongs to - NULL is
                                         * negative. The inode associated with
                                         * this dentry (highlighted in the original) */
        unsigned char d_iname[DNAME_INLINE_LEN];        /* small names */

        /* Ref lookup also touches following */
        struct lockref d_lockref;       /* per-dentry lock and refcount */
        const struct dentry_operations *d_op;
        struct super_block *d_sb;       /* The root of the dentry tree. Points to the owning super block (highlighted in the original) */
        unsigned long d_time;           /* used by d_revalidate */
        void *d_fsdata;                 /* fs-specific data */

        union {
                struct list_head d_lru;         /* LRU list */
                wait_queue_head_t *d_wait;      /* in-lookup ones only */
        };
        struct list_head d_child;       /* child of parent list */
        struct list_head d_subdirs;     /* our children */
        /*
         * d_alias and d_rcu can share memory
         */
        union {
                struct hlist_node d_alias;      /* inode alias list */
                struct hlist_bl_node d_in_lookup_hash;  /* only for in-lookup ones */
                struct rcu_head d_rcu;
        } d_u;
};

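Taken together, the dentries form a tree-like structure with parent and child nodes, and each dentry points back to its super block through d_sb. Unlike the inode, which mirrors a static data structure stored on disk, the dentry object exists only in memory. At run time one inode can be linked to several dentries (they hang off the inode's i_dentry list through d_alias). The number of names is counted by the inode's i_nlink; d_count, kept inside d_lockref, is only the reference count of the dentry itself. Several names for one inode is exactly what a hard link is: the links sit at different places relative to the root directory but all point to the same file. Deleting one of them does not affect the file on disk; the kernel only frees the file once every name has been removed (and no process still has it open). An inode belongs to exactly one super block, and an inode number only means something inside that filesystem, just as a son cannot have two fathers while a father can have twins or more. That is why a hard link cannot cross filesystems.

A soft (symbolic) link, on the other hand, creates a different file with its own inode number. It links by path name rather than by inode, so it can even be created before the target exists (it just does not resolve yet). Because what a path resolves to depends on which filesystem is mounted along the way, as with key.txt in the first example, a symbolic link can cross filesystems. There is also the dentry cache, which I have not covered; it is a measure that speeds up lookups.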
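The claim that a symbolic link can be created before its target exists is easy to check from user space. The following is a small sketch, assuming a writable `/tmp`; `dangling_symlink_demo` is a hypothetical name, and only POSIX calls (`mkdtemp()`, `symlink()`, `lstat()`, `stat()`, `open()`) are used. For comparison, `link()` across two filesystems fails with `EXDEV`, which this sketch does not try to reproduce because it would need a second mounted filesystem.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Creates a symlink to a path that does not exist yet and records what
 * lstat() and stat() report. Returns a bitmask, or -1 on setup failure:
 *   bit 0 (1): lstat() sees the link itself, and it is a symlink
 *   bit 1 (2): stat() fails with ENOENT while the target is missing
 *   bit 2 (4): stat() succeeds once the target has been created */
int dangling_symlink_demo(void)
{
    char dir[] = "/tmp/vfs_demo_XXXXXX";    /* assumed scratch location */
    char target[64], name[64];
    struct stat st;
    int result = 0;

    if (mkdtemp(dir) == NULL)
        return -1;
    int n1 = snprintf(target, sizeof target, "%s/target", dir);
    int n2 = snprintf(name, sizeof name, "%s/link", dir);
    assert(n1 > 0 && (size_t)n1 < sizeof target);
    assert(n2 > 0 && (size_t)n2 < sizeof name);

    if (symlink(target, name) != 0) {       /* target does not exist yet */
        rmdir(dir);
        return -1;
    }
    if (lstat(name, &st) == 0 && S_ISLNK(st.st_mode))
        result |= 1;                        /* the link has its own inode */
    if (stat(name, &st) != 0 && errno == ENOENT)
        result |= 2;                        /* following it fails for now */

    int fd = open(target, O_CREAT | O_WRONLY, 0600);
    if (fd >= 0) {
        close(fd);
        if (stat(name, &st) == 0 && S_ISREG(st.st_mode))
            result |= 4;                    /* now the path resolves */
        unlink(target);
    }
    unlink(name);
    rmdir(dir);
    return result;
}
```

If all three observations hold, the function returns 7. The link is resolved by path each time it is followed, which is why it starts working as soon as the target appears.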

Now let us look at part of the kernel source in mount.h:

struct vfsmount {
        struct dentry *mnt_root;        /* root of the mounted tree */
        struct super_block *mnt_sb;     /* pointer to superblock */
        int mnt_flags;
};

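By now I trust you can see how mount works. From the filesystem image, mount creates a super block object (mnt_sb), and the root directory of that super block (mnt_root) is attached at the directory entry given as mount's last argument, so lookups that reach that directory continue in the new filesystem. mnt_flags records the mount mode, which is how, for example, read-only protection is applied.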
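One user-visible consequence of this design is that crossing a mount point changes the device number that `stat()` reports, because the path now belongs to a different super block. The sketch below uses that as a heuristic for spotting mount points. It is an approximation, not how the kernel tracks mounts: bind mounts within one filesystem are not detected, and `looks_like_mount_point` is an invented name.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/* Heuristic: returns 1 if `dir` looks like the root of a mounted
 * filesystem, 0 if not, -1 on error (missing path, not a directory,
 * path too long). */
int looks_like_mount_point(const char *dir)
{
    char parent[4096];
    struct stat self, up;

    assert(dir != NULL);
    int n = snprintf(parent, sizeof parent, "%s/..", dir);
    if (n < 0 || (size_t)n >= sizeof parent)
        return -1;
    if (lstat(dir, &self) != 0 || !S_ISDIR(self.st_mode))
        return -1;
    if (stat(parent, &up) != 0)
        return -1;

    /* A different device than the parent means a different super block. */
    if (self.st_dev != up.st_dev)
        return 1;
    /* Same device and same inode as ".." only happens at the root "/". */
    return self.st_ino == up.st_ino;
}
```

In the part0 example, this check would be expected to return 0 for part0 before the mount and 1 after it, provided the image is mounted as its own filesystem.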

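4. The file object. It is created when a process opens a file, and it exists to manage several processes using one file at the same time, as well as a single flow of execution inside one process opening a file several times. Each process descriptor task_struct has a files pointer to the process's open-file table (files_struct), which maps file descriptors to file objects; the separate fs field (fs_struct) holds the root and current working directory. For reasons of space I will not go into detail for now.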
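Although the file object is not covered in detail here, its most visible property, the per-open file offset, can be observed from user space. This is a rough sketch assuming a writable `/tmp`; `file_object_demo` is a made-up name. Two separate `open()` calls on one path get two file objects with independent offsets, while `dup()` yields a second descriptor for the same file object.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Returns a bitmask, or -1 on setup failure:
 *   bit 0 (1): a second open() of the same path starts at offset 0
 *   bit 1 (2): a dup()ed descriptor shares the original's offset */
int file_object_demo(void)
{
    char path[] = "/tmp/vfs_demo_XXXXXX";   /* assumed scratch location */
    int result = 0;

    int fd1 = mkstemp(path);                /* first file object */
    if (fd1 < 0)
        return -1;
    if (write(fd1, "hello", 5) != 5) {      /* moves fd1's offset to 5 */
        close(fd1);
        unlink(path);
        return -1;
    }

    int fd2 = open(path, O_RDONLY);         /* second, independent file object */
    int fd3 = dup(fd1);                     /* new descriptor, same file object */
    if (fd2 >= 0 && fd3 >= 0) {
        assert(fd1 != fd2 && fd1 != fd3 && fd2 != fd3);
        if (lseek(fd2, 0, SEEK_CUR) == 0)
            result |= 1;
        if (lseek(fd3, 0, SEEK_CUR) == 5)
            result |= 2;
    }
    if (fd2 >= 0)
        close(fd2);
    if (fd3 >= 0)
        close(fd3);
    close(fd1);
    unlink(path);
    return result;
}
```

A return value of 3 means both observations held: all three descriptors lead to the same inode, but only fd1 and fd3 share a file object.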

Summary:

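From the description above you should get a feel for how ingeniously the Linux filesystem is built, layer upon layer, each part locked into the next. Super block objects, dentries and inodes all hold pointers to one another, and together they hide the hardware details of the disk. Keeping these objects cached in memory also speeds up file access, and hard links let several names share one copy of the data on disk.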

0 0