A Brief Analysis of the PPC Device Tree Mechanism

Source: Internet | Editor: 程序博客网 | Time: 2024/05/16 09:40

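At the end of the year I finished porting our company's devices from ARM to PPC and learned a lot that is worth summarizing. Things are not too busy after the New Year, so I am writing it down.
This was also my first time working with a PPC kernel (version 3.4.55), and I have not studied many parts in depth, so I will only sketch the overall framework without going into details. Please point out any mistakes.
Today I will start with PPC's Device Tree mechanism. When I previously ported U-Boot and the kernel on ARM, the mechanism for passing parameters from U-Boot to the kernel was selectable on that architecture: either tags or FDT (flattened device tree). I chose tags, and I have written up the tags parameter-passing method before in another article, linked below: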
http://blog.csdn.net/skyflying2012/article/details/35787971
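However, after reading the PPC kernel's startup code, I found that the PPC kernel accepts boot parameters only through FDT, so I took this opportunity to learn the FDT mechanism.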
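1 Why use FDT, and what are its advantages?
The official explanation I found online is as follows:
Servers from IBM, Sun and other vendors originally used firmware (a program embedded in a hardware device that provides an interface between software and hardware) to initialize the system configuration, provide an interface between the operating system and the hardware, and boot and run the system. Later, for standardization and compatibility, IBM, Sun and others jointly introduced the IEEE 1275 firmware interface standard, so that their servers, such as IBM PowerPC pSeries, Apple PowerPC and Sun SPARC, all adopted Open Firmware, which builds a device tree describing the system hardware at runtime and passes it to the kernel to boot and run the system. The benefits are that this reduces the kernel's heavy dependence on the system hardware, speeds up development of support packages, lowers the cost and change requirements caused by hardware, and lowers the demands on kernel design and compilation.
In embedded PowerPC systems, boot code such as U-Boot is generally used instead of Open Firmware. Early U-Boot passed basic board information to the kernel through the static data structure struct bd_t in include/asm-ppc/u-boot.h, leaving everything else to the kernel. This interface is not flexible enough: whenever the hardware changes, the boot code and the kernel must be customized, recompiled and reflashed, and it no longer suits modern kernels. To keep up with kernel development and the enormous variety of embedded PowerPC platforms, and to absorb the strengths of standard Open Firmware, U-Boot introduced a dynamic interface, the flattened device tree (FDT). It uses a separate FDT blob (binary large object, a container that can hold binary data) to store the parameters passed to the kernel. Some fixed information, such as cache sizes and interrupt routing, is provided directly by the device tree, while other information, such as the eTSEC MAC addresses, frequencies and the number of PCI buses, is modified by U-Boot at runtime.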

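My understanding is that, to suit flexible embedded platforms, FDT moves parameters that are fixed but must be adjusted by hand (such as bd_t in U-Boot) out of U-Boot and the kernel. After the hardware changes, U-Boot and the kernel do not need to be modified and reflashed; editing the FDT file alone is enough to support the new hardware. Some dynamic information, however, still has to be handled by U-Boot and the kernel, such as the cmdline and the devices enumerated on USB and PCI.
By comparison, the tags approach used on ARM requires modifying the tags in U-Boot (such as the memory size) to support new hardware.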
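2 How is FDT used, and what is its format?
An FDT can be seen as a linear, tree-shaped data structure that describes a device's hardware configuration. Developers write the device tree to match the hardware, using a human-readable text form called dts (device tree source), and then use dtc (device tree compiler) to compile it into the device tree image the kernel needs, the dtb. The dtc compiler performs syntax and semantic checks on its input, checks every node and property against the requirements of the Linux kernel, and compiles the device tree source (.dts) into a binary file (.dtb) so that the kernel can boot correctly. A simple example follows: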

```dts
/dts-v1/;

/ {
    #address-cells = <1>;
    #size-cells = <1>;
    model = "test";
    compatible = "test";
    dcr-parent = <&{/cpus/cpu@0}>;

    cpus {
        #address-cells = <1>;
        #size-cells = <0>;

        cpu@0 {
            device_type = "cpu";
            model = "PowerPC,460EX";
            reg = <0x00000000>;
            i-cache-line-size = <32>;
            d-cache-line-size = <32>;
            i-cache-size = <32768>;
            d-cache-size = <32768>;
            dcr-controller;
            dcr-access-method = "native";
        };
    };

    memory {
        device_type = "memory";
        reg = <0x80000000 0x40000000>;
    };

    chosen {
        name = "chosen";
        bootargs = "console=ttyS0,115200 mem=512M rdinit=/sbin/init";
    };
};
```

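I adapted this file from a dts that ships with the kernel. The kernel already provides dts files for many boards under arch/powerpc/boot/dts and bundles the dtc compiler. My dts above is saved as arch/powerpc/boot/dts/test.dts, so I can run the following command in the kernel tree: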

```shell
make test.dtb
```

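This generates the corresponding dtb image.
Developers work directly with the dts file, so next let's look at its format.
(There are many detailed explanations of the dts format online, and the kernel also ships detailed documentation in Documentation/devicetree/booting-without-of.txt.)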
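1 Root node
The starting point of the device tree is called the root node, "/". The model property names the target board platform or module, and the compatible property names boards in the same family that are compatible with the target board. For most 32-bit platforms, #address-cells and #size-cells are both 1; they specify how many 32-bit cells encode the address and the length, respectively, in the reg properties of child nodes.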
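2 CPU node
The /cpus node is a child of the root node, and every CPU in the system has a corresponding node under it. /cpus has no mandatory properties, but specifying #address-cells = <1> and #size-cells = <0> is good practice: this also fixes the format of each CPU node's reg property, which is then simply the physical CPU number. The unit name of a CPU node should take the form cpu@0. A CPU node usually specifies device_type (always "cpu"), the L1 data/instruction cache line sizes, the L1 data/instruction cache sizes, and the core and bus clock frequencies. The clock-frequency properties are typically filled in dynamically by the boot code; the example above does not include them.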
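3 System memory node
This node describes the physical memory ranges on the target board and is generally called the /memory node; there can be one or more of them. When there are several, each must be followed by a unit address to distinguish them; when there is only one, the unit address can be omitted and defaults to 0.
The node holds the properties of the board's physical memory and usually specifies device_type (always "memory") and reg. The reg value takes the form <start-address size>; in the example above, the board's memory starts at 0x80000000 and is 1 GB (0x40000000 bytes) in size.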
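The root-node and memory-node rules combine when a reg property is decoded: the parent's #address-cells and #size-cells say how many 32-bit cells make up each address and each size. Below is a minimal sketch, assuming the cells have already been converted from big-endian to CPU byte order; dt_read_cells and dt_decode_reg are hypothetical helpers, loosely modeled on the idea behind the kernel's of_read_number().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Combine n consecutive 32-bit cells (already in CPU byte order) into one
 * value, most significant cell first. Hypothetical helper; at most two
 * cells fit in a uint64_t. */
static uint64_t dt_read_cells(const uint32_t *cells, int n)
{
    uint64_t v = 0;

    assert(n >= 0 && n <= 2);
    for (int i = 0; i < n; i++)
        v = (v << 32) | cells[i];
    return v;
}

struct mem_range {
    uint64_t base;
    uint64_t size;
};

/* Split a reg property into (base, size) pairs. addr_cells and size_cells
 * are the parent's #address-cells and #size-cells. Returns the number of
 * complete pairs written to out. */
static size_t dt_decode_reg(const uint32_t *cells, size_t ncells,
                            int addr_cells, int size_cells,
                            struct mem_range *out, size_t max_out)
{
    size_t stride = (size_t)(addr_cells + size_cells);
    size_t count = 0;

    if (stride == 0)
        return 0;
    for (size_t i = 0; i + stride <= ncells && count < max_out; i += stride) {
        out[count].base = dt_read_cells(&cells[i], addr_cells);
        out[count].size = dt_read_cells(&cells[i + addr_cells], size_cells);
        count++;
    }
    return count;
}
```

With the example tree (#address-cells = <1>, #size-cells = <1>), reg = <0x80000000 0x40000000> decodes to base 0x80000000 and size 0x40000000 (1 GB). A board using two cells for each would write reg = <0x0 0x80000000 0x0 0x40000000> for the same range, and a CPU node under /cpus (#size-cells = <0>) gets a size of 0.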
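4 /chosen node
This node is somewhat special. Open Firmware normally uses it to store variable environment information, such as arguments and the default input/output devices.
This node usually specifies the bootargs and linux,stdout-path properties. bootargs holds the argument string passed to the kernel as its command line. linux,stdout-path is usually the path of the node for the standard console device, which the kernel uses as its default console. U-Boot has supported the flattened device tree since version 1.3.0. After U-Boot loads the Linux kernel, the ramdisk filesystem (if one is used) and the device tree binary into physical memory, it modifies the device tree binary before starting the kernel, filling in necessary information such as MAC addresses and the number of PCI buses. U-Boot also fills in the /chosen node of the device tree with information such as the serial port and the root device (ramdisk, hard disk or NFS boot). In U-Boot's source, common/cmd_bootm.c calls the ft_setup function to fill in the device tree before the kernel is executed.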
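The bootargs string in the example carries mem=512M, and early_init_devtree() in arch/powerpc/kernel/prom.c relies on mem= having been parsed before it caps the memory size. The sketch below shows, under simplifying assumptions, how a size argument with a K/M/G suffix can be pulled out of a bootargs string; bootargs_mem_limit is a hypothetical helper, not kernel code (the kernel handles mem= through its early-parameter machinery and memparse()).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: find "mem=<size>[K|M|G]" in a bootargs string and
 * return the size in bytes, or 0 if no such argument is present. */
static uint64_t bootargs_mem_limit(const char *bootargs)
{
    const char *p = bootargs;

    assert(bootargs != NULL);
    while ((p = strstr(p, "mem=")) != NULL) {
        /* Only accept "mem=" at the start of an argument, not "xmem=". */
        if (p == bootargs || p[-1] == ' ') {
            char *end;
            uint64_t v = strtoull(p + 4, &end, 0);  /* base 0: 0x.. allowed */

            switch (*end) {
            case 'G': case 'g': v <<= 10; /* fall through */
            case 'M': case 'm': v <<= 10; /* fall through */
            case 'K': case 'k': v <<= 10; break;
            default: break;                /* plain byte count */
            }
            return v;
        }
        p += 4;
    }
    return 0;
}
```

For the example bootargs "console=ttyS0,115200 mem=512M rdinit=/sbin/init", this returns 512 MiB (536870912 bytes), which is smaller than the 1 GB described by the memory node, so the limit would take effect.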
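Most of a typical dts is devoted to the hardware configuration of the SoC's on-chip peripherals. During the port, to keep the code that originally depended on the ARM framework (which did not use FDT) unchanged, I avoided using the device tree in the module drivers as much as possible, so my dts contains no peripheral configuration. I will study this part in detail when I have time.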

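3 Storage format of the dtb image
These days, when I study code, I no longer dig into every detail the way I did right after graduating. Instead I try to get the big picture and understand the framework first, and study the details only when needed. I think this is a kind of progress too; it lets me navigate the vast sea of the kernel a little more calmly.
When studying code, I always try to understand both the reason (why it is done this way) and the method (how it is done).
First, let's look at the format of the dtb image that dtc produces from a dts file.
1 A device tree blob consists of three main parts: the header, the structure block and the strings block (the header also locates a memory reserve map through its off_mem_rsvmap field). Their layout in memory is shown in the figure below: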
[Figure: memory layout of the device tree blob: header, structure block, strings block]

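The header describes basic information about the device tree, such as the magic number, the total size of the blob and the offset of the structure block. Its layout is defined by struct boot_param_header, shown below. All values in this structure are big-endian, and all offsets are relative to the start of the header.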

```c
/*
 * This is what gets passed to the kernel by prom_init or kexec
 *
 * The dt struct contains the device tree structure, full pathes and
 * property contents. The dt strings contain a separate block with just
 * the strings for the property names, and is fully page aligned and
 * self contained in a page, so that it can be kept around by the kernel,
 * each property name appears only once in this page (cheap compression)
 *
 * the mem_rsvmap contains a map of reserved ranges of physical memory,
 * passing it here instead of in the device-tree itself greatly simplifies
 * the job of everybody. It's just a list of u64 pairs (base/size) that
 * ends when size is 0
 */
struct boot_param_header {
    __be32  magic;              /* magic word OF_DT_HEADER */
    __be32  totalsize;          /* total size of DT block */
    __be32  off_dt_struct;      /* offset to structure */
    __be32  off_dt_strings;     /* offset to strings */
    __be32  off_mem_rsvmap;     /* offset to memory reserve map */
    __be32  version;            /* format version */
    __be32  last_comp_version;  /* last compatible version */
    /* version 2 fields below */
    __be32  boot_cpuid_phys;    /* Physical CPU id we're booting on */
    /* version 3 fields below */
    __be32  dt_strings_size;    /* size of the DT strings block */
    /* version 17 fields below */
    __be32  dt_struct_size;     /* size of the DT structure block */
};
```
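To make the header layout concrete, here is a minimal sketch that validates a blob's magic number and extracts the offsets needed to find the other blocks. It reads each field byte by byte as a big-endian value, so it works regardless of host endianness. FDT_MAGIC is 0xd00dfeed, the value of OF_DT_HEADER; struct dt_header_info and dt_parse_header are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FDT_MAGIC 0xd00dfeedu  /* value of OF_DT_HEADER */

/* Read a big-endian 32-bit value from a byte buffer (any host endianness). */
static uint32_t be32_at(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Hypothetical subset of boot_param_header needed to locate the blocks. */
struct dt_header_info {
    uint32_t totalsize;
    uint32_t off_dt_struct;
    uint32_t off_dt_strings;
    uint32_t version;
};

/* Byte offsets follow the struct layout above, 4 bytes per field:
 * magic=0, totalsize=4, off_dt_struct=8, off_dt_strings=12,
 * off_mem_rsvmap=16, version=20, last_comp_version=24.
 * Returns 0 on success, -1 on a short buffer, bad magic or bad offsets. */
static int dt_parse_header(const uint8_t *blob, size_t len,
                           struct dt_header_info *out)
{
    assert(blob != NULL && out != NULL);
    if (len < 28)
        return -1;
    if (be32_at(blob) != FDT_MAGIC)
        return -1;
    out->totalsize = be32_at(blob + 4);
    out->off_dt_struct = be32_at(blob + 8);
    out->off_dt_strings = be32_at(blob + 12);
    out->version = be32_at(blob + 20);
    if (out->totalsize > len || out->off_dt_struct > out->totalsize ||
        out->off_dt_strings > out->totalsize)
        return -1;
    return 0;
}
```

A real parser would also check last_comp_version and, for version 17 blobs, dt_struct_size; those checks are left out to keep the sketch short.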

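2 Structure block
The structure block of a flattened device tree is a linearized tree; together with the strings block it forms the body of the device tree and stores the target board's device information as nodes. In the structure block, a node begins with the 32-bit constant OF_DT_BEGIN_NODE and ends with OF_DT_END_NODE; child nodes are defined before the parent's end marker. The basic layout of a node is as follows: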
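1. Node start marker OF_DT_BEGIN_NODE (i.e. 0x00000001);
2. The node path or unit name as a NUL-terminated string (format versions up to 3 store the full node path; version 16 and later store only the unit name);
3. Padding bytes to keep 4-byte alignment;
4. The node's properties. Each property starts with the constant OF_DT_PROP (0x00000003), followed by the byte length of the property value, the offset of the property name in the strings block, the property value and alignment padding;
5. The child nodes, if any;
6. Node end marker OF_DT_END_NODE (i.e. 0x00000002).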
In short, a node is a sequence that starts with OF_DT_BEGIN_NODE, continues with the node path, the property list and the list of child nodes, and ends with OF_DT_END_NODE; each child node has the same structure.
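The token sequence above can be walked with a simple loop. The sketch below counts nodes and properties and tracks the nesting depth. It assumes format version 16 or later (node names are unit names and every item is 4-byte aligned; older versions could align some property values to 8 bytes), and dt_walk_struct is a hypothetical name. OF_DT_NOP (0x4) and OF_DT_END (0x9) are the two tokens not listed above: the first is skipped, the second terminates the structure block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OF_DT_BEGIN_NODE 0x1u
#define OF_DT_END_NODE   0x2u
#define OF_DT_PROP       0x3u
#define OF_DT_NOP        0x4u
#define OF_DT_END        0x9u

static uint32_t be32_at(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Every item in the structure block starts on a 4-byte boundary. */
static size_t align4(size_t x)
{
    return (x + 3) & ~(size_t)3;
}

/* Walk a structure block, counting nodes and properties and recording the
 * deepest nesting level. Returns 0 on success, -1 on malformed input. */
static int dt_walk_struct(const uint8_t *blk, size_t len,
                          int *nodes, int *props, int *max_depth)
{
    size_t off = 0;
    int depth = 0;

    assert(blk && nodes && props && max_depth);
    *nodes = *props = *max_depth = 0;
    while (off + 4 <= len) {
        uint32_t tag = be32_at(blk + off);

        off += 4;
        switch (tag) {
        case OF_DT_BEGIN_NODE: {
            /* Unit name: NUL-terminated string, then padding. */
            const uint8_t *nul = memchr(blk + off, '\0', len - off);

            if (nul == NULL)
                return -1;
            off = align4((size_t)(nul - blk) + 1);
            (*nodes)++;
            if (++depth > *max_depth)
                *max_depth = depth;
            break;
        }
        case OF_DT_END_NODE:
            if (--depth < 0)
                return -1;
            break;
        case OF_DT_PROP: {
            uint32_t vlen;

            if (len - off < 8)
                return -1;
            vlen = be32_at(blk + off);  /* value length in bytes */
            /* blk + off + 4 holds nameoff, an offset into the strings block */
            if (vlen > len - off - 8)
                return -1;
            off = align4(off + 8 + vlen);
            (*props)++;
            break;
        }
        case OF_DT_NOP:
            break;
        case OF_DT_END:
            return depth == 0 ? 0 : -1;
        default:
            return -1;
        }
    }
    return -1;  /* no OF_DT_END before the end of the buffer */
}
```

Because every node must close with OF_DT_END_NODE before OF_DT_END, a depth counter is all that is needed to detect an unbalanced or truncated block.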
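3 Strings block
To save space, property names, many of which occur over and over, are extracted and stored separately in the strings block. This block holds NUL-terminated property-name strings, and the structure block stores only their offsets, so each name can be located easily. The strings block saves storage space, which is often tight on embedded systems.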
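The saving comes from storing each name once and referring to it by offset. The following is a minimal sketch of that idea with a fixed-size buffer and exact-match de-duplication; struct strings_block, strings_add and strings_lookup are hypothetical names, and this is not necessarily how dtc builds the block.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical strings block with a fixed capacity. */
struct strings_block {
    char buf[256];
    size_t used;
};

/* Return the offset of name in the block, appending it (with its NUL
 * terminator) only if an identical string is not already stored.
 * Returns -1 if the block is full. */
static long strings_add(struct strings_block *sb, const char *name)
{
    size_t n;

    assert(sb != NULL && name != NULL);
    for (size_t off = 0; off < sb->used; off += strlen(sb->buf + off) + 1)
        if (strcmp(sb->buf + off, name) == 0)
            return (long)off;
    n = strlen(name) + 1;
    if (sb->used + n > sizeof sb->buf)
        return -1;
    memcpy(sb->buf + sb->used, name, n);
    sb->used += n;
    return (long)(sb->used - n);
}

/* Resolve a property's nameoff field back to its name. */
static const char *strings_lookup(const struct strings_block *sb, size_t off)
{
    assert(sb != NULL);
    return off < sb->used ? sb->buf + off : NULL;
}
```

Adding "reg", "device_type" and then "reg" again yields offsets 0, 4 and 0: the second reg property costs only a 4-byte nameoff in the structure block instead of another copy of the string.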

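4 How the kernel parses the FDT
We compiled the dts file into a dtb with dtc; the kernel then decodes the dtb to obtain the configuration it contains, so the storage format described above is reflected directly in the kernel's parsing code.
The dtb exists independently of both the bootloader and the kernel. The chosen node in the dtb is normally filled in by U-Boot, which also passes the address of the dtb image to the kernel in register r3. In my port, however, I filled in the chosen node by hand and did not boot the kernel from U-Boot, so I modified the kernel's startup code to hard-code the dtb's start address, as follows: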

```asm
/*
 * As with the other PowerPC ports, it is expected that when code
 * execution begins here, the following registers contain valid, yet
 * optional, information:
 *
 *   r3 - Board info structure pointer (DRAM, frequency, MAC address, etc.)
 *   r4 - Starting address of the init RAM disk
 *   r5 - Ending address of the init RAM disk
 *   r6 - Start of kernel command line string (e.g. "mem=128")
 *   r7 - End of kernel command line string
 *
 */
    __HEAD
_ENTRY(_stext);
_ENTRY(_start);
    /*
     * Reserve a word at a fixed location to store the address
     * of abatron_pteptrs
     */
    nop
    /* device tree physical address, hard-coded for this port */
    lis     r3, 0x81000000@h
    ori     r3, r3, 0x81000000@l
    mr      r31,r3          /* save device tree ptr */
    li      r24,0           /* CPU number */
```
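The two added instructions build the 32-bit constant 0x81000000 in r3 from two 16-bit halves, because a PowerPC instruction can only carry a 16-bit immediate. The C sketch below models what the assembler operators @h and @l extract and what lis and ori do with them. It assumes a 32-bit core such as the 460EX (on 64-bit PowerPC, lis also sign-extends into the upper 32 bits), and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* What "value@h" and "value@l" evaluate to: the upper and lower 16 bits. */
static uint16_t ppc_hi(uint32_t v)
{
    return (uint16_t)(v >> 16);
}

static uint16_t ppc_lo(uint32_t v)
{
    return (uint16_t)(v & 0xffffu);
}

/* Model of the two instructions on a 32-bit core:
 *   lis r3, hi       r3 = hi << 16   (low 16 bits cleared)
 *   ori r3, r3, lo   r3 = r3 | lo    (lo zero-extended)
 */
static uint32_t lis_ori(uint16_t hi, uint16_t lo)
{
    uint32_t r3 = (uint32_t)hi << 16;

    r3 |= lo;
    return r3;
}
```

ori zero-extends its immediate, so plain @h is correct here. If the low half were added with addi instead, which sign-extends its immediate, the high half would need @ha to compensate for a low half with its top bit set.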

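The PPC kernel parses the FDT in two phases:
The first is early parsing, which retrieves what the kernel needs to boot: the cmdline and the CPU and memory information.
The second is the later full parsing, which drivers use to obtain their configuration when they are loaded.
Since my port avoids using the FDT in drivers as far as possible, this article focuses on early parsing. Before start_kernel is entered, machine_init in arch/powerpc/kernel/setup_32.c is called, and machine_init in turn calls early_init_devtree in arch/powerpc/kernel/prom.c to perform the early device tree parsing. The code is as follows: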

```c
void __init early_init_devtree(void *params)
{
    phys_addr_t limit;

    /* Setup flat device-tree pointer */
    initial_boot_params = params;

#ifdef CONFIG_PPC_RTAS
    /* Some machines might need RTAS info for debugging, grab it now. */
    of_scan_flat_dt(early_init_dt_scan_rtas, NULL);
#endif

#ifdef CONFIG_PPC_POWERNV
    /* Some machines might need OPAL info for debugging, grab it now. */
    of_scan_flat_dt(early_init_dt_scan_opal, NULL);
#endif

#ifdef CONFIG_FA_DUMP
    /* scan tree to see if dump is active during last boot */
    of_scan_flat_dt(early_init_dt_scan_fw_dump, NULL);
#endif

    /* Pre-initialize the cmd_line with the content of boot_commmand_line,
     * which will be empty except when the content of the variable has
     * been overriden by a bootloading mechanism. This happens typically
     * with HAL takeover
     */
    strlcpy(cmd_line, boot_command_line, COMMAND_LINE_SIZE);

    /* Retrieve various informations from the /chosen node of the
     * device-tree, including the platform type, initrd location and
     * size, TCE reserve, and more ...
     */
    of_scan_flat_dt(early_init_dt_scan_chosen_ppc, cmd_line);

    /* Scan memory nodes and rebuild MEMBLOCKs */
    of_scan_flat_dt(early_init_dt_scan_root, NULL);
    of_scan_flat_dt(early_init_dt_scan_memory_ppc, NULL);

    /* Save command line for /proc/cmdline and then parse parameters */
    strlcpy(boot_command_line, cmd_line, COMMAND_LINE_SIZE);
    parse_early_param();

    /* make sure we've parsed cmdline for mem= before this */
    if (memory_limit)
        first_memblock_size = min(first_memblock_size, memory_limit);
    setup_initial_memory_limit(memstart_addr, first_memblock_size);
    /* Reserve MEMBLOCK regions used by kernel, initrd, dt, etc... */
    memblock_reserve(PHYSICAL_START, __pa(klimit) - PHYSICAL_START);
    /* If relocatable, reserve first 32k for interrupt vectors etc. */
    if (PHYSICAL_START > MEMORY_START)
        memblock_reserve(MEMORY_START, 0x8000);
    reserve_kdump_trampoline();
#ifdef CONFIG_FA_DUMP
    /*
     * If we fail to reserve memory for firmware-assisted dump then
     * fallback to kexec based kdump.
     */
    if (fadump_reserve_mem() == 0)
#endif
        reserve_crashkernel();
    early_reserve_mem();

    /*
     * Ensure that total memory size is page-aligned, because otherwise
     * mark_bootmem() gets upset.
     */
    limit = ALIGN(memory_limit ?: memblock_phys_mem_size(), PAGE_SIZE);
    memblock_enforce_memory_limit(limit);

    memblock_allow_resize();
    memblock_dump_all();

    DBG("Phys. mem: %llx\n", memblock_phys_mem_size());

    /* We may need to relocate the flat tree, do it now.
     * FIXME .. and the initrd too? */
    move_device_tree();

    allocate_pacas();

    DBG("Scanning CPUs ...\n");

    /* Retrieve CPU related informations from the flat tree
     * (altivec support, boot CPU ID, ...)
     */
    of_scan_flat_dt(early_init_dt_scan_cpus, NULL);

#if defined(CONFIG_SMP) && defined(CONFIG_PPC64)
    /* We'll later wait for secondaries to check in; there are
     * NCPUS-1 non-boot CPUs  :-)
     */
    spinning_secondaries = boot_cpu_count - 1;
#endif

    DBG(" <- early_init_devtree()\n");
}
```

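early_init_devtree calls of_scan_flat_dt several times to walk all nodes of the dtb, each time with a different parsing callback (early_init_dt_scan_chosen_ppc, early_init_dt_scan_root, early_init_dt_scan_memory_ppc and early_init_dt_scan_cpus), to obtain the information in the chosen, root, memory and cpus nodes and complete the early cmdline, memory and CPU setup. As an example, here is the generic parser for the chosen node, early_init_dt_scan_chosen in drivers/of/fdt.c, which early_init_dt_scan_chosen_ppc calls first: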

```c
int __init early_init_dt_scan_chosen(unsigned long node, const char *uname,
                                     int depth, void *data)
{
    unsigned long l;
    char *p;

    pr_debug("search \"chosen\", depth: %d, uname: %s\n", depth, uname);

    if (depth != 1 || !data ||
        (strcmp(uname, "chosen") != 0 && strcmp(uname, "chosen@0") != 0))
        return 0;

    early_init_dt_check_for_initrd(node);

    /* Retrieve command line */
    p = of_get_flat_dt_prop(node, "bootargs", &l);
    if (p != NULL && l > 0)
        strlcpy(data, p, min((int)l, COMMAND_LINE_SIZE));

    /*
     * CONFIG_CMDLINE is meant to be a default in case nothing else
     * managed to set the command line, unless CONFIG_CMDLINE_FORCE
     * is set in which case we override whatever was found earlier.
     */
#ifdef CONFIG_CMDLINE
#ifndef CONFIG_CMDLINE_FORCE
    if (!((char *)data)[0])
#endif
        strlcpy(data, CONFIG_CMDLINE, COMMAND_LINE_SIZE);
#endif /* CONFIG_CMDLINE */

    pr_debug("Command line is: %s\n", (char*)data);

    /* break now */
    return 1;
}
```
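The callback contract visible in this function (return 0 to keep scanning, non-zero to stop) is what of_scan_flat_dt relies on. The sketch below reproduces that contract over a hypothetical, already-decoded node array instead of a real blob; struct flat_node, struct cmdline_buf, scan_flat and scan_chosen are illustrative names, and only the depth and name check is taken from early_init_dt_scan_chosen().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical pre-decoded node, standing in for an offset into the blob. */
struct flat_node {
    const char *uname;    /* unit name, e.g. "cpu@0"; "" for the root */
    int depth;            /* root is depth 0 */
    const char *bootargs; /* only filled in for /chosen in this sketch */
};

/* Hypothetical output buffer for the command line. */
struct cmdline_buf {
    char *buf;
    size_t size;
};

typedef int (*scan_fn)(const struct flat_node *node, const char *uname,
                       int depth, void *data);

/* Visit nodes in order and stop at the first callback that returns
 * non-zero, returning that value (0 if no callback claimed a node). */
static int scan_flat(const struct flat_node *nodes, size_t n,
                     scan_fn it, void *data)
{
    int rc = 0;

    assert(nodes != NULL || n == 0);
    for (size_t i = 0; i < n && rc == 0; i++)
        rc = it(&nodes[i], nodes[i].uname, nodes[i].depth, data);
    return rc;
}

/* Same node filter as early_init_dt_scan_chosen(). */
static int scan_chosen(const struct flat_node *node, const char *uname,
                       int depth, void *data)
{
    struct cmdline_buf *cb = data;

    if (depth != 1 || !data ||
        (strcmp(uname, "chosen") != 0 && strcmp(uname, "chosen@0") != 0))
        return 0;  /* not the node we want: keep scanning */
    if (node->bootargs != NULL && cb->size > 0)
        snprintf(cb->buf, cb->size, "%s", node->bootargs);
    return 1;      /* found /chosen: stop the scan */
}
```

Every node before /chosen is rejected by the callback's first check, and the scan stops as soon as /chosen is found; if the tree has no /chosen node, the scan simply returns 0.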

The FDT handling functions live mainly in arch/powerpc/kernel/prom.c and drivers/of/fdt.c.

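Compared with the tags parsing analyzed in my earlier article, the difference between the two approaches is:
tags are handled through registered callbacks: for each tag type encountered, the handler registered for that type is called.
FDT parsing walks the entire device tree, and each handler decides for itself whether the current node is the one it needs and processes it if so.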

0 0