UBI patch for MLC NAND power loss (1)
Source: Internet. Editor: 程序博客网. Time: 2024/04/29 00:34
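I recently need to put together a patch for power loss on MLC NAND. As we know, UBIFS cannot be used as-is on MLC NAND, because a power loss will inevitably corrupt data that was already written. As the UBI website puts it: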
UBIFS authors never tested UBI/UBIFS on MLC flash devices. Let's consider some specific aspects of MLC NAND flashes:
- [OK] MLC NAND flashes are more "faulty" than SLC, so they use stronger ECC codes; these ECC codes often occupy whole OOB area (as do the ECC codes on some newer SLC flashes, which are more error-prone than previous generations of flash); this is not a problem for UBI/UBIFS, because neither UBIFS nor UBI use OOB area;
- [OK] when the data are written to an eraseblock, they have to be written sequentially, from the beginning of the eraseblock to the end of it; this is also not a problem because it is exactly what UBI and UBIFS do (see also this section);
- [OK] MLC flashes have rather short eraseblock life-cycles of just a few thousand erase cycles; this is not a problem because UBI uses a deterministic wear-leveling algorithm. However, the default 4096 erases wear-levelling threshold may need to be lessened for MLC.
- [NEED WORK] MLC flashes exhibit bit-flips as a result of "program disturb" and "read disturb" errors (see here). These errors are sometimes referred to as "reversible" errors in NAND datasheets, meaning that they disappear once the block in which they are located is erased; as opposed to "irreversible" errors which are due to cell wear and cause permanent bit failures. Note that SLC flashes have these same errors, but they are much more common on MLC:
  - NAND flashes have a so called "read-disturb" property, which means that a NAND page read operation may introduce a persistent bit change, not necessarily located in the page being read; the ECC code would fix it, but more read operations may introduce more bit changes and correctable ECC errors may turn into uncorrectable ECC errors; however, when these errors occur on the same page that is being read, this should not be a problem because UBI is doing scrubbing; in other words, once UBI notices that there is a correctable bit-flip in an eraseblock, it moves the contents of this physical eraseblock to a different physical eraseblock, and re-maps the corresponding logical eraseblock to the new physical eraseblock; so UBI refreshes the data and gets rid of bit-flips, thus improving data integrity.
  - "Read-disturb" errors can also occur on a page other than the one being read, but which is within the same eraseblock. This is not a problem if page read operations are spread around somewhat evenly within the eraseblock, since the bit-flip will soon be detected and corrected through the "scrubbing" process described above. However, if a particular page within a block is rarely read, scrubbing will not have a chance to fix errors, and they may accumulate over time until they become unfixable. This is very similar to the next problem.
  - NAND flashes also have a "program-disturb" property, which means that if you program a NAND page, you may introduce a bit-flip in a different NAND page. The bit change can be fixed by ECC, but with time the changes may accumulate and become unfixable. Current UBI bit-flip handling only partially helps here, because it is passive, which means that UBI notices bit-flips only when performing users' read requests. So if you never read the NAND page which accumulates bit-flips, UBI will never notice this.
The read and program disturb issues should be possible to handle by implementing a kind of "flash crawler" which would read all of the NAND pages in the background from time to time (at UBI level) making UBI notice and fix bit-flips. This is not implemented though, and this can probably be done from user-space.
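The user-space "flash crawler" idea can be sketched as a plain sequential read of a whole UBI volume: every mapped page gets read, so UBI should have a chance to notice correctable bit-flips and schedule scrubbing. This is only a minimal sketch. The function name `crawl_volume`, the chunk size, and the example device path `/dev/ubi0_0` are assumptions for illustration and do not come from the quoted text.

```python
def crawl_volume(path, chunk_size=64 * 1024):
    """Read a volume from start to end and return the number of bytes read.

    The data is discarded on purpose: the only goal is to make the layer
    below read (and ECC-check) every mapped page. An uncorrectable error
    would surface as an OSError raised by read().
    """
    total = 0
    # buffering=0 so that each read() call goes straight to the device node.
    with open(path, "rb", buffering=0) as dev:
        while True:
            chunk = dev.read(chunk_size)
            if not chunk:  # an empty read means end of volume
                break
            total += len(chunk)
    return total
```

Such a function could be run periodically (for example from a cron job) as `crawl_volume("/dev/ubi0_0")`. It only covers the volume it is pointed at, so each volume would need its own pass.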
- [NEED WORK] There is another aspect of MLC flashes which may need closer attention: the "paired pages" problem (e.g., see this Power Point presentation). Namely, MLC NAND pages are coupled in a sense that if you cut power while writing to a page, you corrupt not only this page, but also one of the previous pages which is paired with the current one. For example, pages 0 and 3, 1 and 4, 2 and 5, 3 and 6, and so on (in the same eraseblock) may be paired (page distance is 4, but there may be other distances). So if you write data to, say, page 3 and cut the power, you may end up with corrupted data in page 0. UBIFS is not ready to handle this problem at the moment and this needs some work.
UBIFS can handle this problem by avoiding using the rest of free space in LEBs after a sync or commit operation. E.g., if we start writing to a new journal LEB, and then have a sync or commit, we should "waste" some amount of free space in this LEB to make sure that the previous paired page does not contain synced data. This way we guarantee that a power cut will not corrupt the synced or committed data. And the "wasted" free space can be re-used after that LEB has been garbage-collected. Similar to all the other LEBs we write to (LPT, log, orphan, etc). This would require some work and would make UBIFS slower, so this should probably be optional. The way to attack this issue is to improve UBIFS power cut emulation and implement "paired-pages" emulation, then use the integck test for testing. After all the issues are fixed, real power-cut tests could be carried out.
- [NEED WORK] The "unstable bits issue", which is not MLC-specific, described here.
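To make the "wasted free space" idea from the paired-pages item concrete, the sketch below computes which later pages could still corrupt synced data if power is cut while they are being programmed. It assumes the example pairing from the quoted text (0 and 3, 1 and 4, 2 and 5, ...), that is, an index offset of 3. Real chips use different pairing tables, and both function names are hypothetical. How UBIFS would actually waste those pages is not specified in the quoted text.

```python
def paired_earlier_page(page, distance=3):
    """Earlier page that a power cut while programming `page` may corrupt.

    Returns None when `page` has no earlier partner in this simple model.
    """
    partner = page - distance
    return partner if partner >= 0 else None


def pages_endangering_synced_data(last_synced_page, pages_per_block, distance=3):
    """List the not-yet-written pages whose partner holds synced data.

    Every page up to and including `last_synced_page` is assumed to hold
    synced data. The returned pages are the space that has to be "wasted".
    """
    risky = []
    for page in range(last_synced_page + 1, pages_per_block):
        partner = paired_earlier_page(page, distance)
        if partner is not None and partner <= last_synced_page:
            risky.append(page)
    return risky
```

With a sync that ends on page 5 of a 64-page eraseblock, `pages_endangering_synced_data(5, 64)` yields pages 6 to 8, so under this model page 9 is the first page where new data cannot hurt what was synced.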
One way to deal with this problem is to back up the lower page every time an upper page is written, and to run a recovery pass at every power-up.
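The backup-and-recover scheme can be illustrated with a small in-memory simulation. This is a sketch under stated assumptions, not the patch itself: `SimBlock`, `PowerCut`, the fixed pairing offset of 3, and the single-slot backup area are all hypothetical. Real NAND also cannot rewrite a page in place, so a real implementation would have to restore the data into a fresh eraseblock.

```python
import zlib


class PowerCut(Exception):
    """Raised by the simulation to model power loss during a page program."""


class SimBlock:
    def __init__(self, pages, distance=3):
        self.pages = [None] * pages   # main-area page contents
        self.distance = distance      # assumed pairing offset
        self.backup = None            # (page index, data, crc) in a separate backup area

    def lower_page_of(self, page):
        lower = page - self.distance
        return lower if lower >= 0 else None

    def write_page(self, page, data, power_cut=False):
        lower = self.lower_page_of(page)
        has_lower_data = lower is not None and self.pages[lower] is not None
        if has_lower_data:
            # Step 1: save the paired lower page before touching the upper page.
            good = self.pages[lower]
            self.backup = (lower, good, zlib.crc32(good))
        if power_cut:
            # Model: the page being programmed and its partner both turn to garbage.
            self.pages[page] = b"\x00garbage"
            if has_lower_data:
                self.pages[lower] = b"\x00garbage"
            raise PowerCut()
        # Step 2: the program completed, so the backup is no longer needed.
        self.pages[page] = data
        self.backup = None

    def recover(self):
        """Power-up pass: restore the lower page from backup. True if restored."""
        if self.backup is None:
            return False
        page, data, crc = self.backup
        self.backup = None
        if zlib.crc32(data) != crc:
            return False  # the backup itself is not trustworthy
        restored = False
        # Simulation shortcut: compare contents; real code would rely on ECC status.
        if self.pages[page] != data:
            self.pages[page] = data
            restored = True
        return restored
```

In this model, writing page 3 with `power_cut=True` after page 0 was written leaves page 0 corrupted until `recover()` is called, after which page 0 holds its original data again. The interrupted page 3 stays lost, which is acceptable because it was never reported as written.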
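With newer kernels, however, the patch I wrote earlier for this produces the report below when it acquires the rwlock, and the report only appears the first time. Why?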
=============================================
[ INFO: possible recursive locking detected ]
3.14.0-xilinx-00012-gfba9419-dirty #111 Not tainted
---------------------------------------------
ubirmvol/504 is trying to acquire lock:
 (&le->mutex){+.+...}, at: [<c02edbe4>] leb_write_lock+0x18/0x20

but task is already holding lock:
 (&le->mutex){+.+...}, at: [<c02edbe4>] leb_write_lock+0x18/0x20

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&le->mutex);
  lock(&le->mutex);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

2 locks held by ubirmvol/504:
 #0: (&ubi->device_mutex){+.+...}, at: [<c02ebd60>] ubi_cdev_ioctl+0x2d4/0x90c
 #1: (&le->mutex){+.+...}, at: [<c02edbe4>] leb_write_lock+0x18/0x20

stack backtrace:
CPU: 1 PID: 504 Comm: ubirmvol Not tainted 3.14.0-xilinx-00012-gfba9419-dirty #111
[<c001e2a4>] (unwind_backtrace) from [<c0019d38>] (show_stack+0x10/0x14)
[<c0019d38>] (show_stack) from [<c041b294>] (dump_stack+0x84/0xd4)
[<c041b294>] (dump_stack) from [<c0061d6c>] (__lock_acquire+0x1cc0/0x1d58)
[<c0061d6c>] (__lock_acquire) from [<c00622b8>] (lock_acquire+0x60/0x74)
[<c00622b8>] (lock_acquire) from [<c041fde8>] (down_write+0x40/0x54)
[<c041fde8>] (down_write) from [<c02edbe4>] (leb_write_lock+0x18/0x20)
[<c02edbe4>] (leb_write_lock) from [<c02f7198>] (ubi_backup_data_to_backup_volume+0xf4/0x47c)
[<c02f7198>] (ubi_backup_data_to_backup_volume) from [<c02f1180>] (ubi_io_write+0x340/0x6c4)
[<c02f1180>] (ubi_io_write) from [<c02ee668>] (ubi_eba_write_leb+0x540/0x6b0)
[<c02ee668>] (ubi_eba_write_leb) from [<c02e7560>] (ubi_change_vtbl_record+0xc8/0x12c)
[<c02e7560>] (ubi_change_vtbl_record) from [<c02e8de8>] (ubi_remove_volume+0x100/0x1f0)
[<c02e8de8>] (ubi_remove_volume) from [<c02ebd6c>] (ubi_cdev_ioctl+0x2e0/0x90c)
[<c02ebd6c>] (ubi_cdev_ioctl) from [<c00de480>] (vfs_ioctl+0x18/0x34)
[<c00de480>] (vfs_ioctl) from [<c00df0b8>] (do_vfs_ioctl+0x5b8/0x600)
[<c00df0b8>] (do_vfs_ioctl) from [<c00df138>] (SyS_ioctl+0x38/0x54)
[<c00df138>] (SyS_ioctl) from [<c00163c0>] (ret_fast_syscall+0x0/0x48)
As for this problem ...