Debugging hibernation and suspend

(C) 2007 Rafael J. Wysocki <rjw@sisk.pl>, GPL

1. Testing hibernation (aka suspend to disk or STD)

To check if hibernation works, you can try to hibernate in the "reboot" mode:

# echo reboot > /sys/power/disk
# echo disk > /sys/power/state

and the system should create a hibernation image, reboot, resume and get back to the command prompt where you have started the transition.  If that happens, hibernation is most likely to work correctly.  Still, you need to repeat the test at least a couple of times in a row for confidence.  [This is necessary because some problems only show up on a second attempt at suspending and resuming the system.]  Moreover, hibernating in the "reboot" and "shutdown" modes causes the PM core to skip some platform-related callbacks which on ACPI systems might be necessary to make hibernation work.  Thus, if your machine fails to hibernate or resume in the "reboot" mode, you should try the "platform" mode:

# echo platform > /sys/power/disk
# echo disk > /sys/power/state

which is the default and recommended mode of hibernation.

Unfortunately, the "platform" mode of hibernation does not work on some systems with broken BIOSes.  In such cases the "shutdown" mode of hibernation might work:

# echo shutdown > /sys/power/disk
# echo disk > /sys/power/state

(it is similar to the "reboot" mode, but it requires you to press the power button to make the system resume).

If neither "platform" nor "shutdown" hibernation mode works, you will need to identify what goes wrong.

a) Test modes of hibernation

To find out why hibernation fails on your system, you can use a special testing facility available if the kernel is compiled with CONFIG_PM_DEBUG set.  In that case, the file /sys/power/pm_test is available and can be used to make the hibernation core run in a test mode.
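The "reboot", "platform" and "shutdown" tests described at the beginning of this section all come down to the same two writes, and the advice to repeat them a couple of times can be scripted. The following is only a minimal sketch, not part of any kernel tooling: the function names are made up for illustration, only the three modes discussed above are accepted, and SYSFS_POWER is a hypothetical variable (defaulting to /sys/power) that lets the helpers be tried against a scratch directory. Run against the real sysfs, they need root and really hibernate the machine.

```shell
#!/bin/bash
# Hypothetical helpers for the basic hibernation test.
# SYSFS_POWER is an assumed override (default /sys/power) so the helpers
# can be exercised against a scratch directory instead of the real sysfs.
SYSFS_POWER="${SYSFS_POWER:-/sys/power}"

# Same as "echo <mode> > /sys/power/disk; echo disk > /sys/power/state".
hibernate_in_mode() {
    local mode="$1"
    case "$mode" in
        reboot|platform|shutdown) ;;
        *) echo "unsupported hibernation mode: $mode" >&2; return 1 ;;
    esac
    echo "$mode" > "$SYSFS_POWER/disk" || return 1
    # On a real system this write returns after resume (or on failure).
    echo disk > "$SYSFS_POWER/state"
}

# Repeat the cycle, since some problems only show up on a later attempt.
repeat_hibernation() {
    local mode="$1" count="${2:-3}" i
    for ((i = 1; i <= count; i++)); do
        echo "hibernation attempt $i/$count ($mode mode)"
        hibernate_in_mode "$mode" || return 1
    done
}
```

For example, "repeat_hibernation reboot 3" runs three "reboot" mode cycles and stops at the first write that fails.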
There are 5 test modes available:

freezer
- test the freezing of processes

devices
- test the freezing of processes and suspending of devices

platform
- test the freezing of processes, suspending of devices and platform global control methods(*)

processors
- test the freezing of processes, suspending of devices, platform global control methods(*) and the disabling of nonboot CPUs

core
- test the freezing of processes, suspending of devices, platform global control methods(*), the disabling of nonboot CPUs and suspending of platform/system devices

(*) the platform global control methods are only available on ACPI systems
    and are only tested if the hibernation mode is set to "platform"

To use one of them it is necessary to write the corresponding string to /sys/power/pm_test (eg. "devices" to test the freezing of processes and suspending devices) and issue the standard hibernation commands.  For example, to use the "devices" test mode along with the "platform" mode of hibernation, you should do the following:

# echo devices > /sys/power/pm_test
# echo platform > /sys/power/disk
# echo disk > /sys/power/state

Then, the kernel will try to freeze processes, suspend devices, wait a few seconds (5 by default, but configurable by the suspend.pm_test_delay module parameter), resume devices and thaw processes.  If "platform" is written to /sys/power/pm_test, then after suspending devices the kernel will additionally invoke the global control methods (eg. ACPI global control methods) used to prepare the platform firmware for hibernation.  Next, it will wait a configurable number of seconds and invoke the platform (eg. ACPI) global methods used to cancel hibernation etc.

Writing "none" to /sys/power/pm_test causes the kernel to switch to the normal hibernation/suspend operations.
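A single test-level run, as in the "devices" example above, could be wrapped so that pm_test is always switched back to "none" afterwards. This is a sketch, not an existing tool: run_hibernation_pm_test is a hypothetical name, SYSFS_POWER is an assumed override (default /sys/power) for trying it against a scratch directory, and whether the test actually passed still has to be judged from dmesg and the state of the system.

```shell
#!/bin/bash
# Hypothetical helper: run one hibernation test level, then restore "none".
SYSFS_POWER="${SYSFS_POWER:-/sys/power}"

run_hibernation_pm_test() {
    local level="$1" mode="${2:-platform}" rc
    case "$level" in
        freezer|devices|platform|processors|core) ;;
        *) echo "unknown test level: $level" >&2; return 1 ;;
    esac
    # /sys/power/pm_test only exists if CONFIG_PM_DEBUG is set.
    if [ ! -e "$SYSFS_POWER/pm_test" ]; then
        echo "pm_test not available (CONFIG_PM_DEBUG not set?)" >&2
        return 1
    fi
    echo "$level" > "$SYSFS_POWER/pm_test" || return 1
    if echo "$mode" > "$SYSFS_POWER/disk"; then
        echo disk > "$SYSFS_POWER/state"
        rc=$?
    else
        rc=1
    fi
    # Switch back to normal hibernation/suspend operation.
    echo none > "$SYSFS_POWER/pm_test"
    return "$rc"
}
```

"run_hibernation_pm_test devices platform" corresponds to the three commands in the example above, followed by writing "none" to pm_test.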
When open for reading, /sys/power/pm_test contains a space-separated list of all available tests (including "none" that represents the normal functionality) in which the current test level is indicated by square brackets.

Generally, as you can see, each test level is more "invasive" than the previous one and the "core" level tests the hardware and drivers as deeply as possible without creating a hibernation image.  Obviously, if the "devices" test fails, the "platform" test will fail as well and so on.  Thus, as a rule of thumb, you should try the test modes starting from "freezer", through "devices", "platform" and "processors" up to "core" (repeat the test on each level a couple of times to rule out random factors).

If the "freezer" test fails, there is a task that cannot be frozen (in that case it usually is possible to identify the offending task by analysing the output of dmesg obtained after the failing test).  Failure at this level usually means that there is a problem with the tasks freezer subsystem that should be reported.

If the "devices" test fails, most likely there is a driver that cannot suspend or resume its device (in the latter case the system may hang or become unstable after the test, so please take that into consideration).  To find this driver, you can carry out a binary search according to the rules:

- if the test fails, unload half of the drivers currently loaded and repeat (that would probably involve rebooting the system, so always note what drivers have been loaded before the test),
- if the test succeeds, load half of the drivers you have unloaded most recently and repeat.

Once you have found the failing driver (there can be more than just one of them), you have to unload it every time before hibernation.  In that case please make sure to report the problem with the driver.

It is also possible that the "devices" test will still fail after you have unloaded all modules.
In that case, you may want to look in your kernel configuration for the drivers that can be compiled as modules (and test again with these drivers compiled as modules).  You may also try to use some special kernel command line options such as "noapic", "noacpi" or even "acpi=off".

If the "platform" test fails, there is a problem with the handling of the platform (eg. ACPI) firmware on your system.  In that case the "platform" mode of hibernation is not likely to work.  You can try the "shutdown" mode, but that is rather a poor man's workaround.

If the "processors" test fails, the disabling/enabling of nonboot CPUs does not work (of course, this may only be an issue on SMP systems) and the problem should be reported.  In that case you can also try to switch the nonboot CPUs off and on using the /sys/devices/system/cpu/cpu*/online sysfs attributes and see if that works.

If the "core" test fails, which means that suspending of the system/platform devices has failed (these devices are suspended on one CPU with interrupts off), the problem is most probably hardware-related and serious, so it should be reported.

A failure of any of the "platform", "processors" or "core" tests may cause your system to hang or become unstable, so please beware.  Such a failure usually indicates a serious problem that very well may be related to the hardware, but please report it anyway.

b) Testing minimal configuration

If all of the hibernation test modes work, you can boot the system with the "init=/bin/bash" command line parameter and attempt to hibernate in the "reboot", "shutdown" and "platform" modes.  If that does not work, there is probably a problem with a driver statically compiled into the kernel and you can try to compile more drivers as modules, so that they can be tested individually.
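Both binary searches for misbehaving modules in this document (for a failing "devices" test and for the minimal configuration) need a saved list of the loaded modules and a way to pick half of a list. The helpers below are a hypothetical sketch that only prints modprobe commands; deciding which half to act on, rebooting and noting the results is still done by hand, and "modprobe -r" may refuse to remove modules that are in use. PROC_MODULES is an assumed override (default /proc/modules) so the helpers can be tried on a sample file.

```shell
#!/bin/bash
# Hypothetical bisection helpers; nothing is loaded or unloaded here.
PROC_MODULES="${PROC_MODULES:-/proc/modules}"

# Names of the currently loaded modules (first field of /proc/modules).
# Save this list before every test, since a failure may force a reboot.
loaded_modules() {
    cut -d' ' -f1 "$PROC_MODULES"
}

# Print the first half (rounded up) of the given module names.
first_half() {
    [ "$#" -gt 0 ] || return 0
    local half=$(( ($# + 1) / 2 ))
    printf '%s\n' "${@:1:half}"
}

# After a failed test: suggest unloading half of the loaded modules.
# After a passing test, pass the most recently unloaded modules to
# first_half instead and load those back with modprobe.
suggest_unload() {
    local m
    for m in $(first_half "$@"); do
        echo "modprobe -r $m"
    done
}

# Usage: mapfile -t mods < <(loaded_modules); suggest_unload "${mods[@]}"
```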
If hibernation does work in the minimal configuration, there is a problem with a modular driver and you can find it by loading half of the modules you normally use and binary searching in accordance with the algorithm:

- if there are n modules loaded and the attempt to suspend and resume fails, unload n/2 of the modules and try again (that would probably involve rebooting the system),
- if there are n modules loaded and the attempt to suspend and resume succeeds, load n/2 modules more and try again.

Again, if you find the offending module(s), it (they) must be unloaded every time before hibernation, and please report the problem with it (them).

c) Using the "test_resume" hibernation option

/sys/power/disk generally tells the kernel what to do after creating a hibernation image.  One of the available options is "test_resume" which causes the just created image to be used for immediate restoration.  Namely, after doing:

# echo test_resume > /sys/power/disk
# echo disk > /sys/power/state

a hibernation image will be created and a resume from it will be triggered immediately without involving the platform firmware in any way.

That test can be used to check if failures to resume from hibernation are related to bad interactions with the platform firmware.
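The "test_resume" check can be wrapped so that the previously selected hibernation mode is put back afterwards. This is a sketch under an assumption not stated in this document: that reading /sys/power/disk shows the available modes with the current one in square brackets, as /sys/power/pm_test does for test levels. The function names and the SYSFS_POWER override (default /sys/power) are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: one "test_resume" cycle, then restore the old mode.
# Assumes /sys/power/disk reads like "[platform] shutdown reboot test_resume".
SYSFS_POWER="${SYSFS_POWER:-/sys/power}"

# Print the bracketed (currently selected) entry of a sysfs choice file.
current_choice() {
    local -a words
    local word
    read -r -a words < "$1"
    for word in "${words[@]}"; do
        case "$word" in
            \[*\]) word="${word#\[}"; echo "${word%\]}"; return 0 ;;
        esac
    done
    return 1
}

test_resume_cycle() {
    local prev rc
    prev=$(current_choice "$SYSFS_POWER/disk") || {
        echo "cannot determine current hibernation mode" >&2
        return 1
    }
    echo test_resume > "$SYSFS_POWER/disk" || return 1
    # Create an image and restore from it at once, without the firmware.
    echo disk > "$SYSFS_POWER/state"
    rc=$?
    # Put back the previously selected mode (e.g. "platform").
    echo "$prev" > "$SYSFS_POWER/disk"
    return "$rc"
}
```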
If the "test_resume" transition works every time, but resume from actual hibernation does not work or is unreliable, the platform firmware may be responsible for the failures.

On architectures and platforms that support using different kernels to restore hibernation images (that is, the kernel used to read the image from storage and load it into memory is different from the one included in the image) or support kernel address space layout randomization, this test can also be used to check if failures to resume may be related to the differences between the restore and image kernels.

d) Advanced debugging

If hibernation does not work on your system even in the minimal configuration and compiling more drivers as modules is not practical or some modules cannot be unloaded, you can use one of the more advanced debugging techniques to find the problem.  First, if there is a serial port in your box, you can boot the kernel with the 'no_console_suspend' parameter and try to log kernel messages using the serial console.  This may provide you with some information about the reasons for the suspend (resume) failure.  Alternatively, it may be possible to use a FireWire port for debugging with firescope (http://v3.sk/~lkundrak/firescope/).  On x86 it is also possible to use the PM_TRACE mechanism documented in Documentation/power/s2ram.txt.

2. Testing suspend to RAM (STR)

To verify that STR works, it is generally more convenient to use the s2ram tool available from http://suspend.sf.net and documented at http://en.opensuse.org/SDB:Suspend_to_RAM (S2RAM_LINK).

The /sys/power/pm_test facility can be used for STR as well: after writing "freezer", "devices", "platform", "processors", or "core" into /sys/power/pm_test (available if the kernel is compiled with CONFIG_PM_DEBUG set) the suspend code will work in the test mode corresponding to the given string.  The STR test modes are defined in the same way as for hibernation, so please refer to Section 1 for more information about them.
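Following the rule of thumb from Section 1 (start at "freezer", work up to "core", repeat each level), the STR test levels can be walked in a loop. A minimal sketch, assuming that writing "mem" to /sys/power/state requests suspend to RAM and that a failing write is a usable, if coarse, failure signal; dmesg should still be checked after every run, since a level can misbehave without the write failing. The function name and the SYSFS_POWER override (default /sys/power) are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: run each STR test level, freezer through core.
SYSFS_POWER="${SYSFS_POWER:-/sys/power}"

str_test_all_levels() {
    local repeats="${1:-2}" level i
    if [ ! -e "$SYSFS_POWER/pm_test" ]; then
        echo "pm_test not available (CONFIG_PM_DEBUG not set?)" >&2
        return 1
    fi
    for level in freezer devices platform processors core; do
        for ((i = 1; i <= repeats; i++)); do
            echo "$level" > "$SYSFS_POWER/pm_test" || return 1
            # "mem" requests suspend to RAM; in test mode the kernel stops
            # at the selected level, waits, and resumes on its own.
            if ! echo mem > "$SYSFS_POWER/state"; then
                echo none > "$SYSFS_POWER/pm_test"
                echo "failed at level \"$level\" (attempt $i)" >&2
                return 1
            fi
        done
        echo "level \"$level\" passed $repeats time(s)"
    done
    # Back to normal suspend operation.
    echo none > "$SYSFS_POWER/pm_test"
}
```

Stopping at the first failing level matches the observation that every later level would fail as well.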
In the STR case, the "core" test allows you to test everything except for the actual invocation of the platform firmware in order to put the system into the sleep state.

Among other things, the testing with the help of /sys/power/pm_test may allow you to identify drivers that fail to suspend or resume their devices.  They should be unloaded every time before an STR transition.

Next, you can follow the instructions at S2RAM_LINK to test the system, but if it does not work "out of the box", you may need to boot it with "init=/bin/bash" and test s2ram in the minimal configuration.  In that case, you may be able to search for failing drivers by following a procedure analogous to the one described in Section 1.  If you find some failing drivers, you will have to unload them every time before an STR transition (ie. before you run s2ram), and please report the problems with them.

There is a debugfs entry which shows the suspend to RAM statistics.  Here is an example of its output.

# mount -t debugfs none /sys/kernel/debug
# cat /sys/kernel/debug/suspend_stats
success: 20
fail: 5
failed_freeze: 0
failed_prepare: 0
failed_suspend: 5
failed_suspend_noirq: 0
failed_resume: 0
failed_resume_noirq: 0
failures:
  last_failed_dev:    alarm
                      adc
  last_failed_errno:  -16
                      -16
  last_failed_step:   suspend
                      suspend

The "success" field is the number of successful suspend to RAM transitions and the "fail" field is the number of failed ones.  The other failed_* fields count the failures at the individual steps of suspend to RAM.  suspend_stats lists only the last 2 failed devices, error numbers and failed steps of suspend.
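When only the headline numbers matter, the suspend_stats output can be condensed into one line. This awk-based helper is a sketch that assumes the "name: value" layout shown above; the function name is made up, and reading the real file requires debugfs to be mounted and usually root.

```shell
#!/bin/bash
# Hypothetical helper: summarize suspend_stats in one line, e.g.
# "success=20 fail=5 failed_suspend=5" for the sample output above.
suspend_stats_summary() {
    local file="${1:-/sys/kernel/debug/suspend_stats}"
    awk -F':[ \t]*' '
        $1 == "success" { ok = $2 }
        $1 == "fail"    { bad = $2 }
        # Per-step counters such as failed_suspend; report nonzero ones.
        $1 ~ /^failed_/ && $2 + 0 > 0 { steps = steps " " $1 "=" $2 }
        END { printf "success=%d fail=%d%s\n", ok, bad, steps }
    ' "$file"
}
```

The indented last_failed_* lines are skipped on purpose, because the per-device details are easier to read in the raw file.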