The Linux SG_IO ioctl in the 2.6 series
http://gmd20.blog.163.com/blog/static/1684392320100227396270/
Original article: http://sg.danny.cz/sg/sg_io.html
- The Linux SG_IO ioctl in the 2.6 series
- Introduction
- SCSI and related command sets
- SG_IO ioctl overview
- SG_IO ioctl in the sg driver
- SG_IO ioctl differences
- open() considerations
- SCSI command permissions
- CAP_SYS_RAWIO from a user process
- SG_IO and the st driver
- Maximum transfer size per command
- Conclusion
Introduction
The SG_IO ioctl permits user applications to send SCSI commands to a device. In the linux 2.4 series this ioctl was only [...]
The information in this page is valid for linux kernel 2.6.16.
SCSI and related command sets
All SCSI devices should respond to an INQUIRY command and part of their response is the so-called peripheral device type. This is used by the linux kernel to decide which upper level driver controls the device. There are also devices that belong to other (i.e. not considered SCSI) transports that use SCSI command sets; the primary examples of this are (S-)ATAPI CD and DVD drives. Not all peripheral device types map to upper level drivers, and devices of these types are usually accessed via the SCSI generic (sg) driver.
SCSI (draft) standards are found at www.t10.org. SCSI commands common to all SCSI devices are found in SPC-4, those specific to block devices are found in SBC-2, those for CD/DVD drives are found in MMC-5 and those for SCSI tape drives are found in SSC-3.
The major non-SCSI command set in the storage area is for ATA non-packet devices, which are typically disks. ATA packet devices use ATAPI, which in the vast majority of cases carries a SCSI command set. The most recent draft ATA command set standard is ATA8-ACS and can be found at www.t13.org. To complicate things, (non-packet) ATA devices may have their native command set translated into SCSI. This can happen in the kernel (e.g. libata in linux) or in an intermediate device (e.g. in a USB external disk enclosure). Yet another possibility is disks whose firmware can be changed to allow them to use either the SCSI or ATA command set; this may happen in the SAS/SATA area since the physical (cabling) and phy (electrical signalling) levels are so similar.
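As a rough illustration of the peripheral device type mentioned in this section, here is a minimal sketch in C. The helper names are hypothetical, and the mapping is a deliberately simplified assumption covering only disks, tapes and CD/DVD drives; the kernel's real decision covers more device types than this.

```c
/* Peripheral device type: the low 5 bits of byte 0 of a standard
 * INQUIRY response (the top 3 bits are the peripheral qualifier). */
static unsigned int inquiry_pdt(const unsigned char *inq_resp)
{
    return inq_resp[0] & 0x1f;
}

/* Simplified, illustrative mapping from device type to the driver
 * names used in this page. Not the kernel's actual logic. */
static const char *likely_upper_driver(unsigned int pdt)
{
    switch (pdt) {
    case 0x00: return "sd";     /* direct access block device (disk) */
    case 0x01: return "st";     /* sequential access device (tape) */
    case 0x05: return "cdrom";  /* CD/DVD device */
    default:   return "sg";     /* assume only the generic driver applies */
    }
}
```

For example, a response whose first byte is 0x05 would be reported as "cdrom", while an enclosure services device (type 0x0d) would fall through to "sg".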
SG_IO ioctl overview
The third argument given to the SG_IO ioctl is a pointer to an instance of the sg_io_hdr structure which is defined in the <scsi/sg.h> header file. The execution of the SG_IO ioctl can be viewed as going through three phases:
- do sanity checks on the metadata in the sg_io_hdr instance; read the input fields and the data pointed to by some of those fields; build a SCSI command and issue it to the device
- wait for either a response from the device, the command to timeout or the user to terminate the process (or thread) that invoked the SG_IO ioctl
- write the output fields and in some cases write data to locations pointed to by some fields, then return
Now we will assume that the SCSI command involves user data [...]
- data is read from the user space in phase 1 into kernel buffers and DMA-ed to the device in phase 2, or
- data is read from the device into kernel buffers in phase 2 and written into the user space in phase 3
The sg_io_hdr structure has 22 fields (members) but typically only [...]
unsigned char sense_b[32];                             /* receives sense data on error */
unsigned char turCmbBlk[] = {TUR_CMD, 0, 0, 0, 0, 0};  /* TEST UNIT READY cdb */
struct sg_io_hdr io_hdr;
memset(&io_hdr, 0, sizeof(struct sg_io_hdr));          /* zero all unused fields */
io_hdr.interface_id = 'S';
io_hdr.cmd_len = sizeof(turCmbBlk);
io_hdr.mx_sb_len = sizeof(sense_b);
io_hdr.dxfer_direction = SG_DXFER_NONE;                /* no data phase */
io_hdr.cmdp = turCmbBlk;
io_hdr.sbp = sense_b;
io_hdr.timeout = DEF_TIMEOUT;                          /* milliseconds */
if (ioctl(fd, SG_IO, &io_hdr) < 0) {
    /* [...] */
}
The memset() call is pretty important [...]
Below is a grouping of important [...]
Command block (historically referred to as the "cdb"):
- cmdp - pointer to cdb (the SCSI command block)
- cmd_len - length (in bytes) of cdb
- dxferp - pointer to user data to start reading from or start writing to
- dxfer_len - number of bytes to transfer
- dxfer_direction - whether to read from device (into user memory) or write to device (from user memory) or transfer no data: SG_DXFER_FROM_DEV, SG_DXFER_TO_DEV or SG_DXFER_NONE respectively
- resid - requested number of bytes to transfer (i.e. dxfer_len) less the actual number transferred
- status - SCSI status returned from the device
- host_status - error from Host Bus Adapter including initiator (port)
- driver_status - driver (mid level or low level driver) error and suggestion mask
- sbp - pointer to start writing sense data to
- mx_sb_len - maximum number of bytes to write to sbp
- sb_len_wr - actual number of bytes written to sbp
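To show how the field groups above fit together for a command that does move data, here is a minimal sketch that fills in an sg_io_hdr for a standard INQUIRY. The function name build_inquiry and the macro names are hypothetical; the field names come from <scsi/sg.h>, and the cdb layout assumes the two byte allocation length of recent SPC drafts.

```c
#include <string.h>
#include <scsi/sg.h>

#define INQ_CMD       0x12   /* INQUIRY opcode */
#define INQ_REPLY_LEN 36     /* minimal standard INQUIRY response length */

/* Fill in hdr for a standard INQUIRY. cdb (6 bytes), reply and sense
 * are caller-owned buffers that must stay valid until the SG_IO ioctl
 * returns. timeout_ms is in milliseconds. */
static void build_inquiry(struct sg_io_hdr *hdr, unsigned char cdb[6],
                          unsigned char *reply, unsigned int reply_len,
                          unsigned char *sense, unsigned char sense_len,
                          unsigned int timeout_ms)
{
    memset(cdb, 0, 6);
    cdb[0] = INQ_CMD;
    cdb[3] = (reply_len >> 8) & 0xff;   /* allocation length, big endian */
    cdb[4] = reply_len & 0xff;

    memset(hdr, 0, sizeof(*hdr));       /* unused fields must be zero */
    hdr->interface_id = 'S';
    /* command block group */
    hdr->cmdp = cdb;
    hdr->cmd_len = 6;
    /* data transfer group: device -> user memory */
    hdr->dxfer_direction = SG_DXFER_FROM_DEV;
    hdr->dxferp = reply;
    hdr->dxfer_len = reply_len;
    /* sense data group */
    hdr->sbp = sense;
    hdr->mx_sb_len = sense_len;
    hdr->timeout = timeout_ms;
}
```

A caller would then pass the header to ioctl(fd, SG_IO, &hdr) and inspect the output fields (status, host_status, driver_status, resid, sb_len_wr) after it returns.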
SG_IO ioctl in the sg driver
Linux kernel 2.4.0 was the first production kernel in which the SG_IO ioctl appeared in the SCSI generic (sg) driver. The sg driver itself has been in linux since around 1993. An instance of the sg_io_hdr structure in the sg driver can either be:
- pointed to by the third argument of the SG_IO ioctl
- pointed to by the second argument of UNIX write() or read() system calls which have a file descriptor of a sg device node as their first argument
[...]
- a new metadata structure (sg_io_hdr) as an alternative to the original mixed metadata and data structure (sg_header)
- the SG_IO ioctl that used the new metadata structure and was synchronous: it sent a SCSI command and waited for its reply
A significant feature of the SG_IO ioctl in the sg driver is that it is user interruptible. This means that between issuing a command (e.g. a long duration command like a disk format) and its response arriving, a user could hit control-C on the associated application. The kernel would remain stable and resources would be cleared up at the appropriate time. The sg driver does not attempt to abort such a command that is "in flight"; it simply throws away the response and cleans up. Naturally the user has no direct way of finding out whether an interrupted command succeeded or not, but there may be indirect ways.
A warning may also be in order here: a long duration command such as format would typically be given a long timeout value. If the user interrupted the application that sent the format command then the device may remain busy doing the format (especially if the IMMED bit is not set). So if the user then sent a short duration command such as TEST UNIT READY or REQUEST SENSE to see what the device was doing, these commands may time out. This would invoke the SCSI subsystem error handler which would most likely send a device reset, thus aborting the format, to get the device's attention. This is probably not what the user had in mind!
SG_IO ioctl differences
In the following table, sg_io_hdr structure fields are listed in the order they appear in that structure. Basically the "in" fields appear at the top of the structure and are read in phase 1. The latter fields are termed as "out" and are written by the SG_IO implementation in phase 3.
[...]
- [...] (unsigned int, minor): number of bytes of data [...]
- timeout [...] (if = 0): time in milliseconds that the SCSI mid-level will wait for a response. If that timer expires before the command finishes, then the command may be aborted, and the device (and maybe others on the same interconnect) may be reset depending on error handler settings. Dangerous stuff: the SG_IO ioctl has no control (through this interface) of exactly what happens. In the sg driver a timeout value of 0 means 0 milliseconds; in the block layer (currently) it means 60 seconds.
- flags (in, unsigned int, yes): Block layer SG_IO ioctl ignores this field; the sg driver uses it to request special services like direct IO or mmap-ed transfers. It is a bit mask.
- pack_id (in -> out, int): unused (for user space program tag)
- usr_ptr (in -> out, void *): unused (for user space pointer tag)
- status (out, unsigned char): SCSI command status, zero implies GOOD
- masked_status (out, unsigned char): Logically: masked_status == ((status & 0x3e) >> 1). Old linux SCSI subsystem usage, deprecated.
- msg_status (out, unsigned char): SCSI parallel interface (SPI) message status (very old, deprecated)
- sb_len_wr (out, unsigned char): actual length of sense data [...]
The DID_* and DRIVER_* error and suggestion codes (associated with host_status and driver_status) are discussed in more detail in the SCSI-Generic-HOWTO document.
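The output fields can be checked after the ioctl returns along the following lines. This is a minimal sketch with hypothetical helper names; it treats any non-zero status, host_status or driver_status as "not clean" and leaves decoding of the DID_* and DRIVER_* values to the caller.

```c
#include <scsi/sg.h>

/* The deprecated masked_status value, derived from status as the
 * field description states: ((status & 0x3e) >> 1). */
static unsigned char masked_from_status(unsigned char status)
{
    return (unsigned char)((status & 0x3e) >> 1);
}

/* Returns 1 when all three error indications are zero, else 0. */
static int sg_cmd_clean(const struct sg_io_hdr *hdr)
{
    return hdr->status == 0 && hdr->host_status == 0 &&
           hdr->driver_status == 0;
}

/* Bytes actually transferred: dxfer_len less resid. A resid that is
 * zero or negative is treated as "everything was transferred"; a resid
 * at or beyond dxfer_len is treated as "nothing was transferred". */
static unsigned int sg_bytes_moved(const struct sg_io_hdr *hdr)
{
    if (hdr->resid <= 0)
        return hdr->dxfer_len;
    if ((unsigned int)hdr->resid >= hdr->dxfer_len)
        return 0;
    return hdr->dxfer_len - (unsigned int)hdr->resid;
}
```

Since a timeout of 0 is interpreted differently by the sg driver and the block layer, callers that want the same behaviour from both may prefer to always set an explicit, non-zero timeout.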
open() considerations
Various drivers have different characteristics when a device node is opened. [...]
Opening a file in linux with flags of zero implies the O_RDONLY flag and hence read only [...]
A user with CAP_SYS_RAWIO capability (normally associated with the "root" user) bypasses all command sniffing and other access controls that would otherwise lead to EACCES or EPERM errors. With the sg driver such a user may still need to open() a device node with O_RDWR (rather than O_RDONLY) to use all SCSI commands.
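The table below gives, for each combination of open() flags, the applicable notes for the sg, sd, st and cdrom drivers, followed by comments:
- <none> or O_RDONLY: sg notes 1, 2; sd notes 3, 4; st notes 3, 5; cdrom notes 3, 6. Best to add O_NONBLOCK. For a device with removable media (e.g. tape drive) that depends on whether the drive or its media is being accessed.
- O_RDONLY | O_NONBLOCK: sg notes 1, 7; sd note 3; st notes 3, 13; cdrom note 3. Recommended when SCSI commands are recognized as reading information from the device.
- O_RDWR: sg note 2; sd notes 4, 8, 9; st notes 5, 8, 9; cdrom notes 6, 8, 9. Again, could be better to add O_NONBLOCK.
- O_RDWR | O_NONBLOCK: sg note 7; sd notes 8, 9; st notes 8, 9, 13; cdrom notes 8, 9. Recommended when arbitrary (including vendor specific) SCSI commands are to be sent.
- << interaction with O_EXCL >>: sg note 10; sd note 11; st note 12; cdrom note 11. Only [...]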
Notes:
1. on subsequent SG_IO ioctl calls, the sg driver will only allow SCSI commands in its allow_ops array, others result in EPERM (operation not permitted) in errno. See below.
2. if previous open() of this sg device node still holds O_EXCL then this open() waits until it clears.
3. on subsequent SG_IO ioctl calls, the block layer will only allow SCSI commands listed as "safe_for_read" in the verify_command() function in the drivers/block/scsi_ioctl.c file; others result in EPERM (operation not permitted) in errno. See below.
4. if removable media and it is not present then yields ENOMEDIUM (no medium found)
5. if a tape is not present in drive then yields EIO (input/output error), if tape is "in use" then yields EBUSY (resource busy). Only one open file descriptor is allowed per st device node at a time (although dup() can be used).
6. if tray closed and media is not present then yields ENOMEDIUM (no medium found); if tray open then tries to close it and if no media present then yields ENOMEDIUM
7. if previous open() of this sg device node still holds O_EXCL then yields EBUSY (resource busy).
8. on subsequent SG_IO ioctl calls, the block layer will allow SCSI commands listed as either "safe_for_read" or "safe_for_write". For other SCSI commands the user requires the CAP_SYS_RAWIO capability (usually associated with the "root" user); if not yields EPERM (operation not permitted). The first instance of other SCSI commands since boot sends an annoying "scsi: unknown opcode" message to the log.
9. if the media or drive is marked as not writable then yields EROFS (read-only file system).
10. if sg device node already has exclusive lock then a subsequent attempt to open(O_EXCL) will wait unless O_NONBLOCK is given in which case it yields EBUSY (resource busy)
11. implemented at block device level (which knows about partitions within devices). If a previous open(O_EXCL) is active then a subsequent open(O_EXCL) yields EBUSY (resource busy). Mounted file systems typically open a device/partition with O_EXCL; as long as an application using the SG_IO ioctl does not also try and use the O_EXCL flag then it will be allowed access to the device.
12. the st driver does not support (i.e. ignores) the O_EXCL flag. However the fact that it only permits one active open() per tape device is similar functionality.
13. if tape is "in use" then yields EBUSY (resource busy). Only one open file descriptor is allowed per st device node at a time.
The first successful open on a sd or a cdrom device node that has removable media will send a PREVENT ALLOW MEDIUM REMOVAL (prevent) SCSI command to the device. If successful, this will inhibit a subsequent START STOP UNIT (eject) SCSI command and de-activate the eject button on the drive. In emergencies, the SG_IO ioctl can be used to defeat this action [...]
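One plausible reading of the emergency procedure, assumed here rather than taken from the text, is to send PREVENT ALLOW MEDIUM REMOVAL with the prevent field cleared and then START STOP UNIT with the eject bit set. The sketch below only builds the two 6 byte command blocks; the function and macro names are hypothetical, and sending them through SG_IO is left to the caller.

```c
#include <string.h>

#define PREVENT_ALLOW_CMD 0x1e   /* PREVENT ALLOW MEDIUM REMOVAL opcode */
#define START_STOP_CMD    0x1b   /* START STOP UNIT opcode */

/* prevent != 0 locks the medium in, prevent == 0 allows removal.
 * The PREVENT field is the low bits of cdb byte 4. */
static void build_prevent_allow(unsigned char cdb[6], int prevent)
{
    memset(cdb, 0, 6);
    cdb[0] = PREVENT_ALLOW_CMD;
    cdb[4] = prevent ? 0x01 : 0x00;
}

/* START STOP UNIT with LOEJ = 1 and START = 0, i.e. eject the medium. */
static void build_eject(unsigned char cdb[6])
{
    memset(cdb, 0, 6);
    cdb[0] = START_STOP_CMD;
    cdb[4] = 0x02;               /* bit 1 = LOEJ, bit 0 = START */
}
```

Both commands carry no data, so they would be sent with dxfer_direction set to SG_DXFER_NONE. PREVENT ALLOW MEDIUM REMOVAL is listed in this page as needing O_RDWR, so the device node would have to be opened accordingly.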
The open() flag O_NDELAY has the same value and meaning as O_NONBLOCK. Other flags such as O_DIRECT, O_TRUNC and O_APPEND have no effect on the SG_IO ioctl.
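The recommendations in this section (add O_NONBLOCK; use O_RDWR when arbitrary commands will be sent) can be sketched as follows. The helper names are hypothetical. The check uses the SG_GET_VERSION_NUM ioctl from <scsi/sg.h>, on the assumption that a version of 30000 or more indicates that the node accepts the sg_io_hdr based SG_IO ioctl.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>

/* Open a device node for SG_IO use. arbitrary_cmds != 0 selects
 * O_RDWR, otherwise O_RDONLY; O_NONBLOCK is always added.
 * Returns a file descriptor, or -1 with errno set. */
static int open_for_sg_io(const char *path, int arbitrary_cmds)
{
    int flags = (arbitrary_cmds ? O_RDWR : O_RDONLY) | O_NONBLOCK;

    return open(path, flags);
}

/* Returns 1 if the node answers SG_GET_VERSION_NUM with a version of
 * at least 30000, else 0 (including when the ioctl is not supported). */
static int node_supports_sg_io(int fd)
{
    int version = 0;

    if (ioctl(fd, SG_GET_VERSION_NUM, &version) < 0)
        return 0;
    return version >= 30000;
}
```

A caller might try open_for_sg_io(path, 1) first and fall back to open_for_sg_io(path, 0) when only read-type commands such as INQUIRY are needed and the first open fails with EACCES or EROFS.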
SCSI command permissions
In linux a user only [...]
Here is a table of SCSI commands that don't need the user to have write permissions (or in some cases CAP_SYS_RAWIO capability which usually equates to "root" user):
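Each row gives the SCSI command and the standard that defines it, then what the sg driver requires, then what the block layer requires (except st), then any comments:
- BLANK (MMC-4): sg O_RDWR; block layer O_RDWR
- CLOSE TRACK/SESSION (MMC-4): sg O_RDWR; block layer O_RDWR
- ERASE (MMC-4): sg O_RDWR; block layer O_RDWR
- FLUSH CACHE (SBC-3, MMC-4): sg O_RDWR; block layer O_RDWR. Really SYNCHRONIZE CACHE command
- FORMAT UNIT (SBC-3, MMC-4): sg O_RDWR; block layer O_RDWR. Default command timeout may not be long enough
- GET CONFIGURATION (MMC-4): sg O_RDWR; block layer O_RDONLY. Reads CD/DVD metadata
- GET EVENT STATUS NOTIFICATION (MMC-4): sg O_RDWR; block layer O_RDONLY
- GET PERFORMANCE (MMC-4): sg O_RDWR; block layer O_RDONLY
- INQUIRY (SPC-4): sg O_RDONLY; block layer O_RDONLY. All SCSI devices should respond to this command
- LOAD UNLOAD MEDIUM (MMC-4): sg O_RDWR; block layer O_RDWR. MEDIUM may be replaced by CD, DVD or nothing
- LOG SELECT (SPC-4): sg O_RDWR; block layer O_RDWR. Used to change logging or clear logged data [...]
- [...] various "REPORT ..." commands such as REPORT SUPPORTED OPERATION CODES in here
- MODE SELECT (6+10) (SPC-4): sg O_RDWR; block layer O_RDWR. Used to change SCSI device metadata
- MODE SENSE (6+10) (SPC-4): sg O_RDONLY; block layer O_RDONLY. Used to read SCSI device metadata
- PAUSE RESUME (MMC-4): sg O_RDWR; block layer O_RDONLY
- PLAY AUDIO (10) (MMC-4): sg O_RDWR; block layer O_RDONLY
- PLAY AUDIO MSF (MMC-4): sg O_RDWR; block layer O_RDONLY
- PLAY AUDIO TI (??): sg O_RDWR; block layer O_RDONLY. Opcode 0x48, unassigned to any spec in SPC-4
- PLAY CD (MMC-2): sg O_RDWR; block layer O_RDONLY. Old, now SPARE IN in SPC-4
- PREVENT ALLOW MEDIUM REMOVAL (SPC-4, MMC-4): sg O_RDWR; block layer O_RDWR. sd, st and cdrom drivers use this internally
- READ (6+10+12+16) (SBC-3): sg O_RDONLY; block layer O_RDONLY. READ(16) requires O_RDWR with the sg driver before lk 2.6.11
- READ BUFFER (SPC-4): sg O_RDONLY; block layer O_RDONLY
- READ BUFFER CAPACITY (MMC-4): sg O_RDWR; block layer O_RDONLY
- READ CAPACITY(10) (SBC-3, MMC-4): sg O_RDONLY; block layer O_RDONLY
- READ CAPACITY(16) (SBC-3, MMC-4): sg O_RDONLY; block layer CAP_SYS_RAWIO. Within SERVICE ACTION [...]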
Any other SCSI command (opcode) not mentioned for the sg driver needs O_RDWR. Any other SCSI command (opcode) not mentioned for the block layer SG_IO ioctl needs a user with CAP_SYS_RAWIO capability. All "block" SG_IO ioctl calls on st device nodes need a user with CAP_SYS_RAWIO capability. If a user does not have sufficient permissions to execute a SCSI command via the SG_IO ioctl then the system call fails (i.e. no SCSI command is sent) and errno is set to EPERM (operation not permitted).
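An application can avoid a needless EPERM by checking an opcode before sending it. The sketch below is illustrative only: the array holds just the opcodes of commands that the table in this section shows as O_RDONLY for both the sg driver and the block layer (excluding st), and it is not a copy of the kernel's own lists. The names are hypothetical.

```c
#include <stddef.h>

/* Opcodes shown in this page as needing only O_RDONLY with both the
 * sg driver and the block layer. Illustrative subset, not the kernel's
 * allow_ops or safe_for_read tables. */
static const unsigned char rdonly_opcodes[] = {
    0x12,                    /* INQUIRY */
    0x1a, 0x5a,              /* MODE SENSE (6) and (10) */
    0x08, 0x28, 0xa8, 0x88,  /* READ (6), (10), (12), (16) */
    0x3c,                    /* READ BUFFER */
    0x25,                    /* READ CAPACITY (10) */
};

/* Returns 1 if opcode is in the subset above, else 0. Note that READ(16)
 * needed O_RDWR with the sg driver before lk 2.6.11. */
static int ok_with_rdonly(unsigned char opcode)
{
    size_t i;

    for (i = 0; i < sizeof(rdonly_opcodes); i++) {
        if (rdonly_opcodes[i] == opcode)
            return 1;
    }
    return 0;
}
```

A return of 0 does not mean the command will fail; it only means this subset cannot vouch for it, so the caller should open the node with O_RDWR (sg driver) or expect to need CAP_SYS_RAWIO (block layer).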
Both the sg driver and the block layer SG_IO code [...]
CAP_SYS_RAWIO from a user process
While root processes usually have CAP_SYS_RAWIO, processes running under a user's ID (i.e. non-root) typically don't. Hence non-root processes may not be able to use SG_IO to send SCSI commands that require CAP_SYS_RAWIO. This may occur even if the permission bits of the device node file allow read or write access: user processes will still receive EPERM when using SG_IO.
By default the capability to assign capabilities to other processes (CAP_SETPCAP) is limited to very few processes, such as certain kernel threads. Changing this default would require changing and recompiling the kernel.
Processes which are forked by a root process and call setuid later will lose the CAP_SYS_RAWIO capability the parent root process (and the child before the setuid) had. However, the child can preserve the capabilities of the root process in the permitted set and raise it after the call of setuid:
/* ... in child after fork(), still running as root ... */
prctl(PR_SET_KEEPCAPS, 1, 0, 0, 0);
setuid(...);
cap_set_proc(cap_from_text("cap_sys_rawio+ep"));
This way a user process with a parent root process can 'get back' the required capabilities to directly send SCSI commands to a device via SG_IO.
The above technique may be of use to daemons that are started with root permissions (most are) and then change to another user after a fork(). It is not obvious to the author how utilities that use the SG_IO ioctl on device nodes that require CAP_SYS_RAWIO for some or all SCSI commands (e.g. nodes associated with the sd and st drivers) can use the above technique.
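A utility can at least find out whether it currently holds CAP_SYS_RAWIO before attempting a command that needs it. The following is a minimal sketch using the raw capget system call so that it does not depend on libcap; the function name is hypothetical, and the version 3 capability header it uses assumes a kernel newer than the 2.6.16 this page describes.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* for syscall() */
#endif
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/capability.h>

/* Returns 1 if the calling process has CAP_SYS_RAWIO in its effective
 * set, 0 if it does not, and -1 if the capget call itself failed. */
static int have_cap_sys_rawio(void)
{
    struct __user_cap_header_struct hdr;
    struct __user_cap_data_struct data[_LINUX_CAPABILITY_U32S_3];

    hdr.version = _LINUX_CAPABILITY_VERSION_3;
    hdr.pid = 0;             /* 0 means the calling process */
    if (syscall(SYS_capget, &hdr, data) != 0)
        return -1;
    return (data[CAP_TO_INDEX(CAP_SYS_RAWIO)].effective &
            CAP_TO_MASK(CAP_SYS_RAWIO)) ? 1 : 0;
}
```

When this returns 0, a program could restrict itself to commands that need no special capability, or report clearly that it must be run with more privilege, instead of surfacing a bare EPERM from the ioctl.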
SG_IO and the st driver
In order to implement its user space API, the st driver has to maintain information about where the read head is with respect to the structural elements of the tape (filemarks, beginning of tape, end of data [...]
So mixing st driver read, write and ioctl commands with SCSI commands sent via SG_IO that change the state of the tape is not recommended. This applies whether the SG_IO SCSI commands are sent via st or sg device nodes.
Maximum transfer size per command
The largest amount of data [...]
In the past, Linux used a single, "big-enough", block of memory for the source or destination of large data [...]
The Linux SCSI subsystem imposes a 128 element limit on scatter gather lists via its SCSI_MAX_PHYS_SEGMENTS define. Given the way various memory pools are allocated by the linux SCSI subsystem, SCSI_MAX_PHYS_SEGMENTS could be increased to 256. Associated with each type of HBA there is normally a low level driver (LLD). Each LLD can further limit the maximum number of elements with the scsi_host_template::sg_tablesize field. Prior to lk 2.6.16 the sg and st drivers used the .sg_tablesize field only [...]
User space memory may be allocated as the source and/or destination for DMA transfers from the HBA (i.e. direct IO). Even if the user space allocated a large amount of memory with a single malloc(), the HBA DMA element typically has a different view of memory. This view may well contain many "page" size discontinuous pieces. This has the effect of using up, or perhaps exhausting, scatter-gather elements.
The sg driver attempts to build scatter gather lists with each element up to SG_SCATTER_SZ bytes large. This define is found in include/scsi/sg.h and has been set to 32 KB for some years. That is 8 times the page size (of 4 KB) on the i386 architecture. Some users who need really large transfers increase this define (and it is best to keep it a power of 2). However since lk 2.6.16 another limit comes into play: the MAX_SEGMENT_SIZE define which is set to 64 KB. MAX_SEGMENT_SIZE is a default and can be overridden by the LLD calling blk_queue_max_segment_size().
In lk 2.6.16 two further LLD parameters come into play even when the sg (and st) driver is used. These are scsi_host_template::max_sectors and scsi_host_template::use_clustering.
The .max_sectors setting in the LLD is the maximum number of 512 byte sectors allowed in a single SCSI command's scatter gather lists (for data [...]
The .use_clustering field should be set to ENABLE_CLUSTERING. If not, the block subsystem rebuilds the scatter gather list it gets from the sg driver with page size (e.g. 4 KB) elements. [Actually it does that anyway, but when ENABLE_CLUSTERING is set, it coalesces them again!]
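The limits in this section combine into a simple upper bound, sketched below under two assumptions: that the scatter gather bound is the element count times the element size, and that .max_sectors caps the total data bytes of one command. The function name and the example values passed to it are hypothetical.

```c
/* Upper bound, in bytes, on one command's data transfer: the smaller of
 * (scatter gather elements * bytes per element) and
 * (max_sectors * 512 bytes per sector). */
static unsigned long max_xfer_bytes(unsigned int sg_elems,
                                    unsigned long elem_bytes,
                                    unsigned int max_sectors)
{
    unsigned long by_list = (unsigned long)sg_elems * elem_bytes;
    unsigned long by_sectors = (unsigned long)max_sectors * 512UL;

    return by_list < by_sectors ? by_list : by_sectors;
}
```

With 128 elements of 32 KB each (SG_SCATTER_SZ) the list allows 4 MB, but if each element shrinks to a single 4 KB page, as can happen with direct IO or without clustering, the same 128 elements carry only 512 KB. A hypothetical LLD with max_sectors set to 1024 would also cap a command at 512 KB regardless of the list.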
Conclusion
In some situations, sending commands via the SG_IO ioctl may interfere with a higher level driver's use of a device. Users of the SG_IO ioctl should be aware that they are using a powerful, but low level facility, and write code [...]
Last updated: 26th July 2008