Driver porting: supporting mmap()


http://lwn.net/Articles/28746/


Occasionally, a device driver will need to map an address range into a user process's space. This mapping can be done to give the process direct access to a device's I/O memory area, or to the driver's DMA buffers. 2.6 features a number of changes to the virtual memory subsystem, but, for most drivers, supporting mmap() will be relatively painless.

Using remap_page_range()

There are two techniques in use for implementing mmap(); often the simpler of the two is using remap_page_range(). This function creates a set of page table entries covering a given physical address range. The prototype of remap_page_range() changed slightly in 2.5.3; the relevant virtual memory area (VMA) pointer must be passed as the first parameter:

    int remap_page_range(struct vm_area_struct *vma, unsigned long from,
                         unsigned long to, unsigned long size,
                         pgprot_t prot);

remap_page_range() is now explicitly documented as requiring that the memory management semaphore (usually current->mm->mmap_sem) be held when the function is called. Drivers will almost invariably call remap_page_range() from their mmap() method, where that semaphore is already held. So, in other words, driver writers do not normally need to worry about acquiring mmap_sem themselves. If you use remap_page_range() from somewhere other than your mmap() method, however, do be sure you have acquired the semaphore first.
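As a rough illustration, an mmap() method built on remap_page_range() might look like the sketch below. It is written so that it compiles outside the kernel: the structure definitions, the 4 KiB PAGE_SHIFT, and the remap_page_range() body are stand-ins (the stub only records its arguments), and the names sketch_mmap, SKETCH_IO_BASE, and SKETCH_IO_LEN describe a hypothetical device. In a real driver the stand-ins would come from the kernel headers, and only sketch_mmap() would remain.

```c
#include <errno.h>
#include <stddef.h>

/* Stand-ins so the sketch builds outside the kernel; a real 2.6 driver
 * gets these from <linux/mm.h> and <linux/fs.h>. */
#define PAGE_SHIFT 12                    /* assumption: 4 KiB pages */
typedef struct { unsigned long pgprot; } pgprot_t;
struct file;                             /* opaque in this sketch */
struct vm_area_struct {
    unsigned long vm_start, vm_end;      /* user virtual address range */
    unsigned long vm_pgoff;              /* offset into the device, in pages */
    pgprot_t vm_page_prot;
};

/* Stub: records its arguments instead of building page table entries. */
static unsigned long last_from, last_to, last_size;
static int remap_page_range(struct vm_area_struct *vma, unsigned long from,
                            unsigned long to, unsigned long size,
                            pgprot_t prot)
{
    (void)vma; (void)prot;
    last_from = from;
    last_to = to;
    last_size = size;
    return 0;
}

/* Hypothetical device: a 64 KiB I/O region at this physical address. */
#define SKETCH_IO_BASE 0xd0000000UL
#define SKETCH_IO_LEN  0x10000UL

/* The mmap() method itself; this is the part a driver would contain. */
static int sketch_mmap(struct file *filp, struct vm_area_struct *vma)
{
    unsigned long size = vma->vm_end - vma->vm_start;
    unsigned long offset;

    (void)filp;
    /* Reject offsets beyond the region before shifting, to avoid overflow. */
    if (vma->vm_pgoff > (SKETCH_IO_LEN >> PAGE_SHIFT))
        return -EINVAL;
    offset = vma->vm_pgoff << PAGE_SHIFT;
    /* Reject requests that would run past the end of the region. */
    if (size > SKETCH_IO_LEN - offset)
        return -EINVAL;

    /* No locking here: mmap_sem is already held when mmap() is called. */
    if (remap_page_range(vma, vma->vm_start, SKETCH_IO_BASE + offset,
                         size, vma->vm_page_prot))
        return -EAGAIN;
    return 0;
}
```

The range check is a design choice rather than something remap_page_range() requires; without it, a process could ask for a mapping that extends beyond the device's region.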

Note that, if you are remapping into I/O space, you may want to use:

    int io_remap_page_range(struct vm_area_struct *vma, unsigned long from,
                            unsigned long to, unsigned long size,
                            pgprot_t prot);

On all architectures other than SPARC, io_remap_page_range() is just another name for remap_page_range(). On SPARC systems, however, io_remap_page_range() uses the system's I/O mapping hardware to provide access to I/O memory.

remap_page_range() retains its longstanding limitation: it cannot be used to remap most system RAM. Thus, it works well for I/O memory areas, but not for internal buffers. For that case, it is necessary to define a nopage() method. (Yes, if you are curious, the "mark pages reserved" hack still works as a way of getting around this limitation, but its use is strongly discouraged.)

Using vm_operations

The other way of implementing mmap() is to override the default VMA operations to set up a driver-specific nopage() method. That method will be called to deal with page faults in the mapped area; it is expected to return a struct page pointer to satisfy the fault. The nopage() approach is flexible, but it cannot be used to remap I/O regions; only memory represented in the system memory map can be mapped in this way.

The nopage() method made it through the entire 2.5 development series without changes, only to be modified in the 2.6.1 release. The prototype for that function used to be:

    struct page *(*nopage)(struct vm_area_struct *area,
                           unsigned long address,
                           int unused);

As of 2.6.1, the unused argument is no longer unused, and the prototype has changed to:

    struct page *(*nopage)(struct vm_area_struct *area,
                           unsigned long address,
                           int *type);

The type argument is now used to return the type of the page fault; VM_FAULT_MINOR would indicate a minor fault - one where the page was in memory, and all that was needed was a page table fixup. A return of VM_FAULT_MAJOR would, instead, indicate that the page had to be fetched from disk. Driver code using nopage() to implement a device mapping would probably return VM_FAULT_MINOR. In-tree code checks whether type is NULL before assigning the fault type; other users would be well advised to do the same.
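The pieces described above (an mmap() method that installs driver-specific VMA operations, and a nopage() method using the 2.6.1 prototype) can be sketched as follows. This is a model that compiles outside the kernel, not driver-ready code: the structures, the VM_FAULT_MINOR value, NOPAGE_SIGBUS, and get_page() are simplified stand-ins, and sketch_pages is a hypothetical four-page buffer indexed directly. A real driver would instead look up the struct page for its own buffer, for example with virt_to_page().

```c
#include <stddef.h>

/* Stand-ins so the sketch builds outside the kernel; a real 2.6 driver
 * gets these from <linux/mm.h>. The numeric value is illustrative. */
#define PAGE_SHIFT     12                /* assumption: 4 KiB pages */
#define VM_FAULT_MINOR 1
#define NOPAGE_SIGBUS  NULL
struct file;                             /* opaque in this sketch */
struct page { int count; };              /* only a reference count */
struct vm_area_struct;
struct vm_operations_struct {
    struct page *(*nopage)(struct vm_area_struct *area,
                           unsigned long address, int *type);
};
struct vm_area_struct {
    unsigned long vm_start, vm_end, vm_pgoff;
    struct vm_operations_struct *vm_ops;
};
static void get_page(struct page *page) { page->count++; }

/* Hypothetical driver buffer: four pages of system memory. */
#define SKETCH_NPAGES 4
static struct page sketch_pages[SKETCH_NPAGES];

/* Called on a page fault in the mapped area. */
static struct page *sketch_nopage(struct vm_area_struct *area,
                                  unsigned long address, int *type)
{
    unsigned long index = ((address - area->vm_start) >> PAGE_SHIFT)
                          + area->vm_pgoff;
    struct page *page;

    if (index >= SKETCH_NPAGES)
        return NOPAGE_SIGBUS;            /* fault beyond the buffer */
    page = &sketch_pages[index];
    get_page(page);                      /* the caller expects a reference */
    if (type)                            /* type may be NULL: check first */
        *type = VM_FAULT_MINOR;
    return page;
}

static struct vm_operations_struct sketch_vm_ops = {
    .nopage = sketch_nopage,
};

/* The mmap() method only installs the operations; pages are mapped
 * one at a time as faults arrive. */
static int sketch_mmap(struct file *filp, struct vm_area_struct *vma)
{
    (void)filp;
    vma->vm_ops = &sketch_vm_ops;
    return 0;
}
```

Because nothing is mapped until a fault occurs, the mmap() method itself stays trivial; the cost is one fault per page on first access.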

There are a couple of other things worth mentioning. One is that the vm_operations_struct is rather smaller than it was in 2.4.0; the protect(), swapout(), sync(), unmap(), and wppage() methods have all gone away (they were actually deleted in 2.4.2). Device drivers made little use of these methods, and should not be affected by their removal.

There is also one new vm_operations_struct method:

    int (*populate)(struct vm_area_struct *area, unsigned long address,
                    unsigned long len, pgprot_t prot, unsigned long pgoff,
                    int nonblock);

The populate() method was added in 2.5.46; its purpose is to "prefault" pages within a VMA. A device driver could certainly implement this method by simply invoking its nopage() method for each page within the given range, then using:

    int install_page(struct mm_struct *mm, struct vm_area_struct *vma,
                     unsigned long addr, struct page *page,
                     pgprot_t prot);

to create the page table entries. In practice, however, there is no real advantage to doing things in this way. No driver in the mainline (2.5.67) kernel tree implements the populate() method.

Finally, one use of nopage() is to allow a user process to map a kernel buffer which was created with vmalloc(). In the past, a driver had to walk through the page tables to find a struct page corresponding to a vmalloc() address. As of 2.5.5 (and 2.4.19), however, all that is needed is a call to:

    struct page *vmalloc_to_page(void *address);

This call is not a variant of vmalloc() - it allocates no memory. It simply returns a pointer to the struct page associated with an address obtained from vmalloc().
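To show where vmalloc_to_page() fits, here is a sketch of the lookup a nopage() method might perform for a vmalloc() buffer. It is a model that compiles outside the kernel: sketch_area plays the role of memory returned by vmalloc(), the vmalloc_to_page() body is a stand-in that indexes a small array, and sketch_vmalloc_fault() is a hypothetical helper, not a kernel function. Only the sequence (bounds check, vmalloc_to_page(), get_page()) is the point.

```c
#include <stddef.h>

/* Stand-ins so the sketch builds outside the kernel. */
#define PAGE_SHIFT 12                    /* assumption: 4 KiB pages */
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
struct page { int count; };              /* only a reference count */
static void get_page(struct page *page) { page->count++; }

/* Plays the role of a four-page buffer obtained from vmalloc(). */
#define SKETCH_NPAGES 4
static char sketch_area[SKETCH_NPAGES * PAGE_SIZE];
static struct page sketch_pages[SKETCH_NPAGES];

/* Stand-in with the kernel's prototype; the real function consults the
 * kernel page tables rather than indexing an array. */
static struct page *vmalloc_to_page(void *address)
{
    size_t off = (size_t)((char *)address - sketch_area);
    return &sketch_pages[off >> PAGE_SHIFT];
}

/* Hypothetical helper: the core of a nopage() method for a vmalloc()
 * buffer. 'offset' is the byte offset of the fault within the buffer. */
static struct page *sketch_vmalloc_fault(char *buf, unsigned long buf_len,
                                         unsigned long offset)
{
    struct page *page;

    if (offset >= buf_len)
        return NULL;                     /* NOPAGE_SIGBUS in the kernel */
    page = vmalloc_to_page(buf + offset);
    get_page(page);                      /* nopage() returns a referenced page */
    return page;
}
```

The lookup has to be done per page because memory from vmalloc() is contiguous only in kernel virtual space; each page may sit anywhere in physical memory.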
