
From: Zinux (Linux Technician), Board: Embedded_system
Subject: Porting Linux 2.0 Drivers To Linux 2.2
Site: HIT Lilac BBS (Fri Oct 26 18:26:35 2001), internal mail

Linux Magazine (http://www.linux-mag.com) May 1999

Copyright Linux Magazine © 1999

GEARHEADS ONLY
Porting Linux 2.0 Drivers To Linux 2.2: Changes and New Features
by Alan Cox

The desire for more speed and better multi-processor support has
caused inevitable changes in Linux, resulting in the development of
the current kernel -- Linux 2.2. As a driver author, you may initially
avoid taking advantage of the latest kernel changes. Ultimately,
however, you'll probably end up re-writing your driver to stay current
with kernel design, to improve your driver's performance, and to take
advantage of the ever-increasing opportunities that appear on the
horizon for Linux users in 1999.

This is a comprehensive background on the changes that you, as a
driver author, will need to make in order to port a driver from Linux
2.0 or 2.1 to the new 2.2 kernel. Even if you're writing 2.0 based
add-on drivers and not necessarily familiar with the Linux kernel to
begin with, you'll have a fighting chance of making your driver work
with 2.2. And hopefully you'll learn a little something along the way.

None of the changes in 2.2 are gratuitous. Where possible,
compatibility modes exist for old methods. These modes provide
warnings that the driver ought to be updated to the new methods.
Several drivers in 2.2.0 still produce these warnings, so you
shouldn't feel bad if your driver produces them, especially if all you
want is to make it work.

Access To User Space

When comparing older kernels with 2.2, the most obvious change you
will see is that the pair verify_area() and
memcpy_tofs()/memcpy_fromfs() has mutated. The change was driven by
the good old need for speed, and it also creates a convenient place to
clean up some complicated SMP race conditions. Contrary to rumor, it
doesn't exist just to annoy device driver writers.

With Linux 2.0 the kernel walked the list of memory areas owned by a
process to determine if an access to user space was legal. This was
done to ensure that the read or write didn't succeed when it
shouldn't, because when an illegal read or write does succeed, the
resulting "Oops" message isn't pretty.

Linux 2.2 changes the rules of the game. Since the memory management
hardware can do most of the checking on a 486 or higher, it would be
silly to do it with the software as well, especially since most
accesses are legal.

Instead the kernel builds tables that contain information about what
addresses may fault, and where to jump if they do.

When a user passes an address, a basic sanity check is performed to
ensure that it is not a kernel address. Once that is verified, the
kernel can use the value given, knowing it can still recover if the
access faults. The actual mechanics are not trivial, involving some
interesting abuses of the ELF binary format and some clever inline
assembler tricks. Fortunately they're all wrapped up nicely for you.
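
The table idea can be illustrated outside the kernel. What follows is
a minimal userspace sketch, not the kernel's own code: the names
fixup_entry and find_fixup and the struct layout are made up for
illustration, and the table is assumed to be sorted by faulting
address so that a binary search works.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table entry: the address of an instruction that may
   fault, and the address to resume at if it does. */
struct fixup_entry {
    unsigned long insn;
    unsigned long fixup;
};

/* Binary search a table sorted by insn.
   Returns the fixup address, or 0 when the fault is not in the table. */
unsigned long find_fixup(const struct fixup_entry *table, size_t count,
                         unsigned long fault_addr)
{
    size_t lo = 0, hi = count;

    assert(table != NULL || count == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (table[mid].insn == fault_addr)
            return table[mid].fixup;
        if (table[mid].insn < fault_addr)
            lo = mid + 1;
        else
            hi = mid;
    }
    return 0;
}
```

In this model a return of 0 means the fault did not come from a marked
user access, which is the case where a real kernel would report a bug
rather than quietly recover.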

Figure 1A contains an example of a driver written for 2.0. Under 2.2,
the same driver will be written as in Figure 1B.

Figure 1A

struct thing my_thing;

if(verify_area(VERIFY_READ, userptr, sizeof(my_thing)))
       return -EFAULT;
memcpy_fromfs(&my_thing, userptr, sizeof(my_thing));

Figure 1B

#include <asm/uaccess.h>

struct thing my_thing;

if(copy_from_user(&my_thing, userptr, sizeof(my_thing)))
       return -EFAULT;

The copy_from_user function returns zero on a successful copy, or if
it faults, it returns the number of bytes it was unable to copy. It is
both cleaner and faster, with all the magical fault catching concealed
from the driver author. copy_to_user works the same way.
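
The return convention is easy to get backwards, so here is a small
userspace model of it. This is only a sketch: sim_copy_from_user and
fetch_thing are hypothetical stand-ins, and the "readable" parameter
is an invented way to pretend that only part of the source buffer is
accessible.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for copy_from_user(): only the first `readable` bytes of
   src are treated as accessible. Returns the number of bytes that
   could NOT be copied, so 0 means complete success. */
size_t sim_copy_from_user(void *dst, const void *src, size_t n,
                          size_t readable)
{
    size_t ok = n < readable ? n : readable;

    assert(dst != NULL && src != NULL);
    memcpy(dst, src, ok);
    return n - ok;
}

struct thing {
    int a;
    int b;
};

/* Driver-style caller: any non-zero remainder becomes -EFAULT. */
int fetch_thing(struct thing *out, const void *userptr, size_t readable)
{
    if (sim_copy_from_user(out, userptr, sizeof(*out), readable))
        return -EFAULT;
    return 0;
}
```

Note that the caller tests for non-zero rather than for a negative
value; the remainder is a byte count, not an error code.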

Linux 2.0 also had a set of functions -- get_user() and put_user() --
that did the same things for native C types as the memcpy functions.
These still exist, but their behavior has changed (and may be why you
now have hundreds of warnings in your partially ported driver!).

Previously, get_user() returned the value of the object. So it would
read something like Figure 2A.

Figure 2A

if(verify_area(VERIFY_READ, pointer, sizeof(*pointer)))
       return -EFAULT;
c=get_user(pointer);
switch(c)
{
       ..

Figure 2B

if(get_user(c, pointer))
       return -EFAULT;
switch(c)
{
       ..

In 2.2 the get_user function handles the fault checking, so it needs
to return two different pieces of information. The arguments have
changed, and it now returns zero on a successful read and -EFAULT
otherwise. Figure 2A is replaced by Figure 2B.

The 2.0 put_user function has been given the same treatment. The fact
that it returns -EFAULT or zero can be very useful since many routines
can now simply use:

return put_user(value, pointer);

to get the desired error/success return to userspace.
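
As a rough illustration of that pattern, the sketch below models a
pair of ioctl-style helpers in userspace. The sim_get_user and
sim_put_user macros are hypothetical stand-ins that treat a NULL
pointer as a faulting address; device_level, my_get_level and
my_set_level are invented for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Userspace stand-ins for the 2.2 macros: a NULL pointer plays the
   role of a faulting user address. Both yield 0 or -EFAULT. */
#define sim_get_user(x, ptr) ((ptr) ? ((x) = *(ptr), 0) : -EFAULT)
#define sim_put_user(x, ptr) ((ptr) ? (*(ptr) = (x), 0) : -EFAULT)

static int device_level = 7;    /* hypothetical device state */

/* Read side: the macro's result is exactly the value to return. */
int my_get_level(int *userptr)
{
    return sim_put_user(device_level, userptr);
}

/* Write side: fetch, validate, then commit. */
int my_set_level(const int *userptr)
{
    int v;

    if (sim_get_user(v, userptr))
        return -EFAULT;
    if (v < 0)
        return -EINVAL;
    device_level = v;
    assert(device_level >= 0);
    return 0;
}
```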

File Operations Changes

Almost every device driver, except for the network drivers, interacts
with the file system. The file system layers have changed somewhat,
although the impact on a device driver that doesn't wish to get
involved is minimal.

First, many drivers need to obtain the inode of a passed file handle.
In 2.0 this was done with:

struct file *filp;
struct inode *inode;

inode = filp->f_inode;

In 2.2 these are handled via the directory cache (dcache), a namespace
cache of active and recently accessed files. This makes things like
the find command much faster. Fortunately, the change from a driver
point of view is nice and simple:

inode = filp->f_dentry->d_inode;

For file systems the changes are major, and a review of the changes
involved in porting file systems deserves an article unto itself.

The read and write operations have changed only a little. They now
pass the file offset pointer as an argument instead of relying on the
one in the file handle. It may well be that the pointer refers to the
offset in the file handle, but a driver should not depend on that. The
reason is that the POSIX standard defines pread/pwrite operations that
allow you to atomically seek and fetch data at a given position. In
the conventional UNIX API, the seek (selecting the offset) and the
read of the data were separate events. Care had to be taken in a
threaded program so that one thread didn't seek and then have another
thread move the file position before the first could read.
pread/pwrite avoid this problem.

The drivers that care about file position (which is not all of them --
a file position is not meaningful to a tty, for example) should be
using the passed offset pointer instead of changing filp->f_pos.
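
A userspace sketch of a read method written in this style might look
like the following. It is not a real driver entry point: my_read, its
simplified argument list (no struct file, a plain memcpy instead of
copy_to_user) and the device_data buffer are assumptions made so that
the example stands alone.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical device contents. */
static const char device_data[] = "hello, 2.2";

/* Modelled on the 2.2 read method: the position arrives through ppos
   and is advanced through ppos; no position stored in the file
   structure is touched. Returns bytes read, 0 at end of data, or a
   negative errno value. */
long my_read(char *buf, size_t count, long long *ppos)
{
    size_t size = sizeof(device_data) - 1;
    size_t left;

    assert(buf != NULL && ppos != NULL);
    if (*ppos < 0)
        return -EINVAL;
    if ((unsigned long long)*ppos >= size)
        return 0;
    left = size - (size_t)*ppos;
    if (count > left)
        count = left;
    memcpy(buf, device_data + *ppos, count);
    *ppos += (long long)count;
    return (long)count;
}
```

Because the offset belongs to the caller, two callers with separate
offsets do not disturb each other, which is the property pread and
pwrite rely on.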

The release (close) operation is called, as before, on the last close
of a file. A small change here is that it is entitled to return a
failure code, which can be returned via close(). The handle must still
be closed, but it allows you to report that the close stumbled across
a problem.

There is also a flush operation which is invoked when any given
process closes its copy of the file handle. At the moment this is only
used for NFS writes where the close of the file may be the only point
at which you discover that a write fails because the remote disk is
full.

In most cases this functionality shouldn't be needed.

Finally, the disappearance of the select method will be very visible
to device drivers. This method, and indeed the whole of select in the
kernel, has been replaced by the more scalable, but arguably less
elegant, System V based poll. The change is not visible to end users
because the kernel emulates the old select call with poll to preserve
compatibility.

The changes made for poll in most cases can be applied fairly
mechanically to any device. The fundamental API change is mostly
invisible to the device driver author.

2.0 based driver code was called with a select_table as the final
argument. This has become a poll_table, although the functionality is
basically the same. It is used to keep a list of events that may cause
the status of the poll() return to change. The wait queue which
indicates something may have occurred is added to the poll table
using:

struct file *filp;
poll_table *wait;
struct wait_queue *queue;

poll_wait(filp, &queue, wait);

The poll handler should then check what events are presently true. The
main events are listed in Figure 3.

Figure 3

POLLERR - an error is pending
POLLHUP - a hangup occurred

POLLIN - input data exists
POLLRDNORM - normal readable data exists

POLLPRI - a "priority" message is waiting (used for urgent data on
sockets)

POLLOUT - output is possible (there is room)
POLLWRNORM - there is space to output normal data.

Finally, the poll handler returns a mask of these events. The poll
function will be called whenever a process is polling a file and the
kernel code thinks the status may have changed.

When the required events are true, the poll system call will clean up
the tables without driver assistance and then return to the user.

Figure 4 contains a simple example for a read only device (the bus
mouse driver) from both kernels 2.0 and 2.2.

Figure 4

Linux 2.0 - select

/*
  * select for mouse input
  */
static int mouse_select(struct inode *inode, struct file *file,
                              int sel_type, select_table * wait)
{
         if (sel_type == SEL_IN) {
                 if (mouse.ready)
                       return 1;
                 select_wait(&mouse.wait, wait);
         }
         return 0;
}

Linux 2.2 - poll

/*
  * poll for mouse input
  */
static unsigned int mouse_poll(struct file *file, poll_table * wait)
{
       poll_wait(file, &mouse.wait, wait);
       if (mouse.ready)
             return POLLIN | POLLRDNORM;
       return 0;
}
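
For a device that can be written as well as read, the returned mask
simply combines more of the events listed in Figure 3. The following
is a userspace sketch of only the mask-building step, using the
constants from <poll.h>; struct sketch_dev and sketch_poll_mask are
invented names, and the poll_wait() call that a real handler makes
first is not modelled.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>

/* Fallbacks in case the C library hides these without feature test
   macros; the values are the x86 Linux ones. */
#ifndef POLLRDNORM
#define POLLRDNORM 0x040
#endif
#ifndef POLLWRNORM
#define POLLWRNORM 0x100
#endif

#define SKETCH_BUF_SIZE 8

/* Hypothetical device state: a small FIFO plus a hangup flag. */
struct sketch_dev {
    size_t used;     /* bytes waiting to be read */
    int hung_up;     /* the other end has gone away */
};

/* Report every event that is true right now. */
unsigned int sketch_poll_mask(const struct sketch_dev *dev)
{
    unsigned int mask = 0;

    assert(dev != NULL && dev->used <= SKETCH_BUF_SIZE);
    if (dev->used > 0)
        mask |= POLLIN | POLLRDNORM;      /* data to read */
    if (dev->used < SKETCH_BUF_SIZE)
        mask |= POLLOUT | POLLWRNORM;     /* room to write */
    if (dev->hung_up)
        mask |= POLLHUP;
    return mask;
}
```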

Init Functions

A lot of drivers contain code executed only at start-up time. In Linux
2.2 based drivers, you can mark these functions and code with __init
and __initdata. The kernel build uses more ELF and compiler tricks to
collect these functions at link time and throws them away after
booting to make more memory available for applications. Some
platforms, however, don't support __init and __initdata. For those
platforms, they are ignored.

Including <asm/init.h> and marking initialization data and code with
these markers can often save you 5 to 10 percent of the total size of
a device driver. A typical 2.2 kernel build throws some 40K of
initialization code away at boot time.

Interrupt Handlers

With older systems you could assume (although you probably shouldn't
have) that a PC would have 16 interrupts. You cannot assume this with
Linux 2.2. Because Linux 2.2 uses the APIC interrupt controller on
multiprocessor machines, you might have 64 interrupt lines or more. In
other words, do not assume anything about the number of interrupts.

In the new 2.2 kernel, the notion of fast interrupts is gone. If you
set the SA_INTERRUPT flag to indicate your interrupt is fast, then
interrupts will be disabled on that processor while your interrupt is
handled, but the remaining semantics of a "fast" interrupt are not
emulated. And normally, this shouldn't matter.

A lot of 2.0 based code looks something like Figure 5A. The dev_id
field in the interrupt structure is specifically intended to pass this
kind of information -- thus avoiding the need for device<->interrupt
tables. Such tables do not work for PCI where an interrupt is likely
to be shared by two instances of the same device. Instead use the call
described in Figure 5B.

Figure 5A

void my_interrupt(int irq, void *dev_id, struct pt_regs *regs)
{
     struct my_device *dev=my_devices[irq];
     ...

Figure 5B

request_irq(irq, my_interrupt, SA_SHIRQ, "mythingy", dev);

This will call the interrupt handler with dev_id holding the value of
dev that is passed to the request function.
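
To see why dev_id copes with sharing where an irq-indexed table
cannot, consider this userspace model of one shared interrupt line. It
is only a sketch: sketch_request_irq and sketch_raise_irq are
hypothetical stand-ins for the kernel's registration and dispatch, the
handler takes no pt_regs argument, and the "pending" flag stands in
for reading a device status register.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SHARED 4

struct my_device {
    int id;
    int hits;       /* interrupts actually serviced */
    int pending;    /* stand-in for a status register bit */
};

typedef void (*handler_t)(int irq, void *dev_id);

struct irq_slot {
    handler_t handler;
    void *dev_id;
};

/* One shared line: every registered handler is called on each
   interrupt, each with its own dev_id. */
static struct irq_slot slots[MAX_SHARED];
static size_t nslots;

int sketch_request_irq(handler_t handler, void *dev_id)
{
    assert(handler != NULL);
    if (nslots == MAX_SHARED)
        return -1;
    slots[nslots].handler = handler;
    slots[nslots].dev_id = dev_id;
    nslots++;
    return 0;
}

void sketch_raise_irq(int irq)
{
    for (size_t i = 0; i < nslots; i++)
        slots[i].handler(irq, slots[i].dev_id);
}

/* The handler finds its device through dev_id, not through irq. */
void my_interrupt(int irq, void *dev_id)
{
    struct my_device *dev = dev_id;

    (void)irq;
    if (!dev->pending)
        return;             /* not our device this time */
    dev->pending = 0;
    dev->hits++;
}
```

Two instances of the same device register the same handler on the
same line, and each invocation still lands on the right structure.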

There is one last thing to worry about with interrupts in 2.2 based
drivers, and if you are using multi-processor machines it may require
some thought. Under Linux 2.2 an interrupt can be executing in
parallel with other kernel code. This is different from 2.0, which
used global locking to make the SMP transition simple.

You are still guaranteed that cli() and sti() will protect a section
of code and prevent the kernel from running an interrupt handler
during the protected block, but you are no longer guaranteed that an
interrupt handler itself will prevent other kernel code from running.
To handle this you will need to use spinlocks.

Spinlocks and SMP

For the sake of expediency, we'll only review the basic spinlocks as a
recipe for handling interlocking between an interrupt handler and the
kernel code. On a single processor machine these functions are turned
into the conventional cli/sti functions and have no overhead. However,
you should probably test them with an SMP build (even on a single CPU
machine) to be sure they work correctly.

A spinlock is a type: spinlock_t. It is initialized with the function:

spinlock_t lock;
spin_lock_init(&lock);

This sets the lock up and indicates that it is not being held.

When you want to use a spinlock you must grab it. The function sits in
a tight loop until it grabs the lock. In the event that you're using
the lock from both interrupt and non-interrupt contexts, you'll need
to disable that interrupt or all local interrupts when grabbing the
lock. This is common enough that a number of functions cover it.
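
Before looking at the kernel calls, it may help to see the "tight
loop" idea in isolation. The sketch below is a userspace toy built on
C11 atomics, not the kernel's implementation: the sketch_spin_* names
are invented, and it has no notion of interrupts, so it says nothing
about the irq-disabling variants.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    atomic_flag held;
} sketch_spinlock_t;

/* Mark the lock as not held. */
void sketch_spin_init(sketch_spinlock_t *l)
{
    assert(l != NULL);
    atomic_flag_clear(&l->held);
}

/* One attempt: true if we took the lock, false if someone holds it. */
bool sketch_spin_trylock(sketch_spinlock_t *l)
{
    return !atomic_flag_test_and_set_explicit(&l->held,
                                              memory_order_acquire);
}

/* Sit in a tight loop until the attempt succeeds. */
void sketch_spin_lock(sketch_spinlock_t *l)
{
    while (!sketch_spin_trylock(l))
        ;
}

void sketch_spin_unlock(sketch_spinlock_t *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

If the holder can never run again, sketch_spin_lock never returns.
That is the hazard the irq-disabling variants exist to prevent.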

To grab a lock:

spin_lock(&lock);

To release a lock:

spin_unlock(&lock);

To grab a lock, save the irq mask and disable local interrupts:

unsigned long flags;
spin_lock_irqsave(&lock,flags);

and to restore it:

spin_unlock_irqrestore(&lock,flags);

The normal use of such code can be seen in Figure 6.

Figure 6

void my_interrupt(int irq, void *dev_id, struct pt_regs *regs)
{
        struct my_device *dev = dev_id;

        spin_lock(&dev->lock);

        /* Do the same things as we always did in 2.0
           knowing user code grabbing the lock will be held up until
           ... */

        spin_unlock(&dev->lock);
}

Figure 7 contains a non-interrupt context where you need to protect
small sections of code from the interrupt handler running in parallel.
Note the use of the irq disabling version of the lock. This is very
important -- without it we may take the lock in the user code, then
take an interrupt on the same processor. The interrupt routine will
spin forever, trying to
get a lock that is not going to be released (because we're stuck in
the interrupt so we can't be running the user code). When this
happens, you have to reboot, and you won't be happy. Use the right
version of the spinlocks for the sake of general user happiness.

Figure 7

struct my_device *dev;
unsigned long flags;

spin_lock_irqsave(&dev->lock, flags);

/* The interrupt cannot interfere here */
/* Do the things we did in 2.0 */

spin_unlock_irqrestore(&dev->lock, flags);

The spinlocks guarantee one additional thing. They are marked with the
required magic to tell gcc that they are memory barriers. Even if you
are not using volatile types, gcc will write any values from registers
to their final destination before unlocking. It will also read values
directly from memory, not from saved copies in registers made before
the lock is taken. This means that you don't need to worry about any
misery-producing optimization surprises the compiler might otherwise
invent. In Linux 2.0 the cli(), sti() and restore_flags() functions
have this property, and in Linux 2.2 this continues to be true.

The io_request lock

The io_request lock is a spin lock that is taken by the kernel when
queuing a request to a block device (a hard disk, a floppy disk, or
similar devices which can contain a file system). If the driver is
unaware of the lock it will perform the way it does with 2.0. The I/O
operation will remain single-threaded. If the driver is aware of and
uses the lock, then it can get the advantages of parallel I/O
operations across multiple processors.

The lock protects the request queue, so a driver can safely drop it
once it has copied or processed the request queue entry. In some cases
this is done by the device driver. In others, SCSI for example, by the
supporting code.

There are two reasons to drop the lock. First, it results in better
performance from your device driver. Secondly, it keeps interrupts
enabled during your device operation. You may need to do this because
the device is very slow or because you have to use busy loops with
timeouts.

The lock is dropped with:

spin_unlock_irq(&io_request_lock);

and taken with:

spin_lock_irq(&io_request_lock);

These are variants on the functions covered in the Spinlocks and SMP
section. spin_lock_irq always disables the interrupts on that
processor; spin_unlock_irq always enables them again.

Several SCSI drivers make use of this because of things like timeout
handling. The NCR5380, for example, drops the lock during the various
delay loops required to control the relatively primitive controller it
uses.

Figure 8

spin_unlock_irq(&io_request_lock);

     while(!(NCR5380_read(INITIATOR_COMMAND_REG) & ICR_ARBITRATION_PROGRESS)
             && time_before(jiffies,timeout));

spin_lock_irq(&io_request_lock);

An example can be seen in Figure 8. This allows the timer to continue
running, and other processes can continue while the ancient 5380
hardware whirs into action. Because it drops the I/O request lock in
its own handlers, it also claims it again in its interrupt function.
The interrupt function in this case also manipulates the request queue
and so must protect itself from another processor which may also be
queuing blocks for the device.
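
The busy loop in Figure 8 depends on a timeout comparison that keeps
working when the tick counter wraps around. The sketch below shows one
common way to write such a comparison in userspace. It is written in
the spirit of time_before() but is not the kernel macro:
sketch_time_before and sketch_busy_wait are invented names, the tick
counter is faked, and the cast assumes the usual two's complement
conversion from unsigned long to long.

```c
#include <assert.h>
#include <limits.h>

/* Wrap-safe "a is earlier than b" for a free-running counter. The
   unsigned subtraction wraps, and the sign of the result tells us
   which side is ahead, provided the two are less than LONG_MAX
   ticks apart. */
int sketch_time_before(unsigned long a, unsigned long b)
{
    return (long)(a - b) < 0;
}

/* Busy-wait on a fake tick counter that advances once per loop.
   Returns the number of iterations spent before the deadline. */
unsigned long sketch_busy_wait(unsigned long start, unsigned long ticks)
{
    unsigned long fake_jiffies = start;
    unsigned long timeout = start + ticks;   /* may wrap; that is fine */
    unsigned long spins = 0;

    assert(ticks <= (unsigned long)LONG_MAX);
    while (sketch_time_before(fake_jiffies, timeout)) {
        fake_jiffies++;
        spins++;
    }
    assert(fake_jiffies == timeout);
    return spins;
}
```

A plain "fake_jiffies < timeout" test would end the loop immediately
whenever the deadline wraps past zero while the counter has not.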

And that's it: a general overview of some of the changes you'll need
to know about in order to take advantage of the latest advances in the
Linux 2.2 Kernel.

--

  puke!
  Just a technician

※ Source: HIT Lilac BBS bbs.hit.edu.cn [FROM: 202.118.239.152]
