How the PCI bus works

Source: Internet. Published by: 香港网络电视. Edited by: 程序博客网. Date: 2024/05/18 09:09

This article can be downloaded from the link below.

http://sorubank.ege.edu.tr/~charmansah/dersler/pci_sunum1.pdf

It was not written by me. I am posting this article here just for reference.

How the PCI Bus Works

The acronym PCI stands for Peripheral Component Interconnect, which aptly describes what it does.

PCI was designed to satisfy the requirement for a standard interface for connecting peripherals to a PC, capable of sustaining the high data transfer rates needed by modern graphics controllers, storage media, network interface cards and other devices.

PCI Design

PCI's designers decided to avoid these difficulties altogether by making PCI an asynchronous bus, that is, one whose clock is not tied to that of the processor bus. The top speed for most PCI cards is 33MHz. The PCI 2.1 specification made provision for a doubling of the speed to 66MHz, but support for this higher speed was optional.

At 33MHz, with a 32-bit data bus, the theoretical maximum data transfer rate of the PCI bus is 132MB/sec. At 66MHz, with a 64-bit data path, the top speed would be 528MB/sec. The PCI bus can run at lower speeds. In a system clocked at 25MHz, for example, the bus could also run at this speed. This was an important consideration at the time PCI was being developed.
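The figures above follow from multiplying the clock rate by the number of bytes moved per cycle. A minimal sketch of that arithmetic, assuming the nominal clock values quoted in the text and one transfer per clock (the function name is made up for this illustration):

```python
def peak_bandwidth_mb_per_s(clock_mhz: int, bus_width_bits: int) -> int:
    """Theoretical peak in MB/s, assuming one full-width transfer per clock."""
    bytes_per_transfer = bus_width_bits // 8
    return clock_mhz * bytes_per_transfer


# Clock and width combinations mentioned in the text.
for clock_mhz, width_bits in [(33, 32), (66, 64), (25, 32)]:
    rate = peak_bandwidth_mb_per_s(clock_mhz, width_bits)
    print(f"{clock_mhz}MHz x {width_bits}-bit: {rate}MB/sec")
```

These are upper bounds only; address phases and wait states reduce what a real transfer achieves.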

Peripherals must be designed to work over the entire range of permitted speeds. In the original PCI specification the lower limit to the speed range was given as 16MHz; in PCI revision 2.0 this was reduced to 0MHz. This supports 'green' power saving modes by allowing the system to run at reduced speed for lower power consumption, or to be put into 'suspend' mode (0MHz), without any bus status information being lost.

The number of devices on a PCI bus depends on the load. In practice this means three or four slots, plus an on-board disk controller and a secondary bus. Up to 256 PCI buses can be linked together, though, to provide extra slots, so this is not really a limitation.

PCI Connector

The 32-bit PCI connector has 124 pins (62 per side). The pin-outs are arranged so that every signal pin is adjacent to a power or ground rail, which helps to reduce electromagnetic interference (EMI) by capacitive decoupling. The compact size is obtained by multiplexing the 32 address and data lines so they share the same 32 pins. This allows PCI cards to be made short enough to install in portable PCs.

PCI makes provision for both standard 5V and low power 3.3V boards. Separate slots are needed for the two types of card: these are keyed to prevent cards being inserted in the wrong type of slot.

Slots for 5V cards have a key towards the end furthest from the backplane, while 3.3V slots are keyed at a similar position at the other end.

The cards themselves have a corresponding keyway in the key position. Universal cards able to operate at either 3.3V or 5V have keyways in both positions, and so may be installed in either type of slot.

The 64-bit PCI connector has a further 64 pins (32 per side) which follow on from the standard 32-bit slot in a similar manner to the IBM AT extension to the original IBM PC 8-bit slot. The extension contains mainly another 32 multiplexed address and data lines, plus extra power and ground rails.

Signals present on the 32-bit part of the connector allow a 64-bit card to be detected and used (albeit with reduced performance) in a 32-bit slot.

Besides the interlacing of power, ground and signal traces, PCI uses another innovation, reflected wave switching, to reduce power consumption and the EMI problem associated with fast, high power digital electronics. Circuit traces on a PCI board are unterminated. This means that a signal travelling along a trace meets a high impedance at the end, and is consequently reflected back along the trace instead of being absorbed. By careful design, the logic gates are placed at the points where the incident and reflected waves reinforce each other. Because the voltages of the two waves add, the logic drivers need only produce a signal of half the needed voltage level, which reduces the power needed by a similar fraction.

A pair of pins on the bus connector allows the system to determine the power requirements of the installed hardware. Interpreted as two bits, they permit a total of four combinations, showing that the slot is either empty or contains a board with a power consumption of up to 7.5W, 15W or 25W.
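The two-pin encoding can be pictured as a four-entry lookup table. The sketch below is illustrative only: the article does not name the pins or say which combination means what, so the pin names (PRSNT1#, PRSNT2#) and the mapping are assumptions based on my recollection of the PCI specification and should be checked against it before use.

```python
# Key: (PRSNT1# grounded, PRSNT2# grounded). Value: maximum board power in
# watts, or None for an empty slot. ASSUMED mapping, not taken from the article.
PRESENCE_TABLE = {
    (False, False): None,  # both pins open: no board installed
    (True, False): 25.0,
    (False, True): 15.0,
    (True, True): 7.5,
}


def board_power_limit(prsnt1_grounded: bool, prsnt2_grounded: bool):
    """Return the board's declared maximum power, or None if the slot is empty."""
    return PRESENCE_TABLE[(prsnt1_grounded, prsnt2_grounded)]


def total_power_budget(slots) -> float:
    """Sum the declared maxima over a list of (pin1, pin2) readings."""
    limits = (board_power_limit(p1, p2) for p1, p2 in slots)
    return sum(limit for limit in limits if limit is not None)


print(total_power_budget([(True, True), (False, False), (True, False)]))
```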

Host to PCI bridge

The PCI designers used a bridge to connect the PCI bus to the processor bus. This is the reason why PCI is not a local bus. The advantage is that the bus design can be independent of that of the processor. To interface a PCI bus to a new type of processor requires only a new bridge chip.

The benefits of this are that systems using non-Intel processors can also use the PCI bus and are able to take advantage of peripheral add-ins designed for the PC market.

On the PCI bus, devices are described as initiators or targets. Initiators are devices that can initiate a bus transaction, such as the host to PCI bridge and intelligent I/O boards called bus masters. Some devices may only be targets: they can speak only when they are spoken to.

Another key feature of PCI is that all data transfers are burst transfers. This means that data is sent in chunks of one, two, four or eight bytes (according to the highest common capability of the devices and the width of the data bus), one chunk per bus cycle. This is the fastest method of transferring data, and contrasts with the non-burst modes used in older PC bus designs, where data is transferred using a sequence of alternating address and data cycles.

The PCI bus design places no limits on the length of burst transfers. This is a big improvement over the VL-Bus which, due to the design of the 486 processor, was limited to a maximum burst of 4 cycles. Most real-world data transfers are of blocks longer than 4 x 32 bits, so the overhead associated with each burst transfer will affect the overall transfer rate. Typically the VL-Bus could manage only 40 to 50MB/sec. In fact, early PCI systems were only capable of similar performance due to implementation restrictions. Later PCI systems were able to deliver sustained transfer rates of over 100MB/sec.
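The effect of per-burst overhead on throughput can be estimated with a simple model. This is a rough sketch, not a measurement: it assumes each burst costs a fixed number of non-data cycles (address phase, turnaround, wait states), and the overhead value of 4 used below is a hypothetical figure chosen only to show the trend.

```python
def effective_rate_mb_per_s(peak_mb_per_s: float, data_cycles: int,
                            overhead_cycles: int) -> float:
    """Scale the peak rate by the fraction of cycles that carry data."""
    return peak_mb_per_s * data_cycles / (data_cycles + overhead_cycles)


PEAK = 132.0   # 33MHz x 32-bit, from the figures given earlier
OVERHEAD = 4   # hypothetical overhead cycles per burst

for burst_length in (1, 4, 16, 64):
    rate = effective_rate_mb_per_s(PEAK, burst_length, OVERHEAD)
    print(f"burst of {burst_length:2d} cycles: {rate:.1f}MB/sec")
```

Under this model a bus capped at 4-cycle bursts can never use more than half of its peak rate, whereas longer bursts approach the peak.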

A PCI implementation may employ a variety of techniques to improve performance. For example, bridges may include a posted-write buffer which allows a bus master to post memory writes to the bridge at burst speed and not merely the speed of the target device. To ensure data consistency, the bridge will not permit a read to take place until all posted writes have been flushed to their destination addresses.
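The flush-before-read rule can be shown with a toy model. This is a sketch under simplifying assumptions: the class and attribute names are invented, the target is modelled as a dictionary, and timing is ignored entirely.

```python
from collections import deque


class PostedWriteBridge:
    """Toy bridge: writes are queued, reads force the queue to drain first."""

    def __init__(self):
        self.posted = deque()    # writes accepted but not yet delivered
        self.target_memory = {}  # stands in for the slower target device

    def write(self, address: int, value: int) -> None:
        """Accept the write at once; the master does not wait for the target."""
        self.posted.append((address, value))

    def flush(self) -> None:
        """Deliver posted writes to the target in the order they were posted."""
        while self.posted:
            address, value = self.posted.popleft()
            self.target_memory[address] = value

    def read(self, address: int):
        """Drain all posted writes first so the read cannot see stale data."""
        self.flush()
        return self.target_memory.get(address)


bridge = PostedWriteBridge()
bridge.write(0x1000, 1)
bridge.write(0x1000, 2)
print(bridge.read(0x1000))
```

Without the flush in `read`, the second write could still be sitting in the buffer when the read reaches the target.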

The bus may also combine separate memory writes of 8- or 16-bit values into single 32-bit memory transactions to optimise bus and memory performance. The PCI specification states that data must be written to the target in the original order, before it was combined. It also recommends that this feature, if present, should be capable of being disabled in case it causes problems. I/O writes are not combined in this fashion.

The integrity of data on the PCI bus is checked using a single parity bit which protects the 32 address/data lines and four Command/Byte Enable signals. A further parity bit protects the additional lines of the 64-bit extension where present.
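A single parity bit over 36 lines can be computed as below. The article does not say whether parity is even or odd; this sketch assumes even parity (the parity bit is chosen so that the total number of 1s across the 36 lines plus the parity bit is even), which is my understanding of the PCI specification. The function names are illustrative.

```python
def pci_parity_bit(ad: int, cbe: int) -> int:
    """Even-parity bit over 32 address/data bits and 4 C/BE bits (assumed)."""
    lines = (ad & 0xFFFFFFFF) | ((cbe & 0xF) << 32)
    return bin(lines).count("1") & 1


def parity_ok(ad: int, cbe: int, par: int) -> bool:
    """Check received lines against the received parity bit."""
    return pci_parity_bit(ad, cbe) == par


ad, cbe = 0x00000003, 0x1          # three 1 bits in total
par = pci_parity_bit(ad, cbe)      # so the parity bit must be 1
print(par, parity_ok(ad, cbe, par), parity_ok(ad ^ 0x4, cbe, par))
```

A single parity bit detects any odd number of flipped bits but cannot detect an even number, and it cannot correct anything.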

Bus transactions

Let's look at what happens during a PCI data transfer or bus transaction. First, the initiating device has to get permission to have control of the bus. This is determined during the process of bus arbitration. A function called the arbiter, which is part of the PCI chip set, decides which device is allowed to initiate a transaction next. The arbiter uses an algorithm designed to avoid deadlocks and prevent one or more devices from monopolising the bus to the exclusion of others.

Having gained control of the bus, an initiator then places the target address and a code representing the transfer type on the bus. Other PCI devices determine, by decoding the address and the command type information, whether they are the intended target for the transfer. The target device claims the transaction by asserting a device select signal.

Once the target has sent its acknowledgement, the bus transaction enters the data phase. During this phase the data is transferred. The transfer can be terminated either by the initiator, when the transfer is completed or when its permission to use the bus is withdrawn by the arbiter, or by the target if it is unable to accept any more data for the time being. If the latter, the transfer must be restarted as a separate transaction. One of the rules of PCI protocol is that a target must terminate a transaction and release the bus if it is unable to process any more data, so a slow target device cannot hog the bus and prevent others from using it.

Note that although all PCI data transfers are burst transfers, a device does not have to be able to accept long bursts of data. A target device can terminate the data phase after one cycle if it wants to. Such behaviour would be perfectly acceptable in a non-performance-critical device. Even high-performance devices may have to terminate a burst, since their data buffers will be of finite size and if they cannot process the data as quickly as it is sent these buffers will eventually fill up.

Non-PCI devices

This description of a PCI bus transaction assumes that both initiating and target devices are PCI devices. However, even today most PCs require an ISA or other expansion bus in order to be able to install legacy peripherals. This is achieved using a PCI to expansion bus bridge. In this configuration, the PCI bus is the primary bus, and the legacy bus is the secondary bus.

If an initiator begins a transaction for a device that is on a secondary expansion bus, no PCI device will acknowledge that it is the target. One of two things could happen next. The PCI to expansion bus bridge could claim the transaction on behalf of its own peripherals; however, this would require that the bridge be programmed with the addresses of all the devices on the other side of it. This is a possibility in the case of MCA, EISA and plug-and-play ISA boards. However, ordinary ISA boards are not plug-and-play so a PCI to ISA bridge can have no knowledge of what memory and I/O addresses are on the ISA bus.

The method normally used to handle transactions destined for an ISA expansion bus is to use a process of subtractive decoding, or "if nobody else wants it, it must be for me." The expansion bus bridge claims the transaction if it is for a memory address in the first 16MB of address space or an I/O port address in the first 64KB, and no PCI device has claimed the transaction within a set delay period. The delay period depends on the speed of the PCI device address decoders, which can take from one to three clock cycles to respond with an acknowledgement.

The speed, and hence the length of the delay, is determined during the power on configuration process. The presence of even one slow device will require a delay of four bus clock cycles for every ISA bus transfer. This will degrade the performance of peripherals on the ISA expansion bus.
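The subtractive decoding rule described above reduces to a short predicate. This is a minimal sketch: the function and constant names are invented, the delay period is collapsed into a single flag saying whether any PCI device claimed the transaction in time, and only the address limits come from the text.

```python
ISA_MEMORY_LIMIT = 16 * 1024 * 1024  # first 16MB of memory address space
ISA_IO_LIMIT = 64 * 1024             # first 64KB of I/O port space


def isa_bridge_claims(space: str, address: int,
                      claimed_by_pci_device: bool) -> bool:
    """Return True if the PCI to ISA bridge should claim the transaction.

    `space` is "memory" or "io"; `claimed_by_pci_device` says whether any
    PCI target asserted its device select signal within the delay period.
    """
    if claimed_by_pci_device:
        return False  # somebody else wanted it
    if space == "memory":
        return address < ISA_MEMORY_LIMIT
    if space == "io":
        return address < ISA_IO_LIMIT
    return False


print(isa_bridge_claims("io", 0x3F8, claimed_by_pci_device=False))
print(isa_bridge_claims("memory", 0x01000000, claimed_by_pci_device=False))
```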

Bus arbitration

All PCI devices capable of initiating a data transfer are bus masters. This means that they can take control of the bus to perform a data transfer without requiring the assistance of the CPU. Reducing the need for the CPU to become involved in transferring large volumes of data has performance benefits.

Because there is usually more than one bus master in a PCI system, a method of arbitration is needed to resolve conflicts when two or more devices want to transfer data at the same time. This isn't as easy as it might sound. An arbiter has to handle all the possible situations that may occur between a group of communicating devices, as well as ensuring that bus access is granted fairly.

The main objective of arbitration is to ensure that all devices are given access to the bus when they need it. Too long a delay could harm performance or cause other problems. Every PCI bus master contains a configuration register which specifies its maximum latency: the time within which the device should be allowed to transfer its data. To reduce bus latency, PCI uses hidden arbitration. This means that arbitration can take place whilst another bus transaction is going on, so that the next device can begin transferring data the instant the bus is free.

When the arbiter grants a device access to the bus, the device's GNT# signal is asserted. The device then starts monitoring the state of other bus signals (FRAME# and IRDY#) to determine when the bus is free. Once the bus is free, and assuming the GNT# signal is still asserted, the device can begin its transaction.

The arbiter uses the maximum latency register to determine priority levels. If a bus master requests the bus after access has already been granted to a device with a higher maximum latency, then as long as the bus is still busy and the first device's transaction has not yet started, the arbiter can preempt the first device and award the GNT# signal to the one that needs it more urgently.
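The priority rule just described can be sketched as two small functions. This only illustrates the rule as the text states it and is not taken from any real chip set; the function names and the example device names are hypothetical, and a real arbiter would also need a fairness mechanism so that low-priority masters are not starved.

```python
def choose_grant(requests: dict):
    """Pick the requesting master with the smallest maximum latency.

    `requests` maps a device name to its maximum latency register value.
    Returns None when nobody is requesting the bus.
    """
    if not requests:
        return None
    return min(requests, key=lambda name: requests[name])


def may_preempt(granted_max_latency: int, new_max_latency: int,
                transaction_started: bool) -> bool:
    """A new request may take over the grant only before the granted
    device has started its transaction, and only if it is more urgent."""
    return (not transaction_started) and new_max_latency < granted_max_latency


print(choose_grant({"disk": 32, "network": 8, "video": 64}))
print(may_preempt(granted_max_latency=32, new_max_latency=8,
                  transaction_started=False))
```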

Whilst a bus transaction is in progress another mechanism ensures that bus masters cannot hog the bus and prevent other devices from getting access when they need it. Each bus master has another configuration register called the latency timer. This is set to the minimum number of cycles for which the device will be guaranteed access to the bus.

The latency timer register is decremented with every bus cycle. When the arbiter wants to allow another device to access the bus, it removes the GNT# signal from the active device. If the register is zero or less when this occurs, the device knows that it has had its guaranteed minimum period of bus access, and must complete the current data cycle and immediately relinquish control of the bus. If the register value is positive, the device may continue with its transfer but only until the register value reaches zero, when it must release the bus for the next device.
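The interaction between the latency timer and the removal of GNT# can be simulated cycle by cycle. This is a simplified model of the behaviour described above: the function and parameter names are made up, the counter is allowed to go below zero for clarity, and the cycle in which the master gives up counts as its last data cycle.

```python
def cycles_on_bus(latency_timer: int, gnt_removed_at, cycles_wanted: int) -> int:
    """Return how many bus cycles a master uses before releasing the bus.

    latency_timer  -- guaranteed minimum number of cycles
    gnt_removed_at -- cycle number at which the arbiter removes GNT#,
                      or None if the grant is never withdrawn
    cycles_wanted  -- cycles the master needs to finish its transfer
    """
    timer = latency_timer
    used = 0
    for cycle in range(1, cycles_wanted + 1):
        used = cycle
        timer -= 1  # decremented with every bus cycle
        gnt_removed = gnt_removed_at is not None and cycle >= gnt_removed_at
        if gnt_removed and timer <= 0:
            break   # guaranteed period is over: finish this cycle and release
    return used


print(cycles_on_bus(latency_timer=8, gnt_removed_at=3, cycles_wanted=20))
print(cycles_on_bus(latency_timer=4, gnt_removed_at=10, cycles_wanted=20))
print(cycles_on_bus(latency_timer=8, gnt_removed_at=None, cycles_wanted=20))
```

In the first call the grant is withdrawn early, so the master keeps the bus until its timer expires; in the second the timer has already expired, so it releases the bus as soon as the grant is withdrawn.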

Command types

Most often, we think of bus transactions in terms of blocks of data being transferred from one location to another. In fact, there are a number of different types of information that can be transferred across a bus. On the PCI bus, four signal lines called Command/Byte Enable are used to indicate the transaction type. Of the 16 possible values, 12 are currently defined. (During the data phase, these lines are used to show which of the bytes on the 32-bit bus contain valid data, hence the 'Byte Enable.')
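The dual use of the four Command/Byte Enable lines can be sketched as a lookup table plus a bit test. The article gives the command names but not their codes; the 4-bit encodings below, and the assumption that byte enables are active low, come from my understanding of the PCI specification rather than from the text, so treat them as something to verify.

```python
# Address phase: 4-bit command code -> transaction type (encodings assumed).
PCI_COMMANDS = {
    0b0000: "Interrupt Acknowledge",
    0b0001: "Special Cycle",
    0b0010: "I/O Read",
    0b0011: "I/O Write",
    0b0110: "Memory Read",
    0b0111: "Memory Write",
    0b1010: "Configuration Read",
    0b1011: "Configuration Write",
    0b1100: "Memory Read Multiple",
    0b1101: "Dual Address Cycle",
    0b1110: "Memory Read Line",
    0b1111: "Memory Write and Invalidate",
}


def decode_command(cbe: int) -> str:
    """Name the transaction type signalled during the address phase."""
    return PCI_COMMANDS.get(cbe & 0xF, "Reserved")


def valid_byte_lanes(cbe: int) -> list:
    """Data phase: list the byte lanes carrying valid data.

    Assumes active-low enables, so a 0 bit marks a valid lane.
    """
    return [lane for lane in range(4) if not (cbe >> lane) & 1]


print(len(PCI_COMMANDS), decode_command(0b0110), decode_command(0b0100))
print(valid_byte_lanes(0b1100))
```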

The I/O Read, I/O Write, Memory Read and Memory Write transactions should need little explanation. However, for memory transfers there are also commands called Memory Read Line, Memory Read Multiple and Memory Write and Invalidate. These commands convey additional information about how the data to be transferred relates to that held in cache memory, and so allow the cache controller to operate more efficiently.

The PCI bus has the capability to access memory targets in address space beyond 4GB, even when using a 32-bit PCI slot. The Dual-Address Cycle command indicates that a 64-bit address is being placed on the bus in two 32-bit halves.
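Splitting a 64-bit address into two 32-bit halves is simple bit arithmetic. The sketch below uses invented function names, and the order in which the halves appear on the bus (low half first) is an assumption, since the article does not state it.

```python
def needs_dual_address_cycle(address: int) -> bool:
    """True when the address lies beyond the 4GB reachable with 32 bits."""
    return address >= 1 << 32


def split_dual_address(address: int) -> tuple:
    """Return (low_half, high_half); low-half-first order is assumed."""
    low = address & 0xFFFFFFFF
    high = (address >> 32) & 0xFFFFFFFF
    return low, high


address = 0x0000000123456000
low, high = split_dual_address(address)
print(needs_dual_address_cycle(address), hex(low), hex(high))
```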

There is a Special Cycle command which is used to broadcast messages to devices on the PCI bus. In this command, the address has no validity but the first 16 bits of the data contain a message type and the remaining 16 bits can contain message-specific data. This command is mainly used to inform devices that the system is about to shut down.
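The 32-bit Special Cycle data word can be packed and unpacked as follows. This sketch assumes that "the first 16 bits" means the low-order 16 bits of the word; the function names and the example message values are hypothetical and do not correspond to any defined message code.

```python
def pack_special_cycle(message_type: int, message_data: int = 0) -> int:
    """Build the data word: message type in bits 15..0 (assumed),
    message-specific data in bits 31..16."""
    return ((message_data & 0xFFFF) << 16) | (message_type & 0xFFFF)


def unpack_special_cycle(word: int) -> tuple:
    """Return (message_type, message_data) from a 32-bit data word."""
    return word & 0xFFFF, (word >> 16) & 0xFFFF


EXAMPLE_TYPE = 0x0002   # hypothetical message type
EXAMPLE_DATA = 0xBEEF   # hypothetical message-specific data

word = pack_special_cycle(EXAMPLE_TYPE, EXAMPLE_DATA)
print(hex(word), unpack_special_cycle(word))
```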

The Interrupt Acknowledge command is used by the host to PCI bridge to obtain further information about an interrupt request from an interrupting PCI device. In a PC compatible system, a device requesting an interrupt does so by raising one of the interrupt request lines IRQ0 to IRQ15. The IRQ is converted by the programmable interrupt controller to a single signal to the processor, INTR. The processor responds to this signal by requesting the controller to supply an interrupt vector address: the address in memory of the software routine for handling the interrupt. In an ISA or VL-Bus system there is a direct connection between the CPU and the interrupt controller.

On a PCI system, the processor's interrupt vector request is passed to the host to PCI bridge. The bridge responds by obtaining control of the bus and initiating an interrupt acknowledge transaction. The PCI target containing the interrupt controller claims the transaction, and sends a signal emulating the interrupt vector request to the interrupt controller chip. The interrupt vector address is then placed by the controller on to the data bus. From there it is read by the host to PCI bridge, which then terminates the transaction and passes the vector to the processor.

Interrupt handling

The concept of 16 discrete IRQ lines, each uniquely assigned to a device, is peculiar to the ISA bus and its derivatives. The CPU sees only a single interrupt signal, obtains an interrupt vector address and then processes the interrupt routine at that address. The use of 16 lines was the method chosen by the designers of the original IBM PC to tell the interrupt controller which address to supply.

Each PCI slot has four interrupt lines connected to it, designated INTA# to INTD#. The first (or only) interrupt-using function on a PCI board must be connected to INTA#.

The other three lines allow up to four functions to be combined on one board using INTA# to INTD# in that order. The PCI interrupt lines and the output from the ISA interrupt controller are combined in a programmable interrupt router, which generates the single interrupt signal for the CPU. How they are combined is not defined by the PCI specification. PCI interrupts are level-triggered and therefore shareable, so some of them may be connected together.

The IBM PC architecture expects particular devices to use particular IRQs (e.g. the primary disk controller must use IRQ14). Furthermore, because ISA interrupt lines cannot be shared, PC interrupt routines expect that when they are called, they are servicing their specific device and no other.

This means that in a PC, the INTx# lines in each PCI slot (or those that are being used) must each be mapped to a separate IRQ which the operating system or driver software will expect the device in that slot to use. This is usually done using the BIOS Setup utility. Some early PCI systems which did not have this facility required an ISA 'paddle-board' to be used with add-ins like caching disk controllers to ensure they were connected to the appropriate IRQ.

Integrated (on-board) devices are hard-configured to use the appropriate interrupts. Were it not for the fact that specific devices must use specific IRQs, PCI configuration would be completely automatic as the interrupt level could be assigned by the system at start-up.

Configuration Space

PCI was designed as a plug-and-play, self-configuring system. In support of this it defines an area of addressable ROM and RAM called configuration space, which can be interrogated to obtain information about a device and written to in order to configure it.

Each PCI device has a block of 256 bytes of configuration space: 16 32-bit doublewords of header information plus 48 doublewords of device-specific configuration registers. The header contains a vendor ID and type of device code, flags which show whether the device generates interrupts, whether the device is 66MHz-capable and other low level performance-related information, the base address locations of I/O ports, RAM and expansion ROM, the maximum latency register (mentioned earlier) and other similar general information. ROMs can contain code for different processor architectures, and a configuration register shows which ones are supported.
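Several of the header fields listed above can be pulled out of a raw 64-byte header with `struct`. The article does not give field offsets; the ones used here (vendor ID at 00h, status at 06h with bit 5 as the 66MHz flag, class code at 0Bh, latency timer at 0Dh, base addresses from 10h, interrupt pin at 3Dh, maximum latency at 3Fh) are my understanding of the standard header layout and should be checked against the specification. The sample header bytes are invented.

```python
import struct

INTERRUPT_PINS = {0: None, 1: "INTA#", 2: "INTB#", 3: "INTC#", 4: "INTD#"}


def parse_config_header(header) -> dict:
    """Decode selected fields from the 64-byte (16-doubleword) header."""
    if len(header) < 64:
        raise ValueError("expected at least 64 bytes of header")
    vendor_id, device_id, _command, status = struct.unpack_from("<HHHH", header, 0x00)
    base_addresses = struct.unpack_from("<6I", header, 0x10)
    return {
        "vendor_id": vendor_id,
        "device_id": device_id,
        "base_class": header[0x0B],
        "capable_66mhz": bool(status & 0x0020),  # status bit 5 (assumed)
        "latency_timer": header[0x0D],
        "base_addresses": list(base_addresses),
        "interrupt_pin": INTERRUPT_PINS.get(header[0x3D]),
        "max_latency": header[0x3F],
    }


# Hypothetical header contents, made up for this example.
sample = bytearray(64)
struct.pack_into("<HHHH", sample, 0x00, 0x1234, 0x5678, 0x0007, 0x0020)
sample[0x0B] = 0x02   # base class code
sample[0x0D] = 32     # latency timer
sample[0x3D] = 1      # uses INTA#
sample[0x3F] = 8      # maximum latency
info = parse_config_header(sample)
print(info)
```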

Configuration space is completely separate from memory and I/O space, and can only be accessed using the PCI bus Configuration Read and Write commands. Intel x86 processors cannot access configuration space directly, so the PCI specification defines two methods by which this can be achieved. The preferred method, used by current implementations, is to write the target address to the 32-bit I/O port at 0CF8h, and then read or write the doubleword through I/O port 0CFCh. A second method, used by early PCI chip sets but now discouraged by the PCI specification, involves using I/O ports at 0CF8h and 0CFAh to map the configuration spaces of up to 16 PCI devices into the I/O range C000h to CFFFh, from where the data may be read or written.
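For the preferred method, the value written to the address port (0CF8h, with the data then accessed through 0CFCh) packs the bus, device, function and register numbers into one doubleword. The article does not give the bit layout; the one below (enable bit 31, bus in bits 23..16, device in bits 15..11, function in bits 10..8, doubleword-aligned register offset in bits 7..2) is my understanding of the specification. The sketch only builds the value and performs no port I/O; the constant and function names are invented.

```python
CONFIG_ADDRESS_PORT = 0x0CF8  # 32-bit address port
CONFIG_DATA_PORT = 0x0CFC     # 32-bit data port


def config_address(bus: int, device: int, function: int, register: int) -> int:
    """Build the doubleword to write to the address port (layout assumed)."""
    if not (0 <= bus < 256 and 0 <= device < 32
            and 0 <= function < 8 and 0 <= register < 256):
        raise ValueError("bus/device/function/register out of range")
    return (0x80000000            # enable bit
            | (bus << 16)
            | (device << 11)
            | (function << 8)
            | (register & 0xFC))  # registers are addressed as doublewords


print(hex(config_address(bus=1, device=2, function=3, register=0x10)))
```

The 8-bit bus field is what limits a system to the 256 buses mentioned earlier.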

The above information was given for interest only. The correct way for software such as device drivers or diagnostic programs to access a device's configuration space is using the functions provided in the PCI BIOS. The PCI BIOS function code is 0B1h. If a program calls interrupt 1Ah with the AX register set to 0B101h, the carry bit will be clear on return if the PCI BIOS is present, and the 32-bit EDX register will contain the ASCII characters "PCI " (20494350h). Register BX will contain the major and minor BIOS revision numbers. Register AL will be odd (bit 0 set) if the system supports the preferred configuration space addressing mechanism.
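The register values returned by the presence check can be interpreted as below. This sketch does not call the BIOS (that requires real-mode or BIOS32 access from system software); it only decodes values a caller has already obtained. The signature value 20494350h and the assumption that BX holds the version in binary-coded decimal, major in the high byte, come from my understanding of the PCI BIOS interface rather than from the article.

```python
PCI_SIGNATURE = 0x20494350  # bytes 'P', 'C', 'I', ' ' in little-endian order


def interpret_pci_bios_check(carry_flag: bool, edx: int, bx: int, al: int):
    """Return a summary dict, or None if no PCI BIOS was reported."""
    if carry_flag or edx != PCI_SIGNATURE:
        return None
    major = (bx >> 8) & 0xFF
    minor = bx & 0xFF
    return {
        "version": f"{major:x}.{minor:02x}",  # BCD digits read as hex (assumed)
        "preferred_mechanism": bool(al & 1),
    }


print(PCI_SIGNATURE.to_bytes(4, "little"))
print(interpret_pci_bios_check(False, 0x20494350, 0x0210, 0x01))
```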

Using other subfunctions of BIOS function 0B1h, programs can search for a device and obtain its location, find devices of a particular class, read and write to configuration space, generate a PCI bus special cycle, discover how PCI interrupts have been assigned to IRQ lines, and set a PCI device's interrupt to a particular IRQ. Normally, of course, these functions would only be carried out by system software.