- 1 PCI subsystem
- 2 Overview of the PCI bus
- 3 BAR assignment in Linux
- 4 PCI startup sequence
- 5 PCI driver interface
- 6 Setting up the host-PCI controller
- 7 Board-specific functions and variables
- 8 Very broken PCI devices ...
- 9 PCI fixups
- 10 Legacy devices
- 11 Multiple PCI busses
- 12 External links
The PCI subsystem is perhaps the most complex code you have to deal with during the porting process. The key to making the PCI subsystem work properly is a good understanding of the PCI bus itself, the code layout, and the execution flow in Linux. As with many other parts of porting, you will find that in the end the actual code written is minimal.
Overview of the PCI bus
Here we summarize some facts about the PCI bus:
- The PCI bus has three separate address spaces: config, I/O, and memory space.
- Every PCI device responds to config commands, and it can respond to I/O accesses and/or memory accesses.
- At boot time, the BIOS or the OS sets the base address registers (BARs) through configuration space. BARs determine the address ranges in I/O or memory space that a device should respond to. These ranges must not overlap any other range in the I/O space or memory space of the same PCI bus.
- Multiple PCI buses can be connected through PCI-PCI bridges.
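The BAR facts above can be illustrated with a small user-space sketch. It assumes plain 32-bit BARs whose upper address bits are all implemented, and it follows the usual PCI convention that bit 0 distinguishes I/O from memory and that a BAR is sized by writing all ones and reading the value back. The helper names are hypothetical and do not come from the kernel.

```c
#include <stdint.h>

/* Bit 0 of a BAR selects the address space: 1 = I/O, 0 = memory. */
static int bar_is_io(uint32_t bar)
{
    return (bar & 0x1u) != 0;
}

/* Strip the low flag bits to get the base address the device decodes. */
static uint32_t bar_base(uint32_t bar)
{
    return bar_is_io(bar) ? (bar & ~0x3u) : (bar & ~0xFu);
}

/*
 * Size of the region behind a 32-bit BAR, given the value read back after
 * writing 0xffffffff to it.  Returns 0 if no address bit is writable,
 * i.e. the BAR is not implemented.
 * ASSUMPTION: all upper address bits are implemented by the device.
 */
static uint32_t bar_size(uint32_t probe)
{
    uint32_t mask = bar_base(probe);    /* keep the address bits only */

    if (mask == 0)
        return 0;
    return ~mask + 1u;                  /* lowest writable bit = size */
}

/* Do the ranges [a, a + alen) and [b, b + blen) overlap? */
static int ranges_overlap(uint32_t a, uint32_t alen, uint32_t b, uint32_t blen)
{
    /* 64-bit arithmetic avoids wrap-around near the top of the space. */
    return (uint64_t)a < (uint64_t)b + blen &&
           (uint64_t)b < (uint64_t)a + alen;
}
```

A check like ranges_overlap() is what the "no duplicated ranges" rule amounts to when you assign BARs by hand.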
BAR assignment in Linux
On all IBM PC-compatible machines, BARs are assigned by the BIOS. Linux simply scans through the buses and records the BAR values.
Some MIPS boards adopt a similar approach, where BARs are assigned by firmware. However, the quality of BAR assignment by firmware varies quite a bit. Some firmware simply assigns BARs to on-board PCI devices and ignores all add-on PCI cards. In that case, Linux cannot rely solely on the firmware's assignment.
There is another issue with depending on the firmware assignment: you need to stick with the address range set up by the firmware. In other words, if the firmware assigns PCI memory space from 0x10000000 to 0x14000000, you cannot easily move it to a different address range in Linux.
There are three possible ways to fix this:
- The first way is to fix the BAR assignment manually in your board setup routine. This only works if your board does not have a PCI slot to plug in an arbitrary PCI card. You need to carefully examine the existing PCI resource assignment done by firmware so that you do not assign overlapping address ranges.
- The second way is to do a complete PCI resource assignment *before* Linux starts PCI bus scanning. In other words, we discard any PCI resource assignment done in firmware, if there is any, and do a new assignment ourselves. This approach gives us complete control over the address range and resource allocation. With the CONFIG_PCI_AUTO option used in 'arch/mips/config-shared.in' and the 'arch/mips/kernel/pci_auto.c' file, this turns out to be quite easy to do. This approach is the focus of this chapter.
- The third way is to call the 'pci_assign_unassigned_resources()' function, which is defined in the 'drivers/pci/setup-bus.c' file in recent 2.4.x kernels, *after* Linux completes the PCI bus scan. With earlier versions of this function, Linux assigns resources only to PCI devices whose BARs have *not* been properly assigned. With recent versions (which do "optimal" resource assignment based on sizes), this PCI routine apparently does a complete resource re-assignment. In other words, it does almost exactly the same as the 'pci_auto.c' file does.
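To make the idea of a complete resource assignment more concrete, here is a minimal user-space sketch of the core step: handing out naturally aligned address ranges from a fixed window. It is a simplified model under stated assumptions, not the code in 'pci_auto.c'. The type and function names are hypothetical.

```c
#include <stdint.h>

/* One address window (PCI I/O or PCI memory) to hand out BAR values from. */
struct pci_window {
    uint32_t next;  /* first address not yet handed out */
    uint32_t end;   /* one past the last usable address */
};

/*
 * Carve a naturally aligned region of 'size' bytes out of the window.
 * BAR sizes are powers of two, so anything else is rejected.
 * Returns 0 and stores the base in *base, or -1 if the request is
 * invalid or the window is exhausted.
 */
static int window_alloc(struct pci_window *w, uint32_t size, uint32_t *base)
{
    uint64_t start, limit;

    if (size == 0 || (size & (size - 1)) != 0)
        return -1;

    /* Round the next free address up to a multiple of the size. */
    start = ((uint64_t)w->next + size - 1) & ~((uint64_t)size - 1);
    limit = start + size;
    if (limit > w->end)
        return -1;

    *base = (uint32_t)start;
    w->next = (uint32_t)limit;
    return 0;
}
```

With the window from the example above (0x10000000 to 0x14000000), a 4 KB BAR followed by a 1 MB BAR would land at 0x10000000 and 0x10100000 respectively. Allocating in this order wastes the gap between them, which is why the "optimal" assignment mentioned above sorts requests by size first.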
[DEBATE] The 'pci_auto' and 'assign_unassigned_resource' approaches have their own advantages and disadvantages. Ideally, the whole PCI subsystem should be completely re-written so that several things can be taken into consideration which are not currently addressed:
- A notion of a host-PCI controller, and supporting multiple host-PCI controllers.
- A kernel-independent abstraction layer to access configuration space, distinguishing Type 0 and Type 1 configuration accesses. This removes the need for 'pci_dev' in the lowest-level PCI routines.
- Pass 1 scan to do bus number assignment and record bus topology through the top-level host-PCI controller structure.
- Pass 2 scan to discover all other PCI devices and assign the resources along the way.
- If we want to do optimal resource assignment, we need to do the resource assignment in Pass 3 instead of Pass 2.
- A complete PCI device list is built during the above passes.
- The address ranges for PCI memory space and I/O space are set when host-PCI controller structures are initially created.
PCI startup sequence
1. do_basic_setup() calls pci_init(), which is defined in 'drivers/pci/pci.c'.
2. pci_init() first calls pcibios_init(). If you enable CONFIG_NEW_PCI, pcibios_init() is implemented in the 'arch/mips/kernel/pci.c' file. Otherwise you need to provide it in your own board-dependent code.
3. Optionally, pcibios_init() may call pciauto_assign_resources() to do a complete PCI resource assignment.
4. Somewhere inside pcibios_init(), pci_scan_bus() is called. If a machine has multiple host-PCI controllers, pci_scan_bus() should be called for each of the top-level PCI buses. Apparently, bus numbers should already have been set up before pci_scan_bus() can properly run.
5. Optionally, after pci_scan_bus() is called, pcibios_init() may choose to call pci_assign_unassigned_resources() to do a complete PCI resource assignment.
6. pcibios_init() will do some more fixups (resources, IRQs, etc.).
7. Returning from pcibios_init(), pci_init() will do a final round of device-based fixups.
In this chapter, we focus on the approach where both CONFIG_NEW_PCI and CONFIG_PCI_AUTO are enabled.
PCI driver interface
All of this PCI work is ultimately meant to set up a structure in which all PCI device drivers can run happily. Knowing how PCI device drivers access PCI resources can greatly help you understand how you should do PCI initialization and setup.
- Port I/O macros: defined in 'include/asm-mips/io.h'. Drivers use these macros to read and write PCI IO space. The address arguments are addresses in PCI IO space, and should correspond to one of the BAR values of the device. On MIPS machines, we assume PCI IO space is mapped into a contiguous physical address block. The base address of the block is mips_io_port_base, which you need to set up at the beginning of board setup time. The proper way to set it up is to call set_io_port_base().
- Memory-mapped I/O macros: defined in the 'include/asm-mips/io.h' file. Drivers use them to access PCI memory space. On MIPS machines, we assume PCI memory space is 1:1 mapped into a block of physical address space. Therefore those macros are equivalent to direct physical memory access.
- pci_read_config_word() and friends: defined in 'include/linux/pci.h' (through PCI_OP()). Drivers use them to read or write the configuration registers of devices. The low-level routines are abstracted as struct pci_ops, and each board must supply one.
- pci_map_single() and friends Defined in 'include/asm-mips/pci.h'. Drivers use these macros to map a virtual address to a bus address (so you can tell the device to do DMA). They don't usually affect PCI porting.
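The port I/O convention described above (a port number is an offset from the base of the I/O window) can be modelled in user space as follows. This is only a sketch of the addressing idea: the sim_ names are hypothetical, and the real macros in 'include/asm-mips/io.h' differ in detail.

```c
#include <stdint.h>

/* Stands in for mips_io_port_base: where the CPU sees PCI IO port 0. */
static uintptr_t sim_io_port_base;

/* Stands in for set_io_port_base(), called once during board setup. */
static void sim_set_io_port_base(uintptr_t base)
{
    sim_io_port_base = base;
}

/* Write one byte to a port: the port number is an offset from the base. */
static void sim_outb(uint8_t val, unsigned long port)
{
    *(volatile uint8_t *)(sim_io_port_base + port) = val;
}

/* Read one byte from a port. */
static uint8_t sim_inb(unsigned long port)
{
    return *(volatile uint8_t *)(sim_io_port_base + port);
}
```

In a test harness the base can simply point at an ordinary byte array standing in for the I/O window. This shows why a wrong base value makes every port access land at the wrong address.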
Setting up the host-PCI controller
As we can see from the above discussion, we need to set up the host-PCI controller such that:
1. It has a 1:1 mapping between PCI memory space and CPU physical address space.
2. It maps the beginning part of PCI IO space into an address block in physical address space.
The host-PCI controller usually allows you to map PCI memory space into a window in physical address space. Let us say the base address is ba_mem and the size is sz_mem. It usually also allows you to translate that address into another one with a fixed offset, say off. (Note that off can be either positive or negative.) So if a driver accesses the address ba_mem+x (0 <= x < sz_mem), the host-PCI controller will intercept the command and translate it into a PCI memory access at address ba_mem+off+x.
Maintaining the 1:1 mapping implies that we must set up the PCI addressing such that off is 0. Also note that with this setup, we cannot access the PCI memory ranges [0x0, ba_mem] and [ba_mem + sz_mem, 0xffffffff].
Additionally, we must make system RAM visible on the PCI memory bus at address 0x0 (assuming that is its address in physical address space) in order for PCI devices to do DMA transfers.
The beginning part of PCI IO space is usually mapped into another window in physical address space, say [ba_io, ba_io + sz_io]. In other words, a range [0, sz_io] in PCI IO space corresponds to the range [ba_io, ba_io + sz_io]. Obviously, mips_io_port_base should be set to ba_io.
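The two window translations described above can be written down as a short user-space sketch. The symbols ba_mem, sz_mem, off and ba_io are the ones used in the text. The struct and function names are hypothetical, and the concrete addresses in the usage note are made-up example values.

```c
#include <stdint.h>

/* The PCI memory window as programmed into the host-PCI controller. */
struct mem_window {
    uint32_t ba_mem;  /* base of the window in CPU physical space */
    uint32_t sz_mem;  /* size of the window */
    int64_t  off;     /* fixed translation offset; 0 gives a 1:1 mapping */
};

/*
 * Translate a CPU physical address into the PCI memory address the
 * controller would generate.  Returns -1 if the address is outside the
 * window and therefore never reaches the PCI bus.
 */
static int cpu_to_pci_mem(const struct mem_window *w, uint32_t cpu_addr,
                          uint32_t *pci_addr)
{
    if (cpu_addr < w->ba_mem || cpu_addr - w->ba_mem >= w->sz_mem)
        return -1;
    *pci_addr = (uint32_t)((int64_t)cpu_addr + w->off);
    return 0;
}

/* PCI IO port 'port' appears at physical address ba_io + port. */
static uint32_t io_port_to_phys(uint32_t ba_io, uint32_t port)
{
    return ba_io + port;
}
```

For example, with ba_mem = 0x10000000, sz_mem = 0x04000000 and off = 0, CPU address 0x10001000 maps to PCI memory address 0x10001000. With off = -0x10000000 the same access would reach PCI address 0x1000 instead, and the 1:1 assumption made by the driver macros would no longer hold.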
The above setup is typically done in the board-specific setup routine (i.e., <board>_setup()). You typically set up ioport_resource and iomem_resource there as well:
ioport_resource.start = 0x0;
ioport_resource.end = sz_io;
iomem_resource.start = 0x0;
iomem_resource.end = sz_mem;
These variables are the roots of all IO and memory resources (roughly corresponding to the ancient ISA IO space and ISA memory space). For simplicity you can also set the end to be 0xffffffff.
Board-specific functions and variables
Here is a list of board-specific functions you must implement. Again, I assume this board has CONFIG_NEW_PCI and CONFIG_PCI_AUTO options enabled.
- struct pci_ops my_pci_ops
You implement six functions to fill in this structure, which is needed by pci_scan_bus() and pciauto_assign_resources(). Note that you need to distinguish between Type 0 and Type 1 configuration accesses in those functions. You can typically do that by checking whether the bus's parent (dev->bus->parent) is NULL.
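The Type 0 versus Type 1 decision can be sketched in user space as below. The bit layout follows the PCI configuration address format (register in bits 7:2, function in bits 10:8, and for Type 1 the device in bits 15:11 and the bus in bits 23:16). Two things are assumptions for illustration only: the stand-in struct sim_bus, and the wiring of device n's IDSEL to address line 11 + n, which is board-specific. How the resulting word is handed to the host-PCI controller is also controller-specific and not shown.

```c
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-in for the kernel's bus structure: only what we need. */
struct sim_bus {
    struct sim_bus *parent;  /* NULL for a top-level bus */
    uint8_t number;          /* bus number */
};

/* Type 1 address: forwarded by bridges to a bus further down the tree. */
static uint32_t cfg_addr_type1(uint8_t bus, uint8_t dev, uint8_t fn,
                               uint8_t reg)
{
    return ((uint32_t)bus << 16) |
           ((uint32_t)(dev & 0x1f) << 11) |
           ((uint32_t)(fn & 0x07) << 8) |
           (uint32_t)(reg & 0xfc) | 0x1u;
}

/*
 * Type 0 address: targets a device on the local bus via its IDSEL line.
 * ASSUMPTION: IDSEL of device n is wired to address line 11 + n.
 * Returns 0 if the device number has no address line left.
 */
static uint32_t cfg_addr_type0(uint8_t dev, uint8_t fn, uint8_t reg)
{
    if (dev > 20)
        return 0;
    return ((uint32_t)1 << (11 + dev)) |
           ((uint32_t)(fn & 0x07) << 8) |
           (uint32_t)(reg & 0xfc);
}

/* Pick the access type the way the text suggests: by the bus's parent. */
static uint32_t cfg_addr(const struct sim_bus *bus, uint8_t dev, uint8_t fn,
                         uint8_t reg)
{
    if (bus->parent == NULL)
        return cfg_addr_type0(dev, fn, reg);
    return cfg_addr_type1(bus->number, dev, fn, reg);
}
```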
You need to define this array in order to use pciauto resource assignment. For each top-level PCI bus, you need to supply one element of this array. The array ends with an all-NULL element. Each element is a structure that consists of a pci_ops, pci_io_resource, and pci_mem_resource, and usually represents a top-level PCI bus connected to the CPU. pci_ops defines the functions that access the PCI bus's config space. pci_io_resource and pci_mem_resource specify the address ranges that pciauto will use to assign to the BARs of PCI devices. pci_io_resource starts at 0x0 (or 0x1000 to leave some room for legacy ISA devices) and ends at sz_io. pci_mem_resource starts at ba_mem and ends at ba_mem + sz_mem. Note that these addresses are in PCI IO and PCI memory space respectively. However, since we maintain a 1:1 mapping between PCI memory space and CPU physical address space, pci_mem_resource also represents the PCI memory space window in CPU physical address space.
This routine is passed to pci_scan_bus() and invoked right after a PCI device is discovered. This is the place where you can do device-specific fixups (BARs, pci_dev structure, etc.). Note that you can do the same fixups in pcibios_fixup(). I recommend leaving this function empty unless you have a specific need that requires an immediate fixup.
This function is invoked after pci_scan_bus() is done, i.e., after all PCI bridges and devices have been discovered by Linux. Here you can enumerate the PCI devices and do device-based fixups, or you can do bus- or controller-related fixups.
This is the place to fix up PCI-related IRQs. It is invoked after pci_scan_bus() is done, right after pcibios_fixup(). A typical strategy is to assign the IRQ based on the slot number, and possibly the bus number if there is more than one top-level bus in the system. Note that you can do the fixup in pcibios_fixup() as well and leave this function empty.
Return 1 to indicate that all bus numbers have been assigned by pciauto. [TODO: Should this function be included in pciauto by default?] [TODO: the 'drivers/pci/pci.c' file seems to have a typo when it calls this function. The logic is inverted.]
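The slot-based IRQ strategy mentioned above for the IRQ fixup can be sketched as a simple lookup. The slot extraction matches the layout of the kernel's PCI_SLOT() macro (device number in bits 7:3 of devfn). The slot-to-IRQ table itself is entirely hypothetical and must come from your board's schematics.

```c
/* Same bit layout as the kernel's PCI_SLOT(): device number in bits 7:3. */
#define SIM_PCI_SLOT(devfn) (((devfn) >> 3) & 0x1f)

/*
 * Return the IRQ wired to a given bus/devfn, or -1 if none is known.
 * HYPOTHETICAL wiring: slots 1..3 on bus 0 use IRQs 10..12.
 */
static int board_irq_for(unsigned int bus, unsigned int devfn)
{
    if (bus != 0)
        return -1;  /* this example board routes IRQs on bus 0 only */

    switch (SIM_PCI_SLOT(devfn)) {
    case 1:
        return 10;
    case 2:
        return 11;
    case 3:
        return 12;
    default:
        return -1;
    }
}
```

In a real fixup routine you would walk the list of discovered devices and store the result for each of them. All functions of one slot share the lookup result here, which is a simplification.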
Very broken PCI devices ...
Some ill-behaved PCI devices may spoil the party. The best way to deal with them is to single them out in the PCI configuration routines (in pci_ops). Inside those routines you check for the slot number and function number corresponding to the bad devices and return some null values. Very few devices are broken to the point where this is needed; the SGI IOC3 is one because it doesn't fully decode the PCI config space, in violation of the PCI spec.
PCI fixups
For less severe cases Linux offers something called fixups. Fixups are meant to deal with issues like broken PCI config headers and similar. There are three types of fixups.
Header fixups are called after the PCI code has read the headers from the PCI device's config space. This type of fixup can be used, for example, when headers don't identify the device class or if a device is likely to be badly misconfigured after reset. Be careful not to touch anything other than the config space at this time; the PCI bus is still being configured!
Final fixups are called just before device drivers are initialized. Why not simply put those fixups into the driver code, you may ask? Remember that drivers in Linux can be modules, and even if they are compiled in they are treated pretty much the same as modules. So a final fixup is, for example, a good way to ensure that a device is a good citizen on the bus.
These are being called by the PCI code when the PCI device is being enabled by calling
drivers/pci/quirks.c contains a large number of examples for the use of PCI fixups. This file only contains fixups for devices that are being used on multiple architectures. If a device is only in use on a single architecture or even on a single board then other places outside drivers/pci might be preferable.
Legacy devices
If you have multiple primary PCI busses make sure to have the PCI bus with the legacy devices as the first bus. Linux isn't very good at dealing with legacy devices on multiple PCI busses, to say the least.
Multiple PCI busses
If you have multiple top-level PCI buses, it is tricky to do PCI IO assignment. Assume you have two PCI buses and their IO spaces are mapped into [ba_io1, ba_io1 + sz_io1] and [ba_io2, ba_io2+sz_io2] (ba_io1 + sz_io1 <= ba_io2). You need to
- Set mips_io_port_base to be ba_io1
- Set pci_io_resource to be [0x0, sz_io1] for the first PCI bus
- Set pci_io_resource to be [ba_io2 - ba_io1, ba_io2 - ba_io1 + sz_io2] for the 2nd PCI bus. In addition, the legacy ISA devices on the second PCI bus cannot be used without modifying their drivers.
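The arithmetic in the list above can be captured in a few lines. This sketch only restates the rule that each bus's IO range is expressed relative to the first bus's window base, since that is what mips_io_port_base points at. The struct and function names are hypothetical, and the addresses in the usage note are made-up example values.

```c
#include <stdint.h>

/* An IO range in PCI IO space, using the text's [start, end] notation. */
struct io_range {
    uint32_t start;
    uint32_t end;
};

/*
 * Compute the pci_io_resource range for a bus whose IO window sits at
 * physical address ba_io with size sz_io, given that mips_io_port_base
 * is set to ba_io_first (the window base of the first bus).
 */
static struct io_range bus_io_range(uint32_t ba_io_first, uint32_t ba_io,
                                    uint32_t sz_io)
{
    struct io_range r;

    r.start = ba_io - ba_io_first;
    r.end = r.start + sz_io;
    return r;
}
```

For instance, with ba_io1 = 0x18000000, ba_io2 = 0x1a000000 and both sizes 0x10000, the first bus gets [0x0, 0x10000] and the second gets [0x02000000, 0x02010000]. The second range starts far above the legacy port numbers, which is why unmodified ISA-style drivers cannot reach devices on that bus.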