PCI Subsystem

From LinuxMIPS
Revision as of 22:45, 8 November 2004 by Ralf

Board support - PCI subsystem

The PCI subsystem is perhaps the most complex code you have to deal with during the porting process. The key to making it work properly is a good understanding of the PCI bus itself, the code layout, and the execution flow in Linux. As with many other parts of porting, you will find in the end that the amount of code you actually write is minimal.


Pete Popov wrote a fine document on PCI and Linux/MIPS. It is under 'Documentation/mips/pci/pci.README'. It is highly recommended reading.

For those who want to know more about the PCI bus itself, I recommend the book PCI System Architecture published by MindShare Inc.

Overview of the PCI bus

Here we summarize some facts of the PCI bus:

  • The PCI bus has three separate address spaces: configuration, I/O, and memory.
  • Every PCI device responds to config commands, and it can respond to I/O accesses and/or memory accesses.
  • At boot time, the BIOS or the OS sets the base address registers (BARs) through configuration space. BARs determine the address ranges in I/O or memory space that a device responds to. Those ranges must not overlap with any other range in the same I/O or memory space on the same PCI bus.
  • Multiple PCI buses can be connected through PCI-PCI bridges.
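BAR sizes are discovered through configuration space as well. The usual sizing trick is to save the BAR, write all ones to it, read it back, and mask off the low flag bits. The two's complement of the result is the size. The sketch below models a 32-bit memory BAR in plain userspace C. `struct fake_bar`, `bar_write()` and `bar_read()` are hypothetical stand-ins for real hardware and real config accessors, not a kernel API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one 32-bit memory BAR: the device hard-wires the
 * address bits below log2(size) to zero, so only the upper bits are writable.
 * size == 0 models an unimplemented BAR. */
struct fake_bar {
    uint32_t size;   /* power of two */
    uint32_t value;  /* current register contents */
};

static void bar_write(struct fake_bar *bar, uint32_t v)
{
    /* Non-writable low bits read back as zero (real flag bits ignored here). */
    bar->value = v & ~(bar->size - 1);
}

static uint32_t bar_read(const struct fake_bar *bar)
{
    return bar->value;
}

/* Size a memory BAR: save it, write all ones, read back, restore. */
static uint32_t bar_size(struct fake_bar *bar)
{
    uint32_t saved = bar_read(bar);
    bar_write(bar, 0xffffffffu);
    uint32_t mask = bar_read(bar) & ~0xfu;  /* strip memory-BAR flag bits 3:0 */
    bar_write(bar, saved);
    if (mask == 0)
        return 0;                           /* BAR not implemented */
    uint32_t size = ~mask + 1;              /* two's complement = size */
    assert((size & (size - 1)) == 0);       /* BAR sizes are powers of two */
    return size;
}
```

Restoring the saved value matters on a real bus. Otherwise the device would be left decoding an all-ones address until the BAR is assigned.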

BAR assignment in Linux

On all IBM PC-compatible machines, BARs are assigned by the BIOS. Linux simply scans through the buses and records the BAR values.

Some MIPS boards take a similar approach, where BARs are assigned by firmware. However, the quality of firmware BAR assignment varies quite a bit. Some firmware assigns BARs only to on-board PCI devices and ignores all add-on PCI cards. In that case, Linux cannot rely solely on the firmware's assignment.

Depending on the firmware assignment has another drawback: you are stuck with the address ranges set up by the firmware. For example, if the firmware assigns PCI memory space from 0x10000000 to 0x14000000, you cannot easily move it somewhere else in Linux.

There are three possible ways to fix this:

  • The first way is to fix the BAR assignment manually in your board setup routine. This only works if your board does not have a PCI slot to plug in an arbitrary PCI card. You need to carefully examine the existing PCI resource assignment done by firmware so that you do not assign overlapping address ranges.
  • The second way is to do a complete PCI resource assignment *before* Linux starts scanning the PCI bus. In other words, we discard any PCI resource assignment done by the firmware and do a new assignment ourselves. This approach gives us complete control over address ranges and resource allocation. With the CONFIG_PCI_AUTO option (see 'arch/mips/config-shared.in') and the 'arch/mips/kernel/pci_auto.c' file, it turns out to be quite easy to do. This approach is the focus of this chapter.
  • The third way is to call the 'pci_assign_unassigned_resources()' function *after* Linux completes the PCI bus scan. In recent 2.4.x kernels this function is defined in the 'drivers/pci/setup-bus.c' file. Earlier versions of it assign resources only to PCI devices whose BARs have *not* been properly assigned. Recent versions do "optimal" resource assignment based on sizes and apparently perform a complete resource reassignment. That is almost exactly what the 'pci_auto.c' file does.
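A complete reassignment rests on a simple allocator. It walks the devices and gives each BAR the next free address in the window that is aligned to the BAR's size, since PCI requires BARs to be naturally aligned. The following is a minimal userspace sketch of that idea, not the actual 'pci_auto.c' code. `struct window` and `window_alloc()` are hypothetical names, and the 0x10000000-0x14000000 window reuses the example range from above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical allocation window, e.g. the PCI memory range handed to
 * the auto-assignment code.  'next' is the first free address. */
struct window {
    uint32_t next;
    uint32_t limit;  /* first address past the window */
};

/* Give a BAR of 'size' bytes (a power of two) a naturally aligned address.
 * Returns 0 and stores the address in *addr, or -1 if the window is full. */
static int window_alloc(struct window *w, uint32_t size, uint32_t *addr)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    /* Round 'next' up to a multiple of 'size' (64-bit math avoids overflow). */
    uint64_t base = ((uint64_t)w->next + size - 1) & ~((uint64_t)size - 1);
    if (base + size > w->limit)
        return -1;
    *addr = (uint32_t)base;
    w->next = (uint32_t)(base + size);
    return 0;
}
```

In this model, a small BAR allocated first forces a large BAR after it up to the large BAR's alignment, which wastes space. Handing out large BARs first reduces that padding. That is roughly what a size-based "optimal" assignment aims for.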

[DEBATE] The 'pci_auto' and 'pci_assign_unassigned_resources()' approaches each have their own advantages and disadvantages. Ideally, the whole PCI subsystem should be rewritten to address several issues that are not currently handled:

  • A notion of a host-PCI controller, and supporting multiple host-PCI controllers.
  • A kernel-independent abstraction layer to access configuration space, distinguishing Type 0 and Type 1 configuration accesses. This removes the need for 'pci_dev' in the lowest-level PCI routines.
  • Pass 1 scan to do bus number assignment and record bus topology through the top-level host-PCI controller structure.
  • Pass 2 scan to discover all other PCI devices and assign the resources along the way.
  • If we want to do optimal resource assignment, we need to do the resource assignment in Pass 3 instead of Pass 2.
  • A complete PCI device list is built during the above passes.
  • The address ranges for PCI memory space and I/O space are set when the host-PCI controller structures are first created.
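The Pass 1 scan described above is a depth-first walk. It gives each PCI-PCI bridge a secondary bus number and, once everything behind the bridge is numbered, a subordinate (highest downstream) bus number. The sketch below models that walk on a hypothetical in-memory topology (`struct bridge`, `number_buses()`) instead of issuing real configuration cycles.

```c
#include <assert.h>

#define MAX_CHILDREN 4

/* Hypothetical bridge node: the bridges found on its secondary bus. */
struct bridge {
    int primary, secondary, subordinate;   /* filled in by the walk */
    int nchildren;
    struct bridge *children[MAX_CHILDREN];
};

/* Number the buses behind 'br', whose upstream bus is 'parent_bus'.
 * 'next_bus' is the last bus number used so far; returns the new last one. */
static int number_buses(struct bridge *br, int parent_bus, int next_bus)
{
    assert(next_bus < 255);                /* bus numbers are 8 bits */
    br->primary = parent_bus;
    br->secondary = ++next_bus;
    /* Real enumeration code opens the range to 0xff while probing so that
     * config cycles reach deeper buses; the model just mirrors that. */
    br->subordinate = 0xff;
    for (int i = 0; i < br->nchildren; i++)
        next_bus = number_buses(br->children[i], br->secondary, next_bus);
    br->subordinate = next_bus;            /* highest bus behind this bridge */
    return next_bus;
}
```

For a root bus 0 with bridges A and B, and a bridge C behind A, the walk yields A = 1..2, C = 2..2, and B = 3..3.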

PCI startup sequence

1. do_basic_setup() calls pci_init(), which is defined in 'drivers/pci/pci.c'.
2. pci_init() first calls pcibios_init(). If you enable CONFIG_NEW_PCI,
   pcibios_init() is implemented in the 'arch/mips/kernel/pci.c' file.
   Otherwise you need to provide it in your own board-dependent code.
3. Optionally, pcibios_init() may call pciauto_assign_resources() to do a
   complete PCI resource assignment.
4. Somewhere inside pcibios_init(), pci_scan_bus() is called. If a machine has
   multiple host-PCI controllers, pci_scan_bus() should be called for each of
   the top-level PCI buses. Bus numbers apparently need to be set up already
   before pci_scan_bus() can run properly.
5. Optionally, after pci_scan_bus() is called, pcibios_init() may choose to
   call pci_assign_unassigned_resources() to do a complete PCI resource
   assignment.
6. pcibios_init() will do some more fixups (resources, IRQs, etc.).
7. Returning from pcibios_init(), pci_init() will do a final round of
   device-based fixups.

In this chapter, we focus on the approach where both CONFIG_NEW_PCI and CONFIG_PCI_AUTO are enabled.

PCI driver interface

The ultimate goal of all this PCI work is to set up an environment in which PCI device drivers can run happily. Knowing how PCI device drivers access PCI resources will greatly help you understand how to do PCI initialization and setup.

  • inb()/outb()/inw()/outw()/inl()/outl()

Defined in 'include/asm-mips/io.h'. Drivers use these macros to read from and write to PCI I/O space. The address arguments are addresses in PCI I/O space and should fall within a range given by one of the device's BARs. On MIPS machines, we assume PCI I/O space is mapped into a contiguous block of physical address space. The base address of that block is mips_io_port_base, which you need to set up early in board setup. The proper way to set it is to call set_io_port_base().

  • readb()/writeb()/readw()/writew()/readl()/writel()

Defined in the 'include/asm-mips/io.h' file. Drivers use these macros to access PCI memory space. On MIPS machines, we assume PCI memory space is mapped 1:1 into a block of physical address space. These macros are therefore equivalent to direct physical memory accesses.

  • pci_read_config_word() and friends

Defined in 'include/linux/pci.h' (through PCI_OP()). Drivers use them to read or write the configuration registers of devices. The low-level routines are abstracted as struct pci_ops, which each board must supply.

  • pci_map_single() and friends

Defined in 'include/asm-mips/pci.h'. Drivers use these macros to map a virtual address to a bus address (so you can tell the device to do DMA). They usually do not affect PCI porting.
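The sketch below shows only the address arithmetic behind the first two access models. An I/O port number is an offset from mips_io_port_base. A PCI memory address is used unchanged as a CPU physical address. It assumes a 32-bit MIPS CPU with both windows below 512 MB, so they can be reached uncached through KSEG1 (0xa0000000 + physical address). It also assumes the port base holds the KSEG1 alias of ba_io. `IO_WINDOW_PHYS`, `io_port_base` and the helper names are hypothetical. Real accessors dereference these addresses rather than returning them.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions for illustration only: 32-bit MIPS, PCI I/O window at a
 * hypothetical physical address, everything below 512 MB. */
#define KSEG1_BASE      0xa0000000u
#define IO_WINDOW_PHYS  0x18000000u   /* ba_io (hypothetical) */

/* Uncached KSEG1 alias of a physical address below 512 MB. */
static uint32_t kseg1_addr(uint32_t phys)
{
    assert(phys < 0x20000000u);
    return KSEG1_BASE | phys;
}

/* Stand-in for mips_io_port_base, normally set via set_io_port_base(). */
static uint32_t io_port_base;

/* CPU address that inb(port)/outb(port) would touch. */
static uint32_t io_port_to_cpu(uint32_t port)
{
    return io_port_base + port;
}

/* CPU address for PCI memory: with a 1:1 mapping the PCI address is the
 * physical address, so only the KSEG1 alias is applied. */
static uint32_t pci_mem_to_cpu(uint32_t pci_addr)
{
    return kseg1_addr(pci_addr);
}
```

With the base set to kseg1_addr(IO_WINDOW_PHYS), port 0x3f8 maps to CPU address 0xb80003f8. PCI memory address 0x10000000 maps to 0xb0000000.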

Setting up the host-PCI controller

As we can see from the above discussion, we need to set up the host-PCI controller such that:

1. It has a 1:1 mapping between PCI memory space and CPU physical address
   space.
2. It maps the beginning of PCI I/O space into an address block in physical
   address space.

The host-PCI controller usually lets you map PCI memory space into a window in physical address space. Let us say the window's base address is ba_mem and its size is sz_mem. The controller usually also lets you translate addresses in that window by a fixed offset, say off (note that off can be either positive or negative). So if a driver accesses address ba_mem+x (0 <= x < sz_mem), the host-PCI controller intercepts the access and translates it into a PCI memory access at address ba_mem+off+x.

Maintaining a 1:1 mapping implies that we must set up the PCI addressing so that off is 0. Also note that with this setup, we cannot access the PCI memory ranges [0x0, ba_mem) and [ba_mem + sz_mem, 0xffffffff].
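The window arithmetic above can be checked with a few lines of C. The hypothetical helper `cpu_to_pci()` translates a CPU physical address inside [ba_mem, ba_mem + sz_mem) into a PCI memory address using the offset off. With off = 0 the two addresses are identical, which is the required 1:1 mapping. The structure name and the numbers used with it are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of a host-PCI memory window. */
struct mem_window {
    uint32_t ba_mem;  /* CPU physical base of the window */
    uint32_t sz_mem;  /* window size in bytes */
    int64_t  off;     /* PCI address = CPU address + off */
};

/* Translate a CPU physical address; returns false outside the window. */
static bool cpu_to_pci(const struct mem_window *w, uint32_t cpu, uint32_t *pci)
{
    assert(w->sz_mem != 0);
    if (cpu < w->ba_mem || cpu - w->ba_mem >= w->sz_mem)
        return false;   /* the controller does not claim this access */
    *pci = (uint32_t)((int64_t)cpu + w->off);
    return true;
}
```

Consider a window at 0x10000000 with off = -0x10000000. A device whose BAR reads 0x1000 is then reached at CPU address 0x10001000. A driver that passes the BAR value straight to readl() would hit CPU address 0x1000 instead. This is why a non-zero off breaks the 1:1 assumption.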

Additionally, system RAM must be visible on the PCI memory bus at address 0x0 (assuming that is where RAM sits in physical address space) so that PCI devices can do DMA transfers.

The beginning part of PCI IO space is usually mapped into another window in physical address space, say [ba_io, ba_io + sz_io]. In other words, a range [0, sz_io] in PCI IO space corresponds to the range [ba_io, ba_io + sz_io]. Obviously, mips_io_port_base should be set to ba_io.

The above setup is typically done in the board-specific setup routine (i.e., <board>_setup()). There you typically also set up ioport_resource and iomem_resource:

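           /* struct resource ranges are inclusive: [start, end] */
           ioport_resource.start = 0x0;
           ioport_resource.end = sz_io - 1;
           iomem_resource.start = 0x0;
           /* must cover the PCI memory window [ba_mem, ba_mem + sz_mem) */
           iomem_resource.end = ba_mem + sz_mem - 1;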

These variables are the roots of all IO and memory resources (roughly corresponding to the ancient ISA IO space and ISA memory space). For simplicity you can also set the end to be 0xffffffff.
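Linux 'struct resource' ranges are inclusive ([start, end]), and a child range must fit inside its parent. The root resources must therefore cover every window carved out of them later. The sketch below models only that containment rule, using a hypothetical `struct range`. It is not the kernel's resource-request code. It shows that when the PCI memory window starts at ba_mem, the root memory range must reach at least ba_mem + sz_mem - 1 (or simply 0xffffffff).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the start/end fields of struct resource. */
struct range {
    uint32_t start;
    uint32_t end;    /* inclusive, like struct resource */
};

/* True if 'child' lies entirely inside 'parent'. */
static bool range_contains(const struct range *parent, const struct range *child)
{
    assert(child->start <= child->end);
    return child->start >= parent->start && child->end <= parent->end;
}

/* Inclusive range covering 'size' bytes starting at 'base'. */
static struct range range_of(uint32_t base, uint32_t size)
{
    struct range r = { base, base + size - 1 };
    return r;
}
```

Take a 64 MB PCI memory window at 0x10000000. A root of [0, sz_mem - 1] does not contain it. A root of [0, ba_mem + sz_mem - 1] does, and so does [0, 0xffffffff].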

Board-specific functions and variables

Here is a list of board-specific functions and variables you must provide. Again, I assume this board has the CONFIG_NEW_PCI and CONFIG_PCI_AUTO options enabled.

  • struct pci_ops my_pci_ops

You implement six functions to fill in this structure, which is needed by pci_scan_bus() and pciauto_assign_resources(). Note that these functions must distinguish between Type 0 and Type 1 configuration accesses. You can typically tell which one to use by checking whether the bus's parent (dev->bus->parent) is NULL. Devices on a top-level bus are reached with Type 0 accesses, while devices behind a PCI-PCI bridge need Type 1.
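Most of the work inside those six functions is building the configuration address. The sketch below uses the PCI encoding of the address phase. A Type 1 access carries bus, device, function and register, with AD[1:0] = 01. A Type 0 access selects the target through its IDSEL line. IDSEL wiring is board-specific, so the sketch assumes, hypothetically, that device N's IDSEL is tied to AD[11 + N]. `struct bus` is a hypothetical stand-in for the kernel's bus structure, reduced to the parent pointer used to choose the access type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reduced bus structure: only the parent link and number matter. */
struct bus {
    struct bus *parent;   /* NULL for a top-level bus */
    unsigned int number;
};

/* Build the address-phase value for a config access to dev/fn/reg on 'bus'. */
static uint32_t config_address(const struct bus *bus, unsigned int dev,
                               unsigned int fn, unsigned int reg)
{
    assert(dev < 32 && fn < 8 && reg < 256);
    reg &= 0xfc;                        /* AD[7:2]: dword-aligned register */

    if (bus->parent == NULL) {
        /* Type 0: device on the top-level bus, selected by IDSEL.
         * Assumption: IDSEL of device 'dev' is wired to AD[11 + dev],
         * which only works for dev <= 20 on a 32-bit bus. */
        assert(dev <= 20);
        return (1u << (11 + dev)) | (fn << 8) | reg;
    }
    /* Type 1: bus[23:16], device[15:11], function[10:8], AD[1:0] = 01. */
    return (bus->number << 16) | (dev << 11) | (fn << 8) | reg | 1u;
}
```

On many host controllers you write a value like this to an address register and then access a data register. The exact registers depend on your controller, so check its manual.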

  • mips_pci_channels[]

You need to define this array in order to use pciauto resource assignment. Supply one element for each top-level PCI bus, and end the array with an all-NULL element. Each element is a structure consisting of a pci_ops, a pci_io_resource, and a pci_mem_resource, and usually represents a top-level PCI bus connected to the CPU. pci_ops defines the functions that access the bus's configuration space. pci_io_resource and pci_mem_resource specify the address ranges that pciauto assigns to the BARs of PCI devices. pci_io_resource starts at 0x0 (or 0x1000, to leave some room for legacy ISA devices) and ends at sz_io. pci_mem_resource starts at ba_mem and ends at ba_mem + sz_mem. Note that these addresses are in PCI I/O space and PCI memory space respectively. However, since we maintain a 1:1 mapping between PCI memory space and CPU physical address space, pci_mem_resource also describes the PCI memory window in CPU physical address space.
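A userspace model of such a table, terminated by an all-NULL element, might look like the following. `struct ops`, `struct range` and `struct channel` are hypothetical simplifications. The real element type and its exact fields are defined in the MIPS headers of your kernel tree. The window values are illustrative: I/O starts at 0x1000 to skip legacy ISA ports, and memory uses the 0x10000000-0x14000000 example range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplifications of struct pci_ops and struct resource. */
struct ops   { const char *name; };
struct range { uint32_t start, end; };   /* inclusive end */

/* Hypothetical channel: one top-level PCI bus. */
struct channel {
    struct ops   *pci_ops;
    struct range *io_resource;   /* PCI I/O addresses handed out to BARs */
    struct range *mem_resource;  /* PCI (= CPU physical) memory addresses */
};

static struct ops   my_pci_ops = { "my_pci_ops" };
static struct range my_io  = { 0x1000u, 0x000fffffu };      /* sz_io = 1 MB */
static struct range my_mem = { 0x10000000u, 0x13ffffffu };  /* 64 MB at ba_mem */

static struct channel channels[] = {
    { &my_pci_ops, &my_io, &my_mem },
    { NULL, NULL, NULL }                 /* all-NULL terminator */
};

/* Walk the table until the terminator, as code consuming it would. */
static int count_channels(const struct channel *c)
{
    int n = 0;
    assert(c != NULL);
    for (; c->pci_ops != NULL; c++)
        n++;
    return n;
}
```

A board with two host controllers would add a second element before the terminator, with its own ops and non-overlapping windows.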

  • pcibios_fixup_resources()

This routine is passed to pci_scan_bus() and invoked right after each PCI device is discovered. This is the place to do device-specific fixups (BARs, the pci_dev structure, etc.). Note that you can do the same fixups in pcibios_fixup(). I recommend leaving this function empty unless you have a specific need for an immediate fixup.

  • pcibios_fixup()

This function is invoked after pci_scan_bus() is done, i.e., after Linux has discovered all PCI bridges and devices. Here you can enumerate the PCI devices and do device-based fixups, or do bus- or controller-related fixups.

  • pcibios_fixup_irqs()

This is the place to fix up PCI-related IRQs. It is invoked after pci_scan_bus() is done, right after pcibios_fixup(). A typical strategy is to assign IRQs based on the slot number, and also the bus number if the system has more than one top-level bus. Note that you can also do this fixup in pcibios_fixup() and leave this function empty.
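A common slot-based scheme is sketched below. It assumes, hypothetically, that the four interrupt lines INTA#-INTD# of each slot are rotated across slots (the usual "swizzle"). It also assumes they are wired to four consecutive CPU IRQs starting at `BOARD_PCI_IRQ_BASE`. Real boards differ, so take the mapping table or formula from your schematics. `board_map_irq()` is a hypothetical name.

```c
#include <assert.h>

#define BOARD_PCI_IRQ_BASE 16   /* hypothetical first PCI IRQ */

/* pin: value of the PCI_INTERRUPT_PIN register, 1..4 for INTA#..INTD#,
 * 0 if the function uses no interrupt.  Returns the CPU IRQ or -1. */
static int board_map_irq(unsigned int slot, unsigned int pin)
{
    if (pin == 0)
        return -1;                      /* no interrupt used */
    assert(pin <= 4);
    /* Rotate pins across slots so INTA# devices in different slots
     * do not all share one line. */
    return BOARD_PCI_IRQ_BASE + (int)((slot + pin - 1) % 4);
}
```

In the fixup, the result would be stored in each device's irq field. On a multi-bus system, the bus number would also select a different base.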

  • pcibios_assign_all_busses()

Return 1 to indicate that all bus numbers have been assigned by pciauto. [TODO: Should this function be included in pciauto by default?] [TODO: The drivers/pci/pci.c file seems to have a typo where it calls this function: the logic is inverted.]

Tips for writing PCI code

  • Some ill-behaved PCI devices may spoil the party. The best way to deal with them is to single them out in the PCI configuration routines (in pci_ops). There, check for the slot and function numbers of the bad devices and return values that make them look absent. One such value is all ones, which is what a read from an empty slot returns.
  • If you have multiple top-level PCI buses, PCI I/O assignment is tricky. Assume you have two PCI buses whose I/O spaces are mapped into [ba_io1, ba_io1 + sz_io1] and [ba_io2, ba_io2 + sz_io2] (ba_io1 + sz_io1 <= ba_io2). You need to:
    1. Set mips_io_port_base to be ba_io1
    2. Set pci_io_resource to be [0x0, sz_io1] for the first PCI bus
    3. Set pci_io_resource to be [ba_io2 - ba_io1, ba_io2 - ba_io1 + sz_io2]
       for the second PCI bus. In addition, legacy ISA devices on the second
       PCI bus cannot be used without modifying their drivers.
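The offset arithmetic in the last tip is easy to get wrong, so here is a small worked sketch. The window addresses are made up, and `struct io_window`, `struct port_range` and `io_ports_for()` are hypothetical. Ranges use struct resource-style inclusive ends. A port number on either bus still reaches the CPU as mips_io_port_base + port.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical CPU physical I/O window of one top-level PCI bus. */
struct io_window {
    uint32_t ba_io;
    uint32_t sz_io;
};

struct port_range {
    uint32_t start, end;   /* inclusive, like struct resource */
};

/* Port range to hand to pciauto for 'bus', given the window that
 * mips_io_port_base points at ('first'). */
static struct port_range io_ports_for(const struct io_window *first,
                                      const struct io_window *bus)
{
    assert(bus->ba_io >= first->ba_io);
    struct port_range r;
    r.start = bus->ba_io - first->ba_io;   /* port 0 of this bus's window */
    r.end = r.start + bus->sz_io - 1;
    return r;
}
```

With 1 MB windows at 0x18000000 and 0x18200000, the first bus gets ports 0x0-0xfffff and the second gets 0x200000-0x2fffff. Ports below 0x200000 always decode to the first bus. A legacy driver that hard-codes a low port such as 0x3f8 therefore reaches only the first bus.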
