Volatile is evil
The "+m" constraint is basically undocumented, and while I think it has existed internally in gcc for a long time (gcc inline asms are actually fairly closely related to the internal gcc code generation templates, but the internal templates generally have capabilities that the inline asms don't have), I don't think it has been actually supported for inline asms all that time.
But "+m" means "I will both read from and write to this memory location", which is exactly what we want for things like atomic_add() and friends.
However, because it is badly documented, and because it didn't even exist long ago, we have lots of code (and lots of people) that doesn't even know about "+m". So we have code that fakes out the "both read and write" part by either marking things volatile, or doing
    :"=m" (*ptr)
    :"m" (*ptr)
in the constraints (or, in many cases, both).
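To make the difference concrete, here is a rough sketch of the two styles side by side. The type and function names (sketch_atomic_t, sketch_atomic_add_old, sketch_atomic_add) are made up for illustration and are not the kernel's actual code, and the fallback for non-x86 builds uses a gcc builtin only so that the example compiles elsewhere. It also says nothing about ordering against other memory, which real code may need to think about.

```c
/* Hypothetical counter type for illustration; not the kernel's atomic_t. */
typedef struct { int counter; } sketch_atomic_t;

/* Old style: fake "read and write" with volatile plus "=m" and "m". */
static inline void sketch_atomic_add_old(int i, sketch_atomic_t *v)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__("lock; addl %1,%0"
                         : "=m" (v->counter)
                         : "ir" (i), "m" (v->counter));
#else
    /* Assumption: non-x86 builds just use the gcc builtin instead. */
    __atomic_fetch_add(&v->counter, i, __ATOMIC_SEQ_CST);
#endif
}

/* With "+m": a single operand that gcc knows is both read and written. */
static inline void sketch_atomic_add(int i, sketch_atomic_t *v)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__("lock; addl %1,%0"
            : "+m" (v->counter)
            : "ir" (i));
#else
    /* Assumption: non-x86 builds just use the gcc builtin instead. */
    __atomic_fetch_add(&v->counter, i, __ATOMIC_SEQ_CST);
#endif
}
```

Both versions add to the counter. The second one simply tells gcc the truth about the operand instead of hiding it behind "volatile".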
So we tend to have "volatile" for a couple of different reasons:
- the above kind of "we couldn't tell the inline assembly well enough what the instruction actually does, so we just tell gcc to not mess with it".
- This generally should be replaced with using "+m", so that gcc just knows that we both read and write to the location, and that allows gcc to generate the best possible code, while still generating correct code because gcc knows what is going on and doesn't think the write totally clobbers the old value.
- many functions are used with random data, and if the caller has a "volatile long array" (bad - but hey, it happens), then a function that acts on that array, like the bitops functions, needs to take an argument like "volatile long *".
- So for example, "test_bit()", which can act both on volatile arrays and on const arrays, will look like
    int test_bit(int nr, const volatile void * addr);
- which some people think is strange ("Both 'const' and 'volatile'? Isn't that a contradiction in terms?"), but the fact is, it reflects the different callers, not necessarily test_bit() itself.
- some functions actually really want the volatile access. The x86 IO functions are the best actual example of this:
    static inline unsigned int readl(const volatile void __iomem *addr)
    {
        return *(volatile unsigned int __force *) addr;
    }
- which is actually a combination of two reasons. The incoming argument is marked "volatile" just because the caller may have marked it that way (the previous reason). And the cast to volatile would be there regardless of the "volatile" in the calling convention, because in this case we actually use it as a way to avoid inline assembly (which a number of other architectures need, and which x86 too needs for the PIO accesses, but we can avoid it in this case).
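As a rough user-space sketch of the last two reasons: the functions below are simplified stand-ins with made-up names (test_bit_sketch, read_u32_sketch), not the kernel's implementations, and the kernel-only __iomem and __force annotations are left out. The point is only that "const volatile" on the parameter lets one function accept every kind of caller, and that the volatile cast forces a real load at the point of use.

```c
#include <limits.h>

/* Number of bits in an unsigned long (assumption: used only by this sketch). */
#define SKETCH_BITS_PER_LONG ((int)(sizeof(unsigned long) * CHAR_BIT))

/*
 * "const volatile" describes the callers, not this function: it accepts
 * plain, const, and volatile arrays alike.
 */
static inline int test_bit_sketch(int nr, const volatile void *addr)
{
    const volatile unsigned long *p = addr;

    return (int)((p[nr / SKETCH_BITS_PER_LONG] >>
                  (nr % SKETCH_BITS_PER_LONG)) & 1UL);
}

/*
 * The cast to volatile makes the compiler do exactly one real load here,
 * which is what a readl()-style accessor wants.
 */
static inline unsigned int read_u32_sketch(const volatile void *addr)
{
    return *(const volatile unsigned int *)addr;
}
```

A caller with a "const unsigned long" bitmap and a caller with a "volatile unsigned long" bitmap can both use test_bit_sketch() without any casts.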
So those are all real reasons to use volatile, although the first one is obviously a case where we no longer should (but at least we have reasonably good historical reasons for why we did).
The thing to note is that all of the above reasons are basically "volatile" markers on the code. We haven't really marked any data volatile, we're just saying that certain code will want to act on the data in a certain way.
And I think that's generally a sign of "good" use of volatile - it's making it obvious that certain specific uses of the data may have certain rules.
As mentioned, there is one case where it is valid to use "volatile" on real data: it's when you have an "I don't care about locking, and I'm not accessing IO memory or something else that may need special care" situation. In the kernel, that one special case used to basically be the "jiffies" counter. There's nothing to "lock" there - it just keeps ticking. And it's still obviously normal memory, so there's no question about any special rules for accesses. And there are no SMP memory ordering issues for reading it (for the low bits), since the "jiffies" counter is not really tied to anything else, so there are no larger "coherency" rules either. So in that ONE case, "volatile" is actually fine. We really don't care if we read the old value or the new value when it changes, and there's no reason to try to synchronize with anything else.
There may be some other cases where that would be true, but quite frankly, I can't think of any. If the CPU were to have a built-in random number generator mapped into memory, that would fall under the same kind of rules, but that's basically it.
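A minimal sketch of that kind of data, assuming a made-up counter called sketch_ticks standing in for "jiffies" (this is not the kernel's definition, and sketch_timer_tick() is called by hand here rather than from a timer interrupt):

```c
/* Hypothetical stand-in for the kernel's jiffies counter. */
static volatile unsigned long sketch_ticks;

/* Writer side: in a real system this would run from the timer interrupt. */
static void sketch_timer_tick(void)
{
    sketch_ticks = sketch_ticks + 1;
}

/*
 * Reader side: no lock, just one read of the volatile counter. Getting the
 * old or the new value is equally fine, and unsigned subtraction keeps the
 * result sane across wraparound.
 */
static unsigned long sketch_ticks_since(unsigned long start)
{
    return sketch_ticks - start;
}
```

The data itself is marked volatile here because every reader wants a fresh load and nothing else needs to be kept consistent with it.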
One final word: in user space, because of how signal handlers work, "volatile" can still make sense for exactly the same reasons that "jiffies" makes sense in the kernel. You may, for example, have a signal handler that updates some flag in memory, and that would basically look exactly like the "jiffies" case for your program.
(In fact, because signals are very expensive to block, you may have more of a reason to use a "jiffies" like flag in user space than you have in the kernel. In the kernel, you'd tend to use a spinlock to protect things. In user space, with signals, you may have to use some non-locking algorithm, where the generation count etc might well look like "jiffies").
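The user-space version might look roughly like this. The names (sketch_flag, sketch_handler, sketch_run) are invented for the example, and it raises the signal at itself with raise() only so that the sketch is self-contained. A real program would be waiting for a signal from somewhere else.

```c
#include <signal.h>

/* The flag is real data marked volatile, just like the jiffies case. */
static volatile sig_atomic_t sketch_flag;

/* The handler only stores to the flag; no locking, no signal blocking. */
static void sketch_handler(int signo)
{
    (void)signo;
    sketch_flag = 1;
}

/* Install the handler, trigger the signal, and poll the flag. */
static int sketch_run(void)
{
    sketch_flag = 0;
    if (signal(SIGINT, sketch_handler) == SIG_ERR)
        return -1;
    if (raise(SIGINT) != 0)
        return -1;

    /* Without volatile, the compiler could hoist this load out of the loop. */
    while (!sketch_flag)
        ;

    signal(SIGINT, SIG_DFL);
    return 1;
}
```

The main loop never blocks the signal and never takes a lock. It just rereads the flag, and it doesn't care exactly when the new value shows up.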
- The original posting of Linus Torvalds