
RE: "exportfs -a" -> stale NFS filehandle

To: "Ralf Baechle" <ralf@linux-mips.org>
Subject: RE: "exportfs -a" -> stale NFS filehandle
From: "Kaz Kylheku" <kaz@zeugmasystems.com>
Date: Mon, 19 Nov 2007 14:26:24 -0800
Cc: <linux-mips@linux-mips.org>
In-reply-to: <DDFD17CC94A9BD49A82147DDF7D545C54DC8F6@exchange.ZeugmaSystems.local>
Original-recipient: rfc822;linux-mips@linux-mips.org
Sender: linux-mips-bounce@linux-mips.org
Thread-index: AcgnwCCtCdy5qIVVTMaI/zfLoW8HSAAAB8CwAM5jgNA=
Thread-topic: "exportfs -a" -> stale NFS filehandle
Last week, I wrote:
> Ralf Baechle wrote:
>> On Thu, Nov 15, 2007 at 11:26:06AM -0800, Kaz Kylheku wrote:
>> 
>>> After backing out the nfsutils patch, the diskless node does boot.
>>> 
>>> However, the original "exportfs -a" problem comes back!
>>> 
>>> So this problem is not resolved simply by using the correct compat
>>> routine; it's deeper. 
>>> 
>>> Sigh.
>> 
>> Thanks for testing anyway!
> 
> I'm continuing to dig into the problem.
> 
> The export logic doesn't even go through nfsctl() anyway, which is why I
> originally hadn't even suspected that syscall.
> 
> The nfsexport() function in nfsutils first tries opening
> "/proc/net/rpc/nfsd.fh/channel". If that works, it uses that, via a
> text-based protocol. Only if that interface doesn't exist does it fall
> back on the nfsctl(NFSCTL_EXPORT, ...) interface.
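
To make the selection logic quoted above concrete, here is a minimal
sketch of it in Python. It is not the nfsutils code (which is C); the
function name and return values are made up for illustration, and it only
decides which interface would be used, without sending any export request.

```python
import os

# Path as described in the quoted text; the function name and the
# "channel"/"nfsctl" labels are hypothetical, chosen for illustration.
CHANNEL = "/proc/net/rpc/nfsd.fh/channel"

def pick_export_interface(channel_path=CHANNEL):
    """Return "channel" if the /proc channel can be opened for writing,
    otherwise "nfsctl" (the legacy syscall fallback)."""
    try:
        fd = os.open(channel_path, os.O_WRONLY)
    except OSError:
        # Channel missing or not writable: fall back to nfsctl().
        return "nfsctl"
    os.close(fd)
    return "channel"
```

On a kernel that provides the /proc channel, pick_export_interface()
should return "channel" when run as root, which is the case I'm dealing
with here.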

Basically, the export table is being mismanaged. Simply restarting NFS
(service nfs restart) will cause this problem to appear.

When the system is first booted up and NFS is started in runlevel 3 by
the nfs init script, the exportfs command correctly populates the export
table based on the /etc/exports file.

However, after that, further management of the export table fails.
Running "exportfs -a" clears it out. You can see the table in
/proc/net/rpc/nfsd.export/content: before the operation it has valid
entries; after the operation it is empty and stays empty.
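
For anyone who wants to check this on their own box, the before/after
comparison can be scripted along these lines. This is only a sketch: the
helper names are hypothetical, and it assumes that lines beginning with
"#" in the content file are header or comment lines rather than export
entries.

```python
def read_export_table(path="/proc/net/rpc/nfsd.export/content"):
    """Return the set of non-blank, non-comment lines in the export table."""
    entries = set()
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Assumption: '#' lines are headers/comments, not entries.
            if line and not line.startswith("#"):
                entries.add(line)
    return entries

def lost_entries(before, after):
    """Entries present before the operation but missing afterwards."""
    return sorted(before - after)
```

Take one snapshot with read_export_table(), run "exportfs -a" as root,
take a second snapshot, and pass both to lost_entries(). With the
behaviour described above, every entry from the first snapshot should
show up as lost.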

This happens even though the exportfs command appears to be doing
exactly what it did the first time, when NFS was successfully started.
In other words, it's a kernel problem: user space is doing the same
thing that worked before.

I verified that by turning on additional tracing with sysctl
(sunrpc.nfsd_debug) and by adding extra traces to the function that
adds exports (svc_export_parse), so that I could view the messages
coming down the nfsd.fh/channel pipe in /proc.
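
The sysctl part can also be done by writing to the corresponding file
under /proc/sys. A rough sketch follows; the helper name is hypothetical,
the mask value to use is left to the reader (I'm not listing the bit
assignments here), and it assumes the file accepts a decimal integer,
which is how "sysctl -w sunrpc.nfsd_debug=N" would normally be used.

```python
def set_debug_mask(mask, path="/proc/sys/sunrpc/nfsd_debug"):
    """Write a new debug mask and return the previous raw value as text."""
    with open(path) as f:
        previous = f.read().strip()
    with open(path, "w") as f:
        # Assumption: the sysctl file accepts a decimal integer.
        f.write("%d\n" % mask)
    return previous
```

Keeping the returned value makes it easy to restore the old setting once
the traces have been captured. This needs root, and the extra kernel
traces from svc_export_parse still require a rebuilt kernel.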

So the summary is that this problem appears to be some kind of
corruption of the RPC cache for exports.

I did see the kernel crash with an alignment exception once while
reproducing the problem, but I haven't been able to reproduce the crash.
