Jan-Benedict Glaw wrote:
> >
> > So can somebody tell me what the heck just happened? After the ext3
> > recovery done before the mount, .autofsck is still on the disk, so the
> > rc.sysinit script of course assumes the shutdown was unclean,
>
> This ".autofsck" file seems to be a userland approach to detecting a
> system which wasn't shut down cleanly. That in itself is fine. What's
> *not* okay is that there are still errors remaining. It seems your
> filesystem was damaged before (and in a way which could not have been
> handled by the journal).
>
> > and pops the 5-second question. However, if, to be safe, I push "Y" here
> > to get my filesystem check (which I guess should be unnecessary, due to
> > the ext3 recovery just run, right?), strange things happen and fsck
> > reports the "corrupted orphan list... " error.
>
> Wrong. The journal should prevent you from actually losing things in
> hard-power-off situations. It does *not* cover things like silent data
> corruption, which may have led to this breakage.
>
> > Is there something wrong here, or how should the system behave?
>
> Everything with journal recovery is fine here. The failing fsck is a
> different problem (a journal doesn't spare you from running fsck on a
> regular basis. It only means you aren't forced to do it if you don't
> have the time to do this *now* (on crash)).
>
> So there seems to be some corruption (caused by whatever) going on on
> your system :-(
>
> Watch out if this happens again soon after you've completed the fsck.
>
I can reproduce this anytime by just pushing the reset button and checking
the filesystem at reboot after ext3 recovery has run. However, if I just do
regular fscks (without unclean shutdowns), nothing seems to be wrong. So I am
pretty sure it is something which goes wrong in conjunction with the unclean
shutdowns.

Is ext3 journal recovery really supposed to recover everything to a state
where fsck returns no errors, or can it leave non-fatal errors in the
filesystem (e.g. lost inodes which just reduce capacity, but do not cause
further corruption if the filesystem is used) which will then be picked up
by a later fsck when one has time to run it?

What does the error "Inodes that were part of a corrupted orphan linked list
found." actually mean? Is this a fatal error, or a non-critical error along
the lines I described above (an error which does not get any worse if the
filesystem is used)?

Is there anybody with ext3 up and running who would volunteer to do a couple
of unclean shutdowns and see if the recovery works without any fsck errors
present afterwards?
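
For anyone willing to try, here is a rough sketch of how the fsck result could be summarised after such a reboot. It relies only on the exit status of e2fsck, which its man page documents as a bit mask (1 = errors corrected, 2 = reboot needed, 4 = errors left uncorrected, 8 = operational error). The helper name describe_fsck_status and the device /dev/hda1 are placeholders of mine, not anything taken from rc.sysinit.

```shell
# Hypothetical helper: turn an e2fsck exit status (a bit mask, as
# documented in the e2fsck man page) into a short readable summary.
describe_fsck_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "clean"
        return 0
    fi
    out=""
    if [ $((status & 1)) -ne 0 ]; then out="$out errors-corrected"; fi
    if [ $((status & 2)) -ne 0 ]; then out="$out reboot-needed"; fi
    if [ $((status & 4)) -ne 0 ]; then out="$out errors-uncorrected"; fi
    if [ $((status & 8)) -ne 0 ]; then out="$out operational-error"; fi
    # Strip the leading space left over from building the list.
    echo "${out# }"
}

# Example use (commented out; /dev/hda1 is only a placeholder device):
#   e2fsck -f -n /dev/hda1      # -f forces a check, -n changes nothing
#   describe_fsck_status $?
```

Since -n opens the filesystem read-only and answers "no" to every question, any problem it finds should show up as "errors-uncorrected". Results from a filesystem that is mounted read-write are not reliable, so the check is best run before the mount or from a rescue system.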
/Hartvig