On Sat, 12 Jan 2019 16:02:40 +0800 (GMT+08:00)
徐成华 <firstname.lastname@example.org> wrote:
> > > For Loongson 3A1000 and 3A3000, when a memory access instruction
> > > (load, store, or prefetch) executes between the execution of LL
> > > and SC, the success or failure of SC is not predictable. Although
> > > a programmer would not insert memory access instructions between
> > > LL and SC, memory instructions that precede LL in program order
> > > may be dynamically executed between LL and SC, so a memory fence
> > > (SYNC) is needed before LL/LLD to avoid this situation.
> > >
> > > Since 3A3000, we improved our hardware design to handle this case.
> > > But we later deduced a rare circumstance in which memory
> > > instructions speculatively executed between LL/SC due to branch
> > > misprediction still fall into the above case, so a memory fence
> > > (SYNC) at the branch target (if the target is not between LL/SC)
> > > is needed for 3A1000 and 3A3000.
> > Thank you - that description is really helpful.
> > I have a few follow-up questions if you don't mind:
> > 1) Is it correct to say that the only consequence of the bug is
> > that an SC might fail when it ought to have succeeded?
Here is an example:
both cpu1 and cpu2 simultaneously run atomic_add by 1 on the same
variable. This bug causes the SC executed by each CPU (in atomic_add)
to succeed at the same time (SC returns 1), so the variable is only
incremented by 1, which is wrong and unacceptable (it should be
incremented by 2). I think SC does the wrong thing, rather than
failing to do it.
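
To make the lost update concrete, here is a minimal sketch in C. It is
a toy software model, not a description of how the hardware
misbehaves: the names (struct cpu, ll, sc, run_two_adds) and the
"buggy" flag are hypothetical and exist only for illustration. Each
CPU has a link bit set by LL; a correct SC by one CPU breaks the other
CPU's link, while the buggy variant leaves it intact so that both SCs
report success.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: one shared word plus a per-CPU "link" bit (hypothetical). */
struct cpu {
    bool link;   /* set by LL, consumed by SC */
    int  loaded; /* value observed by LL */
};

static int shared; /* the variable both CPUs add 1 to */

/* LL: load the current value and set this CPU's link bit. */
static void ll(struct cpu *c)
{
    c->loaded = shared;
    c->link = true;
}

/*
 * SC: store only if this CPU's link is still set; return 1 on success.
 * A correct SC also breaks the other CPU's link. With buggy == true the
 * other link is left intact, modelling "SC succeeds when it should fail".
 */
static int sc(struct cpu *self, struct cpu *other, int value, bool buggy)
{
    assert(self != other);
    if (!self->link)
        return 0;
    shared = value;
    self->link = false;
    if (!buggy)
        other->link = false;
    return 1;
}

/* Both CPUs execute LL first, then both execute SC (atomic_add by 1). */
static int run_two_adds(bool buggy)
{
    struct cpu a = { false, 0 }, b = { false, 0 };

    shared = 0;
    ll(&a);
    ll(&b);
    sc(&a, &b, a.loaded + 1, buggy);
    while (!sc(&b, &a, b.loaded + 1, buggy))
        ll(&b); /* SC failed: reload and retry, as the real loop does */
    return shared;
}
```

In this model run_two_adds(false) ends with the variable at 2, because
the second SC fails once and the retry picks up the new value, while
run_two_adds(true) ends at 1, because both SCs succeed on the stale
value.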
> Unfortunately, the SC succeeds when it should fail, which causes a
> functional error.
> > 2) Does that mean placing a sync before the LL is purely a
> > performance optimization? ie. if we don't have the sync & the SC
> > fails then we'll retry the LL/SC anyway, and this time not have the
> > reordered instruction from before the LL to cause a problem.
> It's a functional bug, not a performance bug.
> > 3) In the speculative execution case would it also work to place a
> > sync before the branch instruction, instead of at the branch
> > target? In some cases this might be nicer since the workaround
> > would be contained within the LL/SC loop, but I guess it could
> > potentially add more overhead if the branch is conditional & not
> > taken.
> Yes, but it adds more overhead, so we don't use that.
> > 4) When we talk about branches here, is it really just branch
> > instructions that are affected or will the CPU speculate past
> > jump instructions too?
> No, the bug is only exposed when the real program order is still
> LL/SC. An unconditional branch or jump is not really LL/SC, so it is
> not affected.
> > I just want to be sure that we work around this properly, and
> > document it in the kernel so that it's clear to developers why the
> > workaround exists & how to avoid introducing bugs for these CPUs in
> > future.
> > > Our processor is continually evolving and we aim to remove all
> > > these workaround SYNCs around LL/SC for new processors.
> > I'm very glad to hear that :)
> > I hope one day I can get my hands on a nice Loongson laptop to test
> > with.
> We can ship one to you as a gift when the laptop is stable.
> > Thanks,
> > Paul