NetBSD Problem Report #52147
From mlelstv@hoppa.1st.de Sun Apr 9 10:23:10 2017
Return-Path: <mlelstv@hoppa.1st.de>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
by mollari.NetBSD.org (Postfix) with ESMTPS id 794777A1C0
for <gnats-bugs@gnats.NetBSD.org>; Sun, 9 Apr 2017 10:23:10 +0000 (UTC)
Message-Id: <20170409102247.70CE09A@hoppa.1st.de>
Date: Sun, 9 Apr 2017 12:22:47 +0200 (CEST)
From: mlelstv@serpens.de
Reply-To: mlelstv@serpens.de
To: gnats-bugs@NetBSD.org
Subject: deadlock when booting from USB disk
X-Send-Pr-Version: 3.95
>Number: 52147
>Category: kern
>Synopsis: deadlock when booting from USB disk
>Confidential: no
>Severity: critical
>Priority: medium
>Responsible: jdolecek
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sun Apr 09 10:25:00 +0000 2017
>Closed-Date: Sat May 13 20:45:04 +0000 2017
>Last-Modified: Sat May 13 20:45:04 +0000 2017
>Originator: Michael van Elst
>Release: NetBSD 7.99.67
>Organization:
--
Michael van Elst
Internet: mlelstv@serpens.de
"A potential Snark may lurk in every tree."
>Environment:
System: NetBSD hoppa 7.99.67 NetBSD 7.99.67 (HOPPA) #6: Sun Apr 9 02:46:58 CEST 2017 mlelstv@gossam:/home/netbsd-current/obj.evbarm/home/netbsd-current/src/sys/arch/evbarm/compile/HOPPA evbarm
Architecture: earmv6hf
Machine: evbarm
>Description:
The latest changes to sd(4) to support FUA/DPO have a funny side effect
when booting from USB disk (on a modular kernel).
The requested settings trigger a SCSI error, since USB disks rarely support
these commands.
The scsipi layer tries to write error messages to the console using the
scsiverbose module.
On the first message, the scsiverbose module needs to be loaded from
the same disk that is currently in error processing.
-> instant deadlock.
>How-To-Repeat:
Boot a system, in this case an RPI, with root on a USB disk.
>Fix:
There are several errors.
The FUA/DPO support needs some refinement to not cause error messages
on devices that do not support these settings.
There needs to be some notion of "module load path is unavailable" so
that autoloading a module doesn't happen when this would cause a
deadlock. I think scsipi is the only such place for now. A quick
solution is to just load scsiverbose unconditionally instead of
on-demand.
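The "module load path is unavailable" idea can be sketched as a simple guard in front of the autoload entry point. This is only a user-space model, not kernel code: the flag module_path_unavailable and the demo_* functions are hypothetical names invented for illustration, and the real loader interface is not reproduced here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag: true while reading the module path could deadlock. */
bool module_path_unavailable = false;

/* Counts how often the (stand-in) loader actually ran. */
int loads_performed = 0;

/* Hypothetical stand-in for the real loader; always succeeds. */
int
demo_module_load(const char *name)
{
	assert(name != NULL);
	loads_performed++;
	return 0;
}

/* Refuse on-demand loading while the load path is marked unavailable. */
int
demo_module_autoload(const char *name)
{
	if (module_path_unavailable)
		return EPERM;	/* caller falls back to terse output */
	return demo_module_load(name);
}
```

With the flag set, demo_module_autoload("scsiverbose") returns EPERM without touching the disk, so a caller in the error path would print its short message instead of waiting for the module.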
>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: kern-bug-people->jdolecek
Responsible-Changed-By: jdolecek@NetBSD.org
Responsible-Changed-When: Sun, 09 Apr 2017 14:34:02 +0000
Responsible-Changed-Why:
I'll look at this. It's a bug in the autoload - the recent changes only
trigger MODE SENSE for page 8 (Caching page) via DIOCGCACHE call when WAPBL
filesystem is mounted, nothing really special.
From: Michael van Elst <mlelstv@serpens.de>
To: gnats-bugs@NetBSD.org
Cc: jdolecek@NetBSD.org, kern-bug-people@netbsd.org, netbsd-bugs@netbsd.org,
gnats-admin@netbsd.org
Subject: Re: kern/52147 (deadlock when booting from USB disk)
Date: Sun, 9 Apr 2017 18:12:19 +0200
On Sun, Apr 09, 2017 at 02:34:03PM +0000, jdolecek@NetBSD.org wrote:
> Synopsis: deadlock when booting from USB disk
>
> I'll look at this. It's a bug in the autoload - the recent changes only
> trigger MODE SENSE for page 8 (Caching page) via DIOCGCACHE call when WAPBL
> filesystem is mounted, nothing really special.
That MODE SENSE fails with Invalid request.
--
Michael van Elst
Internet: mlelstv@serpens.de
"A potential Snark may lurk in every tree."
From: "Jaromir Dolecek" <jdolecek@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/52147 CVS commit: src/sys/dev/scsipi
Date: Mon, 10 Apr 2017 18:20:43 +0000
Module Name: src
Committed By: jdolecek
Date: Mon Apr 10 18:20:43 UTC 2017
Modified Files:
src/sys/dev/scsipi: sd.c
Log Message:
execute the cache page MODE SENSE with XS_CTL_SILENT; it's pretty normal
for e.g. USB sticks thus showing error is not really useful, and the pretty
printing triggers autoload of scsiverbose module and immediate deadlock when
the DIOCGCACHE call is made by WAPBL during root mount
addresses PR kern/52147 by Michael van Elst
To generate a diff of this commit:
cvs rdiff -u -r1.323 -r1.324 src/sys/dev/scsipi/sd.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
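The effect of the silent flag described in the log message above can be modelled roughly as follows. This is a user-space sketch, not the sd.c change itself: DEMO_CTL_SILENT is a hypothetical stand-in for XS_CTL_SILENT, and struct demo_xfer and demo_complete() are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_CTL_SILENT	0x01	/* hypothetical stand-in for XS_CTL_SILENT */

/* Minimal model of a transfer: control flags plus its outcome. */
struct demo_xfer {
	int flags;
	bool failed;
};

/* Counts how often the verbose sense-printing path would be entered. */
int demo_sense_prints = 0;

/* Complete a transfer; only non-silent failures reach the print path. */
int
demo_complete(const struct demo_xfer *xs)
{
	assert(xs != NULL);
	if (!xs->failed)
		return 0;
	if ((xs->flags & DEMO_CTL_SILENT) == 0)
		demo_sense_prints++;	/* pretty printing would start here */
	return -1;
}
```

A failed transfer with DEMO_CTL_SILENT set still returns an error to its caller, but never reaches the printing path, which in the real system is what triggered the autoload.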
State-Changed-From-To: open->feedback
State-Changed-By: jdolecek@NetBSD.org
State-Changed-When: Mon, 10 Apr 2017 21:29:51 +0000
State-Changed-Why:
I've committed a fix. Can you confirm it resolves your problem? I want
to also deal with the scsiverbose autoload, but I'll do it separately.
From: "Jaromir Dolecek" <jdolecek@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/52147 CVS commit: src/sys/dev/scsipi
Date: Mon, 10 Apr 2017 21:53:38 +0000
Module Name: src
Committed By: jdolecek
Date: Mon Apr 10 21:53:37 UTC 2017
Modified Files:
src/sys/dev/scsipi: scsipiconf.c
Log Message:
just do not autoload scsiverbose module, it causes deadlock if it happens
while root fs is being mounted
addresses second part of PR kern/52147 by Michael van Elst, thank you
To generate a diff of this commit:
cvs rdiff -u -r1.42 -r1.43 src/sys/dev/scsipi/scsipiconf.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/52147: deadlock when booting from USB disk
Date: Wed, 12 Apr 2017 05:25:52 +0000 (UTC)
christos@astron.com (Christos Zoulas) writes:
>>There needs to be some notion of "module load path is unavailable" so
>>that autoloading a module doesn't happen when this would cause a
>>deadlock. I think scsipi is the only such place for now. A quick
>>solution is to just load scsiverbose unconditionally instead of
>>on-demand.
>Can you print the deadlock path? Or instructions how to reproduce it?
This here happened on RPI with root on a USB drive and filesystems
using WAPBL. The deadlock occurs shortly after starting userland
when a journal is played back (e.g. when root is remounted).
This could happen on all archs with SCSI disks.
The journal play back triggered a SCSI error (bad MODE SENSE) which triggers
a scsiverbose message which triggers the autoload but which cannot access
the sd device because REQUEST SENSE processing has the periph frozen.
Also, the message is printed synchronously in the completion thread, so
even when you offload the module loading, no other error on the same scsi
bus could be processed.
A workaround was to put scsiverbose into /etc/modules.conf which is
done while root is still read-only.
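The chain of events described above (error on a disk, periph frozen, module needed from that same disk) can be modelled in a few lines. This is a simplified user-space sketch under the assumption that a read to a frozen periph never completes; struct demo_periph and the demo_* functions are hypothetical and do not correspond to actual scsipi structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal model of a peripheral. */
struct demo_periph {
	bool frozen;	/* set while REQUEST SENSE processing is in progress */
};

/* In this model a read issued to a frozen periph can never complete. */
bool
demo_read_would_block(const struct demo_periph *p)
{
	assert(p != NULL);
	return p->frozen;
}

/*
 * Error handling for 'failing' that wants to print a verbose message.
 * If the module is not loaded yet, it must be read from 'module_disk'.
 * Returns true if the handler would end up waiting on itself.
 */
bool
demo_error_path_deadlocks(struct demo_periph *failing,
    const struct demo_periph *module_disk, bool module_loaded)
{
	bool blocked;

	failing->frozen = true;		/* error processing begins */
	blocked = !module_loaded && demo_read_would_block(module_disk);
	failing->frozen = false;	/* only reached in the model */
	return blocked;
}
```

The model reports a deadlock only when the failing disk is also the disk holding the module and the module is not yet loaded, which matches the workaround of loading scsiverbose early.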
--
Michael van Elst
Internet: mlelstv@serpens.de
"A potential Snark may lurk in every tree."
From: christos@zoulas.com (Christos Zoulas)
To: gnats-bugs@NetBSD.org, jdolecek@NetBSD.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, mlelstv@serpens.de
Cc:
Subject: Re: kern/52147: deadlock when booting from USB disk
Date: Wed, 12 Apr 2017 08:25:18 -0400
On Apr 12, 5:30am, mlelstv@serpens.de (Michael van Elst) wrote:
-- Subject: Re: kern/52147: deadlock when booting from USB disk
| >Can you print the deadlock path? Or instructions how to reproduce it?
|
| This here happened on RPI with root on a USB drive and filesystems
| using WAPBL. The deadlock occurs shortly after starting userland
| when a journal is played back (e.g. when root is remounted).
|
| This could happen on all archs with SCSI disks.
Yes, I understand.
| The journal play back triggered a SCSI error (bad MODE SENSE) which triggers
| a scsiverbose message which triggers the autoload but which cannot access
| the sd device because REQUEST SENSE processing has the periph frozen.
| Also, the message is printed synchronously in the completion thread, so
| even when you offload the module loading, no other error on the same scsi
| bus could be processed.
|
| A workaround was to put scsiverbose into /etc/modules.conf which is
| done while root is still read-only.
But hasn't the bad MODE SENSE been fixed now? I.e. the code was changed
back not to do MODE SENSE? Or do we need the notion of "root is currently
being mounted"?
christos
From: Jaromír Doleček <jaromir.dolecek@gmail.com>
To: Christos Zoulas <christos@zoulas.com>
Cc: gnats-bugs@netbsd.org, Jaromir Dolecek <jdolecek@netbsd.org>, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, Michael van Elst <mlelstv@serpens.de>
Subject: Re: kern/52147: deadlock when booting from USB disk
Date: Wed, 12 Apr 2017 20:08:43 +0200
Yes, the offending MODE SENSE was silenced, it doesn't try to print
anything out. But it could easily be reintroduced if we add there
something else, which would cause the same problem again. Having
something inherently deadlocking is quite dangerous.
Given that scsiverbose is only cosmetic and completely optional (code
prints some relevant info out even without it), I think it makes sense
to leave it up to explicit user action to have it loaded.
2017-04-12 14:25 GMT+02:00 Christos Zoulas <christos@zoulas.com>:
> On Apr 12, 5:30am, mlelstv@serpens.de (Michael van Elst) wrote:
> -- Subject: Re: kern/52147: deadlock when booting from USB disk
>
> | >Can you print the deadlock path? Or instructions how to reproduce it?
> |
> | This here happened on RPI with root on a USB drive and filesystems
> | using WAPBL. The deadlock occurs shortly after starting userland
> | when a journal is played back (e.g. when root is remounted).
> |
> | This could happen on all archs with SCSI disks.
>
> Yes, I understand.
>
> | The journal play back triggered a SCSI error (bad MODE SENSE) which triggers
> | a scsiverbose message which triggers the autoload but which cannot access
> | the sd device because REQUEST SENSE processing has the periph frozen.
> | Also, the message is printed synchronously in the completion thread, so
> | even when you offload the module loading, no other error on the same scsi
> | bus could be processed.
> |
> | A workaround was to put scsiverbose into /etc/modules.conf which is
> | done while root is still read-only.
>
> But hasn't the bad MODE SENSE been fixed now? I.e. the code was changed
> back not to do MODE SENSE? Or do we need the notion of "root is currently
> being mounted"?
>
> christos
From: christos@zoulas.com (Christos Zoulas)
To: Jaromír Doleček <jaromir.dolecek@gmail.com>
Cc: gnats-bugs@netbsd.org, Jaromir Dolecek <jdolecek@netbsd.org>,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
Michael van Elst <mlelstv@serpens.de>
Subject: Re: kern/52147: deadlock when booting from USB disk
Date: Wed, 12 Apr 2017 14:25:13 -0400
On Apr 12, 8:08pm, jaromir.dolecek@gmail.com (Jaromír Doleček) wrote:
-- Subject: Re: kern/52147: deadlock when booting from USB disk
| Yes, the offending MODE SENSE was silenced, it doesn't try to print
| anything out. But it could easily be reintroduced if we add there
| something else, which would cause the same problem again. Having
| something inherently deadlocking is quite dangerous.
|
| Given that scsiverbose is only cosmetic and completely optional (code
| prints some relevant info out even without it), I think it makes sense
| to leave it up to explicit user action to have it loaded.
I think it is better to just fix it not to deadlock, this is why I was
asking for a stack trace... This has never been a problem so far; it
got introduced by adding the mode-sense code. Yes, I agree it could
be re-introduced again, but manually loading modules goes against the
principle of having module loading/unloading be seamless and not noticed
by the user.
christos
From: Michael van Elst <mlelstv@serpens.de>
To: Christos Zoulas <christos@zoulas.com>
Cc: gnats-bugs@NetBSD.org, jdolecek@NetBSD.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Subject: Re: kern/52147: deadlock when booting from USB disk
Date: Thu, 13 Apr 2017 00:00:19 +0200
On Wed, Apr 12, 2017 at 08:25:18AM -0400, Christos Zoulas wrote:
> But hasn't the bad MODE SENSE been fixed now? I.e. the code was changed
> back not to do MODE SENSE? Or do we need the notion of "root is currently
> being mounted"?
AFAIK the code has been changed to be quiet in case of an error. That
solves the issue as far as it is created by the MODE SENSE command.
But I think any other SCSI error would trigger the same problem.
Greetings,
--
Michael van Elst
Internet: mlelstv@serpens.de
"A potential Snark may lurk in every tree."
From: Michael van Elst <mlelstv@serpens.de>
To: Christos Zoulas <christos@zoulas.com>
Cc: Jaromír Doleček <jaromir.dolecek@gmail.com>,
gnats-bugs@netbsd.org, Jaromir Dolecek <jdolecek@netbsd.org>,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/52147: deadlock when booting from USB disk
Date: Thu, 13 Apr 2017 01:15:55 +0200
On Wed, Apr 12, 2017 at 02:25:13PM -0400, Christos Zoulas wrote:
> I think it is better to just fix it not to deadlock, this is why I was
> asking for a stack trace...
Typed manually from the screenshots....
mi_switch
sleepq_block
cv_wait
biowait
breadn
ufs_blkatoff
ufs_lookup
VOP_LOOKUP
lookup_once
namei_tryemulroot
namei
vn_open
kobj_load_vfs
module_load_vfs
module_do_load
module_autoload
scsipi_print_sense_stub
scsipi_interpret_sense
scsipi_complete
scsipi_execute_xs
scsipi_command
scsipi_mode_sense_big
sd_mode_sense
sdioctl
spec_ioctl
VOP_IOCTL
spec_ioctl
VOP_IOCTL
wapbl_start
ffs_wapbl_start
ffs_mount
VFS_MOUNT
do_sys_mount
sys___mount50
syscall
Greetings,
--
Michael van Elst
Internet: mlelstv@serpens.de
"A potential Snark may lurk in every tree."
From: christos@zoulas.com (Christos Zoulas)
To: gnats-bugs@NetBSD.org, jdolecek@NetBSD.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, mlelstv@serpens.de
Cc:
Subject: Re: kern/52147: deadlock when booting from USB disk
Date: Wed, 12 Apr 2017 20:04:50 -0400
On Apr 12, 11:20pm, mlelstv@serpens.de (Michael van Elst) wrote:
-- Subject: Re: kern/52147: deadlock when booting from USB disk
Looks like this should do it?
christos
Index: kern_module.c
===================================================================
RCS file: /cvsroot/src/sys/kern/kern_module.c,v
retrieving revision 1.123
diff -u -u -r1.123 kern_module.c
--- kern_module.c 11 Apr 2017 21:15:57 -0000 1.123
+++ kern_module.c 13 Apr 2017 00:04:02 -0000
@@ -609,7 +609,7 @@
 {
 	int error;
-	if (rootvp == NULL) {
+	if (rootvp == NULL || rootvp->v_mount == NULL) {
 #ifdef DIAGNOSTIC
 		printf("%s: trying to load `%s' before root is mounted\n",
 		    __func__, filename);
@@ -617,6 +617,14 @@
 		return EPERM;
 	}
+	if (fstrans_getstate(rootvp->v_mount) != FSTRANS_NORMAL) {
+#ifdef DIAGNOSTIC
+		printf("%s: trying to load `%s' while root is suspended\n",
+		    __func__, filename);
+#endif
+		return EPERM;
+	}
+
 	kernconfig_lock();
 	/* Nothing if the user has disabled it. */
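The intent of the checks in the patch above can be illustrated with a small user-space model. This is a sketch only: enum demo_fstate, struct demo_mount, struct demo_vnode and demo_autoload_allowed() are hypothetical stand-ins for the kernel's fstrans state, mount and vnode types, and the model says nothing about whether the proposed patch was the change finally committed.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the file system transaction state. */
enum demo_fstate { DEMO_NORMAL, DEMO_SUSPENDING, DEMO_SUSPENDED };

/* Hypothetical stand-ins for the mount and the root vnode. */
struct demo_mount { enum demo_fstate state; };
struct demo_vnode { struct demo_mount *mount; };

/*
 * Return 0 if autoloading from the root file system looks safe,
 * EPERM if root is missing, not mounted, or not in the normal state.
 */
int
demo_autoload_allowed(const struct demo_vnode *rootvp)
{
	if (rootvp == NULL || rootvp->mount == NULL)
		return EPERM;	/* root not mounted yet */
	if (rootvp->mount->state != DEMO_NORMAL)
		return EPERM;	/* root is suspended or suspending */
	return 0;
}
```

For example, assert(demo_autoload_allowed(NULL) == EPERM) holds in this model, and a vnode whose mount is in DEMO_NORMAL state yields 0.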
State-Changed-From-To: feedback->closed
State-Changed-By: jdolecek@NetBSD.org
State-Changed-When: Sat, 13 May 2017 20:45:04 +0000
State-Changed-Why:
Fixed, thanks for report.
>Unformatted:
Copyright © 1994-2014
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.