NetBSD Problem Report #47646
From jmmv@netbsd.org Thu Mar 14 06:36:41 2013
Return-Path: <jmmv@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
by www.NetBSD.org (Postfix) with ESMTP id D4D8863EE4D
for <gnats-bugs@gnats.NetBSD.org>; Thu, 14 Mar 2013 06:36:40 +0000 (UTC)
Message-Id: <20130314063640.A07E814A32C@mail.netbsd.org>
Date: Thu, 14 Mar 2013 06:36:40 +0000 (UTC)
From: jmmv@netbsd.org
Reply-To: jmmv@netbsd.org
To: gnats-bugs@NetBSD.org
Subject: Test dev/scsipi/t_cd:noisyeject broken
X-Send-Pr-Version: 3.95
>Number: 47646
>Category: kern
>Synopsis: Test dev/scsipi/t_cd:noisyeject broken
>Confidential: no
>Severity: non-critical
>Priority: low
>Responsible: kern-bug-people
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Thu Mar 14 06:40:00 +0000 2013
>Closed-Date: Fri Mar 15 16:19:36 +0000 2013
>Last-Modified: Fri Mar 15 16:20:03 +0000 2013
>Originator: Julio Merino
>Release: NetBSD 6.99.17
>Organization:
>Environment:
System: NetBSD nbair.virtual.network 6.99.17 NetBSD 6.99.17 (GENERIC) #10: Thu Mar 14 13:50:55 JST 2013 jmmv@nbair.virtual.network:/home/jmmv/os/netbsd/amd64/obj/home/jmmv/os/netbsd/src/sys/arch/amd64/compile/GENERIC amd64
Architecture: x86_64
Machine: amd64
>Description:
The dev/scsipi/t_cd:noisyeject test is broken. It segfaults, and
I do not know if this is due to the test being broken or due to
some problem in rump or the device driver (according to the stack
trace). But the test cannot remain broken, so filing this report
to mark the test as known to fail.
The stack trace looks like this:
[...]
feature 0 having length 133
feature 18825 having length 73
feature 59893 having length 255
feature 9224 having length 19
feature 35271 having length 157
feature 30013 having length 1
feature 72 having length 157
feature 0 having length 133
feature 72 haProcess with PID 21857 dumped core; attempting to gather stack trace
[New process 1]
[New process 24]
[New process 23]
[New process 22]
[New process 21]
[New process 20]
[New process 19]
[New process 18]
[New process 17]
[New process 16]
[New process 15]
[New process 14]
[New process 13]
[New process 12]
[New process 11]
[New process 10]
[New process 9]
[New process 8]
[New process 7]
[New process 6]
[New process 5]
[New process 4]
[New process 3]
[New process 2]
Core was generated by `t_cd'.
Program terminated with signal 11, Segmentation fault.
#0 0x00007f7ff7412ade in ?? () from /usr/lib/librumpdev_scsipi.so.0
#0 0x00007f7ff7412ade in ?? () from /usr/lib/librumpdev_scsipi.so.0
#1 0x00007f7ff64666d7 in rumpns_bdev_ioctl () from /usr/lib/librump.so.0
#2 0x00007f7ff70055a1 in rumpns_readdisklabel () from /usr/lib/librumpdev_disk.so.0
#3 0x00007f7ff7410552 in ?? () from /usr/lib/librumpdev_scsipi.so.0
#4 0x00007f7ff74149c0 in ?? () from /usr/lib/librumpdev_scsipi.so.0
#5 0x00007f7ff6466880 in rumpns_cdev_open () from /usr/lib/librump.so.0
#6 0x00007f7ff6826696 in rumpns_spec_open () from /usr/lib/librumpvfs.so.0
#7 0x00007f7ff645218c in rumpns_VOP_OPEN () from /usr/lib/librump.so.0
#8 0x00007f7ff682cc8d in rumpns_vn_open () from /usr/lib/librumpvfs.so.0
#9 0x00007f7ff6831622 in ?? () from /usr/lib/librumpvfs.so.0
#10 0x00007f7ff6831769 in ?? () from /usr/lib/librumpvfs.so.0
#11 0x00007f7ff6835107 in rumpns_sys_open () from /usr/lib/librumpvfs.so.0
#12 0x00007f7ff64819f1 in ?? () from /usr/lib/librump.so.0
#13 0x00007f7ff6481b43 in rump___sysimpl_open () from /usr/lib/librump.so.0
#14 0x00000000004010c3 in atfu_noisyeject_body ()
#15 0x00007f7ff540a39a in atf_tc_run () from /usr/lib/libatf-c.so.0
#16 0x00007f7ff5406e7b in atf_tp_main () from /usr/lib/libatf-c.so.0
#17 0x0000000000400f22 in ___start ()
#18 0x00007f7ff7ffa000 in ?? ()
#19 0x0000000000000006 in ?? ()
#20 0x00007f7ffffffb20 in ?? ()
#21 0x00007f7ffffffb6d in ?? ()
#22 0x00007f7ffffffb70 in ?? ()
#23 0x00007f7ffffffb97 in ?? ()
#24 0x00007f7ffffffb9a in ?? ()
#25 0x00007f7ffffffbac in ?? ()
#26 0x0000000000000000 in ?? ()
GDB exited successfully
t_cd:noisyeject -> broken: Premature exit; test case received signal 11 (core dumped)
>How-To-Repeat:
$ kyua debug -k /usr/tests/Kyuafile dev/scsipi/t_cd:noisyeject
either on i386 or amd64. Or look at the test logs online.
>Fix:
Dunno.
>Release-Note:
>Audit-Trail:
From: "Julio Merino" <jmmv@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/47646 CVS commit: src/tests/dev/scsipi
Date: Thu, 14 Mar 2013 06:43:32 +0000
Module Name: src
Committed By: jmmv
Date: Thu Mar 14 06:43:32 UTC 2013
Modified Files:
src/tests/dev/scsipi: t_cd.c
Log Message:
Expect a crash in the noisyeject test. Mention PR kern/47646.
To generate a diff of this commit:
cvs rdiff -u -r1.4 -r1.5 src/tests/dev/scsipi/t_cd.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Subject: Re: kern/47646: Test dev/scsipi/t_cd:noisyeject broken
Date: Thu, 14 Mar 2013 10:12:10 +0100
When did this start happening?
I have never seen it so far.
Martin
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: pooka@NetBSD.org
Subject: Re: kern/47646: Test dev/scsipi/t_cd:noisyeject broken
Date: Thu, 14 Mar 2013 10:41:50 +0100
To answer the question:
build: OK with 358301 lines of log, install: OK, tests: 3456 passed, 80 skipped, 42 expected_failure, 21 failed, ATF output: raw, xml, html
commit 2013.03.10.16.27.11 pooka src/sys/rump/librump/rumpkern/Makefile.rumpkern 1.122
commit 2013.03.10.16.27.11 pooka src/sys/rump/librump/rumpkern/memalloc.c 1.19
commit 2013.03.10.16.51.31 pooka src/sys/rump/librump/rumpkern/rump.c 1.256
commit 2013.03.10.16.51.31 pooka src/sys/rump/librump/rumpkern/rump_private.h 1.74
commit 2013.03.10.16.51.31 pooka src/sys/rump/librump/rumpkern/threads.c 1.18
commit 2013.03.10.17.05.12 pooka src/sys/rump/librump/rumpkern/Makefile.rumpkern 1.123
commit 2013.03.10.17.05.12 pooka src/sys/rump/librump/rumpkern/memalloc.c 1.20
commit 2013.03.10.17.05.12 pooka src/sys/rump/librump/rumpkern/rump.c 1.257
commit 2013.03.10.17.05.12 pooka src/sys/rump/librump/rumpkern/vm.c 1.137
commit 2013.03.10.17.55.42 pooka src/sys/kern/kern_uidinfo.c 1.8
build: OK with 358770 lines of log, install: OK, tests: 3454 passed, 80 skipped, 42 expected_failure, 23 failed, ATF output: raw, xml, html
The first one ran the test OK, with the second one it dumped core.
Antti?
Martin
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: pooka@NetBSD.org
Subject: Re: kern/47646: Test dev/scsipi/t_cd:noisyeject broken
Date: Thu, 14 Mar 2013 15:56:54 +0100
Note that it does not happen on all archs - the test now fails for me
with:
tc-end: 1363272941.977413, noisyeject, failed, Test case was expected to receive a termination signal but it continued execution
Martin
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/47646: Test dev/scsipi/t_cd:noisyeject broken
Date: Thu, 14 Mar 2013 22:34:33 +0100
It dies in:
#0 0x00007f7ff74185b2 in mmc_getdiscinfo (periph=0x7f7ff6301f08,
mmc_discinfo=0x7f7fffffd0b0)
at /usr/src/sys/rump/dev/lib/libscsipi/../../../../dev/scsipi/cd.c:3056
#1 0x00007f7ff741515d in cdioctl (dev=1539, cmd=1081631516,
addr=0x7f7fffffd0b0, flag=-2147483648, l=0x7f7ff7b4b000)
at /usr/src/sys/rump/dev/lib/libscsipi/../../../../dev/scsipi/cd.c:1624
#2 0x00007f7ff64666d7 in rumpns_bdev_ioctl () from /usr/lib/librump.so.0
#3 0x00007f7ff70055a1 in rumpns_readdisklabel ()
from /usr/lib/librumpdev_disk.so.0
#4 0x00007f7ff741566a in cdgetdisklabel (cd=0x7f7ff6b26300)
at /usr/src/sys/rump/dev/lib/libscsipi/../../../../dev/scsipi/cd.c:1765
especially:
3051 fpos = &gc->feature_desc[0];
3052 while (pos < features_len - 4) {
3053 gcf = (struct scsipi_get_conf_feature *) fpos;
and both pos and features_len seem to be way out of bounds:
(gdb) print features_len
$1 = 10547445
(gdb) print pos
$2 = 1087470
Martin
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: Manuel Bouyer <bouyer@antioche.eu.org>
Subject: Re: kern/47646: Test dev/scsipi/t_cd:noisyeject broken
Date: Fri, 15 Mar 2013 11:03:53 +0100
--gBBFr7Ir9EOA20Yy
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
And this patch works around it - looks like somewhere on the way an
error return value from cdstrategy is lost. With this change, the
feature_len comes up as 0 and we break out of the loop imediately.
Manuel?
Martin
--gBBFr7Ir9EOA20Yy
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename=patch
Index: cd.c
===================================================================
RCS file: /cvsroot/src/sys/dev/scsipi/cd.c,v
retrieving revision 1.309
diff -u -r1.309 cd.c
--- cd.c 6 May 2012 17:23:10 -0000 1.309
+++ cd.c 15 Mar 2013 10:00:32 -0000
@@ -3030,6 +3030,7 @@
/* determine mmc profile and class */
flags = XS_CTL_DATA_IN;
memset(&gc_cmd, 0, sizeof(gc_cmd));
+ _lto4b(0, gc->data_len);
gc_cmd.opcode = GET_CONFIGURATION;
_lto2b(last_feature, gc_cmd.start_at_feature);
_lto2b(feat_tbl_len, gc_cmd.data_len);
--gBBFr7Ir9EOA20Yy--
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: Manuel Bouyer <bouyer@antioche.eu.org>, Antti Kantee <pooka@NetBSD.org>
Subject: Re: kern/47646: Test dev/scsipi/t_cd:noisyeject broken
Date: Fri, 15 Mar 2013 11:32:16 +0100
Actually there is no error involved, the value comes from here:
182 case SCSI_READ_6_COMMAND: {
183 #ifdef USE_TOSI_ISO
184 struct scsi_rw_6 *param = (void *)cmd;
185
186 printf("reading %d bytes from %d\n",
187 param->length * CDBLOCKSIZE,
188 _3btol(param->addr) * CDBLOCKSIZE);
189 rumpuser_pread(isofd, xs->data,
190 param->length * CDBLOCKSIZE,
191 _3btol(param->addr) * CDBLOCKSIZE,
192 &error);
193 #endif
194
195 break;
with this backtrace:
#0 scsitest_request (chan=0x7f7ff6b26480, req=ADAPTER_REQ_RUN_XFER,
arg=0x7f7ff7be2f00) at /usr/src/sys/rump/dev/lib/libscsitest/scsitest.c:205
#1 0x00007f7ff741fda5 in scsipi_run_queue (chan=0x7f7ff6b26480)
at /usr/src/sys/rump/dev/lib/libscsipi/../../../../dev/scsipi/scsipi_base.c:1845
#2 0x00007f7ff742010c in scsipi_execute_xs (xs=0x7f7ff7be2f00)
at /usr/src/sys/rump/dev/lib/libscsipi/../../../../dev/scsipi/scsipi_base.c:1970
#3 0x00007f7ff7420f3b in scsipi_command (periph=0x7f7ff6301f08,
cmd=0x7f7fffffcae0, cmdlen=10, data_addr=0x7f7ff630e808 "", datalen=1024,
retries=4, timeout=30000, bp=0x0, flags=4096)
at /usr/src/sys/rump/dev/lib/libscsipi/../../../../dev/scsipi/scsipiconf.c:101
#4 0x00007f7ff741854f in mmc_getdiscinfo (periph=0x7f7ff6301f08,
mmc_discinfo=0x7f7fffffd080)
at /usr/src/sys/rump/dev/lib/libscsipi/../../../../dev/scsipi/cd.c:3038
and the xs is:
$8 = {channel_q = {tqe_next = 0x0, tqe_prev = 0x7f7ff6b26548}, device_q = {
tqe_next = 0x0, tqe_prev = 0x7f7ff6301f90}, xs_callout = {_c_store = {0x0,
0x0, 0x0, 0x0, 0x7f7ff66c68a0, 0x100000000, 0x11deeba1, 0x0, 0x0, 0x0}},
xs_control = 4096, xs_status = 0, xs_periph = 0x7f7ff6301f08,
xs_retries = 4, xs_requeuecnt = 0, timeout = 30000, cmd = 0x7f7ff7be2fec,
cmdlen = 10, data = 0x7f7ff630e808 "", datalen = 1024, resid = 1024,
error = XS_NOERROR, bp = 0x0, sense = {scsi_sense = {
...
So the (simulated) device, when told to read 1024 bytes, reads 0 bytes
and signals no error - is that OK?
Martin
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: Antti Kantee <pooka@NetBSD.org>, Manuel Bouyer <bouyer@NetBSD.org>
Subject: Re: kern/47646: Test dev/scsipi/t_cd:noisyeject broken
Date: Fri, 15 Mar 2013 12:08:53 +0100
--pf9I7BMVVzbSWLtt
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Is there some easy way (easier than allocating a temporary buf) to get
the residual here? If so we should test that and make sure we got enough
of an answer to be meaningfull.
I propose the attached patch to fix this the issue in two, unrelated ways.
Martin
--pf9I7BMVVzbSWLtt
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename=patch
Index: cd.c
===================================================================
RCS file: /cvsroot/src/sys/dev/scsipi/cd.c,v
retrieving revision 1.309
diff -u -r1.309 cd.c
--- cd.c 6 May 2012 17:23:10 -0000 1.309
+++ cd.c 15 Mar 2013 11:03:17 -0000
@@ -3033,6 +3033,7 @@
gc_cmd.opcode = GET_CONFIGURATION;
_lto2b(last_feature, gc_cmd.start_at_feature);
_lto2b(feat_tbl_len, gc_cmd.data_len);
+ memset(gc, 0, feat_tbl_len);
error = scsipi_command(periph,
(void *)&gc_cmd, sizeof(gc_cmd),
@@ -3044,7 +3045,7 @@
}
features_len = _4btol(gc->data_len);
- if (features_len < 4)
+ if (features_len < 4 || features_len > feat_tbl_len)
break;
pos = 0;
--pf9I7BMVVzbSWLtt--
From: Manuel Bouyer <bouyer@antioche.eu.org>
To: Martin Husemann <martin@duskware.de>
Cc: gnats-bugs@NetBSD.org, Antti Kantee <pooka@NetBSD.org>
Subject: Re: kern/47646: Test dev/scsipi/t_cd:noisyeject broken
Date: Fri, 15 Mar 2013 15:51:29 +0100
On Fri, Mar 15, 2013 at 11:32:16AM +0100, Martin Husemann wrote:
> [...]
>
> So the (simulated) device, when told to read 1024 bytes, reads 0 bytes
> and signals no error - is that OK?
it's not, it should at last return the feature header. See
http://www.google.fr/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&ved=0CDoQFjAA&url=ftp%3A%2F%2Fftp.t10.org%2Ft10%2Fdocument.97%2F97-263r0.pdf&ei=LDNDUcbmF8eQhQeG84DgCA&usg=AFQjCNEFP6HJkVddmEJfQ1fCNHPYRTnonA&bvm=bv.43828540,d.ZG4
(document T10/97-263r0)
I guess the feature header could have a 0 data length, which would break
out of the loop immediatly.
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 ans d'experience feront toujours la difference
--
From: Manuel Bouyer <bouyer@antioche.eu.org>
To: Martin Husemann <martin@duskware.de>
Cc: gnats-bugs@NetBSD.org, Antti Kantee <pooka@NetBSD.org>
Subject: Re: kern/47646: Test dev/scsipi/t_cd:noisyeject broken
Date: Fri, 15 Mar 2013 15:54:03 +0100
On Fri, Mar 15, 2013 at 12:08:53PM +0100, Martin Husemann wrote:
> Is there some easy way (easier than allocating a temporary buf) to get
> the residual here? If so we should test that and make sure we got enough
> of an answer to be meaningfull.
>
> I propose the attached patch to fix this the issue in two, unrelated ways.
I think the problem comes from the emulator, which doesn't return the
mandatory feature header. But zeroing out the buffer before the
command, and sanity-checking the result is probably good to have anyway.
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 ans d'experience feront toujours la difference
--
From: Martin Husemann <martin@duskware.de>
To: Manuel Bouyer <bouyer@antioche.eu.org>
Cc: gnats-bugs@NetBSD.org, Antti Kantee <pooka@NetBSD.org>
Subject: Re: kern/47646: Test dev/scsipi/t_cd:noisyeject broken
Date: Fri, 15 Mar 2013 16:43:04 +0100
--+g7M9IMkV8truYOl
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Ok, so additionally to the last patch do this?
Martin
--+g7M9IMkV8truYOl
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename=patch
Index: scsitest.c
===================================================================
RCS file: /cvsroot/src/sys/rump/dev/lib/libscsitest/scsitest.c,v
retrieving revision 1.5
diff -u -r1.5 scsitest.c
--- scsitest.c 27 Oct 2012 17:18:40 -0000 1.5
+++ scsitest.c 15 Mar 2013 15:42:15 -0000
@@ -176,7 +176,7 @@
break;
}
case GET_CONFIGURATION: {
-
+ memset(xs->data, 0, sizeof(struct scsipi_get_conf_data));
break;
}
case SCSI_READ_6_COMMAND: {
--+g7M9IMkV8truYOl--
From: Manuel Bouyer <bouyer@antioche.eu.org>
To: Martin Husemann <martin@duskware.de>
Cc: gnats-bugs@NetBSD.org, Antti Kantee <pooka@NetBSD.org>
Subject: Re: kern/47646: Test dev/scsipi/t_cd:noisyeject broken
Date: Fri, 15 Mar 2013 17:04:36 +0100
On Fri, Mar 15, 2013 at 04:43:04PM +0100, Martin Husemann wrote:
> Ok, so additionally to the last patch do this?
Yes, this should fix the root cause of the problem.
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 ans d'experience feront toujours la difference
--
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/47646 CVS commit: src/sys/rump/dev/lib/libscsitest
Date: Fri, 15 Mar 2013 16:14:13 +0000
Module Name: src
Committed By: martin
Date: Fri Mar 15 16:14:13 UTC 2013
Modified Files:
src/sys/rump/dev/lib/libscsitest: scsitest.c
Log Message:
For the MMC GET_CONFIGURATION command, return a nulled feature header.
Fixes PR kern/47646.
To generate a diff of this commit:
cvs rdiff -u -r1.5 -r1.6 src/sys/rump/dev/lib/libscsitest/scsitest.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: open->closed
State-Changed-By: martin@NetBSD.org
State-Changed-When: Fri, 15 Mar 2013 16:19:36 +0000
State-Changed-Why:
fixed
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/47646 CVS commit: src/sys/dev/scsipi
Date: Fri, 15 Mar 2013 16:16:12 +0000
Module Name: src
Committed By: martin
Date: Fri Mar 15 16:16:12 UTC 2013
Modified Files:
src/sys/dev/scsipi: cd.c
Log Message:
Prompted by PR kern/47646, zero the value buffer before running the
GET_CONFIGURATION command and check for an unexpectedly large feature
length answer afterwards.
To generate a diff of this commit:
cvs rdiff -u -r1.309 -r1.310 src/sys/dev/scsipi/cd.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
>Unformatted:
(Contact us)
$NetBSD: query-full-pr,v 1.39 2013/11/01 18:47:49 spz Exp $
$NetBSD: gnats_config.sh,v 1.8 2006/05/07 09:23:38 tsutsui Exp $
Copyright © 1994-2007
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.