NetBSD Problem Report #53280

From gson@gson.org  Sat May 12 14:49:57 2018
Return-Path: <gson@gson.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 696A67A1CE
	for <gnats-bugs@gnats.NetBSD.org>; Sat, 12 May 2018 14:49:57 +0000 (UTC)
Message-Id: <20180512144951.8556E9899B9@guava.gson.org>
Date: Sat, 12 May 2018 17:49:51 +0300 (EEST)
From: gson@gson.org (Andreas Gustafsson)
Reply-To: gson@gson.org (Andreas Gustafsson)
To: gnats-bugs@NetBSD.org
Subject: amd64 panics since recent compat/netbsd32 commits
X-Send-Pr-Version: 3.95

>Number:         53280
>Category:       kern
>Synopsis:       amd64 panics since recent compat/netbsd32 commits
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    christos
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat May 12 14:50:00 +0000 2018
>Closed-Date:    Sun May 13 15:43:46 +0000 2018
>Last-Modified:  Mon May 14 13:15:01 +0000 2018
>Originator:     Andreas Gustafsson
>Release:        NetBSD-current, source date >= 2018.05.10.02.36.26
>Organization:
>Environment:
System: NetBSD guava.gson.org 7.1.1 NetBSD 7.1.1 (GENERIC.201712222334Z) amd64
Architecture: x86_64
Machine: amd64
>Description:

Running the lib/libbpfjit/t_bpfjit test, NetBSD/amd64 panics with:

    libbpfjit_opt_ld_abs_3: [0.009291s] Passed.
    libbpfj[ 893.0457576] panic: mbuf 0xffffe407d8f81800 already freed
[ 893.1124434] cpu0: Begin traceback...
[ 893.1572333] vpanic() at netbsd:vpanic+0x16f
i[ 893.2072275] snprintf() at netbsd:snprintf
t[ 893.2561867] m_cat() at netbsd:m_cat
[ 893.2968113] m_freem() at netbsd:m_freem+0xe
_[ 893.3468085] unp_send() at netbsd:unp_send+0x9b
[ 893.4009719] sosend() at netbsd:sosend+0x83b
[ 893.4509703] do_sys_sendmsg_so() at netbsd:do_sys_sendmsg_so+0x27d
[ 893.5259668] do_sys_sendmsg() at netbsd:do_sys_sendmsg+0x73
[ 893.5895060] netbsd32_sendmsg() at netbsd:netbsd32_sendmsg+0x7b
[ 893.6592946] netbsd32_syscall() at netbsd:netbsd32_syscall+-0x1e5b9e
[ 893.7363745] cpu0: End traceback...

Bisection shows the problem started with the following commits:

  2018.05.10.02.36.07 christos src/sys/compat/netbsd32/netbsd32.h 1.118
  2018.05.10.02.36.07 christos src/sys/compat/netbsd32/netbsd32_compat_20.c 1.37
  2018.05.10.02.36.07 christos src/sys/compat/netbsd32/netbsd32_conv.h 1.35
  2018.05.10.02.36.07 christos src/sys/compat/netbsd32/netbsd32_socket.c 1.46
  2018.05.10.02.36.07 christos src/sys/compat/netbsd32/syscalls.master 1.121
  2018.05.10.02.36.26 christos src/sys/compat/netbsd32/netbsd32_syscall.h 1.135
  2018.05.10.02.36.26 christos src/sys/compat/netbsd32/netbsd32_syscallargs.h 1.135
  2018.05.10.02.36.26 christos src/sys/compat/netbsd32/netbsd32_syscalls.c 1.133
  2018.05.10.02.36.26 christos src/sys/compat/netbsd32/netbsd32_syscalls_autoload.c 1.14
  2018.05.10.02.36.26 christos src/sys/compat/netbsd32/netbsd32_sysent.c 1.133
  2018.05.10.02.36.26 christos src/sys/compat/netbsd32/netbsd32_systrace_args.c 1.25

>How-To-Repeat:

Run the ATF tests.

>Fix:

>Release-Note:

>Audit-Trail:

Responsible-Changed-From-To: kern-bug-people->christos
Responsible-Changed-By: gson@NetBSD.org
Responsible-Changed-When: Sat, 12 May 2018 14:53:40 +0000
Responsible-Changed-Why:
Over to committer.


From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org, alnsn@NetBSD.org
Cc: 
Subject: Re: kern/53280: amd64 panics since recent compat/netbsd32 commits
Date: Sat, 12 May 2018 18:59:29 +0200

 There is no good reason anything in here uses netbsd32_* at all,
 or am I missing something?

 Martin

From: christos@zoulas.com (Christos Zoulas)
To: gnats-bugs@NetBSD.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, 
	gson@gson.org (Andreas Gustafsson)
Cc: 
Subject: Re: kern/53280: amd64 panics since recent compat/netbsd32 commits
Date: Sat, 12 May 2018 13:32:07 -0400

 On May 12,  5:00pm, martin@duskware.de (Martin Husemann) wrote:
 -- Subject: Re: kern/53280: amd64 panics since recent compat/netbsd32 commits

 |  There is no good reason anything in here uses netbsd32_* at all,
 |  or am I missing something?

 Are those i386 tests running on amd64? I can't reproduce this with either!

 christos

From: Andreas Gustafsson <gson@gson.org>
To: christos@zoulas.com (Christos Zoulas), gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/53280: amd64 panics since recent compat/netbsd32 commits
Date: Sat, 12 May 2018 20:55:45 +0300

 Christos Zoulas wrote:
 > Are those i386 tests running on amd64?

 This is the standard ATF test suite running on amd64.

 The backtrace in the PR is from my own testbed, which runs on
 real hardware.  Full log:

   http://www.gson.org/netbsd/bugs/build/amd64-baremetal/2018/2018.05.10.02.36.26/test.log

 There are also panics under qemu on b5, but triggered by different tests:

   http://releng.netbsd.org/b5reports/amd64/2018/2018.05.10.05.08.53/test.log
   http://releng.netbsd.org/b5reports/amd64/2018/2018.05.11.00.00.17/test.log

 > I can't reproduce this with either!

 Then revert your commits until you can.  I will accept no excuses
 this time.
 -- 
 Andreas Gustafsson, gson@gson.org

From: christos@zoulas.com (Christos Zoulas)
To: Andreas Gustafsson <gson@gson.org>, gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/53280: amd64 panics since recent compat/netbsd32 commits
Date: Sat, 12 May 2018 13:59:08 -0400

 On May 12,  8:55pm, gson@gson.org (Andreas Gustafsson) wrote:
 -- Subject: Re: kern/53280: amd64 panics since recent compat/netbsd32 commits

 | The backtrace in the PR is from my own testbed, which runs on
 | real hardware.  Full log:
 | 
 |   http://www.gson.org/netbsd/bugs/build/amd64-baremetal/2018/2018.05.10.02.36.26/test.log
 | 
 | There are also panics under qemu on b5, but triggered by different tests:
 | 
 |   http://releng.netbsd.org/b5reports/amd64/2018/2018.05.10.05.08.53/test.log
 |   http://releng.netbsd.org/b5reports/amd64/2018/2018.05.11.00.00.17/test.log
 | 
 | > I can't reproduce this with either!
 | 
 | Then revert your commits until you can.  I will accept no excuses
 | this time.

 I am asking you to explain how to reproduce so I can fix it...
 I don't understand how an amd64 test can call into compat-netbsd32,
 unless it is an i386 binary running on amd64. Please explain!

 christos

From: Andreas Gustafsson <gson@gson.org>
To: christos@zoulas.com (Christos Zoulas), gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/53280: amd64 panics since recent compat/netbsd32 commits
Date: Sat, 12 May 2018 21:15:33 +0300

 Christos Zoulas wrote:
 > I am asking you to explain how to reproduce so I can fix it...
 > I don't understand how an amd64 test can call into compat-netbsd32,
 > unless it is an i386 binary running on amd64. Please explain!

 I don't think you heard me.  If you don't understand what your commit
 did, the right thing to do is to revert it.  And this time, I'm not
 just asking you to do the right thing, I'm insisting on it.
 --
 Andreas Gustafsson, gson@gson.org

From: christos@zoulas.com (Christos Zoulas)
To: Andreas Gustafsson <gson@gson.org>, gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/53280: amd64 panics since recent compat/netbsd32 commits
Date: Sat, 12 May 2018 15:05:59 -0400

 On May 12,  9:15pm, gson@gson.org (Andreas Gustafsson) wrote:
 -- Subject: Re: kern/53280: amd64 panics since recent compat/netbsd32 commits

 | I don't think you heard me.  If you don't understand what your commit
 | did, the right thing to do is to revert it.  And this time, I'm not
 | just asking you to do the right thing, I'm insisting on it.

 I can't possibly "hear" you, but I've "read" you. You are running
 the tests and while your work is appreciated, without being able
 to reproduce your environment, I can't fix the issue. I am simply
 asking how it is possible to execute compat-netbsd32 code from a
 64 bit binary!  I understand you are upset about the go issue, but
 keep things technical, and try to be helpful. I added the code
 because nsd needs send and recv mmsg which were missing. So removing
 the code breaks nsd which is not a test... In addition, I have more
 tests that I have not committed because the code is broken in
 different ways (the code before I committed my stuff). I can revert
 the new changes and commit the new tests and then you can have
 different broken tests :-) This is what tests are all about anyway:
 to detect broken code...

 christos

From: Andreas Gustafsson <gson@gson.org>
To: christos@zoulas.com (Christos Zoulas), gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/53280: amd64 panics since recent compat/netbsd32 commits
Date: Sat, 12 May 2018 23:08:21 +0300

 Christos Zoulas wrote:
 > I can't possibly "hear" you, but I've "read" you. You are running
 > the tests and while your work is appreciated, without being able
 > to reproduce your environment, I can't fix the issue.

 Reverting your commit will fix the panic.  Then you can fix the issue
 without everyone having to suffer from the panic while you do it.

 > I am simply asking how it is possible to execute compat-netbsd32
 > code from a 64 bit binary!

 I have no idea, and I don't think finding out should be my
 responsibility.

 > I added the code because nsd needs send and recv mmsg which were
 > missing. So removing the code breaks nsd which is not a test...

 Fixing nsd does not justify making the system panic.

 > In addition, I have more
 > tests that I have not committed because the code is broken in
 > different ways (the code before I committed my stuff). I can revert
 > the new changes and commit the new tests and then you can have
 > different broken tests :-) This is what tests are all about anyway:
 > to detect broken code...

 Yes, that is how it should work.  A few broken tests is vastly better
 than having no test results at all because the system panicked before
 the test run completed.
 -- 
 Andreas Gustafsson, gson@gson.org

From: christos@zoulas.com (Christos Zoulas)
To: gnats-bugs@NetBSD.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, 
	gson@gson.org (Andreas Gustafsson)
Cc: 
Subject: Re: kern/53280: amd64 panics since recent compat/netbsd32 commits
Date: Sat, 12 May 2018 18:11:53 -0400

 On May 12,  8:10pm, gson@gson.org (Andreas Gustafsson) wrote:
 -- Subject: Re: kern/53280: amd64 panics since recent compat/netbsd32 commits

 |  I have no idea, and I don't think finding out should be my
 |  responsibility.

 I just want to know at least if this is a 32 bit binary causing this...
 |
 |  > I added the code because nsd needs send and recv mmsg which were
 |  > missing. So removing the code breaks nsd which is not a test...
 |  
 |  Fixing nsd does not justify making the system panic.

 Yes.

 |  > In addition, I have more
 |  > tests that I have not committed because the code is broken in
 |  > different ways (the code before I committed my stuff). I can revert
 |  > the new changes and commit the new tests and then you can have
 |  > different broken tests :-) This is what tests are all about anyway:
 |  > to detect broken code...
 |  
 |  Yes, that is how it should work.  A few broken tests is vastly better
 |  than having no test results at all because the system panicked before
 |  the test run completed.

 I agree, but I could fix it as quickly as reverting if I understood
 what breaks it...

 christos

From: "Christos Zoulas" <christos@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/53280 CVS commit: src/sys/compat/netbsd32
Date: Sat, 12 May 2018 20:04:23 -0400

 Module Name:	src
 Committed By:	christos
 Date:		Sun May 13 00:04:23 UTC 2018

 Modified Files:
 	src/sys/compat/netbsd32: netbsd32_socket.c

 Log Message:
 PR/53280: Andreas Gustafsson: Fix panic in the fdpass test. This is probably
 the only 32 bit binary in the tests...


 To generate a diff of this commit:
 cvs rdiff -u -r1.46 -r1.47 src/sys/compat/netbsd32/netbsd32_socket.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: christos@NetBSD.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
	Andreas Gustafsson <gson@gson.org>
Subject: Re: PR/53280 CVS commit: src/sys/compat/netbsd32
Date: Sun, 13 May 2018 12:34:34 +0200

 On Sun, May 13, 2018 at 12:05:00AM +0000, Christos Zoulas wrote:
 >  PR/53280: Andreas Gustafsson: Fix panic in the fdpass test. This is probably
 >  the only 32 bit binary in the tests...

 Which binary was it (and was 32bitness on purpose?)

 Martin

From: christos@zoulas.com (Christos Zoulas)
To: Martin Husemann <martin@duskware.de>, gnats-bugs@NetBSD.org
Cc: gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, 
	Andreas Gustafsson <gson@gson.org>
Subject: Re: PR/53280 CVS commit: src/sys/compat/netbsd32
Date: Sun, 13 May 2018 08:12:12 -0400

 On May 13, 12:34pm, martin@duskware.de (Martin Husemann) wrote:
 -- Subject: Re: PR/53280 CVS commit: src/sys/compat/netbsd32

 | Which binary was it (and was 32bitness on purpose?)

 src/tests/net/fdpass (fdpass32). Yes it is testing file descriptor passing
 between 32 <-> 32, 32 <-> 64.

 christos
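
For readers unfamiliar with descriptor passing: the backtrace in the report runs netbsd32_sendmsg() -> do_sys_sendmsg() -> sosend() -> unp_send(), which is the path a 32-bit process takes when it calls sendmsg(2) on a local-domain socket. The following is a minimal sketch of the standard SCM_RIGHTS mechanism, assuming that is what the fdpass test exercises. It is not taken from the test's source, and the helper names send_fd() and recv_fd() are hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer, aligned for struct cmsghdr (size is an arbitrary choice). */
union ctlbuf {
	struct cmsghdr hdr;
	char buf[256];
};

/* Send descriptor fd over the connected AF_UNIX socket sock; 0 or -1. */
int
send_fd(int sock, int fd)
{
	char byte = 'F';	/* at least one data byte must accompany it */
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union ctlbuf ctl;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	assert(sock >= 0 && fd >= 0);
	memset(&ctl, 0, sizeof(ctl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = CMSG_SPACE(sizeof(int));

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor from sock; returns the new descriptor or -1. */
int
recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union ctlbuf ctl;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd;

	assert(sock >= 0);
	memset(&ctl, 0, sizeof(ctl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;
	cmsg = CMSG_FIRSTHDR(&msg);
	if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len < CMSG_LEN(sizeof(int)))
		return -1;
	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

With a 32-bit sender on amd64, the msghdr and control message layouts differ from the native ones, so the compat layer has to convert them before handing off to the common code. That conversion is presumably where the ownership of the control mbuf went wrong, but the thread does not say.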

From: Alexander Nasonov <alnsn@yandex.ru>
To: Christos Zoulas <christos@zoulas.com>
Cc: Martin Husemann <martin@duskware.de>, gnats-bugs@NetBSD.org,
	gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
	Andreas Gustafsson <gson@gson.org>
Subject: Re: PR/53280 CVS commit: src/sys/compat/netbsd32
Date: Sun, 13 May 2018 13:45:56 +0100

 Christos Zoulas wrote:
 > On May 13, 12:34pm, martin@duskware.de (Martin Husemann) wrote:
 > -- Subject: Re: PR/53280 CVS commit: src/sys/compat/netbsd32
 > 
 > | Which binary was it (and was 32bitness on purpose?)
 > 
 > src/tests/net/fdpass (fdpass32). Yes it is testing file descriptor passing
 > between 32 <-> 32, 32 <-> 64.

 Is it related to t_bpfjit at all as originally reported? I didn't put any
 compat logic when I wrote that test.

 -- 
 Alex

From: christos@zoulas.com (Christos Zoulas)
To: Alexander Nasonov <alnsn@yandex.ru>
Cc: Martin Husemann <martin@duskware.de>, gnats-bugs@NetBSD.org, 
	gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, 
	Andreas Gustafsson <gson@gson.org>
Subject: Re: PR/53280 CVS commit: src/sys/compat/netbsd32
Date: Sun, 13 May 2018 09:12:41 -0400

 On May 13,  1:45pm, alnsn@yandex.ru (Alexander Nasonov) wrote:
 -- Subject: Re: PR/53280 CVS commit: src/sys/compat/netbsd32

 | Is it related to t_bpfjit at all as originally reported? I didn't put any
 | compat logic when I wrote that test.

 No, it is not related. This is why I was asking Andreas how he managed
 to get the bpfjit code to run in 32 bit mode on a 64 bit system. I could
 not understand how the bpfjit test could cause this panic.

 christos

From: Andreas Gustafsson <gson@gson.org>
To: gnats-bugs@NetBSD.org, christos@zoulas.com, alnsn@yandex.ru, martin@duskware.de
Cc: 
Subject: Re: PR/53280 CVS commit: src/sys/compat/netbsd32
Date: Sun, 13 May 2018 16:30:01 +0300

 Christos Zoulas wrote:
 >  No, it is not related. This is why I was asking Andreas how he managed
 >  to get the bpfjit code to run in 32 bit mode on a 64 bit system. I could
 >  not understand how the bpfjit test could cause this panic.

 Nor could I at the time, but I just came up with a possible
 explanation.  The bpfjit tests (and subsequent ones) may be generating
 console output fast enough that the 9600 bps serial console can't keep
 up, so it gets buffered in the tty driver, and the serial port is
 still outputting buffered data from the bpfjit test when the fdpass
 test triggers the panic.
 -- 
 Andreas Gustafsson, gson@gson.org
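
The arithmetic behind this explanation can be sketched as follows. The function name serial_tx_usec() is hypothetical, and the framing of 10 bits per byte (8N1: one start bit, eight data bits, one stop bit) is an assumption; the thread only states the 9600 bps rate.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Microseconds needed to push nbytes through a serial line running at
 * bps bits per second, with bits_per_char bits on the wire per byte
 * (10 for the assumed 8N1 framing).
 */
uint64_t
serial_tx_usec(uint64_t nbytes, uint32_t bps, uint32_t bits_per_char)
{
	assert(bps > 0);
	return nbytes * bits_per_char * UINT64_C(1000000) / bps;
}
```

Under those assumptions, a ticker line of about 45 bytes (such as the libbpfjit_opt_ld_abs_3 line in the report, with its line ending) takes serial_tx_usec(45, 9600, 10) = 46875 microseconds to transmit, while the test case it describes ran for only 9291 microseconds. A long run of such short test cases would therefore build up a backlog in the tty driver.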

From: christos@zoulas.com (Christos Zoulas)
To: gnats-bugs@NetBSD.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, 
	gson@gson.org (Andreas Gustafsson)
Cc: 
Subject: Re: PR/53280 CVS commit: src/sys/compat/netbsd32
Date: Sun, 13 May 2018 09:37:08 -0400

 On May 13,  1:35pm, gson@gson.org (Andreas Gustafsson) wrote:
 -- Subject: Re: PR/53280 CVS commit: src/sys/compat/netbsd32

 |  Nor could I at the time, but I just came up with a possible
 |  explanation.  The bpfjit tests (and subsequent ones) may be generating
 |  console output fast enough that the 9600 bps serial console can't keep
 |  up, so it gets buffered in the tty driver, and the serial port is
 |  still outputting buffered data from the bpfjit test when the fdpass
 |  test triggers the panic.

 This is why we should strive to make tests print less information by default
 (in non-debugging mode). Silence (or near silence) is golden...

 christos

State-Changed-From-To: open->closed
State-Changed-By: gson@NetBSD.org
State-Changed-When: Sun, 13 May 2018 15:43:46 +0000
State-Changed-Why:
Confirmed fixed, thanks.


From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: PR/53280 CVS commit: src/sys/compat/netbsd32
Date: Mon, 14 May 2018 09:32:58 +0200

 On Sun, May 13, 2018 at 01:40:01PM +0000, Christos Zoulas wrote:
 >  This is why we should strive to make tests print less information by default
 >  (in non-debugging mode). Silence (or near silence) is golden...

 I understand what you mean, but often test failures are hard to reproduce
 elsewhere and it is good to have debugging output available from the
 initial failure.

 FWIW, in my test setups the test output itself is redirected and only the
 atf-report ticker output is visible.

 Martin

From: Andreas Gustafsson <gson@gson.org>
To: Martin Husemann <martin@duskware.de>, gnats-bugs@NetBSD.org
Cc: christos@NetBSD.org
Subject: Re: PR/53280 CVS commit: src/sys/compat/netbsd32
Date: Mon, 14 May 2018 11:09:24 +0300

 Martin Husemann wrote:
 >  FWIW, in my test setups the test output itself is redirected and only the
 >  atf-report ticker output is visible.

 My test setups are the same.

 For a moment I considered making atf-report do something like
 "if (isatty(fd)) ioctl(fd, TIOCDRAIN)" after each test program,
 but that won't help because there is still buffering in the pipe
 between atf-run and atf-report.
 --
 Andreas Gustafsson, gson@gson.org
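
A minimal sketch of the drain idea quoted above, for illustration only. It uses the POSIX tcdrain(3) call instead of the TIOCDRAIN ioctl so that it also builds on systems without that ioctl; on BSD systems the two should be equivalent, but that is an assumption here. The name drain_if_tty() is hypothetical.

```c
#include <assert.h>
#include <termios.h>
#include <unistd.h>

/*
 * If fd refers to a terminal, block until all output written to it has
 * been transmitted.  Returns 0 on success or when fd is not a terminal,
 * -1 if draining fails.
 */
int
drain_if_tty(int fd)
{
	assert(fd >= 0);
	if (!isatty(fd))
		return 0;	/* pipes and files: nothing to drain here */
	return tcdrain(fd);
}
```

As the message points out, this only empties the tty queue. Output still sitting in the pipe between atf-run and atf-report is not affected, so the sketch does not solve the attribution problem on its own.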

From: Martin Husemann <martin@duskware.de>
To: Andreas Gustafsson <gson@gson.org>
Cc: gnats-bugs@NetBSD.org, christos@NetBSD.org
Subject: Re: PR/53280 CVS commit: src/sys/compat/netbsd32
Date: Mon, 14 May 2018 10:12:52 +0200

 On Mon, May 14, 2018 at 11:09:24AM +0300, Andreas Gustafsson wrote:
 > Martin Husemann wrote:
 > >  FWIW, in my test setups the test output itself is redirected and only the
 > >  atf-report ticker output is visible.
 > 
 > My test setups are the same.
 > 
 > For a moment I considered making atf-report do something like
 > "if (isatty(fd)) ioctl(fd, TIOCDRAIN)" after each test program,
 > but that won't help because there is still buffering in the pipe
 > between atf-run and atf-report.

 Right. But the main point is: making the test case itself less verbose
 will not help at all.

 Martin

From: christos@zoulas.com (Christos Zoulas)
To: gnats-bugs@NetBSD.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, 
	gson@gson.org (Andreas Gustafsson)
Cc: 
Subject: Re: PR/53280 CVS commit: src/sys/compat/netbsd32
Date: Mon, 14 May 2018 08:36:09 -0400

 On May 14,  7:35am, martin@duskware.de (Martin Husemann) wrote:
 -- Subject: Re: PR/53280 CVS commit: src/sys/compat/netbsd32

 |  I understand what you mean, but often test failures are hard to reproduce
 |  elsewhere and it is good to have debugging output available from the
 |  initial failure.

 Right, but there is a cost to printing every time. Perhaps have a verbose
 option that is uniform amongst tests so at least the test runner can choose.

 christos

From: Andreas Gustafsson <gson@gson.org>
To: christos@zoulas.com (Christos Zoulas), martin@duskware.de, gnats-bugs@NetBSD.org
Cc: 
Subject: Re: PR/53280 CVS commit: src/sys/compat/netbsd32
Date: Mon, 14 May 2018 16:14:11 +0300

 Christos Zoulas wrote:
 > |  I understand what you mean, but often test failures are hard to reproduce
 > |  elsewhere and it is good to have debugging output available from the
 > |  initial failure.
 > 
 > Right, but there is a cost to printing every time. Perhaps have a verbose
 > option that is uniform amongst tests so at least the test runner can choose.

 Reducing the amount of printing done by the test cases themselves
 wouldn't have helped here, because the output that got buffered did
 not actually contain anything printed *by* the test case, but was just
 the atf-report "ticker" output *listing* each test case, its execution
 time, and outcome, for example:

     libbpfjit_opt_ld_abs_3: [0.009291s] Passed.

 Adding a new atf-report output format with just one line per test
 program rather than one per test case would probably help, but I don't
 think it would be a good trade-off - I grep console logs for the
 outcomes of individual test cases on a regular basis, but the present
 problem of a kernel panic getting attributed to the wrong test has
 only happened once.
 -- 
 Andreas Gustafsson, gson@gson.org

>Unformatted:
