NetBSD Problem Report #55506
From kardel@Kardel.name Tue Jul 21 06:25:56 2020
Return-Path: <kardel@Kardel.name>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id AAA741A9213
for <gnats-bugs@gnats.NetBSD.org>; Tue, 21 Jul 2020 06:25:56 +0000 (UTC)
Message-Id: <20200721062553.224A144B3E@Andromeda.Kardel.name>
Date: Tue, 21 Jul 2020 08:25:53 +0200 (CEST)
From: kardel@netbsd.org
Reply-To: kardel@netbsd.org
To: gnats-bugs@NetBSD.org
Subject: gpioctl/mcp23s17gpio0/spi0 stalls on cv spixfr
X-Send-Pr-Version: 3.95
>Number: 55506
>Category: kern
>Synopsis: gpioctl/mcp23s17gpio0/spi0 stalls on cv spixfr
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: kern-bug-people
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Tue Jul 21 06:30:00 +0000 2020
>Closed-Date: Sun Aug 09 19:28:17 +0000 2020
>Last-Modified: Tue Aug 11 19:15:01 +0000 2020
>Originator: Frank Kardel
>Release: NetBSD 9.99.69
>Organization:
>Environment:
System: NetBSD assel 9.99.69 NetBSD 9.99.69 (ASSEL) #1: Mon Jul 20 14:14:32 CEST 2020 kardel@Andromeda:/src/NetBSD/cur/src/obj.evbarm/sys/arch/evbarm/compile/ASSEL evbarm
Architecture: earmv7hf
Machine: evbarm
>Description:
This issue has been observed since NetBSD 8.99.xx
The board is a Raspberry Pi 2 Model B Rev 1.1
A program polling/setting gpio pins on a piface2 board (mcp23s17) gets stuck
on the cv spixfr after some hours (on 8.99.xx it was after some days).
In parallel to the gpio polling program the munin monitoring system executes
"gpioctl <device> <PIN-name>" commands for all pins every 5 minutes.
These gpioctl programs also get stuck (interruptible) at the gpio layer.
The driver is waiting in spi_wait and it cannot be interrupted and does not
time out. It seems that the state machine in spi.c/bcm2835_spi.c never
reaches the (st->st_flags & SPI_F_DONE) != 0 state that spi_wait waits for.
Are we missing an interrupt, or do we have another issue such as a
state handling botch or a bug in the usage of the spi API in mcp23s17.c?
Reducing the clock from 10 MHz to 1 MHz didn't help. Looking at the Broadcom
SPI programming notes did not show any obvious errors in the driver.
The ps output and stack trace are:
UID PID PPID CPU PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND
80 659 1 0 95 0 8096 1464 spixfr DXs ? 0:38.37 /usr/pkg/sbin/gpiomon -S /tmp/gpiomon -l
crash> bt/t 0t659
trace: pid 659 lid 1 at 0xba8f1b4c
0xba8f1b4c: mi_switch+0xc
0xba8f1b74: sleepq_block+0xb0
0xba8f1b9c: cv_wait+0xa0
0xba8f1bbc: spi_wait+0x3c
0xba8f1c5c: spi_send_recv+0xd4
0xba8f1c84: mcp23s17gpio_read+0x4c
0xba8f1c9c: mcp23s17gpio_gpio_pin_read+0x2c
0xba8f1cf4: gpioioctl+0x1e8
0xba8f1d1c: spec_ioctl+0xa8
0xba8f1d4c: VOP_IOCTL+0x4c
0xba8f1e24: vn_ioctl+0xc0
0xba8f1eec: sys_ioctl+0x420
0xba8f1fac: syscall+0x12c
crash>
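For illustration only, the wait that the trace shows blocked in spi_wait can
be modelled in user space. This is a minimal sketch using POSIX threads, not
the code in spi.c or bcm2835_spi.c: struct xfer, xfer_complete, xfer_wait and
run_one_transfer are hypothetical names, and the "done" flag merely stands in
for SPI_F_DONE. It shows the pattern where the flag is set and tested under
the same mutex, so a completion that arrives between the test and the sleep
cannot be lost.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for one transfer's completion state. */
struct xfer {
	pthread_mutex_t lock;	/* protects done */
	pthread_cond_t cv;	/* signalled when done becomes true */
	bool done;		/* plays the role of SPI_F_DONE */
};

static void
xfer_init(struct xfer *x)
{
	pthread_mutex_init(&x->lock, NULL);
	pthread_cond_init(&x->cv, NULL);
	x->done = false;
}

/* Completion side (the interrupt handler in the real driver). */
static void
xfer_complete(struct xfer *x)
{
	pthread_mutex_lock(&x->lock);
	x->done = true;
	pthread_cond_broadcast(&x->cv);
	pthread_mutex_unlock(&x->lock);
}

/* Waiting side: test the flag and sleep while holding the same lock. */
static void
xfer_wait(struct xfer *x)
{
	pthread_mutex_lock(&x->lock);
	while (!x->done)
		pthread_cond_wait(&x->cv, &x->lock);
	assert(x->done);
	pthread_mutex_unlock(&x->lock);
}

static void *
completer(void *arg)
{
	xfer_complete(arg);
	return NULL;
}

/* Run one simulated transfer; returns 1 if the waiter saw completion. */
static int
run_one_transfer(void)
{
	struct xfer x;
	pthread_t t;
	int ok;

	xfer_init(&x);
	if (pthread_create(&t, NULL, completer, &x) != 0)
		return 0;
	xfer_wait(&x);
	pthread_join(t, NULL);
	ok = x.done ? 1 : 0;
	pthread_cond_destroy(&x.cv);
	pthread_mutex_destroy(&x.lock);
	return ok;
}
```

If the flag were tested without the lock held, the completer could set it and
signal before the waiter went to sleep, and the waiter would then block
forever, which resembles the symptom reported here.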
>How-To-Repeat:
Mount a piface2 board on a Raspberry Pi 2 Model B Rev 1.1. Run NetBSD >= 9.0 with
a gpio polling program getting and setting gpio values and a parallel program fetching
gpio values via gpioctl.
>Fix:
find the state handling issue?
workaround: add an emergency timeout after waiting a while?
>Release-Note:
>Audit-Trail:
From: "Frank Kardel" <kardel@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/55506 CVS commit: src/sys
Date: Tue, 4 Aug 2020 13:20:45 +0000
Module Name: src
Committed By: kardel
Date: Tue Aug 4 13:20:45 UTC 2020
Modified Files:
src/sys/arch/arm/broadcom: bcm2835_spi.c
src/sys/dev/spi: spi.c spivar.h
Log Message:
Use mutex for lwp/interrupt coordination. using splX() simply does not work
on multiprocessor systems.
fixes PR kern/55506
To generate a diff of this commit:
cvs rdiff -u -r1.6 -r1.7 src/sys/arch/arm/broadcom/bcm2835_spi.c
cvs rdiff -u -r1.14 -r1.15 src/sys/dev/spi/spi.c
cvs rdiff -u -r1.9 -r1.10 src/sys/dev/spi/spivar.h
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: open->closed
State-Changed-By: kardel@NetBSD.org
State-Changed-When: Sun, 09 Aug 2020 19:28:17 +0000
State-Changed-Why:
fixed and pulled up to -9
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/55506 CVS commit: [netbsd-9] src/sys
Date: Tue, 11 Aug 2020 19:13:43 +0000
Module Name: src
Committed By: martin
Date: Tue Aug 11 19:13:43 UTC 2020
Modified Files:
src/sys/arch/arm/broadcom [netbsd-9]: bcm2835_spi.c
src/sys/dev/spi [netbsd-9]: spi.c spivar.h
Log Message:
Pull up following revision(s) (requested by 1043):
sys/dev/spi/spivar.h: revision 1.10
sys/arch/arm/broadcom/bcm2835_spi.c: revision 1.7
sys/dev/spi/spi.c: revision 1.15
Use mutex for lwp/interrupt coordination. using splX() simply does not work
on multiprocessor systems.
fixes PR kern/55506
To generate a diff of this commit:
cvs rdiff -u -r1.5 -r1.5.8.1 src/sys/arch/arm/broadcom/bcm2835_spi.c
cvs rdiff -u -r1.11 -r1.11.4.1 src/sys/dev/spi/spi.c
cvs rdiff -u -r1.7 -r1.7.4.1 src/sys/dev/spi/spivar.h
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
>Unformatted:
Copyright © 1994-2020
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.