NetBSD Problem Report #39526
From Wolfgang.Stukenbrock@nagler-company.com Fri Sep 12 13:47:01 2008
Return-Path: <Wolfgang.Stukenbrock@nagler-company.com>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
by narn.NetBSD.org (Postfix) with ESMTP id A7DEC63B92A
for <gnats-bugs@gnats.NetBSD.org>; Fri, 12 Sep 2008 13:47:01 +0000 (UTC)
Message-Id: <20080912134653.8F1D84EAA0D@s012.nagler-company.com>
Date: Fri, 12 Sep 2008 15:46:53 +0200 (CEST)
From: Wolfgang.Stukenbrock@nagler-company.com
Reply-To: Wolfgang.Stukenbrock@nagler-company.com
To: gnats-bugs@gnats.NetBSD.org
Subject: ahd driver crashes system if it runs out of memory
X-Send-Pr-Version: 3.95
>Number: 39526
>Category: kern
>Synopsis: ahd driver crashes system if it runs out of memory
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Fri Sep 12 13:50:00 +0000 2008
>Last-Modified: Sun Feb 15 04:31:13 +0000 2009
>Originator: Wolfgang Stukenbrock
>Release: NetBSD 4.0
>Organization:
Dr. Nagler & Company GmbH
>Environment:
System: NetBSD s012 4.0 NetBSD 4.0 (NSW-S012) #1: Thu Sep 11 12:21:03 CEST 2008 root@s012:/usr/src/sys/arch/amd64/compile/NSW-S012 amd64
Architecture: x86_64
Machine: amd64
>Description:
I'm running a 29320ALP-R in a PCI-X slot on an Intel motherboard with 8 GB RAM installed.
(Intel S3210SHLX with E3110 (with the 6MB-cache fix - see PR kern/39242))
If the system starts using memory beyond 4 GB, the ahd driver fails to allocate memory for some
structures. The following messages appear:
ahd0: failed to create DMA map for Sense Data structures, error = 12
ahd_createdmamem error (2)
ahd0: failed to create DMA map for SG data structures, error = 12
ahd_createdmamem error (2)
ahd0: failed to create DMA map for hardware SCB structures, error = 12
ahd_createdmamem error (2)
I've added a debug print to the uvm_pglistalloc_simple routine in order to find the cause of the problem, and it
reports the following (an example - the number of free pages varies over time):
plistalloc - waiting orig num 1 - num 1 low 0x1000000 high 0x100000000 - free 162 pd_res 1 kres 5
This means that one page was requested and that one page has still not been found.
The page must lie between 0x1000000 and 0x100000000.
The vm-stat structure reports that 162 pages are still free, one is reserved for the pagedaemon and
5 pages are reserved for the kernel.
So physical memory is available, but it lies outside the requested range.
Therefore the error is returned to the ahd driver - it always fails in dma-mem-alloc().
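To illustrate the situation described above, here is a minimal sketch of a range-constrained page allocation. It is not the uvm code: struct free_page, alloc_in_range() and count_free() are hypothetical names, and treating the upper bound as exclusive is an assumption. It only shows how a request can fail although the global free count is non-zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified page table entry; not the real uvm structures. */
struct free_page {
	uint64_t paddr;		/* physical address of the page */
	int is_free;		/* non-zero if the page is currently free */
};

/*
 * Take the first free page with low <= paddr < high.
 * Returns its index, or -1 if no free page lies inside the window
 * (the caller would then see ENOMEM, i.e. "error = 12").
 */
static int
alloc_in_range(struct free_page *pages, size_t npages,
    uint64_t low, uint64_t high)
{
	size_t i;

	assert(low < high);
	for (i = 0; i < npages; i++) {
		if (pages[i].is_free &&
		    pages[i].paddr >= low && pages[i].paddr < high) {
			pages[i].is_free = 0;
			return (int)i;
		}
	}
	return -1;
}

/* Count free pages regardless of address, as a global statistic does. */
static size_t
count_free(const struct free_page *pages, size_t npages)
{
	size_t i, n = 0;

	for (i = 0; i < npages; i++)
		if (pages[i].is_free)
			n++;
	return n;
}
```

With free pages only below 0x1000000 or at and above 0x100000000, count_free() is non-zero while alloc_in_range() with the window from the debug output still returns -1.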
I've tried to enable 64-Bit access by setting AHD_64BIT_ADDRESSING but that does not help.
(I've set the flag if we found a PCI_CAP_PCIX capability as a first try.)
For unknown reasons AHD_64BIT_ADDRESSING is defined and used in several places, but the code
never sets it. Is this a bug/feature where setting the flag is simply missing somewhere, or is
the flag obsolete?
Without this flag there are many more memory allocation failures, so I think the code that sets the flag
was lost through some other change in the past.
Nevertheless, the crash described below still happens every time.
After the 6 error messages above have repeated a number of times, the following output appears on the console:
ahd0: failed to create DMA map for SG data structures, error = 12
ahd_createdmamem error (2)
ahd0: failed to create DMA map for Sense Data structures, error = 12
ahd_createdmamem error (2)
uvm_fault(0xffffffff80606ee0, 0xffff80009b797000, 2) -> e
kernel: page fault trap, code=0
Stopped in pid 9.1 (scsibus0) at netbsd:ahd_alloc_scbs+0x1ed: repe stosq %es:(%rdi)
The system is just trying to set up an additional request.
The number of times the 6 lines above repeat before the crash varies.
Another point that I don't understand is that even if 64-bit addressing is allowed, the
requested range is below 4GB. Is this a bug, or is it necessary that DMA memory is below 4GB even
with 64-bit addressing?
>How-To-Repeat:
Install an ahd controller (e.g. 29320ALP-R) in a system with 8 GB memory and put it under heavy load.
After some minutes you will see the crash.
>Fix:
Not known to me up to now - sorry.
Nothing I've tried so far solves the problem completely - I've had only partial success.
Setting AHD_64BIT_ADDRESSING in ahd_pci_attach() at line 403 in sys/dev/pci/ahd_pci.c, after
"if (!pci_get_capability(pa->pa_pc, pa->pa_tag, PCI_CAP_PCIX, &bd->pcix_off, NULL))" has succeeded,
removes many of the failed allocations for transfers, but the system still crashes after some failed
DMA memory allocation attempts.
>Release-Note:
>Audit-Trail:
From: Wolfgang Stukenbrock <Wolfgang.Stukenbrock@nagler-company.com>
To: gnats-bugs@NetBSD.org
Cc: Wolfgang.Stukenbrock@nagler-company.com
Subject: Re: kern/39526: ahd driver crashes system if it runs out of memory
Date: Thu, 25 Sep 2008 15:23:38 +0200
Hi,
after some additional testing, the following patch leads to a
situation where the controller can be used in a 64-bit system with 8GB
main memory without crashing the system shortly after the lower 4GB of
memory is used by processes.
It enables the already present 64-Bit memory access modes of the driver.
This will remove the usage of the upper nibble of the length field for
additional address bits and uses a real 64-Bit pointer.
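As a rough sketch of the difference between the two addressing modes: in the narrow mode the address bits above bit 31 travel inside the length word of a scatter/gather entry, while in the 64-bit mode the entry carries a full 64-bit address. All names and field widths here (the SK_ masks, sk_sg32, sk_sg64) are assumptions made for illustration and are not copied from the driver headers; 7 high bits are used so that 32 + 7 = 39.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative layout only; the masks are assumptions, not driver constants. */
#define SK_LEN_MASK		0x00FFFFFFu	/* transfer length bits */
#define SK_HIGH_ADDR_MASK	0x7F000000u	/* address bits 32..38 */
#define SK_HIGH_ADDR_SHIFT	24

struct sk_sg32 { uint32_t addr; uint32_t len; };	/* 39-bit style entry */
struct sk_sg64 { uint64_t addr; uint32_t len; };	/* 64-bit style entry */

/* Pack a physical address of at most 39 bits into a 32-bit entry. */
static struct sk_sg32
sk_pack39(uint64_t paddr, uint32_t len)
{
	struct sk_sg32 sg;

	assert((paddr >> 39) == 0);
	assert((len & ~SK_LEN_MASK) == 0);
	sg.addr = (uint32_t)(paddr & 0xFFFFFFFFu);
	sg.len = len |
	    (((uint32_t)(paddr >> 32) << SK_HIGH_ADDR_SHIFT) & SK_HIGH_ADDR_MASK);
	return sg;
}

/* Recover the physical address from a packed 32-bit entry. */
static uint64_t
sk_unpack39(struct sk_sg32 sg)
{
	uint64_t high = (sg.len & SK_HIGH_ADDR_MASK) >> SK_HIGH_ADDR_SHIFT;

	return (high << 32) | sg.addr;
}

/* With a real 64-bit pointer the length word carries only the length. */
static struct sk_sg64
sk_pack64(uint64_t paddr, uint32_t len)
{
	struct sk_sg64 sg;

	sg.addr = paddr;
	sg.len = len & SK_LEN_MASK;
	return sg;
}
```

Neither variant changes where the driver's own control structures are allocated, which is consistent with the allocation failures still appearing after the patch.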
I don't understand why this change makes any difference, because no
change at all is made to the problematic 32-bit DMA structure allocation.
And that is where the allocation failures happen ...
But with this fix there are far fewer allocation failures than without it.
This fix does not solve the problem completely!
If there is a lot of traffic on the SCSI bus before all of the lower 4GB of
memory is used up, no problem occurs anymore, but if I/O on the
controller starts after the lower 4GB of memory is used by processes, the
system will crash after a short time.
rcsdiff -r1.1 -u ahd_pci.c
===================================================================
RCS file: RCS/ahd_pci.c,v
retrieving revision 1.1
diff -u -r1.1 ahd_pci.c
--- ahd_pci.c 2008/09/12 14:37:11 1.1
+++ ahd_pci.c 2008/09/25 13:11:06
@@ -400,6 +400,12 @@
ahd->chip |= AHD_PCI;
ahd->bugs &= ~AHD_PCIX_BUG_MASK;
}
+ else if (devconfig & PCI64BIT) {
+ ahd->flags |= AHD_64BIT_ADDRESSING;
+ aprint_normal("\n%s: using 64 Bit addressing modes", ahd_name(ahd));
+ } else {
+ aprint_normal("\n%s: using normal addressing modes - not on 64 Bit bus", ahd_name(ahd));
+ }
/*
* Map PCI Registers
@@ -502,7 +508,8 @@
if ((ahd->flags & (AHD_39BIT_ADDRESSING|AHD_64BIT_ADDRESSING)) != 0) {
uint32_t dvconfig;
- aprint_normal("%s: Enabling 39Bit Addressing\n", ahd_name(ahd));
+ aprint_normal("%s: Enabling %sBit Addressing\n", ahd_name(ahd),
+ ((ahd->flags & AHD_64BIT_ADDRESSING) ? "64" : "39"));
dvconfig = pci_conf_read(pa->pa_pc, pa->pa_tag, DEVCONFIG);
dvconfig |= DACEN;
pci_conf_write(pa->pa_pc, pa->pa_tag, DEVCONFIG, dvconfig);
From: Wolfgang Stukenbrock <Wolfgang.Stukenbrock@nagler-company.com>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/39526: ahd driver crashes system if it runs out of memory
Date: Fri, 26 Sep 2008 13:48:36 +0200
Hi again,
I now have a workaround that keeps the ahd driver from crashing the system.
The workaround allocates all SCB structures during startup if a 64-bit
bus is detected. By doing this it avoids any later call to GROW the number of
SCBs.
That is the place where no memory was found, and where after some failed
allocations a kernel crash with "fatal page fault in supervisor mode"
happens.
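The idea of the workaround can be sketched as follows. This is a simplified model, not the driver code: sk_softc, sk_alloc_scbs(), sk_attach() and the two limits are hypothetical, and the allocator is assumed to return the number of SCBs it added (0 when nothing more can be allocated).

```c
#include <assert.h>

#define SK_SCB_MAX_ALLOC 512	/* hypothetical upper limit on SCBs */
#define SK_SCBS_PER_PAGE 8	/* hypothetical SCBs gained per allocation step */

struct sk_softc {
	int numscbs;	/* SCBs allocated so far */
	int pages_left;	/* DMA-able pages still available (simulated) */
	int can_grow;	/* stands in for SCSIPI_CHAN_CANGROW */
};

/* Add one batch of SCBs; returns how many were added, 0 if none. */
static int
sk_alloc_scbs(struct sk_softc *sc)
{
	int newcount = SK_SCB_MAX_ALLOC - sc->numscbs;

	if (newcount <= 0 || sc->pages_left <= 0)
		return 0;
	if (newcount > SK_SCBS_PER_PAGE)
		newcount = SK_SCBS_PER_PAGE;
	sc->pages_left--;
	sc->numscbs += newcount;
	return newcount;
}

/* Allocate everything up front, then allow growth only if still short. */
static void
sk_attach(struct sk_softc *sc)
{
	assert(sc->numscbs == 0);
	while (sk_alloc_scbs(sc) != 0)
		continue;
	sc->can_grow = (sc->numscbs < SK_SCB_MAX_ALLOC);
}
```

If memory is plentiful at attach time, the pool is full and growth is never requested again; if it is not, the grow path stays enabled and the original problem can still show up later.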
With this fix I no longer get the kernel panic.
So this workaround leads to a stable driver, but it does not fix
the main cause of the problem!
There seems to be something wrong in either this driver or the uvm stuff!
I assume that the problem is in the uvm stuff, because I have a similar
problem with the ahc driver. It also crashes once more than 4G is
used by the system and the ahc driver tries to allocate more SCBs.
Output of "diff -u" for ic/aic79xx.c ic/aic79xx_osm.c and pci/ahd_pci.c
below.
Please integrate this workaround into the next possible release. Thanks
in advance.
A similar fix should be added to the ahc driver to work around the
problem there.
W. Stukenbrock
in /usr/src/sys/dev/ic:
--- aic79xx.c 2008/09/26 11:19:07 1.1
+++ aic79xx.c 2008/09/26 11:26:57
@@ -5441,6 +5441,24 @@
ahd_name(ahd));
goto error_exit;
}
+/*
+ * Workaround for the "more than 4GB main memory" crash.
+ * We allocate all SCBs now - no additional request that may fail
+ * later.
+ * This will work around a bug in either this driver or the uvm part.
+ * Without this, the system will crash after some failed GROW requests with
+ * a "fatal page fault in supervisor mode".
+ *
+ * This fix requires setting the 64-Bit Flag in ahd_pci.c too.
+ * Perhaps the amount of main memory should be checked here too ..
+ *
+ * W. Stukenbrock (09.2008)
+ */
+ if ((ahd->flags & AHD_64BIT_ADDRESSING) != 0) {
+ printf("%s: allocating all SCB entries - 64 bit workaround for 4G memory bug\n",
+ ahd_name(ahd));
+ while (ahd_alloc_scbs(ahd) != 0);
+ }
/*
* Note that we were successfull
===================================================================
--- aic79xx_osm.c 2008/09/26 11:19:07 1.1
+++ aic79xx_osm.c 2008/09/26 10:26:51
@@ -95,7 +95,9 @@
ahd->sc_channel.chan_ntargets = AHD_NUM_TARGETS;
ahd->sc_channel.chan_nluns = 8 /*AHD_NUM_LUNS*/;
ahd->sc_channel.chan_id = ahd->our_id;
- ahd->sc_channel.chan_flags |= SCSIPI_CHAN_CANGROW;
+/* do not waste time if everything is already allocated ... */
+ if (ahd->scb_data.numscbs < AHD_SCB_MAX_ALLOC)
+ ahd->sc_channel.chan_flags |= SCSIPI_CHAN_CANGROW;
ahd->sc_child = config_found((void *)ahd, &ahd->sc_channel,
scsiprint);
===================================================================
in /usr/src/sys/dev/pci:
--- ahd_pci.c 2008/09/12 14:37:11 1.1
+++ ahd_pci.c 2008/09/26 11:17:42
@@ -400,6 +400,12 @@
ahd->chip |= AHD_PCI;
ahd->bugs &= ~AHD_PCIX_BUG_MASK;
}
+ else if (devconfig & PCI64BIT) {
+ ahd->flags |= AHD_64BIT_ADDRESSING;
+ } else {
+ aprint_normal("\n%s: WARNING: not on 64 bit Bus - may be a problem with more than 4GB main memory",
+ ahd_name(ahd));
+ }
/*
* Map PCI Registers
@@ -502,7 +508,8 @@
if ((ahd->flags & (AHD_39BIT_ADDRESSING|AHD_64BIT_ADDRESSING)) != 0) {
uint32_t dvconfig;
- aprint_normal("%s: Enabling 39Bit Addressing\n", ahd_name(ahd));
+ aprint_normal("%s: Enabling %sBit Addressing\n", ahd_name(ahd),
+ ((ahd->flags & AHD_64BIT_ADDRESSING) ? "64" : "39"));
dvconfig = pci_conf_read(pa->pa_pc, pa->pa_tag, DEVCONFIG);
dvconfig |= DACEN;
pci_conf_write(pa->pa_pc, pa->pa_tag, DEVCONFIG, dvconfig);
===================================================================
From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org,
Wolfgang.Stukenbrock@nagler-company.com
Subject: Re: kern/39526: ahd driver crashes system if it runs out of memory
Date: Fri, 26 Sep 2008 22:32:07 +0200
On Fri, Sep 26, 2008 at 11:50:04AM +0000, Wolfgang Stukenbrock wrote:
> The following reply was made to PR kern/39526; it has been noted by GNATS.
>
> From: Wolfgang Stukenbrock <Wolfgang.Stukenbrock@nagler-company.com>
> To: gnats-bugs@NetBSD.org
> Cc:
> Subject: Re: kern/39526: ahd driver crashes system if it runs out of memory
> Date: Fri, 26 Sep 2008 13:48:36 +0200
>
> Hi again,
>
> now I've a workaround that avoids crashing the system by the ahd driver.
>
> The workaround will allocate all SCB structures during startup if 64-bit
> bus is detected. By dooing this it avoids any call to GROW the number of
> SCB's later.
> That is the place where no memory was found and after some failed
> allocations a kernel crash with "fatal page fault in supervisour mode"
> happens.
>
> Wit this fix I've go no longer the kernel panic.
> So this workaround leads to a stable running driver, but it does not fix
> the main cause of the problem!
> There seems to be something wrong in either this driver or the uvm stuff!
>
> I assume that the problem is in the uvm stuff, because I've a similar
> problem with the ahc driver. It will crash too after more than 4G is
> used by the system and the ahc driver will also try to allocate more SCB's.
No, it's a bug in both ahc and ahd, which don't properly handle errors
from bus_dma().
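For illustration only, a minimal sketch of what handling such errors could look like: each DMA region is set up in turn, a failure undoes the regions already created, and the memory is touched only after every step has succeeded. The helpers below (sk_dmamem_create(), sk_dmamem_destroy(), sk_grow()) are hypothetical stand-ins built on malloc(), not bus_dma(9) calls and not the driver's functions; the three regions mirror the hardware SCB, SG and sense data structures named in the error messages.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define SK_NREGIONS	3	/* hardware SCBs, SG lists, sense buffers */
#define SK_REGION_SIZE	4096	/* hypothetical size of each region */

struct sk_dmamem {
	void *vaddr;	/* kernel virtual address, NULL if not set up */
	size_t size;
};

/* Stand-in for a DMA memory setup; 'fail' simulates an ENOMEM result. */
static int
sk_dmamem_create(struct sk_dmamem *m, size_t size, int fail)
{
	assert(size > 0);
	if (fail)
		return ENOMEM;
	m->vaddr = malloc(size);
	if (m->vaddr == NULL)
		return ENOMEM;
	m->size = size;
	return 0;
}

static void
sk_dmamem_destroy(struct sk_dmamem *m)
{
	free(m->vaddr);
	m->vaddr = NULL;
	m->size = 0;
}

/*
 * Set up all regions or none. Returns 1 on success and 0 on failure;
 * on failure every region is left with vaddr == NULL.
 */
static int
sk_grow(struct sk_dmamem mem[SK_NREGIONS], int fail_at)
{
	int i;

	for (i = 0; i < SK_NREGIONS; i++) {
		mem[i].vaddr = NULL;
		mem[i].size = 0;
	}
	for (i = 0; i < SK_NREGIONS; i++) {
		if (sk_dmamem_create(&mem[i], SK_REGION_SIZE,
		    i == fail_at) != 0) {
			while (i-- > 0)		/* undo earlier regions */
				sk_dmamem_destroy(&mem[i]);
			return 0;
		}
	}
	/* Only now is it safe to touch the memory. */
	for (i = 0; i < SK_NREGIONS; i++)
		memset(mem[i].vaddr, 0, mem[i].size);
	return 1;
}
```

The point of the sketch is the ordering: the memset() happens after the last error check, so a failed allocation cannot lead to a write through an invalid pointer.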
> in /usr/src/sys/dev/ic:
>
>
> --- aic79xx.c 2008/09/26 11:19:07 1.1
> +++ aic79xx.c 2008/09/26 11:26:57
> @@ -5441,6 +5441,24 @@
> ahd_name(ahd));
> goto error_exit;
> }
> +/*
> + * Workaround for the "more than 4GB main memory" crash.
> + * We allocate all SCB's now - no additional request taht may fail
> + * later.
> + * This will work around a bug in either this driver or the uvm part.
> + * Without this, the system will crash after some failed GROW requests with
> + * a "fatal page fault in supoervisor mode".
> + *
> + * This fix requires setting the 64-Bit Flag in ahd_pci.c too.
> + * Perhaps the amount of main memory should be checked here too ..
> + *
> + W. Stukenbrock (09.2008)
> + */
> + if ((ahd->flags & AHD_64BIT_ADDRESSING) != 0) {
> + printf("%s: allocating all SCB entries - 64 bit workaround for 4G memory bug\n",
> + ahd_name(ahd));
> + while (ahd_alloc_scbs(ahd) != 0);
> + }
Doing this based on AHD_64BIT_ADDRESSING is wrong. It will occur as
well, if not more often, if the adapter can't handle 64-bit addresses.
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 years of experience will always make the difference
--
From: Wolfgang Stukenbrock <Wolfgang.Stukenbrock@nagler-company.com>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/39526: ahd driver crashes system if it runs out of memory
Date: Mon, 29 Sep 2008 13:57:33 +0200
Hi again.
OK, if this is a "known" bug in the ahd and ahc drivers, then the
workaround of allocating all possible SCB structures during startup
should be added in all cases to both drivers - or the main cause of the
problem should be fixed in them as soon as possible. (Sorry, I don't have
enough time to go deeper into this, and my knowledge of the
DMA subsystem (interaction with CPU cache, PCI-related things, ...) is
poor at the moment ... From a short look at the code, I've seen no error
in the calls that free the partially allocated DMA memory.)
It is not a good idea to have an unstable driver in the kernel that
may crash the system if SCSI traffic starts after some large processes
have eaten up all memory.
And the Adaptec cards are used by many people. I've also come back to
them myself, because e.g. the mpt driver does not support any of the LSI
SCSI cards currently sold by LSI ... And I've found no other
SCSI PCI-X cards available from any vendor in Europe that
are supported by NetBSD.
The fastest way to work around the crashes temporarily would be to
allocate all SCBs at startup.
The workaround should just contain the while loop from my patch that
allocates all SCBs:
e.g. "while (ahd_alloc_scbs(ahd) != 0);".
Nevertheless, there is a problem with the pagedaemon in the kernel, because
the pagedaemon ignores any range information of the failed allocation request.
I've had the system freeze up with the ahc driver when all of the lower 4GB of
memory is allocated but some memory above 4GB is free.
The command "top" in another window shows 100% CPU for the pagedaemon on one
CPU and 100% for the ahc driver on the second CPU just before nothing
works anymore.
Some debugging output shows that the ahc driver tries to allocate some
pages that must reside below 4GB, but nothing is free there anymore.
It kicks the pagedaemon and blocks. The pagedaemon takes a short look at the
system statistics and sees that there is plenty of memory available. It
decides to do nothing, just wakes up the ahc driver and waits for the
next request. This is an endless loop ....
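The loop described above can be modelled in a few lines. This is a toy model under stated assumptions, not the uvm or pagedaemon code: sk_vmstate, both sk_pagedaemon_* functions and sk_alloc_low() are hypothetical, and the range-aware variant is only there for contrast, not as a proposed fix. The retry count is bounded so that the model terminates.

```c
#include <assert.h>

struct sk_vmstate {
	unsigned free_low;	/* free pages below 4GB */
	unsigned free_high;	/* free pages above 4GB */
	unsigned free_target;	/* daemon works only if total free < target */
};

/* Looks only at the global count; returns the number of pages reclaimed. */
static unsigned
sk_pagedaemon_global(struct sk_vmstate *vm)
{
	if (vm->free_low + vm->free_high >= vm->free_target)
		return 0;	/* "plenty of memory", nothing to do */
	vm->free_low++;		/* pretend one low page was reclaimed */
	return 1;
}

/* Contrast case: looks at the range the failed request asked for. */
static unsigned
sk_pagedaemon_ranged(struct sk_vmstate *vm)
{
	if (vm->free_low > 0)
		return 0;
	vm->free_low++;
	return 1;
}

/*
 * Try to get one page below 4GB, waking the daemon after each failure.
 * Returns the attempt that succeeded, or -1 after max_tries failures.
 */
static int
sk_alloc_low(struct sk_vmstate *vm,
    unsigned (*daemon)(struct sk_vmstate *), int max_tries)
{
	int t;

	assert(max_tries > 0);
	for (t = 1; t <= max_tries; t++) {
		if (vm->free_low > 0) {
			vm->free_low--;
			return t;
		}
		(void)daemon(vm);	/* kick the daemon, then retry */
	}
	return -1;
}
```

With no free page below 4GB and many free pages above it, the global variant never reclaims anything and the allocation never succeeds, no matter how often it is retried.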
Best regards
W. Stukenbrock
>Unformatted: