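Solaris Study Notes (5)
Author: Badcoffee
Email: blog.oliver@gmail.com
Blog: http://blog.csdn.net/yayong
February 2007
This article introduces the basic techniques for debugging the Solaris kernel with kmdb and mdb. mdb (the Solaris Modular Debugger) and its in-kernel counterpart kmdb are installed by default on Solaris. They can be used to debug and pinpoint faults in kernel modules and device drivers. This article is intended only for learning and discussion. Errors are inevitable, so please contact the author with any corrections or questions.
Keywords: mdb/kmdb/panic/hung/crashdump/dump/kernel debug/Solaris/OpenSolaris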
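Postmortem debugging is supported by today's mainstream operating systems. Windows, AIX and FreeBSD all support crash dumps and postmortem analysis, and Linux has recently been adding crash dump and analysis tools as well. Anyone familiar with kernel development knows that many kernel bugs are very hard to reproduce; some only show up under specific conditions, within a tiny time window. Moreover, important commercial customers can hardly let kernel developers debug a crash or hang on their production systems. Postmortem analysis of a crash dump file supplied by the customer is therefore the only practical way to solve such problems.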
During kernel development, testing, and even production use, you may run into two kinds of extreme situations:
System crash - for example, the Windows blue screen, a Unix panic, or a Linux oops;
System hang - what people usually call a freeze;
1. Types of system crashes
System panics & bad traps
Watchdog resets
Dropping out (to boot PROM or bootstrap level)
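About panic
1. The kernel calls panic() to preserve data integrity and to keep the system from running on in an unpredictable, erroneous state.
2. panic() is only called in kernel space. A user program can trigger a panic, but only indirectly: the call to panic() itself always happens in the kernel.
3. The system also panics when the hardware raises a trap the kernel cannot handle in its current state, for example a page fault on an unmapped kernel address. This is called a bad trap.
Main work done by panic()
1. Dump memory to the dump device (by default, the swap area).
2. Dump all CPU registers to the dump device.
3. Reboot.
The panic sequence consists of:
1. Printing the actual panic message
2. Printing a backtrace of the call stack
3. Printing memory dump messages
4. Attempting to reboot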
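2. About hangs
Types of system hangs
1. Deadlocks
2. Exhaustion of system resources
3. Hardware problems
What should you do when the system hangs?
1. Confirm the symptoms: are network services, ping, and the console still responsive?
2. Try to generate a crash dump
How do you generate a crash dump?
1. On SPARC systems, try to drop to the OpenBoot PROM (OBP) and use the sync command to generate a crash dump;
2. On x86 systems, if kmdb was loaded at boot, you can break into kmdb and generate a crash dump;
How do you load kmdb at boot on x86?
There are two methods:
1. Edit the GRUB configuration, add "-k" to the kernel line, then reboot: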
# vi /boot/grub/menu.lst
#---------- ADDED BY BOOTADM - DO NOT EDIT ----------
title Solaris 10 11/06 X86
root (hd0,0,a)
kernel /platform/i86pc/multiboot -k
module /platform/i86pc/boot_archive
#---------------------END BOOTADM--------------------
2. Alternatively, after the system has booted, switch to the console and run the following command:
# mdb -K
Welcome to kmdb
kmdb: unable to determine terminal type: assuming `vt100'
Loaded modules: [ crypto uppc ufs unix zfs krtld s1394 sppp nca uhci lofs
genunix ip usba specfs pcplusmp nfs random sctp cpu.AuthenticAMD.15 ]
[2]> :c
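Once kmdb is entered, the kernel stops at a breakpoint. Use :c to resume normal operation.
How do you break into kmdb and generate a crash dump on x86?
If kmdb is already loaded, you can enter it from the keyboard with SHIFT+F1+a;
If the console is redirected to a service processor (SP) or a tip line, look up the character sequence that sends a break signal;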
# kmdb: target stopped at:
kaif_enter+7: popfl
[3]>
Once the prompt appears, use the $<systemdump command to generate a crash dump:
[3]> $<systemdump
nopanicdebug: 0 = 0x1
panic[cpu3]/thread=c29b9de0: BAD TRAP: type=e (#pf Page fault) rp=c29d9f1c addr=0 occurred in module "<unknown>" due to a
NULL pointer dereference
sched: #pf Page fault
Bad kernel fault at addr=0x0
pid=0, pc=0x0, sp=0x202, eflags=0x10002
cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 6f8<xmme,fxsr,pge,mce,pae,pse,de>
cr2: 0 cr3: 1238c000
gs: 1b0 fs: 0 es: 160 ds: 160
edi: c22e0c80 esi: 20 ebp: c29d9f6c esp: c29d9f54
ebx: f9 edx: 0 ecx: c22e0c11 eax: fec8b7b0
trp: e err: 10 eip: 0 cs: 158
efl: 10002 usp: 202 ss: c29d9f74
c29d9e7c unix:die+a7 (e, c29d9f1c, 0, 3)
c29d9f08 unix:trap+1058 (c29d9f1c, 0, 3)
c29d9f1c unix:cmntrap+9a (1b0, 0, 160, 160, c)
c29d9f6c 0 (c29d9f7c, fe813ced,)
c29d9f74 genunix:kdi_dvec_enter+a (c29d9f88, fe813c8f,)
c29d9f7c unix:debug_enter+32 (0)
c29d9f88 unix:abort_sequence_enter+27 (0)
c29d9fb0 asy:async_rxint+1eb (c19c3d00, f9)
c29d9fd4 asy:asyintr+97 (c19c3d00, 0)
c29b9d5c unix:cmnint+1f7 (c29b01b0, c1600000,)
c29b9db8 unix:cpu_halt+f6 (0, 0, c29b9dd8, fe
c29b9dc8 unix:idle+dc (0, 0)
c29b9dd8 unix:thread_start+8 ()
syncing file systems... done
dumping to /dev/dsk/c0t0d0s1, offset 860356608, content: kernel
100% done: 182452 pages dumped, compression ratio 6.00, dump succeeded
rebooting...
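3. About savecore
savecore - at boot time, copies the image that panic() wrote to the dump device (the swap area) into files in a directory on a designated file system.
Use dumpadm(1M) to view the dump device and the directory where crash dumps are saved;
Under /var/crash/<hostname> you will find the following kinds of files:
unix.X and vmcore.X - the crash dump files, where X is the dump's sequence number (unix.X holds the kernel symbol table, vmcore.X the memory image);
bounds - records the sequence number to be used for the next dump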
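Because bounds holds the number that the *next* dump will use, the newest saved dump is numbered one less. The sketch below assumes the file contains only that number as ASCII text and builds the names of the newest unix.N/vmcore.N pair. The helper name latest_dump_names is hypothetical and is not part of any Solaris API.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: given the text of /var/crash/<hostname>/bounds
 * (assumed to hold the next dump sequence number as ASCII), write the
 * names of the most recent unix.N / vmcore.N pair into the buffers.
 * Returns the sequence number used, or -1 if no dump has been saved
 * yet or the text cannot be parsed.
 */
int latest_dump_names(const char *bounds_text,
                      char *unix_name, char *vmcore_name, size_t len)
{
    char *end;
    long next;

    assert(bounds_text != NULL && unix_name != NULL && vmcore_name != NULL);

    next = strtol(bounds_text, &end, 10);
    if (end == bounds_text || next <= 0)
        return -1;              /* unparsable, or no dump saved yet */

    snprintf(unix_name, len, "unix.%ld", next - 1);
    snprintf(vmcore_name, len, "vmcore.%ld", next - 1);
    return (int)(next - 1);
}
```

For example, four saved dumps numbered 0 to 3 would leave bounds at 4, and the helper would return 3 with the names unix.3 and vmcore.3.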
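4. Common kmdb and mdb commands
Most kmdb and mdb commands are the same. Postmortem analysis consists of examining the crash dump files with mdb to find the cause of a system crash or hang.
The first step in analyzing a crash dump is to gather the essential information:
1. Hostname and operating system version
::status
::showrev
2. System hardware configuration
::prtconf
3. Module and driver information
::modinfo
4. Messages in the system message buffer at crash time
The message buffer is a ring buffer that holds a lot of valuable information. It shows the system messages at the time of the crash and for a long stretch before it.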
::msgbuf
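To illustrate why a ring buffer keeps only the most recent messages, here is a minimal byte-oriented ring buffer in which new text overwrites the oldest once the buffer is full. This is a generic sketch. The struct, the functions and the tiny capacity are hypothetical and do not reflect the kernel's actual message buffer layout.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE 16            /* tiny capacity, for illustration only */

struct ring {
    char   buf[RING_SIZE];
    size_t head;                /* next write position */
    size_t count;               /* bytes currently stored (<= RING_SIZE) */
};

/* Append n bytes; once the buffer is full, each new byte overwrites the oldest. */
void ring_write(struct ring *r, const char *s, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        r->buf[r->head] = s[i];
        r->head = (r->head + 1) % RING_SIZE;
        if (r->count < RING_SIZE)
            r->count++;
    }
}

/* Copy the stored bytes, oldest first, into out (needs RING_SIZE + 1 bytes). */
size_t ring_read(const struct ring *r, char *out)
{
    size_t start = (r->head + RING_SIZE - r->count) % RING_SIZE;

    assert(r->count <= RING_SIZE);
    for (size_t i = 0; i < r->count; i++)
        out[i] = r->buf[(start + i) % RING_SIZE];
    out[r->count] = '\0';
    return r->count;
}
```

After writing "boot ok;" followed by "panic[cpu0]...", only the last 16 bytes ("k;panic[cpu0]...") remain. The earliest boot text is lost, just as very old messages disappear from ::msgbuf output.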
For deeper analysis, you may need to examine the following:
1. Call stack backtrace
$c
::stack
::stackregs
2. Kernel symbol table
::nm
3. Disassembly
<kernel function>::dis
4. CPU registers
::regs
5. Dispatch queues
::cpuinfo -v
6. Physical memory and the kmem (slab) allocator
::memstat
::kmastat
7. All processes in the system
::ps
8. All kernel threads
::threadlist
9. Thread state
<kthread_t address>::thread
10. Call stack of a specific kernel thread
<kthread_t address>::findstack -v
<proc_t address>::walk thread |::findstack -v
11. State of synchronization objects
<mutex address>::mutex
<rwlock address>::rwlock
12. Finding references to an address
<address>::kgrep
<address>::whatthread
5. Case study
I recently bought a Dell OptiPlex 740. Solaris did not yet support its onboard gigabit Ethernet controller, so I downloaded a bcme driver from www.broadcom.com. A few days later I noticed that a particular operation made the system reboot when I accessed the machine remotely from home. The system log showed that the system had panicked.
My desktop machine now holds crash dump files from four panics:
# dumpadm
Dump content: kernel pages
Dump device: /dev/dsk/c0d0s1 (swap)
Savecore directory: /var/crash/palace
Savecore enabled: yes
# cd /var/crash/palace
# ls
bounds unix.0 unix.1 unix.2 unix.3 vmcore.0 vmcore.1 vmcore.2 vmcore.3
Open one of them (dump number 0) with mdb:
# mdb 0
Loading modules: [ unix genunix specfs dtrace cpu.AuthenticAMD.15 uppc pcplusmp scsi_vhci ufs ip hook neti sctp arp usba fctl nca lofs zfs random sppp crypto ptm md cpc fcip fcp logindmux ipc nfs audiosup ]
First, check the message buffer to see what happened before the panic:
> ::msgbuf
MESSAGE
dtrace0 is /pseudo/dtrace@0
pseudo-device: zfs0
zfs0 is /pseudo/zfs@0
pseudo-device: devinfo0
devinfo0 is /pseudo/devinfo@0
xsvc0 at root: space 0 offset 0
xsvc0 is /xsvc@0,0
pseudo-device: rsm0
rsm0 is /pseudo/rsm@0
pseudo-device: pseudo1
pseudo1 is /pseudo/zconsnex@1
pcplusmp: asy (asy) instance 0 vector 0x4 ioapic 0x2 intin 0x4 is bound to cpu 1
ISA-device: asy0
panic[cpu0]/thread=ffffff0003cb9c80:
BAD TRAP: type=e (#pf Page fault) rp=ffffff0003cb9000 addr=86 occurred in module "genunix" due to a NULL pointer dereference
.......................................
............................................
....................................
sched:
#pf Page fault
Bad kernel fault at addr=0x86
pid=0, pc=0xfffffffffba189ff, sp=0xffffff0003cb90f0, eflags=0x10286
cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 6f8<xmme,fxsr,pge,mce,pae,pse,de>
cr2: 86 cr3: 2c00000 cr8: c
rdi: 66 rsi: 9 rdx: 66
rcx: 9 r8: 180 r9: fffffffec02023c8
rax: 0 rbx: 18 rbp: ffffff0003cb9100
r10: 2000 r11: 1 r12: fffffffec1dc8fb0
r13: 18 r14: fffffffec1dc9298 r15: fffffffec1dc9288
fsb: 0 gsb: fffffffffbc292d0 ds: 4b
es: 4b fs: 0 gs: 1c3
trp: e err: 0 rip: fffffffffba189ff
cs: 30 rfl: 10286 rsp: ffffff0003cb90f0
ss: 38
ffffff0003cb8ee0 unix:die+c8 ()
ffffff0003cb8ff0 unix:trap+135c ()
ffffff0003cb9000 unix:cmntrap+e9 ()
ffffff0003cb9100 genunix:ddi_dma_unbind_handle+f ()
ffffff0003cb91a0 bcme:bcme_tx_sctgth+305 ()
ffffff0003cb91f0 bcme:UM_SendPacketIntel+56 ()
ffffff0003cb9230 bcme:UM_SendMBLKPacket+f2 ()
ffffff0003cb9280 bcme:UM_SendPacketsMP+91 ()
ffffff0003cb92a0 bcme:UM_ProcessSendPackets+66 ()
ffffff0003cb92d0 bcme:UM_SendPacket+9e ()
ffffff0003cb92f0 bcme:t3TxPacket+bb ()
ffffff0003cb9320 bcme:bcme_wput+77 ()
ffffff0003cb9390 unix:putnext+22b ()
ffffff0003cb9470 ip:tcp_send_data+72a ()
ffffff0003cb95a0 ip:tcp_send+a7b ()
ffffff0003cb9680 ip:tcp_wput_data+77f ()
ffffff0003cb97e0 ip:tcp_rput_data+2b99 ()
ffffff0003cb9820 ip:tcp_input+4a ()
ffffff0003cb98a0 ip:squeue_enter_chain+11d ()
ffffff0003cb9990 ip:ip_input+96f ()
ffffff0003cb99e0 ip:ip_rput+119 ()
ffffff0003cb9a50 unix:putnext+22b ()
ffffff0003cb9ac0 bcme:t3SendUp+2dd ()
ffffff0003cb9ae0 bcme:t3ProcessRxPacket+98 ()
ffffff0003cb9b10 bcme:bcme_recv+87 ()
ffffff0003cb9b40 bcme:LM_RxPackets_Service+56 ()
ffffff0003cb9b60 bcme:LM_ISR_ServiceSoftInt+29 ()
ffffff0003cb9bc0 bcme:bcme_intr+30e ()
ffffff0003cb9c20 unix:av_dispatch_autovect+78 ()
ffffff0003cb9c60 unix:dispatch_hardint+2f ()
ffffff0003c05ac0 unix:switch_sp_and_call+13 ()
ffffff0003c05b10 unix:do_interrupt+9b ()
ffffff0003c05b20 unix:cmnint+ba ()
ffffff0003c05c10 unix:mach_cpu_idle+6 ()
ffffff0003c05c40 unix:cpu_idle+c8 ()
ffffff0003c05c60 unix:idle+10e ()
ffffff0003c05c70 unix:thread_start+8 ()
syncing file systems...
4
done
dumping to /dev/dsk/c0d0s1, offset 860356608, content: kernel
The panic call stack shows that this panic is network-related. The modules involved are unix, bcme, ip, and genunix.
Operating system version:
> ::showrev
Hostname: palace
Release: 5.11
Kernel architecture: i86pc
Application architecture: amd64
Kernel version: SunOS 5.11 i86pc snv_57
Platform: i86pc
So this machine has an AMD64 CPU, and the operating system is Solaris 11 (Nevada) build 57 (snv_57).
Because bcme is a hardware driver, the following information helps characterize it.
NIC information (the vendor ID and device ID in pciex14e4,167a identify the exact NIC model):
> ::prtconf ! grep bcme
fffffffec01f09b0 pciex14e4,167a, instance #0 (driver name: bcme)
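The node name pciex14e4,167a encodes the PCI vendor ID (0x14e4, Broadcom) and the device ID (0x167a) in hexadecimal. The parsing sketch below assumes the pciexVVVV,DDDD or pciVVVV,DDDD form shown above. parse_pci_name is a hypothetical helper, not a DDI routine.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: extract the vendor and device IDs (hex) from a
 * node name of the form "pciexVVVV,DDDD" or "pciVVVV,DDDD".
 * Returns 0 on success, -1 if the name does not match either form.
 */
int parse_pci_name(const char *name, unsigned *vendor, unsigned *device)
{
    assert(name != NULL && vendor != NULL && device != NULL);

    if (sscanf(name, "pciex%x,%x", vendor, device) == 2)
        return 0;               /* PCI Express node */
    if (sscanf(name, "pci%x,%x", vendor, device) == 2)
        return 0;               /* conventional PCI node */
    return -1;
}
```

With the IDs in hand, the NIC model can be looked up in any public PCI ID database.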
NIC driver version (v10.0.3):
> ::modinfo ! grep bcme
140 fffffffff7fd2000 1d3d0 1 bcme (Broadcom GbE Driver v10.0.3)
The panic message already tells us this is a bad trap:
> ::status
debugging crash dump vmcore.0 (64-bit) from palace
operating system: 5.11 snv_57 (i86pc)
panic message:
BAD TRAP: type=e (#pf Page fault) rp=ffffff0003cb9000 addr=86 occurred in module "genunix" due to a NULL pointer dereference
dump content: kernel pages only
This bad trap is a page fault caused by a NULL pointer dereference. The faulting address, addr=0x86, lies just above address 0 and is clearly not a valid address.
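Why is a fault at 0x86 reported as a NULL pointer dereference? A plausible reading, stated here as an assumption rather than a quote of the Solaris trap handler, is that any faulting address inside the first (never mapped) page is treated as "NULL plus a small structure offset". A sketch of that classification, assuming a 4 KB page size:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_ASSUMED 4096u   /* x86 base page size; an assumption here */

/*
 * Returns 1 if a faulting address looks like NULL plus a small member
 * offset, i.e. it falls inside the first page, which is never mapped.
 */
int looks_like_null_deref(uint64_t fault_addr)
{
    return fault_addr < PAGE_SIZE_ASSUMED;
}
```

Under this rule, 0x86 and 0x0 both count as NULL dereferences, while a kernel heap address such as 0xfffffffec1dc8fb0 from the register dump does not.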
Let's verify this, starting with a detailed view of the call stack:
> ::stackregs
ffffff0003cb9100 ddi_dma_unbind_handle+0xf(66)
ffffff0003cb91a0 bcme_tx_sctgth+0x305()
ffffff0003cb91f0 UM_SendPacketIntel+0x56()
ffffff0003cb9230 UM_SendMBLKPacket+0xf2()
ffffff0003cb9280 UM_SendPacketsMP+0x91()
ffffff0003cb92a0 UM_ProcessSendPackets+0x66()
ffffff0003cb92d0 UM_SendPacket+0x9e()
ffffff0003cb92f0 t3TxPacket+0xbb()
ffffff0003cb9320 bcme_wput+0x77()
ffffff0003cb9390 putnext+0x22b(fffffffec1954638, fffffffee22bf300)
ffffff0003cb9470 tcp_send_data+0x72a(fffffffec82452c0, fffffffee21e3650, fffffffee22bf300)
ffffff0003cb95a0 tcp_send+0xa7b(fffffffee21e3650, fffffffec82452c0, 4ec, 28, 14, 0, ffffff0003cb965c, ffffff0003cb9660,
ffffff0003cb9664, ffffff0003cb9618, 13a00be, 7fffffff)
ffffff0003cb9680 tcp_wput_data+0x77f(fffffffec82452c0, 0, 0)
ffffff0003cb97e0 tcp_rput_data+0x2b99(fffffffec82450c0, fffffffee2240f80, fffffffec1563f00)
ffffff0003cb9820 tcp_input+0x4a(fffffffec82450c0, fffffffee2240f80, fffffffec1563f00)
ffffff0003cb98a0 squeue_enter_chain+0x11d(fffffffec1563f00, fffffffee2240f80, fffffffee2240f80, 1, 1)
ffffff0003cb9990 ip_input+0x96f(fffffffec1b9f428, 0, fffffffee2240f80, 0)
ffffff0003cb99e0 ip_rput+0x119(fffffffec1954540, fffffffee2240f80)
ffffff0003cb9a50 putnext+0x22b(fffffffec19547d0, fffffffee2240f80)
ffffff0003cb9ac0 t3SendUp+0x2dd()
ffffff0003cb9ae0 t3ProcessRxPacket+0x98()
ffffff0003cb9b10 bcme_recv+0x87()
ffffff0003cb9b40 LM_RxPackets_Service+0x56()
ffffff0003cb9b60 LM_ISR_ServiceSoftInt+0x29()
ffffff0003cb9bc0 bcme_intr+0x30e()
ffffff0003cb9c20 av_dispatch_autovect+0x78(10)
ffffff0003cb9c60 dispatch_hardint+0x2f(10, 0)
ffffff0003c05ac0 switch_sp_and_call+0x13()
ffffff0003c05b10 do_interrupt+0x9b(ffffff0003c05b20, 1)
ffffff0003c05b20 _interrupt+0xba()
ffffff0003c05c10 mach_cpu_idle+6()
ffffff0003c05c40 cpu_idle+0xc8()
ffffff0003c05c60 idle+0x10e()
ffffff0003c05c70 thread_start+8()
mdb prints not only the function names but also some of the arguments: ddi_dma_unbind_handle was called with 0x66. At the time of the panic, the instruction being executed was ddi_dma_unbind_handle+0xf. Let's disassemble the function to see what that instruction does:
> ddi_dma_unbind_handle::dis
ddi_dma_unbind_handle: pushq %rbp
ddi_dma_unbind_handle+1: movq %rsp,%rbp
ddi_dma_unbind_handle+4: subq $0x10,%rsp
ddi_dma_unbind_handle+8: movq %rdi,-0x8(%rbp)
ddi_dma_unbind_handle+0xc: movq %rdi,%rdx
ddi_dma_unbind_handle+0xf: movq 0x20(%rdx),%rsi
ddi_dma_unbind_handle+0x13: movq 0x98(%rsi),%rdi
ddi_dma_unbind_handle+0x1a: xorl %eax,%eax
ddi_dma_unbind_handle+0x1c: call *0xe8(%rsi)
ddi_dma_unbind_handle+0x22: leave
ddi_dma_unbind_handle+0x23: ret
Disassembly can be hard to read, but OpenSolaris is open source, so we can compare it with the source code:
http://cvs.opensolaris.org/sourc ... i_dma_unbind_handle
int
ddi_dma_unbind_handle(ddi_dma_handle_t h)
{
ddi_dma_impl_t *hp = (ddi_dma_impl_t *)h;
dev_info_t *hdip, *dip;
int (*funcp)(dev_info_t *, dev_info_t *, ddi_dma_handle_t);
dip = hp->dmai_rdip;
hdip = (dev_info_t *)DEVI(dip)->devi_bus_dma_unbindhdl;
funcp = DEVI(dip)->devi_bus_dma_unbindfunc;
return ((*funcp)(hdip, dip, h));
}
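It is easy to see that the instruction at ddi_dma_unbind_handle+0xf corresponds to the following line, which reads the dmai_rdip member:
dip = hp->dmai_rdip;
We can confirm this with mdb:
> ::offsetof ddi_dma_impl_t dmai_rdip
offsetof (ddi_dma_impl_t, dmai_rdip) = 0x20
The instruction at ddi_dma_unbind_handle+0xf loads from offset 0x20 of %rdx. %rdx holds hp: the handle arrives in %rdi (the first integer argument register in the AMD64 ABI) and is copied to %rdx at +0xc:
ddi_dma_unbind_handle+0xf: movq 0x20(%rdx),%rsi
The value of the %rdx register can be obtained with the following command: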
> ::regs
%rax = 0x0000000000000000 %r9 = 0xfffffffec02023c8
%rbx = 0x0000000000000018 %r10 = 0x0000000000002000
%rcx = 0x0000000000000009 %r11 = 0x0000000000000001
%rdx = 0x0000000000000066 %r12 = 0xfffffffec1dc8fb0
%rsi = 0x0000000000000009 %r13 = 0x0000000000000018
%rdi = 0x0000000000000066 %r14 = 0xfffffffec1dc9298
%r8 = 0x0000000000000180 %r15 = 0xfffffffec1dc9288
%rip = 0xfffffffffba189ff ddi_dma_unbind_handle+0xf
%rbp = 0xffffff0003cb9100
%rsp = 0xffffff0003cb90f0
%rflags = 0x00010286
id=0 vip=0 vif=0 ac=0 vm=0 rf=1 nt=0 iopl=0x0
status=<of,df,IF,tf,SF,zf,af,PF,cf>
%cs = 0x0030 %ds = 0x004b %es = 0x004b
%trapno = 0xe %fs = 0x0000 %gs = 0x01c3
%err = 0x0
So the bad trap occurred when the instruction at ddi_dma_unbind_handle+0xf accessed 0x20(%rdx), that is 0x66 + 0x20 = 0x86. This matches addr=86 in the panic message:
> 0x66+0x20/X
mdb: failed to read data from target: no mapping for address
0x86:
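The same arithmetic can be written in C. The sketch below uses a hypothetical stand-in struct in which four pointer-sized fields precede dmai_rdip, so the member lands at offset 0x20 on an LP64 system, matching what ::offsetof reported. It is not the real ddi_dma_impl_t definition. Apart from dmai_rdip, the field names and the function name are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for ddi_dma_impl_t: only the offset of
 * dmai_rdip (0x20 with 8-byte pointers) is meant to match the dump.
 */
struct fake_dma_impl {
    void *pad0, *pad1, *pad2, *pad3;
    void *dmai_rdip;
};

/* Address that "movq 0x20(%rdx),%rsi" reads when %rdx holds the handle. */
uintptr_t rdip_load_address(uintptr_t handle)
{
    return handle + offsetof(struct fake_dma_impl, dmai_rdip);
}
```

With the bogus handle 0x66 from the dump, the load goes to 0x86, which lies inside the unmapped first page.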
Now look at the ddi_dma_unbind_handle(9F) manual page. Almost every driver that uses DMA needs this function:
Kernel Functions for Drivers ddi_dma_unbind_handle(9F)
NAME
ddi_dma_unbind_handle - unbinds the address in a DMA handle
SYNOPSIS
#include <sys/ddi.h>
#include <sys/sunddi.h>
int ddi_dma_unbind_handle(ddi_dma_handle_t handle);
PARAMETERS
handle The DMA handle previously allocated by a call to
ddi_dma_alloc_handle(9F).
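So when the bcme driver called ddi_dma_unbind_handle, the handle it passed (0x66) was invalid: it may already have been freed, or it may simply be a bogus value. Unfortunately I do not have the bcme source code; otherwise I might be able to narrow the bug down further. mdb did not automatically recover the arguments of the bcme functions. However, we have the stack pointers of the bcme frames, so we may still find saved arguments on the stack. This, of course, requires some familiarity with the AMD64 ABI: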
> 0xffffff0003cb90f0,30/nap
0xffffff0003cb90f0:
0xffffff0003cb90f0: 0xfffffffff7fe9d38
0xffffff0003cb90f8: 0x66
0xffffff0003cb9100: 0xffffff0003cb91a0
0xffffff0003cb9108: bcme_tx_sctgth+0x305
0xffffff0003cb9110: 0xfffffffee22bf300
0xffffff0003cb9118: 0x522
0xffffff0003cb9120: 0xfffffffec1dab000
0xffffff0003cb9128: 0xfffffffec1dc8fb0
0xffffff0003cb9130: 0
0xffffff0003cb9138: QQ_PushTail+0x57
0xffffff0003cb9140: 0xfffffffec1db39e0
0xffffff0003cb9148: 0x48a22ddc
0xffffff0003cb9150: 0x28
0xffffff0003cb9158: 0xbaddcafe00000000
0xffffff0003cb9160: 0xffffff0003cb91a0
0xffffff0003cb9168: 0x522f7fe488d
0xffffff0003cb9170: 0xfffffffee22c8880
0xffffff0003cb9178: 0xfffffffee22b6ddc
0xffffff0003cb9180: 0xfffffffec1db3d68
0xffffff0003cb9188: 0x28
0xffffff0003cb9190: 0xfffffffec1dab000
0xffffff0003cb9198: 0x3c
0xffffff0003cb91a0: 0xffffff0003cb91f0
0xffffff0003cb91a8: UM_SendPacketIntel+0x56
0xffffff0003cb91b0: 0xfffffffec1dab000
0xffffff0003cb91b8: 0xfffffffee22bf300
0xffffff0003cb91c0: 0
0xffffff0003cb91c8: 0x3c
0xffffff0003cb91d0: 0x522
0xffffff0003cb91d8: 0x3c
0xffffff0003cb91e0: 0x522
0xffffff0003cb91e8: 0xfffffffec1dab040
0xffffff0003cb91f0: 0xffffff0003cb9230
0xffffff0003cb91f8: UM_SendMBLKPacket+0xf2
0xffffff0003cb9200: 0xfffffffec1db3f30
0xffffff0003cb9208: 0xfffffffec1dab000
0xffffff0003cb9210: 0xfffffffee22bf300
0xffffff0003cb9218: 0xfffffffee22bf300
0xffffff0003cb9220: 0xfffffffec1db39f0
0xffffff0003cb9228: 0xfffffffee3be9600
0xffffff0003cb9230: 0xffffff0003cb9280
0xffffff0003cb9238: UM_SendPacketsMP+0x91
0xffffff0003cb9240: 0xfffffffec1954638
0xffffff0003cb9248: 0xfffffffee22bf300
0xffffff0003cb9250: 0xfffffffec1dab000
0xffffff0003cb9258: 0xfffffffec1db3f20
0xffffff0003cb9260: 0xfffffffee22bf300
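The dump above can be decoded with the frame-pointer convention visible in the disassembly (pushq %rbp; movq %rsp,%rbp). At each frame pointer, [rbp] holds the caller's saved %rbp and [rbp+8] holds the return address. Also note that 0xffffff0003cb90f8, i.e. -0x8(%rbp) in ddi_dma_unbind_handle's frame, holds 0x66. This is the handle spilled by movq %rdi,-0x8(%rbp). The sketch below walks that chain over a hand-copied subset of the words printed above. The memory model, the symbol strings standing in for return addresses, and all function names are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One 8-byte stack word copied from the ::nap output above. */
struct stack_word {
    uint64_t    addr;
    uint64_t    value;   /* raw value (0 where mdb printed only a symbol) */
    const char *sym;     /* symbol mdb printed for the word, or NULL */
};

/* A subset of the words shown above; the others are omitted. */
static const struct stack_word dump[] = {
    { 0xffffff0003cb90f8ULL, 0x66, NULL },
    { 0xffffff0003cb9100ULL, 0xffffff0003cb91a0ULL, NULL },
    { 0xffffff0003cb9108ULL, 0, "bcme_tx_sctgth+0x305" },
    { 0xffffff0003cb91a0ULL, 0xffffff0003cb91f0ULL, NULL },
    { 0xffffff0003cb91a8ULL, 0, "UM_SendPacketIntel+0x56" },
    { 0xffffff0003cb91f0ULL, 0xffffff0003cb9230ULL, NULL },
    { 0xffffff0003cb91f8ULL, 0, "UM_SendMBLKPacket+0xf2" },
    { 0xffffff0003cb9230ULL, 0xffffff0003cb9280ULL, NULL },
    { 0xffffff0003cb9238ULL, 0, "UM_SendPacketsMP+0x91" },
};

static const struct stack_word *lookup(uint64_t addr)
{
    for (size_t i = 0; i < sizeof dump / sizeof dump[0]; i++)
        if (dump[i].addr == addr)
            return &dump[i];
    return NULL;   /* word not present in our copy of the dump */
}

/*
 * Follow the saved-%rbp chain starting at rbp. For each frame, record the
 * frame pointer and the symbol of its return address ([rbp + 8]).
 * Stops when a word is missing from the copied dump or max is reached.
 * Returns the number of frames recorded.
 */
size_t walk_frames(uint64_t rbp, uint64_t *fps, const char **rets, size_t max)
{
    size_t n = 0;

    assert(fps != NULL && rets != NULL);
    while (n < max) {
        const struct stack_word *saved = lookup(rbp);
        const struct stack_word *ret = lookup(rbp + 8);

        if (saved == NULL || ret == NULL)
            break;
        fps[n] = rbp;
        rets[n] = ret->sym;
        n++;
        rbp = saved->value;   /* move to the caller's frame */
    }
    return n;
}

/* First argument spilled by "movq %rdi,-0x8(%rbp)" in the frame at rbp. */
uint64_t spilled_first_arg(uint64_t rbp)
{
    const struct stack_word *w = lookup(rbp - 8);
    return w != NULL ? w->value : 0;
}
```

Starting from %rbp = 0xffffff0003cb9100, the walk visits the frames at ...9100, ...91a0, ...91f0 and ...9230, the same frame addresses that ::stackregs printed. It stops at ...9280 because that word lies beyond the range dumped with ::nap.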
Even without the source code, we can tell this is a bug in the bcme driver. I will try to contact Broadcom to report it, and I can provide the crash dump files for them to analyze and fix the problem.
Related documents:
Solaris Study Notes (4)
Solaris Study Notes (3)
Solaris Study Notes (2)
Solaris Study Notes (1)
X86 Assembly Language Study Notes (3)
X86 Assembly Language Study Notes (2)
X86 Assembly Language Study Notes (1)