Oracle Database: Problems Triggered by "SMON: Parallel transaction recovery tried"

oracleerp 2012-07-26

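The alert-log message "SMON: Parallel transaction recovery tried" usually appears after a process running a large transaction has been killed, which leaves SMON to clean up the rollback segments.

If this happens during peak business hours, SMON may take 100% of the CPU and leave the system hanging.

Even if you run shutdown immediate, Oracle waits for SMON to finish the cleanup before it shuts down, and that wait can be long.

If you run shutdown abort, Oracle shuts down immediately, but the next startup may be very slow, because SMON carries on cleaning up undo. That wait can also be very long: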

——————————————————————————————————
Completed: ALTER DATABASE   MOUNT
Thu Aug 26 22:43:57 2010
ALTER DATABASE OPEN
Thu Aug 26 22:43:57 2010
Beginning crash recovery of 1 threads
Thu Aug 26 22:43:57 2010
Started first pass scan
Thu Aug 26 22:43:57 2010
Completed first pass scan
 402218 redo blocks read, 126103 data blocks need recovery
Thu Aug 26 22:45:05 2010
Restarting dead background process QMN0
QMN0 started with pid=16
Thu Aug 26 22:45:19 2010
Started recovery at
 Thread 1: logseq 13392, block 381202, scn 0.0
Recovery of Online Redo Log: Thread 1 Group 3 Seq 13392 Reading mem 0
  Mem# 0 errs 0: /zxindata/oracle/redolog/redo03.dbf
Recovery of Online Redo Log: Thread 1 Group 1 Seq 13393 Reading mem 0
  Mem# 0 errs 0: /zxindata/oracle/redolog/redo01.dbf
Thu Aug 26 22:45:21 2010
Completed redo application
Thu Aug 26 22:48:35 2010
Ended recovery at
 Thread 1: logseq 13393, block 271434, scn 2623.1377219707
 126103 data blocks read, 115641 data blocks written, 402218 redo blocks read
Crash recovery completed successfully
________________________________________________
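Note the gap between "Completed redo application" (22:45:21) and "Ended recovery at" (22:48:35): recovery took more than 3 minutes to finish.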
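To measure gaps like this without reading timestamps by eye, a short script can pair each alert-log message with the timestamp line above it. The following is a minimal Python sketch, assuming the alert-log layout shown above (a timestamp line such as Thu Aug 26 22:45:21 2010, followed by the messages it applies to). The function names are my own and are not part of any Oracle tool.

```python
from datetime import datetime

TS_FORMAT = "%a %b %d %H:%M:%S %Y"  # e.g. "Thu Aug 26 22:45:21 2010"


def parse_alert_log(text):
    """Return (timestamp, message) pairs.

    Each message line is paired with the most recent timestamp line above it.
    Message lines that appear before any timestamp are skipped.
    """
    events = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        try:
            current = datetime.strptime(line, TS_FORMAT)
            continue  # this line was a timestamp, not a message
        except ValueError:
            pass
        if current is not None:
            events.append((current, line))
    return events


def seconds_between(events, start_msg, end_msg):
    """Seconds from the first message starting with start_msg to the first starting with end_msg."""
    start = next(ts for ts, msg in events if msg.startswith(start_msg))
    end = next(ts for ts, msg in events if msg.startswith(end_msg))
    return (end - start).total_seconds()


# Excerpt copied from the alert log above.
SAMPLE = """\
Thu Aug 26 22:45:21 2010
Completed redo application
Thu Aug 26 22:48:35 2010
Ended recovery at
 Thread 1: logseq 13393, block 271434, scn 2623.1377219707
Crash recovery completed successfully
"""

if __name__ == "__main__":
    events = parse_alert_log(SAMPLE)
    print(seconds_between(events, "Completed redo application", "Ended recovery at"))
```

For the excerpt above this gives 194 seconds, i.e. 3 minutes 14 seconds. Note that seconds_between raises StopIteration if either message is missing from the log.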
So how can this be made faster? Metalink note 238507.1 suggests the following:
---------------------------------------------------------------------------------------------

1. Find SMON's Oracle PID:

Example:

SQL> select pid, program from v$process where program like '%SMON%';

       PID PROGRAM
---------- ------------------------------------------------
         6 oracle@stsun7 (SMON)

2. Disable SMON transaction cleanup:

SVRMGR> oradebug setorapid <SMON's Oracle PID>
SVRMGR> oradebug event 10513 trace name context forever, level 2

3. Kill the PQ slaves that are doing parallel transaction recovery.
You can check V$FAST_START_SERVERS to find these.

4. Turn off fast_start_parallel_rollback:

 alter system set fast_start_parallel_rollback=false;

If SMON is recovering, this command might hang; if it does, just control-C out of it. You may need to try this many times to get this to complete (between SMON cycles).

5. Re-enable SMON txn recovery:

SVRMGR> oradebug setorapid <SMON's Oracle PID>
SVRMGR> oradebug event 10513 trace name context off
---------------------------------------------------------------------------------------------
The idea behind the steps above is to switch SMON's transaction recovery from parallel to serial, which is what the fast_start_parallel_rollback parameter controls.

There are cases where parallel transaction recovery is not as fast as serial transaction recovery, because the pq slaves are interfering with each other. This depends mainly on the type of changes that need to be made during rollback and usually may happen when rolling back INDEX Updates in parallel.

We can also check V$FAST_START_TRANSACTIONS.UNDOBLOCKSTOTAL to see how much undo needs to be recovered, but unfortunately, in my tests this view was always empty.
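When the view does return rows, its counters can be turned into a rough time estimate. Below is a minimal Python sketch of the arithmetic, assuming two samples of the view's UNDOBLOCKSDONE column (not mentioned above) taken a known number of seconds apart, and a constant rollback rate. The function name and the sample numbers are hypothetical and are not measurements from this system.

```python
def estimate_remaining_seconds(total_blocks, done_first, done_second, interval_seconds):
    """Estimate seconds left for a rollback from two UNDOBLOCKSDONE samples.

    total_blocks     -- UNDOBLOCKSTOTAL for the transaction
    done_first       -- UNDOBLOCKSDONE at the first sample
    done_second      -- UNDOBLOCKSDONE at the second sample
    interval_seconds -- seconds between the two samples

    Returns None when no progress was observed, since no rate can be derived.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    progressed = done_second - done_first
    if progressed <= 0:
        return None
    rate = progressed / interval_seconds  # undo blocks rolled back per second
    remaining = max(total_blocks - done_second, 0)
    return remaining / rate


if __name__ == "__main__":
    # Hypothetical samples: 100000 blocks in total, 20000 done, then 26000 done 60 s later.
    print(estimate_remaining_seconds(100_000, 20_000, 26_000, 60))
```

With these made-up numbers the rate is 100 blocks per second and 74000 blocks remain, so the estimate is 740 seconds. The real rate may vary, so treat the result only as a rough guide.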

Also: "SMON: Parallel transaction recovery tried" may appear together with "SMON: Restarting fast_start parallel rollback" (I once saw this on an interface server running 10.2.0.4).
