How to set Execution plan for my sql

November 1, 2012, 5:17 am

Most of us run into the same problem with SQL running on our databases: sometimes a statement picks a different execution plan and performance gets much worse. With the correct plan the statement takes 5 minutes, but with the wrong plan it takes 20 minutes. In this article I explain how to fix the execution plan for a SQL statement. Before starting, note that there are some problems on 10g: even after we set the plan, the statement can still use a different one. We opened an SR for this issue, and Support finally suggested MOS 445126.1 for 10g. You can try the steps below on 11g and also on 10g.

My database version is 11.2.0.2, a 2-node RAC system on AIX 6.1.

Here are the steps:

1. First, we need the SQL_ID of the statement. It is used in step 3. To find the SQL_ID you can use:
- Grid screen
- v$sql and v$sqlarea views

In this example, my SQL_ID is dp2sc627nnpd4

2. In this step we use coe_xfr_sql_profile.sql, which was written by Carlos Sierra of Oracle. The script is also available on Metalink.

You can get the script by clicking on coe_xfr_sql_profile.sql.

3. Now I run coe_xfr_sql_profile.sql as sysdba. While it runs, the script prompts for some values:

As the oracle user:

$ cd /tmp
$ sqlplus "/ as sysdba"
SQL> @coe_xfr_sql_profile.sql

Parameter 1:
SQL_ID (required)

Enter value for 1: dp2sc627nnpd4 (this value comes from step 1)
PLAN_HASH_VALUE AVG_ET_SECS
--------------- -----------
436961061 10.3
2424534850 1354.994

As you can see, there are 2 plan hash values available for this SQL. I need to fix the first one, which gives the best AVG_ET_SECS.

Parameter 2:
PLAN_HASH_VALUE (required)

Enter value for 2: 436961061

Values passed:
~~~~~~~~~~~~~
SQL_ID : "dp2sc627nnpd4"
PLAN_HASH_VALUE: "436961061"
Execute coe_xfr_sql_profile_dp2sc627nnpd4_436961061.sql
on TARGET system in order to create a custom SQL Profile
with plan 436961061 linked to adjusted sql_text.
COE_XFR_SQL_PROFILE completed.
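
Choosing the plan by eye works for two candidates; with a longer list a small filter can help. The sketch below is not part of coe_xfr_sql_profile.sql. best_plan is a hypothetical helper that assumes the two-column PLAN_HASH_VALUE / AVG_ET_SECS listing has been copied into a text file or pipe, and it prints the plan hash value with the lowest average elapsed time.

```shell
# best_plan: hypothetical helper, not part of coe_xfr_sql_profile.sql.
# Reads "<plan_hash_value> <avg_et_secs>" lines on standard input and prints
# the plan hash value with the smallest average elapsed time.
# Header and separator lines are ignored because field 1 is not numeric.
best_plan() {
    awk '
        $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9.]+$/ {
            if (!seen || $2 + 0 < best) { best = $2 + 0; plan = $1; seen = 1 }
        }
        END { if (seen) print plan }
    '
}
```

Usage: save the listing to a file such as plans.txt (an assumed name) and run `best_plan < plans.txt`; the result is the value to enter for Parameter 2.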

4. After step 3 completes, coe_xfr_sql_profile.sql creates the output file
coe_xfr_sql_profile_dp2sc627nnpd4_436961061.sql under /tmp (I ran cd /tmp in step 3, so the file is created in that path).

5. Open coe_xfr_sql_profile_dp2sc627nnpd4_436961061.sql with the vi editor and change the force_match parameter to TRUE.

Original state
--------------
category => 'DEFAULT',
validate => TRUE,
replace => TRUE,
force_match => FALSE /* TRUE:FORCE (match even when different literals in SQL). FALSE:EXACT (similar to CURSOR_SHARING) */ );
END;

After edit
--------------
category => 'DEFAULT',
validate => TRUE,
replace => TRUE,
force_match => TRUE /* TRUE:FORCE (match even when different literals in SQL). FALSE:EXACT (similar to CURSOR_SHARING) */ );
END;
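
If you prefer not to edit the file by hand, the same change can be scripted. This is only a sketch: set_force_match is a hypothetical helper, and it assumes the generated file contains the text "force_match => FALSE" exactly as shown above. It writes a new file rather than editing in place, because sed -i is not available on every platform.

```shell
# set_force_match IN OUT
# Hypothetical helper: copies the generated profile script IN to OUT with
# force_match switched from FALSE to TRUE. The comment text "FALSE:EXACT"
# on the same line is not touched because it does not match the pattern.
set_force_match() {
    sed 's/force_match => FALSE/force_match => TRUE/' "$1" > "$2"
}
```

Usage: `set_force_match coe_xfr_sql_profile_dp2sc627nnpd4_436961061.sql profile_force.sql`, where profile_force.sql is an assumed output name. Review the new file before running it.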

6. Now we need to run coe_xfr_sql_profile_dp2sc627nnpd4_436961061.sql to fix the plan for the SQL.

$ cd /tmp
$ sqlplus "/ as sysdba"
SQL> @coe_xfr_sql_profile_dp2sc627nnpd4_436961061.sql

↧

PMON terminating the instance due to error 481

November 21, 2012, 3:48 am

I have an 11.2.0.1.6 2-node RAC system with ASM on the AIX 6.1 64-bit platform. One of the nodes was rebooted. While investigating the issue, we noticed the LMD0 process terminating the instance.

The alert.log on the victim node shows:

LMS2 (ospid: 28770960) received an instance eviction notification from instance 2 [2]
LMON received an instance eviction notification from instance 2
The instance eviction reason is 0x2
The instance eviction map is 1
PMON (ospid: 13435162): terminating the instance due to error 481
/oracle11g/VIS/db/diag/rdbms/VIS/rac_node1/trace/rac_node1_diag_19530038.trc
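
To locate this kind of entry quickly in a large alert.log, a small filter is enough. The sketch below is an illustration only: find_terminations is a hypothetical helper, and it assumes the message format is exactly the one shown in the excerpt above.

```shell
# find_terminations: hypothetical helper.
# Reads an alert log on standard input and prints "<process> <error>" for
# every line of the form
#   PMON (ospid: 13435162): terminating the instance due to error 481
find_terminations() {
    sed -n 's/^\([A-Z0-9]*\) (ospid: [0-9]*): terminating the instance due to error \([0-9]*\).*/\1 \2/p'
}
```

Usage: `find_terminations < alert_SID.log`, where alert_SID.log stands for your own alert log file.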

The trace file rac_node1_diag_19530038.trc shows:
=========================
Enqueue blocker waiting on 'KSV master wait' <<<<<<<<<<< This is the error message
Dumping global systemstates of the ASM instances, check DIAG traces under the ASM instances. <<<<<< We need to check the ASM log

Performing diagnostic data dump for this instance
===================================================
SYSTEM STATE (level=10)
————
System global information:
processes: base 0x700000df8eb8550, size 5000, cleanup 0x700000d90f2e278
allocation: free sessions 0x700000d992170c0, free calls 0x0
control alloc errors: 1092 (process), 1092 (session), 1092 (call)
PMON latch cleanup depth: 0
seconds since PMON’s last scan for dead processes: 44
system statistics:
alert_+ASM1.log
====================
NOTE: ASM client rac_node1:VIS disconnected unexpectedly.
NOTE: check client alert log.
NOTE: Process state recorded in trace file /oracle11g/VIS/base/11.2.0/diag/asm/+asm/+ASM1/trace/+ASM1_ora_57016454.trc
Tue Oct 23 09:37:21 2012
NOTE: killing foreground process 45 (10093158) for state cleanup
NOTE: killing foreground process 35 (15335956) for state cleanup
NOTE: killing foreground process 44 (26477216) for state cleanup

We checked /oracle11g/VIS/base/11.2.0/diag/asm/+asm/+ASM1/trace/+ASM1_diag_13500576.trc

We see the following in the trace file:
“Enqueue blocker waiting on ‘KSV master wait’”

After some searching, we found that we were hitting a known problem.
It is caused by the following bug:
Bug 11800170 – Many ASM file metadata / KSV master waits [ID 11800170.8]

Details of bug:
====================
Abstract: Many ASM file metadata / KSV master waits
Prod/Comp: DB 5/RDBMS
Affects: Vers >=11 and BELOW 12.1 Specifically: 11.2.0.1 11.2.0.2
Fixed Releases: 11.2.0.2.3 11.2.0.2.BP08 11.2.0.3 12.1.0.0 WIN:B202P07
Tags: ASM PERF
Hooks:
NEED:ASM “WAITEVENT:ASM file metadata operation”
“WAITEVENT:KSV master wait” STACKHAS:kcfis_tablespace_is_on_sage PARAMETER:_allow_cell_smart_scan_attr

DB sessions may show excessive waits for "ASM file metadata operation" / "KSV master wait" waits for systems NOT running on Exadata due to calls into ASM to query "cell.smart" data on non-Exadata systems. Such calls are not needed.
Rediscovery Notes: ASM is used. Exadata is NOT used. The waiting DB sessions show stacks including kcfis_tablespace_is_on_sage().

If you do not use Exadata, the workaround for this error is to set "_allow_cell_smart_scan_attr" = false; the parameter is TRUE by default.

↧

“File Not Found” Errors Running RunInstaller or Setup.exe on 11gRx

November 22, 2012, 7:09 am

I see many threads on the OTN forums about "File Not Found" errors during 11gR1 and 11gR2 installations.

When you start the installation by running runInstaller or setup.exe, you can hit "File Not Found" errors for the WFMLRSVCApp.ear, WFMGRApp.ear and WFALSNRSVCApp.ear files.

This is a known issue. The error means the downloaded install images were unzipped incorrectly: all parts must be unzipped into the same directory.

For 11gRx on Unix, you can follow:

$ mkdir setup
$ mv aix.ppc64_11gR1_database_disk1.zip setup
$ mv aix.ppc64_11gR1_database_disk2.zip setup
$ cd setup
$ unzip aix.ppc64_11gR1_database_disk1.zip
$ unzip aix.ppc64_11gR1_database_disk2.zip
$ cd database
$ ./runInstaller

For Windows, you can follow:

- create a directory C:\tmp\Setup

- copy win32_11gR2_database_1of2.zip and win32_11gR2_database_2of2.zip to C:\tmp\Setup

- unzip both win32_11gR2_database_1of2.zip and win32_11gR2_database_2of2.zip into C:\tmp\Setup

Source:
Installing 11G : “File Not Found” Errors Running RunInstaller or Setup.exe (WFMLRSVCApp.ear, WFMGRApp.ear, WFALSNRSVCApp.ear) [ID 468771.1]

↧

MRP0: Background Media Recovery terminated with error 1111

December 4, 2012, 12:32 am

I have a Data Guard configuration with a 2-node RAC (11.2.0.2) on AIX 6.1 as the primary and a standalone standby database.

I noticed that the primary could ship archive logs to the standby, but the standby could not apply them and reported errors in its alert.log such as:
MRP0: Background Media Recovery terminated with error 1111 and MRP0: Background Media Recovery process shutdown (PROD00DG)

In the primary alert.log:
------------------------
ARC8: Archive log rejected (thread 1 sequence 75698) at host 'PROD00DG_private_odm_izm'
FAL[server, ARC8]: FAL archive failed, see trace file.
ARCH: FAL archive failed. Archiver continuing
ORACLE Instance PROD001 – Archival Error. Archiver continuing.

I noticed that MRP had stopped on the standby, so I tried to restart managed recovery.

On standby:
—————
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;

at alert:
————–
Slave exiting with ORA-1111 exception
Errors in file /oracle11g/app/oracle/diag/rdbms/PROD00dg/PROD00DG/trace/PROD00DG_pr00_17891406.trc:
ORA-01111: name for data file 1285 is unknown – rename to correct file
ORA-01110: data file 1285: ‘/oracle11g/app/oracle/11.2.0/dbs/UNNAMED01285′
ORA-01157: cannot identify/lock data file 1285 – see DBWR trace file
ORA-01111: name for data file 1285 is unknown – rename to correct file
ORA-01110: data file 1285: ‘/oracle11g/app/oracle/11.2.0/dbs/UNNAMED01285′
Recovery Slave PR00 previously exited with exception 1111
MRP0: Background Media Recovery process shutdown (PROD00DG)

at standby trace:
———————-
MRP0: Background Media Recovery terminated with error 1111
ORA-01111: name for data file 1285 is unknown – rename to correct file
ORA-01110: data file 1285: ‘/oracle11g/app/oracle/11.2.0/dbs/UNNAMED01285′
ORA-01157: cannot identify/lock data file 1285 – see DBWR trace file
ORA-01111: name for data file 1285 is unknown – rename to correct file
ORA-01110: data file 1285: ‘/oracle11g/app/oracle/11.2.0/dbs/UNNAMED01285′

*** 2012-12-01 19:41:03.428
Completed Media Recovery
Managed Recovery: Not Active posted.

*** 2012-12-01 19:41:04.133
Slave exiting with ORA-1111 exception
ORA-01111: name for data file 1285 is unknown – rename to correct file
ORA-01110: data file 1285: ‘/oracle11g/app/oracle/11.2.0/dbs/UNNAMED01285′
ORA-01157: cannot identify/lock data file 1285 – see DBWR trace file
ORA-01111: name for data file 1285 is unknown – rename to correct file
ORA-01110: data file 1285: ‘/oracle11g/app/oracle/11.2.0/dbs/UNNAMED01285′
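
The same ORA-01111 message is repeated many times in the alert log and trace files, so it helps to reduce it to a list of affected file numbers. This is a sketch under the assumption that the message text matches the excerpt above; unnamed_datafiles is a hypothetical helper name.

```shell
# unnamed_datafiles: hypothetical helper.
# Reads an alert log or trace file on standard input and prints each file
# number reported by ORA-01111 once, in ascending order.
unnamed_datafiles() {
    sed -n 's/^ORA-01111: name for data file \([0-9]*\) is unknown.*/\1/p' | sort -n -u
}
```

Usage: `unnamed_datafiles < PROD00DG_pr00_17891406.trc` should print only the file numbers that need attention.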

I checked the output of the following queries:

a.) select file#, error from v$recover_file;
b.) select file#, name, status from v$datafile;

The output:
------------------

SQL> select file#, error from v$recover_file;

FILE# ERROR
———- —————————–
1268
1269
1270
1275
1276
1277
1278
1279
1280
1281
1282

FILE# ERROR
———- ——————————
1283
1284
1285 FILE MISSING

SQL> select file#, name, status from v$datafile;

file# name status
—— ———– ———
1285 /oracle11g/app/oracle/11.2.0/dbs/UNNAMED01285 RECOVER

After some searching, I found the MOS note Recovering the primary database's datafile using the physical standby, and vice versa [ID 453153.1].

A backup of the single datafile can be taken on the primary and then restored on the standby database, as described in that note.

The note walks you through the process about halfway down, in the section titled:
"Recovering the Standby's Datafile"

I followed these steps:

1. Back up the affected datafile on the primary

On primary:
--------------------
$ rman target /

RMAN> backup datafile 1285 format '/tmp/1285_pr.bk' tag 'PRIMARY_1285';

2. Transfer the file to the standby site using an operating system utility such as scp, NFS, ftp etc

3. At the standby site, catalog the backup piece and confirm it is available for use:

On standby:
--------------------
$ rman target /

RMAN> catalog backuppiece '/tmp/1285_pr.bk';
RMAN> list backuppiece '/tmp/1285_pr.bk';
RMAN> list backup of datafile 1285;

4. Stop redo apply on the physical standby database

On standby:
——————–
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;

In my case redo apply had already stopped.

5. On the standby site restore the datafile:

On standby:
——————–
$ rman target /
RMAN> restore datafile 1285;

At step 5 I got an error:

RMAN> restore datafile 1285;

Starting restore at 01-DEC-12
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=438 device type=DISK

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of restore command at 12/01/2012 21:46:50
RMAN-06085: must use SET NEWNAME command to restore datafile /oracle11g/app/oracle/11.2.0/dbs/UNNAMED01285

So I needed to run the command below to restore datafile 1285:

RUN {
SET NEWNAME FOR DATAFILE 1285 TO '+ORADATA';
RESTORE DATAFILE 1285;
SWITCH DATAFILE 1285;
}
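
When several datafiles are affected, typing this block repeatedly is error-prone. The sketch below only prints the RMAN block for review; it does not connect to RMAN or run anything. gen_restore_block is a hypothetical helper, and the file number and disk group are whatever you pass in.

```shell
# gen_restore_block FILE_NO DISKGROUP
# Hypothetical helper: prints the RMAN RUN block for one datafile so it can
# be reviewed and then pasted into an RMAN session on the standby.
gen_restore_block() {
    printf "RUN {\n"
    printf "SET NEWNAME FOR DATAFILE %s TO '%s';\n" "$1" "$2"
    printf "RESTORE DATAFILE %s;\n" "$1"
    printf "SWITCH DATAFILE %s;\n" "$1"
    printf "}\n"
}
```

Usage: `gen_restore_block 1285 +ORADATA` prints the block used in this post.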

6. Check the status of the files and restart redo apply on the physical standby database

On standby:
——————–
a.) select file#, error from v$recover_file;
b.) select file#, name, status from v$datafile;

On standby:
——————–
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT;

From the log:

Successfully added datafile 1285 to media recovery
Datafile #1285: ‘+ORADATA/PROD00dg/datafile/tPRODspace_2012_ernst.1574.800921533′

Successfully added datafile 1285 to media recovery
Datafile #1286: '+ORADATA/PROD00dg/datafile/tPRODspace_2012_ernst.1574.800921533'

SQL> select thread#, max(sequence#) from v$archived_log where APPLIED='YES' group by thread#;

THREAD# MAX(SEQUENCE#)
———- ————–
1 75677
2 72871

Reference:
—————
Recovering the primary database’s datafile using the physical standby, and vice versa [ID 453153.1]
How to Recover from a Lost or Deleted Datafile with Different Scenarios [ID 198640.1]
MRP0: Background Media Recovery terminated with error 1274 [ID 739618.1]

↧

How to Stop/Start RAC components

January 3, 2013, 6:50 am

In this article, I am going to explain how to stop and start RAC components. Here is my system:

My db version : 11.2.0.3
My Operating System : AIX 7.1
My servers hostname : node1-node2
My database name : TEST01
My instance name : TEST011-TEST012

Here are some basic commands. For command details and options, please review the reference documents listed at the end of this post:

Checking CRS Status
[oracle@node1]</home/oracle> crsctl check crs

CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online

[oracle@node2]</home/oracle> crsctl check crs

CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
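
For monitoring scripts it is handy to turn this output into a single yes/no answer. The sketch below is an illustration, not an Oracle tool: crs_all_online is a hypothetical helper that assumes every healthy line ends with "is online", as in the output above.

```shell
# crs_all_online: hypothetical helper.
# Reads the output of "crsctl check crs" on standard input. Exit status is 0
# only if there is at least one CRS- line and every CRS- line ends with
# "is online"; otherwise the exit status is 1.
crs_all_online() {
    awk '
        /^CRS-/ { total++; if ($0 ~ /is online$/) ok++ }
        END { exit !(total > 0 && ok == total) }
    '
}
```

Usage: `crsctl check crs | crs_all_online && echo "stack is up"`.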

Checking Node Status

[oracle@node1]</home/oracle> srvctl status nodeapps

VIP node1-vip is enabled
VIP node1-vip is running on node: node1
VIP 192.168.100.101 is enabled
VIP 192.168.100.101 is running on node: node2
Network is enabled
Network is running on node: node1
Network is running on node: node2
GSD is disabled
GSD is not running on node: node1
GSD is not running on node: node2
ONS is enabled
ONS daemon is running on node: node1
ONS daemon is running on node: node2

[oracle@node2]</home/oracle> srvctl status nodeapps

Checking Clusterware Resource Status
[oracle@node1]</home/oracle> crsctl status resource -t

I will not paste the result here because the output is not readable on this page.

You can also use the command below, although it is deprecated and not recommended for 11g:

[oracle@node1]</home/oracle> crs_stat -t
Name Type Target State Host
————————————————————
ora….DATA.dg ora….up.type ONLINE ONLINE node1
ora….ER.lsnr ora….er.type ONLINE ONLINE node1
ora….N1.lsnr ora….er.type ONLINE ONLINE node1
ora….N2.lsnr ora….er.type ONLINE ONLINE node2
ora.ORADATA.dg ora….up.type ONLINE ONLINE node1
ora.asm ora.asm.type ONLINE ONLINE node1
ora.cvu ora.cvu.type ONLINE ONLINE node2
ora.gsd ora.gsd.type OFFLINE OFFLINE
ora….SM1.asm application ONLINE ONLINE node1
ora….11.lsnr application ONLINE ONLINE node1
ora….b11.gsd application OFFLINE OFFLINE
ora….b11.ons application ONLINE ONLINE node1
ora….b11.vip ora….t1.type ONLINE ONLINE node1
ora….SM2.asm application ONLINE ONLINE node2
ora….12.lsnr application ONLINE ONLINE node2
ora….b12.gsd application OFFLINE OFFLINE
ora….b12.ons application ONLINE ONLINE node2
ora….b12.vip ora….t1.type ONLINE ONLINE node2
ora….network ora….rk.type ONLINE ONLINE node1
ora.oc4j ora.oc4j.type ONLINE ONLINE node2
ora.ons ora.ons.type ONLINE ONLINE node1
ora.test01.db ora….se.type ONLINE ONLINE node1
ora….int.svc ora….ce.type ONLINE ONLINE node2
ora….int.svc ora….ce.type ONLINE ONLINE node2
ora….kis.svc ora….ce.type ONLINE ONLINE node2
ora….est.svc ora….ce.type ONLINE ONLINE node1
ora….ry.acfs ora….fs.type ONLINE ONLINE node1
ora.scan1.vip ora….ip.type ONLINE ONLINE node1
ora.scan2.vip ora….ip.type ONLINE ONLINE node2
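
In a long listing like this, the interesting rows are the ones where Target and State disagree. The following sketch filters them out. crs_mismatch is a hypothetical helper, and it assumes the five-column layout shown above (Name, Type, Target, State, Host).

```shell
# crs_mismatch: hypothetical helper.
# Reads "crs_stat -t" output on standard input and prints the name of every
# resource whose Target and State columns differ, for example Target ONLINE
# but State OFFLINE. Header and separator lines are skipped because their
# third field is neither ONLINE nor OFFLINE.
crs_mismatch() {
    awk '($3 == "ONLINE" || $3 == "OFFLINE") && $3 != $4 { print $1 }'
}
```

Usage: `crs_stat -t | crs_mismatch`. Resources such as ora.gsd that are intentionally OFFLINE/OFFLINE are not reported.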

Oracle High Availability Services

- Disable/enable Oracle HAS.
Use the "crsctl enable has" and "crsctl disable has" commands to enable or disable automatic startup of the Oracle High Availability Services stack when the server boots up.

To see the current autostart setting of the Oracle High Availability Services stack, run:

[root@node1]crsctl config has
CRS-4622: Oracle High Availability Services autostart is enabled.

[root@node1]cat /etc/oracle/scls_scr/node1/root/ohasdstr
enable

As you can see, my current setting is enable. To change the setting:

For Disable:
[root@node1]crsctl disable has
CRS-4621: Oracle High Availability Services autostart is disabled.

[root@node1] crsctl config has
CRS-4621: Oracle High Availability Services autostart is disabled.

# cat /etc/oracle/scls_scr/node1/root/ohasdstr
disable

For Enable:
[root@node1]crsctl enable has
CRS-4622: Oracle High Availability Services autostart is enabled.

Check new setting:

[root@node1] crsctl config has
CRS-4622: Oracle High Availability Services autostart is enabled.

[root@node1] cat /etc/oracle/scls_scr/node1/root/ohasdstr
enable
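
If you want to check the setting from a script without calling crsctl, the file shown above can be read directly. This is only a sketch: has_autostart is a hypothetical helper, and it assumes the file holds the single word "enable" or "disable" as in the outputs above.

```shell
# has_autostart FILE
# Hypothetical helper: prints "enabled" or "disabled" depending on the
# content of an ohasdstr-style file. Prints "unknown" and returns 1 for
# any other content.
has_autostart() {
    case "$(tr -d ' \t\r\n' < "$1")" in
        enable)  echo enabled ;;
        disable) echo disabled ;;
        *)       echo unknown; return 1 ;;
    esac
}
```

Usage: `has_autostart /etc/oracle/scls_scr/node1/root/ohasdstr`.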

Stop the Oracle clusterware stack

You can use either of the commands below.

As the root user:

crsctl stop crs or crsctl stop has

[root@node1]crsctl stop has
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on ‘node1′
CRS-2673: Attempting to stop ‘ora.crsd’ on ‘node1′
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on ‘node1′
CRS-2673: Attempting to stop ‘ora.LISTENER_SCAN2.lsnr’ on ‘node1′
CRS-2673: Attempting to stop ‘ora.LISTENER.lsnr’ on ‘node1′
CRS-2673: Attempting to stop ‘ora.test01.db’ on ‘node1′
CRS-2673: Attempting to stop ‘ora.LISTENER_SCAN3.lsnr’ on ‘node1′
CRS-2677: Stop of ‘ora.LISTENER_SCAN2.lsnr’ on ‘node1′ succeeded
CRS-2673: Attempting to stop ‘ora.scan2.vip’ on ‘node1′
CRS-2677: Stop of ‘ora.LISTENER.lsnr’ on ‘node1′ succeeded
CRS-2673: Attempting to stop ‘ora.node1.vip’ on ‘node1′
CRS-2677: Stop of ‘ora.LISTENER_SCAN3.lsnr’ on ‘node1′ succeeded
CRS-2673: Attempting to stop ‘ora.scan3.vip’ on ‘node1′
CRS-2677: Stop of ‘ora.node1.vip’ on ‘node1′ succeeded
CRS-2672: Attempting to start ‘ora.node1.vip’ on ‘node2′
CRS-2677: Stop of ‘ora.scan2.vip’ on ‘node1′ succeeded
CRS-2672: Attempting to start ‘ora.scan2.vip’ on ‘node2′
CRS-2677: Stop of ‘ora.scan3.vip’ on ‘node1′ succeeded
CRS-2672: Attempting to start ‘ora.scan3.vip’ on ‘node2′
CRS-2676: Start of ‘ora.node1.vip’ on ‘node2′ succeeded
CRS-2677: Stop of ‘ora.test01.db’ on ‘node1′ succeeded
CRS-2676: Start of ‘ora.scan2.vip’ on ‘node2′ succeeded
CRS-2672: Attempting to start ‘ora.LISTENER_SCAN2.lsnr’ on ‘node2′
CRS-2676: Start of ‘ora.scan3.vip’ on ‘node2′ succeeded
CRS-2672: Attempting to start ‘ora.LISTENER_SCAN3.lsnr’ on ‘node2′
CRS-2676: Start of ‘ora.LISTENER_SCAN2.lsnr’ on ‘node2′ succeeded
CRS-2676: Start of ‘ora.LISTENER_SCAN3.lsnr’ on ‘node2′ succeeded
CRS-2673: Attempting to stop ‘ora.ons’ on ‘node1′
CRS-2673: Attempting to stop ‘ora.eons’ on ‘node1′
CRS-2677: Stop of ‘ora.ons’ on ‘node1′ succeeded
CRS-2673: Attempting to stop ‘ora.net1.network’ on ‘node1′
CRS-2677: Stop of ‘ora.net1.network’ on ‘node1′ succeeded
CRS-2677: Stop of ‘ora.eons’ on ‘node1′ succeeded
CRS-2792: Shutdown of Cluster Ready Services-managed resources on ‘node1′ has completed
CRS-2677: Stop of ‘ora.crsd’ on ‘node1′ succeeded
CRS-2673: Attempting to stop ‘ora.mdnsd’ on ‘node1′
CRS-2673: Attempting to stop ‘ora.gpnpd’ on ‘node1′
CRS-2673: Attempting to stop ‘ora.cssdmonitor’ on ‘node1′
CRS-2673: Attempting to stop ‘ora.ctssd’ on ‘node1′
CRS-2673: Attempting to stop ‘ora.evmd’ on ‘node1′
CRS-2677: Stop of ‘ora.cssdmonitor’ on ‘node1′ succeeded
CRS-2677: Stop of ‘ora.mdnsd’ on ‘node1′ succeeded
CRS-2677: Stop of ‘ora.gpnpd’ on ‘node1′ succeeded
CRS-2677: Stop of ‘ora.evmd’ on ‘node1′ succeeded
CRS-2677: Stop of ‘ora.ctssd’ on ‘node1′ succeeded
CRS-2673: Attempting to stop ‘ora.cssd’ on ‘node1′
CRS-2677: Stop of ‘ora.cssd’ on ‘node1′ succeeded
CRS-2673: Attempting to stop ‘ora.diskmon’ on ‘node1′
CRS-2673: Attempting to stop ‘ora.gipcd’ on ‘node1′
CRS-2677: Stop of ‘ora.gipcd’ on ‘node1′ succeeded
CRS-2677: Stop of ‘ora.diskmon’ on ‘node1′ succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on ‘node1′ has completed
CRS-4133: Oracle High Availability Services has been stopped.

Start the Oracle clusterware stack

You can use either of the commands below.

As the root user:
crsctl start crs or crsctl start has

[root@node1] crsctl start crs
CRS-4123: Oracle High Availability Services has been started.

Start the Oracle Database
To start all Oracle RAC instances for a database:
[oracle@node1]</home/oracle> $ORACLE_HOME/bin/srvctl start database -d db_name

PS: db_name is the name of the database; this command starts all the instances.

Stop the Oracle Database
To shut down all Oracle RAC instances for a database:
[oracle@node1]</home/oracle> $ORACLE_HOME/bin/srvctl stop database -d db_name

PS: db_name is the name of the database; this command stops all the instances.

Start the Oracle Instance:

[oracle@node1]</home/oracle> $ORACLE_HOME/bin/srvctl start instance -d db_name -i instance_name

Stop the Oracle Instance:

[oracle@node1]</home/oracle> $ORACLE_HOME/bin/srvctl stop instance -d db_name -i instance_name

Stop/Start Listener-SCAN_LISTENER

srvctl stop/start listener -n node1
srvctl stop/start listener -n node2
srvctl stop scan_listener

Stop ASM

srvctl stop asm [-o stop_options] [-f]
srvctl stop asm -n node1

Reference:
10gR2, 11gR1 and 11gR2 Oracle Clusterware (CRS / Grid Infrastructure) & RAC Command (crsctl, srvctl, cluvfy etc) Syntax and Reference [ID 1332452.1]
11gR2 Clusterware and Grid Home – What You Need to Know [ID 1053147.1]
http://download.oracle.com/docs/cd/E11882_01/rac.112/e16795.pdf

↧

ORA-01578: ORACLE data block corrupted with ORA-01110: data file #: string

February 21, 2013, 12:50 am

You may see this error in your alert.log. It looks similar to:

ORA-01578: ORACLE data block corrupted (file # 12, block # 4207)

ORA-01110: data file 12: '/xx/system.dbf'

There are many possible causes of a block corruption including:

- Bad IO hardware / firmware
- OS problems
- Oracle problems
- Recovering through "UNRECOVERABLE" or "NOLOGGING" database actions
(in which case ORA-1578 is expected behaviour)

In 9i and 10g, the view V$DATABASE_BLOCK_CORRUPTION was populated only when the RMAN backup validate / check logical validate command was run.

The information in the view was refreshed only after the corruption was repaired (media recovery or object dropped) and the RMAN backup validate / check logical validate command was re-run on the database or the affected datafile.

With 11g this behavior has changed. When any database utility or process encounters an intrablock corruption, it automatically records it in V$DATABASE_BLOCK_CORRUPTION.

The repair removes metadata about corrupt blocks from the view.

Repair techniques include:
- block media recovery,
- restoring datafiles,
- recovering by means of incremental backups, and block newing.

Do not forget: block media recovery can repair physical corruptions, but not logical corruptions.

Database utilities which populate V$DATABASE_BLOCK_CORRUPTION on detecting corruption:

- Analyze table .. Validate structure
- Dbverify
- CTAS(Create table as Select)
- Export

Checking for Block Corruption with the VALIDATE Command
---------------------------------------------------------
Syntax for the RMAN VALIDATE command:

For Database :
------------------
RMAN> validate database;

For Datafile :
------------------
RMAN> validate datafile <file no>,<file no> ;

For Data block :
------------------
RMAN> validate datafile <file no> block <block no> ;

Archivelog restores for BMR can be run in parallel on multiple channels, but datafile/backupset scans and the recovery session must all run in the same server session.

To let you choose which backup the blocks are taken from, the blockrecover command supports options used in the restore command:

FROM BACKUPSET : restore blocks from backupsets only
FROM DATAFILECOPY : restore blocks from datafile copies only
FROM TAG : restore blocks from tagged backup
RESTORE UNTIL TIME|SCN|LOGSEQ

So after validation, how can we recover the corrupted blocks? Here are some examples:

Recovery using Explicit File/Block:
------------------------------------
$ rman target / log=rman1.log
RMAN> run {blockrecover datafile 12 block 4207;}

Recovery using Corruption list :
------------------------------------
$ rman target / log=rman1.log
RMAN> run {blockrecover corruption list;}
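
The file and block numbers for the explicit form come straight from the ORA-01578 message, so they can be extracted mechanically. The sketch below prints the matching RMAN commands for review and executes nothing. blockrecover_cmds is a hypothetical helper that assumes the message format shown at the top of this post.

```shell
# blockrecover_cmds: hypothetical helper.
# Reads an alert log on standard input. For each line of the form
#   ORA-01578: ORACLE data block corrupted (file # F, block # B)
# it prints the RMAN command "blockrecover datafile F block B;".
blockrecover_cmds() {
    sed -n 's/^ORA-01578: ORACLE data block corrupted (file # \([0-9]*\), block # \([0-9]*\)).*/blockrecover datafile \1 block \2;/p'
}
```

Usage: `blockrecover_cmds < alert_SID.log`, where alert_SID.log stands for your own alert log. Check each printed command against V$DATABASE_BLOCK_CORRUPTION before running it in RMAN.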

There are many documents available on Metalink that explain the concepts in depth, with corruption examples. I strongly suggest reviewing the notes below when you hit similar errors on your system:

Handling Oracle Block Corruptions in Oracle7/8/8i/9i/10g/11g [ID 28814.1]
Master Note for Handling Oracle Database Corruption Issues [ID 1088018.1]
Data Recovery Advisor – Corruption Reference Guide [ID 1317849.1]
RMAN : Block-Level Media Recovery – Concept & Example [ID 144911.1]
OERR: ORA-1578 “ORACLE data block corrupted (file # %s, block # %s)” Master Note [ID 1578.1]
HOW TO TROUBLESHOOT AND RESOLVE an ORA-1110 [ID 434013.1]
11g New Feature V$Database_block_corruption Enhancements and Rman Validate Command [ID 471716.1]

↧

Recreating the Spfile for RAC Instances.

March 6, 2013, 7:21 am

In this post, I am going to explain how to recreate the spfile for RAC instances.

We had an issue on our EBS R12.1.3 system, which runs on an 11.2.0.1 2-node RAC on AIX 6.1, and we needed to recreate our spfile on ASM. Here are the steps:

By default the init<SID>.ora file is located in the $ORACLE_HOME/dbs path.

In this case the parameter file was saved to /u01/app/oracle/11.2.0/dbs/initPROD001.ora

1. After correcting the parameters that caused the issue, use this text-based parameter file to start one of the RAC instances to the mount phase.

SQL> startup mount pfile='/u01/app/oracle/11.2.0/dbs/initPROD001.ora';

2. The current location of the spfile is +oradata/PROD00/spfilePROD00.ora, so the new file must replace this one. ASM itself stores the spfile in +oradata/PROD00/PARAMETERFILE/spfile.267.737949031 and aliases it at +oradata/PROD00/spfilePROD00.ora

We can also check where the spfile is in the ASM disk group by using ASMCMD.

For example (do not forget to set the ASM environment):

ASMCMD [+oradata/PROD00] > ls -l
Type Redund Striped Time Sys Name
Y CHANGETRACKING/
Y CONTROLFILE/
Y DATAFILE/
Y ONLINELOG/
Y PARAMETERFILE/
Y TEMPFILE/
N spfilePROD00.ora => +ORADATA/PROD00/PARAMETERFILE/spfile.267.737949031
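
To compare the alias target before and after recreating the spfile, it is enough to keep the text after "=>". The sketch below does that. spfile_target is a hypothetical helper that assumes the listing format shown above; it also handles the case where the target wraps onto the next line.

```shell
# spfile_target: hypothetical helper.
# Reads "ls -l" output from ASMCMD on standard input and prints the file
# that each alias points to (the text after "=>"). If the line ends with
# "=>" because the output wrapped, the target is taken from the next line.
spfile_target() {
    awk '/=>/ {
        if ($NF == "=>") { if ((getline) > 0) print $1 }
        else print $NF
    }'
}
```

Usage: save the listing to a file such as before.txt (an assumed name), run `spfile_target < before.txt`, and repeat after step 4 to confirm that the alias points to a new file.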

3. Before you start, make sure that one of the database instances in the RAC system is mounted, so that the spfile can be recreated.

SQL> select INSTANCE_NAME,HOST_NAME,STATUS from v$instance;

INSTANCE_NAME HOST_NAME STATUS
—————- —————– ————
PROD001 orapdb11 mount

4. Create the new spfile

SQL> create spfile='+oradata/PROD00/spfilePROD00.ora' from pfile='/u01/app/oracle/11.2.0/dbs/initPROD001.ora';
File created.

5. ASMCMD will show that a new spfile has been created: the alias spfilePROD00.ora now points to a new spfile under the PARAMETERFILE directory in ASM.

ASMCMD> pwd
+oradata/PROD00
ASMCMD> ls -l
Type Redund Striped Time Sys Name
Y CONTROLFILE/
Y DATAFILE/
Y ONLINELOG/
Y PARAMETERFILE/
Y TEMPFILE/
N spfilePROD00.ora => +oradata/PROD00/PARAMETERFILE/spfile.298.731252301

6. Shut down the instance and restart the database with srvctl so that it uses the newly created spfile.

SQL> shutdown immediate
ORA-01109: database not open
Database dismounted.
ORACLE instance shut down.

[oracle@node1 ~]$ srvctl start database -d PROD00

7. ASMCMD will now show that more than one spfile exists in the PARAMETERFILE directory for this database. We need to remove the old spfile.

ASMCMD> pwd
+oradata/PROD00

ASMCMD> cd PARAMETERFILE

ASMCMD [+oradata/PROD00/PARAMETERFILE] > ls -l
Type Redund Striped Time Sys Name
PARAMETERFILE UNPROT COARSE MAR 06 16:00:00 Y spfile.267.737949031
PARAMETERFILE UNPROT COARSE MAR 06 16:40:00 Y spfile.298.731252301
ASMCMD> rm spfile.267.737949031
ASMCMD> ls
spfile.298.731252301

Reference:
Recreating the Spfile for RAC Instances Where the Spfile is Stored in ASM [ID 554120.1]

↧

RCA: LMON (ospid: XXXX) detects hung instances during IMR reconfiguration

March 21, 2013, 7:09 am

We have an 11.2.0.1.6 2-node RAC system that belongs to an R12.1.3 EBS system.

Node 1 went down with the errors below:
opiodr aborting process unknown ospid (11075852) as a result of ORA-28
LMON (ospid: 63767522) detects hung instances during IMR reconfiguration
LMON (ospid: 63767522) tries to kill the instance 2.
Please check instance 2′s alert log and LMON trace file for more details.
Tue Mar 19 10:58:36 2013
USER (ospid: 32900426): terminating the instance due to error 481
Tue Mar 19 10:58:36 2013
Errors in file /oracle11g/PROD00/db/diag/rdbms/PROD00/PROD001/trace/PROD001_lmon_63767522.trc:
ORA-29702: error occurred in Cluster Group Service operation
System state dump is made for local instance
System State dumped to trace file /oracle11g/PROD00/db/diag/rdbms/PROD00/PROD001/trace/PROD001_diag_9373174.trc
Instance terminated by USER, pid = 32900426
Error Codes
—————————————————

From: PROD001_lmon_63767522.trc

*** 2013-03-19 10:55:00.531

* DRM RCFG called (swin 1)
CGS recovery timeout = 85 sec
Begin DRM(5108) (swin 1)

*** 2013-03-19 10:57:11.547
kjxgmrcfg: Reconfiguration started, type 6
CGS/IMR TIMEOUTS:
CSS recovery timeout = 31 sec (Total CSS waittime = 65)
IMR Reconfig timeout = 75 sec
CGS rcfg timeout = 85 sec
kjxgmcs: Setting state to 274 0.

*** 2013-03-19 10:57:11.567
Name Service frozen

..
* kjfcln: DRM aborted due to CGS rcfg.

*** 2013-03-19 10:57:16.439
*** 2013-03-19 10:57:41.514
=====================================================
kjxgmpoll: CGS state (274 1) start 0x51482867 cur 0x51482885 rcfgtm 30 sec
=====================================================
Group name: PROD00
Member id: node 0 inst 1
Cached KGXGN event: 0
Group State:
State: 274 1
Flags: 0x4 SSFlags: 0x0
Reconfig started cur-tm 0x6aba6ec8 start-tm 0x6ab9fc80 tmout 0x55 state 0x2
Reconfig INPG type 6 inc 274 rsn 5 data 0x1
Reconfig COMP type 6 inc 274 rsn 5 data 0x1
..

*** 2013-03-19 10:58:31.632
=====================================================
kjxgmpoll: CGS state (274 1) start 0x51482867 cur 0x514828b7 rcfgtm 80 sec

*** 2013-03-19 10:58:36.664
=====================================================
kjxgmpoll: CGS state (274 1) start 0x51482867 cur 0x514828bc rcfgtm 85 sec
kjxgmpoll: the CGS reconfiguration has spent 85 seconds.
kjxgmpoll: terminate the CGS reconfig.
Error: Cluster Group Service reconfiguration takes too long
LMON caught an error 29702 in the main loop
error 29702 detected in background process
ORA-29702: error occurred in Cluster Group Service operation

The LMON trace files contain many "drm quiesce hang" messages:
find . -name "*lmon*.trc" | xargs grep -i "quiesce hang"
./oracle/PROD001_lmon_63767522.trc:* Request pseudo reconfig due to drm quiesce hang
./oracle/PROD001_lmon_63767522.trc:* Request pseudo reconfig due to drm quiesce hang
./oracle/PROD002_lmon_14221454.trc:* Request pseudo reconfig due to drm quiesce hang
./oracle/PROD002_lmon_14221454.trc:* Request pseudo reconfig due to drm quiesce hang
./oracle/PROD002_lmon_14221454.trc:* Request pseudo reconfig due to drm quiesce hang
./oracle/PROD002_lmon_14221454.trc:* Request pseudo reconfig due to drm quiesce hang
./oracle/PROD002_lmon_14221454.trc:* Request pseudo reconfig due to drm quiesce hang
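
To compare how often each instance logged the message, the same search can be turned into a per-file count. This is only a minimal sketch: the function name count_quiesce_hangs is hypothetical, and it assumes the trace files sit under a single directory tree.

```shell
# Print "file: count" for every LMON trace file under a directory,
# where count is the number of lines mentioning "quiesce hang".
# Usage: count_quiesce_hangs TRACE_DIR   (defaults to the current directory)
count_quiesce_hangs() {
    find "${1:-.}" -type f -name "*lmon*.trc" | sort | while IFS= read -r f; do
        # grep -c prints 0 (and returns non-zero) when nothing matches
        n=$(grep -ci "quiesce hang" "$f")
        printf '%s: %s\n' "$f" "$n"
    done
}
```

Running count_quiesce_hangs . from the directory used above lists each trace file once with its count, which makes it easier to see which node is affected more.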

Based on these traces, the issue appears to be an occurrence of bug 12879027: LMON gets stuck in DRM quiesce causing intermittent pseudo reconfiguration.
To get the fix for the bug, install the 11.2.0.3 patch set into the RDBMS $ORACLE_HOME and then apply the 11.2.0.3.3 PSU, or a later PSU, on top of it.

More details can be found in MOS note: Bug 12879027 - LMON gets stuck in DRM quiesce causing intermittent pseudo reconfiguration [ID 12879027.8]

↧

ORA-39726: unsupported add/drop column operation on compressed tables

April 1, 2013, 6:57 am

You can hit this error when you drop a column from a compressed table.

On 10.2 there is no fix for this error, but there are some workarounds:

These are:

Upgrade your database to 11gR2
Keep the unused columns in the table
Recreate the table without the columns you are trying to drop
Do not use compression

On 11g, here is my demo:

oracle@helios> create table test_table compress as select * from dba_users;

Table created.

oracle@helios> alter table test_table add name varchar2(30);

Table altered.

oracle@helios> alter table test_table drop column name;
alter table test_table drop column name
*
ERROR at line 1:
ORA-39726: unsupported add/drop column operation on compressed tables

oracle@helios> alter table test_table set unused column name;

Table altered.

oracle@helios> alter table test_table drop unused columns;

Table altered.

If you still hit the same error, you may be hitting bug 9163477.

Reference:
Error ORA-39726 Drop Column Operation On Compressed Tables 10.2 Release [ID 1068820.1]
Bug 9163477 – error ora-39726 while dropping column on compressed partitioned table [ID 9163477.8]

↧

Error starting ORMI-Server. Unable to bind socket: The socket name is already in use.

April 25, 2013, 4:46 am

Error starting ORMI-Server. Unable to bind socket: The socket name is already in use.
Thread-1 WARN http: snmehl_connect: connect failed to (node1:xx): A remote host refused an attempted connect operation. (error = 79)

I faced this error on a 10.2.0.4 RAC system on AIX 6.1. When I tried to start dbconsole, it failed.

[oracle@node1] emctl start dbconsole
Oracle Enterprise Manager 10g Database Control Release 10.2.0.4.0
Copyright (c) 1996, 2007 Oracle Corporation. All rights reserved.

http://node1:1158/em/console/aboutApplication

ps: 0509-048 Flag -o was used with invalid list.
ps: Not a recognized flag: -
Usage: ps [-AMNZaedfklm] [-n namelist] [-F Format] [-o specifier[=header],…]
[-p proclist][-G|-g grouplist] [-t termlist] [-U|-u userlist] [-c classlist] [ -T pid] [ -L pidlist ]
[-@ [wparname] ]
Usage: ps [aceglnsuvwxX] [t tty] [processnumber]
Starting Oracle Enterprise Manager 10g Database Control ………………………..

When I checked the log files under $ORACLE_HOME/node1_Instance1/sysman/log, I noticed 2 error messages:

- Error starting ORMI-Server. Unable to bind socket: The socket name is already in use.
- Thread-1 WARN http: snmehl_connect: connect failed to (node1:xx): A remote host refused an attempted connect operation. (error = 79)

After investigating the issue, I followed these steps:

1. Stop the dbconsole:
$ORACLE_HOME/bin/emctl stop dbconsole

2. Edit the following file
$ORACLE_HOME/oc4j/j2ee/OC4J_DBConsole_<HOST_SID>/config/rmi.xml

Change the rmi-server port to a different port.
For example, change

<rmi-server port="5520"
to
<rmi-server port="5521"

Note: Before changing the port, use the netstat command to check that the new port is not already used by another process.

For instance:
netstat -an|grep 5521

3. Start the dbconsole
$ORACLE_HOME/bin/emctl start dbconsole

After that change, dbconsole started successfully.
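
The netstat check from step 2 can be scripted so that the next unused port is picked automatically. The sketch below is an illustration only: port_in_use and first_free_port are hypothetical helper names, and it assumes the local address is the fourth column of the netstat -an output (written as *.5520 on AIX and 0.0.0.0:5520 on Linux).

```shell
# Return 0 if PORT shows up as a local port in "netstat -an" style
# output read from stdin, 1 otherwise.
port_in_use() {
    port=$1
    awk -v p="$port" '
        # column 4 is assumed to be the local address, e.g. "*.5520" or "0.0.0.0:5520"
        { if ($4 ~ ("[.:]" p "$")) found = 1 }
        END { exit (found ? 0 : 1) }'
}

# Print the first port >= START that does not appear in the netstat snapshot.
# Usage: first_free_port START "$(netstat -an)"
first_free_port() {
    start=$1
    snapshot=$2
    p=$start
    while printf '%s\n' "$snapshot" | port_in_use "$p"; do
        p=$((p + 1))
    done
    echo "$p"
}
```

For example, first_free_port 5521 "$(netstat -an)" prints a candidate value for the port attribute in rmi.xml. The snapshot is taken once, so the port could still be claimed by another process before dbconsole starts.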

Reference:
Problem: Startup: Cannot Start dbconsole and log Shows ‘ORMI-Server address is already being used’ [ID 419586.1]
EMCA or DB Control (DBConsole) Fails with Error starting ORMI-Server [ID 438504.1]

↧

RMAN backup end with ORA-19606&RMAN-03009 Errors

June 13, 2013, 4:29 am

I have an 11.2.0.3.6 standalone database with ASM on an AIX 7.1 system. We use an RMAN script to take full backups of that system.

Our basic full backup script looks like this:

run
{
ALLOCATE CHANNEL C1 device type 'sbt_tape' ;
ALLOCATE CHANNEL C2 device type 'sbt_tape' ;
backup database plus archivelog delete input format= 'DB_SID_DB_%d_%t_%s_%p';
backup archivelog all delete input FORMAT= 'DB_ARC_%d_%t_%s_%p';
delete force noprompt obsolete;
RELEASE CHANNEL C1;
RELEASE CHANNEL C2;
}

We got the error message below at the end of the backup:

.
.
released channel: C1
released channel: C2
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of delete command on default channel at 06/13/2013 05:00:17
ORA-19606: Cannot copy or restore to snapshot control file

Recovery Manager complete.
06/13/13 05:00:22 Finished command. Return code is: 111
06/13/13 05:00:22 ANS1909E The scheduled command failed.

After investigating the issue, we found MOS note Error ORA-19606 on RMAN Delete [ID 1215493.1].

Here are the steps of the solution:

1. Current settings:

RMAN> show all;

CONFIGURE SNAPSHOT CONTROLFILE NAME TO '/u01/app/oracle/product/11.2.0/dbs/snapcf_PROD.f'; # default

Now I set a new name (or location) for RMAN to use for the snapshot controlfile:

RMAN> CONFIGURE SNAPSHOT CONTROLFILE NAME TO '/u01/app/oracle/product/11.2.0/dbs/snapcf_PROD_Database.f';

new RMAN configuration parameters:
CONFIGURE SNAPSHOT CONTROLFILE NAME TO '/u01/app/oracle/product/11.2.0/dbs/snapcf_PROD_Database.f';
new RMAN configuration parameters are successfully stored
starting full resync of recovery catalog
full resync complete

[oracle@PROD]</u01/app/oracle/product/11.2.0/dbs> ls -lrt

-rw-r----- 1 oracle oinstall 7749632 Jun 13 05:00 snapcf_PROD.f
-rw-rw---- 1 oracle oinstall 1544 Jun 13 14:00 hc_PROD.dat
-rw-r----- 1 oracle oinstall 7749632 Jun 13 14:01 snapcf_PROD_Database.f

2. Remove or rename the old snapshot controlfile.

[oracle@PROD]</u01/app/oracle/product/11.2.0/dbs> mv snapcf_PROD.f snapcf_PROD.f_old

3. Crosscheck controlfile copy.

RMAN> crosscheck controlfilecopy '/u01/app/oracle/product/11.2.0/dbs/snapcf_PROD.f';

allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=1347 device type=DISK
validation failed for control file copy
control file copy file name=/u01/app/oracle/product/11.2.0/dbs/snapcf_PROD.f RECID=1 STAMP=815323471
Crosschecked 1 objects

4. Delete the controlfile copy via RMAN.

RMAN> delete expired controlfilecopy '/u01/app/oracle/product/11.2.0/dbs/snapcf_PROD.f';

released channel: ORA_DISK_1
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=1347 device type=DISK
List of Control File Copies
===========================

Key        S Completion Time     Ckp SCN    Ckp Time
---------- - ------------------- ---------- -------------------
1172256811 X 13-05-2013 14:44:31 358151     13-05-2013 14:44:31
Name: /u01/app/oracle/product/11.2.0/dbs/snapcf_PROD.f
Tag: TAG20130513T144431
Do you really want to delete the above objects (enter YES or NO)? YES
deleted control file copy
control file copy file name=/u01/app/oracle/product/11.2.0/dbs/snapcf_PROD.f RECID=1 STAMP=815323471
Deleted 1 EXPIRED objects

5. Current settings:

RMAN> show all;

CONFIGURE SNAPSHOT CONTROLFILE NAME TO '/u01/app/oracle/product/11.2.0/dbs/snapcf_PROD_Database.f';

Now I switch the snapshot controlfile name back to the original one.
PS: You do not need this step if you are fine with keeping the new name.

RMAN> CONFIGURE SNAPSHOT CONTROLFILE NAME TO '/u01/app/oracle/product/11.2.0/dbs/snapcf_PROD.f';

old RMAN configuration parameters:
CONFIGURE SNAPSHOT CONTROLFILE NAME TO '/u01/app/oracle/product/11.2.0/dbs/snapcf_PROD_Database.f';
new RMAN configuration parameters:
CONFIGURE SNAPSHOT CONTROLFILE NAME TO '/u01/app/oracle/product/11.2.0/dbs/snapcf_PROD.f';
new RMAN configuration parameters are successfully stored
starting full resync of recovery catalog
full resync complete
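
The RMAN part of the steps above can be collected into one command file. The helper below is a hypothetical sketch, not part of the MOS note: gen_snapcf_fix is an invented name, and it uses DELETE NOPROMPT instead of the interactive YES answer on the assumption that the commands run unattended.

```shell
# Print the RMAN commands used above for the ORA-19606 workaround.
# $1 = path of the stale snapshot controlfile
# $2 = temporary new snapshot controlfile name
gen_snapcf_fix() {
    old=$1
    new=$2
    # point RMAN at a new snapshot controlfile name
    printf "CONFIGURE SNAPSHOT CONTROLFILE NAME TO '%s';\n" "$new"
    # the old copy was renamed on disk beforehand, so crosscheck marks it EXPIRED
    printf "CROSSCHECK CONTROLFILECOPY '%s';\n" "$old"
    printf "DELETE NOPROMPT EXPIRED CONTROLFILECOPY '%s';\n" "$old"
    # optional: switch back to the original name
    printf "CONFIGURE SNAPSHOT CONTROLFILE NAME TO '%s';\n" "$old"
}
```

After renaming the old file with mv as shown above, the output can be saved to a file and run with rman using your usual target and catalog connection. Review the generated commands before running them against a production catalog.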

Reference:
Error ORA-19606 on RMAN Delete [ID 1215493.1]

↧

ORA-15018-ORA-15030 How to recreate ASM Diskgroup Oracle 10g&11g

August 13, 2013, 3:51 am

Sometimes DBAs need to recreate an ASM diskgroup in their environment. I faced this issue during our disaster recovery test: our ORADATA diskgroup is synchronized to the disaster center, but the ORAFRA diskgroup is not copied there.
Normally, if the disk already exists at the disaster center (in our case the diskgroup is ORAFRA), you can either choose a different name for the diskgroup, or drop the diskgroup and recreate the raw devices. Recreating the raw devices is required to completely remove the ASM metadata.

In my case, the storage team gave me one new raw device at the disaster center and I needed to recreate ORAFRA there.

You can follow the steps below to recreate an ASM diskgroup on 10g and 11g.

The steps differ between 10g and 11g. Here are the steps for 11g:

1. Set the ASM environment.
2. Connect to the ASM instance as sysasm.

# sqlplus "/as sysasm"
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 – 64bit Production
With the Automatic Storage Management option

SQL> CREATE DISKGROUP ORAFRA DISK '/dev/rhdisk21';
CREATE DISKGROUP ORAFRA DISK '/dev/rhdisk21'
*
ERROR at line 1:
ORA-15018: diskgroup cannot be created
ORA-15030 diskgroup namedgrp1 is in use by another diskgroup

3. Run the asmcmd command.

# asmcmd -p
ASMCMD [+] > lsdg
State Type Rebal Sector Block AU Total_MB Free_MB Req_mir_free_MB Usable_file_MB Offline_disks Voting_files Name
MOUNTED EXTERN N 512 4096 1048576 1843200 261487 0 261487 0 N ORADATA/
Our aim is to recreate ORAFRA

ASMCMD [+] > help dropdg
dropdg

Drops a disk group.

dropdg [-r [-f]] diskgroup

The options for the dropdg command are described below.

-f – Force the operation. Only applicable if the disk group
cannot be mounted.
-r – Recursive, include contents.
diskgroup – Name of disk group to drop.

dropdg drops an existing disk group. The disk group cannot be mounted
on more than one node.

These are examples of the use of dropdg. The first example forces the
drop of the disk group data, including any data in the disk group.
The second example drops the disk group fra, including any data in
the disk group.

ASMCMD [+] > dropdg -r -f data
ASMCMD [+] > dropdg -r fra

ASMCMD [+] > dropdg -f -r ORAFRA
ASMCMD [+] > exit

ASMCMD [+] > lsdg
State Type Rebal Sector Block AU Total_MB Free_MB Req_mir_free_MB Usable_file_MB Offline_disks Voting_files Name
MOUNTED EXTERN N 512 4096 1048576 1843200 261487 0 261487 0 N ORADATA/

4. Reconnect to the ASM instance as sysasm.

# sqlplus "/as sysasm"

SQL*Plus: Release 11.2.0.3.0 Production on Tue Aug 13 10:12:15 2013

Copyright (c) 1982, 2011, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 – 64bit Production
With the Automatic Storage Management option

SQL> CREATE DISKGROUP ORAFRA EXTERNAL REDUNDANCY DISK '/dev/rhdisk21';

Diskgroup created.

SQL> exit

Here are the steps for 10g:

1. Set the ASM environment.
2. Connect to the ASM instance as sysdba.

# sqlplus "/as sysdba"

SQL> CREATE DISKGROUP ORAFRA EXTERNAL REDUNDANCY DISK '/dev/rhdisk21' force;

Diskgroup created.

Reference:
ORA-15030 When Creating Diskgroup and Diskgroup Name Already Referenced in ASM_DISKGROUPS Parameter (Doc ID 1536701.1)
ORA-15030 DISKGROUP NAME “DATA” IS IN USE BY ANOTER DISKGROUP (Doc ID 1530864.1)

↧

ksh: /usr/bin/rm: arg list too long

August 15, 2013, 1:19 am

While deleting trace files (xxx.trc), dump files, or other files that fill up your disk, you may hit the error message ksh: /usr/bin/rm: arg list too long:

# cd /u01/app/oracle/admin/DB_NAME/bdump

# rm -rf *.trc

ksh: /usr/bin/rm: arg list too long

In that case you can use the syntax below to delete those files:

# cd DIR_PATH && find . -name "*.trc" -type f -exec rm -f {} \;

or:

# find DIR_PATH -name "*.trc" -type f -exec rm -f {} \;

PS: In our case DIR_PATH=/u01/app/oracle/admin/DB_NAME/bdump
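
With a very large number of files, starting one rm per file can be slow. A possible variant, sketched below, lets find pass the files to rm in batches with the "{} +" form. The function name purge_files is hypothetical, and it assumes your find supports "{} +". Like the commands above, it also descends into subdirectories, unlike rm *.trc.

```shell
# Delete regular files matching PATTERN under DIR without ever building
# one oversized argument list.
# Usage: purge_files DIR PATTERN
purge_files() {
    dir=$1
    pattern=$2
    # "{} +" makes find group as many paths per rm call as the system allows
    find "$dir" -type f -name "$pattern" -exec rm -f {} +
}
```

For example: purge_files /u01/app/oracle/admin/DB_NAME/bdump "*.trc". The pattern must be quoted so that the shell does not expand it before find sees it.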

↧

ex: 0602-101 Out of memory saving lines for undo.

August 15, 2013, 1:29 am

When you try to open the alert_SID.log file with the vi editor, you may hit the error message ex: 0602-101 Out of memory saving lines for undo:

# cd /u01/app/oracle/admin/DB_NAME/bdump

# vi alert_SID.log

ex: 0602-101 Out of memory saving lines for undo.

In that case, set the variable below before opening your alert.log:

export EXINIT="set ll=80000000"

Reference:
Hints and tips when using vi on AIX

↧

LsInventorySession failed: OracleHomeInventory::load() gets null oracleHomeInfo

September 17, 2013, 3:53 am

I faced this error message while trying to apply the 10.2.0.4 patch on AIX 6.1.

Here are my steps:

1. Error Message:

[oracle@test]</u01/app/oracle/product/10.2.0/OPatch>opatch lsinventory

Invoking OPatch 10.2.0.3.0

Oracle interim Patch Installer version 10.2.0.3.0
Copyright (c) 2005, Oracle Corporation. All rights reserved..
Oracle Home : /u01/app/oracle/product/10.2.0
Central Inventory : /u01/app/oracle/oraInventory
from : /etc/oraInst.loc
OPatch version : 10.2.0.3.0
OUI version : 10.2.0.3.0
OUI location : /u01/app/oracle/product/10.2.0/oui
Log file location : /u01/app/oracle/product/10.2.0/cfgtoollogs/opatch/opatch2013-09-17_10-42-49AM.log

List of Homes on this system:

Inventory load failed… OPatch cannot load inventory for the given Oracle Home.
Possible causes are:
Oracle Home dir. path does not exist in Central Inventory
Oracle Home is a symbolic link
Oracle Home inventory is corrupted
LsInventorySession failed: OracleHomeInventory::load() gets null oracleHomeInfo

OPatch failed with error code 73

2. Let's check the log file

[oracle@test]tail -100f /u01/app/oracle/product/10.2.0/cfgtoollogs/opatch/opatch2013-09-17_10-42-49AM.log
SEVERE:OPatch invoked as follows: ‘lsinventory ‘
INFO:
Oracle Home : /u01/app/oracle/product/10.2.0
Central Inventory : /u01/app/oracle/oraInventory
from : /etc/oraInst.loc
OPatch version : 10.2.0.3.0
OUI version : 10.2.0.3.0
OUI location : /u01/app/oracle/product/10.2.0/oui
Log file location : /u01/app/oracle/product/10.2.0/cfgtoollogs/opatch/opatch2013-09-17_10-42-49AM.log

INFO:Starting LsInventorySession at Tue Sep 17 10:42:49 GMT+02:00 2013
INFO:List of Homes on this system:

SEVERE:OUI-67028:Inventory load failed… OPatch cannot load inventory for the given Oracle Home.
Possible causes are:
Oracle Home dir. path does not exist in Central Inventory
Oracle Home is a symbolic link
Oracle Home inventory is corrupted
SEVERE:OUI-67073:LsInventorySession failed: OracleHomeInventory::load() gets null oracleHomeInfo
INFO:Finishing LsInventorySession at Tue Sep 17 10:42:49 GMT+02:00 2013
INFO:Stack Description: java.lang.RuntimeException: OracleHomeInventory::load() gets null oracleHomeInfo
INFO:StackTrace: oracle.opatch.OracleHomeInventory.load(OracleHomeInventory.java:2296)
INFO:StackTrace: oracle.opatch.LsInventorySession.loadAndPrintInventory(LsInventorySession.java:358)
INFO:StackTrace: oracle.opatch.LsInventorySession.process(LsInventorySession.java:312)
INFO:StackTrace: oracle.opatch.OPatchSession.main(OPatchSession.java:1060)
INFO:StackTrace: oracle.opatch.OPatch.main(OPatch.java:516)

3. As you can see, the inventory that oraInst.loc points to is corrupted, so I need to recreate it. You can use note: Steps To Recreate Central Inventory(oraInventory) In RDBMS Homes (Doc ID 556834.1)

First rename the oraInventory folder to oraInventory_orig, then use the syntax below:

./runInstaller -silent -ignoreSysPrereqs -attachHome ORACLE_HOME="/u01/app/oracle/product/10.2.0" ORACLE_HOME_NAME="TEST"

or you can use attachHome.sh, which is under the $ORACLE_HOME/oui/bin path.

When I ran either command I got the message below:

No pre-requisite checks found in oraparam.ini, no system pre-requisite checks will be executed.
The inventory pointer is located at /etc/oraInst.loc
The inventory is located at /u01/app/oracle/oraInventory
'AttachHome' was failed.

We have to check the log file under the oraInventory folder. The log shows:

INFO: Setting variable 'INVENTORY_LOCATION' to '/u01/app/oracle/oraInventory'. Received the value from a code block.
INFO: Created OiicStandardInventorySession.
INFO: Unable to read /oracle10g/app/oracle/oraInventory/ContentsXML/comps.xml. Some inventory information may be lost.
SEVERE: oracle.sysman.oii.oiii.OiiiMissingDependeeException: OUI-10209:Cannot find dependee "oracle.swd.jre 1.1.8.3.0" of component "Oracle Java Client 10.2.0.1.0 " in home "0".
at oracle.sysman.oii.oiii.OiiiInstallInventory.updateCompDependencies(OiiiInstallInventory.java:3848)
at oracle.sysman.oii.oiii.OiiiInstallInventory.attachHomeEx(OiiiInstallInventory.java:3578)
at oracle.sysman.oii.oiic.OiicAttachHome.getOracleHomeInfo(OiicAttachHome.java:164)
at oracle.sysman.oii.oiic.OiicAttachHome.doOperation(OiicAttachHome.java:216)
at oracle.sysman.oii.oiic.OiicBaseInventoryApp.main_helper(OiicBaseInventoryApp.java:767)
at oracle.sysman.oii.oiic.OiicAttachHome.main(OiicAttachHome.java:414)

INFO: 'AttachHome' failed.

4. The solution is available in note: 'OUI-10209:Cannot find dependee' When try to attachHome (Doc ID 1273158.1)

cp $ORACLE_HOME/inventory/ContentsXML/comps.xml $ORACLE_HOME/inventory/ContentsXML/comps.xml_orig

Modify the $ORACLE_HOME/inventory/ContentsXML/comps.xml file so that it contains no references to HOME_IDX="0". In other words, search the XML file for HOME_IDX="0" and remove those lines.

5. Try attachHome again
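
The edit in step 4 can also be done with grep instead of a manual search. This is a rough sketch under one assumption: every HOME_IDX="0" reference sits on a single line of its own, as the note implies. The name strip_home_idx0 is hypothetical.

```shell
# Write a copy of comps.xml without the lines that reference HOME_IDX="0".
# Usage: strip_home_idx0 SOURCE_FILE TARGET_FILE
strip_home_idx0() {
    src=$1
    dst=$2
    # grep -v returns 1 when it outputs nothing; only status 2 or higher is a real error
    grep -v 'HOME_IDX="0"' "$src" > "$dst" || [ $? -eq 1 ]
}
```

For example: strip_home_idx0 comps.xml_orig comps.xml, run inside $ORACLE_HOME/inventory/ContentsXML after the cp above. It is worth comparing the two files with diff before retrying attachHome, because removing whole lines could break the XML if an element spans several lines.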

↧

Message 3511 not found; No message file for product=network, facility=TNSTNS-03505: Message 3505 not found; No message file for product=network, facility=TNS

February 14, 2014, 3:26 am

You may hit the TNS-03505 / Message 3511 error while using the tnsping command. The complete error message is:

Message 3511 not found; No message file for product=network, facility=TNSTNS-03505: Message 3505 not found; No message file for product=network, facility=TNS

You may hit this error on a database machine or on a machine that only has a client installation.

I faced this problem on a machine with an 11g client. The user was not the installation user: we had created another user and given it to the development team.

My installation user is oracle and the development user is oracle1. Both users are in the same group (dba).

With the oracle user tnsping gets a response, but with oracle1 I hit the error below.

With Oracle user:
[oracle@client_machine]</oracle11g/app/oracle/product/11.2.0/client_1/bin>tnsping DB_NAME

TNS Ping Utility for IBM/AIX RISC System/6000: Version 11.2.0.4.0 – Production on 14-FEB-2014 11:08:13

Used parameter files:
/oracle11g/app/oracle/product/11.2.0/client_1/network/admin/sqlnet.ora
Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = xxx)(PORT = 1525)) (ADDRESS = (PROTOCOL = TCP)(HOST = xxx)(PORT = 1525)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = xxx)))
OK (0 msec)

With Oracle1 user:
[oracle1@client_machine]</oracle11g/app/oracle/product/11.2.0/client_1/bin> tnsping DB_NAME

TNS Ping Utility for IBM/AIX RISC System/6000: Version 11.2.0.4.0 – Production on 14-FEB-2014 11:05:05

Message 3511 not found; No message file for product=network, facility=TNSTNS-03505: Message 3505 not found; No message file for product=network, facility=TNS

The issue was solved with the steps below:

For the oracle1 user I set the parameters below in .profile

export ORACLE_BASE=/oracle11g/app/oracle
export ORACLE_HOME=/oracle11g/app/oracle/product/11.2.0/client_1
PATH=$ORACLE_HOME/bin:$ORACLE_HOME/OPatch:$PATH

[oracle1@client_machine]</oracle11g/app/oracle/product/11.2.0/client_1/bin> tnsping DB_NAME
TNS Ping Utility for IBM/AIX RISC System/6000: Version 11.2.0.4.0 – Production on 14-FEB-2014 11:08:13

If you are on a db server then you also need to set ORACLE_SID.
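
A quick way to verify the environment of a second user before running tnsping is sketched below. The function name check_tns_env is hypothetical, and the message file path $ORACLE_HOME/network/mesg/tnsus.msb is an assumption about where the US English TNS messages are stored in this client home.

```shell
# Report whether the current user can find the TNS message file.
check_tns_env() {
    if [ -z "${ORACLE_HOME:-}" ]; then
        echo "ORACLE_HOME is not set"
        return 1
    fi
    # assumed location of the US English TNS message file
    msb="$ORACLE_HOME/network/mesg/tnsus.msb"
    if [ ! -r "$msb" ]; then
        echo "cannot read $msb"
        return 1
    fi
    echo "ok"
}
```

Run as oracle1, a result other than ok points either to the missing ORACLE_HOME setting described above or to file permissions on the client home.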

↧

WAIT FOR EMON PROCESS NTFNS

June 19, 2014, 12:54 am

WAIT FOR EMON PROCESS NTFNS

We faced this event on our 10.2.0.4.0 database, a 2 node RAC system running on AIX 6.1.

We noticed that the EMON process was consuming CPU.

So what is the EMON process? From the Oracle documentation:

EMNC / Ennn ( EMON Coordinator Process / EMON Slave Process )

EMNC coordinates event management and notification activity in the database, including Streams Event Notifications, Continuous Query Notifications, and Fast Application Notifications.

The database event management and notification load is distributed among the EMON slave processes. These processes work on the system notifications in parallel, offering a capability to process a larger volume of notifications, a faster response time, and a lower shared memory use for staging notifications.

So what does this process cause on the database side? Here is the answer:

It causes the WAIT FOR EMON PROCESS NTFNS event, and sessions waiting on that event hang.

The Oracle documentation mentions that this is a bug.

So how do we prevent this event?
We have 2 options:

1. We can kill the EMON process (the note is for 11g, but this also works on 10g). The EMON slave will automatically restart when it is next required.

2. We can set SQLNET.SEND_TIMEOUT=10

It was found that adjusting the SEND_TIMEOUT parameter got the EMN process to stop spinning and de-register the subscribers. If the customer tried to re-register, the subscribers were removed as soon as new changes took place and could not re-register successfully until the EMN process was killed.
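
Option 2 is a one-line change in sqlnet.ora. The sketch below adds the parameter only if it is not already there. The function name set_send_timeout is hypothetical, and it assumes the file to edit is the database server's sqlnet.ora under $ORACLE_HOME/network/admin.

```shell
# Append SQLNET.SEND_TIMEOUT to a sqlnet.ora file unless it is already set.
# Usage: set_send_timeout SQLNET_ORA_FILE [SECONDS]   (default 10)
set_send_timeout() {
    file=$1
    secs=${2:-10}
    if grep -qi '^[[:space:]]*SQLNET\.SEND_TIMEOUT[[:space:]]*=' "$file" 2>/dev/null; then
        echo "SQLNET.SEND_TIMEOUT already present in $file"
        return 0
    fi
    printf 'SQLNET.SEND_TIMEOUT=%s\n' "$secs" >> "$file"
}
```

For example: set_send_timeout "$ORACLE_HOME/network/admin/sqlnet.ora" 10. The function leaves an existing value untouched, so a different timeout that is already configured has to be changed by hand.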

Source:
Event Monitor (EMON) slave process constantly consuming CPU (Doc ID 1603844.1)
Sessions Hang On Wait Event “WAIT FOR EMON PROCESS NTFNS” (Doc ID 1287435.1)
Master Note: Troubleshooting Oracle Background Processes (Doc ID 1509616.1)

↧

CRS-4402: The CSS daemon was started in exclusive mode but found an active CSS daemon on node oda2-1, number 1, and is terminating”

July 9, 2014, 7:25 am

We faced this error during a 2 node RAC installation.

Db version : 11.2.0.4
Os : AIX 7.1

We ran root.sh on node 1 without any issue. When we ran root.sh on node 2 we got this message:

Adding Clusterware entries to inittab
CRS-4402: The CSS daemon was started in exclusive mode but found an active CSS daemon on node Node1, number 1, and is terminating
An active cluster was found during exclusive start-up, restarting to join the cluster
Configure Oracle Grid Infrastructure for a Cluster … succeeded

So we checked the CRS health and also ran cluvfy to be sure our system was healthy

# crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online

# cluvfy stage -post crsinst -n node1,node2 -verbose
# cluvfy comp crs -n node1,node2 -verbose

All checks passed and everything seemed okay. We found some hints in MOS note Troubleshooting 11.2 Grid Infrastructure root.sh Issues (Doc ID 1053970.1).

The note mentions that if root.sh is successful but there are subsequent startup problems, see Note 1050908.1 for more troubleshooting information. So we rebooted the server to see whether there was a problem during startup. After the startup there was no error in our CRS.

So what is this Error message? Here is the answer:

Error CRS-4402 is expected on the second node because CSS is already running on the first node, so CSS has to start in cluster mode on the second node.

Please ignore this error. According to the root.sh output, the clusterware installation completed successfully on both nodes.

Similar explanations are also available at:
Oracle® Database Appliance Getting Started Guide

http://docs.oracle.com/cd/E22693_01/doc.21/e22692/toc.htm

Error “CRS-4402: The CSS daemon was started in exclusive mode but found an
active CSS daemon on node oda2-1, number 1, and is terminating”

Cause: This error occurs when the Oracle Grid Infrastructure CSS daemon attempts to start the node as a standalone cluster node, but during start-up discovers that the other cluster node is running, and changes to cluster mode to join the cluster.
Action: Ignore this error

So, if you hit this error message, just ignore it.

↧

ORA-39002: ORA-39166: during import from full export

August 20, 2014, 1:25 am

As DBAs we use the export/import utility many times in our daily work.
I faced this error while trying to import only one table from a full export.

ORA-39002: invalid operation
ORA-39166: Object USER1.TABLE was not found.

OS: AIX 7.1
DB version : 11.2.0.4 – 2 Node RAC
My syntax was:
impdp system/****@DB_NAME directory=DMP_DIR REMAP_SCHEMA=USER1:restore_user remap_tablespace=USER1_TS:Restorespace tables=USER1.TABLE
dumpfile=MY_RAC_full.dmp logfile=imp.log cluster=n

So it gives:

ORA-39002: invalid operation
ORA-39166: Object USER1.TABLE was not found.

When I checked my full export log, I saw that the table had been exported.

After some searching, the issue was fixed by using the syntax below:
impdp system/****@DB_NAME directory=DMP_DIR REMAP_SCHEMA=USER1:restore_user remap_tablespace=USER1_TS:Restorespace tables="USER1"."TABLE"
dumpfile=MY_RAC_full.dmp logfile=imp.log cluster=n
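
Because the shell may strip or reinterpret double quotes on the command line, one option is to put the same parameters into a parameter file, where the quotes reach impdp unchanged. The sketch below only writes such a file. The function name write_impdp_parfile and the file name imp_table.par are hypothetical, and the values are the ones from this post.

```shell
# Write a Data Pump parameter file with the quoted table name.
# Usage: write_impdp_parfile OUTPUT_FILE
write_impdp_parfile() {
    out=$1
    # quoted heredoc delimiter: the shell leaves the content untouched
    cat > "$out" <<'EOF'
directory=DMP_DIR
dumpfile=MY_RAC_full.dmp
logfile=imp.log
cluster=n
remap_schema=USER1:restore_user
remap_tablespace=USER1_TS:Restorespace
tables="USER1"."TABLE"
EOF
}
```

After write_impdp_parfile imp_table.par, the import would be started with impdp system@DB_NAME parfile=imp_table.par, which also keeps the password off the command line because impdp prompts for it.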

↧

ORA-02097&ORA-00068

September 25, 2014, 11:50 pm

As you know, sometimes we need to change the CPU count dynamically on our production systems.

Our Unix team changed the CPU count of the servers in our 2 node RAC system (version 11.2.0.3) from 20 to 30 on AIX 7.1.

After that change, one of the instances (node 2) went down with the error messages below:
ORA-02097: parameter cannot be modified because specified value is invalid
ORA-00068: invalid value 6360 for parameter parallel_max_servers, must be between 0 and 3600
CKPT (ospid: 22135): terminating the instance due to error 2097

After some searching, to be sure, we raised a severity 1 SR.

Here are the answers:
The default value of parallel_max_servers is calculated with the following equation:

PARALLEL_THREADS_PER_CPU * CPU_COUNT * concurrent_parallel_users * 5

as per : http://docs.oracle.com/cd/E11882_01/server.112/e40402/initparams186.htm#REFRN10158

The maximum value for this parameter is 3600, so in some cases this equation can yield a value higher than 3600.
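
To see how the limit is exceeded, the equation can be evaluated for the CPU count reported in the alert log excerpt at the end of this post (159). The other two inputs are assumptions for illustration only: PARALLEL_THREADS_PER_CPU=2 and concurrent_parallel_users=4. The function names are hypothetical.

```shell
# Default parallel_max_servers according to the equation above.
# Usage: default_parallel_max_servers THREADS_PER_CPU CPU_COUNT CONCURRENT_USERS
default_parallel_max_servers() {
    threads_per_cpu=$1
    cpu_count=$2
    concurrent_users=$3
    echo $((threads_per_cpu * cpu_count * concurrent_users * 5))
}

# Succeeds when the value is above the documented maximum of 3600.
exceeds_limit() {
    [ "$1" -gt 3600 ]
}
```

With these inputs, default_parallel_max_servers 2 159 4 prints 6360, the value rejected in the ORA-00068 message, and exceeds_limit 6360 succeeds.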

Please perform the following action plan to resolve this issue:

1- Lower the value of the parallel_max_servers parameter to a reasonable value, for example 300:

SQL> alter system set parallel_max_servers=300 scope=both sid='*' ;

2- Apply Patch 13743987

If a system has a high CPU count and there is no value set for parallel_max_servers then the system may try to set the default value of the parallel_max_servers parameter higher than is allowed.
eg:
ORA-02097: parameter cannot be modified because specified value is invalid
ORA-00068: invalid value 6360 for parameter parallel_max_servers, must be between 0 and 3600

This can cause an instance shutdown if the number of system CPUs alters dynamically.
eg:
The alert log may show a dynamically detected change of CPU count:
Detected change in CPU count to 159
* Load Monitor used for high load check
* New Low – High Load Threshold Range = [152640 - 203520]
Errors in file /…/trace/+ASM2_ckpt_22135.trc:
ORA-02097: parameter cannot be modified because specified value is invalid
ORA-00068: invalid value 6360 for parameter parallel_max_servers, must be between 0 and 3600
CKPT (ospid: 22135): terminating the instance due to error 2097

↧