Physical Disk - Loss of Path Redundancy

What Caused the Problem?

A communication path with a physical disk has been lost. The Recovery Guru Details area provides specific information you will need as you follow the recovery steps.

Caution: Electrostatic discharge can damage sensitive components. Always use proper antistatic protection when handling components. Touching components without using a proper ground may damage the equipment.

Important Notes

Recovery Steps

1

Fix any other problems reported by the Recovery Guru before attempting to fix this problem.

2

If...

Then...

The affected enclosure listed in the Recovery Guru Details area contains both RAID controller modules and physical disks

Go to step 7.

The affected enclosure listed in the Recovery Guru Details area contains only physical disks

Go to step 3.

3

To determine the non-working channel, start at the physical disk port on the RAID controller module enclosure corresponding to the working channel (refer to the labels on the back of the RAID controller module enclosure if needed). Trace the cable from the working channel to the EMM module in the affected expansion enclosure reported in the details area.

Caution: Possible loss of data accessibility. Do not disconnect any cables on the working channel; doing so may cause loss of data accessibility.

4

Blink the other EMM module in the affected expansion enclosure (this is the module on the non-working channel).

5

Replace the EMM module on the non-working channel using the following steps:

a

Label the interface transceivers (GBICs or SFPs). The labels will help you correctly reconnect the cables to the new EMM module.

While the cables are still connected, remove the interface transceivers from the EMM module you are replacing.

b

Remove the EMM module.

Note: The Service Action Allowed status in the Details area is always NO for this problem because the component has not failed. In this situation, it is acceptable to remove the EMM module even though the Service Action Allowed status is NO.

c

Set all switches on the new EMM module to the same values as the old EMM module.

d

Insert the new EMM module into the expansion enclosure.

e

Using the labels created in step a, reconnect the cables to the replaced module. Wait 40 seconds, then go to step 6.

6

Click the Recheck button to rerun the Recovery Guru. The failure should no longer appear in the Summary area.

If...

Then...

The problem has been fixed

You are finished with this procedure. Do NOT go to step 7.

The problem has not been fixed

Go to step 7.

7

You must replace the physical disk. The procedure you use depends on the RAID level of the disk group associated with the affected physical disk. To determine the associated disk group, highlight the affected physical disk in the Physical View of the Array Management Window and select View >> Associated Elements. Next, highlight the associated disk group in the Logical View of the Array Management Window.

If...

Then...

The disk group is RAID 0

Go to "Recovery Steps for Replacing a Physical Disk in a RAID 0 Disk Group."

The disk group is RAID 1/10, 3, or 5

Go to "Recovery Steps for Replacing a Physical Disk in a RAID 1/10, 3, or 5 Disk Group."
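The decision table in step 7 can be sketched as a small lookup. This is only an illustration for runbooks or scripts that track which procedure applies; the function and constant names are hypothetical and are not part of the storage management software.

```python
# Hypothetical helper: mirrors the step 7 table; not a storage management API.
RAID0_PROCEDURE = (
    "Recovery Steps for Replacing a Physical Disk in a RAID 0 Disk Group"
)
REDUNDANT_PROCEDURE = (
    "Recovery Steps for Replacing a Physical Disk in a RAID 1/10, 3, or 5 Disk Group"
)


def replacement_procedure(raid_level: str) -> str:
    """Return the title of the procedure to follow for a disk group's RAID level."""
    # Accept inputs such as "RAID 0", "raid 5", or just "1/10".
    level = raid_level.strip().upper().replace("RAID", "").strip()
    if level == "0":
        return RAID0_PROCEDURE
    if level in {"1", "10", "1/10", "3", "5"}:
        return REDUNDANT_PROCEDURE
    # Any other level is outside the scope of this recovery procedure.
    raise ValueError(f"RAID level not covered by this procedure: {raid_level!r}")
```

For example, replacement_procedure("RAID 5") returns the title of the RAID 1/10, 3, or 5 procedure, while a level this topic does not list raises ValueError rather than guessing.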

Recovery Steps for Replacing a Physical Disk in a RAID 0 Disk Group

Use the following procedure if the affected disk group is RAID 0.

Fix any other problems reported by the Recovery Guru before continuing with this procedure. Note that all virtual disks in the Logical View of the Array Management Window must be Optimal.

1

Stop all I/O to the affected virtual disks.

2

Reseating the physical disk may clear up the path redundancy problem. Remove the physical disk and then re-insert it.

Note: The Service Action Allowed status in the Details area is always NO for this problem because the component has not failed. In this situation, it is acceptable to remove the physical disk even though the Service Action Allowed status is NO.

3

Wait 40 seconds, and then click the Recheck button to rerun the Recovery Guru to ensure that the problem has been fixed.

If...

Then...

The problem has been fixed

You are finished with this procedure. Do NOT go to step 4.

The problem has not been fixed

Go to step 4.

4

Back up all data on the affected virtual disks. (Step 7 will destroy all data on the affected virtual disks.)

Note: To the operating system (OS), a failed virtual disk is the same as a failed non-RAID physical disk. Refer to the OS documentation for requirements concerning failed physical disks and apply them where necessary.

5

If any of the affected virtual disks are also source or target virtual disks in a copy operation that is either Pending or In Progress, you must stop the copy operation before continuing.

Go to the Copy Manager by selecting Virtual Disk >> Copy >> Copy Manager, then highlight each copy pair that contains an affected virtual disk and select Copy >> Stop.

6

If you have snapshot virtual disks associated with the affected virtual disks, these snapshot virtual disks will no longer be valid once you fail the physical disk in step 7.

If necessary, perform any operations on the snapshot virtual disks and then delete them.

7

Caution: Possible loss of data accessibility. Transitioning virtual disks to failed may cause the loss of accessibility to data on the virtual disks. Make sure that you back up all data on the affected virtual disks before starting this step.

Highlight the affected physical disk in the Physical View of the Array Management Window and select Advanced >> Recovery >> Fail Physical Disk. The affected virtual disks become Failed.

8

Remove the failed physical disk (its fault indicator light should be on).

Note: Make sure the replacement physical disk has a capacity equal to or greater than the failed physical disk.

9

Wait 30 seconds, then insert the new physical disk. Its fault indicator light may be lit for a short time (one minute or less).

Note: Wait until the replaced physical disk is ready (its fault indicator light must be off) before attempting to initialize the virtual disks in step 10.

10

Highlight the disk group associated with the replaced physical disk in the Logical View of the Array Management Window and select Advanced >> Recovery >> Initialize >> Disk Group.

  • The virtual disks in the disk group are initialized, one at a time.
  • To monitor initialization progress for a virtual disk, highlight the virtual disk in the Logical View of the Array Management Window and select Virtual Disk >> Properties. Note that when the initialization is completed, the progress bar is no longer displayed.
  • When initialization is completed, all virtual disks in the disk group are Optimal.

Important: Make sure you save this procedure by selecting Save As. Once you fix the failure, you will not be able to access the information from Recovery Guru.

11

Click the Recheck button to rerun the Recovery Guru. The failure should no longer appear in the Summary area.

If...

Then...

The problem has been fixed.

a

If desired, create any snapshot virtual disks that you deleted in step 6.

b

If desired, re-create any copies you stopped by highlighting the copy pairs in the Copy Manager and selecting Copy >> Re-Copy.

c

Add the affected virtual disks back to the operating system. You may need to reboot the system to see the re-initialized virtual disks.

Note: Do not start I/O to these virtual disks until you have restored data from backup.

d

Restore the data for the affected virtual disks from backup.

e

You are finished with this procedure.

The problem has not been fixed.

There is a problem with the RAID controller module. Go to "Recovery Steps for Replacing a RAID Controller Module."

Recovery Steps for Replacing a Physical Disk in a RAID 1/10, 3, or 5 Disk Group

Use the following procedure if the affected disk group is RAID 1/10, 3, or 5.

1

You should stop all I/O to all virtual disks in the disk group associated with the affected physical disk to reduce the possibility of data loss. If another physical disk fails in this disk group while you are performing this procedure, you will lose data.

2

Reseating the physical disk may clear up the path redundancy problem. Remove the physical disk and then re-insert it.

Note: The Service Action Allowed status in the Details area is always NO for this problem because the component has not failed. In this situation, it is acceptable to remove the physical disk even though the Service Action Allowed status is NO.

3

Wait 40 seconds, and then click the Recheck button to rerun the Recovery Guru to ensure that the problem has been fixed.

If...

Then...

The problem has been fixed

You are finished with this procedure. Do NOT go to step 4.

The problem has not been fixed

Go to step 4.

4

Although not required, you should back up all data on all virtual disks associated with the affected physical disk.

5

Highlight the affected physical disk in the Physical View of the Array Management Window and select Advanced >> Recovery >> Fail Physical Disk. The associated virtual disks become Degraded.

6

Remove the failed physical disk (its fault indicator light should be on).

Note: Make sure the replacement physical disk has a capacity equal to or greater than the failed physical disk.

7

Wait 30 seconds, then insert the new physical disk. Its fault indicator light may be lit for a short time (one minute or less).

8

Click the Recheck button to rerun the Recovery Guru. The failure should no longer appear in the Summary area.

If...

Then...

The problem has been fixed.

You are finished with this procedure.

The problem has not been fixed.

There is a problem with the RAID controller module. Go to "Recovery Steps for Replacing a RAID Controller Module."

Recovery Steps for Replacing a RAID Controller Module

Important: The RAID controller module replacement recovery steps should only be attempted after ALL other options have been exhausted.

Use the following procedure to replace a RAID controller module to resolve a loss of path redundancy condition.

If...

Then...

Your storage array has one RAID controller module

Go to "Replacing a RAID Controller Module in a Single-RAID Controller Module Storage Array."

Your storage array has two RAID controller modules

Go to "Replacing a RAID Controller Module in a Dual-RAID Controller Module Storage Array."

Replacing a RAID Controller Module in a Single-RAID Controller Module Storage Array

1

Ensure that your replacement RAID controller module matches the RAID controller module in the storage array. If you do not have a RAID controller module with the appropriate replacement part number, contact your technical support representative.

2

Stop all I/O to this storage array.

3

Turn off power to the affected enclosure.

4

Remove the affected RAID controller module. Refer to the Enterprise Management Window (EMW) to view which management method you are using to manage this storage array.

If...

Then...

You are using In-Band management for ALL hosts attached to this storage array

Go to step 5.

You are using Out-of-Band management for ANY host attached to this storage array

Before you insert a new RAID controller module into the storage array, you must update the DHCP/BOOTP server so that it associates the new RAID controller module's hardware Ethernet (MAC) address with the DNS/network name and IP address previously assigned to the removed RAID controller module.

To update the DHCP/BOOTP server, find the entry associated with the removed RAID controller module and replace its Ethernet (MAC) address with the new RAID controller module's Ethernet (MAC) address. The RAID controller module's Ethernet (MAC) address is located on an Ethernet ID label on the RAID controller module in the form xx.xx.xx.xx.xx.xx.

When you are finished, go to step 5.
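The Ethernet ID label prints the address with dots between the byte pairs, while many DHCP/BOOTP server configurations expect colons. The following minimal sketch converts one form to the other and rejects anything that does not look like six hexadecimal pairs. The function name is hypothetical, and the colon-separated lowercase output is an assumption; check the format that your own server requires.

```python
import re


def normalize_label_mac(label: str) -> str:
    """Convert 'xx.xx.xx.xx.xx.xx' (Ethernet ID label form) to 'xx:xx:xx:xx:xx:xx'."""
    parts = label.strip().split(".")
    # A valid label has exactly six two-digit hexadecimal groups.
    if len(parts) != 6 or not all(
        re.fullmatch(r"[0-9A-Fa-f]{2}", part) for part in parts
    ):
        raise ValueError(f"Unexpected MAC address format: {label!r}")
    # Assumption: the DHCP/BOOTP server entry uses lowercase, colon-separated pairs.
    return ":".join(part.lower() for part in parts)
```

Validating the label before editing the server entry helps catch transcription errors, which would otherwise leave the new RAID controller module without its expected network name and IP address.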

5

If...

Then...

The RAID controller module for this storage array is located in an enclosure containing both RAID controller modules and physical disks

Check to see if the new RAID controller module contains a battery.
  • If your model of storage array does not contain batteries, go to step 6.
  • If your model of storage array is supposed to contain batteries and...
    • there is not a battery installed in the new RAID controller module, then install the battery from the old module, and go to step 6.
    • there is a battery installed in the new RAID controller module, then go to step 6.

The RAID controller module for this storage array is located in an enclosure containing only RAID controller modules

Go to step 6.

6

a

Make sure at least one minute has elapsed. Then, insert the new RAID controller module firmly in place.

b

Turn on power to the affected enclosure.

c

Note the RAID controller module slot (A or B) of the affected RAID controller module listed in the Recovery Guru Details area. Highlight this RAID controller module slot in the Physical View of the Array Management Window (AMW).

d

If...

Then...

The RAID controller module indicates that it is Online

Go to step e.

The RAID controller module indicates that it is Offline

Select Advanced >> Recovery >> Place RAID Controller Module >> Online and then go to step e.

e

If...

Then...

The RAID controller module for this storage array is located in an enclosure containing both RAID controller modules and physical disks

Determine whether you need to reset the battery age.
  • If your model of storage array does not contain batteries, go to step 7.
  • If your model of storage array is supposed to contain batteries and...
    • you installed the battery from the old RAID controller module, then you do not need to reset the battery age. Go to step 7.
    • there was already a battery in the replacement RAID controller module, then you must reset the battery age using the following procedure:

      Select the Components button on the enclosure containing the RAID controller modules in the Physical View of the Array Management Window. Highlight the batteries option and select the Reset button associated with the new RAID controller module (A or B). Then, go to step 7.

The RAID controller module for this storage array is located in an enclosure containing only RAID controller modules

Go to step 7.
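The battery decisions in steps 5 and 6e depend on the same three facts, so they can be summarized together. The sketch below is illustrative only: the type and function names are hypothetical, and it assumes that a storage array model without batteries needs neither a battery move nor a battery age reset.

```python
from typing import NamedTuple


class BatteryPlan(NamedTuple):
    move_old_battery: bool   # step 5: install the battery from the old module
    reset_battery_age: bool  # step 6e: reset the battery age after power-on


def battery_plan(
    enclosure_has_disks: bool,
    model_uses_batteries: bool,
    new_module_has_battery: bool,
) -> BatteryPlan:
    """Summarize the battery actions from steps 5 and 6e (hypothetical helper)."""
    # Enclosures with only RAID controller modules, and models without
    # batteries, skip both battery actions.
    if not enclosure_has_disks or not model_uses_batteries:
        return BatteryPlan(move_old_battery=False, reset_battery_age=False)
    # The replacement module shipped with its own battery: reset its age.
    if new_module_has_battery:
        return BatteryPlan(move_old_battery=False, reset_battery_age=True)
    # Otherwise reuse the old battery, whose age must not be reset.
    return BatteryPlan(move_old_battery=True, reset_battery_age=False)
```

The two actions are mutually exclusive: a battery carried over from the old module keeps its recorded age, and only a battery that arrived with the replacement module has its age reset.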

7

If you have virtual disks mapped to hosts that have Automatic Virtual Disk Transfer (AVT) disabled, it may be necessary to redistribute the virtual disks to their preferred RAID controller module. Use the following steps to determine the AVT status of the hosts connected to your storage array:

a

Open the Storage Array Profile by selecting the Storage Array >> View Profile menu option from the Array Management Window. Then, select the profile's Mappings tab.

b

Scroll to the NVSRAM Host Type Internal Definitions section.

If...

Then...
There are hosts mapped to the virtual disks on this storage array that have an AVT status of disabled

OR

There are hosts mapped to the virtual disks on this storage array that are not running a host-based, multi-path failover driver

It may be necessary to redistribute the virtual disks to their preferred RAID controller module. If the Array Management Window's Advanced >> Recovery >> Redistribute Virtual Disks menu option is available, select the option.

Note: If you have a mix of hosts with AVT enabled and AVT disabled, all virtual disks will be immediately assigned back to their preferred path. However, until the host-based multi-path failover driver detects the valid preferred path (which may take several minutes), the virtual disks mapped to the AVT-enabled hosts may be temporarily returned to the non-preferred path.

If the menu option is not available (grayed out), the virtual disks are already associated with their preferred RAID controller modules and no action is needed.

Go to step 8.

There are NO hosts mapped to the virtual disks on this storage array with an AVT status of disabled

OR

All hosts mapped to virtual disks on this storage array are running a host-based, multi-path failover driver

No action is required.

If virtual disks need to be redistributed to their preferred RAID controller module, the host-based, multi-path failover driver will automatically initiate the transfer.

Note that detection of a restored preferred path by the multi-path failover driver can take several minutes.

Go to step 8.
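The check in step 7 amounts to scanning the mapped hosts for any host that has AVT disabled or lacks a host-based, multi-path failover driver. A minimal sketch follows, assuming you have already collected that information by hand from the Storage Array Profile; the Host record and the function name are hypothetical and nothing here queries the storage array.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Host:
    name: str
    avt_enabled: bool            # from the NVSRAM Host Type Internal Definitions
    has_failover_driver: bool    # host-based, multi-path failover driver present


def hosts_needing_manual_redistribution(hosts: Iterable[Host]) -> List[str]:
    """Return the names of hosts that match the first row of the step 7 table."""
    return [
        host.name
        for host in hosts
        if not host.avt_enabled or not host.has_failover_driver
    ]
```

An empty result corresponds to the "No action is required" row. A non-empty result means it may be necessary to select Advanced >> Recovery >> Redistribute Virtual Disks, if that menu option is available.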

8

Click the Recheck button to rerun the Recovery Guru. The failure should no longer appear in the Summary area. If the failure appears again, contact your technical support representative.

Replacing a RAID Controller Module in a Dual-RAID Controller Module Storage Array

1

Determine which is the affected RAID controller module by locating the non-working channel. Refer to step 3 at the beginning of this recovery procedure for details on how to locate the non-working channel.

2

Place the affected RAID controller module offline.

a

Highlight the affected RAID controller module in the Physical View of the Array Management Window.

b

Select Advanced >> Recovery >> Place RAID Controller Module >> Offline.

c

Select Yes in the Place Offline confirmation window.

d

Go to step 3.

3

Read all of the following steps before taking any action.

a

Click the Recheck button to rerun the Recovery Guru.

b

Select the Offline RAID Controller Module problem that is being reported in the Summary area.

c

Complete the Recovery Steps for the Offline RAID Controller Module problem to replace the RAID controller module.

4

Click the Recheck button to rerun the Recovery Guru. The failure should no longer appear in the Summary area. If the failure appears again, contact your technical support representative.