This article describes the supportability of having a
Windows server boot from a Storage Area Network (SAN).
Microsoft supports booting from a Storage Area Network
(SAN) if the SAN vendor supports their particular hardware platform booting a
Windows server. The SAN and host bus adapter (HBA) must be configured according
to the SAN vendor's guidelines and the SAN vendor
must act as the main point of contact for boot-related issues. This
requirement exists because booting from a SAN is extremely complex, and the
vendor needs to support the particular configuration because the SAN vendor
provides the SAN boot supportability statement. It is important to note that
the information that is included in this article is not intended to be an
all-inclusive list of the items that are required to boot from a SAN. The SAN
vendor must provide specific steps, drivers, firmware revisions and resources
about how to make their hardware (storage systems, switches, Host Bus Adapters,
and so on) work properly together.
Configuration
The following issues must be addressed so multiple computers can
successfully boot from a SAN:
Troubleshooting
This section describes several issues that may prevent a Windows
server from successfully booting from a SAN:
- A very common problem when you configure a SAN is that it
is possible that multiple hosts may have access to the same logical disk. This
usually occurs because proper LUN management was not employed. The default
behavior of Windows is to attach and mount every logical unit that it detects
when the HBA driver loads. If multiple hosts mount the same disk, file system
damage can occur. It is up to the configuration of the SAN to insure that only
one host can access a particular logical disk at a time. Symptoms of multiple
hosts accessing the same logical disk are:
Disk Management displays the same logical disk on multiple hosts.Plug and Play notification that new hardware is found may occur on multiple hosts when you add or configure a new logical disk.When you try to access a logical disk by using My Computer or Windows Explorer, you may receive an "Access Denied", "Device not Ready", or similar error message that may indicate that other hosts have access to the same logical disk.
- Your computer stops responding (hangs) or has slow response
times. This can indicate that there is a high latency to the pagefile, and this
may be accompanied by events in the System Log such as:
Event ID: 51
Event Type: Warning
Event Source:
Disk
Description: An error was detected on device \Device\Harddisk0\DR0
during a paging operation.
Event ID: 11
Source:
%HBA_DRIVER_NAME%
Description: The driver detected a controller error on
Device\ScsiPort0.
Event ID: 9
Source: %HBA_DRIVER_NAME%
Description: The device, \Device\ScsiPort0, did not respond within the timeout
period.
If the preceding error messages are in the System Log, it
indicates that Windows was trying to access a disk and there was a problem. If
the disk that is referenced is on the SAN, it could indicate a latency issue.
If an Event ID 51 is shown, this indicates that Memory Manager was attempting
to copy data to or from memory and had a problem. Another indicator of pagefile
latency issues is if the Windows server has a system failure, and either of the
following error messages are displayed on a blue screen: 0x00000050 PAGE_FAULT_IN_NONPAGED_AREA
or
0x0000000A IRQL_NOT_LESS_OR_EQUAL
A possible resolution is to
place the pagefile on the host's local hard disk. Windows needs reliable access
to the pagefile as data is paged in or out of memory. Having the pagefile local
to the host guarantees that access is not influenced by other devices and hosts
on the SAN.
Note If the pagefile is not on the same partition as the boot
partition (typically c:\Windows or c:\WINNT), the creation of a Memory.dmp file
will not occur. A Memory.dmp file is used for troubleshooting a Windows
computer that has a STOP error. For information about how to configure your
computer for a crashdump, see Windows Help.
There are several ways to resolve the preceding problems. The
first method is to try and correlate the time with any events that are
occurring on the SAN. For example, it HostA was doing a large copy operation
and HostB reports the Error 9s, it may imply that proper LUN management is not
in place. Another example is if HostB produces errors whenever HostA is
rebooted. This may indicate that FC-AL is being used and HostB is being
affected by a Loop Initialization Primitive (LIP) sequences from HostA. These
can often be corrected by re-configuring the SAN, and this requires the
assistance of the hardware vendor. Any type of latency issues might be resolved
by placing pagefile on the Windows server's local hard disk, but again, this
disables the creation of a memory dump. A key point to understand is that the
hardware vendor of the SAN will have the most information about the proper
configuration, and must be the first point of contact for all configuration
questions and concerns.
For more information about Windows server clusters in a SAN environment, click the following article numbers to view the articles in the Microsoft Knowledge Base:
304415
(http://support.microsoft.com/kb/304415/
)
Support for multiple clusters attached to the same SAN device
280743
(http://support.microsoft.com/kb/280743/
)
Windows Clustering and geographically separate sites
301647
(http://support.microsoft.com/kb/301647/
)
Cluster Service improvements for storage area networks