To be honest, I got a bit confused with the various 64-bit CPUs (like why didn’t Intel and HP’s Itanium take off, but AMD’s AMD64 did and Itanium 2 looks like it will too), but whatever the hardware issues, it seems that x64 software has finally come of age. Paul Thurrott reports in his Windows IT Pro magazine network WinInfo Daily Update that, at the IT Forum this week, Microsoft announced that the Longhorn Server wave of products will be 64-bit only (except Longhorn Server itself, which will be available in both 32- and 64-bit flavours). That means that, for example, the next version of Exchange Server (codenamed Exchange 12) will only run on a 64-bit platform. There’s no news yet as to what is happening on the desktop (except that it seems, like Windows XP, Windows Vista will be available in both 32- and 64-bit editions) but it looks like I’d better get saving for a new PC…
Exchange Server 2003 SP2 is now available
Back in July, I blogged about what to expect when Microsoft ships service pack 2 (SP2) for Exchange Server 2003. I’ve just heard that SP2 has been released for download – read more on the You Had Me at EHLO (Microsoft Exchange team) blog or check out the top 10 reasons to deploy Exchange Server 2003 SP2 on the Microsoft website.
Exchange Server RFC and standards compliance
Members of the “oppose Microsoft group” often deride the software giant, accusing it of implementing proprietary technologies to abuse its monopoly; but in recent years there has been a real shift towards standards-based technology implementations in Microsoft software. One example is Microsoft Exchange Server, the messaging and collaboration platform, which implements over 50 RFCs/standards, as detailed in Microsoft knowledge base article 262986.
Preview of the new features expected in Exchange Server 2003 service pack 2
My colleague Neil Chapman sometimes blogs about Exchange Server’s mobility features, including some of what is coming in Exchange Server 2003 service pack 2 (SP2) later this year. Microsoft have also published a preview of the new features we can expect to see including:
- Mobile e-mail improvements (Neil is best placed to comment on these).
- Better protection against unsolicited commercial e-mail (commonly known as spam) with an updated intelligent message filter (IMF) and support for sender ID (which has now been approved by the Internet engineering steering group – the approval board of the Internet engineering task force – as an experimental standard, along with the competing sender policy framework technology).
- Mailbox advancements (most significantly the raising of the 16GB information store limit on Exchange Server 2003 standard edition to 75GB, new features for enforcing cached mode, and a new offline address book format).
New messaging and collaboration tools from Microsoft
I’m yet to be convinced of the business benefits of instant messaging (IM). My current employer doesn’t prohibit IM – in fact it is encouraged – and I use Microsoft’s MSN Messenger service, as do many of my colleagues. I suspect the reason that we haven’t implemented a corporate IM solution is cost.
According to IT Week, research conducted by Telewest business has found that due to security concerns only a third of UK companies allow staff access to IM. Many other companies are still deciding what their corporate messaging policy should be, but with the rising incidence of spam over IM (spim), ignorance of IM is no longer an option.
For those large enterprises that do allow IM, using the free services from Microsoft, Yahoo!, AOL and others is simply not an option (in fact it is a liability) and if IM is to become a business tool, a corporate IM infrastructure needs to be provided. For many years, Microsoft has produced a variety of chat-like products under the Exchange Server banner, but they were removed from Exchange Server 2003 and replaced with a new product – Microsoft Office Live Communications Server (LCS) 2005, which provides corporates with IM and presence capabilities.
Earlier this month, Microsoft revealed their vision for collaboration with a new product on the horizon – Microsoft Office Communicator 2005 (previously codenamed Istanbul) – supporting all of the current IM capabilities plus PC-to-phone integration and “rich presence awareness” (the ability to route calls by the most appropriate medium – fixed-line, mobile or IP voice, IM, e-mail, video or web conferencing). Microsoft will back up Office Communicator with a service pack for LCS, due later this month and including enhancements such as IM spam (spim) controls, auditing (to address regulatory concerns), compatibility with Microsoft Operations Manager (MOM), HTTPS access (removing the need for VPN connections) and public IM connectivity (the ability to communicate with MSN Messenger, Yahoo! Messenger and AOL Instant Messenger clients). Alongside all of this is Microsoft Office Live Meeting 2005, an upgrade to Microsoft’s web conferencing service, offering call controls for audio conference service providers and the ability to conduct live meeting sessions within Microsoft Office (in the UK this is made available as a hosted service from BT, with per-minute, named user or per-seat tariffs – there is a Flash-based demonstration on the BT website).
Taken together with related initiatives, such as Exchange 12, which is expected to manage PBX-based phone messages, and the constantly increasing collaboration functionality within the Microsoft Office System, Microsoft’s efforts are wide-ranging and long-term.
Performing unattended Exchange Server installations
One of my colleagues sent me this useful link for information on how to create and edit Exchange Server 2003 unattend files.
10,000 feet view of Microsoft Exchange Server 2003
For anyone who is new to Exchange Server 2003, here’s a brief overview that may be of use.
Microsoft Exchange Server 4.0 was launched in 1996, as a replacement for Microsoft Mail 3.x. It was Microsoft’s first groupware product, competing directly with Novell GroupWise and Lotus Notes. Later versions (v5.0, v5.5) added functionality and improved the scalability of the solution, with Exchange Server 5.5 being the version which really took a hold on the market. Exchange 2000 (v6.0) was a major rewrite, leaving behind its own directory service and using Active Directory instead (which is based on the Exchange Directory technology), and featuring a new system of storage groups supporting multiple mailbox and public folder stores (for improved database backup and restoration times). Exchange 2000 also switched its internal message transport protocol from X.400 to SMTP, making use of (and extending) the SMTP features in Microsoft Internet Information Services (IIS). Microsoft Exchange Server 2003 (v6.5) builds on the scalability, reliability, performance and manageability of Exchange Server 2000 and is the current version, albeit without conferencing, instant messaging and chat features, which have been replaced by the Microsoft Office Live Communications Server product.
The Exchange environment is known as an organization. Within an organization, Exchange Server 4.0-5.5 divides the infrastructure into one or more sites (similar to the Active Directory site concept), linked by connectors of varying types (e.g. RPC, X.400). Exchange Server 2000 and 2003 use routing groups instead of sites, and also introduce the concept of administrative groups for organisations where administration is undertaken by different groups of staff (e.g. in a global organisation with separate teams for Americas, EMEA and Asia-Pacific).
Exchange servers may be configured for a variety of specialist purposes:
- A bridgehead is a server dedicated to routing e-mail for a routing group.
- Front end (protocol) servers act as concentrators for client connections, where protocol conversion would be an overhead (e.g. HTTP connections using Outlook Web Access).
- Back end (storage) servers could be mailbox servers, public folders servers, or a combination of the two.
- There may also be other servers dedicated to providing services such as fax connections, mail archival and retrieval (near-line storage), or instant messaging.
Because Exchange Server 2000 and 2003 use Active Directory, Exchange servers require a global catalog server to be placed nearby. Other products, such as Microsoft ISA Server 2004 may also be used to increase security, for example filtering inbound e-mail, or publishing an Outlook Web Access (OWA) server.
Links
Microsoft Exchange Server
Microsoft Exchange Server resource site
Microsoft Exchange Server 2003 troubleshooting and disaster recovery
A couple of nights back, I attended one of the Microsoft TechNet UK events. Not sure whether to attend John Howard’s session on automating Windows Server Administration or Eileen Brown’s Microsoft Exchange Server 2003 Troubleshooting and Disaster Recovery session, I decided to return to my Microsoft Exchange roots (which go back to the Exchange Server 4.0 launch events in April 1996). In addition, Eileen has posted some useful links relating to the session content on her blog.
Configuring recovery options and general troubleshooting tools
A common issue for an e-mail administrator is the recovery of e-mail which a user has accidentally deleted, and less commonly, the need to recover a deleted mailbox. Fortunately, Exchange has a number of options which can assist with this. There is a trade-off between giving users the opportunity to recover data and additional storage on the e-mail server, but because of single instance storage, in reality this is not as big an issue as it might seem.
For a mailbox store, there are three main configuration items of interest:
- Keep deleted items for (days) defines the time for which a deleted item is still recoverable from within Outlook, even if the deleted items folder has been emptied (for options in the use of this feature, see KC Lemson’s blog).
- Keep deleted mailboxes for (days) is similar, but defines the number of days that a mailbox will remain (orphaned and available for reconnection to an Active Directory user account), until it is finally removed from the store.
- Do not permanently delete mailboxes and items until the store has been backed up ensures that regardless of the deleted item retention and mailbox retention intervals, nothing is finally removed until a full backup of the store has successfully taken place.
For a public folder store, keep deleted items for (days) and do not permanently delete items until the store has been backed up are the equivalent options.
All of these options are found on the limits page of the store properties.
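To illustrate how the three mailbox store settings interact, here is a minimal Python sketch of the purge decision. It is not how the information store is implemented; the function name and parameters are hypothetical, and it assumes that the backup option is satisfied by a full backup completed after the item or mailbox was deleted.

```python
from datetime import datetime, timedelta
from typing import Optional


def can_purge(deleted_at: datetime,
              now: datetime,
              retention_days: int,
              wait_for_backup: bool,
              last_full_backup: Optional[datetime]) -> bool:
    """Return True if a deleted item (or mailbox) may be finally removed.

    Hypothetical model of the settings on the limits page:
    - retention_days  ~ "keep deleted items/mailboxes for (days)"
    - wait_for_backup ~ "do not permanently delete ... until the store
                        has been backed up"
    """
    # The retention interval must have expired first.
    if now - deleted_at < timedelta(days=retention_days):
        return False
    # Without the backup option, expiry alone is enough.
    if not wait_for_backup:
        return True
    # Assumption: the full backup must have completed after the deletion.
    return last_full_backup is not None and last_full_backup > deleted_at


if __name__ == "__main__":
    deleted = datetime(2005, 3, 1)
    print(can_purge(deleted, deleted + timedelta(days=8), 7, True, None))
```

With the backup option enabled and no backup recorded, the item is kept even though the seven-day retention interval has passed.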
There are a number of troubleshooting tools available to the Exchange administrator:
- Disaster recovery setup mode (note that this doesn’t work on a cluster) can be used to reconnect mailboxes with a store.
- The application log in Event Viewer is useful (in particular, watch out for 1012, 1018, 1019 and 1020 errors – the Exchange Information Store often gives information about database issues well in advance of failure).
- Diagnostics and protocol logging, which is an overhead on the server and should only be enabled when diagnosing an issue, allows the level of logging to be tuned. Even set to none, critical events are logged, with minimum, medium and maximum corresponding to the level of events that are logged.
- Message tracking can be used to track messages through the system, optionally recording the subject of the message in the logs. The main consideration with this is the number of days for which a message should be retained.
- Exchange System Manager’s monitoring and status tools allow monitoring of services or resources (e.g. thresholds for queue length growth) and alert notification, sending e-mail and/or executing a script (a useful alternative if e-mail is unavailable!) to notify an administrator of issues or even to automatically take corrective action.
When backing up Exchange (whether using the Backup Utility for Windows, or a third party tool), there are a number of issues to consider:
- A backup is not complete unless it includes the mailbox and public folder stores, with the transaction logs and the Windows system state. In addition, mailbox servers should never have circular logging enabled as storage prices have dropped considerably since the days of Exchange Server 4.0 and without a complete set of logs, recovery would still result in some lost data. For front end servers, Microsoft recommend that there are no stores present.
- If co-existing with Exchange Server 5.5, the Site Replication Service (SRS) also needs to be considered.
- Connectors may also contain information to be backed up.
- Recovery will typically take twice as long as backup, so keep backup times short. If quotas are used to limit mailbox usage, beware: they might just lead to users storing mail in personal folder (.PST) files, leading to unmanaged offline storage (i.e. not backed up), a loss of single instance storage, and possibly network bandwidth issues if the personal folders are stored on a network server.
Further information (and best practice) is contained in the Disaster Recovery Operations Guide for Exchange Server 2003.
Troubleshooting Internet E-mail
One of the things to remember when troubleshooting Exchange issues is that there are so many external factors to consider. Besides the obvious areas of Exchange and the e-mail client (usually Outlook), DNS and the underlying network can create issues.
When examining the process of receiving inbound e-mail, Exchange doesn’t actually do much! The originating server looks up the IP address which corresponds to the mail exchanger (MX) record for the SMTP domain in DNS. The message is then routed across the Internet based on that TCP/IP address and it is only once the message has been received (possibly via a smart host) that Exchange routes the message within the organisation for final message delivery. For outbound e-mail, it is the reverse process and from this we can tell that the two main areas to look at are DNS and TCP/IP.
The TCP/IP troubleshooting process is well known:
- Check the TCP/IP properties for the Exchange server. Are they complete?
- Can you ping localhost (127.0.0.1)? If not, there would appear to be an issue with the network card or protocol stack.
- Next, can you ping the server by its own IP address? If not, there would appear to be an issue with the server’s TCP/IP address – is that configured correctly?
- Next, can you ping the default gateway? If not, there would appear to be either an incorrectly configured router (default gateway) address, or a physical network issue (is the cable plugged in?)
- Finally, can you ping other hosts on the network – e.g. the DNS server? If not, there may be a routing issue (or the DNS server addresses could be incorrect).
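The ping ladder above can be expressed as a small Python sketch: run the checks in order and report the likely culprit for the first one that fails. The step names, the `first_failure` helper and the addresses in the usage example are all hypothetical, and the `ping` wrapper relies only on the exit code of the operating system's own ping command, which is a rough signal at best.

```python
import subprocess
import sys

# Ordered checks, each paired with what a failure most likely points to.
LADDER = [
    ("loopback", "the network card or protocol stack"),
    ("own_ip", "the server's TCP/IP address configuration"),
    ("gateway", "the default gateway address or the physical network"),
    ("dns_server", "routing, or incorrect DNS server addresses"),
]


def first_failure(results):
    """Return (step, suspect) for the first failed step, or None if all pass.

    `results` maps step names to True/False; a missing step counts as failed.
    """
    for step, suspect in LADDER:
        if not results.get(step, False):
            return step, suspect
    return None


def ping(host):
    """Send one echo request using the system ping; True on exit code 0."""
    count_flag = "-n" if sys.platform.startswith("win") else "-c"
    completed = subprocess.run(["ping", count_flag, "1", host],
                               stdout=subprocess.DEVNULL,
                               stderr=subprocess.DEVNULL)
    return completed.returncode == 0


if __name__ == "__main__":
    # Example addresses only; substitute the server's real values.
    targets = {"loopback": "127.0.0.1", "own_ip": "192.0.2.10",
               "gateway": "192.0.2.1", "dns_server": "192.0.2.53"}
    outcome = first_failure({step: ping(addr) for step, addr in targets.items()})
    print("All checks passed" if outcome is None
          else "Failed at %s: suspect %s" % outcome)
```

Stopping at the first failure mirrors the manual process, since each later check depends on the earlier ones succeeding.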
Additional troubleshooting steps for mail servers are:
- Can you connect to port 25 on the mail server using Telnet? If not, then the SMTP service may not be running.
- Are the server’s host (A) and MX records correctly recorded in both the internal and external DNS, with the correct priority (lower values are preferred, so 1 is the lowest cost)?
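The Telnet test on port 25 can also be scripted. The following Python sketch opens a TCP connection and reads the greeting line; a reply starting with 220 suggests the SMTP service is answering. The function names are hypothetical and the host name in the usage example is a placeholder.

```python
import socket


def smtp_banner(host, port=25, timeout=5.0):
    """Return the first bytes sent by the server as text, or None on failure."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.settimeout(timeout)
            data = conn.recv(512)
    except OSError:
        # Refused, timed out or unresolvable: the service may not be running.
        return None
    return data.decode("ascii", errors="replace").strip()


def looks_ready(banner):
    """An SMTP server that is ready greets the client with a 220 reply."""
    return banner is not None and banner.startswith("220")


if __name__ == "__main__":
    banner = smtp_banner("mail.example.com")  # placeholder host name
    print(banner if looks_ready(banner) else "No SMTP greeting received")
```

A failed connection does not prove the SMTP service is stopped (a firewall could equally be blocking the port), so treat the result as a pointer rather than a diagnosis.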
Other areas to examine are:
- Is a DNS suffix required and/or set in the TCP/IP properties?
- Does the computer name have the correct fully qualified domain name (FQDN) in the system properties?
- Is DNS working correctly? (NETDIAG is a useful command; in particular netdiag /test:dns can be used to identify DNS issues, after which NSLOOKUP can be used to query DNS.)
- Address spaces, e.g. if an organisation hosts two or more domain names, are they all configured with MX records and do users have corresponding e-mail addresses?
- Size restrictions – both internal and external restrictions can be set. If large messages are not being received, this could be the issue. Note that SMTP virtual server settings can be overridden by global settings.
- If the SMTP queues have a lot of retries pending, this will often indicate a DNS issue.
Recovering messages and mailboxes
There are a number of mailbox recovery tools available to an Exchange administrator:
- Once an Active Directory account is deleted or a mailbox removed in Exchange System Manager, the mailbox is not actually removed, but is tombstoned (shown with a red cross in System Manager and the retention time for deleted mailboxes begins). This action is carried out by the Cleanup Agent, which may be triggered manually if it has not completed its next scheduled run before a mailbox needs to be recovered.
- The mailbox recovery center allows an administrator to mount a (recovered) store and view all the disconnected mailboxes, from where a matching user account can be found using the Exchange Mailbox Matching Wizard and the mailbox reconnected using the Mailbox Reconnection Wizard.
- The Exchange Server Mailbox Merge Wizard (ExMerge) can be used to merge data into a mailbox; however the recover mailbox data functionality (new with Exchange Server 2003 SP1) replaces the need to use ExMerge in the majority of recovery cases.
- Offline folder (.OST) to personal folder (.PST) conversion has now been superseded by the recovery storage group.
Tip: “object not found” errors when Outlook synchronises with the Exchange Server are often caused by invalid entries in the default address book. Rebuilding this will usually resolve such issues.
Recovery storage groups
To use the recovery storage group feature, at least one Exchange Server 2003 server must be available within the organisation. This allows an administrator to create a recovery storage group, into which a database can be mounted for mailbox recovery, avoiding the need to recover on a separate recovery server and export e-mail via a .PST file; however recovery storage groups do have some limitations (after all, they are intended to be used purely for the purposes of recovering data):
- All protocols except MAPI (required for the Microsoft Exchange Information Store service to access the storage group) are disabled.
- Mailboxes cannot be directly connected to user accounts (except using ExMerge).
- No management policies are available (not necessary as no live users).
- No Exchange maintenance procedures are available (ESEUTIL/ISINTEG).
- Databases must be mounted manually (e.g. to run ExMerge).
- Database locations cannot be changed (but database files are not server/location specific and can be copied manually).
- Only private mailbox stores can be recovered (i.e. not public folder stores)
Exchange Server 2003-aware backup programs will automatically restore to a recovery storage group.
In a disaster recovery scenario, an Exchange administrator could perform what is known as a dial-tone database restoration. This involves creating an empty database and mounting this so that users can continue to send and receive e-mail whilst their original data is recovered. Meanwhile, the failed database can be restored to a recovery storage group and the recover mailbox data feature or ExMerge used to restore the data to the user’s mailbox whilst both stores are online. To save time in the recovery (albeit involving some more user downtime whilst the databases are swapped), it may be appropriate to swap the database files, remount the original store and then merge in the new data from the dial-tone database. Eileen Brown’s blog features a blogcast demonstrating the recovery storage group which explains the process in further detail.
Further information on using Exchange Server 2003 recovery storage groups is available on the Microsoft website.
Database corruption and recovery
Each Exchange Server storage group has its own set of transaction logs, which can be replayed to recover data up to the point at which failure occurred. In my recent post about Exchange Server best practice and preventative maintenance, I wrote about the ESEUTIL and ISINTEG tools. Using ESEUTIL, it is possible to dump the database file header (eseutil /mh database.edb) and examine the resulting output to check the state of the store (clean or dirty) and whether or not any logs are required to be replayed.
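As a rough illustration of reading that output, the Python sketch below picks the state and required-log fields out of a header dump. The sample text is illustrative rather than captured from a real server, and the field labels ("State" and "Log Required") are assumed from typical ESEUTIL output, so check them against what your version actually prints.

```python
# Illustrative header fragment, not real output from a production store.
SAMPLE = """\
        State: Dirty Shutdown
 Log Required: 9-9 (0x9-0x9)
"""


def parse_header(text):
    """Turn 'Label: value' lines into a dictionary keyed by label."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def summarise(text):
    """Return (state, is_clean, log_required) from a header dump."""
    fields = parse_header(text)
    state = fields.get("State", "unknown")
    is_clean = state.lower().startswith("clean")
    log_required = fields.get("Log Required", "unknown")
    return state, is_clean, log_required


if __name__ == "__main__":
    state, is_clean, logs = summarise(SAMPLE)
    print("State:", state)
    print("Clean:", is_clean)
    print("Logs required:", logs)
```

In practice the text would come from redirecting the eseutil output to a file, which is then read and passed to `summarise`.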
To replay the logs, simply ensure that they are available in the correct location and mount the store. Following this, the application event log should record a number of events indicating that it is initiating recovery steps and replaying the logs before recording a successful completion.
Exchange Server best practice and preventative maintenance
Until fairly recently, Exchange was my main area of technical expertise, but since I joined Conchango, I’ve been working in other areas and my Exchange skills have become a little rusty. That was until a couple of nights back, when I attended a Microsoft TechNet UK event, where Paul Bowden (Exchange Product Manager) demonstrated the Microsoft Exchange Server Best Practices Analyzer tool (ExBPA) before Brett Johnson (one of Microsoft’s escalation engineers in the UK) talked about best practices of Exchange Server preventative maintenance.
Microsoft Exchange Server Best Practices Analyzer tool
Analysis of support incidents logged with Microsoft has shown that only 0.3% result in the generation of a hotfix and 60% are configuration errors. The ExBPA is a tool which analyses Exchange Server for the top configuration issues in a manner which is a hybrid of a proactive health check and reactive diagnosis.
ExBPA was not the first best practice analyser from Microsoft – that was the Microsoft SQL Server Best Practices Analyzer (SQLBPA), launched in May 2004 – and BPAs will eventually be produced for all Microsoft products within the Windows Server System.
The design principles used for the creation of ExBPA were:
- Concentrate on performance, scalability and availability – whilst the ExBPA does examine some security mis-configurations, e.g. open relays or too many administrators, it does not look for the latest patch levels – the Microsoft Baseline Security Analyzer (MBSA) performs that function.
- Make it easy to run – previous tools were not particularly easy to set up and ExBPA is designed on a 3-click principle (from startup to scan).
- Don’t leave me hanging – i.e. don’t just provide a strange message and a link to a Microsoft knowledge base article – provide some useful information in relation to the tool’s findings.
- Keep it up-to-date – ExBPA automatically downloads its web update packs, which are published every two weeks.
- Work in all environments – ExBPA works from single server Microsoft Small Business Server implementations right through to enterprise Exchange Server deployments.
The ExBPA can be run against all versions of Exchange Server, although for versions prior to Exchange Server 2000 it does require that Active Directory and at least one Exchange 2000 or 2003 server are available. The tool is implemented in Visual C#, with an XML input/output data model and an XPath analysis engine. There are no server components and the tool is generally run from a Windows XP computer, collecting the data remotely. More architecture information is available in the ExBPA overview on the Microsoft website.
ExBPA is not a monitoring tool – that is Microsoft Operations Manager Server (MOM), for which there is an Exchange Server 2003 Management Pack. ExBPA provides a snapshot in time, looking for data in:
- Active Directory.
- DNS.
- WINS.
- Registry (there are over 1200 registry parameters for Exchange Server 2003).
- IIS metabase.
- Performance monitor.
- Files on disk.
- TCP/IP ports.
The first pass is a data collection and a subsequent pass is made on this for analysis against defined rules.
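The collect-then-analyse approach can be sketched in Python using the limited XPath support in the standard library. This is not ExBPA's actual schema or rule format: the XML element names, the server names and the two rules are all invented for illustration.

```python
import xml.etree.ElementTree as ET

# Pass 1 (collection) is represented by a pre-built, hypothetical XML document.
COLLECTED = """
<organisation>
  <server name="MBX01" role="mailbox">
    <setting name="CircularLogging">on</setting>
    <setting name="DiskCompression">off</setting>
  </server>
  <server name="BH01" role="bridgehead">
    <setting name="CircularLogging">on</setting>
    <setting name="DiskCompression">on</setting>
  </server>
</organisation>
"""

# Each rule: severity, message, XPath selecting servers,
# XPath selecting a setting within the server, and the value that is flagged.
RULES = [
    ("Error", "circular logging enabled on a mailbox server",
     ".//server[@role='mailbox']", "setting[@name='CircularLogging']", "on"),
    ("Error", "disk compression enabled",
     ".//server", "setting[@name='DiskCompression']", "on"),
]


def analyse(root):
    """Pass 2: evaluate every rule against the collected data."""
    findings = []
    for severity, message, server_path, setting_path, bad_value in RULES:
        for server in root.findall(server_path):
            for setting in server.findall(setting_path):
                if (setting.text or "").strip() == bad_value:
                    findings.append((severity, server.get("name"), message))
    return findings


if __name__ == "__main__":
    for severity, server, message in analyse(ET.fromstring(COLLECTED)):
        print("%s: %s: %s" % (severity, server, message))
```

Keeping the rules as data separate from the collected snapshot is what lets the rule set be updated without changing the collector.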
ExBPA understands a number of Exchange Server roles:
- Small mailbox servers.
- Large mailbox servers.
- Clustered servers.
- Front-end servers.
- Bridgehead servers.
Advice is adjusted accordingly (e.g. circular logging off for mailbox servers but on for a bridgehead) and ExBPA reports on a variety of rule types:
- Errors (something is causing, or is likely to cause a problem).
- Warnings (something looks suspicious).
- Non-default (something has been changed).
- Time (something has changed within the last 5 days).
- Information (something of interest about the environment).
- Best practice (in ExBPA v2.0, due for release within the next 3 weeks).
When running, the ExBPA will automatically detect the closest global catalog (GC) server and the credentials of the current logged on user (although these can be modified if required). The type of scan can be set to one of three options (health check, connectivity test or baseline) and the network speed must be set, both to provide an estimate of the time left to run and to set appropriate thresholds for timeouts, etc. Once the elements of the organisation to be scanned are selected, the ExBPA will run (it is multi-threaded, using up to 25 threads) and following a successful analysis, a number of reports are available:
- Critical issues list.
- Full issues list.
- Non-default settings.
- Recently-changed settings.
- Baseline.
- Items of interest.
- Summary view.
- Detailed view.
- Disabled items list.
- Run time log.
Some of the success stories from using the ExBPA to identify issues include:
- Incorrectly configured DNS server address causing poor performance (even with secondary and tertiary addresses in place, Exchange will always try to contact the primary DNS server first – if that is down, or the IP address is not correct, then that means that every lookup request will first be tried against an invalid entry, before the secondary DNS entry is attempted).
- Poor performance due to placement of database files on compressed disk volumes (even though they were on a high performance SAN).
- Circular logging enabled on a 12,000 user Exchange cluster (had been enabled prior to migration from the old servers to prevent excessive log generation, but was not disabled afterwards).
- Incorrect memory configuration generating Event ID 9582 errors, leading to a server restart every two weeks.
ExBPA v1.0 was released in September 2004, with 1200 points collected using 800 rules. ExBPA v1.1 followed in December 2004, with some usability improvements and 1300 points using 900 rules. ExBPA v2.0 is due for release in March 2005 and will add:
- Localisation for all languages in which Exchange Server is available.
- Performance sampling and root cause analysis (how close to the limit is the server).
- Administrative API support (when was the last backup).
- Operational integration with MOM 2005.
- Export in XML, HTML or CSV format.
- New baseline logic.
ExBPA v3.0 is already being planned for release later in 2005, with new features including more rules and refinements, and a MAPI.NET collector.
The web update pack for the ExBPA is a 650KB XML file and just some of the elements that the ExBPA checks today are:
- Active Directory (forest) – functionality level, Exchange schema extensions, default policy changes.
- Active Directory (domain) – functionality level, renamed domains, FSMO availability, renamed/deleted/moved Exchange system containers and/or groups.
- Active Directory Connector – state of the connector (overloaded/idle/newer version available/service pack level), connection agreements (orphaned/set never to run/missing server/one-way/out-of-date).
- Exchange organisation – message size limits enforced, stray Exchange objects in LostAndFound, more than 10 Exchange administrators, ForestPrep version, mixed/native mode, Outlook Mobile Access (OMA) options, Exchange Archive Solution (EAS) options, unsolicited commercial e-mail (UCE) thresholds, recipient update service definitions, address list and offline address book (OAB) definitions.
- Exchange administration groups – validity of legacyExchangeDN, policy containers intact.
- Exchange routing groups – valid routing master, enumerate all connectors, recently changed connectors.
- Exchange server – server name validity, fully qualified domain name (FQDN) and NetBIOS name resolution, service pack/rollup level, time synchronisation with Active Directory.
- Cluster configuration (active and passive) – number of nodes, configuration discrepancies, temporary paths, quorum configuration, heartbeat configuration, DNS/WINS configuration, enumerates all resources and parameters, kerberos configuration.
- Directory access – cache configuration and non-default parameters, cache efficiency, round trip times between Exchange and each domain controller (DC)/GC, hardware configuration of each DC/GC, GC to Exchange processor ratio.
- Information store – extensible storage engine (ESE) cache configuration, virtual memory state, online maintenance window, checkpoint depth, circular logging state, log buffer configuration, log generation level, file system characteristics (compression/encryption), validity of legacyExchangeDN, database and logs on the same disk, content indexing state, non-default parameters in private/public GUID registry, database size, e-mail address on public folder stores, remote procedure call (RPC) compression/buffer packing settings, hard-coded TCP/IP ports and clashes with other Exchange ports, non-default and bad store process parameters.
- Transport – main configuration parameters within Active Directory, cross-check of AD and metabase for consistency, non-default settings, file system characteristics (compression/encryption) for mailroot folders, SMTP stack verb validation, SMTP mail submission test, enumeration of transport event sinks, enumeration of MTA settings, detection of archive sink and configuration, non-default routing parameters.
- System Attendant – service state, file system characteristics (compression/encryption) for message tracking folders, request for response (RFR) service, RFR/name service provider interface (NSPI) target server configuration, hard-coded TCP/IP ports.
- Anti-virus support – product detection, configuration and patch level (product dependent).
- Other installed applications – RPC client/server binding order, presence of LeakDiag, old versions of Simpler-Webb Exchange Resource Manager, ISA 2000 service pack level, presence of MOM agent.
- Hardware configuration – BIOS less than one year old, processor configuration, physical memory installed, specific support for Dell, HP and IBM servers.
- Disks – performance counters enabled, enumeration of physical/logical disks, enumeration of mount points, enumeration of disk controllers and driver levels, host bus adaptor (HBA) configuration, SAN multi-pathing software version.
- File versions – verify 29 key Exchange binaries (present/not too old/hotfixes), check MAPI subsystem/presence of old rollups/presence of ESE API virus scanners.
- Hotfixes – detect all installed hotfixes for Exchange Server 5.5/2000/2003 and Windows 2000 Server/Windows Server 2003, identify any updates installed within the last 5 days and the logon name of the user that performed the installation.
- Network – enumeration of all network cards and check NIC connection status, DNS/WINS configuration, IP gateway settings, primary DNS and domain suffix.
- Operating system – page table entry (PTE) levels, paged/non-paged pool configuration, CrashOnAuditFail configuration, HeapDeCommitFreeBlock threshold, temporary paths, SystemPages configuration, /3GB and /USERVA configuration, physical address extensions (PAE), version and SKU (i.e. Standard, Enterprise, etc.), Dr. Watson configuration, debug settings, Virtual PC/Virtual Server/VMware detection.
This is not an exhaustive list and changes with each web update pack.
Preventative Maintenance
As mentioned previously, 60% of Exchange incidents reported to Microsoft are traced back to people or process, not to the technology itself. Additionally, mission-critical software needs to run on good hardware, as a system is only as reliable as each of its components. Microsoft claims that, of the 90% of Exchange support incidents that are performance related, 50% are due to hardware issues.
Microsoft also claims that 90% of Exchange administrators do not carry out any maintenance until disaster strikes. A number of causes have been identified, including:
- Low understanding of the issues and problems.
- No time, resource, or budget to address maintenance tasks.
- Non-availability of test equipment (and high impact of testing in a production environment).
- Assumption that the risk of pro-activity outweighs the risk of doing nothing.
- Technically capable, but see preventative maintenance as boring and time consuming.
When configuring Exchange Server, there are a number of items to consider, discussed in the following paragraphs:
In general, hardware should be selected from the Microsoft Windows Server Catalog and configured in a consistent manner, using high-quality components and the same (recent) firmware and driver levels.
Error correcting code (ECC) memory should be used, and there is little return on investment above 4GB with Exchange Server.
Microsoft recommends the following disk layout, which separates transaction logs, data, and queues onto separate spindles for reasons of performance and data recovery:
RAID 0 (disk striping) provides fast read and write times, with RAID 1 (disk mirroring) adding redundancy to form RAID 0 + 1. As an alternative to RAID 0 + 1, RAID 5 (disk striping with parity) may be used (requiring fewer disks), but this configuration is slower to write due to the need to write the parity data.
Disk caching should be disabled (to avoid database corruptions where a transaction may not be successfully written to the disk) and hardware RAID employed (software RAID is too resource intensive). Servers should be specified with enough free disk space to allow database maintenance to be performed (ideally 110% of the database file size) and disk compression should never be used on an Exchange server due to the effect on performance. The Microsoft JetStress and LoadSim tools should be used to test that the server is capable of providing the required performance levels.
The Windows Server operating system should be consistent, both in version and configuration. The maximum log size for the event logs should be at least 16MB, set to overwrite as needed (to allow a reasonable amount of diagnostic information to be captured, but to avoid full log files). Dr. Watson should be the default debugger (to allow capture of user dump information) – this would normally be the case but may not be if some development environments are installed on the same computer as Exchange Server (not recommended). Recovery options should be set and the /3GB switch selected in boot.ini if more than 1GB of RAM is installed (this provides a different memory split between the application and the kernel).
There should be at least two domain controllers for each domain, with at least one GC processor for each 4 Exchange processors (assuming all processors are of a roughly equivalent specification).
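The processor ratio above reduces to simple arithmetic. As a minimal sketch (the function name is hypothetical, GC is taken to mean global catalog, and the 4:1 ratio comes from the text, assuming processors of roughly equivalent specification):

```python
import math


def min_gc_processors(exchange_processors: int, ratio: int = 4) -> int:
    """Return the minimum number of global catalog (GC) processors.

    Uses the rule of thumb of one GC processor for every `ratio`
    Exchange processors, rounding up so a partial group still
    gets a GC processor.
    """
    if exchange_processors < 0:
        raise ValueError("processor count cannot be negative")
    return math.ceil(exchange_processors / ratio)
```

For example, ten Exchange processors would call for at least three GC processors under this rule.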
Exchange Servers should be configured with circular logging disabled (for most server configurations), a staggered information store maintenance window, and mailbox quotas configured (so the maximum database size is a known value). Permissions should be set by group (not user), a solid naming convention employed for all objects and the administrative notes fields should be used (incidentally, the use of these is a good way to check that the SRS is working where Exchange Server 5.5 servers are in use).
Microsoft has two core cluster configurations for its own Exchange servers:
- Enterprise datacentres use a 7-node cluster with 4 active Exchange Server 2003 nodes, 1 passive Exchange Server 2003 node and 2 Windows Server 2003 (non-Exchange Server) nodes (for local backups).
- Regional datacentres use a 5-node cluster with 3 active Exchange Server 2003 nodes, 1 passive Exchange Server 2003 node and 1 Windows Server 2003 node.
To reduce the number of drive letters used, mount points are employed, for example on Exchange virtual server 1:
- C: System
- D: Paging file and MTA data
- E: Storage group 1 data
- F: Storage group 2 data
- G: Storage group 3 data
- H: Storage group 4 data
- E\MP – SMTP
- E\MP – Storage group 1 logs
- F\MP – Storage group 2 logs
- G\MP – Storage group 3 logs
- H\MP – Storage group 4 logs
This configuration has allowed Microsoft to perform a massive server consolidation exercise, removing 2 regional datacentres, 175 servers and 55 physical sites from the Exchange organisation, whilst doubling mailbox quotas.
When preparing for a Microsoft Exchange implementation there are a number of considerations:
- A server configuration log can be used to enforce consistency and provide information for support staff. It should include firmware and BIOS revisions, installed software and version information, service packs, hotfixes, hardware, services, network configuration, repair and recovery information.
- An operations log should be created.
- Test accounts should be used and a production test server acquired.
- Operations should be considered through the up-front planning of a maintenance window for patch management and other maintenance processes, generation of a backup/restoration plan and standardised tape formats, and finally generation of a recovery and troubleshooting plan with contingency, oversized storage (for database maintenance), spare parts available locally for immediate replacement and periodic server recovery drills.
Once Exchange has been deployed, the maintenance process comes into play.
Daily tasks:
- Check event logs (and act on them).
- Check backup logs.
- Monitor performance.
- Check disk space.
- Check the badmail folders and queues.
- Check for updates.
- Test mail flow.
- Back up the server.
Weekly tasks:
- Compare the server against a baseline configuration.
- Verify backed-up data with a restore (to a recovery server).
Monthly tasks:
- ESEUTIL file dump.
- ESEUTIL integrity check.
- ISINTEG with all tests in default mode.
Ad-hoc tasks:
- ESEUTIL defragmentation (every 12 months or after a large data move).
- Full disaster recovery test.
Daily online backups should be used, even if the volume shadow copy service (VSS) is used. Online backups check database integrity through checksum verification and full online backups purge transaction logs at the conclusion of the backup. Even whilst the backup is taking place, users can still access mailboxes and public folders.
When monitoring Exchange, compare the recorded counters with a baseline and pay particular attention to:
- Database\Log Record stalls/sec – average should be below 10 per second and maximum values should not be higher than 100 per second (indicates the number of log records that cannot be written because the buffers are full – note that Exchange 2000 Server defaults to 84 buffers whilst Exchange Server 2003 defaults to 512).
- Database\Log Threads Waiting – average should be below 10 (indicates the number of threads waiting to complete an update to the database by writing their data to the log – if too high, the log may be a bottleneck).
- MSExchangeIS\RPC Requests – should be below 30 at all times (indicates the number of MAPI requests being serviced by the Microsoft Exchange Information Store service – the default maximum is 100).
- MSExchangeIS\RPC Average Latency – should be below 50ms at all times and should be in the 10-25ms range on a healthy server (averaged over the last 1024 packets and affects how long it takes for a user’s view to change in Outlook).
- MSExchangeIS\RPC Operations/sec – should rise and fall with MSExchangeIS\RPC Requests (indicates how many RPC operations are being requested and actually responded to).
- MSExchangeIS\Virus Scan Queue Length – if this is consistently high consider a hardware upgrade (indicates the number of outstanding requests queued for virus scanning).
- MSExchangeIS Mailbox\Active Client Logons – this is server-specific but should be baselined and monitored (indicates the number of clients which performed any action within the last 10 minutes).
- Paging File\% Usage – should remain below 50% – high values indicate that the paging file size should be increased or more RAM added to the server (indicates the amount of the paging file used).
- Memory\Available MBytes (MB) – at least 50MB should be available at all times (indicates the amount of physical memory immediately available to a process).
- Memory\Pages/sec – should be below 1000 at all times (indicates the rate at which pages are read from or written to disk to resolve hard page faults).
- Memory\Pool Nonpaged Bytes – no more than 100MB (indicates the size of the memory pool for kernel objects which must remain in memory and cannot be written to disk).
- Memory\Pool Paged Bytes – no more than 180MB, unless a backup or restoration is taking place (indicates the size of the memory pool for kernel objects which can be written to disk when they are not in use).
- PhysicalDisk\Avg. Disk sec/Read – average below 20ms and maximum below 100ms for the database volume, average below 5ms and maximum below 50ms for the transaction log volume, average below 10ms and maximum below 50ms for the SMTP queue volume (indicates the average time to read data from the disk).
- PhysicalDisk\Avg. Disk sec/Write – average below 20ms and maximum below 100ms for the database volume, average below 10ms and maximum below 50ms for the transaction log volume, average below 10ms and maximum below 50ms for the SMTP queue volume (indicates the average time to write data to the disk).
Consider implementing management tools such as MOM to monitor these counters.
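Where no management tool is available, some of the simpler thresholds above can be checked with a few lines of script. The following is only a sketch: the dictionary layout and function name are assumptions for illustration, it covers a handful of the counters with a single limit, and how the samples are collected (for example, from a Performance Monitor export) is left out.

```python
# Upper limits taken from the counter list above ("should be below ...").
# The dictionary layout and function name are illustrative assumptions.
UPPER_LIMITS = {
    r"Database\Log Threads Waiting": 10,       # average number of threads
    r"MSExchangeIS\RPC Requests": 30,          # outstanding requests
    r"MSExchangeIS\RPC Average Latency": 50,   # milliseconds
    r"Paging File\% Usage": 50,                # percent
    r"Memory\Pages/sec": 1000,                 # pages per second
}

# Lower limits (the value should never drop below these).
LOWER_LIMITS = {
    r"Memory\Available MBytes": 50,            # megabytes
}


def find_breaches(samples):
    """Return (counter, value, limit) for every sample outside its limit.

    `samples` maps a counter name to its most recent numeric value;
    counters without a known limit are ignored.
    """
    breaches = []
    for name, value in samples.items():
        if name in UPPER_LIMITS and value >= UPPER_LIMITS[name]:
            breaches.append((name, value, UPPER_LIMITS[name]))
        elif name in LOWER_LIMITS and value < LOWER_LIMITS[name]:
            breaches.append((name, value, LOWER_LIMITS[name]))
    return breaches
```

Server-specific counters (such as active client logons) have no fixed limit and would need comparing against the recorded baseline instead.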
ESEUTIL is the Exchange server database utility. Full syntax may be obtained by typing ESEUTIL /? at a command prompt but for maintenance purposes, there are four main options of interest:
- Offline defragmentation (/d).
- Integrity (/g).
- File dump (/m).
- Copy file (/y).
Because of the potential to cause damage with ESEUTIL, operations should normally be performed with restored data on a non-production server.
Offline defragmentation may be necessary if a large number of mailboxes have been deleted (e.g. following a migration, or if there is a high staff turnover), or following a hard database repair (ESEUTIL /p). It is only recommended if at least 30% of the space taken by the database will be recovered (Event ID 1221 in the application log after an online defragmentation will give a conservative estimate as to how much free space is in the database).
Unless a temporary path is specified as an option, offline defragmentation requires free disk space of at least 110% of the database size to be available, and the streaming database must reside on the same path.
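The two rules of thumb for offline defragmentation (recover at least 30% of the database, and have free disk space of at least 110% of the database size) can be sketched as a simple pre-check. The function and parameter names are hypothetical, the reclaimable figure is assumed to come from the Event ID 1221 estimate, and the sketch assumes that no temporary path is specified.

```python
def offline_defrag_check(db_size_mb, reclaimable_mb, disk_free_mb):
    """Apply the two rules of thumb for an offline defragmentation.

    Returns a (worthwhile, enough_space) pair of booleans:
    - worthwhile: at least 30% of the database size would be recovered
    - enough_space: free disk space is at least 110% of the database
      size (assumes no temporary path is specified)
    """
    if db_size_mb <= 0:
        raise ValueError("database size must be positive")
    # Compare scaled values rather than multiplying by 0.3 or 1.1,
    # so whole-number inputs are not affected by rounding.
    worthwhile = reclaimable_mb * 100 >= db_size_mb * 30
    enough_space = disk_free_mb * 100 >= db_size_mb * 110
    return worthwhile, enough_space
```

A 10,000MB database with 3,500MB reported free and 12,000MB of free disk space would pass both checks; the same database with only 1,000MB reported free would not be worth defragmenting offline.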
An integrity check may be necessary to perform a dry run of the repair function – i.e. to validate the checksum for each 4KB page in the database. Problems that a repair would address are written to a database.integ.raw file which logs all pages in the database, not just those with problems. An integrity check may abort prematurely if problems are of such a nature that a repair is required before some parts of the database can be checked but this does not necessarily mean that a repair would fail. Unless options are specified, an integrity check requires 20% free space.
A file dump allows the viewing of the header information for database, streaming database, checkpoint, online backup patch or transaction log files. The header information can be used to validate that a series of transaction log files forms a matched set and that all files are undamaged, to view space allocation inside the databases, or to view metadata for one or more tables within the database file. An example use of this would be to read the state of an unmounted store (i.e. clean or dirty), to provide some diagnosis as to why the store stopped, prior to mounting the store (which would attempt a soft recovery).
A database repair is a last resort: it will strip out orphaned database pages, possibly resulting in data loss. Multiple runs may be required until the entire database is repaired.
A copy file operation simply provides a quick method of copying databases between servers.
ISINTEG is a utility to search an offline information store for integrity weaknesses. Unlike ESEUTIL (which focuses on the physical database), ISINTEG is concerned with the logical structure. It has two modes:
- Default mode – in which the tool runs the specified tests and reports its findings.
- Fix mode – where options are specified to run tests and attempt a fix where possible.
For maintenance work, default mode is used. Unless addressing a particular issue in the database, the alltests option is typically the most effective course to follow.
In order to run ISINTEG, the Microsoft Exchange Information Store service must be started but the database to be checked dismounted. ISINTEG can be run against remote servers, but not against raw database files or backups.
Links
You had me at EHLO… (the Microsoft Exchange Team blog)
Exchange Server tips and tricks
Exchange Server 2003 Performance: 10 Things to Think About
Exchange Server 2003 Performance and Scalability Guide
Exchange Server 2003 Operations Guide
Using the Exchange tools ISINTEG and ESEUTIL to ensure the health of your information store
Passed Microsoft Certified Professional exam 70-224
Today I passed the Microsoft Certified Professional exam 70-224: Installing, configuring and administering Microsoft Exchange 2000 Server.
Microsoft’s non-disclosure agreement prevents me from saying too much about the exam but much to my relief I scored maximum points in three areas (“installing and upgrading Exchange 2000 Server”, “configuring Exchange 2000 Server” and “managing Exchange 2000 server growth”) – as someone who primarily designs and implements systems (rather than performing daily operational and administrative tasks) I would have hoped these would have been my strong areas!
It may seem odd taking an Exchange 2000 Server exam in 2004, but I booked this a year ago (whilst I was still working with Exchange 2000) and if I didn’t take it by tomorrow then I would have just lost my money! Perhaps I’ll get around to doing an Exchange Server 2003 exam soon, but I need to start working with the product again first…