Thursday, March 21, 2013

Dynamics CRM Advanced Performance Optimizations


Introduction

Over the years, I have had the honor and opportunity of working on some very large user-count and very high-volume Dynamics CRM projects.
Inevitably, these projects encounter performance problems as they move to production because of the increased load placed on the systems. Sometimes I'm around before the project goes into production and we can proactively plan for and implement the needed optimizations. Many times I'm called in after the system has already crashed several times in production.  This is not a good place to end up!
Below are suggested optimizations a large-seat and/or high-volume Dynamics CRM deployment should consider implementing.

CRM 2011

Now that Dynamics CRM 2011 OnPremise has been released, I have had several inquiries as to the applicability of the optimization guidance currently suggested by Microsoft and supplemented here. I am excited to write that ALL of the existing optimization guidance is still wholly applicable!  In addition, Microsoft has released the first benchmark on CRM 2011 demonstrating 150,000 concurrent users.  I have updated the chart below with the new benchmarks.

Step 1: Understand the Baseline Benchmarks

First, let’s understand what’s possible for a Dynamics CRM environment.  There are now currently several benchmarks that have been performed with Dynamics CRM to flesh out the recommended hardware and also demonstrate the significant scalability of the product.  I strongly suggest at least a moderate review of each of the following documents and a thorough review of those that are closest to your environment and needs.
Suggested hardware for up to 500 concurrent users
Describes general hardware sizing information that will support Microsoft Dynamics CRM 4.0 with up to 500 concurrent users in a single-deployment on-premise model.
500 concurrent users; 42,223 web requests / hour; 6,427 business transactions / hour

Microsoft Dynamics CRM 4.0 enterprise benchmark (with Unisys)
Microsoft, together with Unisys Corporation, completed benchmark testing of Microsoft Dynamics CRM 4.0 running on the Windows Server 2008 operating system and SQL Server 2008 database software. Benchmark results demonstrate that Microsoft Dynamics CRM can scale to meet the needs of an enterprise.
This download includes the following four papers:

1.     Microsoft Dynamics CRM_4_0_Enterprise Performance and Scalability.pdf
Overview and links to more resources

2.     Microsoft Dynamics CRM_4_0_Performance and Scalability _Users.pdf
24,000 users; 1,051,921 web requests / hour; 169,344 business transactions / hour

3.     Microsoft Dynamics CRM4_0_Performance and Scalability_Database.pdf
1,500 users; 1.03 billion records; 1.3 TB database

4.     Microsoft Dynamics CRM4_0_Performance and Scalability_Network.pdf
Extensive network statistics

Microsoft Dynamics CRM 4.0 user scalability benchmark (with Intel)
Microsoft, working with Intel Corporation, completed benchmark testing of Microsoft Dynamics CRM 4.0 running on Intel server and solid state drive (SSD) hardware. This white paper focuses on benchmark results associated with user scalability.
50,000 concurrent users; 2.4M web requests / hour; 374,400 business transactions / hour; .12 second average response time

Multi-tenancy and xRM benchmark
This paper describes the details and results of a benchmark testing effort around the multi-tenancy and xRM capabilities of Microsoft Dynamics CRM 4.0, the virtualization features of Microsoft Windows Server 2008 R2 Hyper-V, and the enterprise capabilities of Microsoft SQL Server 2008 R2 running on IBM System xSeries hardware with quad-core Intel Xeon processors and Intel SSDs.
20 LOB xRM applications; 1,000 users for EACH LOB application; 149,760 business transactions / hour; .10 second average response time; only 37.1% SQL Server utilization

Virtualized Microsoft Dynamics CRM 4.0 benchmark (with Intel and Dell)
Microsoft, working with Intel Corporation and Dell Inc., completed a workload test of virtualized Microsoft Dynamics CRM 4.0 on Dell PowerEdge servers equipped with Intel Xeon Processor 7500 Series CPUs and solid state drives (SSDs).
100,000 concurrent users; 5.1M web requests / hour; 778,000 business transactions / hour; .29 second average response time

Microsoft Dynamics CRM Performance and Scalability on Intel Xeon Processor-based Dell Servers with Solid-State Drives
Microsoft, working with Intel Corporation, completed benchmark testing of Microsoft Dynamics CRM 2011 running on Intel Xeon 7500 series processor-based Dell R910 servers with Pliant Technology solid state drives (SSDs).
150,000 concurrent users; 5.5M web requests / hour; 703,080 business transactions / hour; .4 second average response time

Step 2: Apply the Minimal Recommended Optimizations

Microsoft has published a white paper that goes over the minimal set of optimizations that should be taken at every layer of the system, from the client, to the application and platform servers, and on to the database itself.  These should be considered mandatory guidance and implemented in every project regardless of size. Note that these are the minimum, standard optimizations applied for all of the official benchmarks referenced above. Additionally, Microsoft recently released client optimization guidance for CRM 2011 with a focus on CRM Online.  Both of these documents should be considered mandatory reading and guidance.
This white paper details techniques, considerations, and best practices for optimizing and maintaining the performance of Microsoft Dynamics CRM 4.0 implementations.

Optimizing and Maintaining Client Performance for Microsoft Dynamics CRM 2011 and CRM Online
This white paper provides readers with the information necessary to ensure and maintain the optimal performance of the clients connecting to a business solution based on Microsoft Dynamics CRM 2011 or Microsoft Dynamics CRM Online.

Optimizing and Maintaining the Performance of a Microsoft Dynamics CRM 2011 Server Infrastructure
This white paper provides information designed to help readers achieve and maintain optimal performance of the server infrastructure supporting a Microsoft Dynamics CRM 2011-based business solution deployed in an on-premises or hosted environment.

Step 3: Apply all the latest Update Rollups and applicable Hotfixes

For a complete matrix of all of the Update Rollups, their release dates, and version numbers, view this blog:
You will want to be on the latest Rollup that fits within your build life-cycle when you go to production.
In addition to the Rollups, you will want to apply any of the manual patches and hotfixes that are applicable to your environment.  A couple must-do manual patches are listed below:
AsyncOperation Cleanup
AsyncOperation Auto-cleanup
AsyncOperation missing index
A note of caution regarding AsyncOperation cleanup: if you apply the registry key fixes (number 2 above), ALL of your workflow history will be deleted immediately when a workflow completes, leaving you without any historical view into your workflows.  I actually suggest that you do NOT apply the registry fixes but instead use the SQL statement in the article above (number 1) and add a WHERE condition that deletes only records older than n days.  I generally recommend keeping somewhere between 7 and 14 days of history.
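As a sketch of that WHERE-conditioned cleanup, batched to limit lock escalation (the table name, state, and status values below are assumptions drawn from typical CRM deployments; reconcile them against the script in the KB article before running anything):

```sql
-- Hedged sketch: delete completed system jobs older than 14 days, in batches.
-- StateCode = 3 (Completed) and StatusCode 30/32 (Succeeded/Canceled) are
-- assumed values -- verify them against the KB article's script.
DECLARE @BatchSize INT = 2000;   -- small batches keep lock escalation in check
DECLARE @Rows INT = 1;
WHILE (@Rows > 0)
BEGIN
    DELETE TOP (@BatchSize) FROM AsyncOperationBase
    WHERE StateCode = 3
      AND StatusCode IN (30, 32)
      AND CompletedOn < DATEADD(DAY, -14, GETUTCDATE());
    SET @Rows = @@ROWCOUNT;
END
```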
Note: If your AsyncOperation table has never been cleaned up before, you may experience excessive locking during the cleanup process, so schedule your initial cleanup during a maintenance window.  If you do experience excessive locks, you may need to run the initial cleanup during a downtime maintenance window when you can put the database in single-user mode and/or stop the AsyncService for the duration.

Additionally, once you schedule a regular cleanup, you will want to monitor for excessive locking and take action as appropriate for your environment.

Step 4: Advanced Optimizations

Your CRM environment and performance will improve significantly after applying all of the suggested optimizations discussed in the white paper Optimizing and Maintaining Microsoft Dynamics CRM 4.0 and the manual changes recommended to optimize the AsyncOperations in Step 3 above.
In addition to the Step 2 and Step 3 changes mentioned above, there are additional advanced optimization steps required for deployments with a very large number of users and/or a very high volume of data transactions. A typical example of this type of system is Microsoft Dynamics CRM deployed within a call center. It is therefore strongly recommended that the optimizations below be applied for call centers and similarly high-volume deployments.
At the time of this writing, the following information has not yet been formalized into a white paper by the Microsoft team. However, almost all of what follows has been fleshed out on several deployments while working very closely with the Microsoft Dynamics CRM support team, and it encompasses all of their guidance and suggestions from those engagements.

SQL Server Isolation Level: RCSI

Note that the Optimizing and Maintaining Microsoft Dynamics CRM 4.0 white paper also touches on this topic, but it is frequently overlooked or ignored.
Symptom- One or more of the following situations are occurring:
1.       The installation is experiencing excessive transaction locks that are resulting in an extensive backlog of both write and read locks in SQL Server.
2.       The system frequently comes to a complete stop in the mornings when all your users show up for work and log into Dynamics CRM.
3.       You would like to be smart and proactively prevent yourself from getting into these situations.
Solution: Switch the isolation level of SQL Server to RCSI.
RCSI stands for Read Committed Snapshot Isolation.  RCSI changes SQL Server's read behavior in two ways:
1.       Reads return only already-committed data.
2.       SQL Server keeps a version snapshot of the data in TempDB and serves all reads from that snapshot, so readers no longer take (or wait on) shared locks.

Switching to RCSI will eliminate ALL of the read locks in the system; in addition, it will eliminate the backlog of write locks that queue up behind reads.
The configuration change is very simple to make.  Documentation for setting Isolation levels is located here: http://msdn.microsoft.com/en-us/library/ms173763.aspx
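As a minimal sketch of the change (the database name Contoso_MSCRM is a placeholder for your organization database):

```sql
-- Enable Read Committed Snapshot Isolation on the organization database.
-- This needs near-exclusive access, so run it in a maintenance window;
-- WITH ROLLBACK IMMEDIATE kicks out open transactions rather than waiting.
ALTER DATABASE [Contoso_MSCRM]
    SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;

-- Confirm the setting took effect (1 = enabled):
SELECT name, is_read_committed_snapshot_on
FROM sys.databases
WHERE name = N'Contoso_MSCRM';
```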
A couple caveats:
RCSI makes HEAVY use of the TempDB.  Therefore, you will want to make the following optimizations on the TempDB.
·         Ensure that the TempDB is on its own spindle or LUN.
·         Pre-size the TempDB files generously. It is suggested to start at the largest size observed in an existing production deployment, or at least 25% of the production data size.
·         If you allow auto-growth of TempDB, use a fairly large growth increment, for example 20%; otherwise, monitor TempDB and manually grow it when it needs more space.
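A sketch of those TempDB settings (the logical file names tempdev/templog are SQL Server defaults; the sizes are purely illustrative, so derive yours from observed production usage):

```sql
-- Pre-size TempDB and set a generous growth increment (sizes are examples only).
ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, SIZE = 50GB, FILEGROWTH = 20%);
ALTER DATABASE tempdb MODIFY FILE (NAME = templog, SIZE = 10GB, FILEGROWTH = 20%);
```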
Make sure that you are running at least Update Rollup 12 or later!  There is a known bug where some async jobs and bulk emails will execute multiple times when running under RCSI.  The following KB article, released separately and as part of Update Rollup 12, fixes this problem.
Multiple email with RCSI
Update Rollup 12

Implementing RCSI is very simple to do and will have a TREMENDOUS positive impact on your deployment.  As mentioned above, make sure you have properly set up the TempDB and are running at least Update Rollup 12 or later.  You will be VERY PLEASED with the performance gains from these changes!

Manual data and log file size growth

By default, SQL Server sets up the data and log files to auto-grow by a certain percentage. Frequently, I observe DBAs increasing the percentage to help minimize the impact of the growth cycle.
However, there is even one step further you should take into consideration to completely manage that growth cycle.
It is suggested that you proactively monitor the data and log files' "used space" with SCOM or another monitoring tool and send out an alert once the used space reaches a stated ceiling.  I usually suggest somewhere between 80% and 90%.
Subsequently, during your normal maintenance window, when usage of the system is minimal, manually (or via script) increase the size of the data and log files by an amount appropriate for your deployment.
Performing the changes mentioned above gives the deployment a significant advantage because it:
·         Ensures that a file-growth cycle happens during low-usage windows and not at the peak of the day, when it could cause serious performance impacts.
·         Allows you to proactively monitor your file storage usage and plan for growth, rather than being stuck in reactive mode when you suddenly run out of disk space.
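A sketch of the used-space check (run in the context of the organization database; the manual-growth statement at the end uses a placeholder database, file name, and size):

```sql
-- Report used space per data/log file; a monitoring tool can alert when
-- pct_used crosses your chosen ceiling (80-90%).  Sizes are in 8 KB pages,
-- so dividing by 128 yields MB.
SELECT DB_NAME()                                 AS database_name,
       name                                      AS file_name,
       size / 128                                AS size_mb,
       FILEPROPERTY(name, 'SpaceUsed') / 128     AS used_mb,
       CAST(FILEPROPERTY(name, 'SpaceUsed') * 100.0
            / size AS DECIMAL(5, 1))             AS pct_used
FROM sys.database_files;

-- Then, in the maintenance window, grow the file manually (names/size assumed):
-- ALTER DATABASE [Contoso_MSCRM] MODIFY FILE (NAME = mscrm, SIZE = 200GB);
```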

Fill Factor

I have seen several occasions where a SQL Server deployment had all of its index fill factors set to zero (0).  Zero is BAD. It causes very poorly optimized index page allocation and usage.
The recommended setting from the Microsoft Dynamics CRM product support team is that the fill factor should be around 80%.  This can, of course, be tuned to your exact deployment; however, 80% is a great place to start, and you can always change it later.
Note - You will need to rebuild all of the existing indexes before they will pick up the new setting.  You will want to schedule this in a proper maintenance window to minimize the impact to users.
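One way to sketch that rebuild (this touches every index in the current database, so it is strictly a maintenance-window operation):

```sql
-- Rebuild every index in the current database with an 80% fill factor.
-- Table and schema names are discovered dynamically from the catalog views.
DECLARE @sql NVARCHAR(MAX) = N'';
SELECT @sql += N'ALTER INDEX ALL ON ' + QUOTENAME(s.name) + N'.'
             + QUOTENAME(t.name) + N' REBUILD WITH (FILLFACTOR = 80);' + NCHAR(13)
FROM sys.tables  AS t
JOIN sys.schemas AS s ON t.schema_id = s.schema_id;
EXEC sp_executesql @sql;
```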

Parallelism

Everybody loves the concept of parallelism.  After all, isn’t parallelism in a multi-CPU, multi-process, multi-threaded environment always a good thing?!
Well, actually, not always.
The Optimizing and Maintaining Microsoft Dynamics CRM 4.0 white paper touches on this topic but does not provide explicit guidance. However, the relevant section in that white paper should still be consulted.
In practice, once all of the previously mentioned optimizations, especially RCSI, are completed, you are very unlikely to experience any additional CxPacket waits, even with a high degree of parallelism. 
However, if you are still experiencing a large quantity of CxPacket waits in SQL Server for the Dynamics CRM database, you may have a problem with parallelism.  If so, it is suggested that you set the maximum degree of parallelism to 1.
Details on how to do this are located here: http://msdn2.microsoft.com/en-us/library/ms181007.aspx
As always, you can experiment with other values greater than one until you find the best balance for your specific environment.
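A sketch of that server-wide setting (note that it affects every database on the instance, not just CRM):

```sql
-- Cap the maximum degree of parallelism at 1 for the whole instance.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max degree of parallelism', 1;
RECONFIGURE;
```

To experiment with other values, re-run the second pair of statements with a different number until you find the best balance for your environment.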

SQL Server 2008 Specific Optimizations

SQL Server 2008 introduces a whole suite of new features that are directly relevant to a Dynamics CRM deployment.  Things like:
·         Row Compression
·         Page Compression
·         Filtered Indexes
·         Sparse Columns
·         Encryption
Subsequent to the release of SQL Server 2008, the Microsoft Dynamics CRM team released guidance on how to best leverage these features for Dynamics CRM.
Improving Microsoft Dynamics CRM performance and Securing Data with Microsoft SQL Server 2008


One of the most interesting new features that can very significantly improve Dynamics CRM is Filtered Indexes. 
Dynamics CRM uses two tables for every entity:
<entity>Base and <entity>ExtensionBase
And subsequently provides the full set of logical and filtered views that join these tables (and all the other supporting tables) together.
In almost all cases, the views over these tables (and the queries that use them) filter on:  statecode = 0, while the indexes that support them cover active and inactive records alike.
As noted in the referenced paper, this creates significant overhead in the indexes and their maintenance and can often result in the indexes being completely bypassed in some situations.
Therefore, it is strongly encouraged that you, at a bare minimum, look at adding filters to existing indexes and/or adding additional filtered indexes to your deployment.
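A sketch of such a filtered index (AccountBase and the Name column are illustrative; pick the entities and columns your hot queries actually hit, and match the filter to the statecode predicate your views use):

```sql
-- Filtered index covering only active records, so inactive rows add no
-- index bloat or maintenance cost.
CREATE NONCLUSTERED INDEX ndx_AccountBase_Name_Active
ON AccountBase (Name)
WHERE StateCode = 0;
```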
Refer to the referenced paper for other great optimization opportunities when using SQL Server 2008!

SAN, LUNs, and Spindles

A common scenario is encountered when working with Dynamics CRM databases on SANs. As the guidance above has frequently mentioned, it is a best practice to ensure that the database files, log files, and TempDB are all on separate LUNs and spindles.
What is frequently encountered is that, while the SAN administrator is providing separate LUNs, all of the LUNs sit on the exact same drive spindles.  This results in a severe bottleneck at the physical drive level.
When carving out LUNs on a SAN, make sure that they are also spread across separate physical spindles.

Network Topology

Use smart subnets

The use of well-thought-out subnets and network segregation can significantly reduce the volume of traffic flowing through the network and keep stray traffic out of areas where it doesn't belong and would just consume bandwidth.
One of the emergency calls I received was from a customer that was using one single subnet for their entire enterprise.  During peak hours they were experiencing highly degraded network I/O between the application servers and the database.  All of the users' Facebook browsing and chat sessions were passing over the same wire that sat between the application servers and the database.  After properly creating smart subnets and routing rules, the entire problem disappeared!
It is strongly suggested that the datacenter have its own subnet that is segregated from the primary user population.

Use smart switching

On another emergency "My Dynamics CRM is Dead!" call, it was discovered that there were a couple of problems with the network:
1)      The datacenter was primarily using broadcast routers rather than switches.
2)      The few routing rules that existed were not being managed and were routing intra-datacenter traffic halfway around the world and back, just to reach a server in the next rack over, six feet away.
It is strongly suggested that you use switches rather than broadcast routers in the data center.  And, TRIPLE CHECK your routing rules.  Make sure your local data center traffic does not have to take a world tour just to go next door.

Step 5: Ongoing Monitoring and Maintenance

Like ANY database-centric application, Dynamics CRM is a growing, changing, living database. Therefore, it needs ongoing monitoring and maintenance just like any other enterprise database does.

Scale Group Jobs

It is suggested that you set the bulk delete and re-index jobs to run at a time that is best for your enterprise. By default, these jobs run at the time of day that CRM was installed, which is very likely not when you want them to happen. You will want to run the Scale Group Editor to adjust their schedule to a low-usage time applicable to your environment.
Microsoft provides a utility for editing these schedules, referenced below; it allows editing of the schedule for bulk deletes and re-indexing.

Scheduled Maintenance Plans

SQL Server provides a native maintenance facility called Maintenance Plans where you can setup regular maintenance jobs to ensure that the database is always happy and healthy.  It is strongly suggested that you set up an applicable set of maintenance plans!  Some of the items that should be done on a regular basis are:
·         Re-indexing
·         Statistics Update/Generation
Additionally, you should regularly check the query/index usage statistics, looking for areas where you may not have proper indexes in place.  You will then want to adjust your existing indexes and add any applicable new ones to ensure that the system is operating at full potential.
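One way to surface such gaps is SQL Server's missing-index DMVs (a sketch; treat the suggestions as hints to evaluate against your workload, not as indexes to create blindly):

```sql
-- Top missing-index suggestions, ranked by estimated impact.
SELECT TOP (10)
       d.statement          AS table_name,
       d.equality_columns,
       d.inequality_columns,
       d.included_columns,
       s.user_seeks,
       s.avg_user_impact
FROM sys.dm_db_missing_index_details     AS d
JOIN sys.dm_db_missing_index_groups      AS g ON d.index_handle = g.index_handle
JOIN sys.dm_db_missing_index_group_stats AS s ON g.index_group_handle = s.group_handle
ORDER BY s.user_seeks * s.avg_user_impact DESC;
```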

Quick Find Attributes

As you know, Dynamics CRM allows you to specify a set of attributes that are searched when a user uses the Quick Find feature on each entity.  It is imperative that you select ONLY those fields that are actually needed.  Additionally, you will want to ensure that you have a custom index defined for each Quick Find set of fields!  Failing to do so will usually result in table scans.
Note: When setting up these indexes, it is strongly suggested that you make them “covering” indexes where all of the attributes that are returned in the Quick Find View are included as covering fields in the index.  This will result in extremely fast quick finds in Dynamics CRM!
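A sketch of such a covering index (the entity, search field, and INCLUDE columns are assumptions for illustration; substitute the fields actually configured in your Quick Find view):

```sql
-- Covering index for a hypothetical Account Quick Find that searches Name.
-- INCLUDE lists the columns the Quick Find view displays, so the lookup
-- never has to touch the base table; the filter matches active records.
CREATE NONCLUSTERED INDEX ndx_AccountBase_QuickFind_Name
ON AccountBase (Name)
INCLUDE (AccountNumber, Telephone1)
WHERE StateCode = 0;
```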

Special Situations

Global Deployments

Issues with latency

Most projects are initially concerned with bandwidth.  For most of them this is actually of minimal concern, because they already have very healthy, wide Internet and network pipes in place.
However, the one thing Dynamics CRM is actually VERY sensitive to, and which is usually completely overlooked, is latency!
Users on low-bandwidth, low-latency connections will usually see very acceptable performance.
Users on high-bandwidth, high-latency connections will usually see very poor performance.  This is especially true when using SSL; I have seen situations on high-latency connections where the SSL handshake alone takes almost 2 seconds!
If you have users in remote locations that have very high latency connections, you will want to look into one or more of the following items:
·         Use Dynamics CRM’s “IFD” deployment, even if it is not actually Internet facing.  An IFD deployment with its forms authentication will reduce the total quantity of round-trips between the browser and the server significantly.
·         Use of WAN Accelerators. These usually provide combinations of compression and caching that can often bring significant performance gains.
·         Use of Citrix or Terminal Services.  In some situations where you are at the complete mercy of poor network connections with poor bandwidth and poor latency, sometimes the best approach is the use of server-side application session hosting like Citrix and Terminal Services. This will reduce the network traffic down to just the keyboard I/O and screen image refreshes.

Conclusion

The purpose of this document has been to provide you with a starting point for optimizing your Microsoft Dynamics CRM implementation. Please note that this document is not all-inclusive.  In addition, posted within all of the MS CRM Rollups are several manual optimizations that can be applied if your particular implementation is demonstrating the problematic behavior described in that rollup's documentation.  It is therefore necessary to read through each entire rollup document.  There are often gems hidden beneath the small print.

By Robert
- One is pleased to be of service
