In this document I would like to share some experience and basic steps for troubleshooting intermittent issues on BI platforms that have more than one processing server or node configured. I will focus on the most commonly used report types (Web Intelligence, Crystal Reports and Dashboards) and only on errors that occur during on-demand refresh.
Introduction – the scalability
SAP BusinessObjects Business Intelligence Platform services can be scaled vertically, using the multiple CPU cores of a single machine to take full advantage of the hardware they run on, and horizontally, to take advantage of multiple server machines across a network.
For example, you can run several processing servers on the same machine (vertical scaling), or you can run several processing servers on separate machines (horizontal scaling).
Well-sized, properly scaled environments like these are commonly used as production environments, where BI Platform services cannot be stopped or restarted on ad-hoc request, only in maintenance time slots. Troubleshooting intermittent errors there is difficult, because one of the most commonly used settings lets an on-demand report request be processed by the first available server.
The 3i steps for troubleshooting
- Identification
- Isolation
- Investigation
1. Identification – the environment details and proper processing workflow
As a best practice, generate a System Inspection (SI) Report with the Platform Support Tool. The SI report gives a high-level overview of the BI Platform and collects information about the BI landscape such as server settings, command line arguments, memory settings, and performance metrics. (The tool and more information can be found here: http://wiki.scn.sap.com/wiki/display/BOBJ/SAP+BI+Platform+Support+Tool)
The workflow that the BI Platform executes while the report is viewed or refreshed must be identified correctly. The following section lists the available workflows by reporting application type (the complete list of workflows can be found at http://scn.sap.com/docs/DOC-8292):
Web Intelligence
- View a Web Intelligence document on demand process flow
- Refresh a document based on a multi-source universe process flow
- Refresh a document based on a dimensional universe process flow
- Refresh a document in Web Intelligence Desktop in one-tier mode process flow
- Refresh a document in Web Intelligence Desktop in two-tier mode process flow
- Refresh a document in Web Intelligence Desktop in three-tier mode process flow
- Refresh a document based on an SAP NetWeaver BW BEx Query using BICS connectivity process flow
- Refresh a document based on an SAP Netweaver BW data using a relational UNX universe process flow
- Refresh a document based on a multi-source datasource using a relational UNX universe process flow
- Refresh a document based on OLAP data using a multidimensional UNV universe process flow
- Refresh a document based on OLAP data using a multidimensional UNX universe process flow
Crystal Reports 20xx
- View a report instance when the page is in the cache process flow
- View a report instance when the page is not in the cache process flow
- View a report on demand process flow
Crystal Reports for Enterprise
- View a report instance when the page is in the cache process flow
- View a report instance when the page is not in the cache process flow
- View a report on demand process flow
Dashboards (aka Xcelsius)
- View a dashboard when the query result is in the cache process flow
- View a dashboard when the query result is not in the cache process flow
2. Isolation
Once the proper workflow has been identified, the next step is to find which node, processing service or server is failing. When the landscape contains several nodes, and several processing servers work on the on-demand requests raised by business users, it is hard to find the server that is failing during processing or that has an incorrect or inconsistent configuration.
In the Business Intelligence Platform, a dedicated server group can be assigned to a report for on-demand and scheduled processing. With the out-of-the-box setting, when a report is executed on the platform, the system turns to the first available resource (server) to process it.
When report execution is dedicated to a specific server group, which contains a set of processing servers, we say that the report execution is isolated, because we know exactly which server on which node takes part in the processing.
Please follow one of these articles to find the settings and steps for report isolation:
Web Intelligence: How to force a Webintelligence report file to be processed by a specific server group (isolation)
Crystal Reports: How to force a Crystal Reports file to be processed by a specific Crystal Reports Server group (isolation)
Dashboards: How to force a Dashboard file to be processed by specific Dashboard servers (isolation)
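To illustrate why isolation helps, here is a minimal Python sketch of the routing behaviour described above. It is only a simulation, not BI Platform code: the server names are hypothetical, one server is assumed to be misconfigured, and "first available server" is modeled as a random pick because which server is free at request time is unpredictable.

```python
import random

def pick_first_available(candidates, rng):
    # Which server happens to be free at request time is unpredictable,
    # so it is modeled here as a uniform random choice (an assumption).
    return rng.choice(candidates)

def failure_rate(candidates, faulty, runs=1000, seed=0):
    """Fraction of simulated refreshes that land on a faulty server."""
    rng = random.Random(seed)
    failures = sum(
        pick_first_available(candidates, rng) in faulty for _ in range(runs)
    )
    return failures / runs

# Hypothetical landscape: two nodes with two processing servers each.
all_servers = ["NodeA.WebI1", "NodeA.WebI2", "NodeB.WebI1", "NodeB.WebI2"]
faulty = {"NodeB.WebI2"}  # assumed to be the misconfigured server

rate_default = failure_rate(all_servers, faulty)         # no server group
rate_good_group = failure_rate(["NodeA.WebI1"], faulty)  # isolated, healthy
rate_bad_group = failure_rate(["NodeB.WebI2"], faulty)   # isolated, faulty

print(f"first available server: {rate_default:.0%} of refreshes fail")
print(f"isolated to NodeA.WebI1: {rate_good_group:.0%} of refreshes fail")
print(f"isolated to NodeB.WebI2: {rate_bad_group:.0%} of refreshes fail")
```

Without a server group the error shows up only on some refreshes, which is exactly what makes it look intermittent. Once the report is isolated, the error either disappears or becomes fully reproducible.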
3. Investigation
For the investigation, the best approach is to change the servers in the isolation group until the error occurs consistently. Once you have found the specific node, processing service or server where the issue or observed behaviour can always be reproduced, the services can be traced individually by setting the trace level to high in their properties, or by using the End-to-End (E2E) trace process of the SAP BI Support tool.
To do the E2E trace, please follow the instructions of SAP KBA 1861180 - Customer instructions and best practice for collecting a BI Platform 4.x end to end trace.
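The narrowing-down step can be thought of as a simple loop over the candidate servers. The Python sketch below only models the bookkeeping. The callback `refresh_fails` and the server names are hypothetical; on a real system, "isolate the report to this server and refresh it" is a manual step following the isolation articles above, not an API call shown here.

```python
def find_consistently_failing(servers, refresh_fails, attempts=5):
    """Return the servers on which every one of `attempts` refreshes failed.

    `refresh_fails(server)` is a hypothetical callback: it should return True
    when a refresh fails while the report is isolated to that single server.
    """
    suspects = []
    for server in servers:
        # Repeat the refresh several times: a single failure is not enough,
        # the error has to occur on every attempt to count as reproducible.
        outcomes = [refresh_fails(server) for _ in range(attempts)]
        if all(outcomes):
            suspects.append(server)
    return suspects

# Stand-in for the manual check, assuming one misconfigured server.
def simulated_refresh_fails(server):
    return server == "NodeB.WebI2"

servers = ["NodeA.WebI1", "NodeA.WebI2", "NodeB.WebI1", "NodeB.WebI2"]
suspects = find_consistently_failing(servers, simulated_refresh_fails)
print("Servers to trace:", suspects)
```

Only the servers returned by this loop need tracing, which keeps the trace volume on a production system small.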
Summary
In closing, I think creating server groups is a good and easy way to start troubleshooting intermittent report issues on BI Platform 4.x. With these three steps, a report can be isolated quickly in a production environment and the issue can be localized.