IBM InfoSphere Master Data Management Collaborative Edition provides a highly scalable, enterprise Product Information Management (PIM) solution that creates a golden copy of products and becomes the trusted system of record for all product-related information.
Performance is critical for any successful MDM solution, which typically involves complex design and architecture. Performance issues impede the smooth functioning of an application and prevent the business from getting the best out of it. Periodically profiling the application and optimizing it based on the findings is vital to keeping it running smoothly.
This blog post describes how to optimize an IBM InfoSphere MDM Collaborative Edition application, based on the tacit knowledge acquired from implementations and upgrades carried out over the years.
Performance is paramount
Performance is one of the key factors that make an application reliable. The performance of MDM Collaborative Edition is influenced by factors such as solution design, implementation, infrastructure, data volume, DB configuration, WebSphere setup, and application version. These factors can affect the business positively or negatively. Moreover, even in a carefully designed and implemented MDM CE solution, performance issues creep in over time for a variety of reasons.
The following questions might help you to narrow down a performance problem to a specific component.
- What exactly is slow: only a specific component, or is there general slowness that affects all UI interactions or scheduled jobs?
- When did the problem manifest?
- Did performance degrade over time or was there an abrupt change in performance after a certain event?
Answers to these questions may not be a panacea, but they provide a good starting point for improving performance.
Hardware Sizing and Tuning
Infrastructure is the foundation on which the MDM CE application rests.
IBM recommends a hardware configuration for a standard MDM CE Production server. That recommendation is only a pointer in the right direction, and MDM CE Infrastructure Architects should take it with a pinch of salt.
Some of the common areas which could be investigated to tackle performance bottlenecks are:
- Ensure enough physical memory (RAM) is available so that little or no swapping and paging occurs.
- Check the latency and bandwidth between the application server boxes and the database server. This gains prominence if the data centers hosting them are far apart. Hosting the primary DB and app servers in the same data center could help here.
- Run MDM CE on a dedicated set of boxes so that all the hardware resources are available to it and isolating performance issues becomes a relatively simple process.
- Keep an eye on disk reads, writes and queues. Any of these rising to unusually high levels is a warning sign.
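As a rough illustration of the memory check, the following Python sketch estimates how much swap is in use on a Linux host by parsing text in the format of /proc/meminfo. The function name and the sample figures are made up for illustration, and existing monitoring tools may already report the same numbers.

```python
def swap_used_percent(meminfo_text):
    """Return the percentage of swap in use, given /proc/meminfo-style text."""
    values = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Key: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])  # values are reported in kB
    total = values.get("SwapTotal", 0)
    free = values.get("SwapFree", 0)
    if total == 0:
        return 0.0  # no swap configured
    return 100.0 * (total - free) / total


# Hypothetical sample; on a Linux host, pass open("/proc/meminfo").read() instead.
SAMPLE = """MemTotal:       16384000 kB
MemFree:         2048000 kB
SwapTotal:       4096000 kB
SwapFree:        3072000 kB
"""

print(f"Swap in use: {swap_used_percent(SAMPLE):.1f}%")
```

A value that keeps climbing while MDM CE jobs run suggests the box is short on physical memory for the configured services.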
Clustering and Load Balancing
Clustering and Load balancing are two prevalent techniques used by applications to provide “High Availability and Scalability”.
- Horizontal Clustering – Add more firepower to the MDM CE application by adding more application servers
- Vertical Clustering – Add more MDM CE services per app server box by taking advantage of MDM CE configuration, such as more Scheduler and AppServer services as necessary
- Adding a load balancer, such as a software or hardware IP sprayer or IBM HTTP Server, will greatly improve the business users’ experience with the MDM CE GUI application
Go for High Performance Network File System
Typically, clients go with NFS for MDM CE clustered environments because it is freely available. For a highly concurrent MDM CE environment, opt for a commercial-grade, purpose-built high-performance network file system such as IBM Spectrum Scale.
The performance and reliability of MDM CE are highly dependent on a well-managed database. Databases are highly configurable and can be monitored so that performance bottlenecks are resolved proactively.
The following are a few ways to tune database performance.
- Optimize database lock wait, buffer pool sizes, table space mappings and memory parameters to meet the system performance requirements
- Use the recommended configuration of a production-class DB server for the MDM CE application
- Keep the DB server and client at the latest compatible versions to take advantage of bug fixes and optimizations
- Ensure database statistics are up to date. Database statistics can be collected manually by running the shell script shipped with MDM CE at $TOP/src/db/schema/util/analyze_schema.sh
- Check memory allocation to make sure that there are no unnecessary disk reads
- Defragment on an as-needed basis
- Check for long-running queries, optimize query execution plans, and index candidate columns
- Run $TOP/bin/indexRegenerator.sh whenever the indexed attributes in the MDM CE data model are modified
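One common way to judge whether memory allocation is causing unnecessary disk reads is the buffer pool hit ratio: the share of page requests served from memory rather than from disk. The sketch below assumes you have already obtained logical and physical read counters from your database's monitoring facilities; the function name and the numbers are illustrative only.

```python
def buffer_pool_hit_ratio(logical_reads, physical_reads):
    """Percentage of page requests served from memory rather than disk."""
    if logical_reads <= 0:
        return None  # no activity yet, so the ratio is undefined
    return 100.0 * (logical_reads - physical_reads) / logical_reads


# Hypothetical counters taken from a database monitoring snapshot.
ratio = buffer_pool_hit_ratio(logical_reads=1_000_000, physical_reads=50_000)
print(f"Buffer pool hit ratio: {ratio:.1f}%")
```

A persistently low ratio is a hint that buffer pool sizes deserve a second look, although the right target depends on the workload.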
MDM CE Application Optimization
Performance of the MDM CE application can be improved in various components, such as the data model and server configuration. The best practices to follow on the application side are covered below.
Data Model and Initial Load
- Carefully choose the number of Specs. Discard attributes that will not be mastered or governed in MDM CE
- Similarly, a large number of views, Attribute Collections, Items and attributes slows down the user interface. Tabbed views come in handy here.
- Try to offload cleansing and standardization activities outside of the MDM solution
- A workflow with many steps can cause problems ranging from an unmanageable user interface to very slow operations for managing and maintaining the workflow, so it should be designed carefully.
MDM CE Services configuration
The MDM CE application comprises the following services, which are highly configurable to provide optimal performance – Admin, App Server, Event Processor, Queue Manager, Workflow Engine and Scheduler.
All of these services can be fine-tuned through the following configuration files found within the application.
- $TOP/bin/conf/ini – Allocate sufficient memory to the MDM CE Services here
- $TOP/etc/default/common.properties – Configure connection pool size and polling interval for individual services here
Document Store is a placeholder for unstructured data in MDM CE, such as logs, feed files and reports. Over time, the Document Store grows rapidly, and so does the number of obsolete files in it. The document store maintenance report can be used to check the document store size and purge documents that are no longer significant.
- Use the IBM® MDMPIM DocStore Volume Report and IBM MDMPIM DocStore Maintenance Report jobs to analyze the volume of the DocStore and to clean up documents that are older than the data retention period configured in the IBM_MDMPIM_DocStore_Maintenance_Lookup lookup table.
- Configure the IBM_MDMPIM_DocStore_Maintenance_Lookup lookup table to set the data retention period for individual directories and the action to be performed once that period has elapsed, such as Archive or Purge
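To make the retention idea concrete, here is a minimal Python sketch of a per-directory retention policy. It is not the maintenance report's implementation, and the rule structure, directory names and function name are hypothetical; it only mirrors the concept of a retention period plus an action per directory.

```python
from datetime import datetime, timedelta

# Hypothetical retention rules: directory prefix -> (retention in days, action).
# These mirror the idea of the lookup table; they are not its actual schema.
RETENTION_RULES = {
    "/logs/": (30, "PURGE"),
    "/feeds/": (90, "ARCHIVE"),
}


def plan_cleanup(documents, now):
    """Return (path, action) pairs for documents older than their retention period."""
    plan = []
    for path, last_modified in documents:
        for prefix, (days, action) in RETENTION_RULES.items():
            if path.startswith(prefix) and now - last_modified > timedelta(days=days):
                plan.append((path, action))
                break  # first matching rule wins
    return plan


now = datetime(2024, 6, 1)
docs = [
    ("/logs/import_job.log", datetime(2024, 1, 15)),   # older than 30 days
    ("/feeds/items.csv", datetime(2024, 5, 20)),       # still within 90 days
    ("/reports/summary.html", datetime(2023, 1, 1)),   # no rule, left alone
]
print(plan_cleanup(docs, now))
```

Directories without a rule are left untouched in this sketch, which is the conservative default when nothing has been agreed with the business about how long that data should be kept.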
Cleaning up Old Versions
MDM CE does versioning in two ways.
- Implicit versioning: this occurs when the current version of an object is modified during the export or import process.
- Explicit versioning: this kind of versioning occurs when you manually request a backup.
Older versions of items, performance profiles and job history need to be cleaned up periodically to reduce the load on the DB server and, in turn, improve application performance.
- Schedule the IBM MDMPIM Estimate Old Versions Report and IBM MDMPIM Delete Old Versions Report to estimate and clear out old entries, respectively
- Configure the IBM MDMPIM Data Maintenance Lookup lookup table to hold an appropriate data retention period for Old Versions, Performance Profiles and Job History
Best Practices in Application Development
MDM CE presents a couple of programming paradigms for application developers who are customizing the OOTB solution.
- Scripting API – A proprietary scripting language; at runtime the scripts are converted into Java classes and run in the JVM. Follow the documented best practices for better performance
- Java API – Prefer the Java API over the Scripting API for better performance. Again, ensure that the documented best practices are diligently followed
If the Java API is used for MDM CE application development or customization, then:
- Use code analysis tools like PMD, FindBugs or SonarQube as periodic checkpoints so that only optimized code is shipped
- Use profiling tools like JProfiler, XRebel, YourKit or VisualVM to monitor thread pool use, memory pool statistics, garbage collection frequency and so on. Using these tools during resource-intensive activities in MDM CE, like running heavyweight import or export jobs, not only sheds light on the inner workings of the JVM but also offers cues on candidates for optimization
Keeping frequently accessed objects in cache is a primary technique for improving performance. The cache hit percentage needs to be high for the application to function smoothly.
- Check the cache hit percentage for various objects in the GUI menu System Administrator->Performance Info->Caches
- The $TOP/etc/default/mdm-ehcache-config.xml and $TOP/etc/default/mdm-cache-config.properties files can be configured to hold a larger number of entries in cache for better performance
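The effect of cache size on hit percentage can be seen with a small simulation. The Python sketch below assumes a least-recently-used (LRU) eviction policy and a made-up access pattern; it does not model how MDM CE caches actually behave, and the function name is hypothetical.

```python
from collections import OrderedDict


def lru_hit_percentage(accesses, capacity):
    """Simulate an LRU cache of the given capacity and return the hit percentage."""
    cache = OrderedDict()
    hits = 0
    for key in accesses:
        if key in cache:
            hits += 1
            cache.move_to_end(key)  # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used entry
    return 100.0 * hits / len(accesses) if accesses else 0.0


# Hypothetical workload: 100 accesses cycling over 5 distinct objects.
accesses = [f"item-{i % 5}" for i in range(100)]
for capacity in (3, 5):
    print(f"capacity={capacity}: {lru_hit_percentage(accesses, capacity):.1f}% hits")
```

With this cyclic workload, a cache slightly smaller than the working set never hits, while one that fits the working set hits on everything after the first pass. Real workloads are less extreme, but the same reasoning explains why an undersized cache can show a poor hit percentage.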
Successful performance testing will expose most performance issues, which could be related to the database, network, software, hardware and so on. Establish a baseline, identify targets, and analyze use cases to make sure that the application's performance holds up over the long run.
- Identify areas of the solution that extend beyond the normal range, for example a large number of items, many searchable attributes, or a large number of lookup tables.
- Frameworks such as JUnit and JMeter can be used in an MDM CE engagement where the Java API is the programming interface of choice
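Once a baseline exists, comparing new measurements against it can be automated. The following Python sketch flags use cases whose measured time exceeds the baseline by more than a chosen tolerance. The use case names, timings and the 20% tolerance are assumptions for illustration, not recommended targets.

```python
def find_regressions(baseline, measured, tolerance=0.20):
    """Return use cases whose measured time exceeds the baseline by more than tolerance."""
    regressions = {}
    for use_case, base_seconds in baseline.items():
        current = measured.get(use_case)
        if current is None:
            continue  # not measured in this run
        if current > base_seconds * (1 + tolerance):
            regressions[use_case] = (base_seconds, current)
    return regressions


# Hypothetical timings in seconds.
baseline = {"item_search": 2.0, "item_save": 1.5, "import_10k_items": 600.0}
measured = {"item_search": 2.1, "item_save": 2.4, "import_10k_items": 650.0}
print(find_regressions(baseline, measured))
```

In this sample only item_save is flagged, since the other two use cases stay within 20% of their baseline.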
About the author
Sruthi is an MDM Consultant at Mastech InfoTrellis and has worked on multiple IBM MDM CE engagements. She has over 2 years of experience in technologies such as IBM Master Data Management Collaborative Edition and BPM.
Selvagowtham is an MDM Consultant at Mastech InfoTrellis and has been plying his trade in Master Data Management for over 2 years. He is a proficient consultant in the IBM Master Data Management Collaborative Edition and Advanced Edition products.