This blog touches upon the basics of Informatica MDM Fuzzy Matching.
Informatica MDM – SDP approach
An organization installs a master data management (MDM) system so that its core data is secure, is accessible to multiple systems as and when required, and does not exist as multiple copies floating around, which gives the organization a single source of truth. A solid Suspect Duplicate Processing capability is required to achieve a 360-degree view of an entity.
The concept of Suspect Duplicate Processing represents the broad category of activities related to identifying entities that are likely duplicates of each other. Suspect duplicate processing is the process of searching for, matching, creating associations between and, when appropriate, merging data for existing duplicate party records in the system.
To achieve this functionality, Informatica MDM provides its own Suspect Duplicate Processing (SDP) approach. Depending on its use case, an organization can opt for either of the following two approaches:
- Deterministic Matching Approach
- Fuzzy Matching Approach
Deterministic Matching Approach
Deterministic Matching uses a series of rules, like nested if statements, to run logical tests on the data sets. This is how we determine relationships, hierarchies, and households within a dataset. Deterministic matching seeks a clear “Yes” or “No” result on each and every attribute, based on which we decide whether two records:
- are duplicates,
- should be resolved by a data steward, or
- are two unique entities.
It leaves no room for error in the data and gives the right result only in an ideal scenario. But most of the data in organizations is far from ideal. These are the cases where the Fuzzy Matching Approach of Informatica comes in handy.
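The attribute-by-attribute "Yes"/"No" testing described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Informatica MDM configuration: the `Party` fields, the rule order, and the outcome labels are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Party:
    # Hypothetical attributes, chosen only for illustration.
    first_name: str
    last_name: str
    birth_date: str
    country_id: str


def deterministic_match(a: Party, b: Party) -> str:
    """Classify a pair using exact yes/no tests, written as nested ifs."""
    if a.country_id != b.country_id:
        return "unique"
    if a.last_name == b.last_name and a.birth_date == b.birth_date:
        if a.first_name == b.first_name:
            return "duplicate"
        # Same surname and birth date but a different first name:
        # leave the decision to a data steward.
        return "steward_review"
    return "unique"


clean = Party("Maria", "Lopez", "1980-04-12", "US")
same = Party("Maria", "Lopez", "1980-04-12", "US")
typo = Party("Maria", "Lopes", "1980-04-12", "US")  # one-letter typo in the surname

print(deterministic_match(clean, same))  # duplicate
print(deterministic_match(clean, typo))  # unique
```

The second call shows the weakness discussed above: a single mistyped letter is enough for an exact comparison to treat the same person as two unique entities.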
Fuzzy Matching Approach
A fuzzy matching approach is required to improve the quality of results when we are dealing with less-than-perfect data. Fuzzy Matching measures the statistical likelihood that two records are the same. By rating the “matchiness” of the two records, the fuzzy method is able to find non-obvious correlations in the data and state how close the two records are to each other.
Informatica MDM fuzzy matching offers the above in an easy-to-configure, flexible, repeatable and probabilistic manner. It gives us the flexibility to define which attributes must be matched deterministically (such as Country IDs) and which using fuzzy logic (such as Names).
Fuzzy matching in Informatica works on different aspects of the data. The algorithm can be configured depending on whether we want to match an individual or a household, a contact person or an organization, and so on. This helps us handle different scenarios in the data. Also, based on our understanding of the data, we can choose the strictness of the algorithm, not only in terms of matching but in terms of searching as well.
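To make the split between deterministic and fuzzy attributes concrete, here is a minimal sketch in Python. It is not Informatica's matching algorithm: the standard library's `difflib.SequenceMatcher` is used as a stand-in similarity measure, and the record layout (`country_id`, `name`) and function names are assumptions made for the example.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Return a similarity ratio between 0.0 and 1.0 for two names."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def match_score(rec_a: dict, rec_b: dict) -> float:
    """Score a pair: exact test on country_id, fuzzy test on name."""
    # Deterministic attribute: any mismatch rules the pair out.
    if rec_a["country_id"] != rec_b["country_id"]:
        return 0.0
    # Fuzzy attribute: rate how close the two names are.
    return name_similarity(rec_a["name"], rec_b["name"])


a = {"name": "Jon Smith", "country_id": "US"}
b = {"name": "John Smith", "country_id": "US"}
c = {"name": "John Smith", "country_id": "CA"}

print(round(match_score(a, b), 2))  # high score despite the spelling difference
print(match_score(a, c))            # 0.0, because the country IDs differ
```

The score for the first pair stays high even though the names are not identical, while the exact comparison on the country ID still acts as a hard gate.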
The main strength of Informatica MDM fuzzy matching is that it is a rule-based matching system: no match is returned until the match criteria are met, which makes it a business-user-friendly matching system.
The match criteria fall into two categories:
- Automatic Merge and
- Manual Merge.
Automatic Merge is a scenario where the system itself determines that the two entities in question are duplicates, whereas Manual Merge is a scenario where we need a Data Steward to decide whether the two parties in question are duplicates or not. The rule (Automatic or Manual) that a suspect pair satisfies decides the fate of the pair: either the records merge automatically or a task is created for a Data Steward. If the suspect pair satisfies none of the defined rules, the two records are treated as two unique parties/entities.
The rule-based approach of fuzzy logic makes it easy for Business Users and Data Stewards to identify which record patterns constitute a duplicate pair. This makes it a hit with Business Users and, by contributing to a successful MDM implementation, with the program sponsors as well.
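The routing of a suspect pair described above can be sketched as a small rule table in Python. This is a simplified illustration rather than Informatica's match rule configuration: the rule names, the thresholds, and the per-attribute score dictionary are all hypothetical.

```python
from typing import Callable, NamedTuple, Optional, Tuple


class MatchRule(NamedTuple):
    name: str
    outcome: str  # "auto_merge" or "manual_merge"
    test: Callable[[dict], bool]


# Hypothetical rules over per-attribute similarity scores in the range 0.0 to 1.0.
# Stricter rules come first, so the strongest evidence wins.
RULES = [
    MatchRule(
        "strong name, address and birth date",
        "auto_merge",
        lambda s: s["name"] >= 0.95 and s["address"] >= 0.90 and s["dob"] == 1.0,
    ),
    MatchRule(
        "close name and same birth date",
        "manual_merge",
        lambda s: s["name"] >= 0.85 and s["dob"] == 1.0,
    ),
]


def route_pair(scores: dict, rules=RULES) -> Tuple[str, Optional[str]]:
    """Return (outcome, rule name) for the first rule the pair satisfies."""
    for rule in rules:
        if rule.test(scores):
            return rule.outcome, rule.name
    # No rule satisfied: the records stay two unique parties.
    return "unique", None


print(route_pair({"name": 0.97, "address": 0.93, "dob": 1.0}))
print(route_pair({"name": 0.88, "address": 0.40, "dob": 1.0}))
print(route_pair({"name": 0.60, "address": 0.95, "dob": 0.0}))
```

In this sketch a "manual_merge" outcome stands for creating a task for a Data Steward, and the returned rule name shows which record pattern triggered the decision.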
About the Author
Ripudaman Singh Dhaliwal, Manager at Mastech InfoTrellis, has considerable experience in Probabilistic (Fuzzy) Matching Algorithms.