Remove duplicate Stage in Datastage : Quick Example


The Remove Duplicates stage in DataStage drops duplicate records, meaning records that share the same values in the key columns, from its input data. It has one input link and one output link, and it lets you choose whether to retain the first or the last record of each group of duplicates.

Mandatory: the input data must be sorted on the key columns before it reaches this stage; otherwise the results will not be accurate.
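To see why sorting matters, here is a minimal Python sketch that assumes the de-duplication works in a single pass, comparing each record only with the one immediately before it. This is a simplified model for illustration, not DataStage's actual implementation; `drop_adjacent_duplicates` and the sample rows are hypothetical.

```python
def drop_adjacent_duplicates(rows, key):
    """Keep the first row of each run of adjacent rows sharing the same key.

    Hypothetical model: each row is compared only with its predecessor,
    so duplicates are caught only when they sit next to each other.
    """
    result = []
    previous_key = object()  # sentinel that never equals a real key
    for row in rows:
        k = key(row)
        if k != previous_key:
            result.append(row)
            previous_key = k
    return result


rows = [(3, "KRISHNA"), (1, "RAMESH"), (3, "KRISHNA")]
emp_id = lambda row: row[0]

# Unsorted input: the two EMP_ID 3 rows are not adjacent, so both survive.
print(drop_adjacent_duplicates(rows, emp_id))
# [(3, 'KRISHNA'), (1, 'RAMESH'), (3, 'KRISHNA')]

# Sorted on the key first: the duplicate is removed.
print(drop_adjacent_duplicates(sorted(rows, key=emp_id), emp_id))
# [(1, 'RAMESH'), (3, 'KRISHNA')]
```

In a DataStage job, the sorting step is typically a Sort stage (or a sort on the input link) placed in front of Remove Duplicates.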

Removing duplicates is a common step in the data cleansing process, done before data loading starts.

Properties

Category                      Property              Values
Keys that Define Duplicates   Key                   Input column
Keys that Define Duplicates   Sort as EBCDIC        True/False
Keys that Define Duplicates   Case Sensitive        True/False
Options                       Duplicate to Retain   First/Last

Example Operation:

Consider the following records in the input data set, where EMP_ID 3 appears twice.

Input file:

EMP_ID NAME DEPT
1 RAMESH IT
2 JOHN HR
3 KRIHSNA IT
3 KRISHNA IT
4 SAM CA

Output file:

EMP_ID NAME DEPT
1 RAMESH IT
2 JOHN HR
3 KRISHNA IT
4 SAM CA
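The article does not state which key or Duplicate to Retain setting produced this output. One configuration consistent with the two tables is Key = EMP_ID with Duplicate to Retain = Last: the two EMP_ID 3 rows differ in the NAME spelling, and the output keeps the second one. The Python sketch below reproduces the example under that assumption, with a sort step standing in for the Sort stage.

```python
from itertools import groupby
from operator import itemgetter

# Input records from the example above as (EMP_ID, NAME, DEPT) tuples.
input_rows = [
    (1, "RAMESH", "IT"),
    (2, "JOHN", "HR"),
    (3, "KRIHSNA", "IT"),
    (3, "KRISHNA", "IT"),
    (4, "SAM", "CA"),
]

emp_id = itemgetter(0)

# Step 1 (Sort stage): sort on the key. sorted() is stable, so the two
# EMP_ID 3 rows keep their original relative order.
sorted_rows = sorted(input_rows, key=emp_id)

# Step 2 (Remove Duplicates, assumed Key = EMP_ID, Duplicate to Retain = Last):
# keep the last row of each run of adjacent rows with the same EMP_ID.
output_rows = [list(group)[-1] for _, group in groupby(sorted_rows, key=emp_id)]

for row in output_rows:
    print(*row)
```

Switching the last step to `list(group)[0]` (Duplicate to Retain = First) would keep the "KRIHSNA" row instead.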

Working Example

[Figure: Remove duplicate stage: input dataset]
[Figure: Sort stage properties tab]
[Figure: Remove duplicate stage: properties tab]
[Figure: Remove duplicate stage: output dataset]