Compare two data sets and updated the table sas

I am trying to create a report by comparing 2 sas datasets with SAME variable name but different values due to different extraction dates

----Goal: to highlight differences : New, Deleted, Updated and find out which record is different.

---The challenge is :

1.There are lots of variables about 20-30 in each ds to be compared ;

2. Variables are mixed with numerical var. and character including dates.

3. Need to highlight different colors in excel file per the difference

4. used many arrays and %let variables to identify old, new and flags

---Steps I used but very prone to mistake:

*1.Rename old datasets variables for comparing with new datasets by suffix old_ for comparation purpose; EG. TEST TO OLD_TEST
*2.Convert all selected Numeric variables to Character variables by adding suffix c_ . So I may use one set of array for all same type of variables.

EG. DATE TO C_DATE
*3. Drop original numerical variables and keep all c_: .

Otherwise the values will not be able to overwrite. DROP DATE;

*4. Rename again variables with suffix c_: Back to old & new data original names for merge and compare in data step;

EG: RENAME C_DATE =DATE

Does any one have any code sample which might use proc content level to work on this ?

4 REPLIES 4 Super User Re: How to find the updates/changes between 2 datasets with different extraction date Posted 01-26-2022 05:32 PM (1939 views) | In reply to purpleclothlady

You are going to have a hard time dealing with two data sets with the same name. Unless dealing with generation sets you only have one SAS data set of a given name in a library.

If you mean to different data sets then sort them using identification variables other than those you know to change so the order should be identical.

Use Proc compare. It will compare every value of all the variables and report on differences. There are multiple choices for output data sets and contents. I am not quite sure what you want.

If one data set may have a different number of records us a BY statement with the By variables used for sorting.

I suggest the sort and run Proc Compare with the defaults to see the output. The basic syntax is extremely simple:

Proc Compare base=thisdatasetname compare=thatdatasetname; run;

If variables are of different types with the same name you won't get a value comparison. There is a header section that describes differences in the data sets, like/unlike named variables, number of observations, variables with same name but different type, different properties such as Label or Format for the same name variables.

"Different colors in Excel" is a report function and will depend on what your rules are.