SN Pro Tips

View Original

Detecting Duplicate Records with GlideAggregate

Let's say you're working for a company who has just acquired another company, and you've just imported all of the users from the acquired company's User (sys_user) table.
Unfortunately, you realize just a second too late, that their user database contains some of the same people as are already in your database - for example, some executives had accounts in both services.
Or maybe you were previously running the instance without enforcing number uniqueness on the incident table for some reason, and you want to change that.

The first step to resolving these issues, is detecting duplicate records, based on a given field. Here's a simple script include that will do that for you:

See this content in the original post

Make sure that the "client callable" box is checked on your script include, and that the 'name' exactly matches the name of the function ('getDupes').
Now, you can call this script from any query builder like so:

In the 'field' selector for our query, we select the 'name' field. Then we select 'is one of...' because we're going to pass in an array of values. Finally, in the input box, we enter: 

See this content in the original post

The "javascript:" in the beginning, tells the query builder that we're going to execute some javascript inside the query, before it is evaluated. This means that it's not the javascript itself that'll be used to build the filter, but whatever the javascript returns after it has executed.

Since we wrote the getDupes function so that it would take two inputs, we need to provide them.
The first input is the table. Since we want results relevant to this table, we should enter the name of the table we're currently building our filter/query against. In this case, it's "sys_user".
The second parameter the getDupes function accepts, is the name of the field that we're going to check for duplicates. In this case, that field name is... 'name'!
Note that the field name refers to the name in the database, not the friendly name that shows up on the form!
Finally, it's a good idea to add an "Order by", to sort the results by the field we're checking for duplicates, as this will make it easier to read the results if we have multiple duplicates.

There you have it! We've found at least two users who have the same name.

Our function will return an array of values (in this case, email addresses) that correspond to duplicate records.
As you can imagine, a simple function that returns duplicate values might have many potential uses; not the least of which, might be remediation of duplicate values.
For example, rather than pushing each email address to an array and returning it, you might call another function that takes the first and last name as arguments, and builds an email address like firstname_lastname@company.com -- then checks if any user has that email address. If so, it could add a '1', like firstname_lastname1@company.com, and check again -- and so on, until it finds an available email. It could then return that email, and set the correct value for the user record.