How can I delete a duplicate entry automatically?

Hey everyone, nice to be a part of the fib fam :slight_smile:

So, we import lots of data through CSV, and sometimes there might be duplicates from different files that get imported into the same type.

Is there a way that I can set up an automation which checks for duplicate name fields and then deletes one of them?

Thanks in advance

Hey there, and glad to have you in the fib fam :wink:

When you have duplicates, does it matter which one gets deleted?
Is it enough to assume that they are duplicates if the name is the same?
Does the name have to match exactly, or should case be ignored (e.g. name = Name = NAME)?

For this specific issue:

  1. It doesn’t matter which one gets deleted.
  2. Yes, let’s just say the names can be used to identify the duplicates.
  3. The name has to match exactly.

I think the best thing would be if, while importing, Fibery told me which entries are duplicates, so I could review them and choose the best course of action. If that’s not possible, then setting up an automation that deletes these duplicates might be best.

Thanks!

It’s actually quite difficult to do automatically, but there is a workaround for highlighting duplicates:

Create a many-to-many auto-relation from the type to itself, where the match criterion is Name = Name,
e.g. for the Task type:

Then create a formula that counts the number of tasks (or whatever) with the same name:

[image: formula field definition]
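For anyone who wants the logic spelled out, here is a rough Python sketch of what the auto-relation plus the count formula work out. This is not Fibery formula syntax. The function name and the sample task names are made up for illustration, and I'm assuming the count includes the entity itself, so anything above 1 means a duplicate exists.

```python
from collections import Counter


def same_name_counts(names):
    """For each name, in order, return how many entries share that exact name."""
    totals = Counter(names)  # exact, case-sensitive matching
    return [totals[name] for name in names]


# Hypothetical task names; "design" differs from "Design" under exact matching.
tasks = ["Design", "Build", "Design", "design"]
counts = same_name_counts(tasks)

# Assumption: the count includes the entry itself, so > 1 marks a duplicate.
duplicates = [name for name, n in zip(tasks, counts) if n > 1]
```

Because matching is exact, only the two "Design" entries end up in `duplicates`.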

When looking at a view of the entities, adding a filter will make it easy to see the duplicates:

I think adding a duplicate check on CSV import would be a nicer solution, but until that’s available, this was the best I could come up with :thinking:
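In the meantime, one option is to check the CSV files yourself before importing them. Below is a minimal Python sketch of that idea, not a Fibery feature. It assumes every file has a header row with a column called `Name` (adjust `name_column` if yours differs), that names must match exactly, and that no field spans multiple lines, so row numbers line up with line numbers. The function name is made up.

```python
import csv
from collections import defaultdict


def find_duplicate_names(csv_paths, name_column="Name"):
    """Map each name that occurs more than once to its (file, row number) locations."""
    seen = defaultdict(list)
    for path in csv_paths:
        with open(path, newline="", encoding="utf-8") as handle:
            # Row 1 is the header, so data rows start at 2.
            for row_number, row in enumerate(csv.DictReader(handle), start=2):
                name = row.get(name_column)
                if name:  # skip rows with a missing or empty name
                    seen[name].append((path, row_number))
    # Keep only names that appear in more than one place.
    return {name: places for name, places in seen.items() if len(places) > 1}
```

Calling `find_duplicate_names(["first.csv", "second.csv"])` (file names are placeholders) gives you a dictionary you can review before deciding which rows to drop from the import.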

That’s some great logic work there, @Chr1sG :slight_smile: Thanks for this, I think it could work for the time being :slight_smile:

There’s still one problem, though: that filter is going to list all the duplicates, including the one original copy that I would like to keep. So I guess I would still have to go through manually and delete them carefully. If there was a way to keep one of them and show only the extra copies in a view, it would be much easier to batch select and delete them.

Well, here’s another neat little trick that will allow you to select only one of the duplicate items:

Create a formula field as follows:
[image: formula field definition]

This will be true for all but the oldest of the duplicates:

You can now filter on this flag and all but one of the entities in each group of matches will be shown:

These can now be safely deleted :slight_smile:
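If it helps to see the idea outside of Fibery, here is a small Python sketch of the same "flag everything except the oldest" logic. It is not the formula from the screenshot. The field names `id`, `name` and `created` are hypothetical, and I'm assuming "oldest" means the earliest creation date.

```python
from datetime import datetime


def flag_newer_duplicates(entities):
    """Return the ids of every entity except the oldest one in each same-name group."""
    oldest = {}
    for entity in entities:
        current = oldest.get(entity["name"])
        # Strict "<" means that on a tie the first entity seen is kept as the oldest.
        if current is None or entity["created"] < current["created"]:
            oldest[entity["name"]] = entity
    return [e["id"] for e in entities if oldest[e["name"]] is not e]


# Hypothetical data: three "Design" entities and one unique "Build".
entities = [
    {"id": 1, "name": "Design", "created": datetime(2021, 1, 5)},
    {"id": 2, "name": "Build", "created": datetime(2021, 1, 6)},
    {"id": 3, "name": "Design", "created": datetime(2021, 1, 2)},
    {"id": 4, "name": "Design", "created": datetime(2021, 1, 9)},
]
to_delete = flag_newer_duplicates(entities)
```

Here `to_delete` holds the two newer "Design" entities (ids 1 and 4), while the oldest copy (id 3) and the unique "Build" entity are left alone.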

Related: Check this post for (among other things) a script that does entity de-duplication using the GraphQL API:
