To normalize or denormalize? That's the question

I have a database design question regarding Fibery. The answer likely depends on Fibery’s underlying technical architecture. Here’s my use case:

I import datasets into Fibery from Google Contacts, LinkedIn, Substack, Discourse and various other sources that contain contact data. For each of these datasets, the email address is the unique key. What is the optimal way to combine these sources in Fibery?

Option 1: (This is what I do right now.)

Have one large Email database that combines everything from all sources. For example, all data imported from Google Contacts sits in fields whose names start with GC (to keep them visually grouped). All data imported from Substack sits in fields beginning with SS, and so on.

The benefit of this approach is that there is only one unique record per email address and the data from various sources sits conveniently together. It’s very easy to make overviews and formulas across multiple sources. The downside is that in practice many fields remain empty because the set of contacts in LinkedIn is not the same as the set of contacts in Substack, etc.
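To make the shape concrete, here is what option 1 looks like sketched in plain Python (purely an illustration; the field names are invented, and Fibery fields obviously aren't code). For any given contact, most of the prefixed fields stay empty:

```python
from dataclasses import dataclass
from typing import Optional

# Option 1: one wide "Email" database. Every source gets its own block of
# prefixed fields, so for any given contact most of them remain empty.
@dataclass
class EmailWide:
    email: str                            # unique key shared by all sources
    gc_name: Optional[str] = None         # GC_*: imported from Google Contacts
    gc_phone: Optional[str] = None
    ss_subscribed: Optional[bool] = None  # SS_*: imported from Substack
    li_company: Optional[str] = None      # LI_*: imported from LinkedIn
    # ... one block of prefixed fields per source
```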

Option 2: (I am considering this)

Have one long but thin Email database and store the imported data in separate databases: one for LinkedIn, one for Google Contacts, one for Substack, etc. Create one-to-one relationships between those databases and the central Email database.

The benefit here is that there is no wasted space: no fields sit empty for data that doesn’t exist. The downside is that overviews and formulas need to do more work: they have to traverse the various relationships to combine the data.
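The same kind of sketch for option 2 (again with invented field names, just to show the shape): the central Email database stays thin, and each source database links back to it one-to-one via the email address:

```python
from dataclasses import dataclass
from typing import Optional

# Option 2: a thin central Email database plus one small database per source,
# linked one-to-one through the email address.
@dataclass
class Email:
    email: str                  # unique key; the only field every record shares

@dataclass
class GoogleContact:
    email: str                  # one-to-one link back to Email
    name: str
    phone: Optional[str] = None

@dataclass
class SubstackSubscriber:
    email: str                  # one-to-one link back to Email
    subscribed: bool

# An overview across sources now traverses the relation
# (Email -> GoogleContact.name) instead of reading a column directly.
```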

Question: How do I decide on the best approach? What does it depend on? I now have 60,000 Email records (using option 1), which might grow to a few hundred thousand over the next few years. Should I consider refactoring to option 2?

Is the number of records in the imported database relevant?

Is the number of formulas and combined overviews relevant?

Is the percentage of overlap between the imported databases relevant?

I’m sure all of this depends on the underlying architecture of Fibery, and I don’t know much about that. If I’m going to switch from design option 1 to option 2, I want to do it now; I don’t want to have to switch back later.

Any advice appreciated.

I personally like to keep things normalised. The nice thing with Fibery as opposed to something like Notion is that you can display multiple databases in the same view, show fields in the same column, and sort/filter by the same field as long as you keep naming consistent.

I have a saying “Same fields, same database. Different fields, different database”.

From a data-modelling standpoint I think it’s superior, but it does also depend on the UX. Sometimes splitting into different databases can cause a worse end-user experience, depending on how you set it up. But it might also be a better UX!

They use Postgres tables: each Fibery database is a separate Postgres table, I’m pretty sure. I’m not sure how the formula engine works.

I think the approach that would scale best (though I’m not sure how you’d import into it properly…) is to have a central database with all the shared fields, and have each extra database contain only the EXTRA fields from that particular import. You could honestly make an “import” database to dump the data into, and have automations map it to the right place and then delete the imported entity. But this will cost lots of automation runs…
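For what it’s worth, here is a rough sketch of the merge such an automation would have to do, in plain Python rather than Fibery’s automation scripting (all names here are made up; it only shows the split between shared and source-specific fields, keyed on email):

```python
# Sketch of the merge an import automation would perform: shared fields go to
# the central database, source-specific extras go to that source's database,
# both keyed by email. Plain dicts stand in for Fibery entities.
central: dict[str, dict] = {}         # email -> shared fields
substack_extra: dict[str, dict] = {}  # email -> Substack-only fields

SHARED_FIELDS = {"email", "name"}

def merge_import(rows: list[dict]) -> None:
    for row in rows:
        key = row["email"].lower()
        # Upsert the shared part into the central Email database.
        shared = {f: v for f, v in row.items() if f in SHARED_FIELDS}
        central.setdefault(key, {}).update(shared)
        # Everything else belongs in the source-specific database.
        extra = {f: v for f, v in row.items() if f not in SHARED_FIELDS}
        if extra:
            substack_extra.setdefault(key, {}).update(extra)

merge_import([{"email": "a@example.com", "name": "Ada", "subscribed": True}])
```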

You don’t want 5 different databases that all have “email” and “name” fields. Same fields, same database. I didn’t fully understand whether that was your plan when separating them or not.

If you’re not facing any issues, though, you should honestly be fine with option 1. It’s not the perfect way, but if it works for you, it’s simpler to maintain and understand, and it doesn’t cost many automation runs.

Fibery will store the data and run formulas on it without problems (from what I’ve encountered with formulas on 60,000 records). If you want to display all the data, the table view will be slow to load, but once loaded it will be smooth. Other views are not as good for lots of data. Reports max out at 200,000 entities; if you need more and are going to aggregate, do it using formulas and entities, then bring the aggregated entities into the report. Is there anywhere else you fear limitations?

I hope this helps! TL;DR: technically option 1 isn’t as clean or proper, but you’ll probably be okay.

Curious what others think as well! I haven’t worked with long wide databases, just long narrow databases.
