The process of field mapping includes a check that the data in the CSV file is compatible with the chosen type.
So if, for example, a column that is to be mapped to a number field contains text, this will be flagged.
I think that’s why the whole dataset is loaded into memory.
Of course, if the incompatible data is beyond the 100th row, the mismatch won't be visible anyway. It therefore seems reasonable, I suppose, to only load the first 100 rows.
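For what it's worth, here's a rough sketch of what sampling-based validation could look like (TypeScript; all names are my own invention, not Fibery's actual code):

```typescript
// Hypothetical sketch: validate only the first N rows of a parsed CSV
// against the user's chosen field types, instead of loading everything.
type FieldType = "text" | "number" | "date";

const SAMPLE_SIZE = 100;

function isCompatible(value: string, type: FieldType): boolean {
  switch (type) {
    case "number":
      return value.trim() !== "" && !Number.isNaN(Number(value));
    case "date":
      return !Number.isNaN(Date.parse(value));
    case "text":
      return true; // anything is valid text
  }
}

// Returns the column indexes whose sampled values clash with the mapping.
function findMismatches(rows: string[][], mapping: FieldType[]): number[] {
  const sample = rows.slice(0, SAMPLE_SIZE);
  const bad = new Set<number>();
  for (const row of sample) {
    mapping.forEach((type, col) => {
      if (row[col] !== undefined && !isCompatible(row[col], type)) {
        bad.add(col);
      }
    });
  }
  return [...bad];
}
```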
Having said that, I don't know that this would solve all issues with importing large files; there may still be problems with the browser freezing during the actual import.
In my very humble non-developer opinion, this is something that 100% should be done server-side. Obviously part of it ultimately is. What I mean is: I would select a CSV file and click Import; the browser sends just the file to the server as efficiently as possible; the server munches on it and sends back to my UI only the info pertinent to me (as you note, field mapping issues, etc.); I select options from a very smooth, responsive UI; and when I click Finalize Import (or whatever), my choices go back to the server, which has my CSV ready to parse at its leisure, as server resources allow.

But critically, none of this should have any real effect on me in my browser for larger data sets. If Fibery wants to replace a multitude of other tools, it will need more robust data ingest that doesn't rely so heavily on the user's local hardware and browser resources.
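To make that flow concrete, here's a rough sketch of the kind of two-phase API I'm imagining; every endpoint and field name here is hypothetical, not Fibery's actual API:

```typescript
// Hypothetical two-phase import: upload once, configure from a
// lightweight summary, then let the server do the heavy parsing.

// Phase 1: upload the raw file; the server responds with a summary
// (detected columns, suggested types, sample mismatches) and an id.
async function uploadCsv(file: File): Promise<{ importId: string; summary: unknown }> {
  const body = new FormData();
  body.append("file", file);
  const res = await fetch("/api/imports", { method: "POST", body });
  return res.json();
}

// Phase 2: send back only the user's mapping choices; the server queues
// a background job and the client just polls for progress.
async function finalizeImport(importId: string, mapping: unknown): Promise<void> {
  await fetch(`/api/imports/${importId}/finalize`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(mapping),
  });
  let done = false;
  while (!done) {
    const res = await fetch(`/api/imports/${importId}/status`);
    const { state, progress } = await res.json();
    console.log(`Import ${state}: ${progress}%`);
    done = state === "completed" || state === "failed";
    if (!done) await new Promise((r) => setTimeout(r, 2000));
  }
}
```

The key point is that the import's state lives on the server, so a crashed or closed tab costs nothing: you re-open, re-poll, and pick up where you left off.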
I know that, to date, Fibery has tried to do a lot of work on the user's side. It seems to have mostly worked, but we're starting to see the performance issues that can result from that approach.
Yeah, this was actually my first time trying the CSV importer out. Overall, it is super easy to use and gives helpful feedback during setup, but once you click "Import" on a large file you enter an unknown state where you don't know what is going on. Because it runs client-side, you also don't know the state of the import if your browser crashes in the middle of it.
Even so, I don't see why the client-side import has such an issue with this file. 60k rows is not crazy. I have 32 GB of RAM on a 2019 MacBook Pro, so the device is very capable; 13 MB of CSV data should be iterable without freezing the client. I do agree, though, that ideally this would be handed off to some kind of background task whose progress you can monitor while still using Fibery in other areas.
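Even staying client-side, a file this size can be processed without locking up the tab by streaming it and yielding to the event loop between batches. A rough sketch using only standard Web APIs (batch size picked arbitrarily, and note the naive line splitting, which a real CSV parser would replace to handle quoted newlines):

```typescript
// Hypothetical sketch: stream a CSV file in chunks and yield to the
// event loop every BATCH rows so the tab stays responsive.
const BATCH = 1000;

async function processCsv(file: File, onRow: (line: string) => void): Promise<void> {
  const reader = file.stream().pipeThrough(new TextDecoderStream()).getReader();
  let buffer = "";
  let rowsInBatch = 0;
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    buffer += value;
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? ""; // keep the trailing partial line for the next chunk
    for (const line of lines) {
      onRow(line);
      if (++rowsInBatch >= BATCH) {
        rowsInBatch = 0;
        await new Promise((r) => setTimeout(r, 0)); // let the UI breathe
      }
    }
  }
  if (buffer) onRow(buffer); // last line without a trailing newline
}
```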
Yeah, I try to give singular names in general, but forgot here. I did try this approach with our category tree, which I was trying to map to the Amazon category tree. So this is a similar import, but we only have ~450 categories. It takes the same approach, though: each row has a reference to its parent. I have the database named DN Cat as a shortened version of <our company name> Category. With the full-length word Category in there, the text truncates for me, so you can't see the endings of the words.
I think even with the improved naming, there is room for confusion. This is only tangentially related to performance, but as the number of items gets large, the uncertain aspects of the UX tend to make things worse.
Where things are still confusing is the auto-relationship portion. Because the second line of the relationship setup has the same naming, "DN Cat to DN Cat" (with an arrow pointing up, meaning the parent cat), I assumed it was referring to the same order as defined above (red arrows below): DN Cat (child) to DN Cat (parent), meaning the relationship would be defined as DN Cat (child).parentcategoryid = DN Cat (parent).categoryid.
However, the configuration pictured above is not the correct order. I had to reverse the auto-relationship, with categoryid on the left side and parentcategoryid on the right side, to get the relationship set up correctly.
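If it helps anyone else, here's the join the working configuration effectively expresses, shown with toy data shaped like my columns (TypeScript, purely illustrative):

```typescript
// Toy rows shaped like my CSV: each row points at its parent's id.
const rows = [
  { categoryid: 1, parentcategoryid: null as number | null }, // root
  { categoryid: 2, parentcategoryid: 1 },
  { categoryid: 3, parentcategoryid: 2 },
];

// The auto-relation that actually works is, in effect, this join:
//   parent.categoryid === child.parentcategoryid
// i.e. categoryid on the left (parent side), parentcategoryid on the right.
const childrenOf = (parentId: number) =>
  rows.filter((r) => r.parentcategoryid === parentId);

console.log(childrenOf(1)); // -> [{ categoryid: 2, parentcategoryid: 1 }]
```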
Anyway, yeah, I think some of these issues could be split off into their own topics, but I wanted to give a fairly complete account of the friction points I came across while importing this particular data set, which should make a good hierarchical test case.