Working with a DB of ~160k records, I consistently run into problems like:
Views will not load; "Canceling statement due to statement timeout"
Cannot perform even fairly simple graphQL queries with a limit of 20-100 records; "The upstream server is timing out"
Loading a DB setup page or a Rule page can take minutes, causing the browser to repeatedly prompt that the "page is unresponsive" before eventually loading… maybe.
The UI for all Fibery pages grinds to a halt and uses up all memory
I am faced with the simple task of deleting a few thousand records, but can find no way to easily accomplish this, because everything I try simply times out or gets canceled due to the above issues.
Additionally, when these errors occur, there is no feedback about whether the aborted operation was completely or partially executed, or rolled back, or…
E.g. the graphQL API call below might succeed if the limit is <80, but otherwise will consistently return "upstream server is timing out":
Deleting a few thousand records defined by such a simple filter should be easy. But it is impossible without resorting to complex API-call loops.
The obvious thing would be to use a Rule to delete old records, but this does not work because:
Rules also suffer from the same timeout issues.
There is no way to use LIMIT with Rules (unless in a script)
Rules run at most once per hour - not enough if each API call is limited to deleting 50 records and there are tens of thousands to delete.
I have resorted to looping a local script to repeatedly send a graphQL query to delete 50-70 records at a time - each query can take 20-60 seconds to finish, or it might just return an error, with no indication of what (if anything) it actually accomplished.
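The workaround amounts to a loop like the sketch below (the space URL, type name, and filter field are hypothetical placeholders rather than my actual schema - only the general shape matters):

```python
# Rough sketch of the batch-delete loop described above.
# "calls" and the "archived" filter are hypothetical placeholders -- the real
# mutation must match the GraphQL schema of your own space.
import time
import requests

GRAPHQL_URL = "https://YOUR_ACCOUNT.fibery.io/api/graphql/space/YOUR_SPACE"
HEADERS = {"Authorization": "Token YOUR_API_TOKEN"}
BATCH = 50  # keep batches small enough to finish before the upstream timeout

MUTATION = """
mutation ($limit: Int) {
  calls(archived: {is: true}, limit: $limit) {
    delete { message }
  }
}
"""

while True:
    try:
        resp = requests.post(
            GRAPHQL_URL,
            headers=HEADERS,
            json={"query": MUTATION, "variables": {"limit": BATCH}},
            timeout=120,
        )
        resp.raise_for_status()
        body = resp.json()
    except requests.RequestException as exc:
        # The request failed or timed out; there is no way to tell whether the
        # batch was applied, partially applied, or rolled back.
        print("Request failed:", exc)
        time.sleep(10)
        continue

    if body.get("errors"):
        print("GraphQL errors:", body["errors"])
        time.sleep(10)
        continue

    # Assumption: an empty result means nothing matches the filter any more.
    results = (body.get("data") or {}).get("calls") or []
    if not results:
        break
    print(f"Deleted a batch of up to {BATCH} records")
```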
Is this really the best we can do in a modern "no code" platform?
It should not take days to delete a few thousand records.
I can't see an unanswered chat from you, but I can see a chat where we are still digging into the problem. I expect @Kseniya_Piotuh will get back to you when there is news.
It certainly sounds like there is something funny going on, beyond what should be expected for a large database.
I'm also keen to understand the architectural decisions behind Fibery. When a system like Fibery already struggles to render pages or perform tasks with 100k records, that is a serious scalability or performance-optimization issue.
Specifically, how have you designed and optimized the system to efficiently handle large datasets and address potential performance bottlenecks? For example:
Challenges:
Handling performance slowdowns associated with data growth.
Addressing physical storage limitations.
Managing resource-intensive complex queries.
Data Management Strategies:
Techniques to optimize database queries.
Implementation of connection pools.
Use of database indexing and caching.
Approaches to avoid complex joins and reduce the number of database queries.
Methods for data compression to reduce storage space and improve retrieval times.
User Experience and Data Retrieval:
Use of pagination to manage data display.
Implementation of lazy loading for on-demand data retrieval.
Scaling Approaches:
Vertical scaling strategies to enhance a single server's capabilities.
Horizontal scaling methods, including the use of replication and sharding.
Scaling Complexities:
How you handle query routing in sharded systems.
Ensuring data consistency in replicated databases.
Managing the increased administrative tasks associated with scaling.
Infrastructure and Hardware:
Strategies for hardware and infrastructure scaling, including cloud solutions.
Techniques for archiving old, infrequently accessed data.
Monitoring:
Techniques to pinpoint performance bottlenecks.
Strategies to ensure optimal resource usage.
Thank you in advance for shedding light on these aspects.
I don't think you can expect to get an in-depth answer to this.
Fibery has a team of experienced developers, and the system has been designed with various performance criteria in mind, including with large datasets.
We are considering making some constraints/limitations explicit, so that users can set their expectations at the correct level.
We are actively reviewing the case that @Matt_Blais has raised.
Thanks for the feedback. Regarding the GraphQL upstream timeout errors: we are working on functionality that allows starting and monitoring background jobs for such tasks. For the near future, this looks like the only way to execute long-running processes without failures. We will keep you informed. Sorry for the inconvenience.
We have implemented a way to execute GraphQL mutations as a background job, to avoid timeouts.
You can find documentation here: Fibery GraphQL API
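The overall flow is submit-then-poll: start the mutation as a background job, then check its status until it finishes. The sketch below shows only that pattern - the response shapes and the status request are illustrative placeholders, so please follow the documentation above for the actual calls:

```python
# Simplified submit-then-poll pattern for a long-running GraphQL mutation.
# The response shapes and the status request below are placeholders, not the
# real API -- see the Fibery GraphQL API documentation for the actual calls.
import time
import requests

GRAPHQL_URL = "https://YOUR_ACCOUNT.fibery.io/api/graphql/space/YOUR_SPACE"
HEADERS = {"Authorization": "Token YOUR_API_TOKEN"}

def submit_as_background_job(mutation: str) -> str:
    """Start the mutation as a background job and return its job id (placeholder shape)."""
    resp = requests.post(GRAPHQL_URL, headers=HEADERS, json={"query": mutation})
    resp.raise_for_status()
    return resp.json()["data"]["jobId"]  # placeholder: actual field name per the docs

def wait_for_job(job_id: str, poll_seconds: int = 30) -> dict:
    """Poll the job status until it completes, then return the final result."""
    while True:
        resp = requests.post(
            GRAPHQL_URL, headers=HEADERS,
            json={"query": "query { jobStatus }",  # placeholder status query
                  "variables": {"jobId": job_id}},
        )
        resp.raise_for_status()
        status = resp.json()
        if (status.get("data") or {}).get("completed"):  # placeholder completion flag
            return status
        time.sleep(poll_seconds)
```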
Please let me know how it goes if you have a chance to try it.
I know it doesn't solve all of your issues, but we are not giving up and will continue tuning automation performance to improve things there.
This is what I need to do to clean up and delete older entities:
I have ~586k "Call" records which are linked to ~35k "Call Stats" records.
I will use a long-running graphQL call to unlink 450k Call records from Stats.
Every unlink triggers a Stats Rule which adds the unlinked Call's values to the Stats.
When I do this manually, I can see the Stats values increasing as the Rule processes through all the unlinked Calls (about 10-20 per second).
I have no problem waiting a long time for everything to finish, but this will create a very deep queue of unlinked Calls for the Rule to process, and if the Rules do not all eventually finish successfully, that would be a problem (data inconsistency).
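(For scale: at 10-20 Rule executions per second, ~450k unlinked Calls works out to roughly 6-12.5 hours of continuous Rule processing, so the queue will stay deep for a long time.)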
Should I wait to start this process? Or do you believe the system can currently handle this scenario correctly?
@Oleg I'm having trouble understanding the syntax for the new graphQL capability - it's not clear how to specify the filter/search criteria for selecting the records to be mutated.
Could you please translate this "normal" graphQL query to work in the extended mode? Note that the filtered Calls are first updated, then unlinked:
It appears that the results ultimately returned from a completed background job include only the id of each affected entity, and no other fields.
Because I cannot see the Name, Public Id, etc. of the affected entities, I cannot easily confirm which entities were actually mutated, which makes testing a batch query difficult.
It appears that the results ultimately returned from a completed background job include only the id of each affected entity, and no other fields.
It works in the same way for regular mutations. Unfortunately, due to high memory consumption we can return only ids, since hundreds of entities can be involved.
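As a possible workaround for verification, the returned ids can be looked up afterwards with a regular query. A rough sketch is below - the find query name, id filter syntax, and field names are illustrative assumptions, so check the GraphQL schema of your space for the exact names:

```python
# Sketch: given the ids returned by a batch mutation / background job,
# fetch human-readable fields to verify what was actually touched.
# "findCalls", the id filter syntax, and the field names are assumptions.
import requests

GRAPHQL_URL = "https://YOUR_ACCOUNT.fibery.io/api/graphql/space/YOUR_SPACE"
HEADERS = {"Authorization": "Token YOUR_API_TOKEN"}

QUERY = """
query ($ids: [ID!]) {
  findCalls(id: {in: $ids}) {
    id
    publicId
    name
  }
}
"""

def lookup(ids):
    """Return Name / Public Id for each id so a batch result can be spot-checked."""
    resp = requests.post(GRAPHQL_URL, headers=HEADERS,
                         json={"query": QUERY, "variables": {"ids": ids}})
    resp.raise_for_status()
    return resp.json()["data"]["findCalls"]

# Example: spot-check the first few ids returned by a completed job.
# for entity in lookup(job_result_ids[:10]):
#     print(entity["publicId"], entity["name"])
```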
I am seeing that the Unlink Rule is having some trouble when 1000 entities are unlinked
We are working on tuning automation rules for large amounts of data. We will publish fixes in the near future, and hope this will help avoid such issues.