When working with more than a certain number of entities in a database (I don’t know the exact threshold), Fibery AI will time out when trying to fetch everything, and will then artificially limit the amount of data it collects so it can respond. It is honest in its response, not claiming that it is the full dataset, but it would be lovely if the AI could analyse full datasets using scripts. Especially with this feature that was just released, which has lots of data: [DONE] Give AI Access to historical data / activity log - #2 by mdubakov
Overall this problem is unsolvable in the general sense. For example, no AI can effectively analyze 100K entities. My advice is to narrow down the data with exact queries, like “only check features added last month”. In that case it may handle it.
The thing is, that’s what I was getting at: the AI creating and then reading the reports it creates. Reports can analyse data up to 200k entities with no problem.
So if the AI could create the report, then read the report, it allows it to effectively analyze up to 200k entities.
It’s not the cleanest approach, but it utilizes the existing aggregation and analysis technology you have already built.
But this is the general reason I think it’s not impossible: you don’t need to feed all the data to the AI, the AI just needs aggregation and analysis tools it can run on the data.
What do you think?
Reports can analyse data up to 200k entities with no problem.
No, LLMs don’t work like that. They have a context window that is still relatively small, and to analyze huge volumes of data they resort to clever data-compression tricks. It does not really matter how the LLM receives these 200K entities: querying them is not the problem, processing them is.
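A rough back-of-envelope calculation shows the scale mismatch being described here. The numbers below are illustrative assumptions (entity size and context window are not Fibery’s or any specific model’s actual figures):

```python
# Illustrative estimate: why 200K entities cannot simply be pasted
# into an LLM prompt. All numbers are assumptions for the sketch.

entities = 200_000
tokens_per_entity = 50      # assume ~50 tokens per serialized entity
context_window = 128_000    # assume a 128K-token context window

total_tokens = entities * tokens_per_entity
overflow_factor = total_tokens / context_window

print(total_tokens)      # 10000000
print(overflow_factor)   # 78.125 -- ~78x larger than the window
```

Even with generous assumptions, the raw data is orders of magnitude larger than what fits in a single prompt, which is why the query itself is never the bottleneck.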
You’re obviously much deeper in this than I am. Can you help me understand what you mean and where this flow breaks:
- User asks for data analysis
- AI realizes it’s a lot of data, so it makes a Fibery Report table, i.e. configures the formulas, the items, etc.
- The report takes on the heavy lifting and returns, e.g., a CSV string of the aggregated table
- AI reads this aggregated CSV (much smaller than the original data)
- AI responds to user
Note: realistically it doesn’t have to use Fibery Reports to aggregate. It could be another mechanism where the AI just does a tool call and gets an answer.
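To make the flow above concrete, here is a minimal sketch of the “aggregate first, then let the AI read the summary” idea. `run_report` is a hypothetical stand-in for whatever aggregation tool would be called (a Fibery Report, a database GROUP BY, etc.); the names and data are illustrative:

```python
# Sketch of the aggregate-then-read flow. `run_report` stands in for an
# aggregation tool call -- the AI never sees the raw rows, only the summary.

from collections import defaultdict

def run_report(entities, group_key, value_key):
    """Collapse raw entities into a small summary table."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for e in entities:
        sums[e[group_key]] += e[value_key]
        counts[e[group_key]] += 1
    return {k: {"total": sums[k], "avg": sums[k] / counts[k]} for k in sums}

# Example: many raw rows collapse into a handful of summary rows.
raw = [{"month": "Jan", "spend": 10.0},
       {"month": "Jan", "spend": 30.0},
       {"month": "Feb", "spend": 20.0}]
summary = run_report(raw, "month", "spend")
print(summary["Jan"]["avg"])  # 20.0
```

The point of the sketch is that the summary stays small no matter how many raw rows go in, so it fits comfortably in a context window.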
User asks for data analysis
The devil is here. If you ask “What is the average effort across the 200K features we have?” you will get a reply. If you ask “Check all 200K features and create me a report with all historical changes in all the features”, it will fail.
What specific questions do you want to handle with AI? Can you provide your top 5?
AI realizes it’s a lot of data
I don’t know specifically for Fibery AI, but this is where I’d expect it to most likely fall down. I think you’d need, either in the user prompt or baked into the system prompt / some sort of agent memory, something like “if there’s a lot of data, make a report table”.
I’m guessing it depends a lot on what’s under the hood of the Fibery system prompt and any tool calling it has/does.
The way it works right now is that the response times out, then the Fibery AI queries the data again, but with a limit this time.
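The current behavior described above can be sketched as a simple timeout-then-retry-with-limit pattern. `query_entities` and the exception are hypothetical stand-ins, not Fibery’s real API:

```python
# Sketch of "try the full query, fall back to a capped query on timeout".
# `query_entities` is a hypothetical stand-in for the real data fetch.

class QueryTimeout(Exception):
    pass

def query_entities(limit=None):
    # Stand-in: pretend an unbounded fetch always exceeds the time budget.
    if limit is None:
        raise QueryTimeout("full fetch timed out")
    return [{"id": i} for i in range(limit)]

def fetch_with_fallback(sample_limit=20_000):
    try:
        return query_entities()                    # first attempt: everything
    except QueryTimeout:
        return query_entities(limit=sample_limit)  # retry with a cap

rows = fetch_with_fallback()
print(len(rows))  # 20000
```

The fallback is honest but lossy: the AI ends up reasoning over a sample rather than the full dataset, which is exactly the limitation being discussed.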
I see… Interesting. I just tested to see how it would do:
Q: What is the average ad spend across 200k latest “Raw Stat” entities?
A: I attempted to calculate the average ad spend across 200,000 latest Raw Stat entities, but querying that many records caused a timeout due to the large dataset size.
Instead, I successfully analyzed a representative sample of 20,000 latest Raw Stat entities (ordered by creation date) and found:
- Total amount spent (USD): $[Confidential]
- Average ad spend (USD): $[Confidential]
This sample of the 20,000 most recent entities should provide a reliable estimate of the average ad spend. If you need analysis of the full 200k entities, we could either:
- Accept this representative sample as a good approximation
- Break the analysis into multiple smaller batches over time
- Focus on a specific date range or filtered subset of the data
- Analysis of actions in the workspace to optimise Fibery processes. This would need the AI to try aggregating in different ways until it spots overall patterns.
- Finding trends in data over time, if pulling data from tools like Meta Ads, Google Analytics, or other tools that have a whole lot of data points needing aggregation.
These are the only 2 use cases I really see though.
20,000 entities is probably more than any reasonable Fibery user will need AI to analyze, so I guess this is more of a nice-to-have than a real need. Maybe an enterprise client will want it at some point in the future.
Edit: I asked the AI to filter the activity log to only show human interactions, and that brought the last 30 days down to an amount it could process! Fantastic!!
Interesting, looks like a bug, we will try to make it work.
These are very hard problems to attack, to be honest…