When working with more than a certain number of entities in a database (I don’t know the exact threshold), Fibery AI will time out when trying to fetch everything, and will then artificially limit the amount of data it collects so it can respond. It is honest in its response, not claiming that it is the full dataset, but it would be lovely if the AI could analyse full datasets using scripts. Especially with this feature that was just released, which has lots of data: [DONE] Give AI Access to historical data / activity log - #2 by mdubakov
Overall this problem is unsolvable in the general sense. For example, no AI can effectively analyze 100K entities. My advice is to narrow down the data with exact queries, like “only check features added last month”. In that case it may handle it.
The thing is, that’s what I was getting at: the AI creating and then reading the reports it creates. Reports can analyse data up to 200k entities with no problem.
So if the AI could create the report, then read the report, it allows it to effectively analyze up to 200k entities.
It’s not the cleanest approach, but it utilizes the existing aggregation and analysis technology you have already built.
But this is the general reason I think it’s not impossible: you don’t need to feed all the data to the AI, the AI just needs aggregation and analysis tools it can run on the data.
What do you think?
Reports can analyse data up to 200k entities with no problem.
No, LLMs don’t work like that. They have a context window that is still relatively small, and to analyze huge volumes of data they resort to clever data-compression tricks. It does not really matter how the LLM receives these 200K entities: querying them is not the problem, processing them is.
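A rough back-of-envelope calculation shows the scale mismatch being described here. The numbers below are illustrative assumptions (entity size and context window are not Fibery’s or any specific model’s actual figures):

```python
# Illustrative estimate: why 200K entities cannot simply be pasted
# into an LLM prompt. All numbers are assumptions for the sketch.

entities = 200_000
tokens_per_entity = 50      # assume ~50 tokens per serialized entity
context_window = 128_000    # assume a 128K-token context window

total_tokens = entities * tokens_per_entity
overflow_factor = total_tokens / context_window

print(total_tokens)      # 10000000
print(overflow_factor)   # 78.125 -- ~78x larger than the window
```

Even with generous assumptions, the raw data is orders of magnitude larger than what fits in a single prompt, which is why the query itself is never the bottleneck.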
You’re obviously much deeper in this than I am. Can you help me understand what you mean and where this flow breaks:
- User asks for data analysis
- AI realizes it’s a lot of data, so it makes a Fibery Report table, i.e. configures the formulas, the items, etc.
- The report takes on the heavy lifting and returns, e.g., a CSV string of the aggregated table
- AI reads this aggregated CSV (much smaller than the original data)
- AI responds to user
Note: realistically it doesn’t have to use Fibery Reports to aggregate. It could be another mechanism where the AI just does a tool call and gets an answer.
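To make the flow above concrete, here is a minimal sketch of the “aggregate first, then let the AI read the summary” idea. `run_report` is a hypothetical stand-in for whatever aggregation tool would be called (a Fibery Report, a database GROUP BY, etc.); the names and data are illustrative:

```python
# Sketch of the aggregate-then-read flow. `run_report` stands in for an
# aggregation tool call -- the AI never sees the raw rows, only the summary.

from collections import defaultdict

def run_report(entities, group_key, value_key):
    """Collapse raw entities into a small summary table."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for e in entities:
        sums[e[group_key]] += e[value_key]
        counts[e[group_key]] += 1
    return {k: {"total": sums[k], "avg": sums[k] / counts[k]} for k in sums}

# Example: many raw rows collapse into a handful of summary rows.
raw = [{"month": "Jan", "spend": 10.0},
       {"month": "Jan", "spend": 30.0},
       {"month": "Feb", "spend": 20.0}]
summary = run_report(raw, "month", "spend")
print(summary["Jan"]["avg"])  # 20.0
```

The point of the sketch is that the summary stays small no matter how many raw rows go in, so it fits comfortably in a context window.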
User asks for data analysis
The devil is here. If you ask “What is the average effort across the 200K features we have?” you will get a reply. If you ask “Check all 200K features and create me a report with all historical changes in all the features”, it will fail.
What specific questions do you want to handle with AI? Can you provide your top 5?
AI realizes it’s a lot of data
I don’t know specifically for Fibery AI, but this is where I’d expect it to most likely fall down. I think you’d need, either in the user prompt or baked into the system prompt / some sort of agent memory, something like “if there’s a lot of data, make a report table”.
I’m guessing it depends a lot on what’s under the hood of the Fibery system prompt and any tool calling it has/does.
The way it works right now is that the response times out, then the Fibery AI queries the data again, but with a limit this time.
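The current behavior described above can be sketched as a simple timeout-then-retry-with-limit pattern. `query_entities` and the exception are hypothetical stand-ins, not Fibery’s real API:

```python
# Sketch of "try the full query, fall back to a capped query on timeout".
# `query_entities` is a hypothetical stand-in for the real data fetch.

class QueryTimeout(Exception):
    pass

def query_entities(limit=None):
    # Stand-in: pretend an unbounded fetch always exceeds the time budget.
    if limit is None:
        raise QueryTimeout("full fetch timed out")
    return [{"id": i} for i in range(limit)]

def fetch_with_fallback(sample_limit=20_000):
    try:
        return query_entities()                    # first attempt: everything
    except QueryTimeout:
        return query_entities(limit=sample_limit)  # retry with a cap

rows = fetch_with_fallback()
print(len(rows))  # 20000
```

The fallback is honest but lossy: the AI ends up reasoning over a sample rather than the full dataset, which is exactly the limitation being discussed.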
I see… Interesting. I just tested to see how it would do:
Q: What is the average ad spend across 200k latest “Raw Stat” entities?
A: I attempted to calculate the average ad spend across 200,000 latest Raw Stat entities, but querying that many records caused a timeout due to the large dataset size.
Instead, I successfully analyzed a representative sample of 20,000 latest Raw Stat entities (ordered by creation date) and found:
- Total amount spent (USD): $[Confidential]
- Average ad spend (USD): $[Confidential]
This sample of the 20,000 most recent entities should provide a reliable estimate of the average ad spend. If you need analysis of the full 200k entities, we could either:
- Accept this representative sample as a good approximation
- Break the analysis into multiple smaller batches over time
- Focus on a specific date range or filtered subset of the data
- Analysis of actions in the workspace to optimise Fibery processes. This would need the AI to try aggregating in different ways until it spots overall patterns.
- Finding trends in data over time, if pulling data from tools like Meta Ads, Google Analytics, or other tools that have a whole lot of data points needing aggregation.
These are the only 2 use cases I really see though.
20,000 entities is probably more than any reasonable Fibery user will need AI to analyze, so I guess this is more of a nice-to-have than a real need. Maybe an enterprise client will want it at some point in the future.
Edit: I asked the AI to filter the activity log to only show human interactions, and that brought the last 30 days down to an amount it could process! Fantastic!!
Interesting, looks like a bug, we will try to make it work.
These are very hard problems to attack, to be honest…