Expanding AI Capabilities: Automatic Transcription and Image/Video Recognition for Files

Hi Fibery team :slightly_smiling_face: ! I’ve been exploring the AI capabilities in Fibery and noticed that while transcription works in Rich Text fields, it’s quite limited for file-based workflows. I have two proposals to make Fibery AI more powerful for creative and media-heavy teams:

1. AI Transcription for Attached Files & Automation Support

Currently, to transcribe a file, you have to upload it specifically to a Rich Text field or Doc. This creates unnecessary manual work.

  • Proposal: Allow transcription for files attached to standard “File” fields.

  • Automation: Add the ability to trigger transcription via Automations/Buttons (e.g., “When a video is attached → Transcribe → Save result to a specific Text field”).

  • Benefit: This would remove the need to route files through Rich Text fields and would make it possible to run AI transcription from automations.

2. AI Vision: Automatic Image/Video Descriptions

Currently, Fibery AI doesn’t “see” what’s inside images or videos. For teams managing hundreds of media files, manual tagging is a bottleneck.

  • Proposal: Integrate AI Vision to analyze and describe the contents of attached images and videos.

  • Use Case: An automation could “scan” an uploaded file and fill a “Description” field with details like: “A person holding a smartphone, blue background, outdoor setting.”

  • Benefit: This would make media assets searchable by content without manual data entry. It’s a huge game-changer for creative agencies and marketing teams.

Conclusion: Moving transcription beyond Rich Text and adding computer vision would make Fibery a true “Single Source of Truth” for media-heavy projects.

The only way I can imagine this being viable is if you provide your own API keys for a compatible LLM of choice. This amount of AI processing will undoubtedly far exceed Fibery’s included AI usage per-account. Even then I suspect it falls outside Fibery’s intended use case.

If it were me and this kind of thing was important to my use cases I’d probably wire up an external processing pipeline for all media, which could trigger on attachment of a file to any attachment field perhaps, process it on a local machine or cloud service of choice, then return e.g. keywords, transcript, etc. which can be stored in an appropriate Fibery field type. In other words the capability you want could be implemented fairly effectively today, if you’re willing to do a bit of vibe coding.
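To make that concrete, here is a rough Python sketch of the pipeline shape I have in mind. Every name in it (`MediaResult`, `pick_processor`, `handle_attachment`, the `processors` map, the `save_fields` callback) is made up for illustration. The actual transcription or vision call and the write-back to Fibery are left as callables you plug in, since those depend on which services you choose.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class MediaResult:
    """What a processor hands back for one file (hypothetical shape)."""
    transcript: str = ""
    keywords: List[str] = field(default_factory=list)


# Hypothetical processor signature: raw file bytes in, MediaResult out.
Processor = Callable[[bytes], MediaResult]


def pick_processor(filename: str, processors: Dict[str, Processor]) -> Optional[Processor]:
    """Choose a processor by file extension; None if the type is not handled."""
    if "." not in filename:
        return None
    ext = filename.rsplit(".", 1)[-1].lower()
    return processors.get(ext)


def handle_attachment(
    filename: str,
    data: bytes,
    processors: Dict[str, Processor],
    save_fields: Callable[[Dict[str, str]], None],
) -> bool:
    """Process one attached file and pass the results to save_fields.

    save_fields stands in for whatever writes values back to the entity
    (an assumption, not a real Fibery call). Returns True if the file was
    processed, False if no processor matched its extension.
    """
    processor = pick_processor(filename, processors)
    if processor is None:
        return False
    result = processor(data)
    save_fields({
        "Transcript": result.transcript,
        "Keywords": ", ".join(result.keywords),
    })
    return True
```

You would register one processor per extension (say `"mp4"` pointing at a transcription function and `"png"` at a vision function), call `handle_attachment` from whatever fires when a file is attached, and have `save_fields` write to the field types that suit you.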

I agree that this should be easy to set up as an automation for attached files, without having to vibe-code workarounds. I would happily connect the services I want and pay for the usage myself.

You are right about the costs :slightly_smiling_face: , and in fact, we already use an external pipeline for this inside our company. But having it native would be a huge improvement for Fibery.

Regarding the processing load: Fibery already uses a third-party provider for captions. The logic would be the same as with subtitles in Fibery: the file is sent for processing (with user consent in settings), and the result is returned to a specific field or description. It could be a paid option where users buy extra processing hours.
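Just to illustrate the "consent plus paid hours" idea, here is a tiny Python sketch of the check that could run before a file is sent off. This is purely my assumption of how it might work, not how Fibery does anything today; `Workspace`, `consent_given`, `hours_remaining` and `try_reserve` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Workspace:
    """Hypothetical per-workspace settings for media processing."""
    consent_given: bool      # user agreed to send files to a third-party provider
    hours_remaining: float   # purchased processing hours still available


def try_reserve(workspace: Workspace, media_seconds: float) -> bool:
    """Deduct processing time for one file if consent and balance allow it.

    Returns True and reduces hours_remaining when the file may be sent,
    otherwise returns False and leaves the balance untouched.
    """
    if not workspace.consent_given:
        return False
    hours_needed = media_seconds / 3600
    if hours_needed > workspace.hours_remaining:
        return False
    workspace.hours_remaining -= hours_needed
    return True
```

With a check like this in front of the provider call, heavy users would pay for what they process and everyone else's included AI usage would stay untouched.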