Extract raw text from attached file

Yuri_BC · July 14, 2025, 11:08am

It would be extremely helpful to have this simple automation action:

Extract raw text from attached file

Chr1sG · July 14, 2025, 11:42am

What file types did you have in mind?

(I don’t think you would want to extract raw text from an attached .jpg file)

Yuri_BC · July 14, 2025, 12:32pm

Problem

Up to now, I have had to insert external content and code snippets into an entity's rich text field. This is a big problem, because:

ProseMirror enforces a strict document schema. It automatically validates, escapes, and restructures input, which can alter formatting, remove unsupported elements, and escape special characters. This makes it difficult to preserve the original structure or exact content of external text.
A rich text field also cannot hold very large content; it rejects it.

Proposal

We need:

A replacement for the rich text field that holds exact text and code. (Something like the currently invisible JSON field, though a generic text field would be great to have, along with access to file content.)
Automations to fetch the text from inside attached files.

This serves two purposes:

Exact content/code (no distortion by ProseMirror)
Large content (e.g. long scripts or long transcripts can be attached as a file instead of being crammed into a rich text field, which either loads very slowly or simply rejects that amount of text)

Suggested file formats to support for text extraction:

Markdown (.md)
– Structured knowledge blocks
– Semantic entity conversion
Plain Text (.txt)
– Logs or raw transcripts
– Unstructured input for NLP
Microsoft Word (.docx)
– Meeting notes
– Headings and tables
PDF (text-based)
– Reports or whitepapers
– Legal or archival records
CSV (.csv)
– Tabular task or metric data
– Rows mapped to entities
JSON (.json)
– Config or schema files
– Structured database population
YAML (.yaml/.yml)
– Workflow or deployment configs
– Nested structure definitions
HTML/XHTML (.html/.xhtml)
– Static documentation pages
– Article bodies or metadata
Code Files (.js, .py, .sh, etc.)
– Logic or automation scripts
– Snippets linked to functions
Rich Text Format (.rtf)
– Formatted notes
– Legacy content migration
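
To make the request concrete, here is a minimal sketch of what such an extraction step could look like, using only the Python standard library. It is not a Fibery feature or API: the function name `extract_raw_text`, the extension set, and the UTF-8 assumption are all illustrative. It covers the text-based formats, HTML, and .docx from the list above; PDF and .rtf would need a dedicated parser and are left out.

```python
import io
import zipfile
from html.parser import HTMLParser
from pathlib import PurePath
from xml.etree import ElementTree

# Hypothetical set of extensions whose bytes are already text (assumed UTF-8).
PLAIN_TEXT_EXTENSIONS = {".md", ".txt", ".csv", ".json", ".yaml", ".yml", ".js", ".py", ".sh"}

# XML namespace used by WordprocessingML elements inside a .docx file.
WORD_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


class _TextOnly(HTMLParser):
    """Collects text nodes and ignores tags (script/style text is kept too)."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def extract_raw_text(filename: str, data: bytes) -> str:
    """Return the text of an attachment, choosing a strategy by file extension."""
    ext = PurePath(filename).suffix.lower()
    if ext in PLAIN_TEXT_EXTENSIONS:
        # Nothing to convert: the exact content is preserved, only decoded.
        return data.decode("utf-8", errors="replace")
    if ext in {".html", ".xhtml"}:
        parser = _TextOnly()
        parser.feed(data.decode("utf-8", errors="replace"))
        parser.close()
        return "".join(parser.parts)
    if ext == ".docx":
        # A .docx file is a zip archive; the body lives in word/document.xml.
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            root = ElementTree.fromstring(archive.read("word/document.xml"))
        paragraphs = []
        for p in root.iter(WORD_NS + "p"):
            paragraphs.append("".join(t.text or "" for t in p.iter(WORD_NS + "t")))
        return "\n".join(paragraphs)
    raise ValueError(f"no extractor for {ext!r}")
```

Raising on unknown extensions (a .jpg, for instance) keeps the automation from silently storing garbage, which also answers the question about which file types make sense.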

Matt_Blais · July 14, 2025, 3:35pm

Perhaps you could automate sending your uploaded (attached to entity) files to an external service that would do the needed extraction/conversion, then store the result into another field/entity.
File API
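
As a rough sketch of that flow: the three callables below are hypothetical stand-ins, not real Fibery or third-party functions. `download_attachment` would wrap whatever file download call you use, `convert_to_text` the external conversion service, and `store_text` the write back into a field.

```python
from typing import Callable


def extract_and_store(
    entity_id: str,
    download_attachment: Callable[[str], bytes],  # hypothetical: fetch file bytes
    convert_to_text: Callable[[bytes], str],      # hypothetical: external converter
    store_text: Callable[[str, str], None],       # hypothetical: write to a field
) -> int:
    """Download an entity's attachment, convert it, store the text.

    Returns the number of characters stored.
    """
    data = download_attachment(entity_id)
    text = convert_to_text(data)
    store_text(entity_id, text)
    return len(text)
```

Passing the three steps in as functions keeps the glue code independent of any particular service, so the converter can be swapped per use case.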

Yuri_BC · July 14, 2025, 3:50pm

I understand that, but given that Fibery is such a workflow-centric platform, I’m pointing to a gap in what many users would reasonably expect out of the box.

The ability to extract and work with the raw text of attached files—especially for content like transcripts, scripts, and code—is a foundational capability for many knowledge and automation workflows.

Even as a custom integration I would not like it: for something so basic, it adds significant friction to an internal process.

Matt_Blais · July 14, 2025, 3:56pm

A counter-argument might be that “document conversion” is too broad and deep a function to be a core feature (with many differing needs), and thus it would make sense to use an integration (of some sort) to outsource this to a dedicated service, so it can be better customized for each use case.

But it would be great if implementing such an integration was simpler!

Chr1sG · July 14, 2025, 4:03pm

I don’t mean to sound churlish, but although it may feel like ‘something so basic’ to you, I don’t imagine it is a part of many people’s knowledge workflows, let alone ‘a foundational capability’.

I’d even go as far as to say that Fibery is a ‘data-centric’ platform, rather than a ‘workflow-centric’ platform.

Anyway, I’m willing to be proven wrong - let’s see what the votes say.

Yuri_BC · July 14, 2025, 4:08pm

I’d say Fibery is an ‘insight-centric’ platform, or it's moving towards being that, given the developments in the last year.
If Fibery becomes the AI-co-operating system of an organization, then it is based on insight-generating workflows. @mdubakov ?
