How to Extract Comment and Cited Text from Rich Text Using Indices

I’m trying to create new functionality using existing in-line comments in a rich text field.

My goal is to build automations that aggregate comments together with the text they cite in documents and reports; for example, to comment on AI output and then feed those comments back to the AI to improve it.

My strategy is to write a script that fetches the comment itself as well as the cited text, i.e. the selection that was commented on (the yellow underlined text that gets highlighted when hovering over it).

In the JSON of the rich text field, the comment text itself is clearly visible, but the cited text is only indicated with indices. For example:

  "comments": [
    {
      "from": 142,
      "to": 181,
      "id": "0c6c7927-dcab-4014-8552-ab22be4f4fd7",
      "body": {
        "doc": {
          "type": "doc",
          "content": [
            {
              "type": "paragraph",
              "attrs": {
                "guid": "551cc0fa-c40e-4f2d-a102-81a9e897ccf5"
              },
              "content": [
                {
                  "type": "text",
                  "text": "Comment one"
                }
              ]
            }
          ]
        },
        "comments": []
      },
      "date": 1746570173161,
      "author": {
        "id": "9e75b117-a9fa-44a1-96bd-4cebe6b6bbed"
      },
      "thread": null,
      "state": "open",
      "detached": false
    },

I tried to write a script that derives the cited text from the "from" and "to" indices by mapping them onto the characters in the rich text field.

Extracting the field as Markdown gives shifted offsets, so the start and end characters are wrong and the resulting cited text is wrong. I suspect that I need to:

  • Map the rich text JSON to Markdown indices.
  • Use \n per blockquote paragraph.
  • Capture snippets for all nodes.

Could you please give pointers or a code example showing how this is done?

In the meantime, I discovered that:

  • Deleting text makes the cited text available in hidden JSON - When text that contained comments is deleted, the comments remain in the field's (hidden) JSON as comment remnants, and these include the cited text!
  • But only when deleting manually - Deleting the text automatically does not preserve the JSON comment remnants, so this only works if I manually delete the contents of the rich text field.

This means that a (proof-of-concept, not efficient) workaround to get the cited text is:

  1. Copy rich text field A to another (temporary) rich text field B.
  2. Manually delete all text in rich text field B.
  3. Use a script to get the JSON remnants and extract the cited text.

After that, I have a script that creates real ‘Comment’ entities from these inline comments and displays them in a textual thread as well as in nested list views. This lets a team be much more productive with comments across content, because they can list and filter them as entities.

Prototype
In the meantime, I created a prototype using the manual Temp field deletion method:

[Screen capture: msedge_GjYERg1TIn]


Fibery and ProseMirror - some insights

  • Second potential workaround (external Node.js script): Use @fibery/prosemirror-schema to parse the document JSON, extract highlights, and account for gaps (e.g., \n) with Node.textBetween and Node.nodesBetween. Import the results into Fibery manually or via the API. Pros: accurate, uses ProseMirror indexing. Cons: non-native, requires external setup.
  • Better solution (request for the Fibery developers): Fibery API enhancements. Add something like fibery.getTextRange(secret, from, to) and expose the schema in Fibery’s script sandbox to enable native text extraction and gap mapping. Pros: streamlines the workflow, scalable. Cons: not immediate, depends on Fibery’s timeline.
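
To make the index mapping concrete, here is a minimal sketch of a textBetween-style lookup in plain JavaScript, without any library. It rests on assumptions that are not confirmed in this thread: that "from" and "to" are ProseMirror document positions (every non-leaf node adds one position for its opening and one for its closing, text adds one per character), and that LEAF_TYPES covers the content-less node types in Fibery's schema. The names LEAF_TYPES, nodeSize and textBetween are my own helpers here, not Fibery or prosemirror-model API, so treat this as an approximation of what Node.textBetween would return.

```javascript
// ASSUMPTION: node types that occupy exactly one position and hold no content.
// This list is illustrative; the real set depends on Fibery's schema.
const LEAF_TYPES = new Set(['hard_break', 'horizontal_rule', 'image']);

// Number of positions a node spans: text counts one per UTF-16 code unit,
// leaf nodes count 1, any other node counts its content plus an opening
// and a closing token.
function nodeSize(node) {
    if (node.type === 'text') return node.text.length;
    if (LEAF_TYPES.has(node.type)) return 1;
    return 2 + (node.content || []).reduce((sum, child) => sum + nodeSize(child), 0);
}

// Hand-rolled approximation of textBetween: collects the text in [from, to)
// and inserts blockSeparator whenever the text continues in a different
// parent block. Leaf nodes such as hard_break contribute no text.
function textBetween(doc, from, to, blockSeparator = '\n') {
    let out = '';
    let lastParent = null;

    function walk(parent, start) {
        let pos = start;
        for (const child of parent.content || []) {
            const end = pos + nodeSize(child);
            if (end > from && pos < to) {
                if (child.type === 'text') {
                    if (lastParent !== null && lastParent !== parent) out += blockSeparator;
                    lastParent = parent;
                    out += child.text.slice(Math.max(from, pos) - pos, Math.min(to, end) - pos);
                } else if (!LEAF_TYPES.has(child.type)) {
                    walk(child, pos + 1); // +1 steps over the child's opening token
                }
            }
            pos = end;
        }
    }

    walk(doc, 0); // position 0 is the start of the doc's content
    return out;
}

// Made-up two-paragraph document for illustration.
const doc = {
    type: 'doc',
    content: [
        { type: 'paragraph', content: [{ type: 'text', text: 'Hello world' }] },
        { type: 'paragraph', content: [{ type: 'text', text: 'Second line' }] }
    ]
};

console.log(textBetween(doc, 7, 12));                 // world
console.log(JSON.stringify(textBetween(doc, 7, 20))); // "world\nSecond"
```

In the plain text "Hello world", the word "world" sits at offsets 6 to 11, while under these position rules it sits at 7 to 12, and the shift grows by two for every block boundary crossed. That would explain why offsets taken from a Markdown or plain-text export drift further off the deeper into the document a comment is.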

FYI, here is an example of the hidden JSON with a comment remnant and its cited text after deleting the source text. The interesting part is:
"cite": "this part is the cited text of the comment",

    {
      "from": 27,
      "to": 69,
      "id": "0e47f9e5-702b-4c88-a963-7801ef7455c3",
      "body": {
        "doc": {
          "type": "doc",
          "content": [
            {
              "type": "paragraph",
              "attrs": {
                "guid": "38986533-9132-4356-b0ba-41fed5fa4c66"
              },
              "content": [
                {
                  "type": "text",
                  "text": "This is the comment text"
                }
              ]
            }
          ]
        },
        "comments": []
      },
      "date": 1747819863479,
      "author": {
        "id": "9e75b117-a9fa-44a1-96bd-4cebe6b6bbed"
      },
      "thread": null,
      "state": "open",
      "cite": "this part is the cited text of the comment",
      "detached": true
    }
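
Step 3 of the workaround (reading the cited text out of the remnants) can be sketched on its own, outside of Fibery. This is only an illustration: it assumes the field JSON has the shape shown above (a top-level "comments" array whose entries may carry a "cite" string), and plainText and extractCitedComments are hypothetical helper names, not Fibery API.

```javascript
// Flatten a ProseMirror-style node into its plain text.
function plainText(node) {
    if (!node) return '';
    const own = typeof node.text === 'string' ? node.text : '';
    const children = Array.isArray(node.content) ? node.content : [];
    return own + children.map(plainText).join('');
}

// Return one record per comment remnant that still carries a non-empty "cite".
function extractCitedComments(fieldJson) {
    const comments = Array.isArray(fieldJson.comments) ? fieldJson.comments : [];
    return comments
        .filter(c => typeof c.cite === 'string' && c.cite.trim() !== '')
        .map(c => ({
            id: c.id,
            cite: c.cite.trim(),
            comment: plainText(c.body && c.body.doc),
            detached: c.detached === true
        }));
}

// Sample input shaped like the remnant above (shortened; ids are made up).
const sample = {
    doc: { type: 'doc', content: [] },
    comments: [
        {
            id: 'comment-with-cite',
            body: { doc: { type: 'doc', content: [
                { type: 'paragraph', content: [{ type: 'text', text: 'This is the comment text' }] }
            ] } },
            cite: 'this part is the cited text of the comment',
            detached: true
        },
        {
            id: 'comment-without-cite',
            body: { doc: { type: 'doc', content: [] } },
            detached: false
        }
    ]
};

console.log(extractCitedComments(sample));
```

Only the first sample comment is returned, because the second one has no "cite" property. Inside a Fibery script the input would be the object returned by getDocumentContent for the Temp field, as in the prototype script.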

For whoever is interested, here is the (working) prototype script:

const fibery = context.getService('fibery');

/******************************
 *      CONFIGURATION         *
 ******************************/

// Database Names
const DB = {
    NODE: 'Testspace/Block',
    THREAD: 'Testspace/Thread',
    COMMENT: 'Testspace/Comment',
    USER: 'fibery/user'
};

// Field Names
const NODE_FIELDS = {
    BODY: 'Body',
    THREADS: 'Threads',
    NAME: 'Name',
    TEMP: 'Temp'
};

const THREAD_FIELDS = {
    NAME: 'Name',
    DOC_REFERENCE: 'Block',
    DESCRIPTION: 'Body',
    BODY: 'Body'
};

const COMMENT_FIELDS = {
    DESCRIPTION: 'Body',
    PARENT: 'ParentComment',
    AUTHOR: 'Author',
    CREATED_AT: 'CreatedAt',
    SUBCOMMENTS: 'SubComments',
    THREAD: 'Thread',
    BODY: 'Body',
    CITED_TEXT: 'CitedText',
    COMMENT_TEXT: 'CommentText'
};

// Behavior
const THREAD_NAME = "Inline Comments";

/******************************
 *  CORE FUNCTIONALITY BELOW  *
 ******************************/

function getTextFromProseMirrorDoc(docNode) {
    let text = '';
    if (!docNode) return text;

    if (Array.isArray(docNode.content)) {
        for (const child of docNode.content) {
            text += getTextFromProseMirrorDoc(child);
        }
    }

    if (docNode.text) {
        text += docNode.text;
    }
    return text;
}

function formatDateYYYYMMDD_HHMM(dateObj) {
    const year = dateObj.getFullYear();
    const month = String(dateObj.getMonth() + 1).padStart(2, '0');
    const day = String(dateObj.getDate()).padStart(2, '0');
    const hours = String(dateObj.getHours()).padStart(2, '0');
    const minutes = String(dateObj.getMinutes()).padStart(2, '0');

    return `${year}.${month}.${day}-${hours}:${minutes}`;
}

function buildCommentThread(comment, indentLevel, childrenMap, userMap) {
    const authorId = comment.author && comment.author.id ? comment.author.id : null;
    const authorName = authorId && userMap[authorId] ? userMap[authorId][NODE_FIELDS.NAME] : 'Unknown Author';

    const commentDate = new Date(comment.date);
    const formattedDate = formatDateYYYYMMDD_HHMM(commentDate);

    const commentText = getTextFromProseMirrorDoc(comment.body.doc);
    if (!commentText.trim()) {
        console.log(`Skipping comment with empty text at ${formattedDate} by ${authorName}`);
        return [];
    }

    // Build ProseMirror JSON for this comment
    const paragraphContent = [
        {
            type: "text",
            marks: [{ type: "strong" }],
            text: authorName
        },
        {
            type: "text",
            text: ` (${formattedDate}):`
        },
        {
            type: "hard_break"
        }
    ];

    // Add cited text only for top-level comments (indentLevel === 0)
    if (indentLevel === 0 && comment.cite && typeof comment.cite === 'string' && comment.cite.trim()) {
        const citedText = comment.cite.trim();
        paragraphContent.push(
            {
                type: "text",
                marks: [{ type: "em" }],
                text: "Cited: "
            },
            {
                type: "text",
                marks: [
                    { type: "em" },
                    {
                        type: "highlight",
                        attrs: {
                            guid: "",
                            color: "yellow"
                        }
                    }
                ],
                text: citedText
            },
            {
                type: "hard_break"
            }
        );
    }

    // Add the original comment content, preserving inline formatting.
    // Paragraphs are flattened into one, separated by hard breaks so their text does not run together.
    if (comment.body.doc && Array.isArray(comment.body.doc.content)) {
        comment.body.doc.content.forEach((node, index) => {
            if (node.type === 'paragraph' && Array.isArray(node.content)) {
                if (index > 0) {
                    paragraphContent.push({ type: "hard_break" });
                }
                paragraphContent.push(...node.content);
            } else if (node.type === 'hard_break') {
                paragraphContent.push({ type: "hard_break" });
            }
            // Add support for other node types (e.g., bullet_list) if needed
        });
    }

    const commentNode = indentLevel > 0
        ? {
            type: "blockquote",
            content: [
                {
                    type: "paragraph",
                    attrs: { guid: "" },
                    content: paragraphContent
                }
            ]
        }
        : {
            type: "paragraph",
            attrs: { guid: "" },
            content: paragraphContent
        };

    // Process replies
    const replies = childrenMap[comment.id] || [];
    const replyNodes = [];
    for (const reply of replies) {
        const replyContent = buildCommentThread(reply, indentLevel + 1, childrenMap, userMap);
        replyNodes.push(...replyContent);
    }

    return [commentNode, ...replyNodes];
}

async function linkChildCommentToParent(parentId, childId) {
    try {
        await fibery.addCollectionItem(DB.COMMENT, parentId, COMMENT_FIELDS.SUBCOMMENTS, childId);
        console.log(`Linked child comment ${childId} to parent ${parentId}`);
    } catch (err) {
        console.error(`Failed to link child comment ${childId} to parent ${parentId}: ${err.message}`);
    }
}

async function createCommentEntity(comment, parentCommentId, threadId, entityId, childrenMap, userMap) {
    const commentText = getTextFromProseMirrorDoc(comment.body.doc);
    if (!commentText.trim()) {
        console.log(`Skipping comment entity creation due to empty text for thread ${threadId}`);
        return null;
    }

    const dateObj = new Date(comment.date);
    const commentName = commentText.trim() || "Untitled Comment";

    const userId = comment.author && comment.author.id ? comment.author.id : null;
    const authorRef = userId && userMap[userId] ? userMap[userId].id : null;

    const newCommentData = {
        [THREAD_FIELDS.DOC_REFERENCE]: entityId,
        [COMMENT_FIELDS.THREAD]: threadId,
        [COMMENT_FIELDS.PARENT]: parentCommentId || null,
        [COMMENT_FIELDS.AUTHOR]: authorRef || null,
        [COMMENT_FIELDS.CREATED_AT]: dateObj,
        [NODE_FIELDS.NAME]: commentName
    };

    let newComment;
    try {
        console.log(`Creating comment for node ${entityId}, thread ${threadId}`);
        newComment = await fibery.createEntity(DB.COMMENT, newCommentData);
        console.log(`Comment ${newComment.id} linked to thread ${threadId}`);
    } catch (err) {
        console.error(`Comment creation failed for node ${entityId}, thread ${threadId}: ${err.message}`);
        return null;
    }

    // Set Body field (includes both cited text and comment text)
    if (newComment[COMMENT_FIELDS.BODY] && newComment[COMMENT_FIELDS.BODY].Secret) {
        const commentBodyDoc = {
            type: "doc",
            content: [
                {
                    type: "paragraph",
                    attrs: { guid: "" },
                    content: []
                }
            ]
        };

        if (parentCommentId === null && comment.cite && typeof comment.cite === 'string' && comment.cite.trim()) {
            const citedText = comment.cite.trim();
            commentBodyDoc.content[0].content.push(
                {
                    type: "text",
                    marks: [{ type: "em" }],
                    text: "Cited: "
                },
                {
                    type: "text",
                    marks: [
                        { type: "em" },
                        {
                            type: "highlight",
                            attrs: {
                                guid: "",
                                color: "yellow"
                            }
                        }
                    ],
                    text: citedText
                },
                {
                    type: "hard_break"
                }
            );
        }

        // Append original comment content.
        // Paragraphs are flattened into one, separated by hard breaks so their text does not run together.
        if (comment.body.doc && Array.isArray(comment.body.doc.content)) {
            commentBodyDoc.content[0].content.push(...comment.body.doc.content.flatMap((node, index) => {
                if (node.type === 'paragraph' && Array.isArray(node.content)) {
                    return index > 0 ? [{ type: "hard_break" }, ...node.content] : node.content;
                } else if (node.type === 'hard_break') {
                    return [{ type: "hard_break" }];
                }
                return [];
            }));
        }

        const commentBodyContent = {
            doc: commentBodyDoc,
            comments: []
        };

        const jsonString = JSON.stringify(commentBodyContent);
        console.log(`Setting Comment ${newComment.id} Body with JSON: ${jsonString}`);
        try {
            await fibery.setDocumentContent(
                newComment[COMMENT_FIELDS.BODY].Secret,
                jsonString,
                'json'
            );
            console.log(`Set Body for Comment ${newComment.id}`);
        } catch (err) {
            console.error(`Body update failed for comment ${newComment.id}: ${err.message}`);
        }
    }

    // Set CitedText field (only for top-level comments)
    if (newComment[COMMENT_FIELDS.CITED_TEXT] && newComment[COMMENT_FIELDS.CITED_TEXT].Secret && parentCommentId === null && comment.cite && typeof comment.cite === 'string' && comment.cite.trim()) {
        const citedText = comment.cite.trim();
        const citedTextDoc = {
            type: "doc",
            content: [
                {
                    type: "paragraph",
                    attrs: { guid: "" },
                    content: [
                        {
                            type: "text",
                            marks: [
                                { type: "em" },
                                {
                                    type: "highlight",
                                    attrs: {
                                        guid: "",
                                        color: "yellow"
                                    }
                                }
                            ],
                            text: citedText
                        }
                    ]
                }
            ]
        };

        const citedTextContent = {
            doc: citedTextDoc,
            comments: []
        };

        const citedJsonString = JSON.stringify(citedTextContent);
        console.log(`Setting Comment ${newComment.id} CitedText with JSON: ${citedJsonString}`);
        try {
            await fibery.setDocumentContent(
                newComment[COMMENT_FIELDS.CITED_TEXT].Secret,
                citedJsonString,
                'json'
            );
            console.log(`Set CitedText for Comment ${newComment.id}`);
        } catch (err) {
            console.error(`CitedText update failed for comment ${newComment.id}: ${err.message}`);
        }
    }

    // Set CommentText field
    if (newComment[COMMENT_FIELDS.COMMENT_TEXT] && newComment[COMMENT_FIELDS.COMMENT_TEXT].Secret) {
        const commentTextDoc = {
            type: "doc",
            content: comment.body.doc && Array.isArray(comment.body.doc.content)
                ? comment.body.doc.content
                : [
                    {
                        type: "paragraph",
                        attrs: { guid: "" },
                        content: [
                            {
                                type: "text",
                                text: commentText
                            }
                        ]
                    }
                ]
        };

        const commentTextContent = {
            doc: commentTextDoc,
            comments: []
        };

        const commentJsonString = JSON.stringify(commentTextContent);
        console.log(`Setting Comment ${newComment.id} CommentText with JSON: ${commentJsonString}`);
        try {
            await fibery.setDocumentContent(
                newComment[COMMENT_FIELDS.COMMENT_TEXT].Secret,
                commentJsonString,
                'json'
            );
            console.log(`Set CommentText for Comment ${newComment.id}`);
        } catch (err) {
            console.error(`CommentText update failed for comment ${newComment.id}: ${err.message}`);
        }
    }

    if (parentCommentId) {
        await linkChildCommentToParent(parentCommentId, newComment.id);
    }

    const childComments = childrenMap[comment.id] || [];
    for (const childComment of childComments) {
        await createCommentEntity(
            childComment,
            newComment.id,
            threadId,
            entityId,
            childrenMap,
            userMap
        );
    }

    return newComment.id;
}

async function run() {
    const currentEntities = args.currentEntities || [];
    if (currentEntities.length === 0) {
        console.log('No selection');
        return;
    }

    for (const entity of currentEntities) {
        let nodeEntity;
        try {
            nodeEntity = await fibery.getEntityById(
                DB.NODE,
                entity.id,
                [NODE_FIELDS.BODY, NODE_FIELDS.THREADS, NODE_FIELDS.NAME, NODE_FIELDS.TEMP]
            );
        } catch (err) {
            console.error(`Failed to fetch node ${entity.id}: ${err.message}`);
            continue;
        }

        if (!nodeEntity || !nodeEntity[NODE_FIELDS.TEMP]) {
            console.log(`Skipping ${entity.id} - missing Temp field or Node not found`);
            continue;
        }

        let tempJSON;
        try {
            tempJSON = await fibery.getDocumentContent(
                nodeEntity[NODE_FIELDS.TEMP].Secret,
                'json'
            );
        } catch (err) {
            console.error(`Temp document read failed for node ${entity.id}: ${err.message}`);
            continue;
        }

        const inlineComments = tempJSON.comments || [];
        if (inlineComments.length === 0) {
            console.log(`No comments in node ${entity.id}`);
            continue;
        }

        const userIds = [...new Set(inlineComments
            .filter(c => c.author && c.author.id)
            .map(c => c.author.id)
        )];

        let userMap = {};
        if (userIds.length > 0) {
            try {
                const users = await fibery.getEntitiesByIds(DB.USER, userIds, [NODE_FIELDS.NAME]);
                users.forEach(u => {
                    userMap[u.id] = u;
                });
            } catch (err) {
                console.error(`User fetch error for node ${entity.id}: ${err.message}`);
            }
        }

        const threadData = {
            [THREAD_FIELDS.DOC_REFERENCE]: entity.id,
            [NODE_FIELDS.NAME]: THREAD_NAME
        };

        let newThread;
        try {
            console.log(`Creating thread for node ${entity.id} with data: ${JSON.stringify(threadData)}`);
            newThread = await fibery.createEntity(DB.THREAD, threadData);
            console.log(`New Thread: ${newThread.id} for node ${entity.id}`);
        } catch (err) {
            console.error(`Thread creation failed for node ${entity.id}: ${err.message}`);
            continue;
        }

        const childrenMap = {};
        const topLevelComments = [];
        inlineComments.forEach(comment => {
            const parentId = comment.thread;
            if (parentId) {
                childrenMap[parentId] = childrenMap[parentId] || [];
                childrenMap[parentId].push(comment);
            } else {
                topLevelComments.push(comment);
            }
        });
        console.log(`Built childrenMap for node ${entity.id} with ${Object.keys(childrenMap).length} parent IDs`);
        console.log(`Found ${topLevelComments.length} top-level comments for node ${entity.id}`);

        for (const topLevelComment of topLevelComments) {
            await createCommentEntity(
                topLevelComment,
                null,
                newThread.id,
                entity.id,
                childrenMap,
                userMap
            );
        }

        // Build ProseMirror JSON for the thread's Body
        const threadContent = [];
        for (let i = 0; i < topLevelComments.length; i++) {
            if (i > 0) {
                threadContent.push({ type: "horizontal_rule" });
            }
            const commentNodes = buildCommentThread(topLevelComments[i], 0, childrenMap, userMap);
            console.log(`Comment nodes for top-level comment ${i + 1}: ${JSON.stringify(commentNodes)}`);
            threadContent.push(...commentNodes);
        }

        console.log(`Thread content for node ${entity.id}: ${JSON.stringify(threadContent)}`);

        if (threadContent.length === 0) {
            console.log(`No valid content for Thread ${newThread.id}, skipping Body update`);
            continue;
        }

        const threadDoc = {
            doc: {
                type: "doc",
                content: threadContent
            },
            comments: []
        };

        const threadJsonString = JSON.stringify(threadDoc);
        console.log(`Setting Thread ${newThread.id} Body with JSON: ${threadJsonString}`);
        if (newThread[THREAD_FIELDS.BODY] && newThread[THREAD_FIELDS.BODY].Secret) {
            try {
                await fibery.setDocumentContent(
                    newThread[THREAD_FIELDS.BODY].Secret,
                    threadJsonString,
                    'json'
                );
                console.log(`Set Body for Thread ${newThread.id} with aggregated comments`);
            } catch (err) {
                console.error(`Failed to set Body for Thread ${newThread.id}: ${err.message}`);
            }
        } else {
            console.error(`Thread ${newThread.id} does not have a valid Body field or Secret`);
        }

        try {
            await fibery.addCollectionItem(DB.NODE, entity.id, NODE_FIELDS.THREADS, newThread.id);
            console.log(`Updated Node ${entity.id} with thread ${newThread.id}`);
        } catch (err) {
            console.error(`Thread reference update failed for node ${entity.id}: ${err.message}`);
        }
    }
}

await run();

How it works:

  1. Fetch Node Entity:
  • Retrieves the Testspace/Block entity by ID, including the Body, Threads, Name, and Temp fields.
  2. Read Temp Document:
  • Extracts the JSON content from the Temp field, which contains the inline comments.
  3. Create Thread:
  • Creates a Testspace/Thread entity linked to the node, named “Inline Comments”.
  4. Build Comment Hierarchy:
  • Organizes comments into top-level and child comments using a childrenMap.
  5. Create Comment Entities:
  • For each comment:
    • Skips it if the text is empty.
    • Creates a Testspace/Comment entity with Name, Author, CreatedAt, and links to Thread and ParentComment (if applicable).
    • Sets Body (cited text for top-level comments, followed by the original comment content).
    • Sets CitedText (for top-level comments, with italic and yellow highlight).
    • Sets CommentText (original comment content, preserving formatting).
  6. Build Thread Body:
  • Constructs ProseMirror JSON for the Thread’s Body, including:
    • Author, date, cited text (top-level), and original comment content with formatting.
    • Blockquotes for replies, horizontal rules between top-level comments.
  7. Update Thread and Node:
  • Sets the Thread’s Body with the aggregated comment content.
  • Links the Thread to the node’s Threads field.