Showcase: Script to split AI chat into child entities

Script Explanation: Splitting AI Chat into Sections

This script processes a long AI chat transcript stored in the Description field of a parent entity, splits it into sections at user-defined divider strings, and creates one child entity per section. Each child’s content is formatted with H2 headers so the user’s message and the assistant’s response are easy to tell apart. Below is a detailed explanation of how the script works and how to customize it.


Purpose

The script is designed to:

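  1. Split Long Text: Divide the chat into sections at each occurrence of the user divider (default: “You said:”). Within a section, the assistant divider (default: “ChatGPT said:”) separates the user’s message from the response.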
  2. Create Child Entities: Each section is saved as a new child entity in a separate database, linked back to the parent entity.
  3. Format the Content:
    • Add H2 headers to label the user’s message and the assistant’s response.
    • Italicize the user’s message for emphasis.

Key Features

  • Customizable Dividers:
    • DIVIDER: The phrase indicating the start of the user’s message (default: "You said:").
    • SECOND_DIVIDER: The phrase indicating the start of the assistant’s response (default: "ChatGPT said:").
    • The script automatically removes any preceding hash marks (e.g., ######You said:) for clean text processing.
  • H2 Headers:
    • H2_FOR_DIVIDER: Inserted above the user’s message (default: "User Message").
    • H2_FOR_SECOND_DIVIDER: Inserted above the assistant’s response (default: "Assistant Message").
  • Formatted Content:
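    • The user’s message is italicized using Markdown (*...*). Backslashes are removed and line breaks are collapsed into spaces first, so the message becomes a single italic paragraph.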
    • Each section is structured with clear headers for improved readability.
  • Incremental Numbering:
    • A numeric field (e.g., Weight) is assigned to each child entity, starting at 1 and incrementing for each subsequent section.
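
The hash-mark removal described above can be tried in isolation. The following is a minimal sketch that reuses the same regular expression as the script; the function name stripLeadingHashes and the sample text are illustrative assumptions, not part of the original script.

```javascript
// Escape regex metacharacters so the divider is matched literally.
function escapeRegExp(string) {
    return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: turns "###### You said:" at the start of a line
// into the bare divider "You said:".
function stripLeadingHashes(text, divider) {
    const pattern = new RegExp('^\\s*#+\\s*' + escapeRegExp(divider), 'gm');
    return text.replace(pattern, divider);
}

// Made-up sample transcript with Markdown heading markers before each divider.
const sample = '###### You said:\nHello\n###### ChatGPT said:\nHi there';
const cleaned = stripLeadingHashes(
    stripLeadingHashes(sample, 'You said:'),
    'ChatGPT said:'
);
console.log(cleaned);
```

The m flag makes ^ match at the start of every line, so only hash marks that directly precede a divider at the start of a line are removed; hash marks elsewhere in the text are left alone.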

Customization

The script defines its settings at the top to make adjustments easy. The database settings are keys of the DB_CONFIG object; the rest are standalone constants:

  1. Database Configuration:

    • CURRENT_DATABASE_NAME: The database of the parent entity.
    • TARGET_DATABASE_NAME: The database where child entities are created.
    • COLLECTION_NAME: The collection linking the parent entity to its child entities.
  2. Divider Strings:

    • DIVIDER: Customize the phrase that separates the user’s message from the rest of the text.
    • SECOND_DIVIDER: Customize the phrase that separates the assistant’s response from the user’s message.
  3. H2 Headers:

    • H2_FOR_DIVIDER: The text inserted as an H2 header for the user’s message.
    • H2_FOR_SECOND_DIVIDER: The text inserted as an H2 header for the assistant’s response.
  4. Name Field:

    • NAME_MAX_LENGTH: Limits the number of characters in the name of the child entity. Adjust this to suit your needs.
  5. Numeric Field:

    • NUMBERING_FIELD: The field name for storing the incremental number in the child entity.
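
To see how NAME_MAX_LENGTH affects the child entity’s name, the naming step can be sketched on its own. The helper name buildName and the sample message are assumptions made for illustration; the cleanup and truncation mirror what the script does to the user’s message.

```javascript
const NAME_MAX_LENGTH = 50; // Same default as in the script

// Hypothetical helper: derives a name from the user's message by removing
// backslashes, collapsing line breaks into spaces, and truncating.
function buildName(userMessage, maxLength = NAME_MAX_LENGTH) {
    const flattened = userMessage.replace(/\\/g, '').replace(/\n+/g, ' ');
    return flattened.substring(0, maxLength).trim();
}

// Made-up message spanning two lines.
const message = 'How do I\nsplit a long transcript into smaller, linked sections?';
console.log(buildName(message));     // Truncated to at most 50 characters
console.log(buildName(message, 20)); // A shorter name
```

Truncation is by character count, so a name may end in the middle of a word.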

How It Works

  1. Input: A parent entity contains a long text in its Description field. The script reads only the first entity passed to it (args.currentEntities[0]).
  2. Processing:
    • The text is split into sections at each occurrence of DIVIDER. Anything before the first DIVIDER is discarded.
    • Each section is further split into a User Message and Assistant Message using SECOND_DIVIDER.
    • Leading hashes (e.g., ######) are removed from lines containing dividers.
  3. Output:
    • A new child entity is created for each section:
      • The first part (user message) is italicized and labeled with an H2 header (H2_FOR_DIVIDER).
      • The second part (assistant response) is labeled with an H2 header (H2_FOR_SECOND_DIVIDER), if present.
    • The child entities are linked to the parent entity in a collection.
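
The processing steps above can be sketched as a pure function that runs outside Fibery. This is only an approximation for experimenting with dividers and headers: the name planSections, the returned object shape, and the sample transcript are assumptions, and no entities are created.

```javascript
const DIVIDER = 'You said:';
const SECOND_DIVIDER = 'ChatGPT said:';
const H2_FOR_DIVIDER = 'User Message';
const H2_FOR_SECOND_DIVIDER = 'Assistant Message';
const NAME_MAX_LENGTH = 50;

function escapeRegExp(string) {
    return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Remove leading hash marks in front of a divider at the start of a line.
function stripHashes(text, divider) {
    const pattern = new RegExp('^\\s*#+\\s*' + escapeRegExp(divider), 'gm');
    return text.replace(pattern, divider);
}

// Hypothetical helper: returns plain objects describing the child entities
// the script would create, instead of calling the Fibery API.
function planSections(transcript) {
    const cleaned = stripHashes(stripHashes(transcript, DIVIDER), SECOND_DIVIDER);
    const planned = [];
    // slice(1) drops the text before the first DIVIDER, as the script does.
    for (const chunk of cleaned.split(DIVIDER).slice(1)) {
        const section = chunk.trim();
        if (section.length === 0) continue; // Skip empty sections
        const cut = section.indexOf(SECOND_DIVIDER);
        let userPart = cut === -1 ? section : section.substring(0, cut).trim();
        const assistantPart = cut === -1
            ? ''
            : section.substring(cut + SECOND_DIVIDER.length).trim();
        // Flatten the user's message before italicizing it.
        userPart = userPart.replace(/\\/g, '').replace(/\n+/g, ' ');
        let content = `## ${H2_FOR_DIVIDER}\n\n*${userPart}*`;
        if (assistantPart) {
            content += `\n\n## ${H2_FOR_SECOND_DIVIDER}\n\n${assistantPart}`;
        }
        planned.push({
            name: userPart.substring(0, NAME_MAX_LENGTH).trim(),
            weight: planned.length + 1, // Numbering starts at 1
            content
        });
    }
    return planned;
}

// Made-up sample transcript.
const transcript = [
    'Intro text that is ignored',
    '###### You said:',
    'What is a child entity?',
    '###### ChatGPT said:',
    'An entity linked to a parent.',
    '###### You said:',
    'Thanks!'
].join('\n');

console.log(planSections(transcript));
```

Running the sketch on a copy of a transcript before running the real script is a cheap way to check how many child entities would be created and what their names would look like.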

Tips for Usage

  • Divider Matching: Ensure the DIVIDER and SECOND_DIVIDER match the actual phrases in your text. Matching is exact and case-sensitive; only hash marks and whitespace in front of a divider at the start of a line are stripped automatically.
  • Header Customization: Change the H2_FOR_DIVIDER and H2_FOR_SECOND_DIVIDER constants to use more descriptive headers for your use case.
  • Name Length: Use the NAME_MAX_LENGTH constant to adjust how much of the user’s message appears as the child entity’s name.
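
One way to check divider matching before running the script is to count how often each divider occurs in the text. The sketch below is an assumption-laden illustration: countOccurrences and checkDividers are hypothetical helpers, not part of the script or of the Fibery API.

```javascript
// Hypothetical helper: counts non-overlapping, case-sensitive occurrences.
function countOccurrences(text, phrase) {
    if (phrase.length === 0) return 0;
    return text.split(phrase).length - 1;
}

// Hypothetical helper: reports how many user and assistant turns were found.
function checkDividers(text, divider, secondDivider) {
    const userTurns = countOccurrences(text, divider);
    const assistantTurns = countOccurrences(text, secondDivider);
    return { userTurns, assistantTurns, balanced: userTurns === assistantTurns };
}

// Made-up sample: the last user message has no assistant response yet.
const sampleText = 'You said: a ChatGPT said: b You said: c';
console.log(checkDividers(sampleText, 'You said:', 'ChatGPT said:'));
```

A count of zero for DIVIDER means the script would create no child entities. An unbalanced count is not necessarily an error, since a section without SECOND_DIVIDER is still saved with only the user’s message.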

This script ensures that long AI chat transcripts are split into structured, manageable sections with clear formatting and linked for easy navigation and analysis.


Script

const fibery = context.getService('fibery');

// Configuration section
const DB_CONFIG = {
    CURRENT_DATABASE_NAME: 'Content/Leaf',   // The database of the current entity
    TARGET_DATABASE_NAME: 'Content/Section', // The database where new entities will be created
    ENTITY_FIELDS: ['Name', 'Description'],  // Fields to retrieve for the entity
    DOCUMENT_FORMAT: 'md',                   // Format of the document content
    COLLECTION_NAME: 'Sections'              // Collection name to add new entities to
};

// Dividers
const DIVIDER = 'You said:';             // First divider string
const SECOND_DIVIDER = 'ChatGPT said:';  // Second divider: marks where the user's message ends and the assistant's response begins

// Name and field configurations
const NAME_MAX_LENGTH = 50;     // Maximum number of characters for the 'Name' field of the child entities
const NUMBERING_FIELD = 'Weight'; // Name of the numeric field to store the numbering value

// H2 Headers for the replaced dividers
const H2_FOR_DIVIDER = 'User Message';      // H2 header to insert where the first divider was
const H2_FOR_SECOND_DIVIDER = 'Assistant Message'; // H2 header to insert where the second divider was

// Helper function to escape special characters in divider strings for regex
function escapeRegExp(string) {
    return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

async function cloneEntitiesByDividerWithNumbering() {
    try {
        const currentEntity = args.currentEntities[0];
        const entity = await fibery.getEntityById(DB_CONFIG.CURRENT_DATABASE_NAME, currentEntity.id, DB_CONFIG.ENTITY_FIELDS);

        let descriptionContent = await fibery.getDocumentContent(entity['Description'].Secret, DB_CONFIG.DOCUMENT_FORMAT);

        // Regex to remove leading hashes before the main DIVIDER
        const dividerRegex = new RegExp('^\\s*#+\\s*' + escapeRegExp(DIVIDER), 'gm');
        descriptionContent = descriptionContent.replace(dividerRegex, DIVIDER);

        // Split the content into sections based on the primary DIVIDER
        const sections = splitContentByDivider(descriptionContent, DIVIDER);

        let numbering = 1; // Initialize numbering

        for (const section of sections) {
            let sectionContent = section.content.trim();
            if (sectionContent.length > 0) {
                // Regex to remove leading hashes before the SECOND_DIVIDER
                const secondDividerRegex = new RegExp('^\\s*#+\\s*' + escapeRegExp(SECOND_DIVIDER), 'gm');
                sectionContent = sectionContent.replace(secondDividerRegex, SECOND_DIVIDER);

                // Locate the SECOND_DIVIDER in the section
                const secondDividerIndex = sectionContent.indexOf(SECOND_DIVIDER);

                let firstPart;
                let secondPart = '';

                if (secondDividerIndex !== -1) {
                    firstPart = sectionContent.substring(0, secondDividerIndex).trim();
                    secondPart = sectionContent.substring(secondDividerIndex + SECOND_DIVIDER.length).trim();
                } else {
                    firstPart = sectionContent;
                }

                // Clean up the first part: remove backslashes and turn newlines into spaces
                firstPart = firstPart.replace(/\\/g, '').replace(/\n+/g, ' ');

                // Create the name from the first part
                const name = firstPart.substring(0, NAME_MAX_LENGTH).trim();

                // Construct the formatted content with H2 headers and italic formatting
                let formattedContent = `## ${H2_FOR_DIVIDER}\n\n*${firstPart}*`;
                if (secondPart && secondPart.length > 0) {
                    formattedContent += `\n\n## ${H2_FOR_SECOND_DIVIDER}\n\n${secondPart}`;
                }

                // Create the new entity in the target database
                const clonedEntityData = {
                    'Name': name,
                    [NUMBERING_FIELD]: numbering
                };
                const clonedEntity = await fibery.createEntity(DB_CONFIG.TARGET_DATABASE_NAME, clonedEntityData);

                // Update the new entity's description with the formatted content
                await fibery.setDocumentContent(clonedEntity['Description'].Secret, formattedContent, DB_CONFIG.DOCUMENT_FORMAT);

                // Add the new entity to the specified collection of the current entity
                await fibery.addCollectionItem(DB_CONFIG.CURRENT_DATABASE_NAME, currentEntity.id, DB_CONFIG.COLLECTION_NAME, clonedEntity.id);

                console.log(`Created entity in ${DB_CONFIG.TARGET_DATABASE_NAME} with ID: ${clonedEntity.id}, Name: ${name}, and ${NUMBERING_FIELD}: ${numbering}`);

                // Increment numbering for the next entity
                numbering++;
            }
        }

    } catch (error) {
        console.error('Error in script:', error);
        throw error; // Re-throw so the failure is not silently swallowed
    }
}

function splitContentByDivider(content, divider) {
    // Split by the divider and ignore the first chunk (before the first occurrence)
    const parts = content.split(divider);
    const sections = [];

    for (let i = 1; i < parts.length; i++) {
        sections.push({ content: parts[i] });
    }

    return sections;
}

await cloneEntitiesByDividerWithNumbering();
