Script Explanation: Splitting AI Chat into Sections
This script processes a long AI chat transcript stored in the Description field of a parent entity, splits it into sections based on user-defined divider strings, and creates child entities for each section. Each section is formatted with headers and structured text for clarity. Below is a detailed explanation of how the script works and how to customize it.
Purpose
The script is designed to:
- Split Long Text: Divide the chat into sections based on specific phrases (e.g., “You said:” and “ChatGPT said:”).
- Create Child Entities: Each section is saved as a new child entity in a separate database, linked back to the parent entity.
- Format the Content:
  - Add H2 headers to label the user’s message and the assistant’s response.
  - Italicize the user’s message for emphasis.
Key Features
- Customizable Dividers:
  - DIVIDER: The phrase indicating the start of the user’s message (default: "You said:").
  - SECOND_DIVIDER: The phrase indicating the start of the assistant’s response (default: "ChatGPT said:").
  - The script automatically removes any preceding hash marks (e.g., ######You said:) for clean text processing.
- H2 Headers:
  - H2_FOR_DIVIDER: Inserted above the user’s message (default: "User Message").
  - H2_FOR_SECOND_DIVIDER: Inserted above the assistant’s response (default: "Assistant Message").
- Formatted Content:
  - The user’s message is italicized using Markdown (*...*).
  - Each section is structured with clear headers for improved readability.
- Incremental Numbering:
  - A numeric field (e.g., Weight) is set on each child entity, starting at 1 and incrementing for each subsequent section.
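The hash removal is probably the least obvious of these features, so here is a minimal sketch of it in isolation. It reuses the regular expression and the escapeRegExp helper from the script; the wrapper name stripLeadingHashes and the "[User]:" divider are hypothetical and only serve the illustration.

```javascript
// Same escaping helper as in the script: makes a divider safe to embed in a regex.
function escapeRegExp(string) {
    return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical wrapper (not in the original script) around the script's regex.
// It removes whitespace and one or more '#' characters in front of a divider
// that sits at the start of a line.
function stripLeadingHashes(text, divider) {
    const regex = new RegExp('^\\s*#+\\s*' + escapeRegExp(divider), 'gm');
    return text.replace(regex, divider);
}

// Default divider with heading hashes in front of it.
console.log(stripLeadingHashes('######You said:\nHello', 'You said:'));

// A divider containing regex metacharacters still matches literally
// because escapeRegExp escapes the square brackets.
console.log(stripLeadingHashes('## [User]: Hello', '[User]:'));
```

Because the pattern requires at least one hash, divider lines that carry no hashes are left untouched.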
Customization
The script includes constants at the top to make adjustments easy:
- Database Configuration:
  - CURRENT_DATABASE_NAME: The database of the parent entity.
  - TARGET_DATABASE_NAME: The database where child entities are created.
  - COLLECTION_NAME: The collection linking the parent entity to its child entities.
- Divider Strings:
  - DIVIDER: Customize the phrase that separates the user’s message from the rest of the text.
  - SECOND_DIVIDER: Customize the phrase that separates the assistant’s response from the user’s message.
- H2 Headers:
  - H2_FOR_DIVIDER: The text inserted as an H2 header for the user’s message.
  - H2_FOR_SECOND_DIVIDER: The text inserted as an H2 header for the assistant’s response.
- Name Field:
  - NAME_MAX_LENGTH: Limits the number of characters in the name of the child entity. Adjust this to suit your needs.
- Numeric Field:
  - NUMBERING_FIELD: The field name for storing the incremental number in the child entity.
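As an illustration, the constants might be adapted for a transcript that labels turns differently. The values below ("Human:", "Assistant:", "Question", "Answer", 80) are hypothetical, and findDividerProblems is an assumed sanity check that is not part of the original script. It flags one pitfall that follows from how the script splits text: if SECOND_DIVIDER contains DIVIDER, splitting on DIVIDER cuts the second divider apart.

```javascript
// Hypothetical values for a transcript that uses "Human:" and "Assistant:".
const DIVIDER = 'Human:';
const SECOND_DIVIDER = 'Assistant:';
const H2_FOR_DIVIDER = 'Question';
const H2_FOR_SECOND_DIVIDER = 'Answer';
const NAME_MAX_LENGTH = 80;
const NUMBERING_FIELD = 'Weight';

// Hypothetical sanity check (not in the original script).
// Returns a list of human-readable problems; an empty list means none were found.
function findDividerProblems(divider, secondDivider) {
    const problems = [];
    if (divider.trim().length === 0 || secondDivider.trim().length === 0) {
        problems.push('Dividers must not be empty.');
    } else if (secondDivider.includes(divider)) {
        problems.push('SECOND_DIVIDER contains DIVIDER, so splitting on DIVIDER would cut it apart.');
    }
    return problems;
}

// The hypothetical values above should report nothing.
console.log(findDividerProblems(DIVIDER, SECOND_DIVIDER));

// A too-generic first divider is reported.
console.log(findDividerProblems('said:', 'ChatGPT said:'));
```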
How It Works
- Input: A parent entity contains a long text in its Description field.
- Processing:
  - Leading hashes (e.g., ######) are removed from lines containing dividers.
  - The text is split into sections based on DIVIDER. Any text before the first DIVIDER is ignored.
  - Each section is further split into a User Message and Assistant Message using SECOND_DIVIDER.
- Output:
  - A new child entity is created for each section:
    - The first part (user message) is italicized and labeled with an H2 header (H2_FOR_DIVIDER).
    - The second part (assistant response) is labeled with an H2 header (H2_FOR_SECOND_DIVIDER), if present.
  - The child entities are linked to the parent entity in a collection.
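The steps above can be tried without a Fibery workspace. The sketch below mirrors the script's text processing but returns plain objects instead of creating entities. The function name planSections, the object keys (name, weight, body), and the sample transcript are hypothetical; the constants carry the script's default values.

```javascript
// Defaults copied from the script's constants.
const DIVIDER = 'You said:';
const SECOND_DIVIDER = 'ChatGPT said:';
const H2_FOR_DIVIDER = 'User Message';
const H2_FOR_SECOND_DIVIDER = 'Assistant Message';
const NAME_MAX_LENGTH = 50;

function escapeRegExp(string) {
    return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Removes heading hashes in front of a divider at the start of a line.
function stripHashes(text, divider) {
    const regex = new RegExp('^\\s*#+\\s*' + escapeRegExp(divider), 'gm');
    return text.replace(regex, divider);
}

// Hypothetical stand-in for the script's main loop: no Fibery calls,
// just the data each child entity would receive.
function planSections(transcript) {
    const planned = [];
    let numbering = 1;
    // Skip the chunk before the first DIVIDER, as the script does.
    const chunks = stripHashes(transcript, DIVIDER).split(DIVIDER).slice(1);
    for (const chunk of chunks) {
        const content = stripHashes(chunk.trim(), SECOND_DIVIDER);
        if (content.length === 0) continue;
        const index = content.indexOf(SECOND_DIVIDER);
        let firstPart = index === -1 ? content : content.substring(0, index).trim();
        const secondPart = index === -1 ? '' : content.substring(index + SECOND_DIVIDER.length).trim();
        // Same cleanup as the script: drop backslashes, flatten newlines.
        firstPart = firstPart.replace(/\\/g, '').replace(/\n+/g, ' ');
        let body = `## ${H2_FOR_DIVIDER}\n\n*${firstPart}*`;
        if (secondPart.length > 0) {
            body += `\n\n## ${H2_FOR_SECOND_DIVIDER}\n\n${secondPart}`;
        }
        planned.push({ name: firstPart.substring(0, NAME_MAX_LENGTH).trim(), weight: numbering, body });
        numbering++;
    }
    return planned;
}

// Hypothetical sample transcript.
const sample = [
    'Intro text that is ignored',
    '###### You said:',
    'What is Fibery?',
    '###### ChatGPT said:',
    'A work management tool.',
    '###### You said:',
    'Thanks!'
].join('\n');

const planned = planSections(sample);
console.log(planned);
```

For this sample the sketch plans two sections: the first carries both headers, while the second has only the user header because no assistant reply follows it.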
Tips for Usage
- Divider Matching: Ensure that DIVIDER and SECOND_DIVIDER match the actual phrases in your text. The script ignores leading hashes and whitespace automatically.
- Header Customization: Change the H2_FOR_DIVIDER and H2_FOR_SECOND_DIVIDER constants to use more descriptive headers for your use case.
- Text Formatting: Use the NAME_MAX_LENGTH constant to adjust how much of the user’s message appears as the child entity’s name.
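One quick way to check divider matching before running the script is to count how often each phrase occurs in the text. This is a small sketch with a hypothetical helper, countOccurrences, and a made-up transcript; matching is case-sensitive, as it is in the script.

```javascript
// Hypothetical helper: counts how often a phrase occurs in a text.
function countOccurrences(text, phrase) {
    return text.split(phrase).length - 1;
}

// Made-up transcript with two user turns and one assistant turn.
const transcript = '###### You said:\nHi\n###### ChatGPT said:\nHello!\n###### You said:\nBye';

console.log('DIVIDER matches:', countOccurrences(transcript, 'You said:'));
console.log('SECOND_DIVIDER matches:', countOccurrences(transcript, 'ChatGPT said:'));
```

A count of zero for either phrase suggests the constant does not match the transcript, for example because of different capitalization.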
This script splits long AI chat transcripts into structured, manageable sections that are clearly formatted and linked to the parent entity for easy navigation and analysis.
Script
const fibery = context.getService('fibery');

// Configuration section
const DB_CONFIG = {
    CURRENT_DATABASE_NAME: 'Content/Leaf', // The database of the current entity
    TARGET_DATABASE_NAME: 'Content/Section', // The database where new entities will be created
    ENTITY_FIELDS: ['Name', 'Description'], // Fields to retrieve for the entity
    DOCUMENT_FORMAT: 'md', // Format of the document content
    COLLECTION_NAME: 'Sections' // Collection name to add new entities to
};

// Dividers
const DIVIDER = 'You said:'; // First divider string
const SECOND_DIVIDER = 'ChatGPT said:'; // Second divider string for the end of the first part

// Name and field configurations
const NAME_MAX_LENGTH = 50; // Maximum number of characters for the 'Name' field of the child entities
const NUMBERING_FIELD = 'Weight'; // Name of the numeric field to store the numbering value

// H2 Headers for the replaced dividers
const H2_FOR_DIVIDER = 'User Message'; // H2 header to insert where the first divider was
const H2_FOR_SECOND_DIVIDER = 'Assistant Message'; // H2 header to insert where the second divider was

// Helper function to escape special characters in divider strings for regex
function escapeRegExp(string) {
    return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

async function cloneEntitiesByDividerWithNumbering() {
    try {
        const currentEntity = args.currentEntities[0];
        const entity = await fibery.getEntityById(DB_CONFIG.CURRENT_DATABASE_NAME, currentEntity.id, DB_CONFIG.ENTITY_FIELDS);
        let descriptionContent = await fibery.getDocumentContent(entity['Description'].Secret, DB_CONFIG.DOCUMENT_FORMAT);

        // Regex to remove leading hashes before the main DIVIDER
        const dividerRegex = new RegExp('^\\s*#+\\s*' + escapeRegExp(DIVIDER), 'gm');
        descriptionContent = descriptionContent.replace(dividerRegex, DIVIDER);

        // Split the content into sections based on the primary DIVIDER
        const sections = splitContentByDivider(descriptionContent, DIVIDER);
        let numbering = 1; // Initialize numbering

        for (const section of sections) {
            let sectionContent = section.content.trim();
            if (sectionContent.length > 0) {
                // Regex to remove leading hashes before the SECOND_DIVIDER
                const secondDividerRegex = new RegExp('^\\s*#+\\s*' + escapeRegExp(SECOND_DIVIDER), 'gm');
                sectionContent = sectionContent.replace(secondDividerRegex, SECOND_DIVIDER);

                // Locate the SECOND_DIVIDER in the section
                const secondDividerIndex = sectionContent.indexOf(SECOND_DIVIDER);
                let firstPart;
                let secondPart = '';
                if (secondDividerIndex !== -1) {
                    firstPart = sectionContent.substring(0, secondDividerIndex).trim();
                    secondPart = sectionContent.substring(secondDividerIndex + SECOND_DIVIDER.length).trim();
                } else {
                    firstPart = sectionContent;
                }

                // Clean up the first part: remove backslashes and turn newlines into spaces
                firstPart = firstPart.replace(/\\/g, '').replace(/\n+/g, ' ');

                // Create the name from the first part
                const name = firstPart.substring(0, NAME_MAX_LENGTH).trim();

                // Construct the formatted content with H2 headers and italic formatting
                let formattedContent = `## ${H2_FOR_DIVIDER}\n\n*${firstPart}*`;
                if (secondPart && secondPart.length > 0) {
                    formattedContent += `\n\n## ${H2_FOR_SECOND_DIVIDER}\n\n${secondPart}`;
                }

                // Create the new entity in the target database
                const clonedEntityData = {
                    'Name': name,
                    [NUMBERING_FIELD]: numbering
                };
                const clonedEntity = await fibery.createEntity(DB_CONFIG.TARGET_DATABASE_NAME, clonedEntityData);

                // Update the new entity's description with the formatted content
                await fibery.setDocumentContent(clonedEntity['Description'].Secret, formattedContent, DB_CONFIG.DOCUMENT_FORMAT);

                // Add the new entity to the specified collection of the current entity
                await fibery.addCollectionItem(DB_CONFIG.CURRENT_DATABASE_NAME, currentEntity.id, DB_CONFIG.COLLECTION_NAME, clonedEntity.id);
                console.log(`Created entity in ${DB_CONFIG.TARGET_DATABASE_NAME} with ID: ${clonedEntity.id}, Name: ${name}, and ${NUMBERING_FIELD}: ${numbering}`);

                // Increment numbering for the next entity
                numbering++;
            }
        }
    } catch (error) {
        console.error('Error in script:', error);
    }
}

function splitContentByDivider(content, divider) {
    // Split by the divider and ignore the first chunk (before the first occurrence)
    const parts = content.split(divider);
    const sections = [];
    for (let i = 1; i < parts.length; i++) {
        sections.push({ content: parts[i] });
    }
    return sections;
}

await cloneEntitiesByDividerWithNumbering();