Mapping links in imported content into Fibery links/references

I am about to import a bunch of content into Fibery from several other tools I use, within some of which I have more than 1000 files/pages/docs. I link between files/docs/pages quite a lot in many of my workflows and processes. In some of these tools I also get “backlinks”/references, similar to Fibery, which I would ideally like to maintain in some way.

Since I plan to import basically all of this content into Fibery, any page that another page links to should also be in Fibery somewhere, so I want those links to still work after import and go to the respective Fibery Entity. And if they did still work, things like “References” should also work. Thus I would really like some way of mapping various types of “internal” linking formats into Fibery References/internal links! I’ll outline the various types of link syntax in my source data, and I’m hopeful someone will have some ideas for how to handle some or all of them, either with pre-import changes, or a post-import automation, button, or script, ideally something that can operate on all files sequentially, or at least on large chunks of files per “run”. I am unfortunately not a coder so I can’t dig into scripting, so I would be very grateful for any scripts/snippets anyone could share, if that’s the best/only way to do this.

My current data sources are:

Obsidian

Markdown files with .md extension. Links show up in the files as square brackets, e.g. [[Coaching homework]], which links to a Coaching homework.md file. Note that the square bracket links do not include the file extension.
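For anyone wanting to check which notes a given file links to before importing, here is a minimal sketch in plain JavaScript (Node). The function name `extractWikiLinks` is made up for illustration, and it assumes Obsidian's optional `[[Note|alias]]` and `[[Note#Heading]]` variants, which are not mentioned above but may appear in a vault:

```javascript
// Hypothetical helper: list the note titles that a Markdown string links to.
// Assumes Obsidian-style links: [[Title]], [[Title|alias]] or [[Title#Heading]].
function extractWikiLinks(markdown) {
    const targets = [];
    // Group 1 is the note title; the heading and alias parts are optional and ignored.
    const pattern = /\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]/g;
    for (const match of markdown.matchAll(pattern)) {
        targets.push(match[1].trim());
    }
    return targets;
}

const sample = 'See [[Coaching homework]] and [[Blogging|my blog notes]].';
console.log(extractWikiLinks(sample));
```

Note that embeds such as `![[image.png]]` would also be picked up by this pattern, so the results may need filtering.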

I plan to import this content using CSV format into a “Docs” Database, with the results showing up as 1 Entity for each .md file. (If anyone has any brilliant ideas for bulk-importing .md file contents into a CSV column, let me know!)
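One possible way to get a folder of .md files into a single CSV is a small Node script like the sketch below. The column names `Name` and `Description` and the folder path are assumptions for illustration, and I have not verified how the Fibery CSV importer treats multi-line quoted fields, so it is worth trying with a handful of files first:

```javascript
const fs = require('fs');
const path = require('path');

// Quote one CSV field: wrap it in double quotes and double any embedded quotes,
// so commas and line breaks inside the Markdown stay within a single cell.
function csvField(value) {
    return '"' + String(value).replace(/"/g, '""') + '"';
}

// Build CSV text with one row per .md file found directly in `dir` (not recursive).
// The file name without its extension becomes the Name column.
function markdownDirToCsv(dir) {
    const rows = [['Name', 'Description']];
    for (const file of fs.readdirSync(dir).sort()) {
        if (path.extname(file).toLowerCase() !== '.md') continue;
        const name = path.basename(file, path.extname(file));
        const content = fs.readFileSync(path.join(dir, file), 'utf8');
        rows.push([name, content]);
    }
    return rows.map((row) => row.map(csvField).join(',')).join('\r\n') + '\r\n';
}

// Usage (both paths are placeholders):
// fs.writeFileSync('docs.csv', markdownDirToCsv('./my-vault'));
```

Quoting every field is slightly wasteful but avoids having to decide which values contain commas, quotes, or line breaks.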

Quip (Salesforce)

These will also be handled as markdown files, though I have to individually export each page :weary:. The link syntax is standard Markdown, i.e. [link text](url) and the urls themselves look like this: https://USERNAME.quip.com/iTavAk0rLMVX or sometimes, with a human-readable page title in the URL (newer links, I think): https://USERNAME.quip.com/spEXArjrGtJF/Notes-About-Fibery and in some cases links also include a link to a Header, e.g. https://USERNAME.quip.com/iTavAk0rLMVX#MOSACALgMSZ
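To make the three URL shapes above concrete, here is a rough sketch that splits a Quip link into its parts. `parseQuipUrl` is a hypothetical helper, and the interpretation of the path segments (document ID first, optional title slug second) is only inferred from the example URLs, not from any Quip documentation:

```javascript
// Hypothetical helper: split a Quip link into document ID, optional title slug,
// and optional header anchor. Returns null for anything that is not a quip.com URL.
function parseQuipUrl(href) {
    let url;
    try {
        url = new URL(href);
    } catch {
        return null; // not a valid absolute URL
    }
    if (!url.hostname.endsWith('.quip.com')) return null;
    const [docId, slug] = url.pathname.split('/').filter(Boolean);
    if (!docId) return null;
    return {
        docId,
        slug: slug || null,
        headerId: url.hash ? url.hash.slice(1) : null,
    };
}

console.log(parseQuipUrl('https://USERNAME.quip.com/spEXArjrGtJF/Notes-About-Fibery'));
console.log(parseQuipUrl('https://USERNAME.quip.com/iTavAk0rLMVX#MOSACALgMSZ'));
```

The slug, where present, could serve as a fallback for the page name when the link text does not match a page title.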

You can technically link to any page with arbitrary text if you copy/paste the URL, but I pretty much have never done that (because it requires several extra steps). So I think I can count on the vast majority of links to have linked text that represents the page name. So even though there are weird, unique identifiers in the actual URLs, I’m hopeful I can just use the page name for the Fibery Entity Name, and then work out a way to convert the Link Text into an Entity Mention.

That said, the Quip export process appears to be quite manual, one page at a time, which is frustrating. So if there were a way to automate that, and hopefully the URL update-to-reference at the same time, that would be great. But from what I can see that may not really be possible. Quip can connect to Zapier and there is also a Quip API. Unfortunately the Zapier Triggers and Actions appear to be much more oriented toward getting content into Quip than out (those bastards). I can’t find a “read doc” option, for example: Quip Integrations | Connect Your Apps with Zapier

As for the API, it appears I would also have to upgrade my Quip license to even have access to using it, and the version I’d need to get appears to only be available annually (uncertain, but that seems to be the case), which would simply not be worth it for me. So unless I’m missing something, maybe this one is a lost cause (i.e. must do manually), which I can accept if necessary.

Notion

These will also be exported as markdown, since the Fibery Integration with Notion doesn’t yet support Pages/Documents. But at least I can do it in bulk, unlike Quip. :smile: The files have a unique identifier appended to the file name, like this: Blogging 688b53c1d0c047b79fd5a9180ceb9212.md and links are similarly Markdown-style, with full https URLs, e.g. [Why you should be a rum enthusiast](https://www.notion.so/Why-you-should-be-a-rum-enthusiast-1137f884ffce47a28cbe9a14f3139f2e) So this situation appears similar to the Quip one, where the link text is the page title. The unique identifier appears to be the same in a link and in the name of the .md file it points to, but of course I don’t want the Page Titles in Fibery to include the identifier, so I’ll have to work that out…
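Stripping the identifier from the file name, and pulling the same identifier out of a link, could look something like the sketch below. Both function names are invented for illustration, and the "32 hexadecimal characters" rule is only inferred from the two examples above:

```javascript
// Hypothetical helper: split "Title <32 hex chars>.md" into a clean title and the ID.
// Files without an ID suffix just lose their .md extension.
function splitNotionFilename(filename) {
    const match = filename.match(/^(.*?)\s+([0-9a-f]{32})\.md$/i);
    if (!match) return { title: filename.replace(/\.md$/i, ''), id: null };
    return { title: match[1], id: match[2].toLowerCase() };
}

// Hypothetical helper: pull the trailing 32-hex-character ID out of a Notion URL,
// ignoring any query string or fragment after it.
function notionIdFromUrl(href) {
    const match = href.match(/([0-9a-f]{32})(?:[?#].*)?$/i);
    return match ? match[1].toLowerCase() : null;
}

console.log(splitNotionFilename('Blogging 688b53c1d0c047b79fd5a9180ceb9212.md'));
console.log(notionIdFromUrl('https://www.notion.so/Why-you-should-be-a-rum-enthusiast-1137f884ffce47a28cbe9a14f3139f2e'));
```

Keeping the ID around (for example in a separate text Field) would allow matching links to pages by identifier rather than by title, which helps if two pages share a name.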

Anyway, with both Quip and Notion I think the idea would be to get the pages into Fibery using the proper Page Name, with no identifiers. And then hopefully reference the Link Text to figure out what Entity Reference to make in Fibery. Does that make sense?

So I think that’s the outline of where I’m at. If anyone has any ideas for doing the relinking/referencing in bulk, I would be ecstatic to hear! Otherwise I’m looking at a lot of manual work to relink 1000s of files and potentially 10s of 1000s of references. :grimacing: This appears to perhaps be part of the answer?

I’m not a coder either (although I intend on learning!), but here is a potentially helpful resource I stumbled upon:

Note Link Janitor

Sorry if this does nothing for you!

Oh hey, that might indeed be handy, thanks!

Here’s what I would suggest, with the Obsidian import as an example:

Import the various text files (as is) into a Docs DB as you suggest.

Write an automation script something like this:

const fibery = context.getService('fibery');

//get the whole schema
const schema = await fibery.getSchema();
//get the entity type
const typeName = args.currentEntities[0].type;
// find the specific type in the schema and get its space name and database id
const typeObject = schema['typeObjects'].find((obj) => obj.name === typeName);
const namespace = typeObject['nameParts']['namespace'];
const databaseID = typeObject['id'];
//get all documents in the database (assumes DB is called Doc)
const allDocs = await fibery.graphql(namespace, "{ findDocs{ id, name }}");

//create a lookup table that matches entity names to 'mentions'
//the mentions are constructed from the database id and the entity id 
const lookupTable = allDocs['data']['findDocs'].reduce((obj, item) => (obj[item.name] = "[[#^" + databaseID + "/" + item.id + "]]", obj), {});

//for each entity
for (const entity of args.currentEntities) {
    //get the contents of the Description field (change to suit)
    const textToUpdate = await fibery.getDocumentContent(entity['Description']['Secret']);
    //check for not null doc
    if (textToUpdate !== null) {
        //look for substrings that match the Obsidian formatting (as exported from Fibery)
        //Note: Fibery escapes the [ and ] characters with backslashes, so we are looking for
        //  \[\[Title of obsidian document\]\]
        //and the regex needs to escape these characters!
        const replacementText = textToUpdate.replace(/\\\[\\\[[^\\\]]+\\\]\\\]/g, (match) => {
            //trim to get the title to be looked up (remove the \[\[ and \]\] bits )
            const title = match.replace(/\\\[\\\[|\\\]\\\]/g, '');
            //if a match exists in the lookupTable, then replace
            return lookupTable[title] !== undefined
                ? lookupTable[title]
                : match;
        });
        //write the resultant text back to the Description field
        await fibery.setDocumentContent(entity['Description']['Secret'], replacementText)
    }
};

I hope the comments explain what it is doing.

I suggest running the script on a couple of sample Obsidian docs to check it works, then doing all your Obsidian docs.
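Before touching real entities, the replacement step can also be tried outside Fibery. The sketch below pulls the same regex idea into a standalone function that runs in plain Node; `relinkObsidianText` and the sample IDs are made up for the test, and it relies on the escaped `\[\[...\]\]` form described in the script comments:

```javascript
// Hypothetical standalone version of the replacement step, for offline testing.
// `lookupTable` maps note titles to Fibery mention strings.
function relinkObsidianText(text, lookupTable) {
    // Matches \[\[Title\]\] and captures the title between the escaped brackets.
    return text.replace(/\\\[\\\[([^\\\]]+)\\\]\\\]/g, (match, title) => {
        // Only replace titles that exist in the table; leave everything else untouched.
        return Object.prototype.hasOwnProperty.call(lookupTable, title)
            ? lookupTable[title]
            : match;
    });
}

// Placeholder IDs, not real Fibery identifiers.
const lookupTable = { 'Coaching homework': '[[#^db-id/entity-id]]' };
const before = 'See \\[\\[Coaching homework\\]\\] and \\[\\[Missing page\\]\\].';
console.log(relinkObsidianText(before, lookupTable));
```

Links to notes that have no matching entity are left exactly as they were, so the function can be re-run safely after more notes are imported.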

Then you will probably have to experiment a bit to get the right regex for Quip and Notion, but hopefully you can figure these out.
Adding some console logging will no doubt make it easier to debug :wink:
If not, come back to the community and we’ll help :slight_smile:
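As a possible starting point for the Quip and Notion variants, here is a sketch that replaces `[link text](url)` links whose URL points at quip.com or notion.so, using the link text as the lookup key. The function name is hypothetical, and it assumes the imported text still contains the links in plain Markdown form, which needs checking against what Fibery actually returns for your imported docs:

```javascript
// Hypothetical helper: swap Markdown links to Quip/Notion pages for Fibery mentions.
// `lookupTable` maps page titles to mention strings; `hostPattern` limits which links are touched.
function relinkMarkdownLinks(text, lookupTable, hostPattern = /(?:\.quip\.com|(?:^|\.)notion\.so)$/i) {
    return text.replace(/\[([^\]]+)\]\((https?:\/\/[^)\s]+)\)/g, (match, linkText, href) => {
        let host;
        try {
            host = new URL(href).hostname;
        } catch {
            return match; // leave anything that is not a valid URL alone
        }
        if (!hostPattern.test(host)) return match; // external link, keep as-is
        const title = linkText.trim();
        return Object.prototype.hasOwnProperty.call(lookupTable, title)
            ? lookupTable[title]
            : match;
    });
}

// Placeholder IDs, not real Fibery identifiers.
const table = { 'Why you should be a rum enthusiast': '[[#^db-id/entity-id]]' };
const text = 'Read [Why you should be a rum enthusiast](https://www.notion.so/Why-you-should-be-a-rum-enthusiast-1137f884ffce47a28cbe9a14f3139f2e) and [Fibery](https://fibery.io).';
console.log(relinkMarkdownLinks(text, table));
```

Matching on link text only works when the text equals the page title, which is the assumption made earlier in this thread; links with custom text would be left unchanged.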

You might want to read this topic to see why the lookupTable is formatted the way it is:

Entity mentions work similarly to comments, but use [[#^xxxxxx/yyyyyy]] instead.
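Spelled out as a sketch, the mention string and the lookup table built from it might look like this. The helper names and the sample IDs are hypothetical; the only thing taken from the script is the shape `[[#^` + database id + `/` + entity id + `]]`:

```javascript
// Build the mention string for one entity from the database id and the entity id.
function buildMention(databaseId, entityId) {
    return `[[#^${databaseId}/${entityId}]]`;
}

// Build a title -> mention lookup table from a list of { id, name } records.
// If two records share a name, the later one wins, so duplicate titles need care.
function buildLookupTable(docs, databaseId) {
    return Object.fromEntries(
        docs.map((doc) => [doc.name, buildMention(databaseId, doc.id)])
    );
}

// Placeholder records and IDs, not real Fibery data.
const docs = [
    { id: 'entity-1', name: 'Coaching homework' },
    { id: 'entity-2', name: 'Blogging' },
];
console.log(buildLookupTable(docs, 'database-1'));
```

This does the same job as the one-line `reduce` in the script, just split into named steps so it is easier to log and debug.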

@Chr1sG, I agree with @Oshyan that having a DB to house/control one’s rich text information is ideal (versus, I assume, just a bunch of uncontrolled docs listed under each space/folder breakdown). My thoughts are, however:

  1. With the full-featured “pages with blocks” coming out at some point in the near-ish future, would going through the trouble of 1st importing all external rich-text/markdown info into Fibery “documents” and/or rich-text description fields (just calling them both “documents” for now), and 2nd designing and implementing a Fibery rich-text workflow around “documents”, really just be a stop gap until “pages” are fully implemented? I’m assuming “pages” are the better option in the long run for all use cases versus “documents” (for notes and really any written work)… and if so, will “pages” be able to eventually have the same “metadata” style attribution in databases?

I’m asking all of this because I am considering doing a similar import exercise as @Oshyan describes here, and I’m just curious if after going through this somewhat intensive process (even with your automation script) we will need to do something similar again (if even possible) to then get all of our Fibery documents + other external rich-text/markdown information over to Fibery’s pages…if the page functionality is expected to be fully implemented in the near future, I was thinking I’d just wait until that happens so I can do a similar import exercise with that just once rather than having to do another big transfer all over again later!

P.S. - sorry in advance if I’m overlooking something so fundamental that my concerns are completely unjustified :wink:

Thanks Chris, you’re always super helpful! I haven’t had a chance to test this yet, and my scripting abilities are… limited. But hopefully I can figure it out as this seems like exactly what I need.

Most critically, what this tells me for now is that I do not need to hold off on importing all the content and do some pre-process before bringing it into Fibery. If that were the case it means I need to figure this all out before I can migrate. Now it seems that if I can figure out how to get my markdown files into a CSV in bulk, I can just import into Fibery and get started using them, and work out the linking as time/energy/brain power allow. Which is a big win for me!

I could be wrong, but my understanding is:

  1. “Pages” are a temporary implementation for testing purposes. Ultimately “Docs” should gain these Block-based capabilities.
  2. It’s quite unclear to me if there is any specific plan to add Fields of some kind to Docs/Pages. Personally I would like to have this option, but also have it be very unobtrusive, so it doesn’t get in the way of the core doc/note functionality, which I think is highly needed.

I do think that Docs would benefit greatly from e.g. a References section (optionally, maybe easy show/hide at the bottom of the doc? or in a sidebar?). And I could see value in having other field types as well, e.g. “tags”, or a “type” of document (meeting note, planning doc, brainstorming, etc.). Then should Whiteboards get Fields too? :man_shrugging: Interesting stuff to consider, but again I’m not clear that any of this is in the medium or even long-term plans (some clarification from the team would be appreciated!).

So with that in mind, if “meta data” (Fields) are desirable for your (and/or my) “docs”, then doing it in a Database with Entities seems to make the most sense.

@ESGdataBoi I think you can assume that the long term future will see Pages superseding the existing Documents. The long-term may also see Pages getting more capabilities that come with entities e.g. extra info fields (like creation date, created by etc.) and probably something like back-refs etc.
There may also be the possibility that Pages will support nesting of sorts, so in the long run, the differences between a db of ‘docs’ and a set of pages will become smaller.
With that in mind, I wouldn’t say it is guaranteed that importing into entities will always be the recommended solution, but conversely, I would imagine that automation/scripting will always be powerful enough that you wouldn’t be tied into whatever choice you make now, and that a within-Fibery migration would be relatively simple.

It depends on how you look at it. Eventually, block-based rich text documents (lower-case d) are the future, but whether they’re called Pages or Documents (or something else!) is not set in stone. And certainly we wouldn’t hang users of the existing Documents out to dry :slight_smile:

Absolutely spot on. Docs/pages have intrinsic value, and we would avoid overloading them with unnecessary/mandatory features.

See my answer to ESGdataboi above.
But the implementation is highly dependent on the blocks work, where references/links would be superseded by ‘transclusions’ (in whatever suitably vague way you choose to interpret that!)

Right now, I wholly agree.

Thanks to you both @Oshyan and @Chr1sG for being the community’s brain trust on this and many other topics! I’m thinking now that, while having the text blocks in pages right now is pretty great, I honestly don’t have a need for embedding the tables in there quite yet as is currently available (looking forward to the other views though!), so taking a chance on beta “pages” isn’t worth it…plus, docs having the “header linking” abilities (while pages do not) makes docs the best medium-term choice for people like me I think.

@Oshyan, I’m very curious to see how your import experiment goes, please let us know.

I absolutely will! I hope to document things around all this over time with the intention of helping others who might be interested in this. Maybe even a video or series of. I have big plans! :smile:

is there a workflow on how to best import Obsidian Markdown files with many internal links and even more internal tasks?

it would be amazing if a sync would be possible somehow even … so that obsidian can be used as offline aggregation tool and files are then upgraded to “online and team” on export/sync

It sounds like you’d want: