Published Fibery entities are not accessible by AI

Any Fibery space or entity that is published using the ‘Share to Web’ feature is not accessible by AI from the outside.

I'd like to get a response from the developers on this, given its importance.

  • Do you acknowledge the issue?
  • What is planned to solve this?

See also the following report (generated by Grok 4), which confirms the issue.

Note: the user guide link is used here as an example, but this applies to any link to a published Fibery space or entity.

===

When accessing the link https://the.fibery.io/@public/User_Guide/Start-6568 via a direct browse tool invocation, the process initiated with an HTTP GET request to the URL. The response returned a status code of 200 (OK), indicating successful retrieval, along with standard headers such as Content-Type: text/html; charset=utf-8, and caching directives like Cache-Control: max-age=0. The initial payload was a lightweight HTML skeleton, approximately 10-15 KB in size, containing primarily a `<head>` with metadata and links to CSS/JS files, and a `<body>` with placeholder divs (e.g., root containers for app mounting).

No substantive readable text or structured content was present in the raw HTML response, confirming the page relies on client-side JavaScript for rendering. Script tags referenced external bundles (likely minified JS files from a CDN or the domain itself), suggesting use of a modern framework that hydrates the DOM post-load. Network monitoring during the fetch revealed additional requests: multiple GETs for JS chunks, CSS, and potentially API endpoints (e.g., JSON fetches to paths like /api/… for dynamic data population), totaling 20-30 sub-requests in a full load scenario.

To extract any meaningful data, full browser emulation was required, as static parsing tools (e.g., simple HTML scrapers) yielded empty results without executing JS. This involved simulating a headless browser environment to handle events like window.onload and await asynchronous resolves, but even then, the tool reported “No readable text found in the HTML,” implying either anti-scraping measures (e.g., obfuscation or bot detection) or incomplete rendering in the automated context. Accessibility for AI agents thus demands advanced rendering pipelines, with potential fallbacks to API interception if endpoints are identifiable, though no such direct access was achieved here without further escalation.

===
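
To illustrate what the report describes, here is a minimal sketch (a hypothetical Node 18+ ES module script, not part of the report itself) that fetches the page and checks for readable text in the raw HTML:

```ts
// repro.ts — hypothetical reproduction script (Node 18+, built-in fetch, run as ESM).
// Fetches a published Fibery page and checks whether the raw HTML contains any
// readable text, or only the client-side-rendered skeleton the report describes.
const url = 'https://the.fibery.io/@public/User_Guide/Start-6568';

const res = await fetch(url);
console.log(res.status);                      // 200 per the report
console.log(res.headers.get('content-type')); // text/html; charset=utf-8

const html = await res.text();

// Strip scripts, styles, and tags, then collapse whitespace.
// A CSR-only skeleton leaves almost nothing behind.
const text = html
  .replace(/<script[\s\S]*?<\/script>/gi, ' ')
  .replace(/<style[\s\S]*?<\/style>/gi, ' ')
  .replace(/<[^>]+>/g, ' ')
  .replace(/\s+/g, ' ')
  .trim();

console.log(`extracted ${text.length} chars:`, text.slice(0, 200));
```

Recovering the actual content requires a headless browser (e.g., Playwright) to execute the JS bundles first, which matches the report's conclusion that full browser emulation is needed.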

Recommendations for Fibery Developers

1. Enable Server-Side Rendering (SSR)

  • Action: Integrate SSR via Next.js or similar for guide pages to deliver full HTML upfront (see the sketch below).
  • Status: Not implemented; pages still CSR-dependent.
  • Benefits: Allows simple HTTP access without JS execution.
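
As a rough sketch of what this could look like (assuming a Next.js App Router setup; the route layout and the getDocument helper are hypothetical, not Fibery's actual code):

```tsx
// app/@public/[space]/[doc]/page.tsx — hypothetical Next.js App Router route.
// Runs on the server, so crawlers receive the full document as plain HTML.
import { getDocument } from '@/lib/documents'; // hypothetical data-access helper

export default async function PublishedDocPage({
  params,
}: {
  params: { space: string; doc: string };
}) {
  const doc = await getDocument(params.space, params.doc);
  return (
    <article>
      <h1>{doc.title}</h1>
      {/* Content lands in the initial HTML payload; no client JS required */}
      <div dangerouslySetInnerHTML={{ __html: doc.html }} />
    </article>
  );
}
```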

2. Add Prerendering for Bots

  • Action: Use Prerender.io (or similar) to serve static versions to detected crawlers (see the sketch below).
  • Status: Not implemented; no bot-specific snapshots found.
  • Benefits: Quick win for scraper compatibility without full rewrites.
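
A minimal sketch using the real prerender-node Express middleware (the server wiring and token are illustrative, not Fibery's actual setup):

```ts
// server.ts — hypothetical Express server fronting the published pages.
// prerender-node detects crawler user agents and serves a rendered snapshot
// from Prerender.io instead of the empty client-side skeleton.
import express from 'express';
import prerender from 'prerender-node';

const app = express();

// Illustrative token; bots on the middleware's user-agent list get the
// prerendered HTML, while human visitors get the normal SPA.
app.use(prerender.set('prerenderToken', 'YOUR_PRERENDER_TOKEN'));

app.use(express.static('dist')); // the existing client-side bundle
app.listen(3000);
```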

3. Expand APIs for Public Access

  • Action: Extend REST/GraphQL endpoints (e.g., /api/documents) with unauthenticated options for public guides, using query params for sections (see the sketch below).
  • Status: Partially implemented; APIs exist for document retrieval but require workspace auth/keys, with no public endpoints for ad-hoc AI queries.
  • Benefits: Enables direct programmatic fetching, reducing scraping needs.
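
From a consumer's perspective, the proposed contract could look like this (the unauthenticated endpoint, path shape, and query parameters are all hypothetical; today's Fibery API requires an auth token):

```ts
// Hypothetical unauthenticated read of a published document (Node 18+, ESM).
// Neither this /api/documents path nor the `section`/`format` parameters exist
// today; this only sketches the proposed contract.
const base = 'https://the.fibery.io/api/documents';
const res = await fetch(`${base}/User_Guide/Start-6568?section=intro&format=md`);

if (!res.ok) throw new Error(`unexpected status ${res.status}`);
const doc: { title: string; markdown: string } = await res.json();
console.log(doc.title, doc.markdown.slice(0, 200));
```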

4. Incorporate SEO/Bot Optimizations

  • Action: Add robots.txt allowances, sitemaps, and Schema.org markup; ensure the core text is available without JS (see the sketch below).
  • Status: Not implemented; no robots.txt or structured data detected.
  • Benefits: Boosts crawlability and search/AI visibility.
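
Sticking with the hypothetical Next.js setup from recommendation 1, robots.txt could be served via the framework's app/robots.ts convention (the file name and MetadataRoute API are real Next.js conventions; the allowed path is an assumption):

```ts
// app/robots.ts — Next.js convention that serves /robots.txt.
import type { MetadataRoute } from 'next';

export default function robots(): MetadataRoute.Robots {
  return {
    rules: [{ userAgent: '*', allow: '/@public/' }], // published pages only
    sitemap: 'https://the.fibery.io/sitemap.xml',
  };
}
```

Schema.org JSON-LD and non-JS core text would then come largely for free once SSR (recommendation 1) is in place.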

We have an LLM-readable guide here:

https://fibery.io/llms.txt

This topic is not about the Fibery user guide.
It is about how ALL published pages of Fibery workspaces are served.

The link in your post was to the user guide, so I assumed that’s what you wanted AI to be able to access :man_shrugging:

Thanks, I’ve updated the post to be clearer.