Parsing SharePoint pages with SPFx is straightforward and opens up a variety of exciting use cases. In this post, I’ll show how you can access SharePoint page content and parse it to extract various information.
Table of Contents
- Introduction
- SharePoint Online Page Structure - Generalized
- SPFx and the SharePoint Canvas
- Microsoft Graph API for SharePoint Pages
- Conclusion
Introduction
SharePoint Online is a modern web application. It is built with React and behaves like a Single Page Application (SPA). Before parsing SharePoint Online page content, it’s important to understand how SharePoint Online stores and renders that content.
The SharePoint web application splits content into two main categories:
Static content
Static content includes text from Web Parts (including custom SPFx Web Parts), property pane settings, and content marked as indexable. This content is saved in the Canvas Content of the page, which is accessible via the SharePoint REST API. This post focuses on Canvas Content.
Dynamic content
Dynamic content has two characteristics:
- It is based on execution logic, such as retrieving data from a list based on certain criteria.
- It executes only once the content becomes visible to the user.
SharePoint takes care of all of this for us, enabling fast page loading. This type of content is not part of Canvas Content.
SharePoint Online Page Structure - Generalized
Let’s take a look at how the Canvas Control in SharePoint is structured.
Imagine a SharePoint page with the following content:
- A banner containing the title
- Various sections, also called horizontal sections
- Each section is divided into one or more vertical sections
- Each vertical section contains either SharePoint’s standard text Web Parts or custom Web Parts
Horizontal/Vertical Section Container
<div data-sp-canvascontrol="" data-sp-canvasdataversion="1.0">
- Holds each vertical/horizontal section.
- Attributes:
  - data-sp-canvascontrol: Always seems to be an empty string.
  - data-sp-canvasdataversion: Version number, generally “1.0”.
  - data-sp-controldata: JSON data that defines positioning and layout. Its fields:
    - layoutIndex: Layout index, not relevant for this context.
    - zoneIndex: Index of the zone within the layout, representing a horizontal section.
    - zoneId: Unique GUID for each horizontal section.
    - sectionIndex: Defines which vertical section the content belongs to; if a section has a two-column layout, there will be two sectionIndex entries.
    - sectionFactor: Width factor. SharePoint divides layouts into 12 parts, so a 12 represents a full-width layout, while a 4 represents a one-third layout.
    - controlIndex: Position index within the vertical section.
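To make the fields above concrete, here is a minimal sketch of reading the data-sp-controldata value and turning sectionFactor into a width. The names `CanvasPosition`, `parseControlData`, and `widthPercent` are made up for this example. The sketch also assumes the position fields may sit either at the top level of the JSON or under a nested `position` object, so check your own canvas content before relying on it.

```typescript
// Position fields described above; all optional because a control may not carry every one.
interface CanvasPosition {
  layoutIndex?: number;
  zoneIndex?: number;
  zoneId?: string;
  sectionIndex?: number;
  sectionFactor?: number;
  controlIndex?: number;
}

// Parses the raw data-sp-controldata attribute value.
// Assumption: the fields are either top-level or nested under "position".
function parseControlData(raw: string | null): CanvasPosition | undefined {
  if (!raw) {
    return undefined;
  }
  try {
    const data = JSON.parse(raw);
    return (data.position ?? data) as CanvasPosition;
  } catch {
    return undefined; // malformed or unexpected JSON: treat as "no layout information"
  }
}

// Converts a sectionFactor on the 12-part grid into a width percentage.
function widthPercent(sectionFactor: number): number {
  return (sectionFactor / 12) * 100;
}
```

With a parsed document, this could be called as `parseControlData(element.getAttribute('data-sp-controldata'))` for each section container element.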
Web Part Controls
<div data-sp-webpart="" data-sp-webpartdataversion="X.X" data-sp-webpartdata="JSON">
- Attributes:
  - data-sp-webpart: Empty string, indicates a web part.
  - data-sp-webpartdataversion: Web part data version.
  - data-sp-webpartdata (JSON): Encodes web part properties.
    - Common JSON fields:
      - id: Unique web part ID.
      - title: Display name of the web part.
      - description: Description text.
Rich Text Controls (OOTB Text Web Part control)
<div data-sp-rte="">
- Attributes:
  - data-sp-rte: Flags the container as a Rich Text Editor (RTE) control.
- Content:
  - Contains the text content.
Important notes on the SharePoint Canvas:
- There are only two hierarchy levels: the first level defines the vertical/horizontal section containers, and the second contains the Web Parts they include.
- Because it’s a flat structure, SharePoint uses the properties zoneIndex, sectionIndex, and controlIndex to build the visual layout during rendering.
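The flat structure can be turned back into a layout by ordering on those three properties. The following is a rough sketch of that idea, not SharePoint’s actual rendering code. `PositionedControl`, its `label` field, and both function names are hypothetical and exist only for this illustration.

```typescript
interface PositionedControl {
  zoneIndex: number;
  sectionIndex: number;
  controlIndex: number;
  label: string; // hypothetical field, only used to identify a control in this sketch
}

// Returns a new array in reading order: zone first, then section, then control.
function sortByLayout(controls: PositionedControl[]): PositionedControl[] {
  return [...controls].sort(
    (a, b) =>
      a.zoneIndex - b.zoneIndex ||
      a.sectionIndex - b.sectionIndex ||
      a.controlIndex - b.controlIndex
  );
}

// Groups the sorted controls by zone (horizontal section) and then by section (column).
function groupByZoneAndSection(
  controls: PositionedControl[]
): Map<number, Map<number, PositionedControl[]>> {
  const zones = new Map<number, Map<number, PositionedControl[]>>();
  for (const control of sortByLayout(controls)) {
    const sections = zones.get(control.zoneIndex) ?? new Map<number, PositionedControl[]>();
    const column = sections.get(control.sectionIndex) ?? [];
    column.push(control);
    sections.set(control.sectionIndex, column);
    zones.set(control.zoneIndex, sections);
  }
  return zones;
}
```

Because JavaScript maps keep insertion order, iterating the result walks the page top to bottom and, within each zone, column by column.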
Now that we have a basic understanding of the SharePoint Canvas, let’s see how to access it and what we can do with it.
SPFx and the SharePoint Canvas
Accessing the SharePoint Canvas
I enjoy using PnP JS, so the following example is based on it:
// Replace both placeholders; items.getById() expects the numeric list item ID.
const page = await this.sp.web.lists
  .getById('listid_of_your_sitepages_library')
  .items.getById('listitemid_of_your_page')
  .select('CanvasContent1', 'FileRef')();
const content: string = page.CanvasContent1;
That’s it; now you have all the content in your content variable.
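If PnP JS is not an option, the same field can also be requested from the SharePoint REST API directly. The sketch below uses the browser’s `fetch` and assumes it runs in a signed-in session on the same SharePoint origin; inside an SPFx solution you would more likely use the `SPHttpClient` from the component context. `buildCanvasUrl` and `fetchCanvasContent` are names invented for this example, and all arguments are placeholders.

```typescript
// Builds the REST URL for one page item in the Site Pages library.
function buildCanvasUrl(webUrl: string, listId: string, itemId: number): string {
  const base = webUrl.replace(/\/+$/, ''); // drop trailing slashes
  return `${base}/_api/web/lists(guid'${listId}')/items(${itemId})?$select=CanvasContent1,FileRef`;
}

// Fetches the raw canvas HTML of a page.
// Assumption: the browser session is already authenticated against the site.
async function fetchCanvasContent(webUrl: string, listId: string, itemId: number): Promise<string> {
  const response = await fetch(buildCanvasUrl(webUrl, listId, itemId), {
    headers: { Accept: 'application/json;odata=nometadata' },
    credentials: 'same-origin'
  });
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  const item = (await response.json()) as { CanvasContent1?: string };
  return item.CanvasContent1 ?? ''; // the field can be empty for pages without canvas content
}
```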
Preparing the SharePoint Canvas for Processing
The best way to read the SharePoint Canvas is to use the built-in JavaScript Web API class DOMParser, which is widely supported in browsers.
const parser = new DOMParser();
const doc = parser.parseFromString(content, 'text/html');
Now, we’re ready to start parsing.
Extracting Information from the SharePoint Page
Use Case 1: Accessing all Headings (h2/h3/h4) in a SharePoint Page
Suppose we want to access and list all headings on a SharePoint page. The code is quite simple at this point:
const headings = Array.from(doc.querySelectorAll('h2, h3, h4'));
We have full support in our browser for the DOM API, so we can leverage standardized and optimized methods in our code.
Reminder: If your hX tags are dynamic content (for example, generated by a custom SPFx web part), they won’t appear in the Canvas content.
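As one possible next step, the headings can be mapped to a simple outline, for example to drive an in-page navigation. This is only a sketch: `HeadingLike`, `OutlineEntry`, and `toOutline` are hypothetical names, and the function accepts any object with `tagName` and `textContent`, which real DOM elements provide.

```typescript
// Minimal shape of a heading element; real DOM elements satisfy it.
interface HeadingLike {
  tagName: string;
  textContent: string | null;
}

interface OutlineEntry {
  level: number; // 2 for h2, 3 for h3, 4 for h4
  text: string;
}

// Turns heading elements into outline entries and skips headings without text.
function toOutline(headings: HeadingLike[]): OutlineEntry[] {
  return headings
    .map((heading) => ({
      level: Number(heading.tagName.substring(1)), // "H2" -> 2
      text: (heading.textContent ?? '').trim()
    }))
    .filter((entry) => entry.text.length > 0);
}
```

With the parsed document from above, it could be called as `toOutline(Array.from(doc.querySelectorAll('h2, h3, h4')))`.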
Use Case 2: Checking if a Web Part Exists
Suppose we want to check if a specific Web Part is available on the page:
let myWebPartFound = false;
const webPartElement = doc.querySelector('[data-sp-webpartdata*="mywebpartguid"]');
if (webPartElement) {
  myWebPartFound = true;
}
We can go further and also access all the properties available for the Web Part.
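A minimal sketch of reading those values could look like this. It parses the JSON in data-sp-webpartdata and assumes, beyond the id, title, and description fields listed earlier, that the property pane values are stored under a `properties` key; verify this against your own web part. `AttributeSource`, `WebPartData`, and `readWebPartData` are names introduced for this example.

```typescript
// Minimal shape needed from a DOM element; real elements satisfy it.
interface AttributeSource {
  getAttribute(name: string): string | null;
}

interface WebPartData {
  id?: string;
  title?: string;
  description?: string;
  properties?: Record<string, unknown>; // assumption: property pane values live here
}

// Reads and parses data-sp-webpartdata.
// Returns undefined when the element or attribute is missing, or the JSON is invalid.
function readWebPartData(element: AttributeSource | null): WebPartData | undefined {
  const raw = element?.getAttribute('data-sp-webpartdata');
  if (!raw) {
    return undefined;
  }
  try {
    return JSON.parse(raw) as WebPartData;
  } catch {
    return undefined;
  }
}
```

For the element found above, `readWebPartData(webPartElement)?.properties` would then give access to the stored values, if the assumption about the `properties` key holds.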
For a comprehensive implementation on how to read the SharePoint Canvas and allow SPFx solutions to share information, check out the In Page Navigation solutions from PuntoBello.
Microsoft Graph API for SharePoint Pages
Microsoft began rolling out the Microsoft Graph API for SharePoint Pages in April 2024, enabling programmatic page manipulation.
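With the Graph API, the page layout arrives as JSON instead of HTML. As far as I understand the documented shape, requesting a page as `microsoft.graph.sitePage` with `$expand=canvasLayout` returns horizontal sections with columns and web parts, plus an optional vertical section, and text web parts carry their markup in `innerHtml`. The sketch below walks such a response under that assumption; the interfaces are trimmed-down, hand-written types and `collectTextHtml` is a hypothetical helper.

```typescript
// Trimmed-down response types (assumed shape, only the fields used here).
interface GraphWebPart {
  '@odata.type'?: string;
  innerHtml?: string;
}

interface GraphCanvasLayout {
  horizontalSections?: { columns?: { webparts?: GraphWebPart[] }[] }[];
  verticalSection?: { webparts?: GraphWebPart[] };
}

// Collects the HTML of every web part that exposes innerHtml (the text web parts).
function collectTextHtml(layout: GraphCanvasLayout): string[] {
  const webParts: GraphWebPart[] = [];
  for (const section of layout.horizontalSections ?? []) {
    for (const column of section.columns ?? []) {
      webParts.push(...(column.webparts ?? []));
    }
  }
  webParts.push(...(layout.verticalSection?.webparts ?? []));
  return webParts
    .filter((webPart) => typeof webPart.innerHtml === 'string')
    .map((webPart) => webPart.innerHtml as string);
}
```

Each returned HTML fragment can be parsed with DOMParser in the same way as the Canvas Content, so the heading and web part checks from the previous section carry over.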
Conclusion
The SharePoint Canvas is a valuable source of information that can be easily leveraged to build solutions. Whenever possible, rely on standard Browser APIs to process it.