MGDC for SharePoint: New, Updated and Upcoming Datasets
In this post, I’ll cover some exciting news on Microsoft Graph Data Connect for SharePoint as of May 2024. This feature delivers rich data assets to SharePoint and OneDrive tenants. If you’re new to MGDC for SharePoint, start by reviewing this post: https://aka.ms/SharePointData.
TL;DR
We have been busy updating our existing SharePoint datasets in MGDC and adding new ones. You can see the full list at https://aka.ms/SharePointDatasets.
We have updated our three publicly available datasets, just published one new dataset, and will deliver three more new datasets in the next few months. Here are some details…
New and Upcoming Datasets
The new SharePoint File Actions dataset was released in May 2024. This dataset delivers one object for each action on a file: accessed, deleted, downloaded, modified, moved, renamed, or uploaded. This helps you understand in detail how documents are being used. The dataset is now publicly available and is billed through Azure at the regular MGDC rate.
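To illustrate the kind of per-document analysis File Actions enables, here is a minimal Python sketch that tallies actions per file from rows already exported from MGDC and loaded as dictionaries. The column names `ItemId` and `ActionName` and the action values shown are hypothetical placeholders, not the documented schema; check the File Actions schema docs before adapting it.

```python
from collections import Counter, defaultdict

# Hypothetical column names: "ItemId" and "ActionName" stand in for whatever
# the File Actions schema actually calls the file identifier and action type.
def summarize_actions(rows):
    """Count actions per file, returning {item_id: Counter({action: count})}."""
    per_file = defaultdict(Counter)
    for row in rows:
        per_file[row["ItemId"]][row["ActionName"]] += 1
    return per_file


# Illustrative rows only (one row per action, not real MGDC output).
sample = [
    {"ItemId": "item-1", "ActionName": "Accessed"},
    {"ItemId": "item-1", "ActionName": "Accessed"},
    {"ItemId": "item-1", "ActionName": "Modified"},
    {"ItemId": "item-2", "ActionName": "Downloaded"},
]
summary = summarize_actions(sample)
for item_id, counts in summary.items():
    print(item_id, dict(counts))
```

Because each row represents one action, a simple count per file and action type is enough to spot heavily used or frequently downloaded documents.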
The new OneDrive Sync Health datasets include information on devices running OneDrive for Business. This includes a dataset with one object for every Sync-enabled device in the tenant and a dataset with details on errors faced by these devices. They were announced by the Sync team at the Microsoft 365 Conference. Sync Health and Sync Errors are in private preview. They will be publicly available by the end of June. This was a joint project between SharePoint, OneDrive Sync and MGDC.
The SharePoint Files dataset includes information about files in SharePoint and OneDrive. It delivers one object for every file in the tenant stored in a SharePoint document library, including OneDrives. The Files dataset is in private preview, with public availability expected in a few months.
Updated datasets
We also added columns to the existing Sites, Groups and Permissions datasets.
For SharePoint Sites, our most popular dataset, we added several new properties. Here’s the list:
ArchiveState: The archive state of the site: None, Archiving, Archived, or Reactivating
RootWeb.Configuration: Root web template configuration id
RecycleBinItemCount: Number of items in the recycle bin
RecycleBinItemSize: Size of the items in the recycle bin
SecondStageRecycleBinStorageUsage: Size of the items in the second stage recycle bin
IsCommunicationSite: Indicates that the site is a communication site
IsOneDrive: Indicates that the site is a OneDrive
IsExternalSharingEnabled: Indicates if the site is configured to enable external sharing
SiteConnectedToPrivateGroup: Indicates if the site is connected to a private group
Privacy: Privacy of the site: Private or Public. Applies only to team sites
Owner.UPN: User Principal Name for the owner of the site
SecondaryContact.UPN: User Principal Name for the secondary contact for the site
LastUserAccessDate: Last access by a real user for the site (in UTC)
That last column is very useful to identify sites that have been inactive for a long time.
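As a sketch of that inactive-site check, the Python below flags sites whose LastUserAccessDate is older than a cutoff, skipping OneDrives via IsOneDrive. It assumes Sites rows are loaded as dictionaries with UTC timestamps in ISO 8601 form ending in "Z"; the `Url` key, the 180-day default, and the choice to treat a blank LastUserAccessDate as "never accessed" are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def parse_utc(value):
    """Parse an ISO 8601 UTC timestamp such as '2024-01-15T08:30:00Z'."""
    # datetime.fromisoformat() accepts a trailing 'Z' only on Python 3.11+,
    # so rewrite it as an explicit +00:00 offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def find_inactive_sites(sites, now, max_idle_days=180, include_onedrive=False):
    """Return the URLs of sites with no user access in the last max_idle_days."""
    cutoff = now - timedelta(days=max_idle_days)
    inactive = []
    for site in sites:
        # Skip personal OneDrive sites unless explicitly requested.
        if site.get("IsOneDrive") and not include_onedrive:
            continue
        last_access = site.get("LastUserAccessDate")
        # Assumption: a blank value means no recorded user access, so the
        # site is reported as inactive.
        if not last_access or parse_utc(last_access) < cutoff:
            inactive.append(site["Url"])  # "Url" key is an assumed column name
    return inactive


now = datetime(2024, 5, 31, tzinfo=timezone.utc)
sites = [  # illustrative rows, not real MGDC output
    {"Url": "https://contoso.sharepoint.com/sites/active",
     "IsOneDrive": False, "LastUserAccessDate": "2024-05-20T10:00:00Z"},
    {"Url": "https://contoso.sharepoint.com/sites/stale",
     "IsOneDrive": False, "LastUserAccessDate": "2023-06-01T09:00:00Z"},
    {"Url": "https://contoso.sharepoint.com/sites/never",
     "IsOneDrive": False, "LastUserAccessDate": None},
    {"Url": "https://contoso-my.sharepoint.com/personal/ana",
     "IsOneDrive": True, "LastUserAccessDate": "2022-01-01T00:00:00Z"},
]
print(find_inactive_sites(sites, now))
```

With the sample rows this reports the "stale" and "never" sites; passing `include_onedrive=True` also reports the old OneDrive.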
For Groups, we introduced a new TypeV2 property for owners and members, to specify the type of user. The old Type property can contain User, SecurityGroup or SharePointGroup, while the new TypeV2 can be InternalUser, ExternalUser, B2BUser, SecurityGroup or SharePointGroup.
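The relationship between the two properties can be sketched in Python: the three user values in TypeV2 all collapse to the old User value, while the group values are unchanged. The member dictionaries with a `Name` key are a hypothetical shape for illustration, and counting ExternalUser and B2BUser together as people outside the organization is one reasonable reading, not an official definition.

```python
from collections import Counter

# TypeV2 splits the old "User" value into these three values.
USER_TYPES_V2 = {"InternalUser", "ExternalUser", "B2BUser"}


def legacy_type(type_v2):
    """Map a TypeV2 value back to the older Type value."""
    return "User" if type_v2 in USER_TYPES_V2 else type_v2


def count_member_types(members):
    """Count group members (or owners) by TypeV2."""
    return Counter(member["TypeV2"] for member in members)


members = [  # hypothetical member rows for one group
    {"Name": "Ana", "TypeV2": "InternalUser"},
    {"Name": "guest", "TypeV2": "ExternalUser"},
    {"Name": "partner", "TypeV2": "B2BUser"},
    {"Name": "Finance", "TypeV2": "SecurityGroup"},
]
counts = count_member_types(members)
external = counts["ExternalUser"] + counts["B2BUser"]
print(external)
print([legacy_type(m["TypeV2"]) for m in members])
```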
For the SharePoint Permissions dataset, we added the following columns:
SharedWith.TypeV2: Expands User types to InternalUser, ExternalUser and B2BUser, as described in the Groups section above
SharedWith.UPN: User Principal Name of sharing recipient
SharedWith.AadObjectId: AAD Object Id of sharing recipient. Blank if this is not an AAD object.
SharedWith.UserCount: Unique user count for this sharing recipient. For groups, this is the number of users in the group, including nested groups. For users, this is always 1. It will be blank if the group is empty or if the count is unavailable
TotalUserCount: Unique user count for this entire permission. This will be blank if the count is zero or unavailable
ShareCreatedBy.UPN: User Principal Name of user who created the sharing link
ShareLastModifiedBy.UPN: User Principal Name of user who last modified the sharing link
The two new user count columns are a major improvement here. They perform group expansion, so you get the total number of users impacted by a permission, including nested groups, without having to pull the SharePoint Groups and AAD Groups datasets. You can now detect oversharing using only the Permissions dataset.
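A minimal oversharing check might look like the Python sketch below, which flags permissions whose TotalUserCount reaches a threshold and lists the largest audiences first. The `ItemURL` key and the 1,000-user threshold are assumptions for illustration; blank counts are skipped because, per the column description, blank means zero or unavailable.

```python
def find_overshared(permissions, threshold=1000):
    """Return (item, user count) pairs whose permission reaches threshold users."""
    flagged = []
    for perm in permissions:
        total = perm.get("TotalUserCount")
        # Blank means the count is zero or unavailable, so it cannot be judged.
        if total in (None, ""):
            continue
        if int(total) >= threshold:
            flagged.append((perm["ItemURL"], int(total)))  # "ItemURL" is assumed
    # Largest audiences first.
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


permissions = [  # illustrative rows, not real MGDC output
    {"ItemURL": "sites/hr/Shared Documents/Policies.docx", "TotalUserCount": 25000},
    {"ItemURL": "sites/team/Shared Documents/Plan.docx", "TotalUserCount": 12},
    {"ItemURL": "sites/proj/Shared Documents/Spec.docx", "TotalUserCount": ""},
    {"ItemURL": "sites/all/Shared Documents/Handbook.pdf", "TotalUserCount": 4800},
]
flagged = find_overshared(permissions)
for item, users in flagged:
    print(f"{users:>6} users  {item}")
```

The sketch uses TotalUserCount rather than summing SharedWith.UserCount across recipients, since the total is described as a unique count and a simple sum could double-count users who belong to several recipient groups.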
General improvements
MGDC for SharePoint also improved the overall infrastructure for analytics, including:
Filtering datasets: Download only the rows that match specific site IDs or template IDs. See details at How can I filter rows on a dataset?
Dataset sampling: Get a small sample of the dataset and a full object count without pulling the entire dataset. See details at How can I sample or estimate the number of objects in a dataset?
Improved messages: Better error messages, including when dates are out of range, or when a region has no SharePoint data.
Guidance: Improved documentation, including updated step-by-step guides and schema docs. We also have a new official MGDC for SharePoint blog on Tech Community, with information like useful links and frequently asked questions. Since you’re reading this on the blog, I imagine you already knew about that one :-).
Conclusion
These are the main improvements to Microsoft Graph Data Connect for SharePoint in the last few months. I hope these changes will improve the feature for your analytics scenarios. We are busy cooking up more improvements and will share them here in the blog as they become available.