MGDC for SharePoint FAQ: How do I join File Actions with Files?
1. File Actions
If you use the SharePoint File Actions dataset in Microsoft Graph Data Connect, you might want to join it with the SharePoint Files dataset to get the details for each file. This post is about how you can do it, including the most common issues.
2. What is in File Actions
The SharePoint File Actions dataset includes details about every time a file was accessed, deleted, downloaded, modified, moved, renamed, or uploaded. The information about each action includes the type of action, when it happened, what file was acted upon, what application was used, and the user who performed the action (the actor).
For more information about the dataset, see the full schema documentation at data-connect-dataset-sharepointfileactions.md.
3. Joining with other datasets
A common use of this dataset is to join with the SharePoint Sites or the SharePoint Files to understand how much activity is going on for a specific Site or File. You could use this to see which Site or File was the most accessed on a given day. If you accumulate the actions for a longer period, you could find which sites had no activities in the last 90 days.
To find the details for site associated with a specific action, group File Actions by site ID then join with Sites. Here’s a SQL example to get a list of most popular sites during the period covered by File Actions:
WITH ActionsBySite AS (
SELECT
SiteId,
MIN(ActionDate) AS EarliestAction,
MAX(ActionDate) AS LatestAction,
COUNT(*) AS ActionCount
FROM FileActions
GROUP BY SiteId
)
SELECT
S.Id AS SiteId,
S.[RootWeb.Title] AS SiteName,
S.CreatedTime AS SiteCreated,
S.[RootWeb.LastItemModifiedDate] AS SiteLastModified,
S.[RootWeb.TemplateId] AS SiteTemplateId,
A.EarliestAction,
A.LatestAction,
A.ActionCount
FROM ActionsBySite AS A
JOIN Sites S
ON A.SiteId = S.Id
ORDER BY ActionCount DESC
LIMIT 10;
To find details of the file associated with a specific action, group File Actions by File then join with Files. Here’s a similar example to get a list of most popular files during the period covered by File Actions:
WITH ActionsByFile AS (
SELECT
SiteId,
WebId,
ListId,
ListItemId,
MIN(ActionDate) AS EarliestAction,
MAX(ActionDate) AS LatestAction,
COUNT(*) AS ActionCount
FROM FileActions
GROUP BY SiteId, WebId, ListId, ListItemId
)
SELECT
F.DirName,
F.FileName,
F.AuthorEmail,
F.TimeCreated AS FileCreated,
F.TimeLastModified AS FileLastModified,
A.EarliestAction,
A.LatestAction,
A.ActionCount
FROM ActionsByFile AS A
JOIN Files F
ON A.SiteId = F.SiteId
AND A.WebId = F.WebId
AND A.ListId = F.ListId
AND A.ListItemId = F.ItemId
ORDER BY ActionCount DESC
LIMIT 10;
NOTE: Please check with your compliance team if they approve of keeping the File Actions data for longer periods of time. There might be restrictions on how long you should keep private information.
4. Potential issues
With huge datasets like Files and File Actions, there are scenarios where the JOIN is not a perfect match. Here are a few possible issues when joining File Actions with Sites or Files.
4a. Different regions
IMPORTANT: This section applies only to tenants that use the Microsoft 365 Multi-Geo capabilities, as described at Microsoft 365 Multi-Geo and Multi-Geo Capabilities in OneDrive and SharePoint.
The JOIN operation will exclude a specific action if the File Action is in a different region than the File. When a tenant has multiple regions, the actions are recorded in the region of the actor. An actor in that region might be accessing files in other regions, which you might not have in that environment. To overcome that, you must run collection for SharePoint Files from all the regions and combine (union) them into a single dataset.
NOTE: These datasets are partitioned by region for compliance reasons, so please make sure to run this scenario by your compliance team before combining data from multiple regions.
4b. Matching dates
There are certain situations where the Files will be missing when you try the join because your File Action snapshot is more recent than your File snapshot date. For instance, if you have a File Actions snapshot for July 15th joined with a Files snapshot for July 1st. Essentially, the JOIN will exclude the File Action because the File data did not make it yet.
Also, the SharePoint Files dataset might take about a week to fully update. You might exclude an action for a file that was recently created and did not make its way into the Files dataset yet. So even if both the File Actions snapshot and the Files snapshot are from July 1st, there is the possibility that some of the recent files have not been captured yet.
There is also the chance of some File Actions arriving late (typically less than 0.01%). For instance, a small percentage of actions for July 1st might show in the snapshot for July 2nd.
So, in general, if you want to get the most complete dataset, it’s a good idea to wait until July 8th to process and report on actions for July 1st. This will guarantee that you are using a fresher Files dataset and allowing for late-arriving File Actions data.
4c. Not a Document Library
Another reason for not finding a match between the File Actions dataset and the Files dataset is when the file action is for a file not in a document library. The SharePoint Files dataset only includes files in a document library, ignoring other types of lists. This is by design to keep the Files dataset to a more reasonable size.
A common scenario here are actions related to SharePoint pages (files with the “aspx” extension), typically in Communication Sites. While you will not see these “aspx” pages in the Files dataset, you could still count actions per site.
Covering this gap would require adding specific datasets for SharePoint Pages and/or SharePoint Lists, providing additional details. This is something the team is considering for the future, but there is no estimated date for this.
4d. List Item Id missing
The File Actions dataset shows the telemetry as reported by applications. For some of these applications, it is possible that the ListItemId is not reported or is reported as “null”. In that case, the join will exclude those actions.
Again, you could summarize these actions at a site level even if the ListItemId is missing. You can also look at a summary by User Agent to identify specific applications that are not providing data.
5. Conclusion
I hope you learned more about the SharePoint File Actions dataset in Microsoft Graph Data Connect. You can read more about the Microsoft Graph Data Connect for SharePoint at https://aka.ms/SharePointData. There you will find many details, including a list of datasets available and frequently asked questions.
Microsoft Tech Community – Latest Blogs –Read More