Rolling Up Opportunity Contact Roles with Change Data Capture and Async Apex Triggers

Salesforce admins, and perhaps especially nonprofit admins, have been wishing for a long time for the ability to build more functionality around Opportunity Contact Roles - like roll-up summary fields, validation rules, and triggers.

The new Change Data Capture feature offered intriguing possibilities because, unlike regular Apex triggers, it supports change notifications for OpportunityContactRole. With Summer ’19, which added the ability to subscribe to Change Data Capture events in Async Apex Triggers, we now have a viable route to build completely on-platform features using these tools - and the first demo I wanted to build was a solution to roll up Opportunity Contact Roles.

If you want to dive straight into the code, find it here on GitHub.

The Simple Route and Why It Doesn’t Work

My first strategy was simply to call into Declarative Lookup Rollup Summaries from an Async Apex trigger. That approach runs afoul, though, of one of the key differences between Async Triggers and standard Apex Triggers:

Apex triggers operate on sObjects. Async Apex triggers operate on change events.

Change events aren’t sObjects, and don’t contain a complete snapshot of the sObject record at a point in time. Rather, since the underlying Change Data Capture feature targets synchronization of changes to some persistent store, each event only contains enough information to apply a specific change to a stored record.

The section Apex Change Event Messages in the Change Data Capture Developer Guide explains what data is actually available in each message:

Create
For a new record, the event message contains all fields, whether populated or empty. […]
Update
For an updated record, the event message contains field values only for changed fields. Unchanged fields are present and empty (null), even if they contain a value in the record. […]
Delete
For a deleted record, all record fields in the event message are empty (null).
Undelete
For an undeleted record, the event message contains all fields from the original record, including empty (null) fields and system fields.

For roll-up applications, the key events are update and delete. Note that for updates we get only new values, not old ones, and by the time we receive the event the records in the database have already been updated (we can’t query for old values). Upon delete, we don’t get the field data for the deleted sObjects at all! It’s assumed that we, as consumers of this event stream, have some other persistent data store against which we’re applying the ordered stream of change events.

To build a roll-up, we need to know the parent objects for deleted children, the parent objects for changed records, and the old parents for children that are reparented.
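To make that requirement concrete, here is a small Python model (not Apex, and not code from the project) of which parent Opportunities a single change event touches. The event shape, a dict with a changeType, a single recordId, and a fields dict, is a simplified, hypothetical stand-in for a real change event, and store stands for whatever persistent copy of the old data we might have.

```python
from typing import Dict, Optional, Set

# Hypothetical, simplified event: {"changeType": ..., "recordId": ..., "fields": {...}}.
# Real change events carry a header with a list of recordIds; one Id keeps this short.
# store maps OpportunityContactRole Id -> last known OpportunityId (our own copy).


def parents_to_recalculate(event: dict, store: Dict[str, str]) -> Set[str]:
    """Return the Opportunity Ids whose roll-up this event may change."""
    record_id = event["recordId"]
    new_parent: Optional[str] = event["fields"].get("OpportunityId")
    old_parent: Optional[str] = store.get(record_id)
    parents: Set[str] = set()

    if event["changeType"] == "DELETE":
        # Every field is null on delete: only our own store knows the parent.
        if old_parent is not None:
            parents.add(old_parent)
    else:
        # CREATE/UNDELETE carry all fields; UPDATE carries only changed fields,
        # so a missing OpportunityId just means "not reparented".
        if new_parent is not None:
            parents.add(new_parent)
        if old_parent is not None:
            parents.add(old_parent)  # the current parent, or the old one if reparented
    return parents
```

With an empty store, the DELETE case yields no parent at all, and a reparenting UPDATE yields only the new parent: exactly the gaps described above.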

So Change Data Capture + Async Apex Triggers is a dead end for building rollups? Not at all. There’s another strategy that allows us to take advantage of Change Data Capture’s inherent facility for synchronizing data stores: we’ll use a shadow table.

Rolling Up a Shadow Table

The SFDX project for everything we’re going to demonstrate here is available on GitHub.

We can’t roll up Opportunity Contact Roles to the Opportunity, but we can easily roll up an arbitrary Custom Object through a master-detail relationship. A shadow table, in which we mirror the Opportunity Contact Role object with a Custom Object, fits nicely as a solution: it gives us the roll-up we need without code, and it provides an on-platform persistent data store against which Change Data Capture events can be applied as we create, update, and delete Opportunity Contact Role records.

Here’s the schema we’ll use: a custom object plus a native Rollup Summary Field on Opportunity.

Shadow Object

Rollup Field

Note the External Id field we’ve created to hold the Id of the corresponding Opportunity Contact Role. That’s the linchpin of our synchronization effort, and we’ll use it below to build an efficient sync trigger.

Synchronizing the Data

To synchronize data from OpportunityContactRole into our shadow table, we’ll use an Async Apex trigger processing the OpportunityContactRoleChangeEvent entity. First, we’ll select the needed object in Setup under Change Data Capture:

Change Data Capture Setup

Then, we build a trigger. The code is here.

Our trigger’s operation is different from what we’d see in typical, synchronous Apex. We iterate in order over the change events we receive and use them to build up two data sets. The first is a Map<Id, Shadow_Opportunity_Contact_Role__c>, which we use to store both new and updated shadow records derived from the change events. Notably, its keys are OpportunityContactRole Ids, which ensures that we create or update exactly one shadow record for each OpportunityContactRole, reflecting all of the changes in our inbound event stream.

We also store a Set<Id> of the Ids of deleted OpportunityContactRole records. As we find delete events in our stream, we remove corresponding entries from our create and update Map, and add those Ids to the Set.
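The folding logic can be modeled in a few lines of Python. This is a language-neutral sketch, not the trigger from the GitHub project: events are simplified, hypothetical dicts carrying a changeType, a list of recordIds, and a fields dict, and the mapping from OpportunityContactRole fields to shadow-object fields is omitted.

```python
from typing import Dict, List, Set, Tuple

Record = Dict[str, object]


def fold_events(events: List[dict]) -> Tuple[Dict[str, Record], Set[str]]:
    """Collapse an ordered list of simplified change events into one
    create/update map and one delete set, both keyed on OpportunityContactRole Id."""
    upserts: Dict[str, Record] = {}
    deletes: Set[str] = set()

    for event in events:  # events must be processed in commit order
        change_type = event["changeType"]
        for record_id in event["recordIds"]:
            if change_type == "DELETE":
                # Drop any pending create/update and remember the Id for deletion.
                upserts.pop(record_id, None)
                deletes.add(record_id)
            else:
                # CREATE/UNDELETE carry all fields; UPDATE carries only changed ones,
                # so merge non-null values over whatever we have accumulated so far.
                # (Real events list deliberately nulled fields in the header's
                # nulledFields; this sketch ignores that case.)
                deletes.discard(record_id)
                shadow = upserts.setdefault(
                    record_id, {"Opportunity_Contact_Role_Id__c": record_id}
                )
                for name, value in event["fields"].items():
                    if value is not None:
                        shadow[name] = value

    return upserts, deletes
```

Because the shadow records carry the external Id, the map's values can be upserted directly, and the delete set becomes the filter for a single delete query.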

When we finish iterating through events - remember that this is an ordered time stream - these two data structures contain the union of all of the changes we need to apply to our shadow table. At that point, it’s two simple DML statements to persist the changes:

upsert createUpdateMap.values() Opportunity_Contact_Role_Id__c;
delete [
    SELECT Id
    FROM Shadow_Opportunity_Contact_Role__c
    WHERE Opportunity_Contact_Role_Id__c IN :deleteIds
];

The index on Opportunity_Contact_Role_Id__c should keep these operations performant, and once they complete, the system updates our native Rollup Summary Field on the parent Opportunities.

Testing

There are just a couple of extra wrinkles to testing Async Apex Triggers. A new method in the system Test class enables the Change Data Capture feature, overriding the org’s CDC settings so that the code under test executes regardless of how the org is configured:

Test.enableChangeDataCapture();

Then, we need CDC events to be delivered and processed synchronously, which we arrange with the tried-and-true Test.startTest() and Test.stopTest() calls, or by calling Test.getEventBus().deliver().

Intermediate testing results, while working toward passing tests and full code coverage, can be tough to interpret: most logs are found under the Automated Process user rather than the context user, requiring the use of trace flags, but some (possibly those from @testSetup methods) do appear for the context user. Code coverage maps can also produce misleading results until all tests pass.

As part of the demo, I built a test class that achieves full coverage on the Async Apex Trigger. It’s also part of the GitHub project. (While the tests have good logic path coverage, they could stand to exercise bulk use cases better!)

Results

The CDC + Async Apex Trigger solution doesn’t add everything we might want with Opportunity Contact Roles. We still cannot write Validation Rules against the Opportunity that take Roles into account, because the rollup operation is run asynchronously, after the original transaction commits. It’s also a near-real time, rather than real time, solution, so a brief delay may be perceptible before the rollup field updates. And lastly, because Async Apex Triggers run as the Automated Process user, Last Modified By fields won’t show actual users’ names once the rollup operation completes.

But what it does, it does well: we get all the functionality and performance of native roll-up summary fields against an object that’s never supported that feature, with a minimal investment in code. Users can see, and report on, rolled-up totals of Opportunity Contact Roles across their Opportunity pipeline.

And it’s a really neat way to apply some of the latest Salesforce technologies to solve real-world problems.

Real-World Unit Testing: Get to 100% Coverage, the Right Way

My presentation from PhillyForce ‘19, “Real-World Unit Testing: Get to 100% Coverage, the Right Way”, is now available on YouTube. It was a great experience to return to PhillyForce as a speaker and organizing committee member to talk about one of the subjects I’m most passionate about - automated testing - and make the case that writing good unit tests is actually a moral, not just a technical, imperative.

The talk covers four big headings:

  • Thinking about tests as promises we’re making, and how that changes the focus of a testing program.
  • Writing good unit tests in Apex.
  • Writing testable code, thinking about tests as API consumers, and using tests to guide refactoring.
  • Advanced testing strategies with mocking and dependency injection for working with complex asynchronous Apex.

Three Routes to Time-Based Rollup Summary Fields

Rollup Summary Fields are great. Users love them because they offer at-a-glance insight at the top level of the data hierarchy, without needing to run a report or drill down in the user interface to child records. Unfortunately, native Rollup Summary Fields come with a variety of limitations on what data you can roll up, where, and applying which criteria. In particular, time-based rollup summary fields are a common need that’s tricky to meet with this native functionality.

These fields come from business requirements like this:

As a Sales representative, I need to see on the Account a running total of Opportunities on a year-to-date and month-to-date basis.

Naïve Solutions and Why They Don’t Work

Most naïve solutions to this class of requirements fail because they’re based on formula fields or date literals. One might try creating a formula like this, for example, for a Checkbox field denoting whether or not to include a specific record:

MONTH(CloseDate) = MONTH(TODAY()) && YEAR(CloseDate) = YEAR(TODAY())

Immediately, though, we find that this new field isn’t available for selection in our Rollup Summary Field criteria screen.

Likewise, we can’t save a Rollup Summary Field where we write criteria like

CloseDate EQUALS THIS_MONTH

using a date literal like we might in a report.

The same underlying limitation gives rise to both of these restrictions: native Rollup Summary Fields require a trigger event on the rolled-up object in order to update the rolled-up value. Rollup Summary Fields are evaluated and updated as part of the Trigger Order of Execution (steps 16 and 17).

Formula fields, though, don’t have any underlying storage, and their values are not persisted to the database. Rather, their values are generated at the time the record is viewed. For this reason, there’s no event when a formula field’s value “changes”. In fact, some formula fields are nondeterministic because they incorporate the current date or time (through functions like TODAY() or NOW()), so their value is essentially always in flux.

λέγει που Ἡράκλειτος ὅτι ‘πάντα χωρεῖ καὶ οὐδὲν μένει’
Herakleitos says that ‘everything changes, and nothing stands still’
— Plato, Cratylos 402a

Likewise, there’s no trigger event on any record when the current day changes from one to the next. That’s why we cannot embed date literals in our Rollup Summary Field criteria - when today ticks over to tomorrow, not only is there no trigger event, but a recalculation job could be very computationally expensive.

The approaches we will take to solve this requirement ultimately take both tacks: creating a trigger event based on the date, and arranging for broad-based recalculation on a schedule.

Approach 1: Native Rollup Summary Fields with Time-Based Workflow or Process

Our first approach maximizes use of out-of-the-box functionality, requiring no Apex and no additional packages. However, it can scale poorly and is best suited for smaller Salesforce orgs.

We can’t create a native Rollup Summary Field based on a formula field, as we’ve seen. But we can roll up based on a Checkbox field, and we can use Time-Based Workflow and Processes to update that Checkbox on a schedule, using formula fields to get the dates right.

Here’s how it works:

  1. We create two Date formula fields on the child object (here, Opportunity). One defines the date when the record enters rollup scope and the other the date when it exits rollup scope.
    First Day in Rollup Scope Last Day in Rollup Scope
  2. We create a Checkbox field on the child object. This is what our Time-Based Workflow Actions will set and unset, giving us a triggerable event for the Rollup Summary Field.
  3. We create a Rollup Summary Field on the parent object (here, Account). We use the criterion that our Checkbox field is set to True.
    Rollup Summary Field

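  4. We create a Workflow Rule with two Time-Based Actions attached to it: one sets the Checkbox on the First Day in Rollup Scope, and the other clears it once the Last Day in Rollup Scope has passed.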
    Workflow Rule
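For a “this month” rollup, the two formula fields simply compute the first and last day of the month containing CloseDate (in Salesforce formula syntax, the first is DATE(YEAR(CloseDate), MONTH(CloseDate), 1)). Here is a Python sketch of what those formulas, and the Checkbox state the two actions maintain, amount to. The function names are illustrative assumptions, not field API names from this example.

```python
import calendar
from datetime import date


def first_day_in_scope(close_date: date) -> date:
    # "This month" rollup: a record enters scope on the first of its CloseDate's month.
    return close_date.replace(day=1)


def last_day_in_scope(close_date: date) -> date:
    # ...and stays in scope through the last day of that month.
    days_in_month = calendar.monthrange(close_date.year, close_date.month)[1]
    return close_date.replace(day=days_in_month)


def checkbox_value(close_date: date, today: date) -> bool:
    # The value the two time-based actions should have left on the Checkbox as of `today`.
    return first_day_in_scope(close_date) <= today <= last_day_in_scope(close_date)
```

Because each record carries its own entry and exit dates, swapping in different formulas (a fiscal quarter, a rolling 30 days) changes the period without touching the rollup itself.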

This approach works well for most sorts of time-based rollup requirements. Because it uses formula fields to define when a record enters and exits the rolled-up period, it’s not limited to “this month”, “this year”, and other simple definitions. The time period doesn’t even need to be the same for each record!

Costs and Limitations

This solution, as noted above, is best suited for smaller Salesforce orgs with moderate data volume and small numbers of time-based rollups. Each time-based rollup will consume three custom fields on the child and one Rollup Summary Field on the parent (the limit for which is 25 per object), as well as a Process or a Workflow Rule with two attached Actions.

Because of the mechanics of time-based workflow actions, updates made won’t roll up instantaneously, like with a trigger. Rather, they’ll be on a small delay as the workflow actions are enqueued, batched, and then processed:

Time-dependent actions aren’t executed independently. They’re grouped into a single batch that starts executing within one hour after the first action enters the batch.

More important, though, is the platform limitation on throughput of time-based workflow actions. From the same document:

Salesforce limits the number of time triggers an organization can execute per hour. If an organization exceeds the limits for its Edition, Salesforce defers the execution of the additional time triggers to the next hour. For example, if an Unlimited Edition organization has 1,200 time triggers scheduled to execute between 4:00 PM and 5:00 PM, Salesforce processes 1,000 time triggers between 4:00 PM and 5:00 PM and the remaining 200 time triggers between 5:00 PM and 6:00 PM.

Data volume is therefore the death of this solution. Imagine, for example, that your org is rolling up Activities on a timed basis - say “Activities Logged This Month”. Your email-marketing solution logs Tasks for various events, like emails being sent, opened, or links clicked. Your marketing campaigns do well, and around 30,000 emails or related events are logged each month.

That immediately poses a problem. When the clock ticks over to the first of the month, for example, we suddenly have 30,000 emails that just went out of scope for our “Activities Logged This Month” rollup. Salesforce now begins processing the 30,000 workflow actions that have accumulated in the queue for this date, but it can only process 1,000 per hour. Worse, the limit is 1,000 per hour across the whole org, not per workflow. So we now have a backlog 30 hours deep that impacts both our Rollup Summary Field and any other functionality based on timed actions across our org.
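The arithmetic behind that backlog is simple enough to sketch in Python. The 1,000-per-hour figure is the Unlimited Edition limit from the quoted documentation; other editions have different limits, and the function names here are hypothetical.

```python
import math
from typing import List


def backlog_hours(pending_actions: int, hourly_limit: int = 1000) -> int:
    """Hours needed to drain queued time triggers at a fixed org-wide hourly limit."""
    return math.ceil(pending_actions / hourly_limit)


def drain_schedule(pending_actions: int, hourly_limit: int = 1000) -> List[int]:
    """How many queued time triggers are processed in each successive hour."""
    schedule: List[int] = []
    remaining = pending_actions
    while remaining > 0:
        processed = min(remaining, hourly_limit)
        schedule.append(processed)
        remaining -= processed
    return schedule
```

drain_schedule(1200) reproduces the documentation’s example (1,000 in the first hour, 200 in the next), and backlog_hours(30000) gives the 30-hour backlog above.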

If you have a relatively small number of rollups to implement and relatively little data volume, this solution is both effective and expedient. Other organizations should read on for more scalable solutions.

Approach 2: DLRS in Realtime + Scheduled Mode

Declarative Lookup Rollup Summaries can help address many limitations of native Rollup Summary Fields. DLRS doesn’t have inherent support for time-based rollups, but it does have features we can combine to achieve that effect.

Instead of using a native Rollup Summary Field, we start by defining a DLRS Rollup Summary. We can use any SOQL criteria to limit our rolled-up objects, including date literals. As a result, we don’t need formula fields on the child object - and we also win freedom from some of the other limitations of native Rollup Summary Fields.

Here’s how we might configure our putative Opportunity rollup in DLRS:

DLRS

To start with, this Rollup Summary will only work partially. It’ll yield correct results if we run a full calculate job by clicking Calculate. If we configure it to run in Realtime mode and deploy DLRS’s triggers, we’ll see our rollup update as we add and delete Opportunities.

What won’t work, though, is the shift from one time period to the next. On the first of the month, all of our Opportunities from last month are still rolled up - there was no trigger event that would cause them to be reevaluated against our criteria.

With DLRS, rather than using time-based workflow actions to create a trigger event by updating a field, we recalculate the rollup value across every parent record when each time period rolls over. Here, we’ve scheduled a full recalculation of the rollup for the first day of each month.
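To see why the scheduled recalculation matters, here is a Python model of what a full recalculation with the criterion CloseDate = THIS_MONTH computes. It illustrates the rollup logic only, not how DLRS is implemented, and the (AccountId, CloseDate) tuples are a hypothetical data shape.

```python
from datetime import date
from typing import Dict, List, Tuple

Opportunity = Tuple[str, date]  # hypothetical (AccountId, CloseDate) pair


def in_this_month(close_date: date, today: date) -> bool:
    # Python stand-in for the SOQL criterion CloseDate = THIS_MONTH.
    return (close_date.year, close_date.month) == (today.year, today.month)


def full_recalculation(opps: List[Opportunity], today: date) -> Dict[str, int]:
    """Recount each parent in the list from scratch, as a full calculate job would."""
    totals: Dict[str, int] = {}
    for account_id, close_date in opps:
        totals.setdefault(account_id, 0)  # parents with nothing in scope get 0
        if in_this_month(close_date, today):
            totals[account_id] += 1
    return totals
```

A total computed on May 31 (two Opportunities for one Account) stays on the record through June 1 unless something recalculates it, because no trigger fires at midnight; the scheduled job is that something.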

DLRS Scheduler

Because DLRS also deploys triggers dynamically to react to record changes in real time, we get in some sense the best of both worlds: instant updates when we add and change records on a day-to-day basis, with recalculation taking place at time-period boundaries.

Costs and Limitations

Each DLRS rollup that’s run in Scheduled mode consumes a scheduled Apex job, the limit for which is 100 across the org - and that limit encompasses all Scheduled Apex, not just DLRS jobs.

Running rollups in real time requires the deployment of dynamically-generated triggers for the parent and child objects. This may challenge deployment processes or even cause issues with existing Apex development in rare cases.

Processing rollup recalculations can be expensive, long-running operations. A full rollup recalculation must touch each and every parent object in your org.

In general, this solution offers the greatest breadth and flexibility, but also demands the greatest awareness of your org’s unique challenges and performance characteristics.

Approach 3: Native Rollup Summary Fields with Scheduled Batch Apex

We can scale up Approach 1 by replacing the limited Time-Based Workflow Actions with a scheduled Batch Apex class. We retain the native Rollup Summary Field based on a Checkbox on the child record, but we don’t require the formula fields on the child.

The Batch Apex class we use is very simple: all it does is query for records whose Checkbox value does not match the position of the relevant Date field relative to the rollup’s defined time interval at the moment the scheduled class executes.

Continuing the example developed above, where we’re rolling up Opportunities for the current month only, our batch class’s start() method would run a query like this:

SELECT Id, This_Month_Batch__c
FROM Opportunity
WHERE (CloseDate = THIS_MONTH AND This_Month_Batch__c = false) 
      OR (CloseDate != THIS_MONTH AND This_Month_Batch__c = true)

Here we just locate those Opportunities that are not marked as in-scope but should be (where CloseDate = THIS_MONTH - note that we have the freedom to use date literals here), or which are marked as in scope but should not be.

Then, the work of our execute() method is extremely simple: all it does is reverse the value of the Checkbox field This_Month_Batch__c on each record:

public void execute(Database.BatchableContext bc, List<Opportunity> scope) {
    for (Opportunity o : scope) {
        o.This_Month_Batch__c = !o.This_Month_Batch__c;
    }
    update scope;
}

We’d schedule this batch class to run every night after midnight. When executed, it’ll update any records moving into or out of rollup scope that day, allowing the native Rollup Summary Field machinery to do the work of recalculating the Account-level totals.
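Taken together, the start() query and the execute() method amount to the following, modeled in Python with plain dicts standing in for Opportunity records. This is a sketch of the logic, not the batch class itself, and the in_this_month() helper is an assumption standing in for the SOQL date literal THIS_MONTH.

```python
from datetime import date
from typing import Dict, List


def in_this_month(close_date: date, today: date) -> bool:
    # Python stand-in for the SOQL date literal THIS_MONTH.
    return (close_date.year, close_date.month) == (today.year, today.month)


def nightly_run(opps: List[Dict], today: date) -> int:
    """Flip the flag on every record whose flag disagrees with its CloseDate.
    Returns how many records were updated."""
    # start(): select only mismatched records, as the WHERE clause does.
    scope = [
        o for o in opps
        if in_this_month(o["CloseDate"], today) != o["This_Month_Batch__c"]
    ]
    # execute(): every selected record is wrong, so reversing the Checkbox fixes it.
    for o in scope:
        o["This_Month_Batch__c"] = not o["This_Month_Batch__c"]
    return len(scope)
```

Running it a second time on the same day updates nothing, so the nightly job only performs DML on records that actually crossed a period boundary.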

Costs and Limitations

While it’s not Apex-intensive, this is the only solution of the three that is not declarative in nature, and all of the attendant costs of code-based solutions are present.

Organizations with high data volume or data skew may need to experiment carefully to ensure queries are properly optimized.

As with DLRS, each rollup using this strategy consumes one scheduled Apex job. Unlike DLRS, we need to run our scheduled class nightly, because we’re not combining it with a trigger for (partial) real-time operation. We could add more Apex to do so if desired.

Conclusions

Time-based rollup summary fields are something many organizations need, and there are many ways to get there. Beyond the three approaches discussed here, one could explore other, more customized and site-specific options, like a full-scale Apex implementation using triggers and Scheduled Apex, or a data warehouse or middleware solution to handle rollups and analytics. Each org’s ideal approach will depend on the resources available, its preference for code, declarative, or hybrid solutions, and important considerations around data volume and performance.

Amaxa: A Multi-Object Data Loader for Salesforce

I’ve just released Amaxa, an open-source project I’ve been working on for several months. Amaxa is a multi-object ETL tool/data loader for Salesforce. It’s designed to extract and load whole networks of records, like a selected set of Accounts with all of their Contacts, Opportunities, Contact Roles, and Campaigns, in a single repeatable operation while preserving the relationships between those records.

Core use cases for Amaxa include sandbox seeding, data migration, and retrieving connected data sets. The current release of Amaxa is version 0.9.2, in beta. Feedback and bug reports, via GitLab Issues, are welcome. There’s a lot more to come.

Amaxa is built in Python with Salesforce DX, simple_salesforce, and salesforce_bulk. Instructions and examples are included in the GitLab repository.

Locating Salesforce Compound and Component Fields in Apex and Python

One of the odder corners of the Salesforce data model is its compound fields. Coming in three main varieties (Name fields, Address fields, and Geolocation fields), these fields are accessible both under their own API names and in the form of their component fields, which have their own API names. The compound field itself is always read-only, but the components may be writeable.

For example, on the Contact object is a compound address field OtherAddress. (There are a total of four standard Address fields spread across the Contact and Account objects, with a handful of others across Lead, Order, and so on). The components of OtherAddress are

  • OtherStreet
  • OtherCity
  • OtherState
  • OtherPostalCode
  • OtherCountry
  • OtherStateCode
  • OtherCountryCode
  • OtherLatitude
  • OtherLongitude
  • OtherGeocodeAccuracy.

Similarly, Contact has a compound Name field, as do Person Accounts, with components like FirstName and LastName.

So, if we’re working in dynamic Apex or building an API client, how do we acquire and understand the relationships between these compound and component fields?

API

In the REST API, the Describe resource for the sObject returns metadata for the object’s fields as well. This makes it easy to acquire all the data we need in one go.

GET /services/data/v43.0/sobjects/Contact/describe

yields, on a lightly customized Developer Edition, about 250KB of JSON. Included is a list under the key "fields", which contains the data we need (abbreviated here to omit irrelevant data points):

"fields": [
    {
        "compoundFieldName": null,
        "label": "Contact Id",
        "name": "Id"
    },
    {
        "compoundFieldName": null,
        "label": "Name",
        "name": "Name"
    },
    {
        "compoundFieldName": "Name",
        "label": "First Name",
        "name": "FirstName"
    }
]

Each field includes its API name ("name"), its label, other metadata, and "compoundFieldName". The value of this last key is either null, meaning that the field we’re looking at is not a component field, or the API name of the parent compound field. There’s no marker indicating that a field is compound.

This structure can be processed easily enough in Python or other API client languages to yield compound/component mappings. Given some JSON response (parsed with json.loads()), we can do

def get_compound_fields(response):
    return {
        field["compoundFieldName"] for field in response["fields"] if field["compoundFieldName"] is not None
    }

Likewise, we can get the components of any given field:

def get_component_fields(response, api_name):
    return [field["name"] for field in response["fields"] if field["compoundFieldName"] == api_name]

Both operations can be expressed in various ways, including uses of map() and filter(), or can be implemented at a higher level if the describe response is processed into a structure, such as a dict keyed on field API name.
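For example, one higher-level option is to build a single map from each compound field to its components in one pass over the describe response. This sketch assumes the response has already been parsed into a dict (e.g. with json.loads()); the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def compound_component_map(response: dict) -> Dict[str, List[str]]:
    """Map each compound field's API name to its component fields' API names,
    in the order the describe lists them."""
    components: Dict[str, List[str]] = defaultdict(list)
    for field in response["fields"]:
        parent = field.get("compoundFieldName")
        if parent is not None:  # null means "not a component field"
            components[parent].append(field["name"])
    return dict(components)
```

The map’s keys are the compound fields and its values the components, so both earlier questions become dictionary lookups.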

Apex

The situation in Apex is rather different because of the way Describe information is returned to us. Rather than a single, large blob of information covering an sObject and all of its fields, we get back individual describes for an sObject (Schema.DescribeSobjectResult) and each field (Schema.DescribeFieldResult). (We can, of course, call out to the REST Describe API in Apex, but this requires additional work and an API-enabled Session Id).

Moreover, Schema.DescribeFieldResult does not include the critical compoundFieldName property.

… or rather, it isn’t documented to include it. In point of fact, it does contain the same data returned for a field in the API Describe call, as we can discover by inspecting the JSON result of serializing a Schema.DescribeFieldResult record.

Unlike some JSON-enabled Apex magic, we can get to this hidden value without actually using serialization. Even though it’s undocumented, these references compile and execute as expected:

Contact.OtherStreet.getDescribe().compoundFieldName

and

Contact.OtherStreet.getDescribe().getCompoundFieldName()

This makes it possible to build Apex utilities, like the Python functions above, that find compound fields and their components. In Apex, we’ll necessarily be a bit more verbose than in Python, and performance is a concern in broad-based searches: in unscientific testing, both finding the compound fields on one sObject and locating the component fields of one compound field take between 0.07 and 0.1 seconds. Your performance may vary.

public class CompoundFieldUtil {
    public static List<SObjectField> getCompoundFields(SObjectType objectType) {
        Map<String, SObjectField> fieldMap = objectType.getDescribe().fields.getMap();
        List<SObjectField> compoundFields = new List<SObjectField>();
        Set<String> compoundFieldNames = new Set<String>();

        for (String s : fieldMap.keySet()) {
            Schema.DescribeFieldResult dfr = fieldMap.get(s).getDescribe();

            if (dfr.compoundFieldName != null && !compoundFieldNames.contains(dfr.compoundFieldName)) {
                compoundFields.add(fieldMap.get(dfr.compoundFieldName));
                compoundFieldNames.add(dfr.compoundFieldName);
            }
        }

        return compoundFields;
    }

    public static List<SObjectField> getComponentFields(SObjectType objectType, SObjectField field) {
        Map<String, SObjectField> fieldMap = objectType.getDescribe().fields.getMap();
        List<SObjectField> components = new List<SObjectField>();
        String thisFieldName = field.getDescribe().getName();
                
        for (String s : fieldMap.keySet()) {
            if (fieldMap.get(s).getDescribe().compoundFieldName == thisFieldName) {
                components.add(fieldMap.get(s));
            }
        }
        
        return components;
    }
}

Then,

System.debug(CompoundFieldUtil.getComponentFields(Contact.sObjectType, Contact.OtherAddress));

yields

14:15:14:523 USER_DEBUG [1] DEBUG (OtherStreet, OtherCity, OtherState, OtherPostalCode, OtherCountry, OtherStateCode, OtherCountryCode, OtherLatitude, OtherLongitude, OtherGeocodeAccuracy)

and

System.debug(CompoundFieldUtil.getCompoundFields(Contact.sObjectType));

yields

22:15:30:089 USER_DEBUG [1] DEBUG (Name, OtherAddress, MailingAddress)

Simple modifications could support using API names rather than SObjectField tokens, building maps between compound fields and their components, and similar workflows.


This post developed out of a Salesforce Stack Exchange answer I wrote, along with work on a soon-to-be-released data loader project.