Nested Iteration: A Quick Look at Computational Complexity in Apex

The logic of nested iteration can be a real trap for new programmers. I see this a lot on Salesforce Stack Exchange, coming from two different angles. One angle simply asserts, having been taught so, that “nested loops are bad.” Well, not exactly - not as such, although specific implementations of them can be. The other side perceives no danger at all in nested loops - indeed, finds them the most natural, if naive, route to expressing certain search constructs.

Both miss the mark through not understanding the notion of computational complexity and how to evaluate it in programmatic logic. While it’s important to programmers everywhere, computational complexity can be especially important for developers on the Salesforce platform, where governor limits bite hard on reckless use of resources like CPU time.

First, I want to look at a simple but broadly applicable example of a “bad” nested iteration, one that needlessly consumes CPU time through inefficient logic, and the patterns by which one fixes that problem. Second, I’d like to look at an example of nested iteration that’s quite correct, and needed for its intended purpose.

Updating Opportunities Based on Account Changes

Let’s start with an Account trigger. Our requirement is as follows:

As a Sales representative, I need the Description field on each Opportunity to match that on the parent Account so that I always have access to critical Account information while working deals.

We’ll suppose that using a custom formula field is contraindicated, for the sake of illustration. We prepare the following trigger:

trigger AccountTrigger on Account (after update) {
    List<Opportunity> opps = [
        SELECT Id, AccountId, Description
        FROM Opportunity
        WHERE AccountId IN :Trigger.newMap.keySet()
    ];
    Map<Id, Opportunity> updateMap = new Map<Id, Opportunity>();

    for (Account a : Trigger.new) {
        for (Opportunity o : opps) {
            if (o.AccountId == a.Id) {
                o.Description = a.Description;
                updateMap.put(o.Id, o);
            }
        }
    }

    update updateMap.values();
}

This trigger does some things right. In particular, it’s bulkified, costing us only one SOQL query and one DML statement. However, the structure of the nested for loops puts us at risk.

I’m calling this pattern a “matrix search” - a term I got from Andrés Catalán - because I like how it evokes scanning across rows and columns of data. It’s a pattern that can be rather nasty where it’s not needed, and it eats CPU time. Let’s look at why.

The key question to ask here is “How many times will the body of the inner loop be executed?” Here, how many times will we ask if (o.AccountId == a.Id)?

Suppose we update 20 Accounts, and those Accounts have an average of 15 Opportunities, for a total of 300 Opportunities returned to our SOQL query. Our outer loop will execute 20 times (once per Account). Our inner loop will execute 300 times per Account, because we’re iterating over all of those Opportunities - even the ones that are totally irrelevant to the Account we’re looking at.

So what’s the total invocation count of if (o.AccountId == a.Id)? It’s the total number of Accounts (20) multiplied by the total number of Opportunities (300): 6,000 iterations, to process just 300 records.

Imagine if you’re in a large-data-volume org, or if you’re performing updates via Data Loader. These numbers can get much, much worse. 200 Accounts updated via Data Loader, with an average of 20 Opportunities each (total of 4,000)? Suddenly, we’re looking at 800,000 iterations of that loop. We’ll be in CPU limit trouble before we know it, and our users will notice the performance hit well before that point. We’ve needlessly increased our computational complexity.

Οἰόνται τινές, βασιλεῦ Γέλων, τοῦ ψάμμου τὸν ἀριθμὸν ἄπειρον εἶμεν τῷ πλήθει
Some believe, King Gelon, that the number of the sand - in regard to its multitude - is without limit…
— Archimedes, The Sand Reckoner

The Fix: Maps

Nearly any time we’re iterating over two lists in nested for loops and making comparisons between every possible pair of values, which is exactly what we’re doing here, we can eliminate the inner loop using a Map for significant performance and complexity gains.

We’re interested in an equality between two values, Account.Id and Opportunity.AccountId. For each AccountId, we need to find the corresponding Account - fast - and access other data on the Account. A Map<Id, Account> lets us do that.

Instead of nesting iteration, we’d arrange for a Map<Id, Account> before we start iterating over our Opportunities. Then, inside the Opportunity loop, we don’t need to iterate over Accounts to find the one with the matching Id, because we can access it directly via the Map’s get() method, with no iteration - in constant time, a cost that doesn’t grow with the number of Accounts we have.

In this use case, we don’t need to build the Map ourselves, because we’ve already got Trigger.newMap, which here is a Map<Id, Account>.

If we needed to build the Map ourselves, or were keying on some property other than the Id, like the Name, we’d do

Map<String, Account> accountMap = new Map<String, Account>();

for (Account a : accountList) {
    accountMap.put(a.Name, a); 
}
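
(Incidentally, when keying by record Id specifically, Apex will build the Map for us: new Map<Id, Account>(accountList) does the same thing in one step.)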

However we obtain our Map, with it in hand, we iterate over Opportunities - once, and with no nested iteration.

for (Opportunity o : opps) {
    Account a = accountMap.get(o.AccountId); // No iteration!
    o.Description = a.Description;
    updateMap.put(o.Id, o);
}
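
Putting the pieces together, the revised trigger might look like this, using Trigger.newMap directly so that no Map-building loop is needed:

trigger AccountTrigger on Account (after update) {
    List<Opportunity> opps = [
        SELECT Id, AccountId, Description
        FROM Opportunity
        WHERE AccountId IN :Trigger.newMap.keySet()
    ];
    Map<Id, Opportunity> updateMap = new Map<Id, Opportunity>();

    for (Opportunity o : opps) {
        // Constant-time lookup - no inner loop over Accounts.
        Account a = (Account) Trigger.newMap.get(o.AccountId);
        o.Description = a.Description;
        updateMap.put(o.Id, o);
    }

    update updateMap.values();
}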

Note that a Map-building loop on Account executes exactly once per Account. Then, the loop on Opportunity executes exactly once per Opportunity. Taking the higher-data example from above, we go from

200 Accounts * 4,000 total Opportunities = 800,000 iterations

to

200 Accounts + 4,000 total Opportunities = 4,200 iterations

a reduction of roughly 99.5%.

This pattern is highly generalizable, which is why it’s so critical to ask how many times your code will execute given various data inputs.

In computer science, these analyses are typically done in terms of Big-O notation. We won’t go into the details of Big-O notation here, but keep the phrase in the back of your mind - it’s a mathematical way to formalize the cost of specific algorithms. (In those terms, our matrix search costs O(n × m), while the Map-based version costs O(n + m).)

Note: there are at least two potential performance optimizations in the final version of the trigger given above. Both optimizations reduce the scope of the operations to be performed. Finding them is left as an exercise for the reader.

Counter-Example: Iterating Over a SOQL Parent-Child Query

Let’s look at another example where nested iteration isn’t pathological - and in fact is the correct implementation pattern. Here’s a new user story:

As a Customer Support supervisor, I need to mark my Accounts’ Status field “Critical” whenever they have more than 10 open Cases matching a complex set of criteria.

There are a few different ways to implement this requirement. We’ll look at a fragment of one: a service class that takes the Ids of Accounts against which new Cases have been opened. Because the Case criteria are so complex, we’re using a pre-existing utility class to evaluate each Case to see whether it matches, and we’re running a parent-child SOQL query to walk through available Cases on each Account.

public static void evaluatePotentialCriticalAccounts(Set<Id> accountIds) {
    List<Account> updateList = new List<Account>();

    for (Account a : [
        SELECT Id, Status__c,
               (SELECT Id, Subject, Status FROM Cases)
        FROM Account
        WHERE Id IN :accountIds
    ]) {
        Integer qualifyingCases = 0;

        for (Case c : a.Cases) {
            if (CaseUtility.isCaseCriticalStatusEligible(c)) {
                qualifyingCases++;
            }
        }

        if (qualifyingCases > 10) {
            updateList.add(new Account(Id = a.Id, Status__c = 'Critical'));
        }
    }

    update updateList;
}

Now, this code might not be ideal (depending, of course, on the fine detail of our requirements and the configuration of our Salesforce org). For example, if we can fit our criteria into a SOQL query instead of an Apex function, an aggregate query against Case would likely be a better solution than this Apex.
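
Purely for illustration - assuming the criteria could be reduced to SOQL filter conditions - that aggregate query might look like the following, with the remaining criteria folded into the WHERE clause:

SELECT AccountId
FROM Case
WHERE AccountId IN :accountIds AND IsClosed = false
GROUP BY AccountId
HAVING COUNT(Id) > 10

Each row returned identifies an Account that should be marked Critical, with no Apex iteration over Cases at all.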

However, what this code doesn’t do is fall into the same trap as the trigger we looked at above, even though it uses nested loops. Why?

Look at how many times the loops run. We’ll call CaseUtility exactly once per Case - not the total number of Cases multiplied by the total number of Accounts. We’ll never run a loop iteration that’s irrelevant, that has no possibility of yielding action, because we’re not “matrixing” our Accounts and Cases, performing comparisons most of which would be irrelevant to the work we’re aiming to do. Hence, we hit each Case exactly once. There’s no optimization possible via Maps here.

There are other counter-examples where nested iteration is perfectly valid. Consider, for example, a board-game algorithm. We might need to iterate over each space on the board, across rows and down columns:

for (Integer i = 0; i < board.width(); i++) {
    for (Integer j = 0; j < board.height(); j++) {
        // Take some action for each space on the board.
    }
}

A nested for loop is just right for this use case, and it wastes no CPU time because traversing the entire two-dimensional space is exactly what is called for.

In both situations, the computational complexity of the solution we’ve implemented is in line with the problem we’re solving.

Summing Up

Here are a few points I hope you’ll take away from this discussion, whether or not you ever encounter the term Big-O notation again:

  • Iteration, and nested iteration, are not bad in and of themselves.
  • Traversing data you don’t need to traverse is bad.
  • It’s always worth asking “how many times will this run?”
  • Performance can scale up and down much faster than you expect, given pathological implementations.

Rolling Up Opportunity Contact Roles with Change Data Capture and Async Apex Triggers

Salesforce admins, and perhaps especially nonprofit admins, have been wishing for a long time for the ability to build more functionality around Opportunity Contact Roles - like roll-up summary fields, validation rules, and triggers.

The new Change Data Capture feature offered intriguing possibilities because, unlike regular Apex triggers, it supports change notifications for OpportunityContactRole. With the new ability, in Summer ‘19, to subscribe to Change Data Capture events in Async Apex Triggers, we now have a viable route to build completely on-platform features using these tools - and the first demo I wanted to build was a solution to roll up Opportunity Contact Roles.

If you want to dive straight into the code, find it here on GitHub.

The Simple Route and Why It Doesn’t Work

My first strategy was simply to call into Declarative Lookup Rollup Summaries from an Async Apex trigger. That approach runs afoul, though, of one of the key differences between Async Triggers and standard Apex Triggers:

Apex triggers operate on sObjects. Async Apex triggers operate on change events.

Change events aren’t sObjects, and don’t contain a complete snapshot of the sObject record at a point in time. Rather, since the underlying Change Data Capture feature targets synchronization of changes to some persistent store, each event only contains enough information to apply a specific change to a stored record.

The section Apex Change Event Messages in the Change Data Capture Developer Guide explains what data is actually available in each message:

Create
For a new record, the event message contains all fields, whether populated or empty. […]
Update
For an updated record, the event message contains field values only for changed fields. Unchanged fields are present and empty (null), even if they contain a value in the record. […]
Delete
For a deleted record, all record fields in the event message are empty (null).
Undelete
For an undeleted record, the event message contains all fields from the original record, including empty (null) fields and system fields.

For roll-up applications, the key is the update and delete events. Note that for update, we don’t get old data, only new, and at the time we receive the event the records in the database have already been updated (we can’t query for old values). Upon delete, we don’t get the field data for the deleted sObjects at all! It’s assumed that we, as consumers of this event stream, have some other persistent data store against which we’re applying the ordered stream of change events.

To build a roll-up, we need to know the parent objects for deleted children, the parent objects for changed records, and the old parents for children that are reparented.

So Change Data Capture + Async Apex Triggers is a dead end for building rollups? Not at all. There’s another strategy that allows us to take advantage of Change Data Capture’s inherent facility for synchronizing data stores: we’ll use a shadow table.

Rolling Up a Shadow Table

The SFDX project for everything we’re going to demonstrate here is available on GitHub.

We can’t roll up Opportunity Contact Roles to the Opportunity, but we can easily roll up some arbitrary Custom Object with a master-detail relationship. A shadow table, in which we mirror the Opportunity Contact Role object with a Custom Object, fits nicely as a solution: it both gives us the roll-up we need without code, and provides us with an on-platform persistent data store against which Change Data Capture events can be applied as we create, update, and delete Opportunity Contact Role records.

Here’s the schema we’ll use: a custom object plus a native Rollup Summary Field on Opportunity.

[Screenshot: Shadow Object]

[Screenshot: Rollup Field]

Note the External Id field we’ve created to hold the Id of the corresponding Opportunity Contact Role. That’s the linchpin of our synchronization effort, and we’ll use it below to build an efficient sync trigger.

Synchronizing the Data

To synchronize data from OpportunityContactRole into our shadow table, we’ll use an Async Apex trigger processing the OpportunityContactRoleChangeEvent entity. First, we’ll select the needed object in Setup under Change Data Capture:

[Screenshot: Change Data Capture Setup]

Then, we build a trigger. The code is here.

Our trigger’s operation is different from what we’d see in typical, synchronous Apex. We iterate in order over the change events we receive and use them to build up two data sets. The first is a Map<Id, Shadow_Opportunity_Contact_Role__c>, which we use to store both new and updated shadow records derived from the change events. The keys, worth noting, are OpportunityContactRole Ids, ensuring that for each OpportunityContactRole we create or update exactly one shadow record, whose state incorporates all of the changes in our inbound event stream.

We also store a Set<Id> of the Ids of deleted OpportunityContactRole records. As we find delete events in our stream, we remove corresponding entries from our create and update Map, and add those Ids to the Set.
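
Sketched out - this is illustrative, not the repo’s exact code, and the trigger name plus the Opportunity__c and Role__c fields on the shadow object are assumptions - the event-processing loop looks something like this:

trigger ShadowOCRSync on OpportunityContactRoleChangeEvent (after insert) {
    Map<Id, Shadow_Opportunity_Contact_Role__c> createUpdateMap =
        new Map<Id, Shadow_Opportunity_Contact_Role__c>();
    Set<Id> deleteIds = new Set<Id>();

    // Change events arrive in commit order; later events supersede earlier ones.
    for (OpportunityContactRoleChangeEvent evt : Trigger.new) {
        EventBus.ChangeEventHeader header = evt.ChangeEventHeader;

        for (String rawId : header.getRecordIds()) {
            Id recordId = rawId;

            if (header.changeType == 'DELETE') {
                // A delete supersedes any pending create or update for this record.
                createUpdateMap.remove(recordId);
                deleteIds.add(recordId);
            } else {
                // CREATE, UPDATE, or UNDELETE: stage (or amend) a shadow record,
                // keyed by the external Id. Update events carry only changed
                // fields, so we copy just the non-null values we need.
                Shadow_Opportunity_Contact_Role__c shadow = createUpdateMap.get(recordId);
                if (shadow == null) {
                    shadow = new Shadow_Opportunity_Contact_Role__c(
                        Opportunity_Contact_Role_Id__c = recordId
                    );
                }
                if (evt.OpportunityId != null) {
                    shadow.Opportunity__c = evt.OpportunityId; // assumed relationship field
                }
                if (evt.Role != null) {
                    shadow.Role__c = evt.Role; // assumed text field
                }
                deleteIds.remove(recordId);
                createUpdateMap.put(recordId, shadow);
            }
        }
    }

    // ...followed by the two DML statements shown below.
}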

When we finish iterating through events - remember that this is an ordered time stream - these two data structures contain the union of all of the changes we need to apply to our shadow table. At that point, it’s two simple DML statements to persist the changes:

upsert createUpdateMap.values() Opportunity_Contact_Role_Id__c;
delete [
    SELECT Id
    FROM Shadow_Opportunity_Contact_Role__c
    WHERE Opportunity_Contact_Role_Id__c IN :deleteIds
];

The index on Opportunity_Contact_Role_Id__c should keep these operations performant, and once they complete, the system updates our native Rollup Summary Field on the parent Opportunities.

Testing

There are just a couple of extra wrinkles to testing Async Apex Triggers. We have a new method in the system Test class to enable the Change Data Capture feature; it overrides the org’s CDC settings to ensure that the code under test executes regardless of them:

Test.enableChangeDataCapture();

Then, we make sure CDC events are delivered and processed synchronously, using the tried-and-true Test.startTest() and Test.stopTest() calls, or by calling Test.getEventBus().deliver().
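
Here’s the shape of such a test - a minimal sketch, assuming the shadow object and trigger described above:

@isTest
private class ShadowOCRSyncTest {
    @isTest
    static void createsShadowRecordForNewContactRole() {
        // Override org settings so change events fire during this test.
        Test.enableChangeDataCapture();

        Account a = new Account(Name = 'Test Account');
        insert a;
        Contact c = new Contact(LastName = 'Tester', AccountId = a.Id);
        insert c;
        Opportunity o = new Opportunity(
            Name = 'Test Opportunity',
            StageName = 'Prospecting',
            CloseDate = Date.today(),
            AccountId = a.Id
        );
        insert o;

        Test.startTest();
        OpportunityContactRole ocr = new OpportunityContactRole(
            OpportunityId = o.Id,
            ContactId = c.Id,
            Role = 'Decision Maker'
        );
        insert ocr;
        // Deliver the change event so the Async Apex Trigger runs now;
        // Test.stopTest() would also flush any remaining events.
        Test.getEventBus().deliver();
        Test.stopTest();

        System.assertEquals(
            1,
            [
                SELECT COUNT()
                FROM Shadow_Opportunity_Contact_Role__c
                WHERE Opportunity_Contact_Role_Id__c = :ocr.Id
            ]
        );
    }
}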

Intermediate test results, while you’re working toward passing tests and full code coverage, can be tough to interpret: most logs are found under the Automated Process user rather than the context user, requiring the use of trace flags, but some (possibly those from @testSetup methods) do appear for the context user. Code coverage maps can also produce misleading results until the tests fully pass.

As part of the demo, I built a test class that achieves full coverage on the Async Apex Trigger. It’s also part of the GitHub project. (While the tests have good logic path coverage, they could stand to exercise bulk use cases better!)

Results

The CDC + Async Apex Trigger solution doesn’t add everything we might want with Opportunity Contact Roles. We still cannot write Validation Rules against the Opportunity that take Roles into account, because the rollup operation is run asynchronously, after the original transaction commits. It’s also a near-real-time, rather than real-time, solution, so a brief delay may be perceptible before the rollup field updates. And lastly, because Async Apex Triggers run as the Automated Process user, Last Modified By fields won’t show actual users’ names once the rollup operation completes.

But what it does, it does well: we get all the functionality and performance of native roll-up summary fields against an object that’s never supported that feature, with a minimal investment in code. Users can see, and report on, rolled-up totals of Opportunity Contact Roles across their Opportunity pipeline.

And it’s a really neat way to apply some of the latest Salesforce technologies to solve real-world problems.

Real-World Unit Testing: Get to 100% Coverage, the Right Way

My presentation from PhillyForce ‘19, “Real-World Unit Testing: Get to 100% Coverage, the Right Way”, is now available on YouTube. It was a great experience to return to PhillyForce as a speaker and organizing committee member to talk about one of the subjects I’m most passionate about - automated testing - and make the case that writing good unit tests is actually a moral, not just a technical, imperative.

The talk covers four big headings:

  • Thinking about tests as promises we’re making, and how that changes the focus of a testing program.
  • Writing good unit tests in Apex.
  • Writing testable code, thinking about tests as API consumers, and using tests to guide refactoring.
  • Advanced testing strategies with mocking and dependency injection for working with complex asynchronous Apex.

Three Routes to Time-Based Rollup Summary Fields

Rollup Summary Fields are great. Users love them because they offer at-a-glance insight at the top level of the data hierarchy, without needing to run a report or drill down in the user interface to child records. Unfortunately, native Rollup Summary Fields come with a variety of limitations on what data you can roll up, where, and with which criteria. In particular, time-based rollup summary fields are a common need that’s tricky to meet with this native functionality.

These fields come from business requirements like this:

As a Sales representative, I need to see on the Account a running total of Opportunities on a year-to-date and month-to-date basis.

Naïve Solutions and Why They Don’t Work

Most naïve solutions to this class of requirements fail because they’re based on formula fields or date literals. One might try creating a formula like this, for example, for a Checkbox field denoting whether or not to include a specific record:

MONTH(CloseDate) = MONTH(TODAY()) && YEAR(CloseDate) = YEAR(TODAY())

Immediately, though, we find that this new field isn’t available for selection in our Rollup Summary Field criteria screen.

Likewise, we can’t save a Rollup Summary Field where we write criteria like

CloseDate EQUALS THIS_MONTH

using a date literal like we might in a report.

The same underlying limitation gives rise to both of these restrictions: native Rollup Summary Fields require a trigger event on the rolled-up object in order to update the rolled-up value. Rollup Summary Fields are evaluated and updated as part of the Trigger Order of Execution (steps 16 and 17).

Formula fields, though, don’t have any underlying storage, and their values are not persisted to the database. Rather, their values are generated at the time the record is viewed. For this reason, there’s no event when a formula field’s value “changes”. In fact, some formula fields can be nondeterministic and incorporate date or time fields, such that their value is essentially always in flux.

λέγει που Ἡράκλειτος ὅτι ‘πάντα χωρεῖ καὶ οὐδὲν μένει’
Herakleitos says that ‘everything changes, and nothing stands still’
— Plato, Cratylos 402a

Likewise, there’s no trigger event on any record when the current day changes from one to the next. That’s why we cannot embed date literals in our Rollup Summary Field criteria - when today ticks over to tomorrow, not only is there no trigger event, but a recalculation job could be very computationally expensive.

The approaches we will take to solve this requirement ultimately take both tacks: creating a trigger event based on the date, and arranging for broad-based recalculation on a schedule.

Approach 1: Native Rollup Summary Fields with Time-Based Workflow or Process

Our first approach maximizes use of out-of-the-box functionality, requiring no Apex and no additional packages. However, it can scale poorly and is best suited for smaller Salesforce orgs.

We can’t create a native Rollup Summary Field, as we’ve seen, based on a formula field. But we can roll up based on a Checkbox field, and we can use Time-Based Workflow and Processes to update that Checkbox on a schedule, using formula fields to get the dates right.

Here’s how it works:

  1. We create two Date formula fields on the child object (here, Opportunity). One defines the date when the record enters rollup scope and the other the date when it exits rollup scope (see the formula sketch after this list).
    [Screenshots: First Day in Rollup Scope, Last Day in Rollup Scope]
  2. We create a Checkbox field on the child object. This is what our Time-Based Workflow Actions will set and unset, giving us a triggerable event for the Rollup Summary Field.
  3. We create a Rollup Summary Field on the parent object (here, Account). We use the criterion that our Checkbox field is set to True.
    [Screenshot: Rollup Summary Field]
  4. We create a Workflow Rule with two Time-Based Actions attached to it.
    [Screenshot: Workflow Rule]
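
For a month-to-date rollup on Opportunity CloseDate, for example, the two formula fields might be defined like this (a sketch; your field names and period definition may differ):

First Day in Rollup Scope:

DATE(YEAR(CloseDate), MONTH(CloseDate), 1)

Last Day in Rollup Scope:

ADDMONTHS(DATE(YEAR(CloseDate), MONTH(CloseDate), 1), 1) - 1

The Workflow Rule’s two Time-Based Actions would then fire relative to these fields - for example, checking the box zero hours after First Day in Rollup Scope and unchecking it one day after Last Day in Rollup Scope.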

This approach works well for most sorts of time-based rollup requirements. Because it uses formula fields to define when a record enters and exits the rolled-up period, it’s not limited to “this month”, “this year”, and other simple definitions. The time period doesn’t even need to be the same for each record!

Costs and Limitations

This solution, as noted above, is best suited for smaller Salesforce orgs with moderate data volume and small numbers of time-based rollups. Each time-based rollup will consume three custom fields on the child and one Rollup Summary Field on the parent (the limit for which is 25 per object), as well as a Process or a Workflow Rule with two attached Actions.

Because of the mechanics of time-based workflow actions, updates made won’t roll up instantaneously, like with a trigger. Rather, they’ll be on a small delay as the workflow actions are enqueued, batched, and then processed:

Time-dependent actions aren’t executed independently. They’re grouped into a single batch that starts executing within one hour after the first action enters the batch.

More important, though, is the platform limitation on throughput of time-based workflow actions. From the same document:

Salesforce limits the number of time triggers an organization can execute per hour. If an organization exceeds the limits for its Edition, Salesforce defers the execution of the additional time triggers to the next hour. For example, if an Unlimited Edition organization has 1,200 time triggers scheduled to execute between 4:00 PM and 5:00 PM, Salesforce processes 1,000 time triggers between 4:00 PM and 5:00 PM and the remaining 200 time triggers between 5:00 PM and 6:00 PM.

Data volume is therefore the death of this solution. Imagine, for example, that your org is rolling up Activities on a timed basis - say “Activities Logged This Month”. Your email-marketing solution logs Tasks for various events, like emails being sent, opened, or links clicked. Your marketing campaigns do well, and around 30,000 emails or related events are logged each month.

That immediately poses a problem. When the clock ticks over to the first of the month, for example, we suddenly have 30,000 emails that just went out of scope for our “Activities Logged This Month” rollup. Salesforce now begins processing the 30,000 enqueued workflow actions that have accumulated in the queue for this date, but it can only process 1,000 per hour. Worse, the limit is 1,000 per hour across the whole org - not per workflow. So we now have a backlog 30 hours deep that impacts both our Rollup Summary Field and any other functionality based on timed actions across our org.

If you have a relatively small number of rollups to implement and relatively little data volume, this solution is both effective and expedient. Other organizations should read on for more scalable solutions.

Approach 2: DLRS in Realtime + Scheduled Mode

Declarative Lookup Rollup Summaries can help address many limitations of native Rollup Summary Fields. DLRS doesn’t have inherent support for time-based rollups, but it does have features we can combine to achieve that effect.

Instead of using a native Rollup Summary Field, we start by defining a DLRS Rollup Summary. We can use any SOQL criteria to limit our rolled-up objects, including date literals. As a result, we don’t need formula fields on the child object - and we also win freedom from some of the other limitations of native Rollup Summary Fields.

Here’s how we might configure our putative Opportunity rollup in DLRS:

[Screenshot: DLRS]
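
Worth noting: DLRS’s Relationship Criteria field accepts a SOQL WHERE fragment, so our month-to-date criterion can be as simple as:

CloseDate = THIS_MONTH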

To start with, this Rollup Summary will only work partially. It’ll yield correct results if we run a full calculate job by clicking Calculate. If we configure it to run in Realtime mode and deploy DLRS’s triggers, we’ll see our rollup update as we add and delete Opportunities.

What won’t work, though, is the shift from one time period to the next. On the first of the month, all of our Opportunities from last month are still rolled up - there was no trigger event that would cause them to be reevaluated against our criteria.

With DLRS, rather than using time-based workflow actions to create a trigger event by updating a field, we recalculate the rollup value across every parent record when each time period rolls over. Here, we’ve scheduled a full recalculation of the rollup for the first day of each month.

[Screenshot: DLRS Scheduler]

Because DLRS also deploys triggers dynamically to react to record changes in real time, we get in some sense the best of both worlds: instant updates when we add and change records on a day-to-day basis, with recalculation taking place at time-period boundaries.

Costs and Limitations

Each DLRS rollup that’s run in Scheduled mode consumes a scheduled Apex job, the limit for which is 100 across the org - and that limit encompasses all Scheduled Apex, not just DLRS jobs.

Running rollups in real time requires the deployment of dynamically-generated triggers for the parent and child objects. This may challenge deployment processes or even cause issues with existing Apex development in rare cases.

Processing rollup recalculations can be expensive, long-running operations. A full rollup recalculation must touch each and every parent object in your org.

In general, this solution offers the greatest breadth and flexibility, but also demands the greatest awareness of your org’s unique challenges and performance characteristics.

Approach 3: Native Rollup Summary Fields with Scheduled Batch Apex

We can scale up Approach 1 by replacing the limited Time-Based Workflow Actions with a scheduled Batch Apex class. We retain the native Rollup Summary Field based on a Checkbox on the child record, but we don’t require the formula fields on the child.

The Batch Apex class we use is very simple: all it does is query for records whose Checkbox value does not match the position of the relevant Date field vis-a-vis the rollup’s defined time interval, at the time the scheduled class is executed.

Continuing the example developed above, where we’re rolling up Opportunities for the current month only, our batch class’s start() method would run a query like this:

SELECT Id, This_Month_Batch__c
FROM Opportunity
WHERE (CloseDate = THIS_MONTH AND This_Month_Batch__c = false) 
      OR (CloseDate != THIS_MONTH AND This_Month_Batch__c = true)

Here we just locate those Opportunities that are not marked as in scope but should be (where CloseDate = THIS_MONTH - note that we have the freedom to use date literals here), or that are marked as in scope but should not be.

Then, the work of our execute() method is extremely simple: all it does is reverse the value of the Checkbox field This_Month_Batch__c on each record:

public void execute(Database.BatchableContext bc, List<Opportunity> scope) {
    for (Opportunity o : scope) {
        o.This_Month_Batch__c = !o.This_Month_Batch__c;
    }
    update scope;
}

We’d schedule this batch class to run every night after midnight. When executed, it’ll update any records moving into or out of rollup scope that day, allowing the native Rollup Summary Field machinery to do the work of recalculating the Account-level totals.
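
Filled out, the class might look something like this (a sketch; the class name and nightly schedule are illustrative):

public class ThisMonthRollupBatch implements Database.Batchable<sObject>, Schedulable {
    public Database.QueryLocator start(Database.BatchableContext bc) {
        // Locate records whose Checkbox disagrees with where their CloseDate
        // falls relative to the current month.
        return Database.getQueryLocator(
            'SELECT Id, This_Month_Batch__c FROM Opportunity ' +
            'WHERE (CloseDate = THIS_MONTH AND This_Month_Batch__c = false) ' +
            'OR (CloseDate != THIS_MONTH AND This_Month_Batch__c = true)'
        );
    }

    public void execute(Database.BatchableContext bc, List<Opportunity> scope) {
        // Flip the Checkbox, creating the trigger event the native
        // Rollup Summary Field machinery needs.
        for (Opportunity o : scope) {
            o.This_Month_Batch__c = !o.This_Month_Batch__c;
        }
        update scope;
    }

    public void finish(Database.BatchableContext bc) {
    }

    // Schedulable entry point, so the same class can be scheduled directly.
    public void execute(SchedulableContext sc) {
        Database.executeBatch(new ThisMonthRollupBatch());
    }
}

A single call like System.schedule('This Month Rollup', '0 15 0 * * ?', new ThisMonthRollupBatch()); then runs it at 12:15 AM each night.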

Costs and Limitations

While it’s not Apex-intensive, this is the only solution of the three that is not declarative in nature, and all of the attendant costs of code-based solutions are present.

Organizations with high data volume or data skew may need to experiment carefully to ensure queries are properly optimized.

As with DLRS, each rollup using this strategy consumes one scheduled Apex job. Unlike DLRS, we need to run our scheduled class nightly, because we’re not combining it with a trigger for (partial) real-time operation. We could add more Apex to do so if desired.

Conclusions

Time-based rollup summary fields are something many organizations need, and there are a lot of ways to get there. Beyond the three approaches discussed here, one could explore other, more customized and site-specific options - like a full-scale Apex implementation using triggers and Scheduled Apex, or applying a data warehouse or middleware solution to handle rollups and analytics. Each org’s ideal approach will depend on the resources available; its preference for code, declarative, or hybrid solutions; and important considerations around data volume and performance.

Amaxa: A Multi-Object Data Loader for Salesforce

I’ve just released Amaxa, an open-source project I’ve been working on for several months. Amaxa is a multi-object ETL tool/data loader for Salesforce. It’s designed to extract and load whole networks of records, like a selected set of Accounts with all of their Contacts, Opportunities, Contact Roles, and Campaigns, in a single repeatable operation while preserving the relationships between those records.

Core use cases for Amaxa include sandbox seeding, data migration, and retrieving connected data sets. The current release of Amaxa is version v0.9.2, in beta. Feedback and bug reports, via GitLab Issues, are welcome. There’s a lot more to come.

Amaxa is built in Python with Salesforce DX, simple_salesforce, and salesforce_bulk. Instructions and examples are included in the GitLab repository.