Everyday Salesforce Patterns: Filtering Parent Objects By Child Objects

Sometimes, we need to filter an Account query by its Contacts, or some custom object Project__c by its associated Subject_Area__c records. There might not be rollup summary fields in place, or the criteria might go beyond what rollups can do, or we might be dealing with a lookup relationship, forcing us to express this filtration directly in SOQL or in Apex.

Suppose, for example, that we do have a custom object Project__c. A second custom object Subject_Area__c is connected to it by a lookup relationship, Linked_Project__c, with the relationship name Subject_Areas. Suppose further that Subject_Area__c also has a junction object Subject_Area_Expert__c creating a many-to-many relationship to Contact; Subject_Area_Expert__c also records the dates an expert is assigned to that subject area.

We’d like to be able to a perform a variety of queries that express different kinds of filters on the parent based upon characteristics and qualities of the different child objects:

  1. Locating Project__c records with no subject areas at all.
  2. Locating Project__c records with more than five subject areas.
  3. Locating Project__c records with subject areas whose name contains ‘Industry’.
  4. Locating Project__c records without subject areas whose name contains ‘Industry’.
  5. Locating Project__c records with at least one Subject_Area__c record that itself has at least one Subject_Area_Expert__c assignment with current dates to a Contact whose Title is “Solution Architect”.

We should also note that some of these requirements can potentially be met using native Salesforce reports with Cross Filters, or by the use of Declarative Lookup Rollup Summaries.

There are several basic ways to approach constructing these queries:

  • Using a parent-child query and performing filtering in Apex.
  • Using a parent query with an IN or NOT IN child subquery with filter (semi-join/anti-join).
  • Using a child aggregate query and postprocessing in Apex.
  • Linking multiple queries of these kinds by performing a synthetic join in Apex.

Parent-Child Query

for (Project__c sr : [SELECT Id, 
                             (SELECT Id 
                              FROM Subject_Areas__r 
                              WHERE Name LIKE '%Industry%') 
                      FROM Project__c 
                      WHERE Client__c = :clientId]) {
    if (sr.Subject_Areas__r.size() == 0) {
        // We've identified a Project on the client whose Id 
        // is `clientId` with no Subject Areas that contain 'Industry'.
        // (but potentially other Subject Areas)
        // Do something about it.
    }
}

This pattern is appropriate for use when the filtration criteria for child objects are very complex or involve relationships that cannot be easily expressed in SOQL, and for situations where we are looking for a zero count of child objects. Note that the WHERE filters on the parent and child query are, strictly speaking, optional, although performance in many cases will necessitate selective query filters. It is a simple pattern and can implement requirements 1-4, but not requirement 5, as only one level of child relationship can be traversed.

This can be an anti-pattern for situations with high data volume in the parent or child object, when query performance will be a huge concern and heap size could become an issue. Adding selective filters on the parent object, adding filters on the child object, and reducing the number of columns queried will help obviate these issues.

While parent-child queries can only descend one level of relationship in the object hierarchy, Apex post-processing can gather child object Ids and re-query to descend additional plies.

Parent Query with IN or NOT IN Child Subquery (Semi-Join/Anti-Join)

Parent queries with IN and NOT IN child subqueries are particularly useful for cases 3 and 4. This query format takes advantage of the fact that the subquery is treated as returning a typed Id value, not an sObject instance. If we ran the subquery below, on Subject_Area__c, separately in Apex, we’d get back a List<Subject_Area__c>, which we’d have to iterate over and accumulate parent Project__c Ids before re-querying that object. When we present it as a subquery, no intermediate steps or type conversion are required; the Ids are used directly. (Salesforce does require that they be Ids of the correct type of object, however).

SELECT Id 
FROM Project__c 
WHERE Id NOT IN (SELECT Linked_Project__c 
                 FROM Subject_Area__c 
                 WHERE Name LIKE '%Industry%')

See Semi Joins and Anti-Joins for more in-depth information, including the numerous restrictions that apply to this type of query. In particular, note that you can only use two IN semi-joins or anti-joins in a single query.

It’s also important to note that this pattern isn’t limited to parent-child filtration. The semi-join and anti-join can be used to filter Object A based upon Object B using any shared reference field, including reference fields on the two objects that both point to some other Object C. See the Salesforce documentation linked above for in-depth discussion of more use cases.

This pattern is particularly suitable for case 3 and case 4, where we’re looking for parent objects based on specific criteria on (but not volume of) child objects. Note that it can return parent object data but doesn’t include child object data unless we include a separate sub-select.

In particularly complex cases or if multiple levels of subquery are required, it’s necessary to run the subqueries separately in Apex and accumulate relevant Ids in a Set for inclusion in the next query layer. Salesforce only allows one level of semi-join or anti-join, so we perform a sort of synthetic join in Apex of two separately expressed SOQL queries. For example, we can cover case 5 in a fashion like this:

// 5. Locating `Project__c` records with at least one `Subject_Area__c` record that itself has at least one `Subject_Area_Expert__c` assignment with current dates to a Contact whose Title is "Solution Architect".

List<Subject_Area_Expert__c> experts;

experts = [SELECT Subject_Area__c
           FROM Subject_Area_Expert__c
           WHERE Contact__r.Title = 'Solution Architect'
                 AND Start_Date__c <= TODAY
                 AND End_Date__c >= TODAY];
Set<Id> subjectAreaIds = new Set<Id>();

for (Subject_Area_Expert__c e : experts) {
    subjectAreaIds.add(e.Subject_Area__c);
}

List<Project__c> = [SELECT Id 
                    FROM Project__c 
                    WHERE Id IN (SELECT Linked_Project__c 
                                 FROM Subject_Area__c 
                                 WHERE Id IN :subjectIds)];

The same Apex/SOQL pattern can be extended to cover a more or less arbitrary depth of complexity, up to the point where governor limits are implicated.

Child Aggregate Query

The child aggregate query is suitable for locating (and ordering) parent records based on a non-zero count of child records, optionally matching the child records against some criterion. As such, it can cover our Cases 2 and 3 well. It’s not suitable for any situation where parent records without children should be included. It can provide a count of child records matching specific criteria, but doesn’t return the child or parent record data itself. The query can express some filtration that would otherwise need to be performed in Apex.

SELECT count(Id), Linked_Project__c 
FROM Subject_Area__c 
WHERE Name LIKE '%Industry%' 
GROUP BY Linked_Project__c 
HAVING count(Id) > 2 
ORDER BY count(Id) DESC

This query will give us a List<AggregateResult> with the Ids of every Project__c having at least two associated Subject_Area__c records whose Names contain ‘Industry’, in descending order of the count of such records. We’ll have to re-query to get more information about the Projects themselves or otherwise post-process the AggregateResult list in Apex.

In large data volume environments, selectivity may be a challenge because the filters on the child object that are relevant to the desired parent object set may not be especially narrow relative to the overall child object data set. Testing and query planning will help pinpoint any performance issues.

Headed for the Next Milestone

For 2018, I’ve set my sights on becoming a Salesforce Certified Application Architect. (My 2017 goal was Platform Developer II, which I did complete). To reach Application Architect, I need three certifications: Sharing and Visibility Designer, Data Architecture and Management Designer, and App Builder.

Yesterday, I passed the first milestone: Certified Sharing and Visibility Designer, my first architect-tier certification.

Sharing and Visibility Designer

It was a challenging exam and pushed me to study and implement a number of areas of functionality, like Territory Management, with which I hadn’t previously gotten hands-on experience. I’m excited to tackle Data Architecture and Management Designer for the second half of the year.

Talk on Continuous Integration Now on YouTube

My presentation from PhillyForce ‘18, “Continuous Integration with Salesforce DX: Practices and Principles for All”, is now available on YouTube.

This talk draws on several past articles published here:

Some additional resources and examples are available on my GitHub:

  • circleci-sfdx-examples, a compendium of CircleCI/Salesforce DX examples, including a basic project, using the Lightning Testing Service, testing against multiple org shapes, and using PMD static analysis.
  • sfdx-simplesalesforce, demonstrating how to test integrated code written in Python with simple_salesforce via Salesforce DX scratch orgs.
  • septaTrains, my toy Lightning project (ever wanted your SEPTA regional rail commute on your Lightning homepage?) and testing project for different CI and automated testing solutions.
  • DMRNoteAttachmentImporter, a slightly more useful package also building with SFDX on CircleCI.

The Curious Nature of the Salesforce Boolean

The Boolean can be among the simplest data types: it is either true or false, full stop, no complications.

On the Salesforce platform, this could hardly be further than the truth.

Consider the following bug, in a simplified example:

public class BooleanPropertyCheck {
    public Boolean myProperty { get; set; }

    // ... more code here ...
}

@isTest
public class BooleanPropertyCheckTest {
    @isTest
    public static void testTheThing() {
        BooleanPropertyCheck bpc = new BooleanPropertyCheck();

        System.assert(bpc.myProperty);
    }
}

That ought to be fine - we’re asserting a Boolean value, which is what asserts are for. (Although one might reasonably expect the assertion itself to fail here). But in fact we get not an assertion failure or success, but a NullPointerException. This reveals several things, both specific and broad:

  1. The value of myProperty is null when not initialized;
  2. null is not equivalent to false in Apex;
  3. The System.assert() method throws an exception when passed null, but not a failed assertion as such;
  4. Boolean variables in Apex are actually tri-valued: they can be true, false, or null, and they default to null.

Note that it doesn’t matter here whether we define myProperty using property syntax or as a simple Boolean instance variable. The behavior is the same.

It’s the case in other contexts as well that referencing a null Boolean value produces a NullPointerException rather than evaluating to false. (This can be particularly confusing to debug, as we’re conditioned to look for property access when troubleshooting a NullPointerException). These conditionals, for example, produce NullPointerException:

Boolean b = null;

System.debug(b ? 'true' : 'false'); // NullPointerException
if (b) System.debug('b is true!'); // NullPointerException

At a certain level, this is fair enough. We have some weird behaviors to look out for, but there’s nothing all that wrong with tri-valued Booleans (although it’s arguable that the behavior of System.assert(null) in particular is poorly designed). We should never rely on uninitialized values in our classes or local variables anyway, but always explicitly initialize them.

What’s perhaps more confusing, though, is that there are other layers of the Salesforce platform where Booleans are not treated as tri-valued.

Booleans and SOQL

Consider SOQL. The SOQL and SOSL Reference has this to say on filtering on Booleans:

You can use the Boolean values TRUE and FALSE in SOQL queries. To filter on a Boolean field, use the following syntax: WHERE BooleanField = TRUE WHERE BooleanField = FALSE

But we learn elsewhere in the reference another facet:

In a WHERE clause that uses a Boolean field, the Boolean field never has a null value. Instead, null is treated as false. Boolean fields on outer-joined objects are treated as false when no records match the query.

So Booleans in the database and in Apex are tri-valued, but in SOQL are treated as binary-valued. A filter in SOQL against null is treated exactly like one written against false, so the following queries are treated as equivalent:

[SELECT Id FROM Boolean_Object__c WHERE Boolean__c = false]
[SELECT Id FROM Boolean_Object__c WHERE Boolean__c = null]

Given this sort of almost-mismatch between SOQL and Apex, one might suppose that one could see the following happen, if we really drive a wedge into the gap:

Boolean_Object__c b = new Boolean_Object__c();

System.assertEquals(null, b.Boolean__c); // Should pass
b.Boolean__c = null;
System.assertEquals(null, b.Boolean__c); // Should pass

insert b;

Boolean_Object__c c = [SELECT Boolean__c 
                       FROM Boolean_Object__c
                       WHERE Id = :b.Id];

System.assertEquals(b.Boolean__c, c.Boolean__c); // Should fail.

But in fact we don’t, because sObject instances don’t behave like other Apex classes. In fact, assertion 1 fails, because Booleans are initialized not to null but to false in sObject class instances. Further, we’ll get a DMLException at insert b, because null is not treated as a legal value for inserting Boolean (Checkbox) fields. We can’t actually create a situation where there’s a null Boolean in the database.

So Booleans are tri-valued in Apex, but are clamped to true/false around DML statements and SOQL queries, and there are some special-case behaviors to remain aware of. In particular, users of wrapper or shadow Apex classes from which sObject instances are ultimately generated should keep in mind that the Boolean initialization behaviors differ between the two class types.

Consequences: Hierarchy Custom Settings

One area where this curious Boolean behavior has practical consequences is the Hierarchy Custom Setting. Custom Settings, of course, are custom objects, and they can contain checkbox fields. But Hierarchy Custom Settings have a unique feature allowing them to cascade populated field values down the hierarchy (Organization to Profile to User) until overriden by a non-null value at a lower level.

Suppose we have a custom setting Instance_Settings__c, with a single Checkbox field Run__c and some arbitrary set of other fields - say Test__c, a text field.

If we populate these fields at the Organization level (SetupOwnerId = UserInfo.getOrganizationId()), we expect rightly that the values at the Organization level will cascade down to the User level, if they’re not overriden. And for Test__c, our text field, that’s true. If we set that field to "foo" at the Organization level, and null at the User level, sure enough our Instance_Settings__c.getInstance() value will inherit the Organization’s value "foo".

But Booleans work differently, because they’re treated as binary-valued here - they’re never null. So a true value for Run__c will never cascade down to the User or Profile level of our Custom Setting if an instance is populated at that level. The instances at those levels always have false set for that Checkbox field if we don’t explicitly populate true upon creation at that setup level.

Instance_Setting__c s = new Instance_Setting__c();
s.Test__c = 'So say some';
s.Run__c = true;
s.SetupOwnerId = UserInfo.getOrganizationId();
insert s;

Instance_Setting__c instance = Instance_Setting__c.getInstance();

System.assertEquals('So say some', instance.Test__c); // Passes
System.assert(instance.Run__c); // Passes

s = new Instance_Setting__c();

s.SetupOwnerId = UserInfo.getUserId();
s.Test__c = 'So say we all';
insert s;

instance = Instance_Setting__c.getInstance();

System.assertEquals('So say we all', instance.Test__c); // Passes
System.assert(instance.Run__c); // Fails

The Upshot is…

  • Booleans are more complicated that one might expect.
  • Never rely on behavior around uninitialized variables (in any language!)
  • NullPointerException can arise even without a dot-notation object dereference.
  • Booleans can’t be relied upon to cascade in Hierarchy Custom Settings.

In Response to 10,000 GDPR Emails

Over the last week or two, every website in the known universe has sent us all emails about the GDPR and the vast scope of what data they collect and what they do with it. As a tiny response to this trend, I’ve gone the other direction: I’ve removed all analytics from ktema.org.

While I cannot speak for my hosting and infrastructure services (GitHub and Cloudflare), this site collects no information at all, sets no cookies, maintains no logs of any kind, and knows nothing at all about you. Because really, why would it need to?