Templating with `String.format()` and `String.join()` for Better Dynamic SOQL

Dynamic SOQL with complex queries and filters can easily become an unreadable mess. Consider a query like this one:

String query = 'SELECT Id ' +
    'FROM ' + objectName +
    'WHERE ' + contactLookupName + '.Title = :title ' +
    'AND ' + filterField + ' = \'' + filterValue + '\'' +
    'AND ' + startDateField + ' <= ' + dateValue
    'AND ' + endDateField + ' >= ' + dateValue;

return query;

What it’s trying to do is query a dynamically-determined child object of Contact for records associated to Contacts with a specific title, and which are valid for a specific date.

This query also exhibits (at least) five common issues with Dynamic SOQL, in no particular order:

  1. Issues with spacing. A missing space after objectName results in a parse error at runtime, since Dynamic SOQL syntax cannot be checked at compile time. These errors are very easy to make in Dynamic SOQL and aren’t always apparent upon inspection.
  2. Hanging Apex variable binds. The :title bind will be evaluated against the local scope at the time this query string is passed to Database.query(), not at the time of its creation. This makes storing and passing query strings as parameters fraught with risk: either you must construct and use all query strings within a single scope, or you must not use Apex binding (for which see more below), or you must risk evaluating binds in a different scope and introduce a non-compiler-checkable dependency in your code.
  3. SOQL injection vulnerabilities. Presumably user-supplied values (filterField, filterValue) aren’t escaped with String.escapeSingleQuotes() to mitigate potential injection attacks.
  4. Incorrect Date format conversion. While it’s legal and compiles, implicit Date-to-String conversion by doing myString + myDate results in the wrong Date format for SOQL and will yield runtime errors. It’s critical to convert Dates to SOQL format with String.valueOf(myDate), or to use Apex binding instead (resurfacing issue 2).
  5. Lack of readability and maintainability. The query is hard to read. It’s hard to format the code well. It’s hard to parse out what is user input and what’s not, and to get a sense of what the query’s overall structure looks like.

I strongly prefer to use a query template string with Dynamic SOQL, coupled with String.format(). This structure helps maintain a clean separation between the static core and dynamic parameters of the query, and can help mitigate issues 1 and 5, while making it easier to see and address issues 2, 3, and 4. Here’s what the query above might look like in that form:

final String queryTemplate;

queryTemplate = 'SELECT Id FROM {0} WHERE {1}.Title = ''{2}'' AND {3} = ''{4}'' AND {5} <= {6} AND {7} >= {6}';

return Database.query(
    String.format(
        queryTemplate
        new List<String> {
            String.escapeSingleQuotes(objectName),
            String.escapeSingleQuotes(contactLookupField),
            String.escapeSingleQuotes(title),
            String.escapeSingleQuotes(filterField),
            String.escapeSingleQuotes(filterValue),
            String.escapeSingleQuotes(startDateField),
            String.valueOf(dateValue),
            String.escapeSingleQuotes(endDateField)
        }
    )
);

By making this switch, we increase the readability of the query, clearly showing which elements are dynamic and which are the static core. We eliminate spacing issues. The compile-time type checking on the List<String> ensures that we cannot (unless we have additional logic in the parameters to new List<String>{}) perform implicit Date conversion, or other implicit type conversions.

While we’re still vulnerable to SOQL injection, explicitly listing out our parameters helps make clear which user-controlled values might need to be escaped. (Note that we’re escaping every user-supplied parameter here, even those for which a rogue quote doesn’t pose an injection risk; this keeps the static analyzer happy!)

String.format() isn’t a silver bullet: you still have to remain aware of Dynamic SOQL issues.

While this structure encourages us to use string substitution in lieu of Apex binds, we can still use binds and still encounter dangling bind issues. We’d typically want to retain use of binds anywhere a collection is used with IN. It’s best to keep those binds within a single method, rather than returning a query string that contains bind values, to avoid those dangling references.

It’s easier to show that user input is correctly escaped in this format, but it’s still not enforced by the compiler. A good static analyzer is needed to locate such issues.

Lastly, String.format() comes with a few interesting wrinkles due to its inheritance of the format used by Java’s MessageFormat. The most obvious consequence of this is the need to repeat all single quotes in the template string, as above (''{0}''). See the Java documentation for more nuances involved in this approach.

Dynamic SELECT Clauses

Dynamic SOQL templating doesn’t do anything whatsoever for enforcement of field-level security, which we need to handle in our code. Building SELECT clauses dynamically is another common source of string errors, too. Luckily, we can build on String.format() templating to handle dynamic SELECT clauses cleanly by using collections and String.join().

It’s tempting to build a dynamic SELECT query by just performing string concatenation, but this approach is inelegant and error-prone.

String query = 'SELECT Id'; // base query
// Add fields the user picked
for (String fieldNameOne : userSelectedFields) {
    query = query + ', ' + fieldNameOne;
}
// Add other fields from a field set or Custom Metadata
for (String fieldNameTwo : someObject.fieldSetA) {
    if (!query.contains(fieldNameTwo)) {
        query = query + ', ' + fieldNameTwo;
    }
}
// Do more work to build the query...

is about as clean as this approach gets, and I’ve seen far worse. It’s easy to make simple mistakes that cause hard-to-debug syntax errors, field deduplication to avoid QueryExceptions is messy (note that the above example doesn’t actually work in all cases!), and it’s hard to handle FLS properly.

Much better is to build the SELECT clause by constructing a List<String> whose contents are the API names of the desired fields. This offers three benefits:

  1. Easy production of a final, valid SELECT clause by applying String.join(), and templating in with String.format().
  2. Easy deduplication of the list with List.contains().
  3. Easy enforcement of FLS.

One natural pattern for accumulating and querying a list of fields with FLS enforcement goes like this (note that this assumes none of the selected fields include relationship paths):

List<String> fieldList = new List<String> { 'id', 'name' };
List<String> userSuppliedFields = getUserSuppliedFields();
String sobjectName = 'Contact';

// Build a single deduplicated list of fields.
for (String s : userSuppliedFields) {
    if (!fieldList.contains(s)) {
        // List.contains() is a case-sensitive comparison.
        fieldList.add(s.toLowerCase());
    }
}

// Iterate over fieldList and check accessibility for the object we're using
Schema.DescribeSObjectResult dsr = Schema.getGlobalDescribe().get(sobjectName).getDescribe();

for (String f : fieldList) {
    if (!dsr.fields.getMap().get(f).getDescribe().isAccessible()) {
        // We don't have read permission on this field.
        // Perform some validation action - remove this field,
        // display a PageMessage or Toast, etc.

        // Note that we omit an important check - validating
        // that user input is a real field! `NullPointerException` 
        // can result from the above `if` if `getUserSuppliedFields() 
        // doesn't implement this.
    }
}

// then query:

String query = String.format(
    'SELECT {0} FROM {1}',
    new List<String> {
        String.join(fieldList, ', '),
        sobjectName
    }
);

So there we are: clean, secure, testable Dynamic SOQL with just a touch of the Standard Library.

Testing Chatter in Apex without `seeAllData=true`

One of the key limitations of the Connect API, which provides interaction with Chatter inside Apex, is that unit tests of Connect API code typically require the use of seeAllData=true:

Chatter in Apex methods don’t run in system mode, they run in the context of the current user (also called the context user). The methods have access to whatever the context user has access to. Chatter in Apex doesn’t support the runAs system method.

Most Chatter in Apex methods require access to real organization data, and fail unless used in test methods marked @IsTest(SeeAllData=true).

While some Connect API methods provide dedicated setTest...()-style methods to supply standardized data, others simply cannot be called at all in test context without the seeAllData=true annotation - something all good Apex programmers avoid like the plague!

In some very simple cases, it may be enough to guard a Connect API call with if (!Test.isRunningTest()) { }. If used as a one-liner, it’ll even still receive code coverage. But, of course, this abrogates our duty to validate code behavior in our unit tests. To do that, we need to apply dependency injection so that we can see what our code under test is doing without actually invoking the Connect API.

Here’s a simple and very explicit example. Note that I’m not using a mocking library or StubProvider here, but hand-building a mock for this situation. For simple needs in environments that don’t use a pervasive mocking framework, I like this explicit model because it’s direct and easy to follow.

The outer class PostsToChatter represents our class that calls the Connect API.

We establish an inner interface, ChatterPoster, that must be implemented by our mock or delegate class, an instance of another class we’ll pass in that does all the action work of calling the Connect API. Think of it as a very thin abstraction layer - at times, using this pattern can even help genericize code to, for example, support notifications over either Chatter or email, simply by supplying a different delegate.

public class PostsToChatter {
    public interface ChatterPoster {
        void postToChatter(Id parentRecord, String text);
    }
 
    @TestVisible private class RealChatterPoster implements ChatterPoster {
        public void postToChatter(Id parentRecord, String text) {
            if (!Test.isRunningTest()) callConnectAPIHere(parentRecord, text);
            // Note that if we keep this minimal (one line) we still obtain full code coverage even though the Connect API call does not actually go through.
        }
    }

    private ChatterPoster myPoster;

    public PostsToChatter() {
        // Add a default delegate (the real Chatter poster class)
        this(new RealChatterPoster());
    }

    // This constructor could be `@TestVisible private` if the delegate isn't part of our public API.
    public PostsToChatter(ChatterPoster delegate) {
        myPoster = delegate;
    }

    public void doSomethingRequiringAChatterPost() {
        // ... stuff happens ...
        myPoster.postToChatter(someId, someText);
    }
}

Then, in test context, we have the freedom to inject a completely different class that implements PostsToChatter.ChatterPoster, but doesn’t really call the Connect API. Meanwhile, because we’ve kept our actual-Chatter delegate so tiny and simple (and because we’ve included Test.isRunningTest() guards), we can still call it in test context to get the code coverage, even though we can’t validate its behavior like we ought to.

Here’s an example of what one approach to this test class could look like. (This omits the boilerplate test case to cover the RealChatterPoster inner class). In this simple implementation, our mock poster makes assertions when called about its own parameters. A more advanced implementation would store them for validation by the test class.

@isTest
public static class PostsToChatterTEST {
    private class MockChatterPoster implements PostsToChatter.ChatterPoster {
        public Id expectedId;
        public String expectedString;
        public Integer timesCalled = 0;

        public void postToChatter(Id parentRecord, String text) {
            System.assertEquals(expectedId, parentRecord, 'correct parent');
            System.assertEquals(expectedString, text, 'correct post body');
            timesCalled++;
        }
    }
    
    @isTest
    public static void testTheClass() {
        MockChatterPoster mp = new MockChatterPoster();
        PostsToChatter p = new PostsToChatter(mp);
        
        // ... invoke class functionality...
         
        System.assertEquals(1, mp.timesCalled, 'chatter post called');
        // Post contents are validated in the mock - or store the supplied parameters 
        // in the mock and validate them here, if preferred.
    }
}

Avoiding seeAllData=true is still possible! It’s true that many real-world Chatter applications will involve more complex uses of the large and sophisticated Connect API, of which the feed posting shown here is only a small selection. In those cases, it will take work - or might even be infeasible - to apply the dependency-injection approach. The refactoring effort is likely to pay off, however, in quicker, easier, more reliable testing, and easier extensibility for the future.

For many straightforward uses of Chatter, though, like posting of notifications, code similar to the above will be enough to cover the testing needs.

Filtering in Lightning's Activity Timeline

The Lightning Experience’s record pages come with a very nice Activity timeline and publisher.

Lightning Activity Timeline

One of the features of this timeline is highlighting different types of activities with topical icons, and permitting the user to apply filters on the fly to isolate activities of interest - like just Emails, or just Calls, or all Tasks owned by the user.

Filters for Activity Timeline

How, though, does Lightning distinguish between these different activities to populate the various filters? And further, can we control that filtration to place activities we generate under specific filter headings?

The answers turn out to be “a bit of magic” and “sort of.” Lightning recognizes five categories of activity: Emails, Events, List Emails, Logged Calls, and Tasks.

Events are distinguished from Emails, List Emails, Logged Calls, and Tasks by sObject type. Events have the object type Event, while the other four are all of type Task. The Emails, List Emails, Logged Calls, and Tasks filters work differently. All of them filter based upon the picklist value in the field TaskSubtype on the Task object.

Confusingly, this field isn’t connected to the Type field whatsoever - it’s not a dependent field, and the Type field plays no role in Activity filtering.

This field is populated by the system, and cannot be changed. At the database level, it’s createable but not updateable.

[TaskSubtype] Provides standard subtypes to facilitate creating and searching for specific task subtypes. This field isn’t updateable.

The four values of this picklist (which is restricted) are Email, ListEmail, Call, and Task, each of which maps to exactly one filter. Email and ListEmail are entirely separate; they don’t appear in one another’s filtered view.

Manipulating the TaskSubtype

Knowing, then, how Lightning sorts tasks under different filter headings, can we manipulate the filters to sort our custom activities in bespoke ways? Only to a slight extent, as it turns out.

We cannot change the behavior of the filters. The Activity timeline is a non-configurable Lightning component; we can add it to record pages, but it doesn’t have any public attributes for us to set.

Because the TaskSubtype field is createable, but not updateable, we cannot move existing records from one category to another.

The one route we have is to override the category at the time of creation of the Task, and there are some unique behaviors to this approach. When inserting records via the publisher, if the TaskSubtype is ultimately going to be 'Task' (i.e., it is not an Email, List Email, or Call), we can, in a before insert trigger on Task, set the TaskSubtype to one of the other three values. A putative Task can be transmogrified into an Email, Call, or List Email:

Task converted to Call

However, this does not work in the other direction. Records that are being created from the publisher as Emails, List Emails, or Calls can’t have their TaskSubtype overridden to 'Task', or to any of the other available values. Attempting to do so has no effect on the created Task, although it doesn’t cause an error. The TaskSubtype field is null upon publisher creation; it’s set behind the scenes at some point between the before and after trigger invocations, but by the time we reach after insert, the field’s inherent non-updateability takes over.

None of this applies to Tasks inserted via Apex. If the publisher isn’t the source of the Task, any TaskSubtype value can be transformed into any other.

There’s still one more caveat of applying this technique, though: if the Task-to-be-converted is added via the publisher, Chatter records the original type of the task in its feed:

Chatter post

This mismatch does not occur when the Task is inserted via code, which doesn’t produce a Chatter post and hence preserves the illusion of being (say) a Call the entire time.

The question of whether it’s wise to manipulate the Activity timeline in this way is another story. Because this functionality is at least somewhat undocumented, I wouldn’t rely on the publisher continuing to work exactly the same way in future API versions. Vote for this Idea to make TaskSubtype editable!

One last interesting facet: there’s similar data model on Event, with Event.EventSubtype. Like with Task, this field isn’t updateable. However, permitted values are not documented, and the picklist has only a single value, ‘Event’, which is populated on standard events. Perhaps we’ll see more functionality around timeline filtering in future releases.

fix15 - A New Tool to Support Data Loads and ETLs

I’ve released a new Python tool, fix15. This tool aims to simplify Salesforce data-load and ETL workflows by seamlessly converting 15-character Id values to their corresponding 18-character Ids in CSV data.

There are existing online Id converters, but I wanted a solution I could use from the command line as I prepped and manipulated files for data load, without copying and pasting or using complex, very slow array formulas. With fix15, just do

fix15 -c Id -c AccountId -i test.csv -o done.csv

to convert the columns “Id” and “AccountId” in the file test.csv and write the result to done.csv.

The tool is tiny, tested, and MIT-licensed. It has no dependencies outside the Python standard library and doesn’t require access to Salesforce (or, for that matter, a network connection).

Everyday Salesforce Patterns: Filtering Parent Objects By Child Objects

Sometimes, we need to filter an Account query by its Contacts, or some custom object Project__c by its associated Subject_Area__c records. There might not be rollup summary fields in place, or the criteria might go beyond what rollups can do, or we might be dealing with a lookup relationship, forcing us to express this filtration directly in SOQL or in Apex.

Suppose, for example, that we do have a custom object Project__c. A second custom object Subject_Area__c is connected to it by a lookup relationship, Linked_Project__c, with the relationship name Subject_Areas. Suppose further that Subject_Area__c also has a junction object Subject_Area_Expert__c creating a many-to-many relationship to Contact; Subject_Area_Expert__c also records the dates an expert is assigned to that subject area.

We’d like to be able to a perform a variety of queries that express different kinds of filters on the parent based upon characteristics and qualities of the different child objects:

  1. Locating Project__c records with no subject areas at all.
  2. Locating Project__c records with more than five subject areas.
  3. Locating Project__c records with subject areas whose name contains ‘Industry’.
  4. Locating Project__c records without subject areas whose name contains ‘Industry’.
  5. Locating Project__c records with at least one Subject_Area__c record that itself has at least one Subject_Area_Expert__c assignment with current dates to a Contact whose Title is “Solution Architect”.

We should also note that some of these requirements can potentially be met using native Salesforce reports with Cross Filters, or by the use of Declarative Lookup Rollup Summaries.

There are several basic ways to approach constructing these queries:

  • Using a parent-child query and performing filtering in Apex.
  • Using a parent query with an IN or NOT IN child subquery with filter (semi-join/anti-join).
  • Using a child aggregate query and postprocessing in Apex.
  • Linking multiple queries of these kinds by performing a synthetic join in Apex.

Parent-Child Query

for (Project__c sr : [SELECT Id, 
                             (SELECT Id 
                              FROM Subject_Areas__r 
                              WHERE Name LIKE '%Industry%') 
                      FROM Project__c 
                      WHERE Client__c = :clientId]) {
    if (sr.Subject_Areas__r.size() == 0) {
        // We've identified a Project on the client whose Id 
        // is `clientId` with no Subject Areas that contain 'Industry'.
        // (but potentially other Subject Areas)
        // Do something about it.
    }
}

This pattern is appropriate for use when the filtration criteria for child objects are very complex or involve relationships that cannot be easily expressed in SOQL, and for situations where we are looking for a zero count of child objects. Note that the WHERE filters on the parent and child query are, strictly speaking, optional, although performance in many cases will necessitate selective query filters. It is a simple pattern and can implement requirements 1-4, but not requirement 5, as only one level of child relationship can be traversed.

This can be an anti-pattern for situations with high data volume in the parent or child object, when query performance will be a huge concern and heap size could become an issue. Adding selective filters on the parent object, adding filters on the child object, and reducing the number of columns queried will help obviate these issues.

While parent-child queries can only descend one level of relationship in the object hierarchy, Apex post-processing can gather child object Ids and re-query to descend additional plies.

Parent Query with IN or NOT IN Child Subquery (Semi-Join/Anti-Join)

Parent queries with IN and NOT IN child subqueries are particularly useful for cases 3 and 4. This query format takes advantage of the fact that the subquery is treated as returning a typed Id value, not an sObject instance. If we ran the subquery below, on Subject_Area__c, separately in Apex, we’d get back a List<Subject_Area__c>, which we’d have to iterate over and accumulate parent Project__c Ids before re-querying that object. When we present it as a subquery, no intermediate steps or type conversion are required; the Ids are used directly. (Salesforce does require that they be Ids of the correct type of object, however).

SELECT Id 
FROM Project__c 
WHERE Id NOT IN (SELECT Linked_Project__c 
                 FROM Subject_Area__c 
                 WHERE Name LIKE '%Industry%')

See Semi Joins and Anti-Joins for more in-depth information, including the numerous restrictions that apply to this type of query. In particular, note that you can only use two IN semi-joins or anti-joins in a single query.

It’s also important to note that this pattern isn’t limited to parent-child filtration. The semi-join and anti-join can be used to filter Object A based upon Object B using any shared reference field, including reference fields on the two objects that both point to some other Object C. See the Salesforce documentation linked above for in-depth discussion of more use cases.

This pattern is particularly suitable for case 3 and case 4, where we’re looking for parent objects based on specific criteria on (but not volume of) child objects. Note that it can return parent object data but doesn’t include child object data unless we include a separate sub-select.

In particularly complex cases or if multiple levels of subquery are required, it’s necessary to run the subqueries separately in Apex and accumulate relevant Ids in a Set for inclusion in the next query layer. Salesforce only allows one level of semi-join or anti-join, so we perform a sort of synthetic join in Apex of two separately expressed SOQL queries. For example, we can cover case 5 in a fashion like this:

// 5. Locating `Project__c` records with at least one `Subject_Area__c` record that itself has at least one `Subject_Area_Expert__c` assignment with current dates to a Contact whose Title is "Solution Architect".

List<Subject_Area_Expert__c> experts;

experts = [SELECT Subject_Area__c
           FROM Subject_Area_Expert__c
           WHERE Contact__r.Title = 'Solution Architect'
                 AND Start_Date__c <= TODAY
                 AND End_Date__c >= TODAY];
Set<Id> subjectAreaIds = new Set<Id>();

for (Subject_Area_Expert__c e : experts) {
    subjectAreaIds.add(e.Subject_Area__c);
}

List<Project__c> = [SELECT Id 
                    FROM Project__c 
                    WHERE Id IN (SELECT Linked_Project__c 
                                 FROM Subject_Area__c 
                                 WHERE Id IN :subjectIds)];

The same Apex/SOQL pattern can be extended to cover a more or less arbitrary depth of complexity, up to the point where governor limits are implicated.

Child Aggregate Query

The child aggregate query is suitable for locating (and ordering) parent records based on a non-zero count of child records, optionally matching the child records against some criterion. As such, it can cover our Cases 2 and 3 well. It’s not suitable for any situation where parent records without children should be included. It can provide a count of child records matching specific criteria, but doesn’t return the child or parent record data itself. The query can express some filtration that would otherwise need to be performed in Apex.

SELECT count(Id), Linked_Project__c 
FROM Subject_Area__c 
WHERE Name LIKE '%Industry%' 
GROUP BY Linked_Project__c 
HAVING count(Id) > 2 
ORDER BY count(Id) DESC

This query will give us a List<AggregateResult> with the Ids of every Project__c having at least two associated Subject_Area__c records whose Names contain ‘Industry’, in descending order of the count of such records. We’ll have to re-query to get more information about the Projects themselves or otherwise post-process the AggregateResult list in Apex.

In large data volume environments, selectivity may be a challenge because the filters on the child object that are relevant to the desired parent object set may not be especially narrow relative to the overall child object data set. Testing and query planning will help pinpoint any performance issues.