Handling Nullable Activity Outputs using ‘?’
Author: Ralph Gonzales
Life as a Data Engineer requires us to handle data integration across multiple source systems, and Azure Data Factory (ADF) has been our go-to platform for orchestrating and managing data workflows efficiently. One of the lesser-known and lesser-documented, yet immensely powerful, operators within ADF expressions is the question mark (?). It lets you reference an activity output, or a property inside that output, that may not exist at run time without the expression failing. In this post, we will explore how this simple yet impactful operator can make your data pipelines more robust when dealing with potentially null properties.
The Challenge of Nullable Activity Outputs
Nullable properties can introduce complexities in data processing workflows, potentially disrupting the seamless execution of your ADF pipelines. Whether it’s a missing timestamp, an absent key field, or any other nullable property, ensuring graceful handling is crucial for maintaining the reliability of your pipelines.
A recent client engagement as an example
I was required to extract data from a cloud-based system using API calls, and I used ADF to orchestrate the extraction and the storage of the data in a Data Lake. The API calls to the source system return data in JSON format. Because the API limits the number of items it can return in a single call, large datasets come back in fragments, and I had to make several calls with different page numbers to extract a complete dataset. I designed my pipelines to handle this pagination, which relies heavily on a field named ‘totalHits’. However, this field only appears in the output if the number of items exceeds the maximum allowed for a single API call. Because the field is not always present, any expression that references it directly puts the pipeline at significant risk of failing at run time.
The ‘?’ Operator to the rescue
The question mark, placed directly after the object whose property you are reading, lets you reference a property that may or may not be present in your activity output. If the property is missing, the expression returns null instead of causing the pipeline to fail.
The syntax is straightforward. For example, here is a code snippet I used in my scenario above.
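A minimal sketch of an expression of this kind is shown here. The activity name GetPage is a hypothetical placeholder for illustration, not the name used in the engagement; the field name totalHits comes from the scenario above.

```
@activity('GetPage').output?.totalHits
```

If totalHits is present in the output of the Web Activity, the expression returns its value; if it is absent, the expression should evaluate to null rather than raising an error.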
Here, the field ‘totalHits’ represented the property I was assessing in the Web Activity output. This simple yet powerful expression saved me from writing a long, complicated IF statement to check whether the field exists before reading it, which would otherwise have been needed to keep the pipeline from failing.
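Because a missing property comes back as null, the result can be combined with the built-in coalesce function to supply a fallback value. The sketch below rests on two assumptions: the activity name GetPage is hypothetical, and the default of 0 is an illustrative choice rather than the value used in the engagement.

```
@coalesce(activity('GetPage').output?.totalHits, 0)
```

With this form, downstream pagination logic always receives a number: the real total when the API reports one, and 0 when the field is absent, which in this scenario means the whole dataset fit in a single call.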
I consider the question mark ‘?’ operator an unsung hero when dealing with nullable properties. Its ability to safely evaluate properties that may be missing or null keeps your ADF pipelines resilient to failures and flexible enough to accommodate the ever-changing nature of data. It is simple to apply, and it saves developers from otherwise lengthy expressions and additional activities in their pipelines.