On This Page

Qualities of an Enterprise Data Warehouses
One of the qualities of an Enterprise Data Warehouse (EDW) and there are many, is the fact that the complexity of obtaining source data is taken care of and need not be a consideration when utilizing/consuming the data in an EDW. This becomes even more apparent when the various sources required in your EDW are spread across multiple locations such as web-based accounting systems, on-premise survey data, or cloud-based CRM (Customer Relationship Management) systems.
The Salesforce requirement
During a recent project, one of the needs while developing an EDW for an international health advisory client was a Power BI suite of visuals specific to CRM-related data. (Another quality of a correctly modelled EDW is that most any data consumer such as SAS/R, Power BI/Tableau/Qlik, SSAS can quickly connect and understand the data relationships with very little effort).
In this case, the CRM data source was the well-known (and liked I believe) Salesforce system. The general consensus among EDW developers is a low concern when it is known that one of the future marts to be developed and included in the EDW, originates from the web. The reason for this is due to the fact that most such systems have a great set of API calls that can be utilized to ingest the required data into the EDW. Salesforce is such a system – it has a great set of API calls to do just this. There is a ‘but’ however. While Salesforce has a well-varied set of standard API calls it also has equally great security in place to prevent data leaks and data breaches while making API calls. This is generally true for any API web call with Basic, API Key, or JWT authentication methods being utilized. In the case of Salesforce, OAuth2 is the required authentication method but there is more complexity than just OAuth2 to keep life interesting. (I really like those Salesforce hoodies.)
How do we make Salesforce API calls
If you read the general OAuth2 summary documentation you would have seen that the basic terminology parts include:
- The Client
- Resources
- The Authorization Server
- The end-user
This applies to Salesforce in the following manner:
The Client
In this situation, the client is a connected application that is created within Salesforce. The connected application then internally obtains access to the Salesforce resources which are the actual CRM records we require for EDW ingestion.
To create the connected application within Salesforce:
1. Navigate to Setup->App Manager
2. Click New Connected App

3. Complete the Basic information and enable the following API (Enable OAuth Settings)

4. Click Manage Consumer Details. Copy the Consumer Key and Consumer Secret and save it somewhere safe.


Resources
These are the actual Salesforce CRM records we require for EDW ingestion. These records are output in .json format and are ingested via ADF.
The Authorization Server
During this phase, we obtain an access token in the form of a Bearer token. This is achieved by running an API POST to our Salesforce instance where the Connected App has been created. We pass our Consumer Key and Consumer Secret in the body in order to obtain the token. Once we have a valid Bearer token it is possible to make custom API calls as required in order to obtain actual data.
Something to note here is that the token does expire after 2 hours as a default unless these settings have been changed in Salesforce.
The end-user
In this situation the end-user will be Azure Data Factory (ADF) and the communication will be machine-to-machine which is supported by the OAuth2protocol. More on this below.
This highlights the Salesforce method of implementing OAuth2 which takes some time to create. There may be a few user security issues that require administrative privileges to resolve along the way.
Please note -utilizing a Salesforce-connected app is not the only way to extract Salesforce records. It is possible to achieve similar results using Apex classes if you prefer a more traditional object orientated coding approach and it is likely that there are additional methods based on the complexity of Salesforce. It is slightly more cumbersome than the connected app approach.
How do we incorporate the API calls into ADF
It should be mentioned that Azure Data Factory (ADF) includes a direct connector to Salesforce which alleviates the complexity covered thus far. Why not just use the Salesforce ADF connector? We are then limited to the standard API calls which will probably work to extract some of the data. The standard API calls will not work if you need to join backend Salesforce tables which is accomplished using Salesforce Object Query Language (SOQL) or if you choose the less expensive & more efficient incremental EDW load approach which is always recommended.
ADF Steps
1. Obtain the bearer token from your Salesforce connected app using your credentials by running a POST API call


2. Use the Web task output (which is the bearer token) in a Web task GET as follows:

Benefits
The API interaction between Azure Data Factory and Salesforce is well secured and we need only provide ADF with our Consumer Key and Consumer Secret rather than our actual Salesforce front-end login (should we have one). This approach also saves on Salesforce licensing costs in that API users can be setup which is more cost effective from a licensing point of view in comparison to a Salesforce front-end user.
References
- Salesforce: https://www.salesforce.com/eu/
- API authentication methods: https://frontegg.com/guides/api-authentication-api-authorization
- OAuth2: https://medium.com/web-security/understanding-oauth2-a50f29f0fbf7
- SOQL: https://developer.salesforce.com/docs/atlas.en-us.soql_sosl.meta/soql_sosl/sforce_api_calls_soql.htm