Creating Data-Driven Workflows in the Cloud
The Proliferation of Data
Organizations are constantly searching for ways to manage, use, and extract value from their data, the sources and available types of which continue to grow exponentially:
- 90 percent of the world’s current digital data was produced in the last two years.
- The U.S. alone produces upward of 2.6 million gigabytes of internet data every minute.
Menlo Technologies has helped many of our clients manage their data with Azure Data Factory (ADF), a cloud-based data integration service. Here is the story of one of those clients.
SFDC Data Feed
The purpose of the integration was to develop flows in Azure Data Factory that move data from Salesforce (SFDC) to Azure SQL and from Azure SQL back to SFDC. The client's objective was to get leads from Salesforce through ADF.
Azure Data Factory (ADF):
- Allows users to create data-driven workflows in the cloud for orchestrating and automating data movement and data transformation.
- Allows users to create and schedule data-driven workflows (called pipelines) that can ingest data from disparate data stores.
How Does Azure Data Factory Work?
The pipelines in Azure Data Factory perform the following five steps:
Connect: Connect to all the required sources of data and processing, such as SaaS services, APIs, databases, CSV or Excel files, and FTP web services.
Collect: With Data Factory, you can use the Copy Activity in a data pipeline to move data from both on-premises and cloud source data stores to a centralized data store in the cloud for further analysis:
- Collect data in Azure Data Lake Store and transform the data later by using an Azure Data Lake Analytics compute service.
- Collect data in Azure Blob storage and transform it later by using an Azure HDInsight Hadoop cluster.
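As a rough illustration of the Collect step, a Copy Activity is described in JSON inside a pipeline. The Python sketch below builds such a definition as a dictionary and prints it as JSON. The activity name, dataset names, and SOQL query are hypothetical placeholders, and the source and sink type names are assumptions that should be checked against the connector documentation for your Data Factory version.

```python
import json


def build_copy_activity(name, source_dataset, sink_dataset, soql_query):
    """Return a dict shaped like an ADF Copy activity definition.

    All names passed in are illustrative placeholders, not values
    taken from the client project described in this article.
    """
    return {
        "name": name,
        "type": "Copy",
        # Datasets are referenced by name; they are defined separately.
        "inputs": [{"referenceName": source_dataset, "type": "DatasetReference"}],
        "outputs": [{"referenceName": sink_dataset, "type": "DatasetReference"}],
        "typeProperties": {
            # Assumed connector types: Salesforce as source, Azure SQL as sink.
            "source": {"type": "SalesforceSource", "query": soql_query},
            "sink": {"type": "AzureSqlSink"},
        },
    }


copy_leads = build_copy_activity(
    name="CopyLeadsToSql",
    source_dataset="SalesforceLeads",
    sink_dataset="AzureSqlLeads",
    soql_query="SELECT Id, Email, Company FROM Lead",
)

print(json.dumps(copy_leads, indent=2))
```

Building the definition in code rather than by hand makes it easier to generate one activity per source object, but the same JSON can be authored directly in the Data Factory editor.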
Transform and enrich: After data is in a centralized data store in the cloud, process or transform the collected data by using compute services such as SSIS, HDInsight Hadoop, Spark, Data Lake Analytics, and Machine Learning. Reliably produce transformed data on a maintainable and controlled schedule to feed production environments with trusted data.
Publish: After the raw data has been refined into a business-ready consumable form, load the data into Azure Data Warehouse, Azure SQL Database, Azure CosmosDB, or whichever analytics repository your business users can point to from their business intelligence tools.
Monitor: Monitor the scheduled activities and pipelines for success and failure rates. Azure Data Factory has built-in support for pipeline monitoring via Azure Monitor, API, PowerShell, Log Analytics, and health panels on the Azure portal. The operations monitoring console in Data Factory is sufficient for day-to-day operations, but any client-facing console will need custom development.
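To make the idea of success and failure rates concrete, here is a minimal Python sketch that summarizes a list of pipeline run records. The record layout, the pipeline names, and the sample data are assumptions for illustration; in practice the records would come from one of the monitoring interfaces listed above.

```python
from collections import Counter


def summarize_runs(runs):
    """Return success and failure rates for finished pipeline runs.

    `runs` is a list of dicts with a "status" key. Runs that are still
    in progress are excluded from the rates.
    """
    counts = Counter(run["status"] for run in runs)
    finished = counts["Succeeded"] + counts["Failed"]
    if finished == 0:
        return {"finished": 0, "success_rate": None, "failure_rate": None}
    return {
        "finished": finished,
        "success_rate": counts["Succeeded"] / finished,
        "failure_rate": counts["Failed"] / finished,
    }


# Hypothetical run history, not real monitoring output.
sample_runs = [
    {"pipeline": "SfdcToSql", "status": "Succeeded"},
    {"pipeline": "SfdcToSql", "status": "Succeeded"},
    {"pipeline": "SqlToSfdc", "status": "Failed"},
    {"pipeline": "SqlToSfdc", "status": "Succeeded"},
    {"pipeline": "SqlToSfdc", "status": "InProgress"},
]

summary = summarize_runs(sample_runs)
print(summary)
```

A custom client-facing console could start from a summary like this and add per-pipeline breakdowns as needed.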
Top-Level Concepts of Azure Data Factory
- A pipeline is a logical grouping of activities that lets you manage the activities as a set instead of managing each one individually.
- An activity is a processing step in a pipeline. Data Factory supports data movement, data transformation, and control activities.
- Datasets represent data structures within the data stores, which simply point to or reference the data you want to use in your activities as inputs or outputs.
- Linked services define the connection to the data source, and a dataset represents the structure of the data.
- Triggers represent the unit of processing that determines when a pipeline execution needs to be kicked off.
- Parameters are key-value pairs of read-only configuration. Parameters are defined in the pipeline.
- Control flow is an orchestration of pipeline activities that includes chaining activities in a sequence, branching, defining parameters at the pipeline level, and passing arguments while invoking the pipeline on-demand or from a trigger.
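The concepts above can be sketched in a few lines of Python. The example models a pipeline as a dictionary with parameters and chained activities, then works out the order in which the activities would run and how arguments passed at invocation override parameter defaults. The pipeline, activity, and parameter names are hypothetical, the activity bodies are omitted for brevity, and the two helper functions are a simplified illustration of the idea rather than how Data Factory itself schedules work.

```python
def execution_order(activities):
    """Order activities so each runs after the ones it depends on."""
    remaining = {a["name"]: a for a in activities}
    done = []
    while remaining:
        ready = [
            name
            for name, act in remaining.items()
            if all(dep["activity"] in done for dep in act.get("dependsOn", []))
        ]
        if not ready:
            raise ValueError("circular or missing dependency")
        for name in ready:
            done.append(name)
            del remaining[name]
    return done


def resolve_parameters(pipeline, arguments):
    """Merge parameter defaults with arguments passed at invocation."""
    declared = pipeline["properties"]["parameters"]
    values = {name: spec.get("defaultValue") for name, spec in declared.items()}
    for name, value in arguments.items():
        if name not in declared:
            raise KeyError(f"unknown parameter: {name}")
        values[name] = value
    return values


# Hypothetical pipeline; typeProperties are left out to keep the sketch short.
pipeline = {
    "name": "LeadsPipeline",
    "properties": {
        "parameters": {"leadSource": {"type": "String", "defaultValue": "Web"}},
        "activities": [
            {
                "name": "EmailStatus",
                "type": "WebActivity",
                "dependsOn": [
                    {"activity": "CopyLeadsFromSalesforce", "dependencyConditions": ["Succeeded"]},
                    {"activity": "CopyLeadsFromBlob", "dependencyConditions": ["Succeeded"]},
                ],
            },
            {"name": "CopyLeadsFromSalesforce", "type": "Copy", "dependsOn": []},
            {"name": "CopyLeadsFromBlob", "type": "Copy", "dependsOn": []},
        ],
    },
}

order = execution_order(pipeline["properties"]["activities"])
params = resolve_parameters(pipeline, {"leadSource": "Partner"})
print(order)
print(params)
```

Because parameters are read-only during a run, the sketch resolves them once up front instead of letting activities change them.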
SFDC and Blob to Database Flow:
This integration gets lead information from SFDC and from a .txt file in Blob storage, then performs an upsert operation in Azure SQL.
SFDC and Blob to Database Steps
- Get Leads from Salesforce
- Get Updated Leads from file storage
- Perform data transformations if any
- Write to Azure SQL
- Email the Status
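The steps above can be sketched end to end in plain Python. This is a local stand-in under stated assumptions, not the client's implementation: an in-memory SQLite database takes the place of Azure SQL, the lead records are made-up sample data, email is assumed to be the matching key, and the status message is printed rather than emailed. It requires a Python build bundled with SQLite 3.24 or later for the upsert syntax.

```python
import sqlite3


def transform(lead):
    """Example transformation: trim whitespace and normalize email case."""
    return {
        "email": lead["email"].strip().lower(),
        "company": lead["company"].strip(),
    }


def run_flow(conn, sfdc_leads, file_leads):
    """Upsert leads from both sources and return a status message."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS leads (email TEXT PRIMARY KEY, company TEXT)"
    )
    # File rows come last, so they override Salesforce rows with the same key.
    rows = [transform(lead) for lead in sfdc_leads + file_leads]
    conn.executemany(
        "INSERT INTO leads (email, company) VALUES (:email, :company) "
        "ON CONFLICT(email) DO UPDATE SET company = excluded.company",
        rows,
    )
    conn.commit()
    count = conn.execute("SELECT COUNT(*) FROM leads").fetchone()[0]
    return f"Upserted {len(rows)} rows; table now holds {count} leads"


# Made-up sample data standing in for the Salesforce feed and the .txt file.
sfdc_leads = [
    {"email": "Ann@Example.com ", "company": "Acme"},
    {"email": "bob@example.com", "company": "Globex"},
]
file_leads = [
    {"email": "ann@example.com", "company": "Acme Corp"},
    {"email": "cy@example.com", "company": "Initech"},
]

conn = sqlite3.connect(":memory:")
status = run_flow(conn, sfdc_leads, file_leads)
print(status)
```

Azure SQL does not use SQLite's ON CONFLICT clause; a MERGE statement or the sink's own upsert setting would play that role in the real flow.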
Database to SFDC Flow: This integration gets data from Azure SQL and performs an upsert operation on the Salesforce Leads object.
Menlo Technologies is a global computer technology services company specializing in cloud integration, data analytics, and mobile technology. Our global delivery model for IT solutions provides a framework for exceeding customer expectations in all dimensions – quantity, time and cost.