1) Why do we need Azure Data Factory?
Azure Data Factory doesn’t store any data itself; it lets you create workflows that orchestrate the movement of data between supported data stores and hand that data off to compute services for processing. You can monitor and manage these workflows through both programmatic and UI mechanisms. On top of that, it offers an easy-to-use interface for building ETL and ELT processes, which is why it is needed whenever data from different sources has to be moved, combined, and transformed at scale.
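For instance, the programmatic monitoring mentioned above can be done with the azure-mgmt-datafactory Python SDK. A minimal sketch, assuming a resource group `rg-demo` and a factory `adf-demo` already exist (both names and the subscription id placeholder are hypothetical, and exact model names can vary slightly between SDK versions):

```python
# Sketch: query recent pipeline runs programmatically (names are placeholders).
from datetime import datetime, timedelta

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import RunFilterParameters

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Ask for every pipeline run from the last 24 hours in the given factory.
filter_params = RunFilterParameters(
    last_updated_after=datetime.utcnow() - timedelta(days=1),
    last_updated_before=datetime.utcnow(),
)
runs = adf_client.pipeline_runs.query_by_factory("rg-demo", "adf-demo", filter_params)

for run in runs.value:
    print(run.pipeline_name, run.status, run.run_start)
```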
2) What is Azure Data Factory?
Azure Data Factory is a cloud-based data integration service offered by Microsoft that lets you create data-driven workflows for orchestrating and automating data movement and data transformation in the cloud. The service lets you create data pipelines that move and transform data, and then run those pipelines on a specified schedule.
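As a rough sketch of what "create and run data pipelines" looks like programmatically, the snippet below starts an on-demand run of an already-published pipeline; the pipeline name `CopySalesData`, the resource group, and the factory name are placeholders, not resources defined in this article:

```python
# Sketch: trigger an on-demand run of an existing pipeline (all names are placeholders).
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Start the pipeline and capture the run id so it can be monitored later.
run_response = adf_client.pipelines.create_run(
    "rg-demo", "adf-demo", "CopySalesData", parameters={}
)
print("Started run:", run_response.run_id)

# Check the status of that specific run.
run = adf_client.pipeline_runs.get("rg-demo", "adf-demo", run_response.run_id)
print("Status:", run.status)
```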
3) What is Integration Runtime?
Integration runtime is the compute infrastructure used by Azure Data Factory to provide data integration capabilities across different network environments.
Types of Integration Runtimes:
Azure Integration Runtime – It can copy data between cloud data stores and dispatch activities to a variety of compute services such as Azure SQL Database or Azure HDInsight.
Self-Hosted Integration Runtime – It is software with essentially the same code as the Azure Integration Runtime, but it is installed on on-premises machines or on virtual machines inside a virtual network (see the sketch after this list).
Azure-SSIS Integration Runtime – It helps to execute SSIS packages in a managed environment. So when we lift and shift SSIS packages to Data Factory, we use the Azure-SSIS Integration Runtime.
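The sketch below registers a self-hosted integration runtime definition through the Python SDK, using placeholder resource names; the on-premises machine still has to install the integration runtime node software and register itself with one of the generated authentication keys:

```python
# Sketch: create a self-hosted integration runtime definition (names are placeholders).
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    IntegrationRuntimeResource,
    SelfHostedIntegrationRuntime,
)

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

ir = IntegrationRuntimeResource(
    properties=SelfHostedIntegrationRuntime(description="IR for on-premises SQL Server")
)
adf_client.integration_runtimes.create_or_update("rg-demo", "adf-demo", "onprem-ir", ir)

# The authentication keys used to register the on-premises node come from here.
keys = adf_client.integration_runtimes.list_auth_keys("rg-demo", "adf-demo", "onprem-ir")
print(keys.auth_key1)
```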
4) What is the limit on the number of integration runtimes?
There is no hard limit on the number of integration runtime instances you can have in a data factory. There is, however, a per-subscription limit on the number of VM cores that the integration runtime can use for SSIS package execution.
5) What are the different components used in Azure Data Factory?
Azure Data Factory is made up of several components. Some of them are as follows (a short SDK sketch after this list shows how they fit together):
Pipeline: The pipeline is the logical container of the activities.
Activity: It specifies an execution step in the Data Factory pipeline and is mainly used for data ingestion and transformation.
Dataset: A dataset is a pointer to the data used in the pipeline activities.
Mapping Data Flow: It specifies data transformation logic built in a visual UI.
Linked Service: It specifies the connection string for the data sources used in the pipeline activities.
Trigger: It specifies the time when the pipeline will be executed.
Control flow: It’s used to control the execution flow of the pipeline activities.
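To make the relationship between these components concrete, here is a hedged sketch using the azure-mgmt-datafactory models: one pipeline containing one copy activity that references two datasets assumed to already exist (`BlobInputDataset` and `BlobOutputDataset` are hypothetical names, as are the resource group and factory):

```python
# Sketch: a pipeline (container) with one copy activity referencing two existing datasets.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    BlobSink,
    BlobSource,
    CopyActivity,
    DatasetReference,
    PipelineResource,
)

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Activity: one copy step that reads from one dataset and writes to another.
copy_activity = CopyActivity(
    name="CopyBlobToBlob",
    inputs=[DatasetReference(type="DatasetReference", reference_name="BlobInputDataset")],
    outputs=[DatasetReference(type="DatasetReference", reference_name="BlobOutputDataset")],
    source=BlobSource(),
    sink=BlobSink(),
)

# Pipeline: the logical container holding that activity.
pipeline = PipelineResource(activities=[copy_activity])
adf_client.pipelines.create_or_update("rg-demo", "adf-demo", "CopyBlobPipeline", pipeline)
```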
6) What is the key difference between the Dataset and Linked Service in Azure Data Factory?
The dataset is a pointer to the data within the data store described by the linked service. For example, when we load data from a SQL Server instance, the dataset specifies the name of the table that contains the target data, or the query that returns data from different tables.
The linked service specifies the definition of the connection string used to connect to the data store. For example, when connecting to a SQL Server instance, the linked service contains the name of the SQL Server instance and the credentials used to connect to that instance.
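A sketch of that distinction with the Python SDK: the linked service holds only the connection string, while the dataset points at a concrete folder and file through a reference to that linked service (the storage connection string, container path, and all resource names below are placeholders):

```python
# Sketch: linked service = connection information; dataset = pointer to specific data.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    AzureBlobDataset,
    AzureStorageLinkedService,
    DatasetResource,
    LinkedServiceReference,
    LinkedServiceResource,
    SecureString,
)

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# The linked service only knows *how* to connect to the store.
storage_ls = LinkedServiceResource(
    properties=AzureStorageLinkedService(
        connection_string=SecureString(
            value="DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>"
        )
    )
)
adf_client.linked_services.create_or_update("rg-demo", "adf-demo", "BlobStorageLS", storage_ls)

# The dataset points at the concrete data (a folder and file) inside that store.
blob_ds = DatasetResource(
    properties=AzureBlobDataset(
        linked_service_name=LinkedServiceReference(
            type="LinkedServiceReference", reference_name="BlobStorageLS"
        ),
        folder_path="sales/input",
        file_name="orders.csv",
    )
)
adf_client.datasets.create_or_update("rg-demo", "adf-demo", "BlobInputDataset", blob_ds)
```

Keeping connection details in the linked service means many datasets can reuse the same credentials while each one points at different data.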
7) How many types of triggers are supported by Azure Data Factory?
Azure Data Factory supports three main types of triggers (a definition sketch follows this list):
Tumbling Window Trigger: The tumbling window trigger executes Azure Data Factory pipelines over periodic, fixed-size, non-overlapping intervals. It is also used to maintain the state of the pipeline for each window.
Event-based Trigger: The event-based trigger responds to events related to blob storage, such as the creation or deletion of a blob.
Schedule Trigger: The schedule trigger executes Azure Data Factory pipelines on a wall-clock schedule.
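As an illustrative sketch, the snippet below only builds the trigger objects for the first and third types with the azure-mgmt-datafactory models (publishing and starting a trigger is shown under question 10); the pipeline name, recurrence, and start time are placeholder values:

```python
# Sketch: build (but not yet publish) a schedule trigger and a tumbling window
# trigger for a hypothetical pipeline named "CopyBlobPipeline".
from datetime import datetime

from azure.mgmt.datafactory.models import (
    PipelineReference,
    ScheduleTrigger,
    ScheduleTriggerRecurrence,
    TriggerPipelineReference,
    TriggerResource,
    TumblingWindowTrigger,
)

pipeline_ref = TriggerPipelineReference(
    pipeline_reference=PipelineReference(
        type="PipelineReference", reference_name="CopyBlobPipeline"
    ),
    parameters={},
)

# Schedule trigger: fires on a wall-clock recurrence (here, once a day).
schedule_trigger = TriggerResource(
    properties=ScheduleTrigger(
        recurrence=ScheduleTriggerRecurrence(
            frequency="Day", interval=1, start_time=datetime.utcnow()
        ),
        pipelines=[pipeline_ref],
    )
)

# Tumbling window trigger: fires for fixed-size, non-overlapping hourly windows
# and keeps per-window state, so missed windows can be re-run.
window_trigger = TriggerResource(
    properties=TumblingWindowTrigger(
        pipeline=pipeline_ref,
        frequency="Hour",
        interval=1,
        start_time=datetime.utcnow(),
        max_concurrency=1,
    )
)
```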
8) What is Blob Storage in Azure?
Blob storage is used to store large amounts of unstructured data such as text, images, or binary data. It can also be used to expose data publicly to the world. Blob storage is most commonly used for streaming audio or video, storing data for backup and disaster recovery, storing data for analysis, etc. You can also build data lakes on top of blob storage to perform analytics.
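For example, uploading a small piece of unstructured data with the azure-storage-blob Python package might look like this (the connection string, container, and blob names are placeholders):

```python
# Sketch: upload unstructured data (a text file) to Azure Blob Storage.
from azure.storage.blob import BlobServiceClient

conn_str = "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>"
blob_service = BlobServiceClient.from_connection_string(conn_str)

# Create (or reuse) a container, then upload a blob into it.
container = blob_service.get_container_client("backups")
if not container.exists():
    container.create_container()

blob = container.get_blob_client("logs/app-2024-01-01.txt")
blob.upload_blob(b"sample unstructured data", overwrite=True)
```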
9) What are the top-level concepts of Azure Data Factory?
Pipeline – It acts as a container in which multiple processes (activities) take place.
Activities – They represent the steps of the processes in the pipeline.
Data Sets – These are data structures that point to the data used by the pipeline activities.
Linked Services – These store the information that is essential for connecting to external resources or other services. For example, if we have a SQL Server, we need a connection string to connect to this external resource, and the linked service is referenced by both the source and the destination.
10) How can we schedule a pipeline?
Triggers follow a wall-clock calendar schedule and can run pipelines periodically or in calendar-based recurrent patterns. We can schedule a pipeline in two ways (a sketch of publishing and starting a trigger follows):
Schedule Trigger
Tumbling Window Trigger
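Continuing the sketch from question 7 (and reusing the `schedule_trigger` object and placeholder names defined there), publishing and starting the trigger might look roughly like this:

```python
# Sketch: publish the trigger built in the question 7 sketch and start it.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Publish the trigger definition to the factory; a trigger does nothing until
# it is started (the operation is begin_start in newer azure-mgmt-datafactory versions).
adf_client.triggers.create_or_update("rg-demo", "adf-demo", "DailyTrigger", schedule_trigger)
adf_client.triggers.begin_start("rg-demo", "adf-demo", "DailyTrigger").result()
```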