
01. Ingest Data

Architecture

In this lab, we'll build a data movement pipeline in Azure Data Factory (ADF) to move data from a web service to Azure Blob Storage. Before you follow the steps, please make sure you are familiar with the following ADF concepts:

  • Connections
  • Linked Service
  • Dataset
  • Activities
  • Pipeline

(diagram: Azure Data Factory concepts)

0. Access to Azure Portal

Go to the Azure Portal for this lab

1. Create Azure Data Factory

Click the '+ Create a resource' icon in the left panel, and search for 'Data Factory' in the search bar


Select Data Factory and click the Create button; you'll see the New data factory blade

Fill out the form and click Create


When the Data Factory is created, pin it to the dashboard so you can access the service easily

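If you prefer to script this step, the same factory can be created with the azure-mgmt-datafactory Python SDK. This is a minimal sketch, assuming the azure-identity and azure-mgmt-datafactory packages are installed; the subscription, resource group, and factory names are placeholders for your own values. Later sketches in this lab reuse adf_client, rg_name, and df_name from this snippet.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import Factory

# Placeholders -- substitute your own subscription, resource group, and factory name
credential = DefaultAzureCredential()
adf_client = DataFactoryManagementClient(credential, "<subscription-id>")

rg_name = "<resource-group>"
df_name = "<factory-name>"  # must be globally unique

# Create (or update) the Data Factory in your chosen region
df = adf_client.factories.create_or_update(rg_name, df_name, Factory(location="eastus"))
print(df.provisioning_state)
```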

2. Create Data Pipeline

Create connections and datasets for the source and destination, then create a pipeline that copies the source data to the destination.

Open the Data Factory and click the Author & Monitor link from the blade


2.1. Create Connections


Click on Author and then click Connections at the bottom of the screen


To add a new linked service (e.g. a connection string), click + NEW

Then search for 'HTTP' in the search bar, select HTTP, and click 'Continue' to configure the source connection.


Create the connection for the source and name it 'src_web_churn_csv'

Use the following link for 'Base URL':

https://raw.githubusercontent.com/Azure/MachineLearningSamples-ChurnPrediction/master/data/CATelcoCustomerChurnTrainingSample.csv

Select 'Anonymous' for 'Authentication type'

Click 'Finish'

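The same connection can be defined in code. A sketch using the Python SDK, reusing adf_client, rg_name, and df_name from the factory snippet above; the HttpLinkedService settings mirror the Base URL and anonymous authentication you just entered in the UI:

```python
from azure.mgmt.datafactory.models import LinkedServiceResource, HttpLinkedService

src_url = ("https://raw.githubusercontent.com/Azure/MachineLearningSamples-ChurnPrediction"
           "/master/data/CATelcoCustomerChurnTrainingSample.csv")

# HTTP linked service with anonymous authentication, matching the UI settings
http_ls = LinkedServiceResource(properties=HttpLinkedService(
    url=src_url, authentication_type="Anonymous"))
adf_client.linked_services.create_or_update(
    rg_name, df_name, "src_web_churn_csv", http_ls)
```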

Next, you need to create a linked service for the destination


Add a new linked service for Azure Blob Storage, which will be used to store the source data: click + NEW and search for 'Blob'


Create the connection for the destination and name it 'dst_blob_datalake'

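In code, the destination looks much the same; only the linked-service type changes. A sketch, continuing from the snippets above (the storage account name and key are placeholders you'd take from your own storage account):

```python
from azure.mgmt.datafactory.models import (
    LinkedServiceResource, AzureStorageLinkedService, SecureString)

# Placeholder connection string -- use your own account name and key
conn_str = SecureString(
    value="DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>")

blob_ls = LinkedServiceResource(
    properties=AzureStorageLinkedService(connection_string=conn_str))
adf_client.linked_services.create_or_update(
    rg_name, df_name, "dst_blob_datalake", blob_ls)
```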

2.2. Create Datasets

2.2.1. Create Source dataset

You now have the connection information for the services, so you can define a dataset on the source storage


Click the '+ [add new factory resource]' button and select 'Dataset' to create the source dataset


Search for 'HTTP', select it, and click 'Finish' to configure the source dataset

Name the source dataset 'web_churn_csv'


Configure the linked service for this source dataset's connection, which is 'src_web_churn_csv'


  • Check the 'Column names in the first row' checkbox

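Scripted, the source dataset is an HTTP file dataset bound to the 'src_web_churn_csv' linked service. A sketch, assuming the HttpDataset model available in recent versions of the SDK; the TextFormat settings mirror the delimiter and the header checkbox from the UI:

```python
from azure.mgmt.datafactory.models import (
    DatasetResource, HttpDataset, LinkedServiceReference, TextFormat)

src_ls_ref = LinkedServiceReference(
    type="LinkedServiceReference", reference_name="src_web_churn_csv")

# first_row_as_header mirrors the 'Column names in the first row' checkbox
src_ds = DatasetResource(properties=HttpDataset(
    linked_service_name=src_ls_ref,
    format=TextFormat(column_delimiter=",", first_row_as_header=True)))
adf_client.datasets.create_or_update(rg_name, df_name, "web_churn_csv", src_ds)
```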

2.2.2. Create Destination dataset


Click the '+ [add new factory resource]' button and select 'Dataset' to create the destination dataset

Search for 'Blob', select it, and click 'Finish' to configure the destination dataset

Name the destination dataset 'blob_churn_csv'


Click the Connection tab to configure the linked service for the destination dataset's connection, which is 'dst_blob_datalake'


Type 'ingest' for the folder and 'customerchurnsource.csv' for the file name


Check the 'Column names in the first row' checkbox

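The destination dataset is the blob-side mirror of the source one. A sketch, continuing from the snippets above, with the same 'ingest' folder and file name you just configured:

```python
from azure.mgmt.datafactory.models import (
    DatasetResource, AzureBlobDataset, LinkedServiceReference, TextFormat)

dst_ls_ref = LinkedServiceReference(
    type="LinkedServiceReference", reference_name="dst_blob_datalake")

# Write to the 'ingest' folder with the configured file name
dst_ds = DatasetResource(properties=AzureBlobDataset(
    linked_service_name=dst_ls_ref,
    folder_path="ingest",
    file_name="customerchurnsource.csv",
    format=TextFormat(column_delimiter=",", first_row_as_header=True)))
adf_client.datasets.create_or_update(rg_name, df_name, "blob_churn_csv", dst_ds)
```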

2.3. Create Pipeline

2.3.1. Create Pipeline


To create a new pipeline, click '+ [add new factory resource]' and then click 'Pipeline'


Name the pipeline 'copy_churn_web__blob'


2.3.2. Create Activity


Drag and drop the 'Copy Data' activity from the 'Move & Transform' section onto the canvas


Name the activity 'Copy Data Activity'


Click on the 'Copy Data Activity' activity, select the 'Source' tab at the bottom of the screen, and select 'web_churn_csv' as the 'Source Dataset'


Click the 'Sink' tab and select 'blob_churn_csv' as the 'Sink Dataset'


Click the 'Publish All' button to publish the pipeline

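Everything you just did on the canvas boils down to a pipeline with a single Copy activity wired from the source dataset to the sink dataset. A sketch of the equivalent SDK calls, continuing from the snippets above:

```python
from azure.mgmt.datafactory.models import (
    PipelineResource, CopyActivity, DatasetReference, HttpSource, BlobSink)

copy_activity = CopyActivity(
    name="Copy Data Activity",
    inputs=[DatasetReference(type="DatasetReference", reference_name="web_churn_csv")],
    outputs=[DatasetReference(type="DatasetReference", reference_name="blob_churn_csv")],
    source=HttpSource(),  # read from the HTTP source dataset
    sink=BlobSink())      # write to the blob sink dataset

pipeline = PipelineResource(activities=[copy_activity])
adf_client.pipelines.create_or_update(
    rg_name, df_name, "copy_churn_web__blob", pipeline)
```

Note that, unlike the UI authoring experience, SDK calls write resources directly to the service, so there is no separate publish step when scripting.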

3. Run the job

To manually start the data pipeline, click 'Trigger' > 'Trigger Now', and then click 'Finish' to run the pipeline.

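The same manual trigger via the SDK, continuing the sketch:

```python
# Kick off an on-demand run of the pipeline
run_response = adf_client.pipelines.create_run(
    rg_name, df_name, "copy_churn_web__blob", parameters={})
print(run_response.run_id)  # keep the run ID for monitoring
```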

4. Monitor the job

From the left panel, click the Monitor icon to see the pipeline run status and the history of jobs


To see detailed logs of the activities in a pipeline, click the pipeline icon


Click on Input, Output, and Details to see the raw logs of the activities

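You can also poll run status and pull the activity-level input/output logs from the SDK. A sketch, continuing from the trigger snippet above (run_response is the object it returned):

```python
from datetime import datetime, timedelta
from azure.mgmt.datafactory.models import RunFilterParameters

# Overall status of the pipeline run
pipeline_run = adf_client.pipeline_runs.get(rg_name, df_name, run_response.run_id)
print(pipeline_run.status)

# Activity-level runs within the last day, including raw input/output logs
filters = RunFilterParameters(
    last_updated_after=datetime.utcnow() - timedelta(days=1),
    last_updated_before=datetime.utcnow() + timedelta(days=1))
activity_runs = adf_client.activity_runs.query_by_pipeline_run(
    rg_name, df_name, run_response.run_id, filters)
for run in activity_runs.value:
    print(run.activity_name, run.status, run.input, run.output)
```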


Next > 02. Data Wrangling

