In this lab, we'll build a data movement pipeline in Azure Data Factory (ADF) to move data from a web service to Blob storage. Before you follow the steps, please make sure you are familiar with the following ADF concepts:
- Connections
- Linked Service
- Dataset
- Activities
- Pipeline
Go to the Azure Portal for this lab
Click on the '+ Create a resource' icon in the left panel, and search for 'Data Factory' in the search bar
Select Data Factory and click the Create button; you'll see the New data factory blade
Fill out the form and click Create
When the Data Factory is created, pin it to the dashboard so you can access the service easily
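If you prefer scripting, the same step can be done with the Azure Python SDK. Below is a minimal sketch assuming a service principal and the azure-identity and azure-mgmt-datafactory packages; every value in angle brackets is a placeholder, not part of this lab. Later sketches reuse `adf_client`, `RG`, and `DF` from here.

```python
# Minimal sketch: creating the same Data Factory with the Azure Python SDK.
# All IDs and names in angle brackets are placeholders for your own values.
from azure.identity import ClientSecretCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import Factory

RG = "<resource-group>"   # resource group that will hold the factory
DF = "<factory-name>"     # globally unique Data Factory name

credential = ClientSecretCredential(
    tenant_id="<tenant-id>",
    client_id="<client-id>",
    client_secret="<client-secret>",
)
adf_client = DataFactoryManagementClient(credential, "<subscription-id>")

# Create (or update) the Data Factory in the given region
factory = adf_client.factories.create_or_update(RG, DF, Factory(location="eastus"))
print(factory.provisioning_state)
```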
Next, create connections and datasets for the source and destination, and then create a pipeline that copies the source data to the destination.
Open the Data Factory and click the 'Author & Monitor' link from the blade
Click on Author and then click Connections at the bottom of the screen
To add a new Linked Service (e.g. a connection string), click '+ New'
Then search for 'HTTP' in the search bar, select HTTP, and click 'Continue' to configure the source linked service
Create the connection for the source and name it 'src_web_churn_csv'
Use the following link for the 'Base URL':
https://raw.githubusercontent.com/Azure/MachineLearningSamples-ChurnPrediction/master/data/CATelcoCustomerChurnTrainingSample.csv
Select 'Anonymous' for 'Authentication type'
Click 'Finish'
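For reference, here is a sketch of the same HTTP linked service created via the SDK, reusing `adf_client`, `RG`, and `DF` from the first snippet.

```python
# Sketch: the 'src_web_churn_csv' HTTP linked service created via the SDK
from azure.mgmt.datafactory.models import HttpLinkedService, LinkedServiceResource

http_ls = LinkedServiceResource(
    properties=HttpLinkedService(
        url="https://raw.githubusercontent.com/Azure/MachineLearningSamples-ChurnPrediction/master/data/CATelcoCustomerChurnTrainingSample.csv",
        authentication_type="Anonymous",  # same setting as in the portal UI
    )
)
adf_client.linked_services.create_or_update(RG, DF, "src_web_churn_csv", http_ls)
```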
Next, you need to create a Linked Service for the destination
Add a new Linked Service for Azure Blob storage, which will be used to store the source data: click '+ New' and search for 'Blob'
Create the connection for the destination and name it 'dst_blob_datalake'
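The SDK equivalent is sketched below; the storage account connection string is a placeholder you would fill in from your own account.

```python
# Sketch: the 'dst_blob_datalake' Blob storage linked service via the SDK.
# The storage connection string is a placeholder, not a value from this lab.
from azure.mgmt.datafactory.models import (
    AzureStorageLinkedService, LinkedServiceResource, SecureString,
)

blob_ls = LinkedServiceResource(
    properties=AzureStorageLinkedService(
        connection_string=SecureString(
            value="DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>"
        )
    )
)
adf_client.linked_services.create_or_update(RG, DF, "dst_blob_datalake", blob_ls)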
2.2.1. Create Source dataset
Now that you have the connection information for both services, you can define a dataset on the source storage
Click the '+' (add new factory resource) button and click 'Dataset' to create the source dataset
Search for 'HTTP', select it, and click 'Finish' to configure the source dataset
Type the name 'web_churn_csv' for the source dataset
Configure the Linked Service for this source dataset's connection: select 'src_web_churn_csv'
- Check the 'Column names in the first row' checkbox
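Scripted, the source dataset might look like the sketch below: a delimited-text dataset over the HTTP linked service. Since the base URL already points at the CSV file, no relative URL is needed; `first_row_as_header` corresponds to the checkbox above.

```python
# Sketch: the 'web_churn_csv' source dataset over the HTTP linked service
from azure.mgmt.datafactory.models import (
    DatasetResource, DelimitedTextDataset, HttpServerLocation, LinkedServiceReference,
)

src_ds = DatasetResource(
    properties=DelimitedTextDataset(
        linked_service_name=LinkedServiceReference(
            type="LinkedServiceReference", reference_name="src_web_churn_csv"
        ),
        location=HttpServerLocation(),  # base URL already points at the CSV
        column_delimiter=",",
        first_row_as_header=True,  # 'Column names in the first row'
    )
)
adf_client.datasets.create_or_update(RG, DF, "web_churn_csv", src_ds)
```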
2.2.2. Create Destination dataset
Click the '+' (add new factory resource) button and click 'Dataset' to create the destination dataset
Search for 'Blob', select it, and click 'Finish' to configure the destination dataset
Type the name 'blob_churn_csv' for the destination dataset
Click the Connection tab and select 'dst_blob_datalake' as the Linked Service for the destination dataset
Type 'ingest' for the folder and 'customerchurnsource.csv' for the file name
Check the 'Column names in the first row' checkbox
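A matching sketch of the destination dataset follows; the container name is a placeholder (in the UI, the folder path you typed includes the container).

```python
# Sketch: the 'blob_churn_csv' destination dataset on Blob storage
from azure.mgmt.datafactory.models import (
    AzureBlobStorageLocation, DatasetResource, DelimitedTextDataset, LinkedServiceReference,
)

dst_ds = DatasetResource(
    properties=DelimitedTextDataset(
        linked_service_name=LinkedServiceReference(
            type="LinkedServiceReference", reference_name="dst_blob_datalake"
        ),
        location=AzureBlobStorageLocation(
            container="<container>",  # placeholder: your blob container
            folder_path="ingest",
            file_name="customerchurnsource.csv",
        ),
        column_delimiter=",",
        first_row_as_header=True,
    )
)
adf_client.datasets.create_or_update(RG, DF, "blob_churn_csv", dst_ds)
```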
2.3.1. Create Pipeline
To create a new pipeline, click the '+' (add new factory resource) button and then click 'Pipeline'
Name the pipeline 'copy_churn_web__blob'
2.3.2. Create Activity
Drag and drop the 'Copy Data' activity from the 'Move & Transform' section onto the canvas
Name the activity 'Copy Data Activity'
Click the 'Copy Data Activity', select the 'Source' tab at the bottom of the screen, and then select 'web_churn_csv' for the 'Source Dataset'
Click the 'Sink' tab and select 'blob_churn_csv' for the 'Sink Dataset'
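In SDK form, the whole pipeline reduces to a single Copy activity wiring the two datasets together. The read/write settings below are assumptions about reasonable defaults, not values shown in the portal steps.

```python
# Sketch: the 'copy_churn_web__blob' pipeline with one Copy activity
from azure.mgmt.datafactory.models import (
    AzureBlobStorageWriteSettings, CopyActivity, DatasetReference,
    DelimitedTextSink, DelimitedTextSource, DelimitedTextWriteSettings,
    HttpReadSettings, PipelineResource,
)

copy_activity = CopyActivity(
    name="Copy Data Activity",
    inputs=[DatasetReference(type="DatasetReference", reference_name="web_churn_csv")],
    outputs=[DatasetReference(type="DatasetReference", reference_name="blob_churn_csv")],
    source=DelimitedTextSource(store_settings=HttpReadSettings()),
    sink=DelimitedTextSink(
        store_settings=AzureBlobStorageWriteSettings(),
        format_settings=DelimitedTextWriteSettings(file_extension=".csv"),
    ),
)
pipeline = PipelineResource(activities=[copy_activity])
adf_client.pipelines.create_or_update(RG, DF, "copy_churn_web__blob", pipeline)
```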
Click the 'Publish All' button to publish your changes
To manually start the data pipeline, click 'Trigger' > 'Trigger Now', and then click 'Finish' to run the pipeline.
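The SDK equivalent of 'Trigger Now' is a one-off pipeline run:

```python
# Sketch: start a one-off run of the published pipeline and keep its run ID
run = adf_client.pipelines.create_run(RG, DF, "copy_churn_web__blob", parameters={})
print(run.run_id)
```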
From the left panel, click the Monitor icon to see the pipeline run status and job history
To see detailed logs of the activities in a pipeline, click the pipeline icon
Click on Input, Output, and Details to see the raw logs of the activities
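The same run status and per-activity logs shown in the Monitor view can be queried from the SDK; this sketch reuses `adf_client`, `RG`, `DF`, and `run` from the earlier snippets.

```python
# Sketch: poll the run status and pull per-activity logs (the Monitor view data)
from datetime import datetime, timedelta, timezone
from azure.mgmt.datafactory.models import RunFilterParameters

# Status of the pipeline run we triggered above
pipeline_run = adf_client.pipeline_runs.get(RG, DF, run.run_id)
print(pipeline_run.status)

# Query the activity runs within a generous time window around "now"
now = datetime.now(timezone.utc)
filter_params = RunFilterParameters(
    last_updated_after=now - timedelta(days=1),
    last_updated_before=now + timedelta(days=1),
)
activity_runs = adf_client.activity_runs.query_by_pipeline_run(
    RG, DF, run.run_id, filter_params
)
for activity in activity_runs.value:
    # activity.output carries the raw JSON log for that activity run
    print(activity.activity_name, activity.status, activity.output)
```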