Extract, Transform, and Load (ETL) is a common business function that is often necessary for integration between different systems. Traditional approaches often involve setting up dedicated FTP servers, then deploying scheduled jobs to parse files and translate them for business use. Serverless architecture makes the job easier because a trigger can fire when a file is uploaded. Azure Functions are small pieces of code that focus on a specific problem, which makes them ideal for tackling tasks like ETL.
This tutorial demonstrates how to take a legacy process and migrate it to run 100% in the cloud using the Azure SQL Database-as-a-Service offering combined with Azure Functions. You start with a local development environment, build and test the process end-to-end, then create the cloud-based assets and deploy to Azure.
The following concepts are covered in this lab. Click the links to learn more about each functional area covered in the lab exercises.
- Azure for .NET Developers
- Azure Functions
- Azure SQL
- Azure Storage
- Azure Blob Storage
- Entity Framework
In this lab, you will:

- Create an Azure Functions project
- Test the project locally
- Create the SQL database
- Add and test the code to update the database
- Create an Azure SQL Database
- Migrate the database
- Deploy the project to Azure
- Test uploads
This exercise introduces you to Azure Functions along with the ability to emulate storage and debug functions locally. The Azure Functions tooling makes it possible to run a full version of the functions host on your development machine.
- Visual Studio 2017 15.5 or later
- The Azure Development workload
- Azure Functions and Web Jobs Tools (15 or later); this should be installed automatically with the Azure Development workload
1. Open Visual Studio 2017.

2. Select **File**, then **New Project**, and choose the **Azure Functions** template. Enter `FileProcessor` for the **Name**. Click **OK**.

3. In the Azure Functions dialog, choose the **Azure Functions v1 (.NET Framework)** host and select the **Empty** project template. Make sure that **Storage Emulator** is selected for the **Storage Account** (this automatically sets up connection strings for storage emulation). Click **OK** and wait for the project to be created.

4. Right-click on the project name in the Solution Explorer and choose **Add**, then **New Item...**.

5. Select **Azure Function** for the item, give it the name `FileProcessFn.cs`, and click **Add**.

6. In the next dialog, choose the **Blob trigger** template. You can leave **Connection** blank or populate it with `AzureWebJobsStorage`. Type `import` for the **Path**.

7. After the class is created, ensure it looks like this (if you did not fill out the **Connection** in the previous step, you can add it here):

    ```csharp
    namespace FileProcessor
    {
        public static class FileProcessFn
        {
            [FunctionName("FileProcessFn")]
            public static void Run([BlobTrigger("import/{name}", Connection = "AzureWebJobsStorage")]Stream myBlob, string name, TraceWriter log)
            {
                log.Info($"C# Blob trigger function Processed blob\n Name:{name} \n Size: {myBlob.Length} Bytes");
            }
        }
    }
    ```

8. In the Solution Explorer, open `local.settings.json`. It should have development storage set, like this:

    ```json
    {
      "IsEncrypted": false,
      "Values": {
        "AzureWebJobsStorage": "UseDevelopmentStorage=true",
        "AzureWebJobsDashboard": "UseDevelopmentStorage=true"
      }
    }
    ```
Learn more about Azure Blob Storage Triggers.
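The `import/{name}` path is a binding expression: `{name}` captures the blob name and is passed to the function as the `name` parameter. As a hedged aside (not required for this lab), the path can also filter on a file extension, and the trigger parameter can bind directly to the blob contents as a `string` instead of a `Stream`. A sketch of that variation:

```csharp
// Sketch of an alternative trigger (hypothetical, not used in this lab):
// fires only for .csv blobs in the import container and binds the content as text.
[FunctionName("CsvOnlyFn")]
public static void Run(
    [BlobTrigger("import/{name}.csv", Connection = "AzureWebJobsStorage")] string contents,
    string name,
    TraceWriter log)
{
    log.Info($"Processed {name}.csv with {contents.Length} characters");
}
```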
Now you've successfully created a function. The next exercise will walk through how to test it on your local machine.
1. Launch the Storage Emulator by following the directions here.

2. Open Storage Explorer and navigate to **Blob Containers** in developer storage.

3. Right-click on **Blob Containers** and choose **Create Blob Container**. This opens a node where you can type the name for the container: `import`. Press **ENTER** and the container details load.

4. In Visual Studio, click the debug button or press **F5** to start debugging.

5. Wait for the functions host to start running. The console eventually shows the text `Debugger listening on [::]:5858` (your port may be different).

6. In the Storage Explorer window for the `import` container, click the **Upload** button and choose the **Upload folder...** option.

7. In the Upload Folder dialog, select the `data` folder that is provided with this tutorial. Make sure **Blob type** is set to **Block blob** and **Upload to folder (optional)** is empty. Click **Upload**.

8. Confirm the files in the folder were processed by checking the logs in the function host console window.

9. Stop the debugging session.

10. Delete the `data` folder and files from the storage emulator.
The Azure Function is ready. In the next exercise, you will create a database and table to import the file data into.
This exercise walks through creating the local SQL database for testing.
- SQL Server Express (the full SQL Server version is fine too)
- SQL Server Management Studio (SSMS)
1. Open SQL Server Management Studio and connect to your local server instance.

2. Right-click on the **Databases** node and choose **New Database...**.

3. For the **Database name**, type `todo`. Adjust any other settings you desire and click **OK**.

4. Right-click on the `todo` database and choose **New Query**. In the window that opens, type the following commands:

    ```sql
    CREATE TABLE TodoItems (Id Int Identity, Task NVarChar(max), IsComplete Bit);
    INSERT TodoItems(Task, IsComplete) VALUES ('Insert first record', 1);
    SELECT * FROM TodoItems;
    ```

5. Click **Execute**.

6. Confirm that a single result is returned with "Insert first record" as the task.
The database is now ready. The next exercise will walk you through enhancing your function to parse the file and insert records into the database.
The local database is ready to test. In this exercise, you use Entity Framework to insert the records you parse from the uploaded files into the SQL database.
1. Add the connection string for SQL Server to `local.settings.json`. It should look like this (example assumes SQL Express):

    ```json
    {
      "IsEncrypted": false,
      "Values": {
        "AzureWebJobsStorage": "UseDevelopmentStorage=true",
        "AzureWebJobsDashboard": "UseDevelopmentStorage=true"
      },
      "ConnectionStrings": {
        "TodoContext": "Server=localhost\\SQLEXPRESS;Database=todo;Trusted_Connection=True;"
      }
    }
    ```

2. In Visual Studio, add a class file named `TodoItem.cs` and populate it:

    ```csharp
    namespace FileProcessor
    {
        public class TodoItem
        {
            public long Id { get; set; }
            public string Task { get; set; }
            public bool IsComplete { get; set; }
        }
    }
    ```

3. Open the **Package Manager Console** (under **Tools**) and type:

    ```powershell
    Install-Package EntityFramework
    ```

4. Add another class file named `TodoContext.cs`, then include the following code to define the database connection. The connection string name is passed to the base class in the constructor:

    ```csharp
    using System.Data.Entity;

    namespace FileProcessor
    {
        public class TodoContext : DbContext
        {
            public TodoContext() : base("TodoContext")
            {
            }

            public DbSet<TodoItem> TodoItems { get; set; }
        }
    }
    ```
5. Open `FileProcessFn.cs` and change the `Run` method to be asynchronous by replacing `void` with `async Task`, as shown in the sketch below. Be sure to add `using System.Threading.Tasks;` to the top of the file.
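    A minimal sketch of what the signature should look like after the change (the body is unchanged for now and is extended in the next steps):

    ```csharp
    [FunctionName("FileProcessFn")]
    public static async Task Run([BlobTrigger("import/{name}", Connection = "AzureWebJobsStorage")]Stream myBlob, string name, TraceWriter log)
    {
        log.Info($"C# Blob trigger function Processed blob\n Name:{name} \n Size: {myBlob.Length} Bytes");
        // the file processing logic is added in the following steps
    }
    ```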
6. After the `log.Info` statement, add the structure for reading lines from the stream:

    ```csharp
    if (myBlob.Length > 0)
    {
        using (var reader = new StreamReader(myBlob))
        {
            var lineNumber = 1;
            var line = await reader.ReadLineAsync();
            while (line != null)
            {
                await ProcessLine(name, line, lineNumber, log);
                line = await reader.ReadLineAsync();
                lineNumber++;
            }
        }
    }
    ```
7. Implement the `ProcessLine` method. Each line of the uploaded file is expected to contain a task description and a 0 or 1 complete flag, separated by a comma:

    ```csharp
    private static async Task ProcessLine(string name, string line, int lineNumber, TraceWriter log)
    {
        if (string.IsNullOrWhiteSpace(line))
        {
            log.Warning($"{name}: {lineNumber} is empty.");
            return;
        }

        var parts = line.Split(',');
        if (parts.Length != 2)
        {
            log.Error($"{name}: {lineNumber} invalid data: {line}.");
            return;
        }

        var item = new TodoItem { Task = parts[0] };
        if ((int.TryParse(parts[1], out int complete) == false) || complete < 0 || complete > 1)
        {
            log.Error($"{name}: {lineNumber} bad complete flag: {parts[1]}.");
            return;
        }
        item.IsComplete = complete == 1;
    }
    ```
8. After setting the `IsComplete` flag, add the logic to check for duplicates and insert the record if it is unique (the fully assembled method is shown after this snippet). The `Any` query requires `using System.Linq;` at the top of the file:

    ```csharp
    using (var context = new TodoContext())
    {
        if (context.TodoItems.Any(todo => todo.Task == item.Task))
        {
            log.Error($"{name}: {lineNumber} duplicate task: \"{item.Task}\".");
            return;
        }
        context.TodoItems.Add(item);
        await context.SaveChangesAsync();
        log.Info($"{name}: {lineNumber} inserted task: \"{item.Task}\" with id: {item.Id}.");
    }
    ```
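    Putting the two snippets together, the completed `ProcessLine` method should look roughly like this (a sketch assembled from the steps above):

    ```csharp
    private static async Task ProcessLine(string name, string line, int lineNumber, TraceWriter log)
    {
        // skip blank lines
        if (string.IsNullOrWhiteSpace(line))
        {
            log.Warning($"{name}: {lineNumber} is empty.");
            return;
        }

        // each line must contain exactly two comma-separated fields: the task text and a 0/1 flag
        var parts = line.Split(',');
        if (parts.Length != 2)
        {
            log.Error($"{name}: {lineNumber} invalid data: {line}.");
            return;
        }

        var item = new TodoItem { Task = parts[0] };
        if ((int.TryParse(parts[1], out int complete) == false) || complete < 0 || complete > 1)
        {
            log.Error($"{name}: {lineNumber} bad complete flag: {parts[1]}.");
            return;
        }
        item.IsComplete = complete == 1;

        // skip duplicates, otherwise insert the new task
        using (var context = new TodoContext())
        {
            if (context.TodoItems.Any(todo => todo.Task == item.Task))
            {
                log.Error($"{name}: {lineNumber} duplicate task: \"{item.Task}\".");
                return;
            }
            context.TodoItems.Add(item);
            await context.SaveChangesAsync();
            log.Info($"{name}: {lineNumber} inserted task: \"{item.Task}\" with id: {item.Id}.");
        }
    }
    ```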
9. Press **F5** to debug. In Azure Storage Explorer, upload `GoodData.csv` to the `import` container. You should see several success messages in the functions console.
10. Upload `BadData.csv` and verify only a few records are processed and errors are printed.
11. Open SQL Server Management Studio and run the query:

    ```sql
    SELECT * FROM TodoItems
    ```

12. Verify the results include the tasks that were imported from `GoodData.csv`.
13. Delete the imported tasks by executing this SQL statement:

    ```sql
    DELETE FROM TodoItems WHERE Id > 1
    ```
Now the project is successfully running locally. The next few exercises demonstrate how to move the process to Azure.
Next, you create an Azure SQL database in the cloud. This exercise uses the Azure portal.
1. Choose **Create a resource** and search for or select **SQL Database**.

2. Enter a unique **Database name**.

3. Choose your **Azure subscription**.

4. Select the **Create new** option for **Resource group** and enter `my-todo-hol`.

5. Keep the default **Blank database** for **Select source**.

6. Click **Configure required settings** for **Server**.

7. Select **Create new server**.

8. Enter a unique **Server name**.

9. Provide a login and password. Note: be sure to save your credentials!

10. Pick your preferred **Location**.

11. Click the **Select** button.

12. Click **Pricing tier**.

13. Slide the **DTU** bar to the lowest level for this lab.

14. Tap **Apply**.

15. Check **Pin to dashboard**.

16. Click **Create**.

17. Once the database is created, navigate to the **Overview** for your database and select **Set server firewall**.

18. Click **Add client IP** to add your IP address, then click **Save**. Test that you can access the database by connecting from SQL Server Management Studio.
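If you prefer a quick programmatic check instead of SQL Server Management Studio, a minimal console sketch like the following can confirm that the firewall rule and credentials work. The server, database, user, and password placeholders are assumptions; substitute the values you chose above.

```csharp
using System;
using System.Data.SqlClient;

class ConnectivityCheck
{
    static void Main()
    {
        // Hypothetical ADO.NET connection string; replace the placeholders with your values.
        var connectionString =
            "Server=tcp:{your_server}.database.windows.net,1433;" +
            "Initial Catalog={your_database};User ID={your_username};Password={your_password};" +
            "Encrypt=True;";

        using (var connection = new SqlConnection(connectionString))
        {
            // Open throws if the firewall rule, server name, or credentials are wrong.
            connection.Open();
            Console.WriteLine($"Connected. SQL Server version: {connection.ServerVersion}");
        }
    }
}
```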
Wait for the deployment to complete (you will receive an alert) and then continue to the next exercise.
You can follow the steps in Exercise 3 (Create the SQL Database) to create and populate the Azure SQL tables, or you can migrate from your local SQL instance. If you choose to create the table yourself, you may skip this exercise.
1. Open the Microsoft Data Migration Assistant.

2. Click the plus sign to start a new project, check **Migration**, give the project a name, and make sure the **Source server type** is **SQL Server** and the **Target server type** is **Azure SQL** with a **Migration scope** of **Schema and data**. Click **Create**.

3. Fill out the credentials for the source server, click **Connect**, then select the database you created in Exercise 3. Click **Next**.

4. Fill out the credentials for the target server (Azure SQL), click **Connect**, then select the database you created in Exercise 5. Click **Next**.

5. In the next dialog, make sure only **dbo.TodoItems** under **Tables** is checked and click **Generate SQL script**.

6. The next dialog shows the SQL script that creates the table. Click **Deploy schema** to deploy the table to Azure SQL.

7. Verify the deployment was successful, then click **Migrate data**.

8. Click **Start data migration**.

9. Verify the migration was successful. You can test the migration by browsing the data in SQL Server Management Studio.
Now that the Azure SQL database is ready, you deploy the function to Azure.
In this exercise, you use the built-in Visual Studio tools to create a function app and app service plan and deploy your function to the cloud.
1. Inside Visual Studio, from the Solution Explorer right-click on the project name and choose **Publish...**.

2. Choose **Azure Function App**, check **Create New**, and click **Publish**.

3. Give the app a unique name, choose your **Subscription**, and select the same **Resource Group** that you used for the Azure SQL Server. For **App Service Plan**, click **New...**.

4. Give the plan a unique name, choose the **Location**, and pick **Consumption** for the **Size**. Click **OK**.

5. Back in the **Create App Service** dialog, click **Create**.

6. The publish shows build output and eventually the text `Publish completed.` when it's done.

7. Open your Azure SQL Database in the Azure portal and navigate to **Connection strings**. Copy the connection string for **ADO.NET**.

8. Navigate to the function app in the portal. Click **Application settings**.

9. Scroll to the **Connection strings** section. Click **+ Add new connection string**. Type `TodoContext` for the name, paste the value from step 7 (be sure to update `{your_username}` and `{your_password}` to the correct values), and set the type to **SQLAzure**.

10. Above the **Connection strings** section is **Application settings**. Note the `AccountName` from the `AzureWebJobsStorage` entry to get the storage account name.

11. Scroll to the top and click **Save**.
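The connection string name matters: `TodoContext` is the name the `TodoContext` class passes to its `DbContext` base constructor, so Entity Framework looks up a connection string with that exact name at runtime. If you want to verify the lookup after deploying, a hedged sketch (assuming the v1 .NET Framework runtime, which surfaces connection string settings through `ConfigurationManager`) could log whether the value resolves, without printing the credentials:

```csharp
// Hypothetical diagnostic (not part of the lab): add temporarily inside Run to
// confirm the TodoContext connection string is visible to the deployed function.
// May require a reference to System.Configuration in the project.
var setting = System.Configuration.ConfigurationManager.ConnectionStrings["TodoContext"];
log.Info($"TodoContext connection string found: {setting != null}");
```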
Everything has been deployed and configured in Azure. In the next exercise, you will create the `import` container and upload files to test the cloud implementation.
Now the function app is deployed and configured. This last exercise helps you create a blob container, upload the test files, and test the processing trigger.
1. Navigate to the storage account from step 10 of the previous exercise. It should be the only storage account in your resource group. Click on **Blobs**.

2. From the Blob service page, click **+ Container** to add a new container.

3. Type `import` for the name, leave **Public access level** at **Private (no anonymous access)**, and click **OK**.

4. Once the container is created, click on the container name (`import`) to open the container, then click **Upload**.

5. Click the folder icon to browse to the `GoodData.csv` file, choose the file, and click **Upload**.

6. Navigate back to the function app and click **Monitor**.

7. Wait for the logs to appear (use the refresh button if necessary). After the log appears, click on the log entry to view the log information and verify the data was inserted.

8. Use SQL Server Management Studio to verify the records.

9. Repeat steps 4 - 7 with the `BadData.csv` file.
Congratulations! You have successfully completed this lab to create a serverless function that imports files into a SQL database.