
# Setup and Running the code

  1. Create a free Azure account (refer: Azure Account) or use an existing subscription.
  2. Create a storage account and a container. Refer: Create Blob Storage and Create Blob Container. Note: you need to change the Sink Blob Account Name and Sink Blob Container Name in the SparkR notebook [Step01a_Setup](https://github.com/microsoft/A-TALE-OF-THREE-CITIES/blob/master/dbc/Step01a_Setup.dbc), as described in step 9. (See the container-creation sketch after this list.)
  3. Create a Shared Access Signature and copy the query string. More information: Create SAS token. (Screenshot: sas_setup.) See the SAS-generation sketch after this list.
  4. From the Azure portal, create a key vault, then create a secret holding the SAS token from the previous step. Refer: Create Azure Key Vault. (See the Key Vault sketch after this list.)
  5. Create an Azure Databricks workspace and a Spark cluster. Refer: Create Azure Databricks workspace and cluster. (Screenshot: Cluster_configuration.)
  6. Create an Azure Key Vault backed secret scope (note that you need contributor access on the Key Vault instance). Refer: Azure Key Vault backed secret scope. (Screenshot: secret_scope.) See the secret-lookup sketch after this list.
  7. Install the requisite libraries on the Azure Databricks Spark cluster. Refer: Install Libraries. The list of libraries is shown in the screenshot: Libraries_List.
  8. Import the DBC archive from https://github.com/microsoft/A-TALE-OF-THREE-CITIES/blob/master/dbc/all_dbc_archive/311_Analytics_OpenSource.dbc. Refer: Import notebook. (Screenshots: all_dbc_import, bulk_dbc.) See the import sketch after this list.
  9. Update and validate the Sink configuration section (lines 8 to 12 in the Cmd 3 section) of Step01a_Setup in your Azure Databricks workspace, and paste the value of the source SAS token into line 6. (See the Spark configuration sketch after this list.)
  10. Start running the sample from Step02a_Data_Wrangling in your Azure Databricks workspace.
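
The sketches below illustrate a few of the steps above programmatically. They are minimal Python examples, not the project's own tooling, and every account, container, scope, and secret name in them is an assumed placeholder. For step 2, the sink container can also be created with the azure-storage-blob (v12) SDK instead of the portal:

```python
# Minimal sketch: create the sink container with the azure-storage-blob v12 SDK.
# The connection string and names are placeholders, not values from the repo.
from azure.storage.blob import BlobServiceClient

conn_str = (
    "DefaultEndpointsProtocol=https;AccountName=<sink-account>;"
    "AccountKey=<account-key>;EndpointSuffix=core.windows.net"
)

service = BlobServiceClient.from_connection_string(conn_str)
service.create_container("<sink-container>")  # must match the name used in Step01a_Setup
```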
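For step 3, a SAS query string like the one the portal produces can be generated with `generate_account_sas`; the permissions and expiry below are illustrative assumptions:

```python
# Minimal sketch: generate an account-level SAS token (placeholder names/permissions).
from datetime import datetime, timedelta
from azure.storage.blob import AccountSasPermissions, ResourceTypes, generate_account_sas

sas_token = generate_account_sas(
    account_name="<sink-account>",
    account_key="<account-key>",
    resource_types=ResourceTypes(container=True, object=True),
    permission=AccountSasPermissions(read=True, write=True, list=True, create=True),
    expiry=datetime.utcnow() + timedelta(days=30),  # choose an expiry that suits you
)
print(sas_token)  # this query string is what goes into Key Vault in step 4
```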
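For step 4, the secret can also be written with the azure-keyvault-secrets SDK; the vault URL and secret name are placeholders:

```python
# Minimal sketch: store the SAS token as a Key Vault secret.
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

client = SecretClient(
    vault_url="https://<your-vault>.vault.azure.net",
    credential=DefaultAzureCredential(),  # picks up your Azure CLI / environment login
)
client.set_secret("<sas-secret-name>", sas_token)  # sas_token from the previous sketch
```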
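Once the Key Vault backed secret scope from step 6 exists, a notebook can read the SAS token without hard-coding it; the scope and key names are assumptions:

```python
# Minimal sketch (run inside a Databricks notebook): read the SAS token
# through the Key Vault backed secret scope created in step 6.
sas_token = dbutils.secrets.get(scope="<your-scope>", key="<sas-secret-name>")
```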
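For step 8, the archive can alternatively be imported with the Databricks Workspace Import REST API instead of the UI; the instance URL, token, and target path are placeholders:

```python
# Minimal sketch: import the downloaded DBC archive via the Workspace API 2.0.
import base64

import requests

host = "https://<databricks-instance>.azuredatabricks.net"
token = "<personal-access-token>"

with open("311_Analytics_OpenSource.dbc", "rb") as f:
    content = base64.b64encode(f.read()).decode()

resp = requests.post(
    f"{host}/api/2.0/workspace/import",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "path": "/Users/<you>/311_Analytics_OpenSource",
        "format": "DBC",
        "content": content,
    },
)
resp.raise_for_status()
```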
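The Sink configuration in step 9 boils down to handing Spark the SAS token for the sink account. The repo's notebooks are SparkR; the equivalent in Python (PySpark), with placeholder names, looks like this:

```python
# Minimal sketch: let Spark access the sink blob container via the SAS token.
# The fs.azure.sas.<container>.<account>... key is the standard WASB SAS setting.
spark.conf.set(
    "fs.azure.sas.<sink-container>.<sink-account>.blob.core.windows.net",
    sas_token,
)
```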