November 24, 2022

Azure | Copy Data from CSV to SQL Server using Azure Data Factory, Azure Logic Apps and SharePoint Document Library

 

Introduction:

We created a process to copy data from a CSV file uploaded to a SharePoint Document Library into a SQL Server database for a drug dispensary based in Pennsylvania, USA.

For this, we created a Document Library on a SharePoint site where the user uploads the CSV file. When a file is uploaded to the Document Library, a Logic App captures the uploaded file and passes its content to Azure Data Factory for further processing. The Data Factory contains a Copy Data pipeline that copies the CSV data into the relevant SQL table.

Prerequisites:

·         Azure Subscription

·         Azure Storage Account

·         Azure SQL Database

·         SharePoint Site with Document Library


Implementation:

In order to execute the process, the following tasks need to be carried out.

·         Create a SharePoint Document Library.

·         Create Blob Storage folders for the individual vendors.

·         Create a SQL database with the relevant tables and stored procedures.

·         Create Azure Data Factory resources (datasets and pipeline) for all three vendors.

·         Create Logic Apps for the vendors.

 

SharePoint Document Library

The SharePoint Document Library is the starting point of the process: the CSV file is uploaded there, which triggers the rest of the workflow. We created a Document Library, and the CSV file is uploaded into that folder.


Azure Blob Storage

·        Blob storage is created for all the vendors. It holds the CSV file while it is being processed; the file is deleted once processing is complete.

·        Each folder is dedicated to an individual vendor.

Create a Blob Storage Container

To create a container in the Azure portal, follow these steps:

1.       Go to your new storage account in the Azure portal.

2.       In the left menu for the storage account, scroll to the Blob service section, then select Containers.

3.       Select the + Container button.

4.       Type a name for your new container. The container name must be lowercase, must start with a letter or number, and can include only letters, numbers, and the dash (-) character.

5.       Set the level of public access to the container. The default level is Private (no anonymous access).

6.       Select OK to create the container.


SQL Server

Create a table to store the data.
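A minimal sketch of such a table is shown below. The table name and columns here are assumptions for illustration only; the real schema should match the columns of the vendor's CSV file.

    -- Hypothetical target table; replace the columns with your actual CSV layout.
    CREATE TABLE dbo.SalesReportByPatient
    (
        Id          INT IDENTITY(1,1) PRIMARY KEY,  -- surrogate key
        PatientId   NVARCHAR(50)   NOT NULL,
        ProductName NVARCHAR(200)  NOT NULL,
        Quantity    INT            NOT NULL,
        SaleDate    DATE           NOT NULL,
        SaleAmount  DECIMAL(10, 2) NOT NULL
    );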

    

 

Create a user-defined table type (UDT) for the equivalent table; the UDT is referenced in the Data Factory sink during processing.
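A minimal sketch of the table type, assuming the same hypothetical columns as above (the type name is also an assumption):

    -- Hypothetical table type; its columns should mirror the target table.
    CREATE TYPE dbo.SalesReportByPatientType AS TABLE
    (
        PatientId   NVARCHAR(50)   NOT NULL,
        ProductName NVARCHAR(200)  NOT NULL,
        Quantity    INT            NOT NULL,
        SaleDate    DATE           NOT NULL,
        SaleAmount  DECIMAL(10, 2) NOT NULL
    );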

        
 

Create a stored procedure that inserts the data into the corresponding table with the help of the UDT.
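A minimal sketch of the procedure, assuming the hypothetical table and table type above; the procedure name is an assumption, but the @SalesReportByPatient parameter is the one referenced below in the Data Factory sink configuration.

    -- Hypothetical procedure; the parameter name must match the
    -- "Table type parameter name" configured in the Data Factory sink.
    CREATE PROCEDURE dbo.usp_InsertSalesReportByPatient
        @SalesReportByPatient dbo.SalesReportByPatientType READONLY
    AS
    BEGIN
        SET NOCOUNT ON;

        INSERT INTO dbo.SalesReportByPatient (PatientId, ProductName, Quantity, SaleDate, SaleAmount)
        SELECT PatientId, ProductName, Quantity, SaleDate, SaleAmount
        FROM @SalesReportByPatient;
    END;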

      



The @SalesReportByPatient parameter in the sketch above is the table type parameter name that is passed in the sink configuration in Azure Data Factory.
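The procedure can be tested directly in SQL Server before wiring it into the pipeline. A quick sketch, using the hypothetical names above and made-up sample values:

    -- Manual test of the hypothetical procedure with one sample row.
    DECLARE @rows dbo.SalesReportByPatientType;

    INSERT INTO @rows (PatientId, ProductName, Quantity, SaleDate, SaleAmount)
    VALUES (N'P-001', N'Sample Product', 2, '2022-11-01', 49.90);

    EXEC dbo.usp_InsertSalesReportByPatient @SalesReportByPatient = @rows;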

Azure Data Factory

·         We need to create a Data Factory for the vendors.

 

 

·         Datasets are created for all the tables in the schema; two datasets are created for each table:

o   CSV dataset

o   SQL dataset

1. Create a Dataset in a Data Factory

 

1.       Go to the Source tab. Select + New to create a source dataset. 

 

2.       In the New Dataset dialog box, select Azure Blob Storage and then select Continue. The source data is in Blob storage, so you select Azure Blob Storage for the source dataset.

 

 

3.       In the Select Format dialog box, choose the DelimitedText format for your data, and then select Continue.

 

4.       In the Set Properties dialog box, enter the dataset name, select the First row as header checkbox, and select your source file.

 
  

5.       A new page opens. Select the Schema tab and click Import schema to import the schema.


                   

 

Note: Follow the same steps to create the destination dataset (the only change is to select the SQL Server database in step 2).

2. Create a Copy Data Pipeline

In the Activities toolbox, expand the Move and Transform category, and drag and drop the Copy Data activity from the toolbox to the pipeline designer surface. 


Configure Source

Go to the Source tab of the Copy Data activity and configure the following:

 

Source dataset: select the dataset we created for the source.

Wildcard paths: add the Blob Storage's dynamic path in this field. The dynamic file name is passed from the Logic App as a pipeline parameter.

3. Configure sink

Sink dataset: select the destination dataset.

Stored procedure name: select the stored procedure you created in SQL Server.

Table type: select the table type you created in SQL Server.

Table type parameter name: add the parameter name declared in the stored procedure.

4. Configure Mapping

Select the source column and destination column for mapping data. 

Azure Logic App

To create the Logic App, follow the steps below:

1.        Sign in to the Azure portal with your Azure account credentials.

2.       In the Azure portal, search for Logic Apps.

 

 

3.       On the Logic Apps page, select Add.

 

4.       On the Logic App pane, provide the basic details and settings for your logic app. Create a new resource group for the purposes of this example logic app.

 


5.       When you're ready, select Review + Create. Confirm the details that you provided, and select Create.

 

6.       After Azure successfully deploys your app, select Go to resource. Alternatively, you can find and select your logic app by typing its name in the search box.


 

7.       The Logic Apps Designer opens and shows a page with templates. Under Templates, select Blank Logic App.

 

The Logic App triggers when a file is uploaded and calls Azure Data Factory to copy the data. To build it, add the actions below.

1.       We added the SharePoint trigger “When a file is created in a folder” to the Logic App to get the file from SharePoint.

                        

2.       To pass the file from SharePoint to Azure Data Factory, we need to save it in Azure Blob Storage, which we did in the Logic App using a Blob Storage connector.

 

3.       To run the Data Factory pipeline, we added the Azure Data Factory connector and pass two parameters to the pipeline: the file name and the file type.


When the Logic App runs, it gets the file from the SharePoint Document Library, copies it to Blob Storage, and then triggers the Data Factory pipeline.

Output: Once the Logic App is in place, the data is inserted into the SQL database. The Logic App keeps checking the SharePoint Document Library for new files and runs at a given time interval, for example, every 15 minutes.


Conclusion:

Azure Data Factory is a robust way to transfer data. It makes it easier to get multiple CSV files from SharePoint, create pipelines to process those files, and move the data into the relevant database.


If you have any questions, you can reach out to our SharePoint Consulting team here.
