---

Core data platform resources are defined within Terraform templates and grouped inside the
[deploy/azure](https://github.com/ensono/stacks-azure-data/tree/main/deploy/azure) directory.
There are three subfolders in this directory:

* `networking`
* `infra`
* `databricks`

## Networking

Using a private network is the default behaviour in the Ensono Stacks Azure Data Platform. The `networking`
subfolder contains configurations for the created network and subnetworks, at its core using the
[azurerm-hub-spoke](https://github.com/ensono/stacks-terraform/tree/master/azurerm/modules/azurerm-hub-spoke)
Ensono Stacks Terraform module. See the [Microsoft documentation](https://learn.microsoft.com/en-us/azure/architecture/reference-architectures/hybrid-networking/hub-spoke?tabs=cli) for more details on implementing the hub-spoke network topology in Azure.
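A call to the hub-spoke module might look roughly like the sketch below. This is illustrative only: the `git::` source syntax is standard Terraform, but the input names other than `is_hub` are assumptions rather than the module's actual interface.

```hcl
# Hedged sketch, not the platform's actual definition. Input names other
# than is_hub are assumptions; check the azurerm-hub-spoke module for the
# real interface.
module "networking" {
  source = "git::https://github.com/ensono/stacks-terraform//azurerm/modules/azurerm-hub-spoke?ref=master"

  # true for the hub network, false for the nonprod/prod spoke networks
  is_hub = false

  # Hypothetical inputs for illustration only
  resource_group_name = "ensono-data-euw-nonprod-network"
  location            = "westeurope"
}
```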

The following diagram shows the network configuration for the hub and the two default environments:

* Hub network (`is_hub: true`)
* Nonprod (`is_hub: false`)
* Prod (`is_hub: false`)

![Network Hub Spoke](../images/network_hub_spoke.png)

## Infra

The `infra` subfolder contains the following definitions:

1. **Resource Group**
2. **Azure SQL Database** sample instance with database schemas
the private endpoints.
11. **Log Analytics Workspace**

## Databricks

Due to the way that Databricks is deployed into Azure, a separate stage is required to finalise the configuration. This folder contains the following resource definitions:

1. **Azure Databricks Secrets Scope**
2. **Azure Databricks Token**
3. **Azure Databricks Workspace**
4. **Azure Key Vault Secrets** adds the Databricks token to Key Vault
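The relationship between the token and the Key Vault secret can be sketched in Terraform roughly as follows. The resource names, arguments, and references here are illustrative assumptions, not the platform's actual definitions:

```hcl
# Hedged sketch only: names and references are assumptions. The
# databricks_token resource comes from the Databricks provider, and the
# azurerm_key_vault_secret resource from the azurerm provider.
resource "databricks_token" "pat" {
  comment          = "Terraform-managed token"
  lifetime_seconds = 86400
}

# Store the generated token in Key Vault so downstream stages can read it.
# azurerm_key_vault.core is a hypothetical reference to the platform's vault.
resource "azurerm_key_vault_secret" "databricks_token" {
  name         = "databricks-token"
  value        = databricks_token.pat.token_value
  key_vault_id = azurerm_key_vault.core.id
}
```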

### Databricks secure cluster connectivity

Ensono Stacks Azure Data Platform uses VNet injection to deploy Databricks into a custom virtual network.

In most scenarios, we recommend that Azure Databricks is deployed in a fully secure manner, using secure cluster connectivity and disabling public workspace access. This means that Databricks can only be accessed over a private endpoint from within the private network. This also means that projects would need to have networking prerequisites such as ExpressRoute or VPNs in order to access the workspace. If this is not possible, then a virtual machine will need to be set up within the transit subnet. Users will then need to RDP into the VM and access the workspace via that.

Even without public IPs and with the data plane deployed into our VNet, there is still the option to toggle access to the Workspace UI via public networks. The default configuration disallows access to the Databricks workspace over the public internet in production environments, while leaving it open in development environments. This approach enhances the developer experience in case there is no properly configured networking/VPN set up in the target subscription.

Enabling public workspace access only opens access to the UI via public internet. Access is still restricted based on the IAM policy.

The following diagram depicts the Databricks network configuration.

![Network Databricks](../images/network_databricks.png)

---
id: core_data_platform_deployment_azure
title: Infrastructure Deployment
sidebar_label: 3. Infrastructure Deployment
hide_title: false
hide_table_of_contents: false
description: Infrastructure deployment
keywords:
- stacks cli
- ensono
- data
- infrastructure
- azure
- template
---

import TerraformDeployTasks from "../snippets/_terraform_plan_deploy_tasks.mdx"
import TerraformDeployPipeline from "../snippets/_terraform_plan_deploy_pipeline.mdx"

This section provides an overview of configuring and deploying the core data platform infrastructure in Azure.

It assumes you have [generated a new data project using Ensono Stacks](./generate_project.mdx), and that the following [requirements](./requirements_data_azure.md) are in place:

* [Azure subscription and service principal](./requirements_data_azure.md#azure-subscription)
* If you want to provision the infrastructure within a private network, this can be done as part of a [Hub-Spoke network topology](../architecture/infrastructure_data_azure#networking). Spoke virtual network and subnet for private endpoints must be provisioned for each environment. The hub network must contain a self-hosted agent. See [Microsoft documentation](https://learn.microsoft.com/en-us/azure/architecture/reference-architectures/hybrid-networking/hub-spoke?tabs=cli) for more details on implementing Hub-spoke network topology in Azure.

## Workstation Deployment

### Step 1: Source Environment Files

:::note
This step assumes the networking stage from the previous section has already been deployed. If it has not, complete the networking deployment steps first.
:::

As before, the environment files for the base configuration have already been created. The one pertaining to the infrastructure deployment needs to be sourced. In addition, the networking setup created further environment files based on the outputs of its Terraform stage, so the settings for this next stage are easily referenced.

import SourceInfraEnvsPowershell from "../snippets/powershell/_source_envfile_infra.mdx"
import SourceInfraEnvsBash from "../snippets/bash/_source_envfile_infra.mdx"

| Shell | Command |
|---|---|
| <img src={require('../images/powershell.png').default} width="20"></img> | <SourceInfraEnvsPowershell /> |
| <img src={require('../images/bash.png').default} width="20"></img> | <SourceInfraEnvsBash /> |

The first file, `envfile_infra.bash`, configures the variables that state which files should be deployed (via the `STAGE` variable), as well as sourcing the credentials file again. This is done in case a different shell has been started since deployment.

The second file, `dev-networking-envvars.bash`, contains the outputs from the networking stage that map to the input variables required by the data infrastructure.
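The effect of sourcing the two files can be sketched as below. The variable names and values are illustrative assumptions for this sketch, not the exact contents of the generated files:

```shell
# Illustrative sketch only: the real generated files may set different variables.

# envfile_infra.bash roughly does the following:
export STAGE="infra"                      # which Terraform stage to deploy
# ...and re-sources the credentials file, in case a fresh shell was started:
if [ -f ./credentials.bash ]; then
  source ./credentials.bash
fi

# dev-networking-envvars.bash then exposes the networking outputs as
# Terraform input variables, e.g. (hypothetical output name):
export TF_VAR_vnet_name="dev-spoke-vnet"

echo "Deploying stage: ${STAGE}"          # prints: Deploying stage: infra
```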

### Step 2: Deploy the Data infrastructure components

Now that everything is set up, the data platform infrastructure can be deployed. As before, the `STAGE` being run can be confirmed by reading the environment variable in the session after sourcing the environment variables.

| Shell | Command |
|---|---|
| <img src={require('../images/powershell.png').default} width="20"></img> | `$env:STAGE` |
| <img src={require('../images/bash.png').default} width="20"></img> | `echo $STAGE` |

<img src={require('../images/check_env_vars_infra.png').default}></img>
<figcaption>Check the environment variables</figcaption>

After the local environment has been configured, run the following commands.

| Shell | Command |
|---|---|
| <img src={require('../images/powershell.png').default} width="20"></img> <br /> <img src={require('../images/bash.png').default} width="20"></img> | <TerraformDeployTasks /> |

Alternatively, the EIR pipeline can be run, which bundles all of the above tasks together.

| Shell | Command |
|---|---|
| <img src={require('../images/powershell.png').default} width="20"></img> <br /> <img src={require('../images/bash.png').default} width="20"></img> | <TerraformDeployPipeline /> |


## Azure DevOps Pipeline

### Step 1: Add Infrastructure pipeline in Azure DevOps

A YAML file containing a template Azure DevOps CI/CD pipeline for building and deploying the core infrastructure is provided in `build/azdo/azure/pipeline-infra-private.yml` - this should be added as the definition for a new pipeline in Azure DevOps.

1. Sign in to your Azure DevOps organization and go to your project
2. Go to Pipelines, and then select **New pipeline**
3. Name the new pipeline, e.g. `ensono.stacks-data-infrastructure`
4. For the pipeline definition, specify the YAML file in the repository (`pipeline-infra-private.yml`) and save
5. The new pipeline will require access to any Azure DevOps pipeline variable groups specified in the pipeline YAML. Under each variable group, go to 'Pipeline permissions' and add the pipeline.


### Step 2: Deploy Infrastructure in non-production environment

Run the pipeline configured in Step 1 to commence the build and deployment process.

When running this pipeline, a number of parameters are available. The two that affect which environment is deployed are:

- Environment Name
- Can be one of `dev`, `qa`, `uat`, `prod`
- Environment Group
- Can be one of `nonprod`, `prod`.<br />This changes which variable group is used to provide authentication details for the subscription.

Thus the pipeline must be run for each of the environments that need to be deployed. The parameters available for the pipeline run are as follows:

| Name | Description | Default |
|------|-------------|---------|
| Destroy Environment | State if the environment should be destroyed | `false` |
| Deploy Environment | State if the environment should be deployed | `true` |
| Environment Group | The name of the environment group being deployed to. This controls which credentials variable group is used. | `nonprod` |
| Environment Name | The name of the environment that is being deployed | `dev` |
| Networking Pipeline Definition ID | The networking pipeline produces artifacts that this pipeline requires. The ID of that pipeline is required for this pipeline to download those artifacts. | |

<figure>
![Azure DevOps Infra Pipeline Parameters](../images/ado-pipeline-infra-params.png)
<figcaption>Azure DevOps Infra Pipeline Parameters</figcaption>
</figure>

If successful, the core infrastructure resources will now be available in the nonprod Ensono Stacks environment. To view these deployed resources, navigate to the [Azure portal](https://portal.azure.com/) and search for the resource group associated with the deployment. This resource group is named based upon values provided during step 1 in the pattern
`<companyname>-<projectname>-<region>-<component>-<environment>` (for example: `ensono-data-euw-data-dev`). Within the resource group, you'll find a list of the resources that were deployed.
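As a quick sanity check, the naming pattern can be reproduced in the shell. The values below are hypothetical, chosen to reproduce the example name above:

```shell
# Hypothetical values, chosen to reproduce the example resource group name
company="ensono"; project="data"; region="euw"; component="data"; environment="dev"

# Compose the resource group name from its parts
rg_name="${company}-${project}-${region}-${component}-${environment}"

echo "${rg_name}"   # prints: ensono-data-euw-data-dev
```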

<figure>
![Azure Infra Resource Groups](../images/azure-infra-dev-rgs.png)
<figcaption>Azure Infra Resource Groups</figcaption>
</figure>

:::note
The resource group `databricks-rg-ensono-data-euw-data-dev` is an automatically created resource group that contains the Databricks deployment. It does not need to be accessed or managed directly.
:::

<figure>
![Azure Resources for Dev Infrastructure](../images/azure-infra-dev.png)
<figcaption>Azure Resources for Dev Infrastructure</figcaption>
</figure>

Once the resources have been deployed, a new variable group will have been created in Azure DevOps for the specified environment. In this example, for the `dev` environment, the variable group will be called `ensono-data-data-dev-infra`.

### Step 3: Deploy Infrastructure in further environments

By default, Ensono Stacks provides a framework for managing the platform across four environments (`dev`, `qa`, `uat`, `prod`) in two environment groups (`nonprod`, `prod`).
The template CI/CD pipelines provided are based upon these names, but they may be amended depending upon the specific requirements of your project and organisation.

* Deployment to the non-production (`nonprod`) environment is triggered on a feature branch when a pull request is opened.
* Deployment to the production (`prod`) environment is triggered on merging to the `main` branch, followed by manual approval of the release step.


## Next steps

Now that you have generated and deployed a new Ensono Stacks Data Platform, the next step is to [create a Databricks token](./databricks_config.mdx).