🏄‍♀️

Azure ML[1] Training a Machine Learning Model in Azure Machine Learning

Let's train a model in Azure Machine Learning.

TODO: Use an Azure Machine Learning command job to train a model on the associated credit card dataset that predicts which customers have a high probability of defaulting on their credit card payment.

This exercise uses the associated credit card dataset provided by Azure. [dataset Link]

Step 1. Working with the Azure Machine Learning workspace

A workspace is the place where you collaborate with colleagues to create machine learning artifacts and group related work. It holds experiments, jobs, datasets, models, components, inference endpoints, and so on. Before you can work in one, you have to create it.

Creating a workspace

• First, sign in to Azure Machine Learning studio.

• Click the create workspace button to create a workspace.

Azure concepts such as resource groups were covered in DevOps[2] Automating Build / Deployment to a Cloud Service (Azure) with GitHub Actions.

Working with the workspace from code

You work with the workspace through the MLClient object in azure.ai.ml. This object manages resources and jobs.

To use it, first install the packages.

Reference: https://learn.microsoft.com/ko-kr/python/api/overview/azure/ai-ml-readme?view=azure-python

$ pip install azure-ai-ml
$ pip install azure-identity

• For reference, the azure-identity package provides Azure Active Directory (Azure AD) token authentication support across the Azure SDK. It provides a set of TokenCredential implementations that can be used when constructing Azure SDK clients that support Azure AD token authentication.

• Once the packages are installed, create an MLClient object.

• Use DefaultAzureCredential to access your credentials. When a token is needed, it tries several identities in order (EnvironmentCredential, ManagedIdentityCredential, SharedTokenCacheCredential, VisualStudioCodeCredential, AzureCliCredential, AzurePowerShellCredential) and stops as soon as one of them provides a token.

• DefaultAzureCredential is a default credential that can handle most Azure SDK authentication scenarios.
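
The fallback behavior described above can be pictured with a small stand-alone sketch. This is not the azure-identity implementation: the class and function names (FakeCredential, first_available_token) are hypothetical and only mimic the idea of trying each credential in order and stopping at the first one that yields a token.

```python
from typing import Optional, Sequence, Tuple


class FakeCredential:
    """Hypothetical stand-in for one credential type in the chain."""

    def __init__(self, name: str, token: Optional[str]) -> None:
        self.name = name
        self._token = token  # None means "this credential is unavailable"

    def get_token(self) -> str:
        if self._token is None:
            raise RuntimeError(f"{self.name} is unavailable")
        return self._token


def first_available_token(chain: Sequence[FakeCredential]) -> Tuple[str, str]:
    """Try each credential in order; return (name, token) from the first that works."""
    errors = []
    for cred in chain:
        try:
            return cred.name, cred.get_token()
        except RuntimeError as exc:
            errors.append(str(exc))  # remember why it failed, then move on
    raise RuntimeError("no credential produced a token: " + "; ".join(errors))


chain = [
    FakeCredential("EnvironmentCredential", None),
    FakeCredential("ManagedIdentityCredential", None),
    FakeCredential("AzureCliCredential", "token-from-az-login"),
    FakeCredential("AzurePowerShellCredential", "never-reached"),
]
used, token = first_available_token(chain)
print(used)
```

In this toy chain the first two credentials fail, so the token comes from the third and the fourth is never consulted.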

For more details on credentials, see the following.

from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Authenticate
credential = DefaultAzureCredential()

# Prerequisites: the Azure resource group and the Azure Machine Learning workspace already exist
# Get a handle to the workspace
ml_client = MLClient(
    credential=credential,
    subscription_id="b9#8##4-2###-4a##-####-7##e6##5a###",  # the subscription ID is shown under Azure Portal -> Subscriptions
    resource_group_name="resource group name",
    workspace_name="workspace name",
)
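
Hard-coding the subscription ID in a script makes it easy to leak. A minimal sketch of one alternative, assuming you export the three values as environment variables. The variable names (AZURE_SUBSCRIPTION_ID, AZURE_RESOURCE_GROUP, AZURE_ML_WORKSPACE) and the helper name are my own choices, not something the SDK requires.

```python
import os
from typing import Dict, Mapping

# Hypothetical variable names; pick whatever fits your setup.
REQUIRED_VARS = {
    "subscription_id": "AZURE_SUBSCRIPTION_ID",
    "resource_group_name": "AZURE_RESOURCE_GROUP",
    "workspace_name": "AZURE_ML_WORKSPACE",
}


def workspace_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Collect the MLClient keyword arguments from environment variables."""
    missing = [var for var in REQUIRED_VARS.values() if not env.get(var)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {kwarg: env[var] for kwarg, var in REQUIRED_VARS.items()}


# Demo with an in-memory mapping instead of the real environment.
demo_env = {
    "AZURE_SUBSCRIPTION_ID": "00000000-0000-0000-0000-000000000000",
    "AZURE_RESOURCE_GROUP": "my-resource-group",
    "AZURE_ML_WORKSPACE": "my-workspace",
}
settings = workspace_settings(demo_env)
print(sorted(settings))
```

The resulting dictionary can then be unpacked into the client, for example MLClient(credential=credential, **workspace_settings()).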

Step 2. Creating a compute cluster to run the job

• To run a job you need infrastructure to run it on, and that is the compute cluster.

• Azure Machine Learning compute clusters are described in detail in the following article.

Create compute clusters - Azure Machine Learning

Learn how to create a compute cluster in your Azure Machine Learning workspace. Use the compute cluster as a compute target for training or inference.

https://learn.microsoft.com/ko-kr/azure/machine-learning/how-to-create-attach-compute-cluster?view=azureml-api-2&tabs=python

• In short, an Azure Machine Learning compute cluster is a managed compute infrastructure that lets you easily create single-node or multi-node compute. A compute cluster is a resource that can be shared with other users of the workspace.

• A compute target can be a single-node or multi-node machine running Linux or Windows, or a specific compute fabric such as Spark.

• Azure Machine Learning (AML) has two kinds of compute resources:

instance

cluster

You can choose either one. A cluster contains multiple nodes, so it offers more total memory and compute. For training, a cluster is usually the better choice, because the work can be distributed across many compute nodes to speed it up.

• This exercise builds the cluster from the VM size STANDARD_D2S_v3, which has 2 vCPU cores and 8 GB of RAM.

from azure.ai.ml.entities import AmlCompute

# Name assigned to the compute cluster
cpu_compute_target = "cpu-cluster"

try:
    # If a compute cluster named cpu-cluster already exists in the workspace,
    # fetch it and reuse it as is.
    cpu_cluster = ml_client.compute.get(cpu_compute_target)
    print(
        f"You already have a cluster named {cpu_compute_target}, we'll reuse it as is."
    )

except Exception:
    print("Creating a new cpu compute target...")
    # This branch runs when no compute cluster with that name exists.
    # Create the Azure Machine Learning compute object.
    # If you hit a quota error, change the VM size of the cluster.
    # Learn more on https://azure.microsoft.com/en-us/pricing/details/machine-learning/
    cpu_cluster = AmlCompute(
        name=cpu_compute_target,  # name
        type="amlcompute",  # compute type; possible values are ["amlcompute", "computeinstance", "virtualmachine", "kubernetes", "synapsespark"]
        # VM Family
        size="STANDARD_D2S_v3",  # mind the exact name; availability differs by region
        # Minimum running nodes when there is no job running
        min_instances=0,
        # Nodes in cluster
        max_instances=1,  # minimum 0 nodes, maximum 1 node
        # How many seconds the node keeps running after the job terminates
        idle_time_before_scale_down=180,
        # Dedicated or LowPriority. The latter is cheaper but there is a chance of job termination
        tier="Dedicated",  # use Dedicated unless additional quota has been configured
    )
    print(
        f"AMLCompute with name {cpu_cluster.name} will be created, with compute size {cpu_cluster.size}"
    )
    # Passing this customized AmlCompute object to begin_create_or_update on
    # ml_client (the MLClient created earlier) creates the compute cluster.
    # begin_create_or_update returns a poller; .result() waits for it to finish.
    cpu_cluster = ml_client.compute.begin_create_or_update(cpu_cluster).result()
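
To make the three scaling settings concrete, here is a deliberately simplified model of how many nodes stay up. It is my own illustration, not Azure's scheduler: real clusters also need time to provision nodes, and the function name and its inputs are hypothetical.

```python
def expected_nodes(
    queued_or_running_jobs: int,
    seconds_idle: int,
    current_nodes: int = 0,
    min_instances: int = 0,
    max_instances: int = 1,
    idle_time_before_scale_down: int = 180,
) -> int:
    """Toy model of the node count implied by the scaling settings.

    Assumes each job needs exactly one node.
    """
    if queued_or_running_jobs > 0:
        # Scale out to serve the jobs, but never beyond max_instances.
        return max(min_instances, min(queued_or_running_jobs, max_instances))
    if seconds_idle < idle_time_before_scale_down:
        # Idle, but still inside the grace period: keep the current nodes.
        return max(min_instances, min(current_nodes, max_instances))
    # Idle past the grace period: fall back to the minimum.
    return min_instances


busy = expected_nodes(queued_or_running_jobs=3, seconds_idle=0)
just_finished = expected_nodes(0, seconds_idle=60, current_nodes=1)
long_idle = expected_nodes(0, seconds_idle=200, current_nodes=1)
print(busy, just_finished, long_idle)  # 1 1 0
```

With min_instances=0, no node is left running once the cluster has been idle for longer than idle_time_before_scale_down, which is why the settings above are a reasonable choice for an occasional training job.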

If you open your workspace in Azure Machine Learning studio and look under Compute, you can confirm that the cluster was created.

Step 3. Creating the job environment

To run an AML job on the compute resource you just created, you need an environment.

An environment lists the software runtime and libraries that you want installed on the compute where training will run.

You can think of it as the counterpart of a Python environment on your local machine.

AML also provides many curated, prebuilt environments.

About Azure Machine Learning environments - Azure Machine Learning

Learn about machine learning environments, which enable reproducible, auditable, & portable machine learning dependency definitions for various compute targets.

https://learn.microsoft.com/en-us/azure/machine-learning/concept-environments?view=azureml-api-2

This exercise creates a custom conda environment from a conda yaml file.

• Create the conda yaml file as follows.

./dependencies/conda.yaml

name: model-env
channels:
  - conda-forge
dependencies:
  - python=3.8
  - numpy
  - pip
  - scikit-learn
  - scipy
  - pandas
  - pip:
    - inference-schema[numpy-support]
    - mlflow
    - azureml-mlflow
    - psutil
    - tqdm
    - ipykernel
    - matplotlib

This file defines the packages that will be installed for the job.
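
If you are working in a notebook, the file can also be written from Python. A minimal sketch using only the standard library. The ./dependencies path and the file contents come from the text above, while the helper name write_conda_file is hypothetical.

```python
import os

CONDA_YAML = """\
name: model-env
channels:
  - conda-forge
dependencies:
  - python=3.8
  - numpy
  - pip
  - scikit-learn
  - scipy
  - pandas
  - pip:
    - inference-schema[numpy-support]
    - mlflow
    - azureml-mlflow
    - psutil
    - tqdm
    - ipykernel
    - matplotlib
"""


def write_conda_file(dependencies_dir: str = "./dependencies") -> str:
    """Create the directory if needed and write conda.yaml into it."""
    os.makedirs(dependencies_dir, exist_ok=True)
    path = os.path.join(dependencies_dir, "conda.yaml")
    with open(path, "w", encoding="utf-8") as f:
        f.write(CONDA_YAML)
    return path


conda_path = write_conda_file()
print(conda_path)
```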

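• From this yaml file, create the custom environment and register it in the workspace.

• This is done with the Environment object. (The code below imports it from azure.ai.ml.entities, the SDK v2 package; the reference that follows documents the corresponding SDK v1 class, azureml.core.Environment.)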

azureml.core.Environment class - Azure Machine Learning Python

Configures a reproducible Python environment for machine learning experiments. An Environment defines Python packages, environment variables, and Docker settings that are used in machine learning experiments, including in data preparation, training, and deployment to a web service. An Environment is managed and versioned in an Azure Machine Learning Workspace. You can update an existing environment and retrieve a version to reuse. Environments are exclusive to the workspace they are created in and can't be used across different workspaces. For more information about environments, see Create and manage reusable environments. Class Environment constructor.

https://learn.microsoft.com/en-us/python/api/azureml-core/azureml.core.environment(class)?view=azure-ml-py

• This object configures a reproducible Python environment for machine learning experiments.

• Just as with the compute, create the Environment object the way you want it, then register the environment in the workspace with the create_or_update method of ml_client.environments.

import os

from azure.ai.ml.entities import Environment

dependencies_dir = "./dependencies"  # directory that holds conda.yaml (created above)
custom_env_name = "aml-scikit-learn"  # any name you like for the environment
# Environment for training.
custom_job_env = Environment(
    name=custom_env_name,  # name
    description="Custom environment for Credit Card Defaults job",  # description
    tags={"scikit-learn": "0.24.2"},  # tags that annotate the environment
    conda_file=os.path.join(dependencies_dir, "conda.yaml"),  # Path to configuration file listing conda packages to install.
    image="mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04:latest",  # URI of a custom base image.
)
# Create or update the environment in the workspace through ml_client,
# the MLClient object that controls the workspace
custom_job_env = ml_client.environments.create_or_update(custom_job_env)

print(
    f"Environment with name {custom_job_env.name} is registered to workspace, the environment version is {custom_job_env.version}"
)

Step 4. Configuring the training job

• Before configuring the training job, you first need to write the training script (the training code).

• This exercise builds the script with the sklearn and mlflow packages.

• The Azure Machine Learning workspace is compatible with MLflow, so you can use MLflow to track runs, metrics, parameters, and artifacts in the workspace. [https://learn.microsoft.com/en-us/azure/machine-learning/how-to-use-mlflow-cli-runs?view=azureml-api-2&tabs=interactive%2Ccli]

./src/main.py: trains a GradientBoostingClassifier model for credit default prediction
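
The script body is not reproduced here. As a minimal sketch, its command-line interface would need to accept the four arguments that the job's command string passes in. The argument names come from that command string; the types, defaults, and the outline in the comments are my assumptions, not the original script.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the arguments that the command job passes to main.py."""
    parser = argparse.ArgumentParser(description="Train a credit default classifier")
    parser.add_argument("--data", type=str, required=True, help="path to the input CSV")
    parser.add_argument("--test_train_ratio", type=float, default=0.25)
    parser.add_argument("--learning_rate", type=float, default=0.1)
    parser.add_argument("--registered_model_name", type=str, required=True)
    return parser.parse_args(argv)


# Assumed outline of the rest of the script (not shown in this post):
#   1. load the CSV at args.data and split it using args.test_train_ratio
#   2. fit sklearn's GradientBoostingClassifier with args.learning_rate
#   3. log metrics with mlflow and register the model as args.registered_model_name

# Demo: parse the same values that the job in this step passes.
args = parse_args(
    [
        "--data", "credit.csv",
        "--test_train_ratio", "0.2",
        "--learning_rate", "0.25",
        "--registered_model_name", "credit_defaults_model",
    ]
)
print(args.learning_rate)
```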

• Now let's configure the Command object that will run the training job with this training script.

• Much like creating the compute and the environment, you create a Command object and pass it to the MLClient object, which submits the job to the workspace.

azure.ai.ml package

https://learn.microsoft.com/en-us/python/api/azure-ai-ml/azure.ai.ml?view=azure-python

from azure.ai.ml import command  # function that creates a Command object
from azure.ai.ml import Input  # Define an input of a Component or Job.

registered_model_name = "credit_defaults_model"  # model name
# Create a Command object which can be used inside dsl.pipeline as a function and can also be created as a standalone command job.
job = command(
    # inputs holds, as a dictionary, the arguments that main.py reads with argparse.ArgumentParser
    inputs=dict(
        # data has to be passed as a URI, so it is declared as an Input object
        # https://learn.microsoft.com/en-us/python/api/azure-ai-ml/azure.ai.ml.input?view=azure-python
        data=Input(
            type="uri_file",
            path="https://azuremlexamples.blob.core.windows.net/datasets/credit_card/default_of_credit_card_clients.csv",
        ),
        test_train_ratio=0.2,
        learning_rate=0.25,
        registered_model_name=registered_model_name,
    ),
    code="./src/",  # location of source code
    # command holds the command line that runs the Python file
    command="python main.py --data ${{inputs.data}} --test_train_ratio ${{inputs.test_train_ratio}} --learning_rate ${{inputs.learning_rate}} --registered_model_name ${{inputs.registered_model_name}}",
    environment="aml-scikit-learn@latest",  # latest version of the environment created above
    compute="cpu-cluster",  # the cluster created above
    display_name="credit_default_prediction",  # a friendly name
)

ml_client.create_or_update(job)
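
The ${{inputs.<name>}} placeholders in the command string are filled in from the inputs dictionary when the job runs. The substitution can be imitated with a few lines of plain Python. This is only a toy model under my own assumptions (the render_command helper is hypothetical, and the /mnt/data path is made up); in the real job the data input typically resolves to a path on the compute rather than the literal URL.

```python
import re
from typing import Mapping

# Matches ${{inputs.name}} and captures the input name.
PLACEHOLDER = re.compile(r"\$\{\{\s*inputs\.(\w+)\s*\}\}")


def render_command(template: str, inputs: Mapping[str, object]) -> str:
    """Replace every ${{inputs.name}} with the matching value from inputs."""

    def lookup(match):
        name = match.group(1)
        if name not in inputs:
            raise KeyError(f"command references unknown input: {name}")
        return str(inputs[name])

    return PLACEHOLDER.sub(lookup, template)


template = (
    "python main.py --data ${{inputs.data}} "
    "--test_train_ratio ${{inputs.test_train_ratio}} "
    "--learning_rate ${{inputs.learning_rate}}"
)
rendered = render_command(
    template,
    {"data": "/mnt/data/credit.csv", "test_train_ratio": 0.2, "learning_rate": 0.25},
)
print(rendered)
# python main.py --data /mnt/data/credit.csv --test_train_ratio 0.2 --learning_rate 0.25
```

A placeholder that names an input missing from the dictionary raises an error here, which mirrors the kind of mistake worth checking for before submitting a job.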

Running this code connects to your Azure subscription from Python and interacts with the AML services.

ml_client is what lets you submit jobs from Python.

• In Azure Machine Learning studio, under your workspace's Jobs page, you can confirm that the job ran successfully.

• The metrics were logged as well.
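
The job can also be followed from Python instead of the studio UI. A hedged sketch: it takes the ml_client and job objects from the steps above as parameters and relies on ml_client.jobs.create_or_update, ml_client.jobs.stream, and ml_client.jobs.get from azure-ai-ml. The helper name submit_and_wait is my own.

```python
def submit_and_wait(ml_client, job):
    """Submit a job, stream its logs until it finishes, and return the final status."""
    submitted = ml_client.jobs.create_or_update(job)
    print(f"Submitted {submitted.name}, studio URL: {submitted.studio_url}")
    # Blocks and prints log output until the job reaches a terminal state.
    ml_client.jobs.stream(submitted.name)
    # Fetch the job again so the status reflects the finished run.
    return ml_client.jobs.get(submitted.name).status
```

Calling submit_and_wait(ml_client, job) in place of ml_client.create_or_update(job) would keep the notebook cell busy until training ends, which is convenient for short jobs like this one.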

The next post covers deploying the model :)