Deploy Google gemma-2b and gemma-7b models on AWS Sagemaker

3 min readFeb 27, 2024

Gemma is a family of lightweight, release by google.Notably, Gemma surpasses significantly larger models on key benchmarks.

In this blog we would like to see the steps for deploying the Google gemma model on AWS Sagemaker.

Gemma does not fit the typical definition of an “open source” model. While Gemma allows users to use, reproduce, modify, distribute, perform, or display its services, there are several conditions and restrictions imposed by the license agreement.

***Important Note: Please work with your legal team to comprehensively understand the Gemma license agreement and its implications within your organization when assessing the model for your use cases.

Pre-requisites:

Step 1:

Get access to the Huggingface model repository. This requires signing up.

2. Once the access is granted, generate a Hugging Face token to access the model. ⬇️

https://huggingface.co/settings/tokens

Step 2:

Currently Gemma models is available as 2B base model, 7B instruct model, and 2B instruct model, 7B base model. The required transformers version for running inference with the model is 4.38.0.dev0.

"transformers_version": "4.38.0.dev0"

Deploying on Sagemaker:

As of today, February 27th, 2024, the most recent HuggingFace Text Generation Inference DLC container available on Sagemaker utilizes transformers version 4.37.1. To deploy the model on Sagemaker, it’s necessary to update the existing DLC container with the latest version of the transformers library.

Steps to extend the DLC container:

aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 763104351884.dkr.ecr.us-east-1-code.amazonaws.com
docker pull 763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-tgi-inference:2.1.1-tgi1.4.0-gpu-py310-cu121-ubuntu20.04-v1.0

Create a docker file to add latest transformers version:

FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-tgi-inference:2.1.1-tgi1.4.0-gpu-py310-cu121-ubuntu20.04-v1.0
RUN pip install -U transformers

docker build -t huggingface-pytorch-tgi-inference:2.1.1-tgi1.4.0-gpu-py310-cu121-ubuntu20.04-v1.0 .

Push the customer Docker image to ECR repository in your AWS account:

aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin <AWS_ACCOUNT_ID>.dkr.ecr.us-east-1-code.amazonaws.com
docker tag huggingface-pytorch-tgi-inference:2.1.1-tgi1.4.0-gpu-py310-cu121-ubuntu20.04-v1.0 <AWS_ACCOUNT_ID>.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-tgi-inference:latest
docker push <AWS_ACCOUNT_ID>.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-tgi-inference:latest

Create Endpoint on Sagemaker:

import json
import sagemaker
import boto3
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

try:
 role = sagemaker.get_execution_role()
except ValueError:
 iam = boto3.client('iam')
 role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']

# Hub Model configuration. https://huggingface.co/models
hub = {
 'HF_MODEL_ID':'google/gemma-2b-it', 
 'SM_NUM_GPUS': json.dumps(1), 
    'HF_TOKEN':'XXXXXXX',
    
}


# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
 image_uri= "<AWS_ACCOUNT_ID>.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-tgi-inference:latest",#get_huggingface_llm_image_uri("huggingface",version="1.1.0"),
 env=hub,
 role=role, 
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
 initial_instance_count=1,
 instance_type="ml.g5.2xlarge",
 container_startup_health_check_timeout=900,
  )

# send request
predictor.predict({
 "inputs": "write a story about nature",
})

Output:

[{'generated_text': ' reclaiming what was taken away:\n\nThe Old Woman and the Wind:\n\nIn an ancient forest, nestled between towering trees and whispering streams, lived an old woman named Maria. Her weathered face, as wrinkled as the leaves on the trees, held a tale of countless seasons. Her heart, as pure as the white doves that nested in the branches, held a deep longing to see her beloved forest return to its former glory.\n\nBut it was not easy. Devastated by the deforestation that'}]

Deploy Google gemma-2b and gemma-7b models on AWS Sagemaker

Written by Lavaraja Padala

No responses yet