How will you (not) use AWS SageMaker Jobs — Part Two

Model hosting using SageMaker

Anastasia Lebedeva
DataSentics

--

AWS SageMaker provides deployment options for batch and real-time predictions. We already discussed batch processing in the first post of the series. In this post, let’s take a look at the SageMaker hosting service, which is tightly coupled with SageMaker jobs. Note that in this series we are focused on the so-called “bring your own Docker” use case.

Original photo by Clark Van Der Beken on Unsplash

Amazon SageMaker Hosting Services

Specification and Capabilities

First of all, let’s take a look at the deployment process and the capabilities of the service. To deploy a model to SageMaker, the user will need to:

  1. Create a model instance defined by a Docker image (stored in ECR) and optional model artifacts. Note that registries other than ECR are not supported yet.
  2. Create an endpoint configuration defined by one or more models, and specify the number and type of EC2 instances for each model.
  3. Create an endpoint defined by an endpoint configuration. Note that the endpoint lifecycle corresponds to the application lifecycle. That is, creating an endpoint corresponds to deploying the application, while a live endpoint corresponds to a running application.
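
The three steps above can be sketched with boto3. All names (model, bucket, role, endpoint) are illustrative placeholders; the actual API calls are shown commented out, since they require AWS credentials and a real ECR image:

```python
# 1. Model: a Docker image in ECR plus optional model artifacts in S3
model_request = {
    "ModelName": "my-model",
    "PrimaryContainer": {
        "Image": "123456789012.dkr.ecr.eu-west-1.amazonaws.com/my-image:latest",
        "ModelDataUrl": "s3://my-bucket/model.tar.gz",  # optional artifacts
    },
    "ExecutionRoleArn": "arn:aws:iam::123456789012:role/SageMakerRole",
}

# 2. Endpoint configuration: instance type and count per model
#    (each model is wired in as a "production variant")
endpoint_config_request = {
    "EndpointConfigName": "my-endpoint-config",
    "ProductionVariants": [{
        "VariantName": "primary",
        "ModelName": "my-model",
        "InstanceType": "ml.m5.large",
        "InitialInstanceCount": 1,
    }],
}

# 3. Endpoint: creating it deploys the application
# import boto3
# sm = boto3.client("sagemaker")
# sm.create_model(**model_request)
# sm.create_endpoint_config(**endpoint_config_request)
# sm.create_endpoint(EndpointName="my-endpoint",
#                    EndpointConfigName="my-endpoint-config")
```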

The service manages several things for the user:

  • A virtual machine with Docker installed, so the user only needs to specify the required EC2 instance type(s) in the endpoint configuration
  • Pulling a Docker image from ECR, launching a Docker container, and monitoring the container health status

Importantly, the service is capable of provisioning GPU instances (note that your container has to be nvidia-docker compatible) and allows you to configure autoscaling for a running endpoint.
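
Autoscaling for an endpoint variant is configured through the Application Auto Scaling API rather than SageMaker itself. A sketch of a target-tracking policy (endpoint and variant names are illustrative, and the calls are commented out since they require a live endpoint):

```python
# Scalable target: which endpoint variant to scale, and within what bounds
scalable_target = {
    "ServiceNamespace": "sagemaker",
    "ResourceId": "endpoint/my-endpoint/variant/primary",
    "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    "MinCapacity": 1,
    "MaxCapacity": 4,
}

# Policy: add/remove instances to keep invocations per instance near a target
scaling_policy = {
    "PolicyName": "invocations-per-instance",
    "ServiceNamespace": "sagemaker",
    "ResourceId": scalable_target["ResourceId"],
    "ScalableDimension": scalable_target["ScalableDimension"],
    "PolicyType": "TargetTrackingScaling",
    "TargetTrackingScalingPolicyConfiguration": {
        "TargetValue": 100.0,  # target invocations per instance
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance",
        },
    },
}

# import boto3
# aas = boto3.client("application-autoscaling")
# aas.register_scalable_target(**scalable_target)
# aas.put_scaling_policy(**scaling_policy)
```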

Requirements and Limitations

Application deployment to SageMaker hosting services has to satisfy the following requirements:

  • The container hosts a web server that responds to /invocations and /ping on port 8080 (see the documentation).
  • There is an executable script serve which is available in the system path or located in the working directory (see an example of the serve script).
  • SageMaker then runs the container using docker run image serve (see the documentation).
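
A minimal sketch of the web server the container must host: /ping for health checks and /invocations for predictions on port 8080. This uses only the Python standard library for brevity; a production model server would typically use a framework such as Flask or FastAPI behind a proper WSGI server, and the "model" here is just an echo placeholder:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # SageMaker probes /ping; a 200 response means the container is healthy
        if self.path == "/ping":
            self._reply(200, {"status": "ok"})
        else:
            self._reply(404, {"error": "not found"})

    def do_POST(self):
        # Prediction requests arrive as POST /invocations
        if self.path == "/invocations":
            length = int(self.headers.get("Content-Length", 0))
            payload = json.loads(self.rfile.read(length) or b"{}")
            # Placeholder "model": echo the input back as the prediction
            self._reply(200, {"prediction": payload})
        else:
            self._reply(404, {"error": "not found"})

    def _reply(self, code, body):
        data = json.dumps(body).encode()
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

def make_server(port=8080):
    return HTTPServer(("0.0.0.0", port), Handler)

# In the container, the executable `serve` script would effectively run:
# make_server().serve_forever()
```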

The limitations are:

  • By default, endpoints can be accessed only through the high-level API provided by the AWS SDK or AWS CLI. This implies lower flexibility, as well as the inability to expose endpoints other than /ping and /invocations.
  • As emphasized in the documentation, SageMaker endpoints are scoped to an individual AWS account and are not public. That is, applications that are not running within the scope of your account will not be able to reach the endpoint.
  • The model container has a maximum of 60 seconds to respond to the /invocations request.
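
For illustration, this is what calling a live endpoint through the high-level API looks like (the only access path by default). The endpoint name and payload are placeholders, and the call is commented out since it requires a deployed endpoint:

```python
request = {
    "EndpointName": "my-endpoint",
    "ContentType": "application/json",
    "Body": b'{"features": [1.0, 2.0, 3.0]}',
}

# import boto3
# runtime = boto3.client("sagemaker-runtime")
# response = runtime.invoke_endpoint(**request)  # must complete within 60 s
# prediction = response["Body"].read()
```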

Verdict

Overall, AWS SageMaker is a convenient service for model hosting, managing resources and the container lifecycle for you. Importantly, it supports GPU-based instances — a crucial requirement for some ML applications.

From my perspective, the main drawback of the SageMaker hosting service is the specificity of its endpoints. Although the limitations mentioned in the previous section can be overcome by utilizing other AWS services (find links in the “Related Resources” section), doing so will surely increase operational costs.

To conclude, the specification of the SageMaker hosting service suggests that it is not suitable for hosting general containerized applications: it imposes strict constraints on an application and its environment, and requires additional steps to configure access to the application endpoints. The hosting service, however, might be a good fit for ML use cases in which the requirements imposed on the container are not a bottleneck. It is also a convenient service in case you utilize other components of the SageMaker ecosystem, e.g. training jobs.

Related Resources

Thank you for reading up to this point. That wraps up the “How will you (not) use AWS SageMaker Jobs” series. If you find this post interesting, you may also like “How will you (not) use AWS SageMaker Jobs — Part One: Batch Processing” — the first post in the series.

As always, if you have any further questions or suggestions, feel free to leave a comment. Also, if you have a topic in mind that you would like us to cover in future posts, let us know.
