Let's get one thing straight right from the start: I hate it when someone makes fun of my Czech. Why? It's very simple: there are a billion reasons why I might not say a particular word or phrase correctly at a given moment. It certainly happens often that I don't know how a word is spelled or pronounced correctly. But it is also possible that my brain has simply run out of capacity, or even that I just like my own variant better. The last one is my favorite, and it includes, for example, odsprostit se — feel free to add this modification to your own vocabulary too; a little laugh never hurts.

It also happens that the correct variant of a word just doesn't sit right with me. Take "v"…

A Wake-Up Call for Everyone in the Data Science Industry

Whether we are talking about self-driving cars, QR code readers, personalized feeds on YouTube, or loan approvals — ML models are more and more often driving the automation and "smart" decisions behind the scenes. Indeed, building such solutions is complex but extremely exciting work. No wonder "data scientist" has been called the sexiest job of the 21st century and the field is attracting a lot of people — the idea of letting data solve the world's challenges is just so intriguing.

But the truth is that enterprise-grade data science is hard. Actually, it is enormously hard. Model versioning and deployment, operating…

How we set up automatic tracing, versioning, and synchronization of project environment

Reproducibility and traceability are among the crucial requirements of a reliable system. Neither continuous delivery nor efficient collaboration is possible without them. In fact, any kind of delivery is hardly imaginable without a precise specification of the environment.

Fortunately, various tools and methods exist to address the issue, such as package managers, environment managers, and virtualization tools. These methods are widely applied among programmers but seem much less appreciated among data scientists and statisticians — partially because scientific projects often never reach a software delivery phase, and even when they do, the responsibility mostly falls well outside the scientist's role. …
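As a minimal sketch of the "precise specification of the environment" idea, the snippet below snapshots the active Python environment into a pip-style lock file using only the standard library (the file name is illustrative; a real project would likely use pip, conda, or Poetry directly):

```python
# Sketch: snapshot the active Python environment into a pip-style lock file,
# so exact package versions travel with the project.
# Assumes Python 3.9+ (importlib.metadata is in the standard library).
from importlib.metadata import distributions


def freeze_environment() -> list[str]:
    """Return sorted 'name==version' pins for every installed package."""
    return sorted(
        f"{dist.metadata['Name']}=={dist.version}"
        for dist in distributions()
        if dist.metadata["Name"]  # skip entries with broken metadata
    )


if __name__ == "__main__":
    with open("requirements.lock", "w") as fh:
        fh.write("\n".join(freeze_environment()) + "\n")
```

Committing such a lock file alongside the code is the simplest form of environment versioning: anyone can rebuild the same environment from it later.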

Model hosting using SageMaker

AWS SageMaker provides deployment options for batch and real-time predictions. We already discussed batch processing in the first post of the series. In this post, let's take a look at the SageMaker hosting service, which is tightly coupled with SageMaker jobs. Note that in this series we focus on the so-called "bring your own Docker" use case.

Original photo by Clark Van Der Beken on Unsplash

Amazon SageMaker Hosting Services

Specification and Capabilities

First of all, let's take a look at the deployment process and the capabilities of the service. To deploy a model to SageMaker, you need to:

  1. Create a model instance defined by a Docker image (stored in ECR) and optional model…
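The full hosting flow consists of creating a model, an endpoint configuration, and an endpoint. As a hedged sketch, the three boto3 request payloads might be assembled like this (the model name, image URI, S3 path, role ARN, and instance type are all placeholders):

```python
# Sketch of the three SageMaker hosting calls, expressed as the parameter
# dicts you would pass to boto3's sagemaker client:
#   client.create_model(**create_model)
#   client.create_endpoint_config(**create_endpoint_config)
#   client.create_endpoint(**create_endpoint)
def build_deployment_requests(name, image_uri, model_data_url, role_arn):
    create_model = {
        "ModelName": name,
        # Docker image stored in ECR + optional model artifacts on S3
        "PrimaryContainer": {"Image": image_uri, "ModelDataUrl": model_data_url},
        "ExecutionRoleArn": role_arn,
    }
    create_endpoint_config = {
        "EndpointConfigName": f"{name}-config",
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": name,
            "InitialInstanceCount": 1,
            "InstanceType": "ml.m5.large",  # placeholder instance type
        }],
    }
    create_endpoint = {
        "EndpointName": f"{name}-endpoint",
        "EndpointConfigName": f"{name}-config",
    }
    return create_model, create_endpoint_config, create_endpoint
```

Separating the model, the endpoint configuration, and the endpoint lets you roll out a new model version by pointing an existing endpoint at a new configuration.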

Photo by Mr TT on Unsplash

“[DataSentics mission is to] make data science and machine learning have a real impact on organizations across the world — demystify the hype and black magic surrounding AI/ML and bring to life transparent production-level data science solutions and products delivering tangible impact and innovation.”

The mission clearly reflects our company's values: focus on real impact, productionalization of ideas, quality and a high level of innovation, and transparency. Without a doubt, the domain knowledge inside DataSentics is truly broad, covering the entire machine learning product lifecycle. Our expertise is also unique, being highly oriented toward the cloud…

We have to give the SageMaker service credit — it steadily enhances its capabilities and releases new features. One of them is SageMaker jobs, which provide a way to process data and to train and evaluate models using algorithms provided by SageMaker or custom ones.

AWS SageMaker always uses Docker containers when running jobs. While the service provides pre-built Docker images for its built-in algorithms, users can supply custom Docker images to define and provision a job's runtime. That last statement sounds general, so you may wonder: "Does it mean I can do anything I want in SageMaker as long as it is…
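To make the "bring your own Docker" idea concrete, here is a minimal sketch of a custom image for a SageMaker training job. It relies on the convention (which the reader should verify against the SageMaker docs) that SageMaker starts a training container as `docker run <image> train`, so an executable named `train` must be on the PATH; the base image, packages, and script name are illustrative.

```dockerfile
# Minimal custom image for a SageMaker training job ("bring your own Docker").
# SageMaker invokes the container with the argument `train`, and exchanges
# data with it under the /opt/ml directory convention.
FROM python:3.9-slim

RUN pip install --no-cache-dir scikit-learn pandas

# train.py must start with a shebang (e.g. #!/usr/bin/env python3)
COPY train.py /opt/program/train
RUN chmod +x /opt/program/train

ENV PATH="/opt/program:${PATH}"
```

After building and pushing this image to ECR, it can be referenced from a SageMaker training job definition just like a built-in algorithm image.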

Part Two: How Can a Data Science Project Benefit from Docker?

photo from https://www.pexels.com/@cottonbro

The post suggests how you can benefit from Docker at different stages of a data science project, from research through deployment to data and application monitoring in production. In particular, I discuss the advantages, as well as the aspects to consider, when applying Docker to:

  • Examine an existing solution
  • Prepare a development environment
  • Create a test environment (including a test database)
  • Monitor your data and application
  • Deploy your project
  • Share your project
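As an example of the "test environment (including a test database)" point, a throwaway database can be declared in a short Docker Compose file. This is a sketch, not taken from the post itself; the service name, credentials, port, and seed-script path are all illustrative.

```yaml
# docker-compose.yml — disposable test environment with a Postgres test
# database. Start with `docker compose up -d`, destroy with `docker compose
# down -v`, and every run begins from the same seeded state.
services:
  test-db:
    image: postgres:15
    environment:
      POSTGRES_USER: test
      POSTGRES_PASSWORD: test
      POSTGRES_DB: testdb
    ports:
      - "5433:5432"   # mapped to 5433 to avoid clashing with a local Postgres
    volumes:
      # scripts in this directory run automatically on first startup
      - ./tests/seed.sql:/docker-entrypoint-initdb.d/seed.sql
```

Because the container is recreated for each test run, tests never depend on leftover state from a previous run.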

Examine an existing solution

A new data science project typically starts with examining the field and testing existing solutions for their applicability to the stated problem. …

Part One: Critical Q&A for better understanding of Docker and Container Orchestration

photo from https://www.pexels.com/@suzyhazelwood

Let's take a look at some important whys and hows of virtualization, containerization, Docker, and container orchestration. The post is not intended to explain the concepts themselves, but to answer some of the key questions required for a better understanding of the underlying technologies and to introduce practical use cases.

Questions I examine in the post are:

  • What is the relation between containerization and virtualization?
  • What is containerization, and how do containers differ from virtual machines?
  • Why is Docker the number one containerization technology?
  • What does container orchestration enable, and how does it differ from Docker Compose?

Virtualization vs Containerization

In a nutshell, virtualization brings an…

From https://giphy.com/

The 1st of January. It is about zero degrees Celsius. Extremely sunny, no wind, no rain, no snow. Here it is — my self-organized, one-person, first-in-my-life marathon. I am running. Why? To prove I can do it, to do something I had never done before. I did a similar thing last year, running my first half-marathon on the first of January. And I felt great. It was such a wonderful beginning to the year.

So what happened this year? I have a couple of ideas. I am going to write them down for my future self, who, no…

“As ironic as it seems, the challenge of a tester is to test as little as possible. Test less, but test smarter.” — Federico Toledo

There is a widely held opinion that having any intelligence written in SQL is a bad, bad idea — it is hard to maintain, release, and test. But in practice, SQL is an extremely powerful language that can speed up data manipulation significantly. Hence developers do write plenty of code in SQL. So do I. And as an agile developer, I cover my code with tests as much as I can.

Anastasia Lebedeva

Developer, Runner, Adventure Lover
