Which Google Cloud service is best for running distributed training jobs on large datasets with GPUs or TPUs?

  1. App Engine
  2. Cloud Functions
  3. Vertex AI Training with custom containers ✓
  4. Cloud Dataflow

Correct answer: Vertex AI Training with custom containers

Option 3 is correct because Vertex AI Training supports custom containers, which let data scientists package any ML framework (TensorFlow, PyTorch, JAX) and run distributed training jobs at scale on Google's GPU and TPU infrastructure. It is purpose-built for large-scale model training. Option 1 is incorrect because App Engine is a platform-as-a-service designed for hosting web applications and APIs, not for executing computationally intensive distributed training workloads. Option 2 is incorrect because Cloud Functions is an event-driven serverless compute service with strict execution time and memory limits that make it unsuitable for long-running, resource-heavy distributed training jobs. Option 4 is incorrect because Cloud Dataflow is a managed service optimized for streaming and batch data pipeline processing using the Apache Beam model, not for running distributed model training jobs.
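To make the custom-container idea concrete, here is a minimal sketch of the worker pool layout that a distributed Vertex AI custom job describes: one primary replica followed by a pool of additional workers, each running the same container on a GPU machine. The helper name `build_worker_pool_specs`, the image URI, and the chosen machine and accelerator types are illustrative assumptions, not values from the question.

```python
from typing import Any


def build_worker_pool_specs(
    image_uri: str,
    worker_count: int,
    machine_type: str = "n1-standard-8",          # assumed machine type
    accelerator_type: str = "NVIDIA_TESLA_T4",    # assumed GPU type
    accelerator_count: int = 1,
) -> list[dict[str, Any]]:
    """Build worker pool specs for a distributed custom training job.

    Hypothetical helper: the first pool holds the single primary replica,
    the second pool holds the remaining workers.
    """
    if worker_count < 1:
        raise ValueError("worker_count must be at least 1")

    def pool(replicas: int) -> dict[str, Any]:
        return {
            "machine_spec": {
                "machine_type": machine_type,
                "accelerator_type": accelerator_type,
                "accelerator_count": accelerator_count,
            },
            "replica_count": replicas,
            # The custom container carries the framework and training code.
            "container_spec": {"image_uri": image_uri},
        }

    specs = [pool(1)]  # primary replica
    if worker_count > 1:
        specs.append(pool(worker_count - 1))  # additional workers
    return specs


# Placeholder image URI; replace with a real training image.
specs = build_worker_pool_specs(
    "us-docker.pkg.dev/my-project/my-repo/trainer:latest", worker_count=4
)
print(specs)
```

With the `google-cloud-aiplatform` SDK installed and a project, region, and staging bucket configured, a list like this is what would typically be passed as `worker_pool_specs` when creating an `aiplatform.CustomJob`. That submission step is left out here because it requires credentials and a real container image.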

Topic: vertex ai, distributed training, gpu tpu, gcp ml
