Amazon AWS Certified Machine Learning - Specialty - AWS-Certified-Machine-Learning-Specialty Exam Practice Test
A machine learning (ML) specialist is running an Amazon SageMaker hyperparameter optimization job for a model that is based on the XGBoost algorithm. The ML specialist selects Root Mean Square Error (RMSE) as the objective evaluation metric.
The ML specialist discovers that the model is overfitting and cannot generalize well on the validation data.
The ML specialist decides to resolve the model overfitting by using SageMaker automatic model tuning (AMT).
Which solution will meet this requirement?
The ML specialist discovers that the model is overfitting and cannot generalize well on the validation data.
The ML specialist decides to resolve the model overfitting by using SageMaker automatic model tuning (AMT).
Which solution will meet this requirement?
Correct Answer: D
A large consumer goods manufacturer has the following products on sale:
* 34 different toothpaste variants
* 48 different toothbrush variants
* 43 different mouthwash variants
The entire sales history of all these products is available in Amazon S3. Currently, the company is using custom-built autoregressive integrated moving average (ARIMA) models to forecast demand for these products. The company wants to predict the demand for a new product that will soon be launched.
Which solution should a machine learning specialist apply?
* 34 different toothpaste variants
* 48 different toothbrush variants
* 43 different mouthwash variants
The entire sales history of all these products is available in Amazon S3. Currently, the company is using custom-built autoregressive integrated moving average (ARIMA) models to forecast demand for these products. The company wants to predict the demand for a new product that will soon be launched.
Which solution should a machine learning specialist apply?
Correct Answer: C
Explanation: Only visible for TrainingDump members. You can sign-up / login (it's free).
A machine learning (ML) specialist uploads a dataset to an Amazon S3 bucket that is protected by server-side encryption with AWS KMS keys (SSE-KMS). The ML specialist needs to ensure that an Amazon SageMaker notebook instance can read the dataset that is in Amazon S3.
Which solution will meet these requirements?
Which solution will meet these requirements?
Correct Answer: C
Explanation: Only visible for TrainingDump members. You can sign-up / login (it's free).
An ecommerce company has developed a XGBoost model in Amazon SageMaker to predict whether a customer will return a purchased item. The dataset is imbalanced. Only 5% of customers return items A data scientist must find the hyperparameters to capture as many instances of returned items as possible. The company has a small budget for compute.
How should the data scientist meet these requirements MOST cost-effectively?
How should the data scientist meet these requirements MOST cost-effectively?
Correct Answer: C
Explanation: Only visible for TrainingDump members. You can sign-up / login (it's free).
While reviewing the histogram for residuals on regression evaluation data a Machine Learning Specialist notices that the residuals do not form a zero-centered bell shape as shown What does this mean?


Correct Answer: C
Explanation: Only visible for TrainingDump members. You can sign-up / login (it's free).
A data scientist is working on a public sector project for an urban traffic system. While studying the traffic patterns, it is clear to the data scientist that the traffic behavior at each light is correlated, subject to a small stochastic error term. The data scientist must model the traffic behavior to analyze the traffic patterns and reduce congestion.
How will the data scientist MOST effectively model the problem?
How will the data scientist MOST effectively model the problem?
Correct Answer: A
Explanation: Only visible for TrainingDump members. You can sign-up / login (it's free).
A data scientist is training a text classification model by using the Amazon SageMaker built-in BlazingText algorithm. There are 5 classes in the dataset, with 300 samples for category A, 292 samples for category B,
240 samples for category C, 258 samples for category D, and 310 samples for category E.
The data scientist shuffles the data and splits off 10% for testing. After training the model, the data scientist generates confusion matrices for the training and test sets.
What could the data scientist conclude form these results?
240 samples for category C, 258 samples for category D, and 310 samples for category E.
The data scientist shuffles the data and splits off 10% for testing. After training the model, the data scientist generates confusion matrices for the training and test sets.
What could the data scientist conclude form these results?
Correct Answer: B
Explanation: Only visible for TrainingDump members. You can sign-up / login (it's free).
A company supplies wholesale clothing to thousands of retail stores. A data scientist must create a model that predicts the daily sales volume for each item for each store. The data scientist discovers that more than half of the stores have been in business for less than 6 months. Sales data is highly consistent from week to week.
Daily data from the database has been aggregated weekly, and weeks with no sales are omitted from the current dataset. Five years (100 MB) of sales data is available in Amazon S3.
Which factors will adversely impact the performance of the forecast model to be developed, and which actions should the data scientist take to mitigate them? (Choose two.)
Daily data from the database has been aggregated weekly, and weeks with no sales are omitted from the current dataset. Five years (100 MB) of sales data is available in Amazon S3.
Which factors will adversely impact the performance of the forecast model to be developed, and which actions should the data scientist take to mitigate them? (Choose two.)
Correct Answer: B,E
Explanation: Only visible for TrainingDump members. You can sign-up / login (it's free).
A company uses camera images of the tops of items displayed on store shelves to determine which items were removed and which ones still remain. After several hours of data labeling, the company has a total of
1,000 hand-labeled images covering 10 distinct items. The training results were poor.
Which machine learning approach fulfills the company's long-term needs?
1,000 hand-labeled images covering 10 distinct items. The training results were poor.
Which machine learning approach fulfills the company's long-term needs?
Correct Answer: A
Explanation: Only visible for TrainingDump members. You can sign-up / login (it's free).
A company wants to classify user behavior as either fraudulent or normal. Based on internal research, a Machine Learning Specialist would like to build a binary classifier based on two features: age of account and transaction month. The class distribution for these features is illustrated in the figure provided.
Based on this information, which model would have the HIGHEST recall with respect to the fraudulent class?
Based on this information, which model would have the HIGHEST recall with respect to the fraudulent class?
Correct Answer: B
Explanation: Only visible for TrainingDump members. You can sign-up / login (it's free).
A Machine Learning Specialist is packaging a custom ResNet model into a Docker container so the company can leverage Amazon SageMaker for training. The Specialist is using Amazon EC2 P3 instances to train the model and needs to properly configure the Docker container to leverage the NVIDIA GPUs.
What does the Specialist need to do?
What does the Specialist need to do?
Correct Answer: A
Explanation: Only visible for TrainingDump members. You can sign-up / login (it's free).
A company wants to forecast the daily price of newly launched products based on 3 years of data for older product prices, sales, and rebates. The time-series data has irregular timestamps and is missing some values.
Data scientist must build a dataset to replace the missing values. The data scientist needs a solution that resamptes the data daily and exports the data for further modeling.
Which solution will meet these requirements with the LEAST implementation effort?
Data scientist must build a dataset to replace the missing values. The data scientist needs a solution that resamptes the data daily and exports the data for further modeling.
Which solution will meet these requirements with the LEAST implementation effort?
Correct Answer: A
Explanation: Only visible for TrainingDump members. You can sign-up / login (it's free).
A Machine Learning Specialist is attempting to build a linear regression model.
Given the displayed residual plot only, what is the MOST likely problem with the model?
Given the displayed residual plot only, what is the MOST likely problem with the model?
Correct Answer: A
Explanation: Only visible for TrainingDump members. You can sign-up / login (it's free).
A retail company stores 100 GB of daily transactional data in Amazon S3 at periodic intervals. The company wants to identify the schema of the transactional data. The company also wants to perform transformations on the transactional data that is in Amazon S3.
The company wants to use a machine learning (ML) approach to detect fraud in the transformed data.
Which combination of solutions will meet these requirements with the LEAST operational overhead? {Select THREE.)
The company wants to use a machine learning (ML) approach to detect fraud in the transformed data.
Which combination of solutions will meet these requirements with the LEAST operational overhead? {Select THREE.)
Correct Answer: A,E,F
Explanation: Only visible for TrainingDump members. You can sign-up / login (it's free).