Amazon Machine Learning
Developer Guide
Version Latest

Amazon Machine Learning: Developer Guide

Copyright Amazon Web Services, Inc. and/or its affiliates. All rights reserved.

Amazon's trademarks and trade dress may not be used in connection with any product or service that is not Amazon's, in any manner that is likely to cause confusion among customers, or in any manner that disparages or discredits Amazon. All other trademarks not owned by Amazon are the property of their respective owners, who may or may not be affiliated with, connected to, or sponsored by Amazon.

Table of Contents

What is Amazon Machine Learning?
Amazon Machine Learning Key Concepts
Datasources
ML Models
Evaluations
Batch Predictions
Real-time Predictions
Accessing Amazon Machine Learning
Regions and Endpoints
Pricing for Amazon ML
Estimating Batch Prediction Cost
Estimating Real-Time Prediction Cost
Machine Learning Concepts
Solving Business Problems with Amazon Machine Learning
When to Use Machine Learning
Building a Machine Learning Application
Formulating the Problem
Collecting Labeled Data
Analyzing Your Data
Feature Processing
Splitting the Data into Training and Evaluation Data
Training the Model
Evaluating Model Accuracy
Improving Model Accuracy
Using the Model to Make Predictions
Retraining Models on New Data
The Amazon Machine Learning Process
Setting Up Amazon Machine Learning
Sign Up for AWS
Tutorial: Using Amazon ML to Predict Responses to a Marketing Offer
Prerequisite
Steps
Step 1: Prepare Your Data
Step 2: Create a Training Datasource
Step 3: Create an ML Model
Step 4: Review the ML Model's Predictive Performance and Set a Score Threshold
Step 5: Use the ML Model to Generate Predictions
Step 6: Clean Up
Creating and Using Datasources
Understanding the Data Format for Amazon ML
Attributes
Input File Format Requirements
Using Multiple Files As Data Input to Amazon ML
End-of-Line Characters in CSV Format
Creating a Data Schema for Amazon ML
Example Schema
Using the targetAttributeName Field
Using the rowID Field
Using the AttributeType Field
Providing a Schema to Amazon ML
Splitting Your Data
Pre-splitting Your Data
Sequentially Splitting Your Data

Randomly Splitting Your Data
Data Insights
Descriptive Statistics
Accessing Data Insights on the Amazon ML console
Using Amazon S3 with Amazon ML
Uploading Your Data to Amazon S3
Permissions
Creating an Amazon ML Datasource from Data in Amazon Redshift
Required Parameters for the Create Datasource Wizard
Creating a Datasource with Amazon Redshift Data (Console)
Troubleshooting Amazon Redshift Issues
Using Data from an Amazon RDS Database to Create an Amazon ML Datasource
RDS Database Instance Identifier
MySQL Database Name
Database User Credentials
AWS Data Pipeline Security Information
Amazon RDS Security Information
MySQL SQL Query
S3 Output Location
Training ML Models
Types of ML Models
Binary Classification Model
Multiclass Classification Model
Regression Model
Training Process
Training Parameters
Maximum Model Size
Maximum Number of Passes over the Data
Shuffle Type for Training Data
Regularization Type and Amount
Training Parameters: Types and Default Values
Creating an ML Model
Prerequisites
Creating an ML Model with Default Options
Creating an ML Model with Custom Options
Data Transformations for Machine Learning
Importance of Feature Transformation
Feature Transformations with Data Recipes
Recipe Format Reference
Groups
Assignments
Outputs
Complete Recipe Example
Suggested Recipes
Data Transformations Reference
N-gram Transformation
Orthogonal Sparse Bigram (OSB) Transformation
Lowercase Transformation
Remove Punctuation Transformation
Quantile Binning Transformation
Normalization Transformation
Cartesian Product Transformation
Data Rearrangement
DataRearrangement Parameters
Evaluating ML Models
ML Model Insights
Binary Model Insights

Interpreting the Predictions
Multiclass Model Insights
Interpreting the Predictions
Regression Model Insights
Interpreting the Predictions
Preventing Overfitting
Cross-Validation
Adjusting Your Models
Evaluation Alerts
Generating and Interpreting Predictions
Creating a Batch Prediction
Creating a Batch Prediction (Console)
Creating a Batch Prediction (API)
Reviewing Batch Prediction Metrics
Reviewing Batch Prediction Metrics (Console)
Reviewing Batch Prediction Metrics and Details (API)
Reading the Batch Prediction Output Files
Locating the Batch Prediction Manifest File
Reading the Manifest File
Retrieving the Batch Prediction Output Files
Interpreting the Contents of Batch Prediction Files for a Binary Classification ML Model
Interpreting the Contents of Batch Prediction Files for a Multiclass Classification ML Model
Interpreting the Contents of Batch Prediction Files for a Regression ML Model
Requesting Real-time Predictions
Trying Real-Time Predictions
Creating a Real-Time Endpoint
Locating the Real-time Prediction Endpoint (Console)
Locating the Real-time Prediction Endpoint (API)
Creating a Real-time Prediction Request
Deleting a Real-Time Endpoint
Managing Amazon ML Objects
Listing Objects
Listing Objects (Console)
Listing Objects (API)
Retrieving Object Descriptions
Detailed Descriptions in the Console
Detailed Descriptions from the API
Updating Objects
Deleting Objects
Deleting Objects (Console)
Deleting Objects (API)
Monitoring Amazon ML with Amazon CloudWatch Metrics
Logging Amazon ML API Calls with AWS CloudTrail
Amazon ML Information in CloudTrail
Example: Amazon ML Log File Entries
Tagging Your Objects
Tag Basics
Tag Restrictions
Tagging Amazon ML Objects (Console)
Tagging Amazon ML Objects (API)
Amazon Machine Learning Reference
Granting Amazon ML Permissions to Read Your Data from Amazon S3
Granting Amazon ML Permissions to Output Predictions to Amazon S3
Controlling Access to Amazon ML Resources with IAM
IAM Policy Syntax
Specifying IAM Policy Actions for Amazon ML
Specifying ARNs for Amazon ML Resources in IAM Policies

Example Policies for Amazon ML
Cross-service confused deputy prevention
Dependency Management of Asynchronous Operations
Checking Request Status
System Limits
Names and IDs for all Objects
Object Lifetimes
Resources
Document History

We are no longer updating the Amazon Machine Learning service or accepting new users for it. This documentation is available for existing users, but we are no longer updating it. For more information, see What is Amazon Machine Learning.

What is Amazon Machine Learning?

We are no longer updating the Amazon Machine Learning (Amazon ML) service or accepting new users for it. This documentation is available for existing users, but we are no longer updating it.

AWS now provides a robust, cloud-based service, Amazon SageMaker, so that developers of all skill levels can use machine learning technology. SageMaker is a fully managed machine learning service that helps you create powerful machine learning models. With SageMaker, data scientists and developers can build and train machine learning models, and then directly deploy them into a production-ready hosted environment.

For more information, see the SageMaker documentation.

Topics
- Amazon Machine Learning Key Concepts
- Accessing Amazon Machine Learning
- Regions and Endpoints
- Pricing for Amazon ML

Amazon Machine Learning Key Concepts

This section summarizes the following key concepts and describes in greater detail how they are used within Amazon ML:
- Datasources contain metadata associated with data inputs to Amazon ML
- ML Models generate predictions using the patterns extracted from the input data
- Evaluations measure the quality of ML models
- Batch Predictions asynchronously generate predictions for multiple input data observations
- Real-time Predictions synchronously generate predictions for individual data observations

Datasources

A datasource is an object that contains metadata about your input data. Amazon ML reads your input data, computes descriptive statistics on its attributes, and stores the statistics, along with a schema and other information, as part of the datasource object. Next, Amazon ML uses the datasource to train and evaluate an ML model and generate batch predictions.

Important: A datasource does not store a copy of your input data. Instead, it stores a reference to the Amazon S3 location where your input data resides. If you move or change the Amazon S3 file, Amazon ML cannot access or use it to create an ML model, generate evaluations, or generate predictions.

The following table defines terms that are related to datasources.

Term
    Definition

Attribute
    A unique, named property within an observation. In tabular-formatted data such as spreadsheets or comma-separated values (CSV) files, the column headings represent the attributes, and the rows contain values for each attribute.
    Synonyms: variable, variable name, field, column

Datasource Name
    (Optional) Allows you to define a human-readable name for a datasource. These names enable you to find and manage your datasources in the Amazon ML console.

Input Data
    Collective name for all the observations that are referred to by a datasource.

Location
    Location of input data. Currently, Amazon ML can use data that is stored within Amazon S3 buckets, Amazon Redshift databases, or MySQL databases in Amazon Relational Database Service (RDS).

Observation
    A single input data unit. For example, if you are creating an ML model to detect fraudulent transactions, your input data will consist of many observations, each representing an individual transaction.
    Synonyms: record, example, instance, row

Row ID
    (Optional) A flag that, if specified, identifies an attribute in the input data to be included in the prediction output. This attribute makes it easier to associate each prediction with its corresponding observation.
    Synonyms: row identifier

Schema
    The information needed to interpret the input data, including attribute names and their assigned data types, and names of special attributes.

Statistics
    Summary statistics for each attribute in the input data. These statistics serve two purposes:
    - The Amazon ML console displays them in graphs to help you understand your data at a glance and identify irregularities or errors.
    - Amazon ML uses them during the training process to improve the quality of the resulting ML model.

Status
    Indicates the current state of the datasource, such as In Progress, Completed, or Failed.

Target Attribute
    In the context of training an ML model, the target attribute identifies the name of the attribute in the input data that contains the "correct" answers. Amazon ML uses this to discover patterns in the input data and generate an ML model. In the context of evaluating and generating predictions, the target attribute is the attribute whose value will be predicted by a trained ML model.
    Synonyms: target

ML Models

An ML model is a mathematical model that generates predictions by finding patterns in your data. Amazon ML supports three types of ML models: binary classification, multiclass classification, and regression.

The following table defines terms that are related to ML models.

Term
    Definition

Regression
    The goal of training a regression ML model is to predict a numeric value.

Multiclass
    The goal of training a multiclass ML model is to predict values that belong to a limited, pre-defined set of permissible values.

Binary
    The goal of training a binary ML model is to predict values that can only have one of two states, such as true or false.

Model Size
    ML models capture and store patterns. The more patterns an ML model stores, the bigger it will be. ML model size is described in megabytes.

Number of Passes
    When you train an ML model, you use data from a datasource. It is sometimes beneficial to use each data record in the learning process more than once. The number of times that you let Amazon ML use the same data records is called the number of passes.

Regularization
    Regularization is a machine learning technique that you can use to obtain higher quality models. Amazon ML offers a default setting that works well for most cases.

Evaluations

An evaluation measures the quality of your ML model and determines if it is performing well.

The following table defines terms that are related to evaluations.

Term
    Definition

Model Insights
    Amazon ML provides you with a metric and a number of insights that you can use to evaluate the predictive performance of your model.

AUC
    Area Under the ROC Curve (AUC) measures the ability of a binary ML model to predict a higher score for positive examples as compared to negative examples.

Macro-averaged F1-score
    The macro-averaged F1-score is used to evaluate the predictive performance of multiclass ML models.

R
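The AUC and macro-averaged F1-score entries above can be made concrete with a small sketch. The Python below is illustrative only and is not Amazon ML code. It assumes the standard textbook definitions: AUC is the fraction of (positive, negative) pairs in which the positive example gets the higher score, with ties counted as half, and the macro-averaged F1-score is the unweighted mean of the per-class F1 scores. The function names auc and macro_f1 and the sample data are hypothetical, and Amazon ML's exact computation may differ in details such as tie handling.

```python
from itertools import product


def auc(labels, scores):
    """Fraction of (positive, negative) pairs where the positive example
    receives the higher score. Ties count as half a pair (an assumption)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for pos_score, neg_score in product(positives, negatives):
        if pos_score > neg_score:
            wins += 1.0
        elif pos_score == neg_score:
            wins += 0.5
    return wins / (len(positives) * len(negatives))


def macro_f1(true_classes, predicted_classes):
    """Unweighted mean of per-class F1 scores, taken over every class that
    appears in either the true or the predicted labels (an assumption)."""
    pairs = list(zip(true_classes, predicted_classes))
    classes = sorted(set(true_classes) | set(predicted_classes))
    f1_scores = []
    for cls in classes:
        tp = sum(1 for t, p in pairs if t == cls and p == cls)
        fp = sum(1 for t, p in pairs if t != cls and p == cls)
        fn = sum(1 for t, p in pairs if t == cls and p != cls)
        denominator = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        f1_scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(f1_scores) / len(f1_scores)


if __name__ == "__main__":
    # Binary example: three positives and two negatives (made-up scores).
    labels = [1, 0, 1, 0, 1]
    scores = [0.9, 0.4, 0.35, 0.2, 0.8]
    print(round(auc(labels, scores), 3))  # 0.833 (5 of 6 pairs ranked correctly)

    # Multiclass example with three classes (made-up labels).
    truth = ["a", "a", "b", "c"]
    predicted = ["a", "b", "b", "c"]
    print(round(macro_f1(truth, predicted), 3))  # 0.778
```

An AUC of 1.0 means every positive example outscores every negative one, while 0.5 is what random scores would produce. The pairwise loop is quadratic in the number of examples, which is fine for illustration; a production implementation would normally sort the scores instead.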
