CreateDataSourceFromS3
Creates a DataSource object. A DataSource references data that can be used to perform CreateMLModel, CreateEvaluation, or CreateBatchPrediction operations.
CreateDataSourceFromS3 is an asynchronous operation. In response to CreateDataSourceFromS3, Amazon Machine Learning (Amazon ML) immediately returns and sets the DataSource status to PENDING. After the DataSource has been created and is ready for use, Amazon ML sets the Status parameter to COMPLETED. A DataSource in the COMPLETED or PENDING state can be used to perform only CreateMLModel, CreateEvaluation, or CreateBatchPrediction operations.
If Amazon ML can't accept the input source, it sets the Status parameter to FAILED and includes an error message in the Message attribute of the GetDataSource operation response.
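Because the operation is asynchronous, a caller usually polls until the status settles. The following is a minimal sketch of such a loop, not an official client. The names wait_for_data_source and fetch_status, the polling interval, and the timeout are assumptions made for illustration; fetch_status is a hypothetical callable that returns a (status, message) pair and would, in practice, wrap a GetDataSource call.

```python
import time


class DataSourceFailed(Exception):
    """Raised when the DataSource reaches the FAILED status."""


def wait_for_data_source(fetch_status, poll_seconds=30.0, timeout_seconds=3600.0,
                         sleep=time.sleep, clock=time.monotonic):
    """Poll until the DataSource is COMPLETED, FAILED, or the timeout elapses.

    fetch_status is a hypothetical callable returning (status, message),
    for example a thin wrapper around GetDataSource.
    """
    deadline = clock() + timeout_seconds
    while True:
        status, message = fetch_status()
        if status == "COMPLETED":
            return status
        if status == "FAILED":
            # The Message attribute carries the reason for the failure.
            raise DataSourceFailed(message or "no message returned")
        if clock() >= deadline:
            raise TimeoutError(f"DataSource still {status} after {timeout_seconds}s")
        # Any other status (for example PENDING) means creation is still under way.
        sleep(poll_seconds)
```

The sleep and clock parameters exist only so the loop can be exercised without real delays; a caller would normally leave them at their defaults.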
The observation data used in a DataSource should be ready to use; that is, it should have a consistent structure, and missing data values should be kept to a minimum. The observation data must reside in one or more .csv files in an Amazon Simple Storage Service (Amazon S3) location, along with a schema that describes the data items by name and type. The same schema must be used for all of the data files referenced by the DataSource.
After the DataSource has been created, it's ready to use in evaluations and batch predictions. If you plan to use the DataSource to train an MLModel, the DataSource also needs a recipe. A recipe describes how each input variable will be used in training an MLModel. Will the variable be included or excluded from training? Will the variable be manipulated; for example, will it be combined with another variable or will it be split apart into word combinations? The recipe provides answers to these questions.
Request Syntax
{
   "ComputeStatistics": boolean,
   "DataSourceId": "string",
   "DataSourceName": "string",
   "DataSpec": {
      "DataLocationS3": "string",
      "DataRearrangement": "string",
      "DataSchema": "string",
      "DataSchemaLocationS3": "string"
   }
}
Request Parameters
For information about the parameters that are common to all actions, see Common Parameters.
The request accepts the following data in JSON format.
- ComputeStatistics
  The compute statistics for a DataSource. The statistics are generated from the observation data referenced by a DataSource. Amazon ML uses the statistics internally during MLModel training. This parameter must be set to true if the DataSource needs to be used for MLModel training.
  Type: Boolean
  Required: No
- DataSourceId
  A user-supplied identifier that uniquely identifies the DataSource.
  Type: String
  Length Constraints: Minimum length of 1. Maximum length of 64.
  Pattern: [a-zA-Z0-9_.-]+
  Required: Yes
- DataSourceName
  A user-supplied name or description of the DataSource.
  Type: String
  Length Constraints: Maximum length of 1024.
  Pattern: .*\S.*|^$
  Required: No
- DataSpec
  The data specification of a DataSource:
  - DataLocationS3 - The Amazon S3 location of the observation data.
  - DataSchemaLocationS3 - The Amazon S3 location of the DataSchema.
  - DataSchema - A JSON string representing the schema. This is not required if DataSchemaLocationS3 is specified.
  - DataRearrangement - A JSON string that represents the splitting and rearrangement requirements for the DataSource.
    Sample - "{\"splitting\":{\"percentBegin\":10,\"percentEnd\":60}}"
  Type: S3DataSpec object
  Required: Yes
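The constraints listed for the request parameters can be checked on the client before a request is sent. The sketch below assembles the request body and validates DataSourceId and DataSourceName against the documented length limits and patterns. The function name build_create_request and its argument names are hypothetical; only the JSON field names and the constraints come from this reference. Applying the patterns to the whole string is also an assumption, since the reference does not say how the service matches them. Because DataRearrangement and DataSchema are themselves JSON strings, each is serialized once before being embedded in the body.

```python
import json
import re

ID_PATTERN = re.compile(r"[a-zA-Z0-9_.-]+")
NAME_PATTERN = re.compile(r".*\S.*|^$")


def build_create_request(data_source_id, data_location_s3, schema_location_s3=None,
                         schema=None, name=None, rearrangement=None,
                         compute_statistics=None):
    """Return the request body as a dict, raising ValueError on bad input."""
    if not 1 <= len(data_source_id) <= 64:
        raise ValueError("DataSourceId must be 1 to 64 characters long")
    if not ID_PATTERN.fullmatch(data_source_id):
        raise ValueError("DataSourceId must match [a-zA-Z0-9_.-]+")
    if name is not None:
        if len(name) > 1024:
            raise ValueError("DataSourceName must be at most 1024 characters")
        if not NAME_PATTERN.fullmatch(name):
            raise ValueError("DataSourceName must be empty or contain a non-space character")

    data_spec = {"DataLocationS3": data_location_s3}
    if schema_location_s3 is not None:
        data_spec["DataSchemaLocationS3"] = schema_location_s3
    if schema is not None:
        # DataSchema is a JSON string, so the dict is serialized here.
        data_spec["DataSchema"] = json.dumps(schema, separators=(",", ":"))
    if rearrangement is not None:
        # DataRearrangement is also a JSON string nested inside the JSON body.
        data_spec["DataRearrangement"] = json.dumps(rearrangement, separators=(",", ":"))

    body = {"DataSourceId": data_source_id, "DataSpec": data_spec}
    if name is not None:
        body["DataSourceName"] = name
    if compute_statistics is not None:
        body["ComputeStatistics"] = bool(compute_statistics)
    return body
```

Serializing the whole body with json.dumps afterwards produces the escaped form shown in the DataRearrangement sample, because the inner string's quotes are escaped a second time.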
Response Syntax
{
"DataSourceId": "string"
}
Response Elements
If the action is successful, the service sends back an HTTP 200 response.
The following data is returned in JSON format by the service.
- DataSourceId
  A user-supplied ID that uniquely identifies the DataSource. This value should be identical to the value of the DataSourceId in the request.
  Type: String
  Length Constraints: Minimum length of 1. Maximum length of 64.
  Pattern: [a-zA-Z0-9_.-]+
Errors
For information about the errors that are common to all actions, see Common Errors.
- IdempotentParameterMismatchException
  A second request to use or change an object was not allowed. This can result from retrying a request using a parameter that was not present in the original request.
  HTTP Status Code: 400
- InternalServerException
  An error on the server occurred when trying to process a request.
  HTTP Status Code: 500
- InvalidInputException
  An error on the client occurred. Typically, the cause is an invalid input value.
  HTTP Status Code: 400
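One way a caller might act on these errors is to retry only the server-side failure. The sketch below encodes the status codes listed above and applies that policy. The retry rule itself is an assumption and is not stated by this reference, and the names ERROR_STATUS and should_retry are hypothetical.

```python
# HTTP status codes for the errors documented for this operation.
ERROR_STATUS = {
    "IdempotentParameterMismatchException": 400,
    "InternalServerException": 500,
    "InvalidInputException": 400,
}


def should_retry(error_name):
    """Return True only for documented errors with a 5xx status.

    Assumed policy: 400-level errors indicate a problem with the request
    itself, so resending the same request is not expected to help.
    Unknown error names are treated as not retryable.
    """
    status = ERROR_STATUS.get(error_name)
    return status is not None and status >= 500
```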
Examples
The following is a sample request and response for the CreateDataSourceFromS3 operation.
Sample Request
POST / HTTP/1.1
Host: machinelearning.<region>.<domain>
x-amz-Date: <Date>
Authorization: AWS4-HMAC-SHA256 Credential=<Credential>, SignedHeaders=contenttype;date;host;user-agent;x-amz-date;x-amz-target;x-amzn-requestid,Signature=<Signature>
User-Agent: <UserAgentString>
Content-Type: application/x-amz-json-1.1
Content-Length: <PayloadSizeBytes>
Connection: Keep-Alive
X-Amz-Target: AmazonML_20141212.CreateDataSourceFromS3
{
  "DataSourceId": "exampleDataSourceId",
  "DataSourceName": "exampleDataSourceName",
  "DataSpec": {
    "DataLocationS3": "s3://eml-test-EXAMPLE/data.csv",
    "DataSchemaLocationS3": "s3://eml-test-EXAMPLE/data.csv.schema",
    "DataRearrangement": "{\"splitting\":{\"percentBegin\":10,\"percentEnd\":60}}"
  }
}
Sample Response
HTTP/1.1 200 OK
x-amzn-RequestId: <RequestId>
Content-Type: application/x-amz-json-1.1
Content-Length: <PayloadSizeBytes>
Date: <Date>
{"DataSourceId":"exampleDataSourceId"}
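The unsigned parts of the sample request can be assembled with the standard library alone. The sketch below builds the headers that do not depend on signing and reads the DataSourceId back from a response body. It deliberately omits the Authorization and x-amz-Date headers, so the result is not a sendable request on its own. The function names and the region and domain arguments are placeholders chosen for illustration.

```python
import json

TARGET = "AmazonML_20141212.CreateDataSourceFromS3"


def build_http_request(region, domain, payload):
    """Return (headers, body) for the operation, without any signing headers."""
    body = json.dumps(payload).encode("utf-8")
    headers = {
        "Host": f"machinelearning.{region}.{domain}",
        "Content-Type": "application/x-amz-json-1.1",
        "Content-Length": str(len(body)),  # size of the encoded payload in bytes
        "X-Amz-Target": TARGET,
    }
    return headers, body


def parse_response(body):
    """Extract the DataSourceId from a successful (HTTP 200) response body."""
    return json.loads(body)["DataSourceId"]
```

A caller would compare the value returned by parse_response with the DataSourceId it sent, since the two are expected to be identical.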