Step 4: Load data into HDFS
In this step, you will copy a data file into Hadoop Distributed File System (HDFS), and then create a native Hive table populated with data from that file.
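If you are following along on an Amazon EMR cluster, the commands in this step are entered at the SSH command prompt on the master node, where the hadoop fs utility is available for browsing HDFS directly. The following is a minimal sketch, not part of the procedure; the directories it lists depend on your cluster's configuration.

  # List the top-level directories in HDFS.
  hadoop fs -ls /

  # List the per-user home directories (the default user on Amazon EMR is hadoop).
  hadoop fs -ls /user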
Download the sample data
- Download the sample data archive (features.zip):
  wget https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/samples/features.zip
- Extract the features.txt file from the archive:
  unzip features.zip
- View the first few lines of the features.txt file:
  head features.txt
The result should look similar to this:
  1535908|Big Run|Stream|WV|38.6370428|-80.8595469|794
  875609|Constable Hook|Cape|NJ|40.657881|-74.0990309|7
  1217998|Gooseberry Island|Island|RI|41.4534361|-71.3253284|10
  26603|Boone Moore Spring|Spring|AZ|34.0895692|-111.410065|3681
  1506738|Missouri Flat|Flat|WA|46.7634987|-117.0346113|2605
  1181348|Minnow Run|Stream|PA|40.0820178|-79.3800349|1558
  1288759|Hunting Creek|Stream|TN|36.343969|-83.8029682|1024
  533060|Big Charles Bayou|Bay|LA|29.6046517|-91.9828654|0
  829689|Greenwood Creek|Stream|NE|41.596086|-103.0499296|3671
  541692|Button Willow Island|Island|LA|31.9579389|-93.0648847|98
The features.txt file contains a subset of data from the United States Board on Geographic Names (http://geonames.usgs.gov/domestic/download_data.htm). The fields in each line represent the following:
- Feature ID (unique identifier)
- Name
- Class (lake, forest, stream, and so on)
- State
- Latitude (degrees)
- Longitude (degrees)
- Height (in feet)
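Before you load the file into Hive, you can optionally confirm that every record has these seven pipe-delimited fields. The following is a minimal sanity check using standard shell tools; it assumes features.txt is in your current directory.

  # Print the field count per record; every line of output should report 7 fields.
  awk -F'|' '{ print NF }' features.txt | sort | uniq -c

  # Count the total number of records in the file.
  wc -l features.txt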
- At the command prompt, enter the following command:
  hive

  The command prompt changes to this:
  hive>
- Enter the following HiveQL statement to create a native Hive table:
  CREATE TABLE hive_features
    (feature_id    BIGINT,
     feature_name  STRING,
     feature_class STRING,
     state_alpha   STRING,
     prim_lat_dec  DOUBLE,
     prim_long_dec DOUBLE,
     elev_in_ft    BIGINT)
  ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '|'
  LINES TERMINATED BY '\n';
- Enter the following HiveQL statement to load the table with data:
  LOAD DATA LOCAL INPATH './features.txt' OVERWRITE INTO TABLE hive_features;
- You now have a native Hive table populated with data from the features.txt file. To verify, enter the following HiveQL statement:
  SELECT state_alpha, COUNT(*) FROM hive_features GROUP BY state_alpha;
The output should show a list of states and the number of geographic features in each.
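If you want to inspect the table further, you can run additional statements at the hive> prompt, or exit Hive with quit; and pass statements to the Hive CLI with its -e option, as in the following sketch. The column names match the CREATE TABLE statement above; the queries themselves are only examples.

  # Show the table's columns and their types.
  hive -e 'DESCRIBE hive_features;'

  # Look at a few sample rows.
  hive -e 'SELECT * FROM hive_features LIMIT 5;'

  # Order the per-state counts from largest to smallest.
  hive -e 'SELECT state_alpha, COUNT(*) AS feature_count FROM hive_features GROUP BY state_alpha ORDER BY feature_count DESC;'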
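Because hive_features is a native (managed) table, the LOAD DATA statement copied features.txt from the local file system into the table's directory under the Hive warehouse in HDFS. The exact location depends on your cluster's hive.metastore.warehouse.dir setting; the path in this sketch assumes the common default of /user/hive/warehouse.

  # List the table's storage directory in HDFS (path is an assumption; adjust for your cluster).
  hadoop fs -ls /user/hive/warehouse/hive_features

  # Print the first few lines of the stored data to confirm it arrived intact.
  hadoop fs -cat /user/hive/warehouse/hive_features/* | head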