Uso di HCatalog - Amazon EMR

Uso di HCatalog

Puoi utilizzare HCatalog all'interno di diverse applicazioni che utilizzano il metastore Hive. Negli esempi di questa sezione viene mostrato come creare una tabella e utilizzarla nel contesto di Pig e Spark SQL.

Disabilitazione della scrittura diretta durante l'utilizzo di HCatalog HStorer

Ogni volta che un'applicazione utilizza HCatStorer per scrivere su una tabella HCatalog archiviata in Amazon S3, disabilita la caratteristica di scrittura diretta di Amazon EMR. Ad esempio, disabilita la scrittura diretta quando utilizzi il comando STORE Pig o durante l'esecuzione di processi Sqoop che scrivono le tabelle HCatalog su Amazon S3. Puoi disabilitare la caratteristica di scrittura diretta impostando le configurazioni mapred.output.direct.NativeS3FileSystem e mapred.output.direct.EmrFileSystem su false. Nell'esempio seguente viene illustrato come impostare queste configurazioni tramite Java.

Configuration conf = new Configuration(); conf.set("mapred.output.direct.NativeS3FileSystem", "false"); conf.set("mapred.output.direct.EmrFileSystem", "false");

Creazione di una tabella tramite la CLI HCat e uso dei dati in Pig

Crea lo script seguente, impressions.q, sul cluster:

CREATE EXTERNAL TABLE impressions ( requestBeginTime string, adId string, impressionId string, referrer string, userAgent string, userCookie string, ip string ) PARTITIONED BY (dt string) ROW FORMAT serde 'org.apache.hive.hcatalog.data.JsonSerDe' with serdeproperties ( 'paths'='requestBeginTime, adId, impressionId, referrer, userAgent, userCookie, ip' ) LOCATION 's3://[your region].elasticmapreduce/samples/hive-ads/tables/impressions/'; ALTER TABLE impressions ADD PARTITION (dt='2009-04-13-08-05');

Esegui lo script tramite la CLI HCat:

% hcat -f impressions.q Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties OK Time taken: 4.001 seconds OK Time taken: 0.519 seconds

Apri la shell Grunt e accedi ai dati in impressions:

% pig -useHCatalog -e "A = LOAD 'impressions' USING org.apache.hive.hcatalog.pig.HCatLoader(); B = LIMIT A 5; dump B;" <snip> (1239610346000,m9nwdo67Nx6q2kI25qt5On7peICfUM,omkxkaRpNhGPDucAiBErSh1cs0MThC,cartoonnetwork.com,Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; FunWebProducts; GTB6; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET,wcVWWTascoPbGt6bdqDbuWTPPHgOPs,69.191.224.234,2009-04-13-08-05) (1239611000000,NjriQjdODgWBKnkGJUP6GNTbDeK4An,AWtXPkfaWGOaNeL9OOsFU8Hcj6eLHt,cartoonnetwork.com,Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; GTB6; .NET CLR 1.1.4322),OaMU1F2gE4CtADVHAbKjjRRks5kIgg,57.34.133.110,2009-04-13-08-05) (1239610462000,Irpv3oiu0I5QNQiwSSTIshrLdo9cM1,i1LDq44LRSJF0hbmhB8Gk7k9gMWtBq,cartoonnetwork.com,Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322; InfoPath.1),QSb3wkLR4JAIut4Uq6FNFQIR1rCVwU,42.174.193.253,2009-04-13-08-05) (1239611007000,q2Awfnpe0JAvhInaIp0VGx9KTs0oPO,s3HvTflPB8JIE0IuM6hOEebWWpOtJV,cartoonnetwork.com,Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322; InfoPath.1),QSb3wkLR4JAIut4Uq6FNFQIR1rCVwU,42.174.193.253,2009-04-13-08-05) (1239610398000,c362vpAB0soPKGHRS43cj6TRwNeOGn,jeas5nXbQInGAgFB8jlkhnprN6cMw7,cartoonnetwork.com,Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; GTB6; .NET CLR 1.1.4322),k96n5PnUmwHKfiUI0TFP0TNMfADgh9,51.131.29.87,2009-04-13-08-05) 7120 [main] INFO org.apache.pig.Main - Pig script completed in 7 seconds and 199 milliseconds (7199 ms) 16/03/08 23:17:10 INFO pig.Main: Pig script completed in 7 seconds and 199 milliseconds (7199 ms)

Accesso alla tabella tramite Spark SQL

In questo esempio viene creato un DataFrame Spark dalla tabella creata nel primo esempio e vengono mostrate le prime 20 righe:

% spark-shell --jars /usr/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core-1.0.0-amzn-3.jar <snip> scala> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc); scala> val df = hiveContext.sql("SELECT * FROM impressions") scala> df.show() <snip> 16/03/09 17:18:46 INFO DAGScheduler: ResultStage 0 (show at <console>:32) finished in 10.702 s 16/03/09 17:18:46 INFO DAGScheduler: Job 0 finished: show at <console>:32, took 10.839905 s +----------------+--------------------+--------------------+------------------+--------------------+--------------------+--------------+----------------+ |requestbegintime| adid| impressionid| referrer| useragent| usercookie| ip| dt| +----------------+--------------------+--------------------+------------------+--------------------+--------------------+--------------+----------------+ | 1239610346000|m9nwdo67Nx6q2kI25...|omkxkaRpNhGPDucAi...|cartoonnetwork.com|Mozilla/4.0 (comp...|wcVWWTascoPbGt6bd...|69.191.224.234|2009-04-13-08-05| | 1239611000000|NjriQjdODgWBKnkGJ...|AWtXPkfaWGOaNeL9O...|cartoonnetwork.com|Mozilla/4.0 (comp...|OaMU1F2gE4CtADVHA...| 57.34.133.110|2009-04-13-08-05| | 1239610462000|Irpv3oiu0I5QNQiwS...|i1LDq44LRSJF0hbmh...|cartoonnetwork.com|Mozilla/4.0 (comp...|QSb3wkLR4JAIut4Uq...|42.174.193.253|2009-04-13-08-05| | 1239611007000|q2Awfnpe0JAvhInaI...|s3HvTflPB8JIE0IuM...|cartoonnetwork.com|Mozilla/4.0 (comp...|QSb3wkLR4JAIut4Uq...|42.174.193.253|2009-04-13-08-05| | 1239610398000|c362vpAB0soPKGHRS...|jeas5nXbQInGAgFB8...|cartoonnetwork.com|Mozilla/4.0 (comp...|k96n5PnUmwHKfiUI0...| 51.131.29.87|2009-04-13-08-05| | 1239610600000|cjBTpruoaiEtqLuMX...|XwlohBSs8Ipxs1bRa...|cartoonnetwork.com|Mozilla/4.0 (comp...|k96n5PnUmwHKfiUI0...| 51.131.29.87|2009-04-13-08-05| | 1239610804000|Ms3eJHNAEItpxvimd...|4SIj4pGmgVLl625BD...|cartoonnetwork.com|Mozilla/4.0 (comp...|k96n5PnUmwHKfiUI0...| 51.131.29.87|2009-04-13-08-05| | 1239610872000|h5bccHX6wJReDi1jL...|EFAWIiBdVfnxwAMWP...|cartoonnetwork.com|Mozilla/4.0 (comp...|k96n5PnUmwHKfiUI0...| 51.131.29.87|2009-04-13-08-05| | 1239610365000|874NBpGmxNFfxEPKM...|xSvE4XtGbdtXPF2Lb...|cartoonnetwork.com|Mozilla/5.0 (Maci...|eWDEVVUphlnRa273j...| 22.91.173.232|2009-04-13-08-05| | 1239610348000|X8gISpUTSqh1A5reS...|TrFblGT99AgE75vuj...| corriere.it|Mozilla/4.0 (comp...|tX1sMpnhJUhmAF7AS...| 55.35.44.79|2009-04-13-08-05| | 1239610743000|kbKreLWB6QVueFrDm...|kVnxx9Ie2i3OLTxFj...| corriere.it|Mozilla/4.0 (comp...|tX1sMpnhJUhmAF7AS...| 55.35.44.79|2009-04-13-08-05| | 1239610812000|9lxOSRpEi3bmEeTCu...|1B2sff99AEIwSuLVV...| corriere.it|Mozilla/4.0 (comp...|tX1sMpnhJUhmAF7AS...| 55.35.44.79|2009-04-13-08-05| | 1239610876000|lijjmCf2kuxfBTnjL...|AjvufgUtakUFcsIM9...| corriere.it|Mozilla/4.0 (comp...|tX1sMpnhJUhmAF7AS...| 55.35.44.79|2009-04-13-08-05| | 1239610941000|t8t8trgjNRPIlmxuD...|agu2u2TCdqWP08rAA...| corriere.it|Mozilla/4.0 (comp...|tX1sMpnhJUhmAF7AS...| 55.35.44.79|2009-04-13-08-05| | 1239610490000|OGRLPVNGxiGgrCmWL...|mJg2raBUpPrC8OlUm...| corriere.it|Mozilla/4.0 (comp...|r2k96t1CNjSU9fJKN...| 71.124.66.3|2009-04-13-08-05| | 1239610556000|OnJID12x0RXKPUgrD...|P7Pm2mPdW6wO8KA3R...| corriere.it|Mozilla/4.0 (comp...|r2k96t1CNjSU9fJKN...| 71.124.66.3|2009-04-13-08-05| | 1239610373000|WflsvKIgOqfIE5KwR...|TJHd1VBspNcua0XPn...| corriere.it|Mozilla/5.0 (Maci...|fj2L1ILTFGMfhdrt3...| 75.117.56.155|2009-04-13-08-05| | 1239610768000|4MJR0XxiVCU1ueXKV...|1OhGWmbvKf8ajoU8a...| corriere.it|Mozilla/5.0 (Maci...|fj2L1ILTFGMfhdrt3...| 75.117.56.155|2009-04-13-08-05| | 1239610832000|gWIrpDiN57i3sHatv...|RNL4C7xPi3tdar2Uc...| corriere.it|Mozilla/5.0 (Maci...|fj2L1ILTFGMfhdrt3...| 75.117.56.155|2009-04-13-08-05| | 1239610789000|pTne9k62kJ14QViXI...|RVxJVIQousjxUVI3r...| pixnet.net|Mozilla/5.0 (Maci...|1bGOKiBD2xmui9OkF...| 33.176.101.80|2009-04-13-08-05| +----------------+--------------------+--------------------+------------------+--------------------+--------------------+--------------+----------------+ only showing top 20 rows scala>