您當(dāng)前位置：首頁(yè) > 服務(wù)器 > Alex 的 Hadoop 菜鳥(niǎo)教程: 第21課不只是在HBase中用SQL：Phoenix

Alex 的 Hadoop 菜鳥(niǎo)教程: 第21課不只是在HBase中用SQL：Phoenix

來(lái)源：程序員人生發(fā)布時(shí)間：2015-04-08 08:33:05 閱讀次數(shù)：3091次

聲明

本文基于 Centos6.x + CDH5.x

甚么是Phoenix

Phoenix的團(tuán)隊(duì)用了1句話概括Phoenix："We put the SQL back in NoSQL" 意思是：我們把SQL又放回NoSQL去了！這邊說(shuō)的NoSQL專(zhuān)指HBase，意思是可以用SQL語(yǔ)句來(lái)查詢(xún)Hbase，你可能會(huì)說(shuō)：“Hive和Impala也能夠啊！”。但是Hive和Impala還可以查詢(xún)文本文件，Phoenix的特點(diǎn)就是，它只能查Hbase，別的類(lèi)型都不支持！但是也由于這類(lèi)專(zhuān)1的態(tài)度，讓Phoenix在Hbase上查詢(xún)的性能超過(guò)了Hive和Impala！

安裝Phoenix

之前的組件都是通過(guò)CDH來(lái)安裝的，但是這回就跟Cloudera完全沒(méi)關(guān)系了。從 Apache Phoenix Download 下載Phoenix包，不過(guò)這個(gè)鏡像有點(diǎn)慢，我把包上傳到CSDN了，下載地址見(jiàn)下面的版本對(duì)應(yīng)

Phoenix跟Hbase的版本對(duì)應(yīng)

Phoenix 2.x - HBase 0.94.x
Phoenix 3.x - HBase 0.94.x 下載地址
Phoenix 4.x - HBase 0.98.1+ 下載地址官網(wǎng)下載地址

這里我用4.2.2，下載好后，解壓開(kāi)，把 phoenix⑷.2.2-server.jar 拷貝到所有RegionServer的lib目錄下 /usr/lib/hbase/lib

cp phoenix⑷.2.2-server.jar /usr/lib/hbase/lib

然后重啟所有regionserver

service hbase-regionserver restart

使用Phoenix

把 phoenix⑷.2.2-bin.tar.gz 解壓出來(lái)的 phoenix⑷.2.2-bin 文件夾也上傳到host1上，然后到bin目錄下履行(其實(shí)在本機(jī)也能夠，只是我本機(jī)沒(méi)有Python環(huán)境而Centos天生有Python)

如果是windows下下載的，得先在centos上給bin文件夾里面的.py文件賦上履行權(quán)限

[root@host1 ~]# cd phoenix⑷.2.2-bin/ [root@host1 phoenix⑷.2.2-bin]# cd bin [root@host1 bin]# chmod +x *.py

phoenix可以用4種方式調(diào)用

批處理方式
命令行方式
GUI方式
JDBC調(diào)用

批處理方式

我們建立sql 名叫 us_population.sql 內(nèi)容是

CREATE TABLE IF NOT EXISTS us_population ( state CHAR(2) NOT NULL, city VARCHAR NOT NULL, population BIGINT CONSTRAINT my_pk PRIMARY KEY (state, city));

建立1個(gè)文件 us_population.csv

NY,New York,8143197 CA,Los Angeles,3844829 IL,Chicago,2842518 TX,Houston,2016582 PA,Philadelphia,1463281 AZ,Phoenix,1461575 TX,San Antonio,1256509 CA,San Diego,1255540 TX,Dallas,1213825 CA,San Jose,912332

再創(chuàng)建1個(gè)文件 us_population_queries.sql

SELECT state as "State",count(city) as "City Count",sum(population) as "Population Sum" FROM us_population GROUP BY state ORDER BY sum(population) DESC;

然后1起履行

phoenix⑷.2.2-bin/bin/psql.py host1,host2:2181 us_population.sql us_population.csv us_population_queries.sql

這邊記得把 host1 和 host2 換成你的zookeeper地址

這條命令你同時(shí)做了創(chuàng)建1個(gè)表，插入數(shù)據(jù)，查詢(xún)結(jié)果 3件事情

[root@host1 ~]# phoenix⑷.2.2-bin/bin/psql.py host1,host2:2181 us_population.sql us_population.csv us_population_queries.sql 15/03/04 17:14:23 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 15/03/04 17:14:24 WARN impl.MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-phoenix.properties,hadoop-metrics2.properties no rows upserted Time: 0.726 sec(s) csv columns from database. CSV Upsert complete. 10 rows upserted Time: 0.103 sec(s) St City Count Population Sum -- ---------------------------------------- ---------------------------------------- NY 1 8143197 CA 3 6012701 TX 3 4486916 IL 1 2842518 PA 1 1463281 AZ 1 1461575 Time: 0.048 sec(s)

用hbase shell 看下會(huì)發(fā)現(xiàn)多出來(lái)1個(gè) US_POPULATION 表，用scan 命令查看1下這個(gè)表的數(shù)據(jù)

hbase(main):002:0> scan 'US_POPULATION' ROW COLUMN+CELL AZPhoenix column=0:POPULATION, timestamp=1425460467206, value=x80x00x00x00x00x16MG AZPhoenix column=0:_0, timestamp=1425460467206, value= CALos Angeles column=0:POPULATION, timestamp=1425460467206, value=x80x00x00x00x00:xAAxDD CALos Angeles column=0:_0, timestamp=1425460467206, value= CASan Diego column=0:POPULATION, timestamp=1425460467206, value=x80x00x00x00x00x13(t CASan Diego column=0:_0, timestamp=1425460467206, value= CASan Jose column=0:POPULATION, timestamp=1425460467206, value=x80x00x00x00x00x0DxEBxCC CASan Jose column=0:_0, timestamp=1425460467206, value= ILChicago column=0:POPULATION, timestamp=1425460467206, value=x80x00x00x00x00+_x96 ILChicago column=0:_0, timestamp=1425460467206, value= NYNew York column=0:POPULATION, timestamp=1425460467206, value=x80x00x00x00x00|A] NYNew York column=0:_0, timestamp=1425460467206, value= PAPhiladelphia column=0:POPULATION, timestamp=1425460467206, value=x80x00x00x00x00x16SxF1 PAPhiladelphia column=0:_0, timestamp=1425460467206, value= TXDallas column=0:POPULATION, timestamp=1425460467206, value=x80x00x00x00x00x12x85x81 TXDallas column=0:_0, timestamp=1425460467206, value= TXHouston column=0:POPULATION, timestamp=1425460467206, value=x80x00x00x00x00x1ExC5F TXHouston column=0:_0, timestamp=1425460467206, value= TXSan Antonio column=0:POPULATION, timestamp=1425460467206, value=x80x00x00x00x00x13,= TXSan Antonio column=0:_0, timestamp=1425460467206, value= 10 row(s) in 0.2220 seconds

會(huì)發(fā)現(xiàn)

之前定義的PRIMARY KEY 為 state, city ，因而Phoenix就把你輸入的state 和 city的值拼起來(lái)成為rowkey
其他的字段還是依照列名去保存，默許的列簇為 0
還有1個(gè)0:_0 這個(gè)列是沒(méi)有值的，這個(gè)是Phoenix處于性能方面斟酌增加的1個(gè)列，不用管這個(gè)列

命令行方式

然后履行sqlline.py

$ ./sqlline.py localhost

可以進(jìn)入命令行模式

0: jdbc:phoenix:localhost>

退出命令行的方式是履行 !quit

0: jdbc:phoenix:localhost>!quit

命令開(kāi)頭需要1個(gè)感嘆號(hào)，使用help可以打印出所有命令

0: jdbc:phoenix:localhost> help !all Execute the specified SQL against all the current connections !autocommit Set autocommit mode on or off !batch Start or execute a batch of statements !brief Set verbose mode off !call Execute a callable statement !close Close the current connection to the database !closeall Close all current open connections !columns List all the columns for the specified table !commit Commit the current transaction (if autocommit is off) !connect Open a new connection to the database. !dbinfo Give metadata information about the database !describe Describe a table !dropall Drop all tables in the current database !exportedkeys List all the exported keys for the specified table !go Select the current connection !help Print a summary of command usage !history Display the command history !importedkeys List all the imported keys for the specified table !indexes List all the indexes for the specified table !isolation Set the transaction isolation for this connection !list List the current connections !manual Display the SQLLine manual !metadata Obtain metadata information !nativesql Show the native SQL for the specified statement !outputformat Set the output format for displaying results (table,vertical,csv,tsv,xmlattrs,xmlelements) !primarykeys List all the primary keys for the specified table !procedures List all the procedures !properties Connect to the database specified in the properties file(s) !quit Exits the program !reconnect Reconnect to the database !record Record all output to the specified file !rehash Fetch table and column names for command completion !rollback Roll back the current transaction (if autocommit is off) !run Run a script from the specified file !save Save the current variabes and aliases !scan Scan for installed JDBC drivers !script Start saving a script to a file !set Set a sqlline variable !sql Execute a SQL command !tables List all the tables in the database !typeinfo Display the type map for the current connection !verbose Set verbose mode on

建立employee的映照表

數(shù)據(jù)準(zhǔn)備

然后我們來(lái)建立1個(gè)映照表，映照我之前建立過(guò)的1個(gè)hbase表 employee

hbase(main):003:0> describe 'employee' DESCRIPTION ENABLED 'employee', {NAME => 'company', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => true '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', I N_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'family', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLIC ATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => ' false', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'} 1 row(s) in 0.1120 seconds

可以看出employee有連個(gè)列簇 company 和 family

hbase(main):016:0> scan 'employee' ROW COLUMN+CELL row1 column=company:name, timestamp=1425537923391, value=ted row1 column=company:position, timestamp=1425537950471, value=worker row1 column=family:tel, timestamp=1425537956413, value=13600912345 row2 column=family:tel, timestamp=1425537994087, value=18942245698 row2 column=family:name, timestamp=1425537975610, value=michael row2 column=family:position, timestamp=1425537985594, value=manager 2 row(s) in 0.0340 seconds

有兩條數(shù)據(jù)。如果沒(méi)有這些數(shù)據(jù)的同學(xué)可以用以下命令創(chuàng)建

create 'employee','company','family' put 'employee','row1','company:name','ted' put 'employee','row1','company:position','worker' put 'employee','row1','family:tel','13600912345' put 'employee','row2','company:name','michael' put 'employee','row2','company:position','manager' put 'employee','row2','family:tel','1894225698' scan 'employee'

關(guān)于映照表

在建立映照表之前要說(shuō)明的是，Phoenix是大小寫(xiě)敏感的，并且所有命令都是大寫(xiě)，如果你建的表名沒(méi)有用雙引號(hào)括起來(lái)，那末不管你輸入的是大寫(xiě)還是小寫(xiě)，建立出來(lái)的表名都是大寫(xiě)的，如果你需要建立出同時(shí)包括大寫(xiě)和小寫(xiě)的表名和字段名，請(qǐng)把表名或字段名用雙引號(hào)括起來(lái)

你可以建立讀寫(xiě)的表或只讀的表，他們的區(qū)分以下

讀寫(xiě)表：如果你定義的列簇不存在，會(huì)被自動(dòng)建立出來(lái)，并且賦以空值
只讀表：你定義的列簇必須事前存在

建立映照

0: jdbc:phoenix:localhost> CREATE TABLE IF NOT EXISTS "employee" ("no" CHAR(4) NOT NULL PRIMARY KEY, "company"."name" VARCHAR(30),"company"."position" VARCHAR(20), "family"."tel" CHAR(11), "family"."age" INTEGER); 2 rows affected (1.745 seconds)

這行語(yǔ)句有幾個(gè)注意點(diǎn)

IF NOT EXISTS可以保證如果已有建立過(guò)這個(gè)表，配置不會(huì)被覆蓋
作為rowkey的字段用 PRIMARY KEY標(biāo)定
列簇用 columnFamily.columnName 來(lái)表示
family.age 是新增的字段，我之前建立測(cè)試數(shù)據(jù)的時(shí)候沒(méi)有建立這個(gè)字段的緣由是在hbase shell下沒(méi)法直接寫(xiě)入數(shù)字型，等等我用UPSERT 命令插入數(shù)據(jù)的時(shí)候你就能夠看到真實(shí)的數(shù)字型在hbase 下是如何顯示的

建立好后，查詢(xún)1下數(shù)據(jù)

0: jdbc:phoenix:localhost> SELECT * FROM "employee"; +------+--------------------------------+----------------------+-------------+------------------------------------------+ | no | name | position | tel | age | +------+--------------------------------+----------------------+-------------+------------------------------------------+ | row1 | ted | worker | 13600912345 | null | | row2 | michael | manager | 1894225698 | null | +------+--------------------------------+----------------------+-------------+------------------------------------------+

插入/更改數(shù)據(jù)

插入或更改數(shù)據(jù)在Phoenix里面是1個(gè)命令叫 UPSERT 意思是 update + insert

我們插入1條數(shù)據(jù)試試

UPSERT INTO "employee" VALUES ('row3','billy','worker','16974681345',33);

查詢(xún)1下數(shù)據(jù)

0: jdbc:phoenix:localhost> SELECT * FROM "employee"; +------+--------------------------------+----------------------+-------------+------------------------------------------+ | no | name | position | tel | age | +------+--------------------------------+----------------------+-------------+------------------------------------------+ | row1 | ted | worker | 13600912345 | null | | row2 | michael | manager | 1894225698 | null | | row3 | billy | worker | 16974681345 | 33 | +------+--------------------------------+----------------------+-------------+------------------------------------------+ 3 rows selected (0.195 seconds)

我們?nèi)base里面看1下數(shù)據(jù)

hbase(main):054:0> scan 'employee' ROW COLUMN+CELL row1 column=company:_0, timestamp=1425543735420, value= row1 column=company:name, timestamp=1425543735274, value=ted row1 column=company:position, timestamp=1425543735323, value=worker row1 column=family:tel, timestamp=1425543735420, value=13600912345 row2 column=company:_0, timestamp=1425543735767, value= row2 column=company:name, timestamp=1425543735608, value=michael row2 column=company:position, timestamp=1425543735720, value=manager row2 column=family:tel, timestamp=1425543735767, value=1894225698 row3 column=company:_0, timestamp=1425543857594, value= row3 column=company:name, timestamp=1425543857594, value=billy row3 column=company:position, timestamp=1425543857594, value=worker row3 column=family:age, timestamp=1425543857594, value=x80x00x00! row3 column=family:tel, timestamp=1425543857594, value=16974681345 3 row(s) in 0.0650 seconds

最后那個(gè) x80x00x00! 就是數(shù)字型在hbase中序列化成了字節(jié)的存儲(chǔ)情勢(shì)

用GUI方式

GUI方式的安裝方法在 http://phoenix.apache.org/installation.html 這邊不講了，由于我自己也沒(méi)弄起來(lái)，而且那個(gè)界面實(shí)在太丑了，看了不忍心使用。

JDBC調(diào)用

打開(kāi)Eclipse建立1個(gè)簡(jiǎn)單的Maven項(xiàng)目 play-phoenix

pom.xml的內(nèi)容是

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven⑷.0.0.xsd"> <modelVersion>4.0.0</modelVersion> <groupId>org.crazycake</groupId> <artifactId>play-phoenix</artifactId> <version>0.0.1-SNAPSHOT</version> <packaging>jar</packaging> <name>play-phoenix</name> <url>http://maven.apache.org</url> <properties> <project.build.sourceEncoding>UTF⑻</project.build.sourceEncoding> </properties> <repositories> <repository> <id>apache release</id> <url>https://repository.apache.org/content/repositories/releases/</url> </repository> </repositories> <build> <plugins> <plugin> <artifactId>maven-compiler-plugin</artifactId> <version>2.0.2</version> <configuration> <source>1.7</source> <target>1.7</target> <encoding>UTF⑻</encoding> <optimise>true</optimise> <compilerArgument>-nowarn</compilerArgument> </configuration> </plugin> <plugin> <groupId>org.apache.maven.plugins</groupId> <artifactId>maven-shade-plugin</artifactId> <version>2.3</version> <configuration> <transformers> <transformer implementation="org.apache.maven.plugins.shade.resource.ApacheLicenseResourceTransformer"> </transformer> </transformers> </configuration> <executions> <execution> <phase>package</phase> <goals> <goal>shade</goal> </goals> </execution> </executions> </plugin> </plugins> </build> <dependencies> <dependency> <groupId>junit</groupId> <artifactId>junit</artifactId> <version>3.8.1</version> <scope>test</scope> </dependency> <dependency> <groupId>org.apache.phoenix</groupId> <artifactId>phoenix-core</artifactId> <version>4.2.2</version> </dependency> </dependencies> </project>

phoenix 4.2.2 使用jdk1.7編譯的，如果你只有1.6就用 4.1.0

<dependency> <groupId>org.apache.phoenix</groupId> <artifactId>phoenix-client</artifactId> <version>4.1.0</version> <classifier>minimal</classifier> </dependency>

我們建立1個(gè)類(lèi) PhoenixManager

package org.crazycake.play_phoenix; import java.sql.Connection; import java.sql.DriverManager; import java.sql.ResultSet; import java.sql.SQLException; import java.sql.Statement; public class PhoenixManager { public static void main(String[] args) throws SQLException { Connection conn = null; Statement stat = null; ResultSet rs = null; try { Class.forName("org.apache.phoenix.jdbc.PhoenixDriver"); conn = DriverManager.getConnection("jdbc:phoenix:host1,host2:2181"); stat = conn.createStatement(); rs = stat.executeQuery("select * from "employee""); while(rs.next()){ System.out.println("no: " + rs.getString("no")); System.out.println("name: " + rs.getString("name")); System.out.println("position: " + rs.getString("position")); System.out.println("age: " + rs.getInt("age")); } } catch (Throwable e) { e.printStackTrace(); } finally{ if(rs != null){ rs.close(); } if(stat != null){ stat.close(); } if(conn != null){ conn.close(); } } } }

運(yùn)行下，結(jié)果為

no: row1 name: ted position: worker age: 0 no: row2 name: michael position: manager age: 0 no: row3 name: billy position: worker age: 33

弄定！