您當前位置：首頁 > 數據庫 > 數據庫應用 > Oracle正則表達式實戰

Oracle正則表達式實戰

來源：程序員人生發布時間：2015-01-30 08:43:47 閱讀次數：4422次

原文鏈接：http://oracle-base.com/articles/misc/regular-expressions-support-in-oracle.php

Introduction
Example 1 : REGEXP_SUBSTR
Example 2 : REGEXP_SUBSTR
Example 3 : REGEXP_SUBSTR
Example 4 : REGEXP_REPLACE
Example 5 : REGEXP_INSTR
Example 6 : REGEXP_LIKE and REGEXP_SUBSTR
Example 7 : REGEXP_COUNT
Example 8 : REGEXP_LIKE

相干文章：

PL/SQL Enhancements in Oracle Database 10g - Regular Expressions
PL/SQL New Features and Enhancements in Oracle Database 11g Release 1 - Enhancements to Regular Expression Built-in SQL Functions

介紹

Oracle 10g introduced support for regular expressions in SQL andPL/SQL with the following functions.

Oracle 10g開始支持在SQL和PLSQL中使用以下正則表達式：

REGEXP_INSTR - Similar to INSTR except it uses a regular expression rather than a literal as the search string. 類似INSTR函數
REGEXP_LIKE - Similar to LIKE except it uses a regular expression as the search string. REGEXP_LIKE is really an operator, not a function. 類似LIKE條件
REGEXP_REPLACE - Similar to REPLACE except it uses a regular expression as the search string. 類似REPLACE函數
REGEXP_SUBSTR - Returns the string matching the regular expression. Not really similar to SUBSTR. 返回匹配正則表達式的字符串，和SUBSTR有點類似

Oracle 11g introduced two new features related to regularexpressions.

11g開始引入2個新特性：

REGEXP_COUNT - Returns the number of occurrences of the regular expression in the string. 返回符合正則表達式的字符串出現的次數。
Sub-expression support was added to all regular expression functions by adding a parameter to each function to specify the sub-expression in the pattern match.
子表達式在所有正則表達式函數都支持，可通過增加1個參數實現。

Learning to write regular expressions takes a little time. If youdon't do it regularly, it can be a voyage of discovery each time. The generalrules for writing regular expressions are available here.You can read the Oracle Regular Expression Support here.

Rather than trying to repeat the formal definitions, I'll presenta number of problems I've been asked to look at over the years, where asolution using a regular expression has been appropriate.

此處不重復正則表達式的定義，代之以1組問題導向的正則表達式利用實例：

Example 1 : REGEXP_SUBSTR

The data in a column is free text, but may include a 4 digit year.

數據在字段中以自由文本寄存，但是可能包括4個精度的年份數據。

DROP TABLE t1;

CREATE TABLE t1 (

data VARCHAR2(50)

);

INSERT INTO t1 VALUES ('FALL 2014');

INSERT INTO t1 VALUES ('2014 CODE-B');

INSERT INTO t1 VALUES ('CODE-A 2014 CODE-D');

INSERT INTO t1 VALUES ('ADSHLHSALK');

INSERT INTO t1 VALUES ('FALL 2004');

INSERT INTO t1 VALUES ('FALL 2015');

COMMIT;

SELECT * FROM t1;

DATA

---------------------------------------------------------------------

FALL 2014

2014 CODE-B

CODE-A 2014 CODE-D

ADSHLHSALK

FALL 2004

5 rows selected.

SQL>

If we needed to return rows containing a specific year we coulduse the LIKE operator (WHERE data LIKE '%2014%'),but how do we return rows using a comparison (<, <=, >, >=,<>)?

One way to approach this is to pull out the 4 figure year andconvert it to a number, so we don't accidentally do an ASCII comparison. That'spretty easy using regular expressions.

如果我們需要返回包括指定年份的數據我們可使用LIKE操作符（…），但是如何通過不等操作符返回行？1條路是抽出4個數字的年份并轉換為數字。通過正則表達式可以很容易實現。

We can identify digits using the "d" or"[0⑼]" operators. We want a group of four of them, which isrepresented by the "{4}" operator. So our regular expression will be"d{4}" or "[0⑼]{4}". The REGEXP_SUBSTR functionreturns the string matching the regular expression, so that can be used toextract the text of interest. We then just need to convert it to a number andperform our comparison.

我們使用d或[0⑼]來辨認數字。我們需要4個1組，可使用{4}表示。至此，我們的正則表達式為：d{4}或[0⑼]{4}。REGEXP_SUBSTR函數返回匹配指定正式表達式的字符串，所以可以用來提取我們感興趣的文本。然后我們只需將其轉換為數字并履行比較便可。

SELECT *

FROM t1

WHERE TO_NUMBER(REGEXP_SUBSTR(data, 'd{4}')) >= 2014;

DATA

---------------------------------------------------------------------

FALL 2014

2014 CODE-B

CODE-A 2014 CODE-D

FALL 2015

4 rows selected.

SQL>

Example 2 : REGEXP_SUBSTR

Given a source string, how do we split it up into separatecolumns, based on changes of case and alpha-to-numeric, such that this.

給定1個元字符串，要求依照指定規則（基于字母大小寫和字母到數字的變化）分割為多個列：

ArtADB1234567e9876540

Becomes this. 分割后：

Art ADB 1234567 e 9876540

The source data is set up like this. 元數據以下：

DROP TABLE t1;

CREATE TABLE t1 (

data VARCHAR2(50)

);

INSERT INTO t1 VALUES ('ArtADB1234567e9876540');

COMMIT;

The first part of the string is an initcap word, so it starts witha capital letter between "A" and "Z". We identify a singlecharacter using the "[]" operator, and ranges are represented using"-", like "A-Z", "a-z" or "0⑼". So ifwe are looking for a single character that is a capital letter, we need to lookfor "[A-Z]". That needs to be followed by lower case letters, whichwe now know is "[a-z]", but we need 1 or more of them, which issignified by the "+" operator. So to find an initcap word, we need tosearch for "[A-Z][a-z]+". Since we want the first occurrence of this,we can use the following.

字符串第1部份為大寫字母，可能為A-Z。我們使用[]操作符辨認單個字符，至于范圍則用“-”，例如“A-Z”，"a-z"或"0⑼"。所以如果我們需要找大寫的首字母則用“[A-Z]”。其后緊隨著的是若干小寫字母，可以用+表示若干（1個或多個）。組合起來的正則表達式即為：[A-Z][a-z]+，這樣拆分出的第1列方法有了。

REGEXP_SUBSTR(data, '[A-Z][a-z]+', 1, 1)

The second part of the string is a group of 1 or more uppercaseletters. We know we need to use the "[A-Z]+" pattern, but we need tomake sure we don't get the first capital letter, so we look for the secondoccurrence.

第2部份是1組包括1個或多個大寫字母。我們知道需要用模式：[A-Z]+,但是為了不和第1部份沖突，我們指明匹配其第2次出現的文本。

REGEXP_SUBSTR(data, '[A-Z]+', 1, 2)

The next part is the first occurrence of a group of numbers.

下1部份是1組純數字。

REGEXP_SUBSTR(data, '[0⑼]+', 1, 1)

The next part is a group of lower case letters. We don't to pickup those from the initcap word, so we must look for the second occurrence oflower case letters.

下1部份是1組小寫字母，一樣斟酌了不和第1部份沖突：

REGEXP_SUBSTR(data, '[a-z]+', 1, 2)

Finally, we have a group of numbers, which is the secondoccurrence of this pattern.

最后，是1組數字：

REGEXP_SUBSTR(data, '[0⑼]+', 1, 2)

Putting that all together, we have the following query, whichsplits the data into separate columns.

將以上每部份正則表達式的輸出分別作為獨立字段：

COLUMN col1 FORMAT A15

COLUMN col2 FORMAT A15

COLUMN col3 FORMAT A15

COLUMN col4 FORMAT A15

COLUMN col5 FORMAT A15

SELECT REGEXP_SUBSTR(data, '[A-Z][a-z]+', 1, 1) col1,

REGEXP_SUBSTR(data, '[A-Z]+', 1, 2) col2,

REGEXP_SUBSTR(data, '[0⑼]+', 1, 1) col3,

REGEXP_SUBSTR(data, '[a-z]+', 1, 2) col4,

REGEXP_SUBSTR(data, '[0⑼]+', 1, 2) col5

FROM t1;

COL1 COL2 COL3 COL4 COL5

--------- ---------- ---------- ----------- ------------

Art ADB 1234567 e 9876540

1 row selected.

SQL>

Example 3 : REGEXP_SUBSTR

We need to pull out a group of characters from a "/"delimited string, optionally enclosed by double quotes. The data looks likethis.

我們需要從1個字符串（含有分隔字符/和雙引號” ”）中提取1組字符，原始數據以下：

DROP TABLE t1;

CREATE TABLE t1 (

data VARCHAR2(50)

);

INSERT INTO t1 VALUES ('978/955086/GZ120804/10-FEB⑴2');

INSERT INTO t1 VALUES ('97/95508/BANANA/10-FEB⑴2');

INSERT INTO t1 VALUES ('97/95508/"APPLE"/10-FEB⑴2');

COMMIT;

We are looking for 1 or more characters that are not"/", which we do using "[^/]+". The "^" in thebrackets represents NOT and "+" means 1 or more. We also want toremove optional double quotes, so we add that as a character we don't want,giving us "[^/"]+". So if we want the data from the thirdcolumn, we need the third occurrence of this pattern.

我們要找1個或多個非“/“字符，可使用”[^/]+“。^在方括號中表示NOT。我們還需要移除可選的雙引號所以需要使用[^/”]+。所以如果我們需要獲得第3次出現的字符串：

SELECT REGEXP_SUBSTR(data, '[^/"]+', 1, 3) AS element3

FROM t1;

ELEMENT3

---------------------------------------------------------------------

GZ120804

BANANA

APPLE

3 rows selected.

SQL>

Example 4 : REGEXP_REPLACE

We need to take an initcap string and separate the words. The datalooks like this.

我們需要提取首字母大寫的字符串并將其分離。原始數據以下：

DROP TABLE t1;

CREATE TABLE t1 (

data VARCHAR2(50)

);

INSERT INTO t1 VALUES ('SocialSecurityNumber');

INSERT INTO t1 VALUES ('HouseNumber');

COMMIT;

We need to find each uppercase character "[A-Z]". Wewant to keep that character we find, so we will make that pattern asub-expression "([A-Z])", allowing us to refer to it later. For eachmatch, we want to replace it with a space, plus the matching character. Thespace is pretty obvious, but we need to use "1" to signify the textmatching the first sub expression. So we will replace the matching pattern witha space and itself, " 1". We don't want to replace the first letterof the string, so we will start at the second occurrence.

我們需要使用[A-Z]找到每一個大寫字符。我們需要保存找到的字符，所以我們使用1個子表達式([A-Z])，以便后續對其援用。對每個匹配，我們想使用1個空格替換，加上匹配到的字符。空格是相當明顯的，但我們需要使用”1”表示第1個子表達式匹配的文本。所以我們替換匹配模式使用1個空格和其本身，即”1”。我們不想替換字符串的第1個字母，所以我們從第2個字符開始：

SELECT REGEXP_REPLACE(data, '([A-Z])', ' 1', 2) AS hyphen_text

FROM t1;

HYPHEN_TEXT

--------------------------------------------------------------------

Social Security Number

House Number

2 rows selected.

SQL>

Example 5 : REGEXP_INSTR

We have a specific pattern of digits (9 99:99:99) and we want toknow the location of the pattern in our data.

我們有1個指定數字模式（999:99:99）并且我們想知道模式在我們數據中所處位置。

DROP TABLE t1;

CREATE TABLE t1 (

data VARCHAR2(50)

);

INSERT INTO t1 VALUES ('1 01:01:01');

INSERT INTO t1 VALUES ('.2 02:02:02');

INSERT INTO t1 VALUES ('..3 03:03:03');

COMMIT;

We know we are looking for groups of numbers, so we can use"[0⑼]" or "d". We know the amount of digits in eachgroup, which we can indicate using the "{n}" operator, so we simplydescribe the pattern we are looking for.

我們知道我們正在找1組數字，所以使用"[0⑼]"或"d"。我們知道每組數字的數量，所以可使用{n}操作符，所以我們簡單描寫1下模式：

SELECT REGEXP_INSTR(data, '[0⑼] [0⑼]{2}:[0⑼]{2}:[0⑼]{2}') AS string_loc_1,

REGEXP_INSTR(data, 'd d{2}:d{2}:d{2}') AS string_loc_2

FROM t1;

STRING_LOC_1 STRING_LOC_2

------------ ------------

1 1

2 2

3 3

3 rows selected.

SQL>

Example 6 : REGEXP_LIKE andREGEXP_SUBSTR

We have strings containing parentheses. We want to return the textwithin the parentheses for those rows that contain parentheses.

我們有包括在括號內的字符串。我們想只想返回括號內的字符串。

DROP TABLE t1;

CREATE TABLE t1 (

data VARCHAR2(50)

);

INSERT INTO t1 VALUES ('This is some text (with parentheses) in it.');

INSERT INTO t1 VALUES ('This text has no parentheses.');

INSERT INTO t1 VALUES ('This text has (parentheses too).');

COMMIT;

The basic pattern for text between parentheses is"(.*)". The "" characters are escapes for theparentheses, making them literals. Without the escapes they would be assumed todefine a sub-expression. That pattern alone is fine to identify the rows of interestusing a REGEXP_LIKE operator,but it is not appropriate in a REGEXP_SUBSTR, as itwould return the parentheses also. To omit the parentheses we need to include asub-expression inside the literal parentheses "((.*))". We can then REGEXP_SUBSTR using thefirst sub expression.

匹配括號內文本的模式基本寫法為：“(.*)”。是轉義字符，使跟在其后的字符變成字面值。但是這個模式用在REGEXP_SUBSTR時會連括號1起返回。為了疏忽括號我們需要在字面括號內部包括子表達式："((.*))".

COLUMN with_parentheses FORMAT A20

COLUMN without_parentheses FORMAT A20

SELECT data,

REGEXP_SUBSTR(data, '(.*)') AS with_parentheses,

       REGEXP_SUBSTR(data, '((.*))', 1, 1, 'i', 1) AS without_parentheses

FROM t1

WHERE REGEXP_LIKE(data, '(.*)');

DATA WITH_PARENTHESES WITHOUT_PARENTHESES

-------------------------------------------------- -------------------- --------------------

This is some text (with parentheses) in it. (with parentheses) with parentheses

This text has (parentheses too). (parentheses too) parentheses too

2 rows selected.

SQL>

注意：REGEXP_SUBSTR(data,'((.*))', 1, 1, 'i', 1) 中最后的i代碼不辨別大小寫，最后1個“1”代表返回哪一個子表達式匹配的文本。（范圍0⑼）

Example 7 : REGEXP_COUNT

We need to know how many times a block of 4 digits appears intext. The data looks like this.

我們需要知道4個數字的塊在字符串中出現的次數。看原始數據：

DROP TABLE t1;

CREATE TABLE t1 (

data VARCHAR2(50)

);

INSERT INTO t1 VALUES ('1234');

INSERT INTO t1 VALUES ('1234 1234');

INSERT INTO t1 VALUES ('1234 1234 1234');

COMMIT;

We can identify digits using "d" or "[0⑼]"and the "{4}" operator signifies 4 of them, so using"d{4}" or "[0⑼]{4}" with the REGEXP_COUNT functionseems to be a valid option.

我們可以用表達式：d 或[0⑼]和{4}操作符辨認4個數字的塊。

SELECT REGEXP_COUNT(data, '[0⑼]{4}') AS pattern_count_1,

REGEXP_COUNT(data, 'd{4}') AS pattern_count_2

FROM t1;

PATTERN_COUNT_1 PATTERN_COUNT_2

--------------- ---------------

1 1

2 2

3 3

3 rows selected.

SQL>

Example 8 : REGEXP_LIKE

We need to identify invalid email addresses. The data looks likethis.

我們需要校驗郵箱地址，原始數據以下：

DROP TABLE t1;

CREATE TABLE t1 (

data VARCHAR2(50)

);

INSERT INTO t1 VALUES ('me@example.com');

INSERT INTO t1 VALUES ('me@example');

INSERT INTO t1 VALUES ('@example.com');

INSERT INTO t1 VALUES ('me.me@example.com');

INSERT INTO t1 VALUES ('me.me@ example.com');

INSERT INTO t1 VALUES ('me.me@example-example.com');

COMMIT;

The following test gives us email addresses that approximate toinvalid email address formats.

以下測試給我們近似不合法的郵箱。

SELECT data

FROM t1

WHERE NOT REGEXP_LIKE(data, '[A-Z0⑼._%+-]+@[A-Z0⑼.-]+.[A-Z]{2,4}', 'i');

DATA

--------------------------------------------------

me@example

@example.com

me.me@ example.com

3 rows selected.

SQL>

-----------------------------

Dylan Presents.

生活不易，碼農辛苦
如果您覺得本網站對您的學習有所幫助,可以手機掃描二維碼進行捐贈
程序員人生

------分隔線----------------------------

上一篇 計算機網絡綜合實驗

下一篇 Atitit.數據庫事務隔離級別 attilax 總結

分享到:

------分隔線----------------------------

為碼而活

積分：4237

15粉絲

7關注

欄目熱點

日本搞逼视频_黄色一级片免费在线观看_色99久久_性明星video另类hd_欧美77_综合在线视频

Oracle正則表達式實戰

介紹

Example 1 : REGEXP_SUBSTR

Example 2 : REGEXP_SUBSTR

Example 3 : REGEXP_SUBSTR

Example 4 : REGEXP_REPLACE

Example 5 : REGEXP_INSTR

Example 6 : REGEXP_LIKE andREGEXP_SUBSTR

Example 7 : REGEXP_COUNT

Example 8 : REGEXP_LIKE