Original article: http://oracle-base.com/articles/misc/regular-expressions-support-in-oracle.php
Oracle 10g introduced support for regular expressions in SQL and PL/SQL with the following functions: REGEXP_LIKE, REGEXP_INSTR, REGEXP_SUBSTR and REGEXP_REPLACE.
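Oracle 11g introduced two new features related to regular expressions: the REGEXP_COUNT function, and sub-expression support through a new subexpr parameter on REGEXP_INSTR and REGEXP_SUBSTR.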
Learning to write regular expressions takes a little time. If you don't do it regularly, it can be a voyage of discovery each time. The general rules for writing regular expressions are available here. You can read the Oracle Regular Expression Support here.
Rather than trying to repeat the formal definitions, I'll present a number of problems I've been asked to look at over the years, where a solution using a regular expression has been appropriate.
The data in a column is free text, but may include a 4 digit year.
If we needed to return rows containing a specific year we could use the LIKE operator (WHERE data LIKE '%2014%'), but how do we return rows using a comparison (<, <=, >, >=, <>)? One way to approach this is to pull out the 4 figure year and convert it to a number, so we don't accidentally do a character-by-character (ASCII) comparison. That's pretty easy using regular expressions.
We can identify digits using the "\d" or "[0-9]" operators. We want a group of four of them, which is represented by the "{4}" operator. So our regular expression will be "\d{4}" or "[0-9]{4}". The REGEXP_SUBSTR function returns the string matching the regular expression, so it can be used to extract the text of interest. We then just need to convert it to a number and perform our comparison.
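Here is a minimal sketch of the same idea in Python. It uses the standard `re` module instead of Oracle, which is an assumption made so the example runs without a database. The sample rows are invented for illustration. The filter plays the role of a predicate such as `WHERE TO_NUMBER(REGEXP_SUBSTR(data, '\d{4}')) >= 2000`, where `data` is the column name the article uses elsewhere.

```python
import re
from typing import Optional


def extract_year(text: str) -> Optional[int]:
    """Return the first run of four digits as an int, or None (SQL NULL)."""
    m = re.search(r"\d{4}", text)
    return int(m.group()) if m else None


# Made-up sample rows; the article's own data is not reproduced here.
rows = ["Released in 2014", "No year here", "First edition 1999"]

# Numeric comparison on the extracted year. Rows without a year give None
# and are filtered out, as a NULL would be in the SQL version.
recent = [r for r in rows if (y := extract_year(r)) is not None and y >= 2000]
print(recent)  # ['Released in 2014']
```

Rows without a four digit run drop out of the comparison, just as a NULL from REGEXP_SUBSTR would. Note that `\d{4}` also matches the first four digits of a longer number such as 123456, so tighten the pattern if your data can contain such numbers.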
Given a source string, how do we split it up into separate columns, based on changes of case and alpha-to-numeric, such that this:
Becomes this:
The source data is set up like this:
The first part of the string is an initcap word, so it starts with a capital letter between "A" and "Z". We identify a single character using the "[]" operator, and ranges are represented using "-", like "A-Z", "a-z" or "0-9". So if we are looking for a single character that is a capital letter, we need to look for "[A-Z]". That needs to be followed by lower case letters, which we now know is "[a-z]", but we need 1 or more of them, which is signified by the "+" operator. So to find an initcap word, we need to search for "[A-Z][a-z]+". Since we want the first occurrence of this pattern, we ask REGEXP_SUBSTR for occurrence 1.
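As a minimal illustration, here is the first initcap word pulled out of an illustrative string in Python, using the standard `re` module rather than Oracle. The string is made up to fit the description and is not the article's own data. The comment shows the Oracle call this roughly mirrors, assuming the column is named `data`.

```python
import re

# Illustrative string of the kind described (not the article's data).
source = "ArtADB1234567e9876540"

# First occurrence of one capital letter followed by 1 or more lower case letters.
# Oracle (assumed column name "data"): REGEXP_SUBSTR(data, '[A-Z][a-z]+', 1, 1)
first_word = re.search(r"[A-Z][a-z]+", source).group()
print(first_word)  # Art
```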
The second part of the string is a group of 1 or more upper case letters. We know we need to use the "[A-Z]+" pattern, but we need to make sure we don't get the first capital letter, so we look for the second occurrence.
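One way to see why the second occurrence is needed is to list every match of `[A-Z]+` in order. This Python sketch runs `re.findall` on an illustrative string, made up to fit the description. Like Oracle, `re.findall` scans for non-overlapping matches from left to right, so list index 1 corresponds to occurrence 2.

```python
import re

source = "ArtADB1234567e9876540"  # illustrative string, not the article's data

# The capital "A" of the initcap word is the first match of [A-Z]+,
# so the upper case group we actually want is the second match.
upper_runs = re.findall(r"[A-Z]+", source)
print(upper_runs)     # ['A', 'ADB']
print(upper_runs[1])  # ADB
```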
The next part is the first occurrence of a group of numbers, "[0-9]+".
The next part is a group of lower case letters. We don't want to pick up those from the initcap word, so we must look for the second occurrence of lower case letters.
Finally, we have a group of numbers, which is the second occurrence of this pattern.
Putting that all together, a single query that returns each of these REGEXP_SUBSTR calls as its own column splits the data into separate columns.
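Below is a sketch of the combined split in Python. `regexp_substr` is a hypothetical helper written for this example. It only approximates Oracle's REGEXP_SUBSTR (1-based position, numbered occurrence, optional sub-expression) and returns None where Oracle returns NULL. Python's `re` dialect is not identical to Oracle's, but for these simple patterns the results should match. In Oracle, the equivalent query would select `REGEXP_SUBSTR(data, '[A-Z][a-z]+', 1, 1)`, `REGEXP_SUBSTR(data, '[A-Z]+', 1, 2)` and so on as separate columns.

```python
import re
from typing import Optional


def regexp_substr(source: str, pattern: str, position: int = 1,
                  occurrence: int = 1, flags: int = 0,
                  subexpr: int = 0) -> Optional[str]:
    """Hypothetical stand-in for Oracle's REGEXP_SUBSTR."""
    # Non-overlapping matches, left to right, starting at the 1-based position.
    for n, m in enumerate(re.finditer(pattern, source[position - 1:], flags), start=1):
        if n == occurrence:
            return m.group(subexpr)
    return None


def split_parts(data: str) -> tuple:
    return (
        regexp_substr(data, r"[A-Z][a-z]+", 1, 1),  # initcap word
        regexp_substr(data, r"[A-Z]+", 1, 2),       # upper case group (skip the initcap capital)
        regexp_substr(data, r"[0-9]+", 1, 1),       # first group of digits
        regexp_substr(data, r"[a-z]+", 1, 2),       # lower case group (skip the initcap tail)
        regexp_substr(data, r"[0-9]+", 1, 2),       # second group of digits
    )


# Illustrative input, made up to fit the description.
print(split_parts("ArtADB1234567e9876540"))
# ('Art', 'ADB', '1234567', 'e', '9876540')
```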
We need to pull out a group of characters from a "/" delimited string, optionally enclosed by double quotes. The data looks like this:
We are looking for 1 or more characters that are not "/", which we do using "[^/]+". The "^" in the brackets represents NOT and "+" means 1 or more. We also want to remove optional double quotes, so we add that as a character we don't want, giving us "[^/"]+". So if we want the data from the third column, we need the third occurrence of this pattern.
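The same "nth occurrence of `[^/"]+`" idea is sketched below in Python with the standard `re` module. The sample row is invented, and `nth_field` is a hypothetical helper name. It behaves like `REGEXP_SUBSTR(data, '[^/"]+', 1, n)`, returning None where Oracle would return NULL.

```python
import re
from typing import Optional


def nth_field(data: str, n: int) -> Optional[str]:
    """n-th run of characters that are neither '/' nor a double quote."""
    fields = re.findall(r'[^/"]+', data)
    return fields[n - 1] if 1 <= n <= len(fields) else None


sample = '1/"Tom"/"Smith"/42'  # made-up row
print(nth_field(sample, 3))    # Smith
```

One consequence of counting pattern occurrences rather than delimiters is that empty fields are skipped. In 'a//b', for example, "b" is the second occurrence, not the third. If empty columns can occur, this approach needs adjusting.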
We need to take an initcap string and separate the words. The data looks like this:
We need to find each uppercase character "[A-Z]". We want to keep the character we find, so we make that pattern a sub-expression "([A-Z])", allowing us to refer to it later. For each match, we want to replace it with a space plus the matching character. The space is pretty obvious, but we need to use "\1" to signify the text matching the first sub-expression. So we replace the matching pattern with a space and itself, " \1". We don't want to put a space before the first letter of the string, so we start the search at the second character (a position argument of 2 to REGEXP_REPLACE).
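Here is a minimal Python sketch of the replacement, using `re.sub` in place of Oracle's REGEXP_REPLACE. `split_initcap` is a hypothetical helper and the input is made up. Slicing off the first character plays the role of starting the search at position 2. The comment shows the Oracle call this is intended to mirror, assuming a column named `data`.

```python
import re


def split_initcap(data: str) -> str:
    # Oracle (assumed column name "data"): REGEXP_REPLACE(data, '([A-Z])', ' \1', 2)
    # Leave the first character alone, then put a space before every capital;
    # \1 in the replacement re-inserts the captured capital itself.
    return data[:1] + re.sub(r"([A-Z])", r" \1", data[1:])


print(split_initcap("SocialSecurityNumber"))  # Social Security Number
```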
We have a specific pattern of digits (9 99:99:99) and we want to know the location of the pattern in our data.
We know we are looking for groups of numbers, so we can use "[0-9]" or "\d". We know the number of digits in each group, which we can indicate using the "{n}" operator, so we simply describe the pattern we are looking for: "[0-9] [0-9]{2}:[0-9]{2}:[0-9]{2}", or equivalently "\d \d{2}:\d{2}:\d{2}". REGEXP_INSTR then returns the position where the pattern starts.
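A Python sketch of the same lookup follows. `pattern_position` is a hypothetical helper and the sample text is invented. To mimic REGEXP_INSTR, it returns a 1-based position, or 0 when the pattern is absent. The explicit `[0-9]` class is used here because Python's `\d` also matches non-ASCII digits.

```python
import re


def pattern_position(data: str) -> int:
    """1-based start of a 'd dd:dd:dd' pattern, or 0 if absent (like REGEXP_INSTR)."""
    m = re.search(r"[0-9] [0-9]{2}:[0-9]{2}:[0-9]{2}", data)
    return m.start() + 1 if m else 0


print(pattern_position("Elapsed 1 23:45:06 total"))  # 9
print(pattern_position("no timing"))                  # 0
```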
We have strings containing parentheses. We want to return the text within the parentheses for those rows that contain parentheses.
The basic pattern for text between parentheses is "\(.*\)". The "\" characters are escapes for the parentheses, making them literals. Without the escapes they would be assumed to define a sub-expression. That pattern alone is fine to identify the rows of interest using a REGEXP_LIKE condition, but it is not appropriate in a REGEXP_SUBSTR, as it would return the parentheses also. To omit the parentheses we need to include a sub-expression inside the literal parentheses, "\((.*)\)". We can then use REGEXP_SUBSTR to return the text matched by the first sub-expression.
REGEXP_SUBSTR(data, '\((.*)\)', 1, 1, 'i', 1) AS without_parentheses
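Note: in REGEXP_SUBSTR(data, '\((.*)\)', 1, 1, 'i', 1), the 'i' match parameter makes the match case-insensitive, and the final 1 specifies which sub-expression's matched text to return (0 to 9, where 0 means the whole match).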
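For comparison, here is a Python sketch of the same pattern using the standard `re` module. `inside_parentheses` is a hypothetical helper and the sample strings are invented. `group(1)` plays the role of the final sub-expression argument, and None stands in for NULL on rows without parentheses.

```python
import re
from typing import Optional


def inside_parentheses(data: str) -> Optional[str]:
    # \( and \) are literal parentheses; the unescaped (.*) is sub-expression 1.
    m = re.search(r"\((.*)\)", data)
    return m.group(1) if m else None


print(inside_parentheses("Total (excluding tax)"))  # excluding tax
print(inside_parentheses("No brackets"))            # None
```

Because `.*` is greedy, a string with two bracketed parts, such as "a (b) c (d)", returns everything from the first "(" to the last ")", which is "b) c (d". If only the first bracketed part is wanted, `\(([^)]*)\)` is one way to stop at the first closing parenthesis.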
We need to know how many times a block of 4 digits appears in text. The data looks like this:
We can identify digits using "\d" or "[0-9]", and the "{4}" operator signifies 4 of them, so using "\d{4}" or "[0-9]{4}" with the REGEXP_COUNT function seems to be a valid option.
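Here is a minimal Python counterpart, counting matches with `re.findall`. `count_four_digit_blocks` is a hypothetical helper and the inputs are invented. Like REGEXP_COUNT, `re.findall` counts non-overlapping matches, so a run of eight digits counts as two blocks.

```python
import re


def count_four_digit_blocks(data: str) -> int:
    # Oracle (assumed column name "data"): REGEXP_COUNT(data, '\d{4}')
    return len(re.findall(r"\d{4}", data))


print(count_four_digit_blocks("1234 and 5678"))  # 2
print(count_four_digit_blocks("12345678"))       # 2
print(count_four_digit_blocks("123"))            # 0
```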
We need to identify invalid email addresses. The data looks like this:
Testing each address against an approximate pattern for a valid address, and keeping the rows that do not match, gives us the email addresses that approximate to invalid email address formats.
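The pattern below is an assumption. It is a common rough approximation of an address (local part, "@", domain, a dot and a 2 to 4 letter suffix), not necessarily the article's own test and nowhere near a full RFC 5322 check. The sample rows are invented. `re.search` is unanchored, like REGEXP_LIKE, and `re.IGNORECASE` plays the role of the 'i' match parameter. In Oracle, the filter would be along the lines of `WHERE NOT REGEXP_LIKE(data, '[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}', 'i')`.

```python
import re

# Rough approximation of a valid address (an assumption, not a full RFC check).
EMAIL_LIKE = re.compile(r"[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}", re.IGNORECASE)


def looks_invalid(address: str) -> bool:
    # Unanchored search: the pattern may match anywhere in the string.
    return EMAIL_LIKE.search(address) is None


rows = ["me@example.com", "me.example.com", "me@example"]  # made-up rows
print([r for r in rows if looks_invalid(r)])  # ['me.example.com', 'me@example']
```

Because the search is unanchored, a string with junk around a valid-looking address still passes. Adding "^" and "$" around the pattern makes the test stricter.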
-----------------------------
Dylan Presents.