REGEXP_INSTR function - Amazon Redshift

REGEXP_INSTR function

Searches a string for a regular expression pattern and returns an integer that indicates the beginning position or ending position of the matched substring. If no match is found, then the function returns 0. REGEXP_INSTR is similar to the POSITION function, but lets you search a string for a regular expression pattern. For more information about regular expressions, see POSIX operators and Regular expression in Wikipedia.

Syntax

REGEXP_INSTR( source_string, pattern [, position [, occurrence] [, option [, parameters ] ] ] ] )

Arguments

source_string

A string expression, such as a column name, to be searched.

pattern

A UTF-8 string literal that represents a regular expression pattern. For more information, see POSIX operators.

position

(Optional) A positive INTEGER that indicates the position within source_string to begin searching. The position is based on the number of characters, not bytes, so that multibyte characters are counted as single characters. The default is 1. If position is less than 1, the search begins at the first character of source_string. If position is greater than the number of characters in source_string, the result is 0.

occurrence

(Optional) A positive INTEGER that indicates which occurrence of the pattern to use. REGEXP_INSTR skips the first occurrence-1 matches. The default is 1. If occurrence is less than 1 or greater than the number of characters in source_string, the search is ignored and the result is 0.

option

(Optional) A value that indicates whether to return the position of the first character of the match (0) or the position of the first character following the end of the match (1). A nonzero value is the same as 1. The default value is 0.

parameters

(Optional) One or more string literals that indicate how the function matches the pattern. The possible values are the following:

  • c – Perform case-sensitive matching. The default is to use case-sensitive matching.

  • i – Perform case-insensitive matching.

  • e – Extract a substring using a subexpression.

    If pattern includes a subexpression, REGEXP_INSTR matches a substring using the first subexpression in pattern. REGEXP_INSTR considers only the first subexpression; additional subexpressions are ignored. If the pattern doesn't have a subexpression, REGEXP_INSTR ignores the 'e' parameter.

  • p – Interpret the pattern with Perl Compatible Regular Expression (PCRE) dialect. For more information about PCRE, see Perl Compatible Regular Expressions in Wikipedia.

Return type

Integer

Examples

The following examples use data from the USERS table in the TICKIT sample database. For more information, see Sample database.

To search for the @ character that begins a domain name and returns the starting position of the first match, use the following example.

SELECT email, REGEXP_INSTR(email, '@[^.]*') FROM users ORDER BY userid LIMIT 4; +-----------------------------------------------+--------------+ | email | regexp_instr | +-----------------------------------------------+--------------+ | Etiam.laoreet.libero@sodalesMaurisblandit.edu | 21 | | Suspendisse.tristique@nonnisiAenean.edu | 22 | | amet.faucibus.ut@condimentumegetvolutpat.ca | 17 | | sed@lacusUtnec.ca | 4 | +-----------------------------------------------+--------------+

To search for variants of the word Center and returns the starting position of the first match, use the following example.

SELECT venuename, REGEXP_INSTR(venuename,'[cC]ent(er|re)$') FROM venue WHERE REGEXP_INSTR(venuename,'[cC]ent(er|re)$') > 0 ORDER BY venueid LIMIT 4; +-----------------------+--------------+ | venuename | regexp_instr | +-----------------------+--------------+ | The Home Depot Center | 16 | | Izod Center | 6 | | Wachovia Center | 10 | | Air Canada Centre | 12 | +-----------------------+--------------+

To find the starting position of the first occurrence of the string FOX, using case-insensitive matching logic, use the following example.

SELECT REGEXP_INSTR('the fox', 'FOX', 1, 1, 0, 'i'); +--------------+ | regexp_instr | +--------------+ | 5 | +--------------+

To use a pattern written in PCRE dialect to locate words containing at least one number and one lowercase letter, use the following example. It uses the ?= operator, which has a specific look-ahead connotation in PCRE. This example finds the starting position of the second such word.

SELECT REGEXP_INSTR('passwd7 plain A1234 a1234', '(?=[^ ]*[a-z])(?=[^ ]*[0-9])[^ ]+', 1, 2, 0, 'p'); +--------------+ | regexp_instr | +--------------+ | 21 | +--------------+

To use a pattern written in PCRE dialect to locate words containing at least one number and one lowercase letter, use the following example. It uses the ?= operator, which has a specific look-ahead connotation in PCRE. This example finds the starting position of the second such word, but differs from the previous example in that it uses case-insensitive matching.

SELECT REGEXP_INSTR('passwd7 plain A1234 a1234', '(?=[^ ]*[a-z])(?=[^ ]*[0-9])[^ ]+', 1, 2, 0, 'ip'); +--------------+ | regexp_instr | +--------------+ | 15 | +--------------+