String functions

clean

clean(text, strip=true, match_pattern=None, replacement=None)

Removes extra whitespace from a string

Args:
    text (str): string to be cleaned
    strip (bool, optional): strips leading and trailing space on the cleaned up text
    match_pattern (str, optional): override default match pattern ('\s+')
    replacement (str, optional): override default replacement pattern (' ')

Returns:
    Returns a string with trimmed whitespace

Examples:
    clean('  ab   cd   e ') -> 'ab cd e'

concat

concat(*args: Any)

Concatenate strings from fields or raw values

Args:
    *args: Variable length argument list containing strings (with single quotes) or,
           field names (without quotes)

Returns:
    Returns a concatenated string

Examples:
    concat('hello ', 'world') -> 'hello world'

contains

contains(text, q)

Check if a word is contained within a string

Args:
    text (str): string (with single quotes) or field name (without quotes)
    q (str): substring to find

Returns:
    Returns True if substring is found. Otherwise False is returned.

Examples:
    contains(INPUT_COL, word) -> True

echo

echo(text)

Mimics the value specified in a field or string

Args:
    text (str): string (with single quotes) or field name (without quotes)

Returns:
    Returns the same value provided

Examples:
    echo(INPUT_COL) -> mimics the value at INPUT_COL.
    echo('ADP') -> 'ADP'

find

find(text, q)

Find index of a word in the given string

Args:
    text (str): string (with single quotes) or field name (without quotes)
    q (str): substring to find

Returns:
    Returns index of the substring if found. Otherwise -1.

Examples:
    find(INPUT_COL, word) -> 2

format_date

format_date(text, input_format=None, output_format='mmddyyyy')

Return a formatted date string.

Allows you to format a date string to one of four date formats.

If only 2 digits are provided for the year, values 00-68 are mapped to 2000-2068, and values 69-99 are mapped to 1969-1999.

Args:
    text (str): Input string that should be parsable
    input_format (str, optional): Expected input format. Allowed values: "day-first", "month-first". If not
        passed to the function, will assume month-first.
    output_format (str, optional): Desired output format. Allowed values: "ddmmyyyy", "mmddyyyy", "yyyymmdd",
        or 'monddyyyy'. Defaults to "mmddyyy".

Returns:
    Formatted string representing date in input string

Examples:
    format_date('September 19 2019', output_format='mmddyyyy') -> '19/09/2019
    format_date('05.01.2019', input_format='day-first', output_format='monddyyyy')
        -> January 05, 2019
    format_date('05.01.2019', input_format='month-first',
        output_format='monddyyyy') -> May 01, 2019

islower

islower(text)

Check whether a string is lowercase

Args:
    text (str): string (with single quotes) or field name (without quotes)

Returns:
    Returns True if string is lowercase else False

Examples:
    islower('hello') -> 'True'

isupper

isupper(text)

Check whether a string is uppercase

Args:
    text (str): string (with single quotes) or field name (without quotes)

Returns:
    Returns True if string is uppercase else False

Examples:
    isupper('HELLO') -> 'True'

join

join(delim, l)

Joins a list of strings with a delimiter

Args:
    delim (str): delimiter to separate joined string
    l (str): a list of strings

Returns:
    Returns a joined string

Examples:
    join(',', ['a', 'b']) -> 'a,b'

left

left(text, n)

Get first n characters in a given string

Args:
    text (str): string (with single quotes) or field name (without quotes)
    n (int): number of starting characters to be retrieved

Returns:
    Returns the first n characters in the given string

Examples:
    left('abcdef, 3) -> 'abc'

len

len(text)

Finds length of a string

Args:
    text (str): string (with single quotes) or field name (without quotes)

Returns:
    Returns length of a string

Examples:
    len('abc') -> 3

lower

lower(text)

Lower case a string

Args:
    text (str): string (with single quotes) or field name (without quotes)

Returns:
    Returns input string with lower case

Examples:
    lower('HELLO') -> 'hello'

ltrim

ltrim(text, trim_char=None)

Removes leading spaces (or trims character)

Args:
    text (str): string (with single quotes) or field name (without quotes)
    trim_char (str, optional): leading character to be trimmed

Returns:
    Returns a trimmed string

Examples:
    ltrim('  hello') -> hello
    ltrim(INPUT_COL, 'c') -> ab

match

match(text, match_pattern, n=None, multiline=false)

Find a pattern in a given string

Args:
    text (str): string (with single quotes) or field name (without quotes)
    match_pattern (str): regex pattern to match
    n (int): if specified, returns the starting index of the nth match in the string
    multiline (bool): enables multiline mode in regex. Multiline mode allows
                      '^' and '$' to match the beginnings and endings of lines
                      as well as match the beginning and end of the whole string.

Returns:
    Returns the starting index when the pattern is found.
    Otherwise returns -1.

Examples:
    match('hello 20 world 35', '[a-z]+') -> ["hello", "world"]
    list_get(match('hello 20 world 35', '[\d]+'), 1) -> 35

mid

mid(text, l, r)

Retrieve exactly r characters from the lth index in a given string

Args:
    text (str): string (with single quotes) or field name (without quotes)
    l (int): index in the string (starting from 0)
    r (int): number of characters to extract

Returns:
    Returns a substring starting at lth index and r characters long

Examples:
    mid('hello world', 6, 3) -> 'wor'
    mid(INPUT_COL, l, r) -> 'abc'

proper

proper(text)

Capitalizes first character of each word in a given string

Args:
    text (str): string (with single quotes) or field name (without quotes)

Returns:
    Returns a string with word starting with a capital letter

Examples:
    proper('hello') -> 'Hello'
    proper('hello world') -> 'Hello World'

regex_find

regex_find(text, match_pattern)

Find a pattern in a given string

Args:
    text (str): string (with single quotes) or field name (without quotes)
    match_pattern (str): regex pattern to match

Returns:
    Returns the starting index when the pattern is found.
    Otherwise returns -1.

Examples:
    regex_find('hello world', 'world') -> 6

regex_get_all

regex_get_all(text, match_pattern, e=0, ignorecase=false)

Finds and returns all matching patterns in a given string

Args:
    text (str): string (with single quotes) or field name (without quotes)
    match_pattern (str): regex pattern to match
    e (int, optional): number of errors allowed in the match
    ignorecase (bool, optional): Whether casing should be ignored. By default false.

Returns:
    A list of all matches.
    Each match is represented as a list. If a `match_pattern` results in multiple match results
    (i.e. from capture groups), the list will have multiple items.

Examples:
    regex_get_all(INPUT_COL, match_pattern) -> [[word1, ...], [word2, ...], ...]
    regex_get_all('hello', 'world') -> []
    regex_get_all('hello world world', 'world') -> [['world'], ['world']]
    regex_get_all('(hello) (world)', 'hello world') -> [['hello', 'world']]

regex_replace

regex_replace(text, match_pattern, replacement)

Replace a pattern in a given string

Args:
    text (str): string (with single quotes) or field name (without quotes)
    match_pattern (str): regex pattern to be replaced
    replacement (str): string to replace match_pattern

Returns:
    Returns a string

    When match_pattern is found, it is replaced with replacement string
    When match_pattern is not found, original string is returned

Examples:
    regex_replace('hello world', 'world', 'universe') -> 'hello universe'
    regex_replace(INPUT_COL, 'hi', 'hello') -> 'hello there'

replace

replace(text, match_word, replacement_word)

Replace match_word with replacement_word in a given string

Args:
    text (str): string (with single quotes) or field name (without quotes)
    match_word (str): string to be replaced
    replacement_word (str): string to replace match_word

Returns:
    Return a string

    When match_word is found, it is replaced with replacement word
    When match_word is not found, original string is returned

Examples:
    replace('hello world', 'world', 'universe') -> 'hello universe'
    replace(INPUT_COL, 'hi', 'hello') -> 'hello there'

right(text, n)

Retrieve last n characters in a given string

Args:
    text (str): string (with single quotes) or field name (without quotes)
    n (int): number of trailing characters to return

Returns:
    Returns last n characters in a given string

Examples:
    right('hello world', 5) -> 'world'
    right(INPUT_COL, 5) -> 'world'

rtrim

rtrim(text, trim_char=None)

Removes trailing spaces (or trim character)

Args:
    text (str): string (with single quotes) or field name (without quotes)
    trim_char (str, optional): trailing character to be trimmed

Returns:
    Returns a trimmed string

Examples:
    rtrim('abc', 'c') -> ab
    rtrim(INPUT_COL, trim_char) -> ab

split

split(text, split_char, n=None, maxsplit=0, should_clean=true)

Split a string and return the array of splits

Args:
    text (str): string (with single quotes) or field name (without quotes)
    split_char (str): regex pattern passed to re.split
    n: to be deprecated
    maxsplit (int): max number of times the split should be performed
    should_clean (bool): whether to remove the whitespace at the boundaries of the input string before splitting it. Defaults to True.

Returns:
    Returns nth split of a string
    An ERROR will be return if the split index is out of bounds [-n to n-1].

Examples:
    split('a b c', ' ') -> ["a", "b", "c"]
    split(INPUT_COL, ' ') ->  ["a", "b", "c"]

split_get

split_get(text, split_char, n=None, maxsplit=0)

Split a string and returns the nth split

Args:
    text (str): string (with single quotes) or field name (without quotes)
    split_char (str): character on which to split the string
    n (int): split index to be returned
    maxsplit (int): maximum number of times the split should be performed

Returns:
    Returns nth split of a string
    An ERROR will be return if the split index is out of bounds [-n to n-1].

Examples:
    split_get('a b c', ' ', 1) -> b
    split_get('a b c', ' ', 1, maxsplit=1) -> 'b c'
    split_get(INPUT_COL, ' ', 1) -> a

trim

trim(text, trim_char=None)

Removes leading and trailing spaces (or trim character)

Args:
    text (str): string (with single quotes) or field name (without quotes)
    trim_char (str, optional): leading and trailing character to be trimmed

Returns:
    Returns a trimmed string

Examples:
    trim('cabc', 'c') -> ab
    trim(INPUT_COL, trim_char) -> ab

upper

upper(text)

Upper case a given string

Args:
    text (str): string (with single quotes) or field name (without quotes)

Returns:
    Returns string with uppercase

Examples:
    upper('hello') -> 'HELLO'