String Functions

clean

clean(text, strip=true, match_pattern=None, replacement=None)

Removes extra whitespace from a string



Args:

    text (str): string to be cleaned

    strip (bool, optional): strips leading and trailing space on the cleaned up text

    match_pattern (str, optional): override default match pattern ('\s+')

    replacement (str, optional): override default replacement pattern (' ')



Returns:

    Returns a string with trimmed whitespace



Examples:

    clean('  ab   cd   e ') -> 'ab cd e'

concat

concat(*args: Any)

Concatenate strings from fields or raw values



Args:

    *args: Variable length argument list containing strings (with single quotes) or,

           field names (without quotes)



Returns:

    Returns a concatenated string



Examples:

    concat('hello ', 'world') -> 'hello world'

contains

contains(text, q)

Check if a word is contained within a string



Args:

    text (str): string (with single quotes) or field name (without quotes)

    q (str): substring to find



Returns:

    Returns True if substring is found. Otherwise False is returned.



Examples:

    contains(INPUT_COL, word) -> True

echo

echo(text)

Mimics the value specified in a field or string



Args:

    text (str): string (with single quotes) or field name (without quotes)



Returns:

    Returns the same value provided



Examples:

    echo(INPUT_COL) -> mimics the value at INPUT_COL.

    echo('ADP') -> 'ADP'

find

find(text, q)

Find index of a word in the given string



Args:

    text (str): string (with single quotes) or field name (without quotes)

    q (str): substring to find



Returns:

    Returns index of the substring if found. Otherwise -1.



Examples:

    find(INPUT_COL, word) -> 2

format_date

format_date(text, input_format=None, output_format='mmddyyyy')

Return a formatted date string.



Allows you to format a date string to one of four date formats.



If only 2 digits are provided for the year, values 00-68 are mapped to 2000-2068, and values 69-99 are mapped to 1969-1999.



Args:

    text (str): Input string that should be parsable

    input_format (str, optional): Expected input format. Allowed values: "day-first", "month-first". If not

        passed to the function, will assume month-first.

    output_format (str, optional): Desired output format. Allowed values: "ddmmyyyy", "mmddyyyy", "yyyymmdd",

        or 'monddyyyy'. Defaults to "mmddyyy".



Returns:

    Formatted string representing date in input string



Examples:

    format_date('September 19 2019', output_format='mmddyyyy') -> '19/09/2019

    format_date('05.01.2019', input_format='day-first', output_format='monddyyyy')

        -> January 05, 2019

    format_date('05.01.2019', input_format='month-first',

        output_format='monddyyyy') -> May 01, 2019

islower

islower(text)

Check whether a string is lowercase



Args:

    text (str): string (with single quotes) or field name (without quotes)



Returns:

    Returns True if string is lowercase else False



Examples:

    islower('hello') -> 'True'

isupper

isupper(text)

Check whether a string is uppercase



Args:

    text (str): string (with single quotes) or field name (without quotes)



Returns:

    Returns True if string is uppercase else False



Examples:

    isupper('HELLO') -> 'True'

join

join(delim, l)

Joins a list of strings with a delimiter



Args:

    delim (str): delimiter to separate joined string

    l (str): a list of strings



Returns:

    Returns a joined string



Examples:

    join(',', ['a', 'b']) -> 'a,b'

left

left(text, n)

Get first n characters in a given string



Args:

    text (str): string (with single quotes) or field name (without quotes)

    n (int): number of starting characters to be retrieved



Returns:

    Returns the first n characters in the given string



Examples:

    left('abcdef, 3) -> 'abc'

len

len(text)

Finds length of a string



Args:

    text (str): string (with single quotes) or field name (without quotes)



Returns:

    Returns length of a string



Examples:

    len('abc') -> 3

lower

lower(text)

Lower case a string



Args:

    text (str): string (with single quotes) or field name (without quotes)



Returns:

    Returns input string with lower case



Examples:

    lower('HELLO') -> 'hello'

ltrim

ltrim(text, trim_char=None)

Removes leading spaces (or trims character)



Args:

    text (str): string (with single quotes) or field name (without quotes)

    trim_char (str, optional): leading character to be trimmed



Returns:

    Returns a trimmed string



Examples:

    ltrim('  hello') -> hello

    ltrim(INPUT_COL, 'c') -> ab

match

match(text, match_pattern, n=None, multiline=false)

Find a pattern in a given string



Args:

    text (str): string (with single quotes) or field name (without quotes)

    match_pattern (str): regex pattern to match

    n (int): if specified, returns the starting index of the nth match in the string

    multiline (bool): enables multiline mode in regex. Multiline mode allows

                      '^' and '$' to match the beginnings and endings of lines

                      as well as match the beginning and end of the whole string.



Returns:

    Returns the starting index when the pattern is found.

    Otherwise returns -1.



Examples:

    match('hello 20 world 35', '[a-z]+') -> ["hello", "world"]

    list_get(match('hello 20 world 35', '[\d]+'), 1) -> 35

mid

mid(text, l, r)

Retrieve exactly r characters from the lth index in a given string



Args:

    text (str): string (with single quotes) or field name (without quotes)

    l (int): index in the string (starting from 0)

    r (int): number of characters to extract



Returns:

    Returns a substring starting at lth index and r characters long



Examples:

    mid('hello world', 6, 3) -> 'wor'

    mid(INPUT_COL, l, r) -> 'abc'

proper

proper(text)

Capitalizes first character of each word in a given string



Args:

    text (str): string (with single quotes) or field name (without quotes)



Returns:

    Returns a string with word starting with a capital letter



Examples:

    proper('hello') -> 'Hello'

    proper('hello world') -> 'Hello World'

regex_find

regex_find(text, match_pattern)

Find a pattern in a given string



Args:

    text (str): string (with single quotes) or field name (without quotes)

    match_pattern (str): regex pattern to match



Returns:

    Returns the starting index when the pattern is found.

    Otherwise returns -1.



Examples:

    regex_find('hello world', 'world') -> 6

regex_get_all

regex_get_all(text, match_pattern, e=0, ignorecase=false)

Finds and returns all matching patterns in a given string



Args:

    text (str): string (with single quotes) or field name (without quotes)

    match_pattern (str): regex pattern to match

    e (int, optional): number of errors allowed in the match

    ignorecase (bool, optional): Whether casing should be ignored. By default false.



Returns:

    A list of all matches.

    Each match is represented as a list. If a `match_pattern` results in multiple match results

    (i.e. from capture groups), the list will have multiple items.



Examples:

    regex_get_all(INPUT_COL, match_pattern) -> [[word1, ...], [word2, ...], ...]

    regex_get_all('hello', 'world') -> []

    regex_get_all('hello world world', 'world') -> [['world'], ['world']]

    regex_get_all('(hello) (world)', 'hello world') -> [['hello', 'world']]

regex_replace

regex_replace(text, match_pattern, replacement)

Replace a pattern in a given string



Args:

    text (str): string (with single quotes) or field name (without quotes)

    match_pattern (str): regex pattern to be replaced

    replacement (str): string to replace match_pattern



Returns:

    Returns a string



    When match_pattern is found, it is replaced with replacement string

    When match_pattern is not found, original string is returned



Examples:

    regex_replace('hello world', 'world', 'universe') -> 'hello universe'

    regex_replace(INPUT_COL, 'hi', 'hello') -> 'hello there'

replace

replace(text, match_word, replacement_word)

Replace match_word with replacement_word in a given string



Args:

    text (str): string (with single quotes) or field name (without quotes)

    match_word (str): string to be replaced

    replacement_word (str): string to replace match_word



Returns:

    Return a string



    When match_word is found, it is replaced with replacement word

    When match_word is not found, original string is returned



Examples:

    replace('hello world', 'world', 'universe') -> 'hello universe'

    replace(INPUT_COL, 'hi', 'hello') -> 'hello there'

right

right(text, n)

Retrieve last n characters in a given string



Args:

    text (str): string (with single quotes) or field name (without quotes)

    n (int): number of trailing characters to return



Returns:

    Returns last n characters in a given string



Examples:

    right('hello world', 5) -> 'world'

    right(INPUT_COL, 5) -> 'world'

rtrim

rtrim(text, trim_char=None)

Removes trailing spaces (or trim character)



Args:

    text (str): string (with single quotes) or field name (without quotes)

    trim_char (str, optional): trailing character to be trimmed



Returns:

    Returns a trimmed string



Examples:

    rtrim('abc', 'c') -> ab

    rtrim(INPUT_COL, trim_char) -> ab

split

split(text, split_char, n=None, maxsplit=0, should_clean=true)

Split a string and return the array of splits



Args:

    text (str): string (with single quotes) or field name (without quotes)

    split_char (str): regex pattern passed to re.split

    n: to be deprecated

    maxsplit (int): max number of times the split should be performed

    should_clean (bool): whether to remove the whitespace at the boundaries of the input string before splitting it. Defaults to True.



Returns:

    Returns nth split of a string

    An ERROR will be return if the split index is out of bounds [-n to n-1].



Examples:

    split('a b c', ' ') -> ["a", "b", "c"]

    split(INPUT_COL, ' ') ->  ["a", "b", "c"]

split_get

split_get(text, split_char, n=None, maxsplit=0)

Split a string and returns the nth split



Args:

    text (str): string (with single quotes) or field name (without quotes)

    split_char (str): character on which to split the string

    n (int): split index to be returned

    maxsplit (int): maximum number of times the split should be performed



Returns:

    Returns nth split of a string

    An ERROR will be return if the split index is out of bounds [-n to n-1].



Examples:

    split_get('a b c', ' ', 1) -> b

    split_get('a b c', ' ', 1, maxsplit=1) -> 'b c'

    split_get(INPUT_COL, ' ', 1) -> a

trim

trim(text, trim_char=None)

Removes leading and trailing spaces (or trim character)



Args:

    text (str): string (with single quotes) or field name (without quotes)

    trim_char (str, optional): leading and trailing character to be trimmed



Returns:

    Returns a trimmed string



Examples:

    trim('cabc', 'c') -> ab

    trim(INPUT_COL, trim_char) -> ab

upper

upper(text)

Upper case a given string



Args:

    text (str): string (with single quotes) or field name (without quotes)



Returns:

    Returns string with uppercase



Examples:

    upper('hello') -> 'HELLO'