String functions
clean
clean(text, strip=true, match_pattern=None, replacement=None)
Removes extra whitespace from a string
Args:
text (str): string to be cleaned
strip (bool, optional): strips leading and trailing space on the cleaned up text
match_pattern (str, optional): override default match pattern ('\s+')
replacement (str, optional): override default replacement pattern (' ')
Returns:
Returns a string with trimmed whitespace
Examples:
clean(' ab cd e ') -> 'ab cd e'
concat
concat(*args: Any)
Concatenate strings from fields or raw values
Args:
*args: Variable length argument list containing strings (with single quotes) or,
field names (without quotes)
Returns:
Returns a concatenated string
Examples:
concat('hello ', 'world') -> 'hello world'
contains
contains(text, q)
Check if a word is contained within a string
Args:
text (str): string (with single quotes) or field name (without quotes)
q (str): substring to find
Returns:
Returns True if substring is found. Otherwise False is returned.
Examples:
contains(INPUT_COL, word) -> True
echo
echo(text)
Mimics the value specified in a field or string
Args:
text (str): string (with single quotes) or field name (without quotes)
Returns:
Returns the same value provided
Examples:
echo(INPUT_COL) -> mimics the value at INPUT_COL.
echo('ADP') -> 'ADP'
find
find(text, q)
Find index of a word in the given string
Args:
text (str): string (with single quotes) or field name (without quotes)
q (str): substring to find
Returns:
Returns index of the substring if found. Otherwise -1.
Examples:
find(INPUT_COL, word) -> 2
format_date
format_date(text, input_format=None, output_format='mmddyyyy')
Return a formatted date string.
Allows you to format a date string to one of four date formats.
If only 2 digits are provided for the year, values 00-68 are mapped to 2000-2068, and values 69-99 are mapped to 1969-1999.
Args:
text (str): Input string that should be parsable
input_format (str, optional): Expected input format. Allowed values: "day-first", "month-first". If not
passed to the function, will assume month-first.
output_format (str, optional): Desired output format. Allowed values: "ddmmyyyy", "mmddyyyy", "yyyymmdd",
or 'monddyyyy'. Defaults to "mmddyyy".
Returns:
Formatted string representing date in input string
Examples:
format_date('September 19 2019', output_format='mmddyyyy') -> '19/09/2019
format_date('05.01.2019', input_format='day-first', output_format='monddyyyy')
-> January 05, 2019
format_date('05.01.2019', input_format='month-first',
output_format='monddyyyy') -> May 01, 2019
islower
islower(text)
Check whether a string is lowercase
Args:
text (str): string (with single quotes) or field name (without quotes)
Returns:
Returns True if string is lowercase else False
Examples:
islower('hello') -> 'True'
isupper
isupper(text)
Check whether a string is uppercase
Args:
text (str): string (with single quotes) or field name (without quotes)
Returns:
Returns True if string is uppercase else False
Examples:
isupper('HELLO') -> 'True'
join
join(delim, l)
Joins a list of strings with a delimiter
Args:
delim (str): delimiter to separate joined string
l (str): a list of strings
Returns:
Returns a joined string
Examples:
join(',', ['a', 'b']) -> 'a,b'
left
left(text, n)
Get first n characters in a given string
Args:
text (str): string (with single quotes) or field name (without quotes)
n (int): number of starting characters to be retrieved
Returns:
Returns the first n characters in the given string
Examples:
left('abcdef, 3) -> 'abc'
len
len(text)
Finds length of a string
Args:
text (str): string (with single quotes) or field name (without quotes)
Returns:
Returns length of a string
Examples:
len('abc') -> 3
lower
lower(text)
Lower case a string
Args:
text (str): string (with single quotes) or field name (without quotes)
Returns:
Returns input string with lower case
Examples:
lower('HELLO') -> 'hello'
ltrim
ltrim(text, trim_char=None)
Removes leading spaces (or trims character)
Args:
text (str): string (with single quotes) or field name (without quotes)
trim_char (str, optional): leading character to be trimmed
Returns:
Returns a trimmed string
Examples:
ltrim(' hello') -> hello
ltrim(INPUT_COL, 'c') -> ab
match
match(text, match_pattern, n=None, multiline=false)
Find a pattern in a given string
Args:
text (str): string (with single quotes) or field name (without quotes)
match_pattern (str): regex pattern to match
n (int): if specified, returns the starting index of the nth match in the string
multiline (bool): enables multiline mode in regex. Multiline mode allows
'^' and '$' to match the beginnings and endings of lines
as well as match the beginning and end of the whole string.
Returns:
Returns the starting index when the pattern is found.
Otherwise returns -1.
Examples:
match('hello 20 world 35', '[a-z]+') -> ["hello", "world"]
list_get(match('hello 20 world 35', '[\d]+'), 1) -> 35
mid
mid(text, l, r)
Retrieve exactly r characters from the lth index in a given string
Args:
text (str): string (with single quotes) or field name (without quotes)
l (int): index in the string (starting from 0)
r (int): number of characters to extract
Returns:
Returns a substring starting at lth index and r characters long
Examples:
mid('hello world', 6, 3) -> 'wor'
mid(INPUT_COL, l, r) -> 'abc'
proper
proper(text)
Capitalizes first character of each word in a given string
Args:
text (str): string (with single quotes) or field name (without quotes)
Returns:
Returns a string with word starting with a capital letter
Examples:
proper('hello') -> 'Hello'
proper('hello world') -> 'Hello World'
regex_find
regex_find(text, match_pattern)
Find a pattern in a given string
Args:
text (str): string (with single quotes) or field name (without quotes)
match_pattern (str): regex pattern to match
Returns:
Returns the starting index when the pattern is found.
Otherwise returns -1.
Examples:
regex_find('hello world', 'world') -> 6
regex_get_all
regex_get_all(text, match_pattern, e=0, ignorecase=false)
Finds and returns all matching patterns in a given string
Args:
text (str): string (with single quotes) or field name (without quotes)
match_pattern (str): regex pattern to match
e (int, optional): number of errors allowed in the match
ignorecase (bool, optional): Whether casing should be ignored. By default false.
Returns:
A list of all matches.
Each match is represented as a list. If a `match_pattern` results in multiple match results
(i.e. from capture groups), the list will have multiple items.
Examples:
regex_get_all(INPUT_COL, match_pattern) -> [[word1, ...], [word2, ...], ...]
regex_get_all('hello', 'world') -> []
regex_get_all('hello world world', 'world') -> [['world'], ['world']]
regex_get_all('(hello) (world)', 'hello world') -> [['hello', 'world']]
regex_replace
regex_replace(text, match_pattern, replacement)
Replace a pattern in a given string
Args:
text (str): string (with single quotes) or field name (without quotes)
match_pattern (str): regex pattern to be replaced
replacement (str): string to replace match_pattern
Returns:
Returns a string
When match_pattern is found, it is replaced with replacement string
When match_pattern is not found, original string is returned
Examples:
regex_replace('hello world', 'world', 'universe') -> 'hello universe'
regex_replace(INPUT_COL, 'hi', 'hello') -> 'hello there'
replace
replace(text, match_word, replacement_word)
Replace match_word with replacement_word in a given string
Args:
text (str): string (with single quotes) or field name (without quotes)
match_word (str): string to be replaced
replacement_word (str): string to replace match_word
Returns:
Return a string
When match_word is found, it is replaced with replacement word
When match_word is not found, original string is returned
Examples:
replace('hello world', 'world', 'universe') -> 'hello universe'
replace(INPUT_COL, 'hi', 'hello') -> 'hello there'
right
right(text, n)
Retrieve last n characters in a given string
Args:
text (str): string (with single quotes) or field name (without quotes)
n (int): number of trailing characters to return
Returns:
Returns last n characters in a given string
Examples:
right('hello world', 5) -> 'world'
right(INPUT_COL, 5) -> 'world'
rtrim
rtrim(text, trim_char=None)
Removes trailing spaces (or trim character)
Args:
text (str): string (with single quotes) or field name (without quotes)
trim_char (str, optional): trailing character to be trimmed
Returns:
Returns a trimmed string
Examples:
rtrim('abc', 'c') -> ab
rtrim(INPUT_COL, trim_char) -> ab
split
split(text, split_char, n=None, maxsplit=0, should_clean=true)
Split a string and return the array of splits
Args:
text (str): string (with single quotes) or field name (without quotes)
split_char (str): regex pattern passed to re.split
n: to be deprecated
maxsplit (int): max number of times the split should be performed
should_clean (bool): whether to remove the whitespace at the boundaries of the input string before splitting it. Defaults to True.
Returns:
Returns nth split of a string
An ERROR will be return if the split index is out of bounds [-n to n-1].
Examples:
split('a b c', ' ') -> ["a", "b", "c"]
split(INPUT_COL, ' ') -> ["a", "b", "c"]
split_get
split_get(text, split_char, n=None, maxsplit=0)
Split a string and returns the nth split
Args:
text (str): string (with single quotes) or field name (without quotes)
split_char (str): character on which to split the string
n (int): split index to be returned
maxsplit (int): maximum number of times the split should be performed
Returns:
Returns nth split of a string
An ERROR will be return if the split index is out of bounds [-n to n-1].
Examples:
split_get('a b c', ' ', 1) -> b
split_get('a b c', ' ', 1, maxsplit=1) -> 'b c'
split_get(INPUT_COL, ' ', 1) -> a
trim
trim(text, trim_char=None)
Removes leading and trailing spaces (or trim character)
Args:
text (str): string (with single quotes) or field name (without quotes)
trim_char (str, optional): leading and trailing character to be trimmed
Returns:
Returns a trimmed string
Examples:
trim('cabc', 'c') -> ab
trim(INPUT_COL, trim_char) -> ab
upper
upper(text)
Upper case a given string
Args:
text (str): string (with single quotes) or field name (without quotes)
Returns:
Returns string with uppercase
Examples:
upper('hello') -> 'HELLO'