NLP Functions
nlp_get_entities
nlp_get_entities(text, label=None)
Extracts entities from natural language text.

Args:
    text (str): the text of interest
    label (str): filters for a specific kind of entity, such as PERSON or ORG. Defaults to None, which gets all entity types.

Returns:
    A dictionary containing entities extracted from the text.

Examples:
    nlp_get_entities('The Massachusetts Institute of Technology is a private research university in Cambridge, Massachusetts, United States.')
    -> {'entities': [{'char_pos': {'end': 41, 'start': 0},
                      'entity': u'The Massachusetts Institute of Technology',
                      'label': u'ORG',
                      'word_pos': {'end': 5, 'start': 0}},
                     {'char_pos': {'end': 87, 'start': 78},
                      'entity': u'Cambridge',
                      'label': u'GPE',
                      'word_pos': {'end': 12, 'start': 11}},
                     {'char_pos': {'end': 102, 'start': 89},
                      'entity': u'Massachusetts',
                      'label': u'GPE',
                      'word_pos': {'end': 14, 'start': 13}},
                     {'char_pos': {'end': 117, 'start': 104},
                      'entity': u'United States',
                      'label': u'GPE',
                      'word_pos': {'end': 17, 'start': 15}}],
        'status': 'OK'}
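The label filter can also be applied client-side to a result you already have. A minimal sketch in plain Python (`filter_entities` is a hypothetical helper, not part of this library; the `result` dict abbreviates the example output above, with positions omitted):

```python
# Result dict mirroring the nlp_get_entities example above
# (char_pos/word_pos omitted for brevity).
result = {
    'entities': [
        {'entity': 'The Massachusetts Institute of Technology', 'label': 'ORG'},
        {'entity': 'Cambridge', 'label': 'GPE'},
        {'entity': 'Massachusetts', 'label': 'GPE'},
        {'entity': 'United States', 'label': 'GPE'},
    ],
    'status': 'OK',
}

def filter_entities(result, label=None):
    """Keep only entities with the given label; None keeps all, as with label=None."""
    if label is None:
        return result['entities']
    return [e for e in result['entities'] if e['label'] == label]

gpe_entities = filter_entities(result, label='GPE')  # the three GPE entities
```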
nlp_token_clean
nlp_token_clean(text, model=None, model_config=None)
Cleans a token according to the provided model.

Args:
    text (str): the token of interest
    model (str): the name of a valid token model
    model_config (map): a map of options to configure the model

Returns:
    The input token, cleaned according to the logic of the token model.

Examples:
    nlp_token_clean('20IB0A1B', model='matcher:only-digits') -> '20180418'
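The example suggests the only-digits model repairs OCR-style look-alike letters. The substitution table below is an assumption chosen to reproduce the documented example, not the library's actual logic:

```python
# ASSUMPTION: a look-alike substitution table in the spirit of
# 'matcher:only-digits'; the real model's cleaning logic is internal
# to the library and may differ.
CONFUSABLES = {'O': '0', 'I': '1', 'L': '1', 'A': '4', 'S': '5', 'B': '8'}

def clean_only_digits(token):
    """Replace letters that resemble digits with those digits."""
    return ''.join(CONFUSABLES.get(ch.upper(), ch) for ch in token)

print(clean_only_digits('20IB0A1B'))  # '20180418', as in the example above
```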
nlp_token_find
nlp_token_find(text, model=None, separator=None, tokenizer=None, model_config=None, tokenizer_config=None)
Tokenizes the input string and returns the best scoring token according to the provided matcher. If a tokenizer is given, it is used; otherwise the default tokenizer specified in the tokenmatcher class is used. If no tokenizer is specified or set as default, the unigram tokenizer is used.

Args:
    text (str): the text assumed to contain the token of interest
    model (str): the name of a valid token matcher
    separator (str): the string on which to split text into tokens
    tokenizer (str): the name of a valid tokenizer
    model_config (map): a map of options to configure the model
    tokenizer_config (map): a map of options to configure the tokenizer

Returns:
    The best scoring token according to the provided matcher logic.

Examples:
    nlp_token_find('Due on 20IB-0A-1B', model='matcher:only-digits', separator=' ') -> '20IB-0A-1B'
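Behaviorally, this amounts to "split, score each token, return the argmax". A sketch using a separator split and a simple digit-fraction score (the scoring function is a stand-in for illustration, not the matcher's real formula):

```python
def digit_fraction(token):
    """Stand-in score: fraction of characters that are digits (assumption)."""
    return sum(ch.isdigit() for ch in token) / len(token) if token else 0.0

def token_find(text, separator=' '):
    """Split on the separator and return the highest scoring token."""
    return max(text.split(separator), key=digit_fraction)

print(token_find('Due on 20IB-0A-1B'))  # '20IB-0A-1B'
```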
nlp_token_find_all
nlp_token_find_all(text, model=None, separator=None, threshold=None, tokenizer=None, model_config=None, tokenizer_config=None)
Tokenizes the input string and returns all tokens with a score above the threshold, according to the provided matcher. If a tokenizer is given, it is used; otherwise the default tokenizer specified in the tokenmatcher class is used. If no tokenizer is specified or set as default, the unigram tokenizer is used.

Args:
    text (str): the text assumed to contain the token of interest
    model (str): the name of a valid token matcher
    separator (str): the string on which to split text into tokens
    threshold (float): the threshold for determining whether a token fits the model, default=0.8
    tokenizer (str): the name of a valid tokenizer
    model_config (map): a map of options to configure the model
    tokenizer_config (map): a map of options to configure the tokenizer

Returns:
    All tokens whose scores exceed the threshold, according to the provided matcher logic.

Examples:
    nlp_token_find_all('ID: 20I80A1B', model='matcher:only-digits', separator=' ') -> ['20I80A1B']
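This is the same pipeline as nlp_token_find with a threshold filter in place of an argmax. A sketch with the same stand-in digit-fraction score (the real matcher is more lenient, e.g. it accepts '20I80A1B' in the example above; the scoring here is illustrative only):

```python
def digit_fraction(token):
    """Stand-in score: fraction of characters that are digits (assumption)."""
    return sum(ch.isdigit() for ch in token) / len(token) if token else 0.0

def token_find_all(text, separator=' ', threshold=0.8):
    """Return every token whose score meets the threshold."""
    return [t for t in text.split(separator) if digit_fraction(t) >= threshold]

print(token_find_all('Order 2018 ref 55A'))  # ['2018']
```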
nlp_token_score
nlp_token_score(text, model=None, model_config=None)
Scores a token from 0 to 1.0 according to the provided matcher.

Args:
    text (str): the token of interest
    model (str): the name of a valid token matcher
    model_config (map): a map of options to configure the model

Returns:
    A score for the input token, from 0 to 1.0, according to the logic of the token matcher.

Examples:
    nlp_token_score('20IB0A1B', model='matcher:only-digits') -> 0.75
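The 0.75 in the example is consistent with a per-character score in which digits count fully and digit look-alike letters count half ('20IB0A1B' has 4 digits and 4 look-alikes, giving 6/8). That weighting is a guess that happens to reproduce the documented scores, not the matcher's published formula:

```python
# ASSUMPTION: which letters count as digit look-alikes, and their 0.5 weight,
# are guesses; the real matcher:only-digits scoring is internal to the library.
LOOKALIKES = set('OILABS')

def score_only_digits(token):
    """Guessed scoring: digits worth 1.0, look-alike letters 0.5, others 0."""
    if not token:
        return 0.0
    points = sum(1.0 if ch.isdigit() else 0.5 if ch.upper() in LOOKALIKES else 0.0
                 for ch in token)
    return points / len(token)

print(score_only_digits('20IB0A1B'))  # 0.75, matching the example above
```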
nlp_token_select
nlp_token_select(*args: Any)
Returns the best scoring token, among the provided inputs, according to the provided matcher.

Args:
    text1 .. textN (str): the tokens of interest
    model (str): the name of a valid token matcher
    model_config (map): a map of options to configure the model

Returns:
    The best scoring token according to the provided matcher logic.

Examples:
    nlp_token_select('20IB-0A-1B', '2018-01-20', model='matcher:only-digits') -> '2018-01-20'
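This reduces to scoring each argument and keeping the argmax. With a stand-in digit-fraction score, '2018-01-20' (0.8) beats '20IB-0A-1B' (0.4), so the documented choice falls out (the scoring is illustrative, not the library's formula):

```python
def digit_fraction(token):
    """Stand-in score: fraction of characters that are digits (assumption)."""
    return sum(ch.isdigit() for ch in token) / len(token) if token else 0.0

def token_select(*tokens):
    """Return the highest scoring of the provided tokens."""
    return max(tokens, key=digit_fraction)

print(token_select('20IB-0A-1B', '2018-01-20'))  # '2018-01-20'
```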