get_regex_matches top docstring
jwmueller authored May 1, 2024
1 parent f748c97 commit 7a5f775
Showing 1 changed file with 3 additions and 3 deletions.
6 changes: 3 additions & 3 deletions cleanlab_studio/utils/data_enrichment/enrich.py
@@ -118,10 +118,10 @@ def get_regex_matches(
disable_warnings: bool = False,
) -> Union[pd.Series, List[str]]:
"""
- Extracts the first valid regex pattern from the response using the passed in regex patterns for each example in the column data.
+ Extracts the first match to the provided regular expression pattern from each example in the provided column of data.
- This function is useful in settings where you want to tune the regex patterns to extract the most valid outputs out of the column data but do not want to continually prompt the LLM to generate new outputs.
- If a list of expressions is provided, the expressions are applied in order and first valid extraction is returned.
+ Use this function for: tuning regex patterns to extract the best outputs from the raw LLM responses for your dataset obtained via ``enrich_data()``, without having to re-run the LLM.
+ If a list of regular expressions is provided, the expressions are applied in order, and the first valid regex match is returned.
**Note:** Regex patterns should each specify exactly 1 group that represents the desired characters to be extracted from the raw response, using parentheses like so: '(<desired match group pattern>)'.
**Example 1:** `r'.*The answer is: (Bird|[Rr]abbit).*'` will extract the word 'Bird', 'Rabbit', or 'rabbit' when it follows the characters "The answer is: " in the raw response text. This is useful when you ask the LLM to output chain-of-thought (COT) reasoning or additional text but only care about saving the answer for downstream tasks.
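The behavior the updated docstring describes (try each pattern in order, return the single capture group of the first pattern that matches) can be sketched in plain Python. This is only an illustration under stated assumptions, not the code in `enrich.py`: the helper name `extract_first_match`, the use of `re.search`, the `None` fallback, and the second pattern are all hypothetical.

```python
import re
from typing import List, Optional, Union


def extract_first_match(text: str, patterns: Union[str, List[str]]) -> Optional[str]:
    """Hypothetical helper: return group 1 of the first pattern that matches, else None."""
    if isinstance(patterns, str):
        patterns = [patterns]  # accept a single pattern as well as a list
    for pattern in patterns:  # patterns are tried in the order given
        match = re.search(pattern, text)
        if match:
            return match.group(1)  # the single capture group holds the desired text
    return None  # assumption: no match yields None


# Raw LLM responses, as they might look after asking for reasoning plus an answer.
responses = [
    "Let me think. It has long ears. The answer is: rabbit.",
    "It flies and has feathers. The answer is: Bird",
    "I am not sure what this animal is.",
]

# The first pattern is the docstring's Example 1; the second is an illustrative fallback.
patterns = [r".*The answer is: (Bird|[Rr]abbit).*", r".*Answer: (\w+).*"]

extracted = [extract_first_match(text, patterns) for text in responses]
print(extracted)  # ['rabbit', 'Bird', None]
```

Because the extraction runs on responses that are already stored, the patterns can be adjusted and re-applied without prompting the LLM again, which is the use case the new docstring text names.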
