Question about regular expression and loop #535

chenchenguo · 2018-11-29T20:20:33Z

Hi, I ran into a problem when trying to implement a loop over regular expressions.
What if I want to search for each letter in a specified list, like "a", "b", "c", "d", ..., "z", one after another?
Right now I am doing this by writing 26 separate regular expressions, which I know is clumsy. How can I do it in a single loop or some other way?
Thanks in advance.

ChadFibke · 2018-11-29T20:32:33Z

Hey @chenchenguo,

Can you provide us with a bit more context (what is the input, output, and what did you want to accomplish)?

chenchenguo · 2018-11-29T20:45:43Z

> Hey @chenchenguo,
>
> Can you provide us with a bit more context (what is the input, output, and what did you want to accomplish)?

Thanks @ChadFibke

The data is the set of words filtered from words.txt that start and end with the same letter, like "bob" or "kick".
Now I want to count how many such words there are for each letter (from "a" to "z"). For example, there might be 20 words starting and ending with "a" and 50 starting and ending with "b".
Right now my implementation is to write, for each letter: a <- str_subset(data, "^a"); a_number <- length(a). I repeated this 26 times.
Is there a loop method to do this?
Thanks a lot.

zeeva85 · 2018-11-29T21:39:10Z

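library(stringr)
library(tibble)

words <- readLines("words.txt")

# Build one regex per letter: the word starts and ends with that letter
output <- vector("character", length(letters))
for (i in letters) {
  output[match(i, letters)] <- paste0("^", i, ".*", i, "$")
}

This gives the regex for each letter.

# One row per letter; start_letter holds placeholder values that the loop overwrites
df <- tibble(letters,
             start_letter = seq_along(letters))

for (i in output) {
  # Count the words matching this letter's regex and store it in column 2
  df[match(i, output), 2] <- sum(str_count(words, pattern = i))
}

This gives the frequency table.

I think this should work.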
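
For comparison, here is a loop-free sketch of the same counting idea in base R. The sample `words` vector is made up for illustration (it stands in for the contents of words.txt), and the lowercasing step is an assumption about what is wanted, so treat this as a rough sketch and not a drop-in replacement.

```r
# Hypothetical sample standing in for the contents of words.txt
words <- c("bob", "kick", "area", "level", "apple", "dad", "Bob")

# Lowercase first so that "Bob" is counted under "b"
words <- tolower(words)

# One pattern per letter, built without an explicit loop
patterns <- paste0("^", letters, ".*", letters, "$")

# Count the words matching each pattern; the result is an integer vector
counts <- vapply(patterns, function(p) sum(grepl(p, words)), integer(1))
names(counts) <- letters

# Show only the letters that have at least one match
counts[counts > 0]
```

Because `paste0()` is vectorised over `letters`, the 26 patterns come out of a single call, and `vapply()` replaces the second loop.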

ChadFibke · 2018-11-29T21:44:10Z

Ah, I found something as well:

library(purrr)

count_all_hits <- function(a_character_vector, pattern_list) {

  # Let's make a list for our results
  results <- list()

  for (match in pattern_list) {
    # Keep the words that start and end with the current letter
    results[[sprintf("Matches for %s", match)]] <-
      a_character_vector[grepl(sprintf("^%s.*%s$", match, match), a_character_vector)]
  }

  # Return the number of matching words for each letter
  return(map(results, length))
}

count_all_hits(a_character_vector = wordss, pattern_list = letters)

ChadFibke · 2018-11-29T21:47:37Z

sprintf() is definitely a function to look into. It lets you insert the values of variables into a character string. Here, sprintf("Matches for %s", match) places the character value of the match object into the string: %s is a placeholder that is replaced by the value found in match.
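
A small sketch of how the `%s` placeholders get filled in. The variable name `letter` is just an illustrative stand-in for the loop variable, and the last two lines go a bit beyond the function above by relying on `sprintf()` being vectorised.

```r
# "letter" is a hypothetical stand-in for the loop variable
letter <- "b"

# Each %s is replaced, in order, by the corresponding argument
label <- sprintf("Matches for %s", letter)
pattern <- sprintf("^%s.*%s$", letter, letter)

label    # "Matches for b"
pattern  # "^b.*b$"

# sprintf() is vectorised, so all 26 patterns can be built in one call
all_patterns <- sprintf("^%s.*%s$", letters, letters)
head(all_patterns, 3)
```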

ChadFibke · 2018-11-29T21:51:10Z

Also, I converted all the strings to lowercase using:

wordss<-str_to_lower(readLines("./words.txt"))

If you want to see the matching words instead of counting them, replace:

return(map(results, length))

# with
return(results)
# which will give you a list with all the found words.

bassamjaved · 2018-11-29T21:52:11Z

Here's another possibility...

There's an exercise from Hadley's R for Data Science in the strings chapter that can be adapted for this.

You could create a string to the effect "^a|^b|^c" and continue all the way to the letter 'z'. Let's call that string letter_match, which we'll use to match up with regex. Then,

#find and extract matches
matches <- str_extract(words, letter_match)

#create a frequency table
Letters <- table(matches)
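
One possible way to build `letter_match` without typing out all 26 alternatives, sketched in base R so it runs without extra packages. The sample `words` vector is made up, and `regmatches()`/`regexpr()` are used here as a stand-in for `str_extract()`; with stringr loaded, `str_c("^", letters, collapse = "|")` should give the same string.

```r
# Hypothetical sample standing in for the lowercased contents of words.txt
words <- c("bob", "kick", "area", "apple", "banana")

# collapse = "|" joins "^a", "^b", ... into one alternation: "^a|^b|...|^z"
letter_match <- paste0("^", letters, collapse = "|")

# Extract the matched first letter of each word (base R stand-in for str_extract)
matches <- regmatches(words, regexpr(letter_match, words))

# Frequency table of starting letters
Letters <- table(matches)
Letters
```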

ChadFibke · 2018-11-29T21:54:50Z

@bassamjaved,

Are you able to use that to find words that start with and end with a, b, c....z?

bassamjaved · 2018-11-29T22:01:09Z

@ChadFibke I just tried replacing letter_match with "a$|b$|c$" all the way to z. I checked the first few entries of `words` and it seems to work. Of course, like you said, you should use str_to_lower() to make `words` all lowercase (more so if you're trying to find words beginning with each letter).

bassamjaved · 2018-11-29T22:04:26Z

Ah but I see you want start and end with the same letter. Okay, no I haven't tried that with this particular method...

chenchenguo · 2018-11-29T22:10:09Z

@zeeva85
Thanks, your function is so concise and useful.
In df[match(i, output), 2], what does the 2 mean here? The start letter row?

chenchenguo · 2018-11-29T22:11:56Z

Thanks @ChadFibke
I will try your suggestion

zeeva85 · 2018-11-29T22:13:53Z

> @zeeva85
> Thanks, your function is so concise and useful.
> In df[match(i, output), 2], what does the 2 mean here? The start letter row?

Correct: the summed count replaces the placeholder 1:26 values in the 2nd column ("start_letter").

I think df[match(i, output), "start_letter"] should also work. It's more explicit and probably better, since it prevents errors.

The general form is df[row, column].
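
A tiny sketch of the two indexing styles. A base `data.frame` with three rows is used here as an assumption to keep the snippet dependency-free; the same `[row, column]` assignment should behave the same way on the tibble from the earlier comment.

```r
# A small data frame shaped like the one in the earlier comment
df <- data.frame(letters = letters[1:3], start_letter = 1:3)

# df[row, column]: row 2, column 2, selected by position
df[2, 2] <- 50L

# The same cell, selected by column name instead of position
df[2, "start_letter"] <- 60L

df
```

Selecting by name keeps working even if columns are later reordered or added, which is why it tends to be the safer choice.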

chenchenguo · 2018-11-29T22:21:25Z

> Ah but I see you want start and end with the same letter. Okay, no I haven't tried that with this particular method...

Yeah, the part about starting and ending with the same letter is already done. I will try the str_extract function here, thank you.

chenchenguo · 2018-11-29T22:22:42Z

> Also, I converted all the strings to lowercase using:
> wordss<-str_to_lower(readLines("./words.txt"))
> If you want to see the matching words instead of counting them, replace:
> return(map(results, length))
>
> # with
> return(results)
> # which will give you a list with all the found words.

Nice, yeah, I forgot to switch them to lowercase. Thanks for pointing that out.

bassamjaved · 2018-11-29T23:24:25Z

Here's a revision of the method I posted earlier:

#create a regular expression pattern that begins with a letter and ends with the same letter
(letters_for_regex <- str_c("(", "^", letters, ".+", letters, "$", ")"))

#collapse into one string
(letter_match <- str_c(letters_for_regex, collapse = "|"))

#find and subset matches
(words_with_matches <- str_subset(words_lowercase, letter_match))

#extract letters in matches
(letters_in_matches <- str_extract(words_with_matches, "^."))

#create a frequency table
(Letters <- table(letters_in_matches))
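
As a possible alternative that is not used anywhere in this thread, a backreference can express "starts and ends with the same character" in a single pattern, with no need to build 26 alternatives. This is a base R sketch on a made-up sample vector; note that `.` matches any character, not only letters, and that `.*` (unlike the `.+` above) also accepts two-character words such as "aa".

```r
# Hypothetical sample standing in for the lowercased contents of words.txt
words_lowercase <- c("bob", "kick", "area", "level", "apple", "dad", "bib")

# "(.)" captures the first character; "\\1" requires the same character at the end
same_ends <- grepl("^(.).*\\1$", words_lowercase)
words_with_matches <- words_lowercase[same_ends]

# Frequency table keyed by the first letter of each matching word
Letters <- table(substr(words_with_matches, 1, 1))
Letters
```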

ChadFibke · 2018-11-30T00:38:24Z

Well @chenchenguo has multiple answers to choose from now!
