Work with strings with stringr :: CHEAT SHEET

The stringr package provides a set of internally consistent tools for working with character strings, i.e. sequences of characters surrounded by quotation marks.

Detect Matches

str_detect(string, pattern)
Detect the presence of a pattern match in a string.
str_detect(fruit, "a")

str_which(string, pattern)
Find the indexes of strings that contain a pattern match.
str_which(fruit, "a")

str_count(string, pattern)
Count the number of matches in a string.
str_count(fruit, "a")

str_locate(string, pattern)
Locate the positions of pattern matches in a string. Also str_locate_all.
str_locate(fruit, "a")

Subset Strings

str_sub(string, start = 1L, end = -1L)
Extract substrings from a character vector.
str_sub(fruit, 1, 3); str_sub(fruit, -2)

str_subset(string, pattern)
Return only the strings that contain a pattern match.
str_subset(fruit, "b")

str_extract(string, pattern)
Return the first pattern match found in each string, as a vector. Also str_extract_all to return every pattern match.
str_extract(fruit, "[aeiou]")

str_match(string, pattern)
Return the first pattern match found in each string, as a matrix with a column for each ( ) group in pattern. Also str_match_all.
str_match(sentences, "(a|the) ([^ ]+)")

Manage Lengths

str_length(string)
The width of strings (i.e. number of code points, which generally equals the number of characters).
str_length(fruit)

str_pad(string, width, side = c("left", "right", "both"), pad = " ")
Pad strings to constant width.
str_pad(fruit, 17)

str_trunc(string, width, side = c("right", "left", "center"), ellipsis = "...")
Truncate the width of strings, replacing content with ellipsis.
str_trunc(fruit, 3)

str_trim(string, side = c("both", "left", "right"))
Trim whitespace from the start and/or end of a string.
str_trim(fruit)

Mutate Strings

str_sub() <- value
Replace substrings by identifying the substrings with str_sub() and assigning into the results.
str_sub(fruit, 1, 3) <- "str"

str_replace(string, pattern, replacement)
Replace the first matched pattern in each string.
str_replace(fruit, "a", "-")

str_replace_all(string, pattern, replacement)
Replace all matched patterns in each string.
str_replace_all(fruit, "a", "-")

str_to_lower(string, locale = "en")¹
Convert strings to lower case.
str_to_lower(sentences)

str_to_upper(string, locale = "en")¹
Convert strings to upper case.
str_to_upper(sentences)

str_to_title(string, locale = "en")¹
Convert strings to title case.
str_to_title(sentences)

Join and Split

str_c(..., sep = "", collapse = NULL)
Join multiple strings into a single string.
str_c(letters, LETTERS)

str_c(..., sep = "", collapse = NULL)
Collapse a vector of strings into a single string.
str_c(letters, collapse = "")

str_dup(string, times)
Repeat strings times times.
str_dup(fruit, times = 2)

str_split_fixed(string, pattern, n)
Split a vector of strings into a matrix of substrings (splitting at occurrences of a pattern match). Also str_split to return a list of substrings.
str_split_fixed(fruit, " ", n = 2)

glue::glue(..., .sep = "", .envir = parent.frame(), .open = "{", .close = "}")
Create a string from strings and {expressions} to evaluate.
glue::glue("Pi is {pi}")

glue::glue_data(.x, ..., .sep = "", .envir = parent.frame(), .open = "{", .close = "}")
Use a data frame, list, or environment to create a string from strings and {expressions} to evaluate.
glue::glue_data(mtcars, "{rownames(mtcars)} has {hp} hp")

RStudio® is a trademark of RStudio, Inc. • CC BY RStudio • info@rstudio.com • 844-448-1212 • rstudio.com • Learn more at stringr.tidyverse.org

Order Strings

str_order(x, decreasing = FALSE, na_last = TRUE, locale = "en", numeric = FALSE, ...)¹
Return the vector of indexes that sorts a character vector.
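A minimal sketch of how the index vector relates to the sorted result, assuming stringr is installed. The vectors x and files are made-up illustration data, not part of the cheat sheet.

```r
library(stringr)

# Hypothetical example vector (an assumption for illustration).
x <- c("pear", "apple", "banana", "apple")

# str_order() returns the positions that would sort x.
idx <- str_order(x)

# Indexing x by those positions yields the sorted vector,
# which is what str_sort() returns directly.
sorted <- x[idx]
same_as_sort <- identical(sorted, str_sort(x))

# numeric = TRUE compares embedded digits as numbers rather than as text.
files <- c("file10", "file2", "file1")
files_sorted <- str_sort(files, numeric = TRUE)
```

Here sorted and str_sort(x) should hold the same values; str_order() is mainly useful when the same ordering has to be applied to other, parallel vectors.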
x[str_order(x)]

str_sort(x, decreasing = FALSE, na_last = TRUE, locale = "en", numeric = FALSE, ...)¹
Sort a character vector.
str_sort(x)

Helpers

str_conv(string, encoding)
Override the encoding of a string.
str_conv(fruit, "ISO-8859-1")

str_view(string, pattern, match = NA)
View HTML rendering of first regex match in each string.
str_view(fruit, "[aeiou]")

str_view_all(string, pattern, match = NA)
View HTML rendering of all regex matches.
str_view_all(fruit, "[aeiou]")

str_wrap(string, width = 80, indent = 0, exdent = 0)
Wrap strings into nicely formatted paragraphs.
str_wrap(sentences, 20)

¹ See bit.ly/ISO639-1 for a complete list of locales.

Diagrams from @LVaudor • stringr 1.2.0 • Updated: 2017-10

Need to Know

Pattern arguments in stringr are interpreted as regular expressions after any special characters have been parsed.

In R, you write regular expressions as strings, sequences of characters surrounded by quotes ("") or single quotes ('').

Some characters cannot be represented directly in an R string. These must be represented as special characters, sequences of characters that have a specific meaning, e.g.

Special Character   Represents
\\                  \
\n                  new line

Run ?"'" to see a complete list.

Because of this, whenever a \ appears in a regular expression, you must write it as \\ in the string that represents the regular expression.

Use writeLines() to see how R views your string after all special characters have been parsed.
writeLines("\\.")
# \.
writeLines("\\ is a backslash")
# \ is a backslash

INTERPRETATION

Patterns in stringr are interpreted as regexes. To change this default, wrap the pattern in one of:

regex(pattern, ignore_case = FALSE, multiline = FALSE, comments = FALSE, dotall = FALSE, ...)
Modifies a regex to ignore case, match the ends of lines as well as the ends of strings, allow R comments within regexes, and/or have . match everything including \n.
str_detect("I", regex("i", TRUE))

fixed()
Matches raw bytes but will miss some characters that can be represented in multiple ways (fast).
str_detect("\u0130", fixed("i"))

coll()
Matches raw bytes and will use locale-specific collation rules to recognize characters that can be represented in multiple ways (slow).
str_detect("\u0130", coll("i", TRUE, locale = "tr"))

boundary()
Matches boundaries between characters, line_breaks, sentences, or words.
str_split(sentences, boundary("word"))

Regular Expressions

Regular expressions, or regexps, are a concise language for describing patterns in strings.

MATCH CHARACTERS

see <- function(rx) str_view_all("abc ABC 123\t.!?\\(){}\n", rx)

string        regexp           matches                                     example
(type this)   (to mean this)   (which matches this)
a (etc.)      a (etc.)         a (etc.)                                    see("a")
\\.           \.               .                                           see("\\.")
\\!           \!               !                                           see("\\!")
\\?           \?               ?                                           see("\\?")
\\\\          \\               \                                           see("\\\\")
\\(           \(               (                                           see("\\(")
\\)           \)               )                                           see("\\)")
\\{           \{               {                                           see("\\{")
\\}           \}               }                                           see("\\}")
\\n           \n               new line (return)                           see("\\n")
\\t           \t               tab                                         see("\\t")
\\s           \s               any whitespace (\S for non-whitespaces)     see("\\s")
\\d           \d               any digit (\D for non-digits)               see("\\d")
\\w           \w               any word character (\W for non-word chars)  see("\\w")
\\b           \b               word boundaries                             see("\\b")
[:digit:]¹    [:digit:]        digits                                      see("[:digit:]")
[:alpha:]¹    [:alpha:]        letters                                     see("[:alpha:]")
[:lower:]¹    [:lower:]        lowercase letters                           see("[:lower:]")
[:upper:]¹    [:upper:]        uppercase letters                           see("[:upper:]")
[:alnum:]¹    [:alnum:]        letters and numbers                         see("[:alnum:]")
[:punct:]¹    [:punct:]        punctuation                                 see("[:punct:]")
[:graph:]¹    [:graph:]        letters, numbers, and punctuation           see("[:graph:]")
[:space:]¹    [:space:]        space characters (i.e. \s)                  see("[:space:]")
[:blank:]¹    [:blank:]        space and tab (but not new line)            see("[:blank:]")
.             .                every character except a new line           see(".")

¹ Many base R functions require classes to be wrapped in a second set of [ ], e.g. [[:digit:]]

[Diagram: nesting of character classes. [:space:] contains new line and [:blank:] (space, tab). [:graph:] contains [:punct:] and [:alnum:]; [:alnum:] contains [:digit:] (0-9) and [:alpha:]; [:alpha:] contains [:lower:] (a-z) and [:upper:] (A-Z).]

ALTERNATES

alt <- function(rx) str_view_all("abcde", rx)

regexp    matches        example
ab|d      or             alt("ab|d")
[abe]     one of         alt("[abe]")
[^abe]    anything but   alt("[^abe]")
[a-c]     range          alt("[a-c]")

ANCHORS

anchor <- function(rx) str_view_all("aaa", rx)

regexp    matches           example
^a        start of string   anchor("^a")
a$        end of string     anchor("a$")

QUANTIFIERS

quant <- function(rx) str_view_all(".a.aa.aaa", rx)

regexp     matches             example
a?         zero or one         quant("a?")
a*         zero or more        quant("a*")
a+         one or more         quant("a+")
a{n}       exactly n           quant("a{2}")
a{n, }     n or more           quant("a{2,}")
a{n, m}    between n and m     quant("a{2,4}")

LOOKAROUNDS

look <- function(rx) str_view_all("bacad", rx)

regexp     matches
a(?=c)     followed by
a(?!c)     not followed by
(?<=b)a    preceded by
(?