Work with strings with strin
CHEATSHEET
The stringr package provides a set of internally consistent tools for working with character strings, i.e. sequences of characters surrounded by quotation marks.
Detect Matches
TRUE TRUE FALSE
III
str_detect(string, pattern) Detect the presence of a pattern match in a string.
str_detect(fruit, "a")
str_which(string, pattern) Find the indexes of strings that contain a pattern match.
str_which(fruit, "a")
str_count(string, pattern) Count the number of matches in a string.
str_count(fruit, "a")
str_locate(string, pattern) Locate the positions of pattern matches in a string. Also str_locate_all. str_locate(fruit, "a")
Subset Strings
str_sub(string, start = 1L, end = -1L) Extract substrings from a character vector.
str_sub(fruit, 1, 3); str_sub(fruit, -2)
str_subset(string, pattern) Return only the strings that contain a pattern match.
str_subset(fruit, "b")
str_extract(string, pattern) Return the first pattern match found in each string, as a vector. Also str_extract_all to return every pattern match. str_extract(fruit, "[aeiou]")
str_match(string, pattern) Return the first pattern match found in each string, as a matrix with a column for each () group in pattern. Also str_match_all.
str_match(sentences, "(a\the) (["]+)")
Manage Lengths
str_length(string) The width of strings (i.e. number of code points, which generally equals the number of characters), strjength(fruit)
str_pad(string, width, side = c("left", "right", "both"), pad = "") Pad strings to constant width. str_pad(fruit, 17)
str_trunc(string, width, side = c("right", "left", "center"), ellipsis = "...") Truncate the width of strings, replacing content with ellipsis.
str_trunc(fruit, 3)
str_trim(string, side = c("both", "left", "right")) Trim whitespace from the start and/or end of a string. str_trim(fruit)
Mutate Strings
Join and Split
Order Strings
ASTRING a string
a string ASTRING
a string
t
A String
str_sub() <- value. Replace substrings by identifying the substrings with str_sub() and assigning into the results.
str_sub(fruit, 1,3) <- "str"
str_replace(string, pattern, replacement) Replace the first matched pattern in each string. str_replace(fruit, "a", "-")
str_replace_all(string, pattern,
replacement) Replace all matched patterns in each string. str_replace_all(fruit, "a", "-")
str_to_lower(string, locale = strings to lower case. str_to_lower(sentences)
str_to_upper(string, locale = strings to upper case. str_to_upper(sentences)
"en")1 Convert
"en")1 Convert
str_to_title(string, locale = "en")1 Convert strings to title case. str_to_title(sentences)
@Stud
10
Helpers
str_c(..., sep ="", collapse = NULL) Join multiple strings into a single string. str_c(letters, LETTERS)
str_c(..., sep = ".collapse = NULL) Collapse a vector of strings into a single string.
str_c(letters, collapse = "")
str_dup(string, times) Repeat strings times times. str_dup(fruit, times = 2)
str_split_fixed(string, pattern, n) Split a vector of strings into a matrix of substrings (splitting at occurrences of a pattern match). Also str_split to return a list of substrings. str_split_fixed(fruit, "" n=2)
IwBtoll       glue::glue(..., .sep ="", .envir =
| parent.frame(), .open = "{", .close ="}") Create
a string from strings and {expressions} to evaluate. glue::glue("Pi is {pi}")
glue::glue_data(.x,.sep ="", .envir = ■ parent.frame(), .open = "{", .close = "}") Use a
data frame, list, or environment to create a string from strings and {expressions} to evaluate. glue::glue_data(mtcars, "{rownames(mtcars)} has {hp} hp")
RStudio® is a trademark of RStudio, Inc. • CC BY RStudio • info@rstudio.com • 844-448-1212 • rstudio.com • Learn more at striner.tidvverse.ore
apple
banana
pear
apple
str_order(x, decreasing = FALSE, na_last = TRUE, locale = "en", numeric = FALSE, ...J1 Return the vector of indexes that sorts a character vector. x[str_order(x)]
str_sort(x, decreasing = FALSE, najast = TRUE, locale = "en", numeric = FALSE, ...J1 Sort a character vector. str_sort(x)
str_conv(string, encoding) Override the encoding of a string. str_conv(fruit,"IS0-8859-l")
str_view(string, pattern, match = NA) View HTML rendering of first regex match in each string. str_view(fruit, "[aeiou]")
str_view_all(string, pattern, match = NA) View HTML rendering of all regex matches. str_view_all(fruit, "[aeiou]")
str_wrap(string, width = 80, indent = 0, exdent = 0) Wrap strings into nicely formatted paragraphs. str_wrap(sentences, 20)
1 See bit.lv/IS0639-l for a complete list of locales.
Diagrams from (ffiLVaudor*-stringr 1.2.0- Updated: 2017-10
Need to Know
Pattern arguments in stringr are interpreted as regu la r exp ressio ns after any special characters have been parsed.
In R, you write regular expressions as strings, sequences of characters surrounded by quotes ("") orsingle quotes(").
Some characters cannot be represented directly in an R string. These must be represented as special characters, sequences of characters that have a specific meaning., e.g.
Special Character Represents
\\ \
\n new line
Run ?""' to see a complete list
Because of this, whenever a \ appears in a regular expression, you must write it as \\ in the string that represents the regular expression.
Use writeLinesI) to see how R views your string after all special characters have been parsed.
writeLines("\\.") #\.
writeLines("\\ is a backslash") #\isa backslash
INTERPRETATION
Patterns in stringr are interpreted as regexs To change this default, wrap the pattern in one of:
regex(pattern, ignore_case= FALSE, multiline = FALSE, comments = FALSE, dotall = FALSE,...) Modifies a regex to ignore cases, match end of lines as well of end of strings, allow R comments within regex's , and/or to have . match everything including \n.
str_detect("l", regex("i", TRUE))
fixed!) Matches raw bytes but will miss some characters that can be represented in multiple ways (fast). str_detect("\u0130", fixed("i"))
coll() Matches raw bytes and will use locale specific collation rules to recognize characters that can be represented in multiple ways (slow). str_detect("\u0130", coll("i", TRUE, locale = "tr"))
boundary!) Matches boundaries between characters, line_breaks, sentences, or words. str_split(sentences, boundary("word"))
©Stud
10
Regular Expressions
Regular expressions, or regexps, are a concise language for describing patterns in strings.
MATCH CHARACTERS
see <- function(rx) str_view_all("abc ABC 123\t.!?\\(){}\n", rx)
string (type this)	regex p (to mean this)	matches (which matches this)	example			
	■ (etc.)	a (etc.)	see(	'a")	|bcABC123	•!?\(){}
\\.	\.		see(	•\V)	abcABC123	|!?\(){}
W	V		see(	•\\n	abcABC123	•!?\(){}
\\?	\?		see(	•\\r)	abcABC123	•!?\(){}
WW	w	\	see(	'WW")	abcABC123	•!!(){}
\\(	\(	(	see(	•\\n	abcABC123	•!?\(){}
\\)	\)	)	see(	'W)"l	abcABC123	•!?\(){}
\\{	\{	{	see(	•\\n	abcABC123	•!?\()l
\\)	\}	}	see(	"WJ")	abcABC123	•!?\(){}
\\n	\n	new line (return)	see(	'\\n")	abcABC123	•!?\(){}
\\t	\t	tab	see(	■\\t")	abcABC123|	•!?\(){}
Us	\s	any whitespace (\S fornon-whitespaces)	see(	•\\s")	abcABC123	•!?\(){}
\\d	\d	any digit f\D for non-digits)	see(	•\\d")	abcABCH	•!?\(){}
\\w	\w	any word character (\\N for non-word chars)	see(	'\\w")	abcABC123	•!?\(){}
\\b	\b	word boundaries	see(	■\\b")	|abc||ABCJ|l23;	•!?\(){}
	[:digit:] 1	digits	see(	'[:digit:]")	abcABC123	•!?\(){}
	[:alpha:]	letters	see(	'[:alpha:]")	abcABC123	•!?\(){}
	[:lower:]	lowercase letters	see(	'[:lower:]")	abcABC123	•!?\(){}
	[:upper:] 1	uppercase letters	see(	'[:upper:]")	abcABC123	•!?\(){}
	[:alnum:]	letters and numbers	see(	'talnum:]")	abcABC123	•!?\(){}
	[:punct:]	punctuation	see(	'[:punct:]")	abcABC123	•!?\(){}
	[:graph:] 1	letters, numbers, and punctuation	see(	'[:graph:]")	abcABC123	•!?\(){}
	[:space:] 1	space characters (i.e. \s)	see(	'[:space:]")	abo]AB(|l23|	mni
	[:blank:] 1	space and tab (but not new line)	see(	'[:blank:]")	abo]AB(|l23|	mm
		every character except a new line	see(	'•")	abcABC123	mm
[:space:] j new line
[:blank:]
space i tab
[:graph:]
[:punct:]
; ? ■ \ I / ' '[]{}()
[:alnum:] [•digit:]
0123456789
[:alpha:]
@ # $
ALTERNATES
1 Many base R functions require classes to be wrapped in a second set of [ ], e.g. [[:digit:]]
QUANTIFIERS
	[:Iower:]	[:upper:]
a	b c d e f	A B C D E F
g	h i j k 1	G H 1 J K L
m	n o p q r	M N 0 P Q R
s	t u V w X	S T U V W X
z		Z
		
alt<-function(rx) str_view_all("abcde", rx) regexp        matches example
| or alt("ab|d") abcde
[abe]       one of alt("[abe]") abcde
[Aabe] anything but alt("[Aabe]") abcde [a-c]        range alt("[a-c]") abcde
quant <- function(rx) str_view_all(".a.aa.aaa", rx)
ANCHORS
■OCT
anchor<-function(rx) str_view_all("aaa", rx) regexp        matches example
Aa start of string       anchor("Aa") aaa
a$ end of string        anchor("a$") aaa
□DO hhuqm'1
.......[2H;::h30
rMIHmH']
regexp	matches	example		
?	zero or one	quantC	a?")	.a.aa.aaa
*	zero or more	quantC	a*")	.a.aa.aaa
+	one or more	quantC	a+")	.a.aa.aaa
a{n}	exactly n	quantC	a{2}")	.a.aa.aaa
a{n,}	n or more	quantC	a{2,}")	.a.aa.aaa
{n, m}	between n and m	quantC	a{2,4}")	.a.aa.aaa
LOOKAROUNDS
look<-function(rx) str_view_all("bacad", rx)
regexp matches
a(?=c) followed by
(?! ) not followed by
(?<=b)a preceded by
(?<!b)a not preceded by
example
look("a(?=c)") look("a(?!c)") look("(?<=b)a") look("(?<!b)a")
bacad bacad bacad bacad
GROUPS ref <-function(rx) str_view_all("abbaab", rx)
Use parentheses to set precedent (order of evaluation) and create groups
regexp matches example
(ab|d)e        sets precedence       alt("(ab|d)e") abcde
Use an escaped number to refer to and duplicate parentheses groups that occur earlier in a pattern. Refer to each group by its order of appearance
string        regexp matches
(type this)    fto mean this)     (which matches this)
example
(the result is the same as ref("abba"))
\\1 \l(etc.)        first () group, etc.      ref("(a)(b)\\2\\l") abbaab
RStudio® is a trademark of RStudio, Inc. • CC BY RStudio • info@rstudio.com • 844-448-1212 • rstudio.com • Learn more at striner.tidvverse.ore ■ Diagrams from @LVaudor     stringr 1.2.0« Updated: 2017-10