R sub Function


The sub() function replaces the first match of a pattern in a string. If x is a character vector, it replaces the first match in each element.

sub(pattern, replacement, x, ignore.case = FALSE, perl = FALSE,
fixed = FALSE, useBytes = FALSE)


• pattern: regular expression, or a literal string if fixed=TRUE
• replacement: string that replaces the matched text
• x: string or character vector in which matches are sought
• ignore.case: logical. If TRUE, matching ignores case; the default FALSE is case sensitive
• perl: logical. Should Perl-compatible regexps be used?
• fixed: logical. If TRUE, pattern is a string to be matched as is. Overrides all conflicting arguments
• useBytes: logical. If TRUE the matching is done byte-by-byte rather than character-by-character
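
As a small illustration of the ignore.case argument described above, the sketch below runs the same pattern with and without it. The input vector is made up for this example.

```r
# Hypothetical input: same text, differing only in the case of the first letter
x <- c("R tutorial", "r tutorial")

# Default (ignore.case = FALSE): only the lowercase "r " matches
default_result <- sub("r ", "HTML ", x)

# ignore.case = TRUE: "R " and "r " both match
ignore_case_result <- sub("r ", "HTML ", x, ignore.case = TRUE)

print(default_result)      # "R tutorial"    "HTML tutorial"
print(ignore_case_result)  # "HTML tutorial" "HTML tutorial"
```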

> x <- "r tutorial"
> y <- sub("r ","HTML ", x)
> y

[1] "HTML tutorial"


> y <- sub("t.*r","BBBBB", x) #regular expression substitution
> y

[1] "r BBBBBial"


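If fixed=TRUE, the pattern is treated as a literal string rather than a regular expression. Here x does not contain the literal text "t.*r", so nothing is replaced: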

> y <- sub("t.*r","BBBBB", x, fixed=TRUE) #not regular expression
> y

[1] "r tutorial"
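
fixed=TRUE is most useful when the pattern contains characters that are special in a regular expression. A minimal sketch, using a made-up version string, compares a plain ".", a literal "." and an escaped ".":

```r
# Hypothetical input containing literal dots
x <- "version 1.2.3"

# As a regular expression, "." matches any character, so the first
# character of the string is replaced
regex_result <- sub(".", "-", x)

# With fixed = TRUE, "." only matches a literal dot
fixed_result <- sub(".", "-", x, fixed = TRUE)

# Escaping the dot gives the same result without fixed = TRUE
escaped_result <- sub("\\.", "-", x)

print(regex_result)    # "-ersion 1.2.3"
print(fixed_result)    # "version 1-2.3"
print(escaped_result)  # "version 1-2.3"
```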


sub() also works on vectors. The following example removes the first digit of each element in the vector:

> x <- c("15 line 435", "good weather", "89 pigs")
> y <- sub("[[:digit:]]","",x)
> y

[1] "5 line 435"   "good weather" "9 pigs"


Adding + makes the pattern match a whole run of consecutive digits, but sub() still replaces only the first run in each element:

> x <- c("15 line 435", "good weather", "89 pigs")
> y <- sub("[[:digit:]]+","",x)
> y

[1] " line 435"    "good weather" " pigs"


gsub() replaces every match in each element, not just the first:

> y <- gsub("[[:digit:]]","",x)
> y
[1] " line "       "good weather" " pigs"
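
The difference between sub() and gsub() can be seen side by side on a simpler input. This is only a sketch; the vector is invented for the example.

```r
# Hypothetical input: one element with several matches, one with none
x <- c("a-b-c", "no dash")

# sub() replaces only the first "-" in each element
first_only <- sub("-", "+", x)

# gsub() replaces every "-" in each element
every_match <- gsub("-", "+", x)

print(first_only)   # "a+b-c"   "no dash"
print(every_match)  # "a+b+c"   "no dash"
```

Elements without a match are returned unchanged by both functions.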


Regular Expression Syntax:

Syntax        Description
\\d           Digit, 0,1,2 ... 9
\\D           Not digit
\\s           Whitespace character (space, tab, newline)
\\S           Not whitespace
\\w           Word character (letter, digit or underscore)
\\W           Not a word character
\\t           Tab
\\n           New line
^             Beginning of the string
$             End of the string
\             Escape special characters, e.g. \\ is "\", \+ is "+"
|             Alternation match, e.g. /(e|d)n/ matches "en" and "dn"
.             Any character, except \n or line terminator
[ab]          a or b
[^ab]         Any character except a and b
[0-9]         Any digit
[A-Z]         Any uppercase letter A to Z
[a-z]         Any lowercase letter a to z
[A-Za-z]      Any uppercase or lowercase letter
i+            i at least one time
i*            i zero or more times
i?            i zero or 1 time
i{n}          i occurs n times in sequence
i{n1,n2}      i occurs n1 to n2 times in sequence
i{n1,n2}?     Non-greedy match: as few repetitions as possible
i{n,}         i occurs >= n times

The POSIX character classes below are used inside a bracket expression, e.g. [[:digit:]]:

Syntax        Description
[:alnum:]     Alphanumeric characters: [:alpha:] and [:digit:]
[:alpha:]     Alphabetic characters: [:lower:] and [:upper:]
[:blank:]     Blank characters: e.g. space, tab
[:cntrl:]     Control characters
[:digit:]     Digits: 0 1 2 3 4 5 6 7 8 9
[:graph:]     Graphical characters: [:alnum:] and [:punct:]
[:lower:]     Lower-case letters in the current locale
[:print:]     Printable characters: [:alnum:], [:punct:] and space
[:punct:]     Punctuation characters: ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~
[:space:]     Space characters: tab, newline, vertical tab, form feed, carriage return, space
[:upper:]     Upper-case letters in the current locale
[:xdigit:]    Hexadecimal digits: 0 1 2 3 4 5 6 7 8 9 A B C D E F a b c d e f
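
A few of the syntax elements above can be tried directly with sub() and gsub(). The sketch below uses made-up input strings; note that inside an R string the backslash itself is doubled, as in "\\d". The non-greedy case uses the same trailing ? as i{n1,n2}?, here applied to *.

```r
# Hypothetical inputs chosen for illustration
x <- "Order 66: 12 apples, 7 pears"
tag <- "<b>bold</b>"

first_number  <- sub("\\d+", "N", x)           # first run of digits
first_word    <- sub("^\\w+", "Item", x)       # word at the beginning of the string
last_word     <- sub("\\w+$", "plums", x)      # word at the end of the string
two_digits    <- sub("\\d{2}", "##", x)        # exactly two digits in sequence
no_spaces     <- gsub("[[:space:]]+", "_", x)  # POSIX class inside a bracket expression

# Greedy vs non-greedy: ".*" takes as much as possible, ".*?" as little as possible
greedy     <- sub("<.*>", "", tag)
non_greedy <- sub("<.*?>", "", tag)

print(first_number)  # "Order N: 12 apples, 7 pears"
print(first_word)    # "Item 66: 12 apples, 7 pears"
print(last_word)     # "Order 66: 12 apples, 7 plums"
print(two_digits)    # "Order ##: 12 apples, 7 pears"
print(no_spaces)     # "Order_66:_12_apples,_7_pears"
print(greedy)        # ""
print(non_greedy)    # "bold</b>"
```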