Title: | ANSI Control Sequence Aware String Functions |
---|---|
Description: | Counterparts to R string manipulation functions that account for the effects of ANSI text formatting control sequences. |
Authors: | Brodie Gaslam [aut, cre], Elliott Sales De Andrade [ctb], R Core Team [cph] (UTF8 byte length calcs from src/util.c) |
Maintainer: | Brodie Gaslam <[email protected]> |
License: | GPL-2 | GPL-3 |
Version: | 1.0.6 |
Built: | 2024-11-14 04:29:07 UTC |
Source: | https://github.com/brodieg/fansi |
Terminal capabilities are assumed to include bright and 256 color SGR codes.
24 bit color support is detected based on the COLORTERM
environment
variable.
dflt_term_cap() dflt_css()
dflt_term_cap() dflt_css()
Default CSS may exceed or fail to cover the interline distance when two lines have background colors. To ensure lines are exactly touching use inline-block, although that has its own issues. Otherwise specify your own CSS.
character to use as default value for fansi
parameter.
Counterparts to R string manipulation functions that account for the effects of some ANSI X3.64 (a.k.a. ECMA-48, ISO-6429) control sequences.
Control characters and sequences are non-printing inline characters or sequences initiated by them that can be used to modify terminal display and behavior, for example by changing text color or cursor position.
We will refer to X3.64/ECMA-48/ISO-6429 control characters and sequences as "Control Sequences" hereafter.
There are four types of Control Sequences that fansi
can treat
specially:
"C0" control characters, such as tabs and carriage returns (we include delete in this set, even though technically it is not part of it).
Sequences starting in "ESC[", also known as Control Sequence Introducer (CSI) sequences, of which the Select Graphic Rendition (SGR) sequences used to format terminal output are a subset.
Sequences starting in "ESC]", also known as Operating System Commands (OSC), of which the subset beginning with "8" is used to encode URI based hyperlinks.
Sequences starting in "ESC" and followed by something other than "[" or "]".
Control Sequences starting with ESC are assumed to be two characters
long (including the ESC) unless they are of the CSI or OSC variety, in which
case their length is computed as per the ECMA-48 specification,
with the exception that OSC hyperlinks may be terminated
with BEL ("\a") in addition to ST ("ESC\"). fansi
handles most common
Control Sequences in its parsing algorithms, but it is not a conforming
implementation of ECMA-48. For example, there are non-CSI/OSC escape
sequences that may be longer than two characters, but fansi
will
(incorrectly) treat them as if they were two characters long. There are many
more unimplemented ECMA-48 specifications.
In theory it is possible to encode CSI sequences with a single byte
introducing character in the 0x40-0x5F range instead of the traditional
"ESC[". Since this is rare and it conflicts with UTF-8 encoding, fansi
does not support it.
Within Control Sequences, fansi
further distinguishes CSI SGR and OSC
hyperlinks by recording format specification and URIs into string state, and
applying the same to any output strings according to the semantics of the
functions in use. CSI SGR and OSC hyperlinks are known together as Special
Sequences. See the following sections for details.
Additionally, all Control Sequences, whether special or not,
do not count as characters, graphemes, or display width. You can cause
fansi
to treat particular Control Sequences as regular characters with
the ctl
parameter.
NOTE: not all displays support CSI SGR sequences; run
term_cap_test
to see whether your display supports them.
CSI SGR Control Sequences are the subset of CSI sequences that can be
used to change text appearance (e.g. color). These sequences begin with
"ESC[" and end in "m". fansi
interprets these sequences and writes new
ones to the output strings in such a way that the original formatting is
preserved. In most cases this should be transparent to the user.
Occasionally there may be mismatches between how fansi
and a display
interpret the CSI SGR sequences, which may produce display artifacts. The
most likely source of artifacts are Control Sequences that move
the cursor or change the display, or that fansi
otherwise fails to
interpret, such as:
Unknown SGR substrings.
"C0" control characters like tabs and carriage returns.
Other escape sequences.
Another possible source of problems is that different displays parse
and interpret control sequences differently. The common CSI SGR sequences
that you are likely to encounter in formatted text tend to be treated
consistently, but less common ones are not. fansi
tries to hew by the
ECMA-48 specification for CSI SGR control sequences, but not all
terminals do.
The most likely source of problems will be 24-bit CSI SGR sequences.
For example, a 24-bit color sequence such as "ESC[38;2;31;42;4" is a
single foreground color to a terminal that supports it, or separate
foreground, background, faint, and underline specifications for one that does
not. fansi
will always interpret the sequences according to ECMA-48, but
it will warn you if encountered sequences exceed those specified by
the term.cap
parameter or the "fansi.term.cap" global option.
fansi
will will also warn if it encounters Control Sequences that it
cannot interpret. You can turn off warnings via the warn
parameter, which
can be set globally via the "fansi.warn" option. You can work around "C0"
tabs characters by turning them into spaces first with tabs_as_spaces
or
with the tabs.as.spaces
parameter available in some of the fansi
functions
fansi
interprets CSI SGR sequences in cumulative "Graphic Rendition
Combination Mode". This means new SGR sequences add to rather than replace
previous ones, although in some cases the effect is the same as replacement
(e.g. if you have a color active and pick another one).
Operating System Commands are interpreted by terminal emulators typically to engage actions external to the display of text proper, such as setting a window title or changing the active color palette.
Some terminals have
added support for associating URIs to text with OSCs in a similar way to
anchors in HTML, so fansi
interprets them and outputs or terminates them as
needed. For example:
"\033]8;;xy.z\033\\LINK\033]8;;\033\\"
Might be interpreted as link to the URI "x.z". To make the encoding pattern clearer, we replace "\033]" with "<OSC>" and "\033\\" with "<ST>" below:
<OSC>8;;URI<ST>LINK TEXT<OSC>8;;<ST>
The cumulative nature of state as specified by SGR or OSC hyperlinks means
that unterminated strings that are spliced will interact with each other.
By extension, a substring does not inherently contain all the information
required to recreate its state as it appeared in the source document. The
default fansi
configuration terminates extracted substrings and prepends
original state to them so they present on a stand-alone basis as they did as
part of the original string.
To allow state in substrings to affect subsequent strings set terminate = FALSE
, but you will need to manually terminate them or deal with the
consequences of not doing so (see "Terminal Quirks").
By default, fansi
assumes that each element in an input character vector is
independent, but this is incorrect if the input is a single document with
each element a line in it. In that situation state from each line should
bleed into subsequent ones. Setting carry = TRUE
enables the "single
document" interpretation.
To most closely approximate what writeLines(x)
produces on your terminal,
where x
is a stateful string, use writeLines(fansi_fun(x, carry=TRUE, terminate=FALSE))
. fansi_fun
is a stand-in for any of the fansi
string
manipulation functions. Note that even with a seeming "null-op" such as
substr_ctl(x, 1, nchar_ctl(x), carry=TRUE, terminate=FALSE)
the output
control sequences may not match the input ones, but the output should look
the same if displayed to the terminal.
fansi
strings will be affected by any active state in strings they are
appended to. There are no parameters to control what happens in this case,
but fansi
provides functions that can help the user get the desired
behavior. state_at_end
computes the active state the end of a string,
which can then be prepended onto the input of fansi
functions so that
they are aware of the active style at the beginning of the string.
Alternatively, one could use close_state(state_at_end(...))
and pre-pend
that to the output of fansi
functions so they are unaffected by preceding
SGR. One could also just prepend "ESC[0m", but in some cases as
described in ?normalize_state
that is sub-optimal.
If you intend to combine stateful fansi
manipulated strings with your own,
it may be best to set normalize = TRUE
for improved compatibility (see
?normalize_state
.)
Some terminals (e.g. OS X terminal, ITerm2) will pre-paint the entirety of a
new line with the currently active background before writing the contents of
the line. If there is a non-default active background color, any unwritten
columns in the new line will keep the prior background color even if the new
line changes the background color. To avoid this be sure to use terminate = TRUE
or to manually terminate each line with e.g. "ESC[0m". The
problem manifests as:
" " = default background "#" = new background ">" = start new background "!" = restore default background +-----------+ | abc\n | |>###\n | |!abc\n#####| <- trailing "#" after newline are from pre-paint | abc | +-----------+
The simplest way to avoid this problem is to split input strings by any
newlines they contain, and use terminate = TRUE
(the default). A more
complex solution is to pad with spaces to the terminal window width before
emitting the newline to ensure the pre-paint is overpainted with the current
line's prevailing background color.
fansi
will convert any non-ASCII strings to UTF-8 before processing them,
and fansi
functions that return strings will return them encoded in UTF-8.
In some cases this will be different to what base R does. For example,
substr
re-encodes substrings to their original encoding.
Interpretation of UTF-8 strings is intended to be consistent with base R. There are three ways things may not work out exactly as desired:
fansi
, despite its best intentions, handles a UTF-8 sequence differently
to the way R does.
R incorrectly handles a UTF-8 sequence.
Your display incorrectly handles a UTF-8 sequence.
These issues are most likely to occur with invalid UTF-8 sequences, combining character sequences, and emoji. For example, whether special characters such as emoji are considered one or two wide evolves as software implements newer versions the Unicode databases.
Internally, fansi
computes the width of most UTF-8 character sequences
outside of the ASCII range using the native R_nchar
function. This will
cause such characters to be processed slower than ASCII characters. Unlike R
(at least as of version 4.1), fansi
can account for graphemes.
Because fansi
implements its own internal UTF-8 parsing it is possible
that you will see results different from those that R produces even on
strings without Control Sequences.
The maximum length of input character vector elements allowed by fansi
is
the 32 bit INT_MAX, excluding the terminating NULL. As of R4.1 this is the
limit for R character vector elements generally, but is enforced at the C
level by fansi
nonetheless.
It is possible that during processing strings that are shorter than INT_MAX
would become longer than that. fansi
checks for that overflow and will
stop with an error if that happens. A work-around for this situation is to
break up large strings into smaller ones. The limit is on each element of a
character vector, not on the vector as a whole. fansi
will also error on
your system if R_len_t
, the R type used to measure string lengths, is less
than the processed length of the string.
Nominally you can build and run this package in R versions between 3.1.0 and
3.2.1. Things should mostly work, but please be aware we do not run the test
suite under versions of R less than 3.2.2. One key degraded capability is
width computation of wide-display characters. Under R < 3.2.2 fansi
will
assume every character is 1 display width. Additionally, fansi
may not
always report malformed UTF-8 sequences as it usually does. One
exception to this is nchar_ctl
as that is just a thin wrapper around
base::nchar
.
Color each element in input with one of the "256 color" ANSI CSI SGR codes. This is intended for testing and demo purposes.
fansi_lines(txt, step = 1)
fansi_lines(txt, step = 1)
txt |
character vector or object that can be coerced to character vector |
step |
integer(1L) how quickly to step through the color palette |
character vector with each element colored
NEWS <- readLines(file.path(R.home('doc'), 'NEWS')) writeLines(fansi_lines(NEWS[1:20])) writeLines(fansi_lines(NEWS[1:20], step=8))
NEWS <- readLines(file.path(R.home('doc'), 'NEWS')) writeLines(fansi_lines(NEWS[1:20])) writeLines(fansi_lines(NEWS[1:20], step=8))
has_ctl
checks for any Control Sequence. You can check for different
types of sequences with the ctl
parameter. Warnings are only emitted for
malformed CSI or OSC sequences.
has_ctl(x, ctl = "all", warn = getOption("fansi.warn", TRUE), which)
has_ctl(x, ctl = "all", warn = getOption("fansi.warn", TRUE), which)
x |
a character vector or object that can be coerced to such. |
ctl |
character, which Control Sequences should be treated
specially. Special treatment is context dependent, and may include
detecting them and/or computing their display/character width as zero. For
the SGR subset of the ANSI CSI sequences, and OSC hyperlinks,
|
warn |
TRUE (default) or FALSE, whether to warn when potentially
problematic Control Sequences are encountered. These could cause the
assumptions |
which |
character, deprecated in favor of |
logical of same length as x
; NA values in x
result in NA values
in return
?fansi
for details on how Control Sequences are
interpreted, particularly if you are getting unexpected results,
unhandled_ctl
for detecting bad control sequences.
has_ctl("hello world") has_ctl("hello\nworld") has_ctl("hello\nworld", "sgr") has_ctl("hello\033[31mworld\033[m", "sgr")
has_ctl("hello world") has_ctl("hello\nworld") has_ctl("hello\nworld", "sgr") has_ctl("hello\033[31mworld\033[m", "sgr")
This simulates what rmarkdown
/ knitr
do to the output of an R markdown
chunk, at least as of rmarkdown
1.10. It is useful when we override the
knitr
output hooks so that we can have a result that still looks as if it
was run by knitr
.
html_code_block(x, class = "fansi-output")
html_code_block(x, class = "fansi-output")
x |
character vector |
class |
character vectors of classes to apply to the PRE HTML tags. It is the users responsibility to ensure the classes are valid CSS class names. |
character(1L) x
, with <PRE> and <CODE> HTML tags
applied and collapsed into one line with newlines as the line separator.
html_code_block(c("hello world")) html_code_block(c("hello world"), class="pretty")
html_code_block(c("hello world")) html_code_block(c("hello world"), class="pretty")
Arbitrary text may contain characters with special meaning in HTML, which may
cause HTML display to be corrupted if they are included unescaped in a web
page. This function escapes those special characters so they do not
interfere with the HTML markup generated by e.g. to_html
.
html_esc(x, what = getOption("fansi.html.esc", "<>&'\""))
html_esc(x, what = getOption("fansi.html.esc", "<>&'\""))
x |
character vector |
what |
character(1) containing any combination of ASCII characters
"<", ">", "&", "'", or "\"". These characters are special in HTML contexts
and will be substituted by their HTML entity code. By default, all
special characters are escaped, but in many cases "<>&" or even "<>" might
be sufficient.
@return |
Non-ASCII strings are converted to and returned in UTF-8 encoding.
Other HTML functions:
in_html()
,
make_styles()
,
to_html()
html_esc("day > night") html_esc("<SPAN>hello world</SPAN>")
html_esc("day > night") html_esc("<SPAN>hello world</SPAN>")
Helper function that assembles user provided HTML and CSS into a temporary text file, and by default displays it in the browser. Intended for use in examples.
in_html(x, css = character(), pre = TRUE, display = TRUE, clean = display)
in_html(x, css = character(), pre = TRUE, display = TRUE, clean = display)
x |
character vector of html encoded strings. |
css |
character vector of css styles. |
pre |
TRUE (default) or FALSE, whether to wrap |
display |
TRUE or FALSE, whether to display the resulting page in a browser window. If TRUE, will sleep for one second before returning, and will delete the temporary file used to store the HTML. |
clean |
TRUE or FALSE, if TRUE and |
character(1L) the file location of the page, invisibly, but keep in
mind it will have been deleted if clean=TRUE
.
Other HTML functions:
html_esc()
,
make_styles()
,
to_html()
txt <- "\033[31;42mHello \033[7mWorld\033[m" writeLines(txt) html <- to_html(txt) ## Not run: in_html(html) # spawns a browser window ## End(Not run) writeLines(readLines(in_html(html, display=FALSE))) css <- "SPAN {text-decoration: underline;}" writeLines(readLines(in_html(html, css=css, display=FALSE))) ## Not run: in_html(html, css) ## End(Not run)
txt <- "\033[31;42mHello \033[7mWorld\033[m" writeLines(txt) html <- to_html(txt) ## Not run: in_html(html) # spawns a browser window ## End(Not run) writeLines(readLines(in_html(html, display=FALSE))) css <- "SPAN {text-decoration: underline;}" writeLines(readLines(in_html(html, css=css, display=FALSE))) ## Not run: in_html(html, css) ## End(Not run)
Given a set of class names, produce the CSS that maps them to the default
8-bit colors. This is a helper function to generate style sheets for use
in examples with either default or remixed fansi
colors. In practice users
will create their own style sheets mapping their classes to their preferred
styles.
make_styles(classes, rgb.mix = diag(3))
make_styles(classes, rgb.mix = diag(3))
classes |
a character vector of either 16, 32, or 512 class names. The
character vectors are described in |
rgb.mix |
3 x 3 numeric matrix to remix color channels. Given a N x 3
matrix of numeric RGB colors |
A character vector that can be used as the contents of a style sheet.
Other HTML functions:
html_esc()
,
in_html()
,
to_html()
## Generate some class strings; order matters classes <- do.call(paste, c(expand.grid(c("fg", "bg"), 0:7), sep="-")) writeLines(classes[1:4]) ## Some Default CSS css0 <- "span {font-size: 60pt; padding: 10px; display: inline-block}" ## Associated class strings to styles css1 <- make_styles(classes) writeLines(css1[1:4]) ## Generate SGR-derived HTML, mapping to classes string <- "\033[43mYellow\033[m\n\033[45mMagenta\033[m\n\033[46mCyan\033[m" html <- to_html(string, classes=classes) writeLines(html) ## Combine in a page with styles and display in browser ## Not run: in_html(html, css=c(css0, css1)) ## End(Not run) ## Change CSS by remixing colors, and apply to exact same HTML mix <- matrix( c( 0, 1, 0, # red output is green input 0, 0, 1, # green output is blue input 1, 0, 0 # blue output is red input ), nrow=3, byrow=TRUE ) css2 <- make_styles(classes, rgb.mix=mix) ## Display in browser: same HTML but colors changed by CSS ## Not run: in_html(html, css=c(css0, css2)) ## End(Not run)
## Generate some class strings; order matters classes <- do.call(paste, c(expand.grid(c("fg", "bg"), 0:7), sep="-")) writeLines(classes[1:4]) ## Some Default CSS css0 <- "span {font-size: 60pt; padding: 10px; display: inline-block}" ## Associated class strings to styles css1 <- make_styles(classes) writeLines(css1[1:4]) ## Generate SGR-derived HTML, mapping to classes string <- "\033[43mYellow\033[m\n\033[45mMagenta\033[m\n\033[46mCyan\033[m" html <- to_html(string, classes=classes) writeLines(html) ## Combine in a page with styles and display in browser ## Not run: in_html(html, css=c(css0, css1)) ## End(Not run) ## Change CSS by remixing colors, and apply to exact same HTML mix <- matrix( c( 0, 1, 0, # red output is green input 0, 0, 1, # green output is blue input 1, 0, 0 # blue output is red input ), nrow=3, byrow=TRUE ) css2 <- make_styles(classes, rgb.mix=mix) ## Display in browser: same HTML but colors changed by CSS ## Not run: in_html(html, css=c(css0, css2)) ## End(Not run)
nchar_ctl
counts all non Control Sequence characters.
nzchar_ctl
returns TRUE for each input vector element that has non Control
Sequence sequence characters. By default newlines and other C0 control
characters are not counted.
nchar_ctl( x, type = "chars", allowNA = FALSE, keepNA = NA, ctl = "all", warn = getOption("fansi.warn", TRUE), strip ) nzchar_ctl( x, keepNA = FALSE, ctl = "all", warn = getOption("fansi.warn", TRUE) )
nchar_ctl( x, type = "chars", allowNA = FALSE, keepNA = NA, ctl = "all", warn = getOption("fansi.warn", TRUE), strip ) nzchar_ctl( x, keepNA = FALSE, ctl = "all", warn = getOption("fansi.warn", TRUE) )
x |
a character vector or object that can be coerced to such. |
type |
character(1L) partial matching
|
allowNA |
logical: should |
keepNA |
logical: should |
ctl |
character, which Control Sequences should be treated
specially. Special treatment is context dependent, and may include
detecting them and/or computing their display/character width as zero. For
the SGR subset of the ANSI CSI sequences, and OSC hyperlinks,
|
warn |
TRUE (default) or FALSE, whether to warn when potentially
problematic Control Sequences are encountered. These could cause the
assumptions |
strip |
character, deprecated in favor of |
nchar_ctl
and nzchar_ctl
are implemented in statically compiled code, so
in particular nzchar_ctl
will be much faster than the otherwise equivalent
nzchar(strip_ctl(...))
.
These functions will warn if either malformed or escape or UTF-8 sequences are encountered as they may be incorrectly interpreted.
Like base::nchar
, with Control Sequences excluded.
Control Sequences are non-printing characters or sequences of characters.
Special Sequences are a subset of the Control Sequences, and include CSI
SGR sequences which can be used to change rendered appearance of text, and
OSC hyperlinks. See fansi
for details.
Several factors could affect the exact output produced by fansi
functions across versions of fansi
, R
, and/or across systems.
In general it is best not to rely on exact fansi
output, e.g. by
embedding it in tests.
Width and grapheme calculations depend on locale, Unicode database
version, and grapheme processing logic (which is still in development), among
other things. For the most part fansi
(currently) uses the internals of
base::nchar(type='width')
, but there are exceptions and this may change in
the future.
How a particular display format is encoded in Control Sequences is
not guaranteed to be stable across fansi
versions. Additionally, which
Special Sequences are re-encoded vs transcribed untouched may change.
In general we will strive to keep the rendered appearance stable.
To maximize the odds of getting stable output set normalize_state
to
TRUE
and type
to "chars"
in functions that allow it, and
set term.cap
to a specific set of capabilities.
fansi
approximates grapheme widths and counts by using heuristics for
grapheme breaks that work for most common graphemes, including emoji
combining sequences. The heuristic is known to work incorrectly with
invalid combining sequences, prepending marks, and sequence interruptors.
fansi
does not provide a full implementation of grapheme break detection to
avoid carrying a copy of the Unicode grapheme breaks table, and also because
the hope is that R will add the feature eventually itself.
The utf8
package provides a
conforming grapheme parsing implementation.
The keepNA
parameter is ignored for R < 3.2.2.
?fansi
for details on how Control Sequences are
interpreted, particularly if you are getting unexpected results,
unhandled_ctl
for detecting bad control sequences.
nchar_ctl("\033[31m123\a\r") ## with some wide characters cn.string <- sprintf("\033[31m%s\a\r", "\u4E00\u4E01\u4E03") nchar_ctl(cn.string) nchar_ctl(cn.string, type='width') ## Remember newlines are not counted by default nchar_ctl("\t\n\r") ## The 'c0' value for the `ctl` argument does not include ## newlines. nchar_ctl("\t\n\r", ctl="c0") nchar_ctl("\t\n\r", ctl=c("c0", "nl")) ## The _sgr flavor only treats SGR sequences as zero width nchar_sgr("\033[31m123") nchar_sgr("\t\n\n123") ## All of the following are Control Sequences or C0 controls nzchar_ctl("\n\033[42;31m\033[123P\a")
nchar_ctl("\033[31m123\a\r") ## with some wide characters cn.string <- sprintf("\033[31m%s\a\r", "\u4E00\u4E01\u4E03") nchar_ctl(cn.string) nchar_ctl(cn.string, type='width') ## Remember newlines are not counted by default nchar_ctl("\t\n\r") ## The 'c0' value for the `ctl` argument does not include ## newlines. nchar_ctl("\t\n\r", ctl="c0") nchar_ctl("\t\n\r", ctl=c("c0", "nl")) ## The _sgr flavor only treats SGR sequences as zero width nchar_sgr("\033[31m123") nchar_sgr("\t\n\n123") ## All of the following are Control Sequences or C0 controls nzchar_ctl("\n\033[42;31m\033[123P\a")
Re-encodes SGR and OSC encoded URL sequences into a unique decomposed form. Strings containing semantically identical SGR and OSC sequences that are encoded differently should compare equal after normalization.
normalize_state( x, warn = getOption("fansi.warn", TRUE), term.cap = getOption("fansi.term.cap", dflt_term_cap()), carry = getOption("fansi.carry", FALSE) )
normalize_state( x, warn = getOption("fansi.warn", TRUE), term.cap = getOption("fansi.term.cap", dflt_term_cap()), carry = getOption("fansi.carry", FALSE) )
x |
a character vector or object that can be coerced to such. |
warn |
TRUE (default) or FALSE, whether to warn when potentially
problematic Control Sequences are encountered. These could cause the
assumptions |
term.cap |
character a vector of the capabilities of the terminal, can
be any combination of "bright" (SGR codes 90-97, 100-107), "256" (SGR codes
starting with "38;5" or "48;5"), "truecolor" (SGR codes starting with
"38;2" or "48;2"), and "all". "all" behaves as it does for the |
carry |
TRUE, FALSE (default), or a scalar string, controls whether to
interpret the character vector as a "single document" (TRUE or string) or
as independent elements (FALSE). In "single document" mode, active state
at the end of an input element is considered active at the beginning of the
next vector element, simulating what happens with a document with active
state at the end of a line. If FALSE each vector element is interpreted as
if there were no active state when it begins. If character, then the
active state at the end of the |
Each compound SGR sequence is broken up into individual tokens, superfluous tokens are removed, and the SGR reset sequence "ESC[0m" (or "ESC[m") is replaced by the closing codes for whatever SGR styles are active at the point in the string in which it appears.
Unrecognized SGR codes will be dropped from the output with a warning. The
specific order of SGR codes associated with any given SGR sequence is not
guaranteed to remain the same across different versions of fansi
, but
should remain unchanged except for the addition of previously uninterpreted
codes to the list of interpretable codes. There is no special significance
to the order the SGR codes are emitted in other than it should be consistent
for any given SGR state. URLs adjacent to SGR codes are always emitted after
the SGR codes irrespective of what side they were on originally.
OSC encoded URL sequences are always terminated by "ESC]\", and those between abutting URLs are omitted. Identical abutting URLs are merged. In order for URLs to be considered identical both the URL and the "id" parameter must be specified and be the same. OSC URL parameters other than "id" are dropped with a warning.
The underlying assumption is that each element in the vector is
unaffected by SGR or OSC URLs in any other element or elsewhere. This may
lead to surprising outcomes if these assumptions are untrue (see examples).
You may adjust this assumption with the carry
parameter.
Normalization was implemented primarily for better compatibility with
crayon
which emits SGR codes individually and assumes that each
opening code is paired up with its specific closing code, but it can also be
used to reduce the probability that strings processed with future versions of
fansi
will produce different results than the current version.
x
, with all SGRs normalized.
?fansi
for details on how Control Sequences are
interpreted, particularly if you are getting unexpected results,
unhandled_ctl
for detecting bad control sequences.
normalize_state("hello\033[42;33m world") normalize_state("hello\033[42;33m world\033[m") normalize_state("\033[4mhello\033[42;33m world\033[m") ## Superflous codes removed normalize_state("\033[31;32mhello\033[m") # only last color prevails normalize_state("\033[31\033[32mhello\033[m") # only last color prevails normalize_state("\033[31mhe\033[49mllo\033[m") # unused closing ## Equivalent normalized sequences compare identical identical( normalize_state("\033[31;32mhello\033[m"), normalize_state("\033[31mhe\033[49mllo\033[m") ) ## External SGR will defeat normalization, unless we `carry` it red <- "\033[41m" writeLines( c( paste(red, "he\033[0mllo", "\033[0m"), paste(red, normalize_state("he\033[0mllo"), "\033[0m"), paste(red, normalize_state("he\033[0mllo", carry=red), "\033[0m") ) )
normalize_state("hello\033[42;33m world") normalize_state("hello\033[42;33m world\033[m") normalize_state("\033[4mhello\033[42;33m world\033[m") ## Superflous codes removed normalize_state("\033[31;32mhello\033[m") # only last color prevails normalize_state("\033[31\033[32mhello\033[m") # only last color prevails normalize_state("\033[31mhe\033[49mllo\033[m") # unused closing ## Equivalent normalized sequences compare identical identical( normalize_state("\033[31;32mhello\033[m"), normalize_state("\033[31mhe\033[49mllo\033[m") ) ## External SGR will defeat normalization, unless we `carry` it red <- "\033[41m" writeLines( c( paste(red, "he\033[0mllo", "\033[0m"), paste(red, normalize_state("he\033[0mllo"), "\033[0m"), paste(red, normalize_state("he\033[0mllo", carry=red), "\033[0m") ) )
This is a convenience function designed for use within an rmarkdown
document. It overrides the knitr
output hooks by using
knitr::knit_hooks$set
. It replaces the hooks with ones that convert
Control Sequences into HTML. In addition to replacing the hook functions,
this will output a <STYLE> HTML block to stdout. These two actions are
side effects as a result of which R chunks in the rmarkdown
document that
contain CSI SGR are shown in their HTML equivalent form.
set_knit_hooks( hooks, which = "output", proc.fun = function(x, class) html_code_block(to_html(html_esc(x)), class = class), class = sprintf("fansi fansi-%s", which), style = getOption("fansi.css", dflt_css()), split.nl = FALSE, .test = FALSE )
set_knit_hooks( hooks, which = "output", proc.fun = function(x, class) html_code_block(to_html(html_esc(x)), class = class), class = sprintf("fansi fansi-%s", which), style = getOption("fansi.css", dflt_css()), split.nl = FALSE, .test = FALSE )
hooks |
list, this should the be |
which |
character vector with the names of the hooks that should be replaced, defaults to 'output', but can also contain values 'message', 'warning', and 'error'. |
proc.fun |
function that will be applied to output that contains
CSI SGR sequences. Should accept parameters |
class |
character the CSS class to give the output chunks. Each type of
output chunk specified in |
style |
character a vector of CSS styles; these will be output inside
HTML >STYLE< tags as a side effect. The default value is designed to
ensure that there is no visible gap in background color with lines with
height 1.5 (as is the default setting in |
split.nl |
TRUE or FALSE (default), set to TRUE to split input strings
by any newlines they may contain to avoid any newlines inside SPAN tags
created by |
.test |
TRUE or FALSE, for internal testing use only. |
The replacement hook function tests for the presence of CSI SGR
sequences in chunk output with has_ctl
, and if it is detected then
processes it with the user provided proc.fun
. Chunks that do not contain
CSI SGR are passed off to the previously set hook function. The default
proc.fun
will run the output through html_esc
, to_html
, and
finally html_code_block
.
If you require more control than this function provides you can set the
knitr
hooks manually with knitr::knit_hooks$set
. If you are seeing your
output gaining extra line breaks, look at the split.nl
option.
named list with the prior output hooks for each of which
.
Since we do not formally import the knitr
functions we do not
guarantee that this function will always work properly with knitr
/
rmarkdown
.
has_ctl
, to_html
, html_esc
, html_code_block
,
knitr
output hooks,
embedding CSS in Rmd,
and the vignette vignette(package='fansi', 'sgr-in-rmd')
.
## Not run: ## The following should be done within an `rmarkdown` document chunk with ## chunk option `results` set to 'asis' and the chunk option `comment` set ## to ''. ```{r comment="", results='asis', echo=FALSE} ## Change the "output" hook to handle ANSI CSI SGR old.hooks <- set_knit_hooks(knitr::knit_hooks) ## Do the same with the warning, error, and message, and add styles for ## them (alternatively we could have done output as part of this call too) styles <- c( getOption('fansi.style', dflt_css()), # default style "PRE.fansi CODE {background-color: transparent;}", "PRE.fansi-error {background-color: #DD5555;}", "PRE.fansi-warning {background-color: #DDDD55;}", "PRE.fansi-message {background-color: #EEEEEE;}" ) old.hooks <- c( old.hooks, fansi::set_knit_hooks( knitr::knit_hooks, which=c('warning', 'error', 'message'), style=styles ) ) ``` ## You may restore old hooks with the following chunk ## Restore Hooks ```{r} do.call(knitr::knit_hooks$set, old.hooks) ``` ## End(Not run)
## Not run: ## The following should be done within an `rmarkdown` document chunk with ## chunk option `results` set to 'asis' and the chunk option `comment` set ## to ''. ```{r comment="", results='asis', echo=FALSE} ## Change the "output" hook to handle ANSI CSI SGR old.hooks <- set_knit_hooks(knitr::knit_hooks) ## Do the same with the warning, error, and message, and add styles for ## them (alternatively we could have done output as part of this call too) styles <- c( getOption('fansi.style', dflt_css()), # default style "PRE.fansi CODE {background-color: transparent;}", "PRE.fansi-error {background-color: #DD5555;}", "PRE.fansi-warning {background-color: #DDDD55;}", "PRE.fansi-message {background-color: #EEEEEE;}" ) old.hooks <- c( old.hooks, fansi::set_knit_hooks( knitr::knit_hooks, which=c('warning', 'error', 'message'), style=styles ) ) ``` ## You may restore old hooks with the following chunk ## Restore Hooks ```{r} do.call(knitr::knit_hooks$set, old.hooks) ``` ## End(Not run)
Generates text with each 8 bit SGR code (e.g. the "###" in "38;5;###") with the background colored by itself, and the foreground in a contrasting color and interesting color (we sacrifice some contrast for interest as this is intended for demo rather than reference purposes).
sgr_256()
sgr_256()
character vector with SGR codes with background color set as themselves.
writeLines(sgr_256())
writeLines(sgr_256())
state_at_end
reads through strings computing the accumulated SGR and
OSC hyperlinks, and outputs the active state at the end of them.
close_state
produces the sequence that closes any SGR active and OSC
hyperlinks at the end of each input string. If normalize = FALSE
(default), it will emit the reset code "ESC[0m" if any SGR is present.
It is more interesting for closing SGRs if normalize = TRUE
. Unlike
state_at_end
and other functions close_state
has no concept of carry
:
it will only emit closing sequences for states explicitly active at the end
of a string.
state_at_end( x, warn = getOption("fansi.warn", TRUE), term.cap = getOption("fansi.term.cap", dflt_term_cap()), normalize = getOption("fansi.normalize", FALSE), carry = getOption("fansi.carry", FALSE) ) close_state( x, warn = getOption("fansi.warn", TRUE), normalize = getOption("fansi.normalize", FALSE) )
state_at_end( x, warn = getOption("fansi.warn", TRUE), term.cap = getOption("fansi.term.cap", dflt_term_cap()), normalize = getOption("fansi.normalize", FALSE), carry = getOption("fansi.carry", FALSE) ) close_state( x, warn = getOption("fansi.warn", TRUE), normalize = getOption("fansi.normalize", FALSE) )
x |
a character vector or object that can be coerced to such. |
warn |
TRUE (default) or FALSE, whether to warn when potentially
problematic Control Sequences are encountered. These could cause the
assumptions |
term.cap |
character a vector of the capabilities of the terminal, can
be any combination of "bright" (SGR codes 90-97, 100-107), "256" (SGR codes
starting with "38;5" or "48;5"), "truecolor" (SGR codes starting with
"38;2" or "48;2"), and "all". "all" behaves as it does for the |
normalize |
TRUE or FALSE (default) whether SGR sequence should be
normalized out such that there is one distinct sequence for each SGR code.
normalized strings will occupy more space (e.g. "\033[31;42m" becomes
"\033[31m\033[42m"), but will work better with code that assumes each SGR
code will be in its own escape as |
carry |
TRUE, FALSE (default), or a scalar string, controls whether to
interpret the character vector as a "single document" (TRUE or string) or
as independent elements (FALSE). In "single document" mode, active state
at the end of an input element is considered active at the beginning of the
next vector element, simulating what happens with a document with active
state at the end of a line. If FALSE each vector element is interpreted as
if there were no active state when it begins. If character, then the
active state at the end of the |
character vector same length as x
.
Control Sequences are non-printing characters or sequences of characters.
Special Sequences are a subset of the Control Sequences, and include CSI
SGR sequences which can be used to change rendered appearance of text, and
OSC hyperlinks. See fansi
for details.
Several factors could affect the exact output produced by fansi
functions across versions of fansi
, R
, and/or across systems.
In general it is best not to rely on exact fansi
output, e.g. by
embedding it in tests.
Width and grapheme calculations depend on locale, Unicode database
version, and grapheme processing logic (which is still in development), among
other things. For the most part fansi
(currently) uses the internals of
base::nchar(type='width')
, but there are exceptions and this may change in
the future.
How a particular display format is encoded in Control Sequences is
not guaranteed to be stable across fansi
versions. Additionally, which
Special Sequences are re-encoded vs transcribed untouched may change.
In general we will strive to keep the rendered appearance stable.
To maximize the odds of getting stable output set normalize_state
to
TRUE
and type
to "chars"
in functions that allow it, and
set term.cap
to a specific set of capabilities.
?fansi
for details on how Control Sequences are
interpreted, particularly if you are getting unexpected results,
unhandled_ctl
for detecting bad control sequences.
x <- c("\033[44mhello", "\033[33mworld") state_at_end(x) state_at_end(x, carry=TRUE) (close <- close_state(state_at_end(x, carry=TRUE), normalize=TRUE)) writeLines(paste0(x, close, " no style"))
x <- c("\033[44mhello", "\033[33mworld") state_at_end(x) state_at_end(x, carry=TRUE) (close <- close_state(state_at_end(x, carry=TRUE), normalize=TRUE)) writeLines(paste0(x, close, " no style"))
Removes Control Sequences from strings. By default it will
strip all known Control Sequences, including CSI/OSC sequences, two
character sequences starting with ESC, and all C0 control characters,
including newlines. You can fine tune this behavior with the ctl
parameter.
strip_ctl(x, ctl = "all", warn = getOption("fansi.warn", TRUE), strip)
strip_ctl(x, ctl = "all", warn = getOption("fansi.warn", TRUE), strip)
x |
a character vector or object that can be coerced to such. |
ctl |
character, any combination of the following values (see details):
|
warn |
TRUE (default) or FALSE, whether to warn when potentially
problematic Control Sequences are encountered. These could cause the
assumptions |
strip |
character, deprecated in favor of |
The ctl
value contains the names of non-overlapping subsets of the
known Control Sequences (e.g. "csi" does not contain "sgr", and "c0" does
not contain newlines). The one exception is "all" which means strip every
known sequence. If you combine "all" with any other options then everything
but those options will be stripped.
character vector of same length as x with ANSI escape sequences stripped
Non-ASCII strings are converted to and returned in UTF-8 encoding.
?fansi
for details on how Control Sequences are
interpreted, particularly if you are getting unexpected results,
unhandled_ctl
for detecting bad control sequences.
string <- "hello\033k\033[45p world\n\033[31mgoodbye\a moon" strip_ctl(string) strip_ctl(string, c("nl", "c0", "sgr", "csi", "esc")) # equivalently strip_ctl(string, "sgr") strip_ctl(string, c("c0", "esc")) ## everything but C0 controls, we need to specify "nl" ## in addition to "c0" since "nl" is not part of "c0" ## as far as the `strip` argument is concerned strip_ctl(string, c("all", "nl", "c0"))
string <- "hello\033k\033[45p world\n\033[31mgoodbye\a moon" strip_ctl(string) strip_ctl(string, c("nl", "c0", "sgr", "csi", "esc")) # equivalently strip_ctl(string, "sgr") strip_ctl(string, c("c0", "esc")) ## everything but C0 controls, we need to specify "nl" ## in addition to "c0" since "nl" is not part of "c0" ## as far as the `strip` argument is concerned strip_ctl(string, c("all", "nl", "c0"))
A drop-in replacement for base::strsplit
.
strsplit_ctl( x, split, fixed = FALSE, perl = FALSE, useBytes = FALSE, warn = getOption("fansi.warn", TRUE), term.cap = getOption("fansi.term.cap", dflt_term_cap()), ctl = "all", normalize = getOption("fansi.normalize", FALSE), carry = getOption("fansi.carry", FALSE), terminate = getOption("fansi.terminate", TRUE) )
strsplit_ctl( x, split, fixed = FALSE, perl = FALSE, useBytes = FALSE, warn = getOption("fansi.warn", TRUE), term.cap = getOption("fansi.term.cap", dflt_term_cap()), ctl = "all", normalize = getOption("fansi.normalize", FALSE), carry = getOption("fansi.carry", FALSE), terminate = getOption("fansi.terminate", TRUE) )
x |
a character vector, or, unlike |
split |
character vector (or object which can be coerced to such)
containing regular expression(s) (unless |
fixed |
logical. If |
perl |
logical. Should Perl-compatible regexps be used? |
useBytes |
logical. If |
warn |
TRUE (default) or FALSE, whether to warn when potentially
problematic Control Sequences are encountered. These could cause the
assumptions |
term.cap |
character a vector of the capabilities of the terminal, can
be any combination of "bright" (SGR codes 90-97, 100-107), "256" (SGR codes
starting with "38;5" or "48;5"), "truecolor" (SGR codes starting with
"38;2" or "48;2"), and "all". "all" behaves as it does for the |
ctl |
character, which Control Sequences should be treated
specially. Special treatment is context dependent, and may include
detecting them and/or computing their display/character width as zero. For
the SGR subset of the ANSI CSI sequences, and OSC hyperlinks,
|
normalize |
TRUE or FALSE (default) whether SGR sequence should be
normalized out such that there is one distinct sequence for each SGR code.
normalized strings will occupy more space (e.g. "\033[31;42m" becomes
"\033[31m\033[42m"), but will work better with code that assumes each SGR
code will be in its own escape as |
carry |
TRUE, FALSE (default), or a scalar string, controls whether to
interpret the character vector as a "single document" (TRUE or string) or
as independent elements (FALSE). In "single document" mode, active state
at the end of an input element is considered active at the beginning of the
next vector element, simulating what happens with a document with active
state at the end of a line. If FALSE each vector element is interpreted as
if there were no active state when it begins. If character, then the
active state at the end of the |
terminate |
TRUE (default) or FALSE whether substrings should have
active state closed to avoid it bleeding into other strings they may be
prepended onto. This does not stop state from carrying if |
This function works by computing the position of the split points after
removing Control Sequences, and uses those positions in conjunction with
substr_ctl
to extract the pieces. This concept is borrowed from
crayon::col_strsplit
. An important implication of this is that you cannot
split by Control Sequences that are being treated as Control Sequences.
You can however limit which control sequences are treated specially via the
ctl
parameters (see examples).
Like base::strsplit
, with Control Sequences excluded.
Control Sequences are non-printing characters or sequences of characters.
Special Sequences are a subset of the Control Sequences, and include CSI
SGR sequences which can be used to change rendered appearance of text, and
OSC hyperlinks. See fansi
for details.
Several factors could affect the exact output produced by fansi
functions across versions of fansi
, R
, and/or across systems.
In general it is best not to rely on exact fansi
output, e.g. by
embedding it in tests.
Width and grapheme calculations depend on locale, Unicode database
version, and grapheme processing logic (which is still in development), among
other things. For the most part fansi
(currently) uses the internals of
base::nchar(type='width')
, but there are exceptions and this may change in
the future.
How a particular display format is encoded in Control Sequences is
not guaranteed to be stable across fansi
versions. Additionally, which
Special Sequences are re-encoded vs transcribed untouched may change.
In general we will strive to keep the rendered appearance stable.
To maximize the odds of getting stable output set normalize_state
to
TRUE
and type
to "chars"
in functions that allow it, and
set term.cap
to a specific set of capabilities.
fansi
is unaware of text directionality and operates as if all strings are
left to right (LTR). Using fansi
function with strings that contain mixed
direction scripts (i.e. both LTR and RTL) may produce undesirable results.
The split positions are computed after both x
and split
are
converted to UTF-8.
Non-ASCII strings are converted to and returned in UTF-8 encoding. Width calculations will not work properly in R < 3.2.2.
?fansi
for details on how Control Sequences are
interpreted, particularly if you are getting unexpected results,
normalize_state
for more details on what the normalize
parameter does,
state_at_end
to compute active state at the end of strings,
close_state
to compute the sequence required to close active state.
strsplit_ctl("\033[31mhello\033[42m world!", " ") ## Splitting by newlines does not work as they are _Control ## Sequences_, but we can use `ctl` to treat them as ordinary strsplit_ctl("\033[31mhello\033[42m\nworld!", "\n") strsplit_ctl("\033[31mhello\033[42m\nworld!", "\n", ctl=c("all", "nl"))
strsplit_ctl("\033[31mhello\033[42m world!", " ") ## Splitting by newlines does not work as they are _Control ## Sequences_, but we can use `ctl` to treat them as ordinary strsplit_ctl("\033[31mhello\033[42m\nworld!", "\n") strsplit_ctl("\033[31mhello\033[42m\nworld!", "\n", ctl=c("all", "nl"))
A drop in replacement for base::strtrim
, with the difference that all
C0 control characters such as newlines, carriage returns, etc., are always
treated as zero width, whereas in base it may vary with platform / R version.
strtrim_ctl( x, width, warn = getOption("fansi.warn", TRUE), ctl = "all", normalize = getOption("fansi.normalize", FALSE), carry = getOption("fansi.carry", FALSE), terminate = getOption("fansi.terminate", TRUE) ) strtrim2_ctl( x, width, warn = getOption("fansi.warn", TRUE), tabs.as.spaces = getOption("fansi.tabs.as.spaces", FALSE), tab.stops = getOption("fansi.tab.stops", 8L), ctl = "all", normalize = getOption("fansi.normalize", FALSE), carry = getOption("fansi.carry", FALSE), terminate = getOption("fansi.terminate", TRUE) )
strtrim_ctl( x, width, warn = getOption("fansi.warn", TRUE), ctl = "all", normalize = getOption("fansi.normalize", FALSE), carry = getOption("fansi.carry", FALSE), terminate = getOption("fansi.terminate", TRUE) ) strtrim2_ctl( x, width, warn = getOption("fansi.warn", TRUE), tabs.as.spaces = getOption("fansi.tabs.as.spaces", FALSE), tab.stops = getOption("fansi.tab.stops", 8L), ctl = "all", normalize = getOption("fansi.normalize", FALSE), carry = getOption("fansi.carry", FALSE), terminate = getOption("fansi.terminate", TRUE) )
x |
a character vector, or an object which can be coerced to a
character vector by |
width |
Positive integer values: recycled to the length of |
warn |
TRUE (default) or FALSE, whether to warn when potentially
problematic Control Sequences are encountered. These could cause the
assumptions |
ctl |
character, which Control Sequences should be treated
specially. Special treatment is context dependent, and may include
detecting them and/or computing their display/character width as zero. For
the SGR subset of the ANSI CSI sequences, and OSC hyperlinks,
|
normalize |
TRUE or FALSE (default) whether SGR sequence should be
normalized out such that there is one distinct sequence for each SGR code.
normalized strings will occupy more space (e.g. "\033[31;42m" becomes
"\033[31m\033[42m"), but will work better with code that assumes each SGR
code will be in its own escape as |
carry |
TRUE, FALSE (default), or a scalar string, controls whether to
interpret the character vector as a "single document" (TRUE or string) or
as independent elements (FALSE). In "single document" mode, active state
at the end of an input element is considered active at the beginning of the
next vector element, simulating what happens with a document with active
state at the end of a line. If FALSE each vector element is interpreted as
if there were no active state when it begins. If character, then the
active state at the end of the |
terminate |
TRUE (default) or FALSE whether substrings should have
active state closed to avoid it bleeding into other strings they may be
prepended onto. This does not stop state from carrying if |
tabs.as.spaces |
FALSE (default) or TRUE, whether to convert tabs to
spaces. This can only be set to TRUE if |
tab.stops |
integer(1:n) indicating position of tab stops to use when converting tabs to spaces. If there are more tabs in a line than defined tab stops the last tab stop is re-used. For the purposes of applying tab stops, each input line is considered a line and the character count begins from the beginning of the input line. |
strtrim2_ctl
adds the option of converting tabs to spaces before trimming.
This is the only difference between strtrim_ctl
and strtrim2_ctl
.
Like base::strtrim
, except that Control Sequences are treated
as zero width.
Non-ASCII strings are converted to and returned in UTF-8 encoding. Width calculations will not work properly in R < 3.2.2.
?fansi
for details on how Control Sequences are
interpreted, particularly if you are getting unexpected results,
normalize_state
for more details on what the normalize
parameter does,
state_at_end
to compute active state at the end of strings,
close_state
to compute the sequence required to close active state.
strtrim_ctl("\033[42mHello world\033[m", 6)
strtrim_ctl("\033[42mHello world\033[m", 6)
Wraps strings to a specified width accounting for Control Sequences.
strwrap_ctl
is intended to emulate strwrap
closely except with respect to
the Control Sequences (see details for other minor differences), while
strwrap2_ctl
adds features and changes the processing of whitespace.
strwrap_ctl
is faster than strwrap
.
strwrap_ctl( x, width = 0.9 * getOption("width"), indent = 0, exdent = 0, prefix = "", simplify = TRUE, initial = prefix, warn = getOption("fansi.warn", TRUE), term.cap = getOption("fansi.term.cap", dflt_term_cap()), ctl = "all", normalize = getOption("fansi.normalize", FALSE), carry = getOption("fansi.carry", FALSE), terminate = getOption("fansi.terminate", TRUE) ) strwrap2_ctl( x, width = 0.9 * getOption("width"), indent = 0, exdent = 0, prefix = "", simplify = TRUE, initial = prefix, wrap.always = FALSE, pad.end = "", strip.spaces = !tabs.as.spaces, tabs.as.spaces = getOption("fansi.tabs.as.spaces", FALSE), tab.stops = getOption("fansi.tab.stops", 8L), warn = getOption("fansi.warn", TRUE), term.cap = getOption("fansi.term.cap", dflt_term_cap()), ctl = "all", normalize = getOption("fansi.normalize", FALSE), carry = getOption("fansi.carry", FALSE), terminate = getOption("fansi.terminate", TRUE) )
strwrap_ctl( x, width = 0.9 * getOption("width"), indent = 0, exdent = 0, prefix = "", simplify = TRUE, initial = prefix, warn = getOption("fansi.warn", TRUE), term.cap = getOption("fansi.term.cap", dflt_term_cap()), ctl = "all", normalize = getOption("fansi.normalize", FALSE), carry = getOption("fansi.carry", FALSE), terminate = getOption("fansi.terminate", TRUE) ) strwrap2_ctl( x, width = 0.9 * getOption("width"), indent = 0, exdent = 0, prefix = "", simplify = TRUE, initial = prefix, wrap.always = FALSE, pad.end = "", strip.spaces = !tabs.as.spaces, tabs.as.spaces = getOption("fansi.tabs.as.spaces", FALSE), tab.stops = getOption("fansi.tab.stops", 8L), warn = getOption("fansi.warn", TRUE), term.cap = getOption("fansi.term.cap", dflt_term_cap()), ctl = "all", normalize = getOption("fansi.normalize", FALSE), carry = getOption("fansi.carry", FALSE), terminate = getOption("fansi.terminate", TRUE) )
x |
a character vector, or an object which can be converted to a
character vector by |
width |
a positive integer giving the target column for wrapping lines in the output. |
indent |
a non-negative integer giving the indentation of the first line in a paragraph. |
exdent |
a non-negative integer specifying the indentation of subsequent lines in paragraphs. |
prefix , initial
|
a character string to be used as prefix for
each line except the first, for which |
simplify |
a logical. If |
warn |
TRUE (default) or FALSE, whether to warn when potentially
problematic Control Sequences are encountered. These could cause the
assumptions |
term.cap |
character a vector of the capabilities of the terminal, can
be any combination of "bright" (SGR codes 90-97, 100-107), "256" (SGR codes
starting with "38;5" or "48;5"), "truecolor" (SGR codes starting with
"38;2" or "48;2"), and "all". "all" behaves as it does for the |
ctl |
character, which Control Sequences should be treated
specially. Special treatment is context dependent, and may include
detecting them and/or computing their display/character width as zero. For
the SGR subset of the ANSI CSI sequences, and OSC hyperlinks,
|
normalize |
TRUE or FALSE (default) whether SGR sequence should be
normalized out such that there is one distinct sequence for each SGR code.
normalized strings will occupy more space (e.g. "\033[31;42m" becomes
"\033[31m\033[42m"), but will work better with code that assumes each SGR
code will be in its own escape as |
carry |
TRUE, FALSE (default), or a scalar string, controls whether to
interpret the character vector as a "single document" (TRUE or string) or
as independent elements (FALSE). In "single document" mode, active state
at the end of an input element is considered active at the beginning of the
next vector element, simulating what happens with a document with active
state at the end of a line. If FALSE each vector element is interpreted as
if there were no active state when it begins. If character, then the
active state at the end of the |
terminate |
TRUE (default) or FALSE whether substrings should have
active state closed to avoid it bleeding into other strings they may be
prepended onto. This does not stop state from carrying if |
wrap.always |
TRUE or FALSE (default), whether to hard wrap at requested
width if no word breaks are detected within a line. If set to TRUE then
|
pad.end |
character(1L), a single character to use as padding at the
end of each line until the line is |
strip.spaces |
TRUE (default) or FALSE, if TRUE, extraneous white spaces (spaces, newlines, tabs) are removed in the same way as base::strwrap does. When FALSE, whitespaces are preserved, except for newlines as those are implicit boundaries between output vector elements. |
tabs.as.spaces |
FALSE (default) or TRUE, whether to convert tabs to
spaces. This can only be set to TRUE if |
tab.stops |
integer(1:n) indicating position of tab stops to use when converting tabs to spaces. If there are more tabs in a line than defined tab stops the last tab stop is re-used. For the purposes of applying tab stops, each input line is considered a line and the character count begins from the beginning of the input line. |
strwrap2_ctl
can convert tabs to spaces, pad strings up to width
, and
hard-break words if single words are wider than width
.
Unlike base::strwrap, both these functions will translate any non-ASCII
strings to UTF-8 and return them in UTF-8. Additionally, invalid UTF-8
always causes errors, and prefix
and indent
must be scalar.
When replacing tabs with spaces the tabs are computed relative to the
beginning of the input line, not the most recent wrap point.
Additionally,indent
, exdent
, initial
, and prefix
will be ignored when
computing tab positions.
A character vector, or list of character vectors if simplify
is
false.
Control Sequences are non-printing characters or sequences of characters.
Special Sequences are a subset of the Control Sequences, and include CSI
SGR sequences which can be used to change rendered appearance of text, and
OSC hyperlinks. See fansi
for details.
fansi
approximates grapheme widths and counts by using heuristics for
grapheme breaks that work for most common graphemes, including emoji
combining sequences. The heuristic is known to work incorrectly with
invalid combining sequences, prepending marks, and sequence interruptors.
fansi
does not provide a full implementation of grapheme break detection to
avoid carrying a copy of the Unicode grapheme breaks table, and also because
the hope is that R will add the feature eventually itself.
The utf8
package provides a
conforming grapheme parsing implementation.
Several factors could affect the exact output produced by fansi
functions across versions of fansi
, R
, and/or across systems.
In general it is best not to rely on exact fansi
output, e.g. by
embedding it in tests.
Width and grapheme calculations depend on locale, Unicode database
version, and grapheme processing logic (which is still in development), among
other things. For the most part fansi
(currently) uses the internals of
base::nchar(type='width')
, but there are exceptions and this may change in
the future.
How a particular display format is encoded in Control Sequences is
not guaranteed to be stable across fansi
versions. Additionally, which
Special Sequences are re-encoded vs transcribed untouched may change.
In general we will strive to keep the rendered appearance stable.
To maximize the odds of getting stable output set normalize_state
to
TRUE
and type
to "chars"
in functions that allow it, and
set term.cap
to a specific set of capabilities.
fansi
is unaware of text directionality and operates as if all strings are
left to right (LTR). Using fansi
function with strings that contain mixed
direction scripts (i.e. both LTR and RTL) may produce undesirable results.
Non-ASCII strings are converted to and returned in UTF-8 encoding. Width calculations will not work properly in R < 3.2.2.
For the strwrap*
functions the carry
parameter affects whether
styles are carried across input vector elements. Styles always carry
within a single wrapped vector element (e.g. if one of the input elements
gets wrapped into three lines, the styles will carry through those three
lines even if carry=FALSE
, but not across input vector elements).
?fansi
for details on how Control Sequences are
interpreted, particularly if you are getting unexpected results,
normalize_state
for more details on what the normalize
parameter does,
state_at_end
to compute active state at the end of strings,
close_state
to compute the sequence required to close active state.
hello.1 <- "hello \033[41mred\033[49m world" hello.2 <- "hello\t\033[41mred\033[49m\tworld" strwrap_ctl(hello.1, 12) strwrap_ctl(hello.2, 12) ## In default mode strwrap2_ctl is the same as strwrap_ctl strwrap2_ctl(hello.2, 12) ## But you can leave whitespace unchanged, `warn` ## set to false as otherwise tabs causes warning strwrap2_ctl(hello.2, 12, strip.spaces=FALSE, warn=FALSE) ## And convert tabs to spaces strwrap2_ctl(hello.2, 12, tabs.as.spaces=TRUE) ## If your display has 8 wide tab stops the following two ## outputs should look the same writeLines(strwrap2_ctl(hello.2, 80, tabs.as.spaces=TRUE)) writeLines(hello.2) ## tab stops are NOT auto-detected, but you may provide ## your own strwrap2_ctl(hello.2, 12, tabs.as.spaces=TRUE, tab.stops=c(6, 12)) ## You can also force padding at the end to equal width writeLines(strwrap2_ctl("hello how are you today", 10, pad.end=".")) ## And a more involved example where we read the ## NEWS file, color it line by line, wrap it to ## 25 width and display some of it in 3 columns ## (works best on displays that support 256 color ## SGR sequences) NEWS <- readLines(file.path(R.home('doc'), 'NEWS')) NEWS.C <- fansi_lines(NEWS, step=2) # color each line W <- strwrap2_ctl(NEWS.C, 25, pad.end=" ", wrap.always=TRUE) writeLines(c("", paste(W[1:20], W[100:120], W[200:220]), ""))
hello.1 <- "hello \033[41mred\033[49m world" hello.2 <- "hello\t\033[41mred\033[49m\tworld" strwrap_ctl(hello.1, 12) strwrap_ctl(hello.2, 12) ## In default mode strwrap2_ctl is the same as strwrap_ctl strwrap2_ctl(hello.2, 12) ## But you can leave whitespace unchanged, `warn` ## set to false as otherwise tabs causes warning strwrap2_ctl(hello.2, 12, strip.spaces=FALSE, warn=FALSE) ## And convert tabs to spaces strwrap2_ctl(hello.2, 12, tabs.as.spaces=TRUE) ## If your display has 8 wide tab stops the following two ## outputs should look the same writeLines(strwrap2_ctl(hello.2, 80, tabs.as.spaces=TRUE)) writeLines(hello.2) ## tab stops are NOT auto-detected, but you may provide ## your own strwrap2_ctl(hello.2, 12, tabs.as.spaces=TRUE, tab.stops=c(6, 12)) ## You can also force padding at the end to equal width writeLines(strwrap2_ctl("hello how are you today", 10, pad.end=".")) ## And a more involved example where we read the ## NEWS file, color it line by line, wrap it to ## 25 width and display some of it in 3 columns ## (works best on displays that support 256 color ## SGR sequences) NEWS <- readLines(file.path(R.home('doc'), 'NEWS')) NEWS.C <- fansi_lines(NEWS, step=2) # color each line W <- strwrap2_ctl(NEWS.C, 25, pad.end=" ", wrap.always=TRUE) writeLines(c("", paste(W[1:20], W[100:120], W[200:220]), ""))
substr_ctl
is a drop-in replacement for substr
. Performance is
slightly slower than substr
, and more so for type = 'width'
. Special
Control Sequences will be included in the substrings to reflect their format
when as it was when part of the source string. substr2_ctl
adds the
ability to extract substrings based on grapheme count or display width in
addition to the normal character width, as well as several other options.
substr_ctl( x, start, stop, warn = getOption("fansi.warn", TRUE), term.cap = getOption("fansi.term.cap", dflt_term_cap()), ctl = "all", normalize = getOption("fansi.normalize", FALSE), carry = getOption("fansi.carry", FALSE), terminate = getOption("fansi.terminate", TRUE) ) substr2_ctl( x, start, stop, type = "chars", round = "start", tabs.as.spaces = getOption("fansi.tabs.as.spaces", FALSE), tab.stops = getOption("fansi.tab.stops", 8L), warn = getOption("fansi.warn", TRUE), term.cap = getOption("fansi.term.cap", dflt_term_cap()), ctl = "all", normalize = getOption("fansi.normalize", FALSE), carry = getOption("fansi.carry", FALSE), terminate = getOption("fansi.terminate", TRUE) ) substr_ctl( x, start, stop, warn = getOption("fansi.warn", TRUE), term.cap = getOption("fansi.term.cap", dflt_term_cap()), ctl = "all", normalize = getOption("fansi.normalize", FALSE), carry = getOption("fansi.carry", FALSE), terminate = getOption("fansi.terminate", TRUE) ) <- value substr2_ctl( x, start, stop, type = "chars", round = "start", tabs.as.spaces = getOption("fansi.tabs.as.spaces", FALSE), tab.stops = getOption("fansi.tab.stops", 8L), warn = getOption("fansi.warn", TRUE), term.cap = getOption("fansi.term.cap", dflt_term_cap()), ctl = "all", normalize = getOption("fansi.normalize", FALSE), carry = getOption("fansi.carry", FALSE), terminate = getOption("fansi.terminate", TRUE) ) <- value
substr_ctl( x, start, stop, warn = getOption("fansi.warn", TRUE), term.cap = getOption("fansi.term.cap", dflt_term_cap()), ctl = "all", normalize = getOption("fansi.normalize", FALSE), carry = getOption("fansi.carry", FALSE), terminate = getOption("fansi.terminate", TRUE) ) substr2_ctl( x, start, stop, type = "chars", round = "start", tabs.as.spaces = getOption("fansi.tabs.as.spaces", FALSE), tab.stops = getOption("fansi.tab.stops", 8L), warn = getOption("fansi.warn", TRUE), term.cap = getOption("fansi.term.cap", dflt_term_cap()), ctl = "all", normalize = getOption("fansi.normalize", FALSE), carry = getOption("fansi.carry", FALSE), terminate = getOption("fansi.terminate", TRUE) ) substr_ctl( x, start, stop, warn = getOption("fansi.warn", TRUE), term.cap = getOption("fansi.term.cap", dflt_term_cap()), ctl = "all", normalize = getOption("fansi.normalize", FALSE), carry = getOption("fansi.carry", FALSE), terminate = getOption("fansi.terminate", TRUE) ) <- value substr2_ctl( x, start, stop, type = "chars", round = "start", tabs.as.spaces = getOption("fansi.tabs.as.spaces", FALSE), tab.stops = getOption("fansi.tab.stops", 8L), warn = getOption("fansi.warn", TRUE), term.cap = getOption("fansi.term.cap", dflt_term_cap()), ctl = "all", normalize = getOption("fansi.normalize", FALSE), carry = getOption("fansi.carry", FALSE), terminate = getOption("fansi.terminate", TRUE) ) <- value
x |
a character vector or object that can be coerced to such. |
start |
integer. The first element to be extracted or replaced. |
stop |
integer. The first element to be extracted or replaced. |
warn |
TRUE (default) or FALSE, whether to warn when potentially
problematic Control Sequences are encountered. These could cause the
assumptions |
term.cap |
character a vector of the capabilities of the terminal, can
be any combination of "bright" (SGR codes 90-97, 100-107), "256" (SGR codes
starting with "38;5" or "48;5"), "truecolor" (SGR codes starting with
"38;2" or "48;2"), and "all". "all" behaves as it does for the |
ctl |
character, which Control Sequences should be treated
specially. Special treatment is context dependent, and may include
detecting them and/or computing their display/character width as zero. For
the SGR subset of the ANSI CSI sequences, and OSC hyperlinks,
|
normalize |
TRUE or FALSE (default) whether SGR sequence should be
normalized out such that there is one distinct sequence for each SGR code.
normalized strings will occupy more space (e.g. "\033[31;42m" becomes
"\033[31m\033[42m"), but will work better with code that assumes each SGR
code will be in its own escape as |
carry |
TRUE, FALSE (default), or a scalar string, controls whether to
interpret the character vector as a "single document" (TRUE or string) or
as independent elements (FALSE). In "single document" mode, active state
at the end of an input element is considered active at the beginning of the
next vector element, simulating what happens with a document with active
state at the end of a line. If FALSE each vector element is interpreted as
if there were no active state when it begins. If character, then the
active state at the end of the |
terminate |
TRUE (default) or FALSE whether substrings should have
active state closed to avoid it bleeding into other strings they may be
prepended onto. This does not stop state from carrying if |
type |
character(1L) partial matching
|
round |
character(1L) partial matching
|
tabs.as.spaces |
FALSE (default) or TRUE, whether to convert tabs to
spaces (and supress tab related warnings). This can only be set to TRUE if
|
tab.stops |
integer(1:n) indicating position of tab stops to use when converting tabs to spaces. If there are more tabs in a line than defined tab stops the last tab stop is re-used. For the purposes of applying tab stops, each input line is considered a line and the character count begins from the beginning of the input line. |
value |
a character vector or object that can be coerced to such. |
A character vector of the same length and with the same attributes as x (after possible coercion and re-encoding to UTF-8).
Control Sequences are non-printing characters or sequences of characters.
Special Sequences are a subset of the Control Sequences, and include CSI
SGR sequences which can be used to change rendered appearance of text, and
OSC hyperlinks. See fansi
for details.
When computing substrings, Normal (non-control) characters are considered to occupy positions in strings, whereas Control Sequences occupy the interstices between them. The string:
"hello-\033[31mworld\033[m!"
is interpreted as:
1 1 1 1 2 3 4 5 6 7 8 9 0 1 2 h e l l o -|w o r l d|! ^ ^ \033[31m \033[m
start
and stop
reference character positions so they never explicitly
select for the interstitial Control Sequences. The latter are implicitly
selected if they appear in interstices after the first character and before
the last. Additionally, because Special Sequences (CSI SGR and OSC
hyperlinks) affect all subsequent characters in a string, any active Special
Sequence, whether opened just before a character or much before, will be
reflected in the state fansi
prepends to the beginning of each substring.
It is possible to select Control Sequences at the end of a string by
specifying stop
values past the end of the string, although for Special
Sequences this only produces visible results if terminate
is set to
FALSE
. Similarly, it is possible to select Control Sequences preceding
the beginning of a string by specifying start
values less than one,
although as noted earlier this is unnecessary for Special Sequences as
those are output by fansi
before each substring.
Because exact substrings on anything other than character count cannot be
guaranteed (e.g. as a result of multi-byte encodings, or double display-width
characters) substr2_ctl
must make assumptions on how to resolve provided
start
/stop
values that are infeasible and does so via the round
parameter.
If we use "start" as the round
value, then any time the start
value corresponds to the middle of a multi-byte or a wide character, then
that character is included in the substring, while any similar partially
included character via the stop
is left out. The converse is true if we
use "stop" as the round
value. "neither" would cause all partial
characters to be dropped irrespective whether they correspond to start
or
stop
, and "both" could cause all of them to be included. See examples.
A number of Normal characters such as combining diacritic marks have
reported width of zero. These are typically displayed overlaid on top of the
preceding glyph, as in the case of "e\u301"
forming "e" with an acute
accent. Unlike Control Sequences, which also have reported width of zero,
fansi
groups zero-width Normal characters with the last preceding
non-zero width Normal character. This is incorrect for some rare
zero-width Normal characters such as prepending marks (see "Output
Stability" and "Graphemes").
Several factors could affect the exact output produced by fansi
functions across versions of fansi
, R
, and/or across systems.
In general it is best not to rely on exact fansi
output, e.g. by
embedding it in tests.
Width and grapheme calculations depend on locale, Unicode database
version, and grapheme processing logic (which is still in development), among
other things. For the most part fansi
(currently) uses the internals of
base::nchar(type='width')
, but there are exceptions and this may change in
the future.
How a particular display format is encoded in Control Sequences is
not guaranteed to be stable across fansi
versions. Additionally, which
Special Sequences are re-encoded vs transcribed untouched may change.
In general we will strive to keep the rendered appearance stable.
To maximize the odds of getting stable output set normalize_state
to
TRUE
and type
to "chars"
in functions that allow it, and
set term.cap
to a specific set of capabilities.
Semantics for replacement functions have the additional requirement that the
result appear as if it is the input modified in place between the positions
designated by start
and stop
. terminate
only affects the boundaries
between the original substring and the spliced one, normalize
only affects
the same boundaries, and tabs.as.spaces
only affects value
, and x
must
be ASCII only or marked "UTF-8".
terminate = FALSE
only makes sense in replacement mode if only one of x
or value
contains Control Sequences. fansi
will not account for any
interactions of state in x
and value
.
The carry
parameter causes state to carry within the original string and
the replacement values independently, as if they were columns of text cut
from different pages and pasted together. String values for carry
are
disallowed in replacement mode as it is ambiguous which of x
or value
they would modify (see examples).
When in type = 'width'
mode, it is only guaranteed that the result will be
no wider than the original x
. Narrower strings may result if a mixture
of narrow and wide graphemes cannot be replaced exactly with the same width
value, possibly because the provided start
and stop
values (or the
implicit ones generated for value
) do not align with grapheme boundaries.
fansi
approximates grapheme widths and counts by using heuristics for
grapheme breaks that work for most common graphemes, including emoji
combining sequences. The heuristic is known to work incorrectly with
invalid combining sequences, prepending marks, and sequence interruptors.
fansi
does not provide a full implementation of grapheme break detection to
avoid carrying a copy of the Unicode grapheme breaks table, and also because
the hope is that R will add the feature eventually itself.
The utf8
package provides a
conforming grapheme parsing implementation.
fansi
is unaware of text directionality and operates as if all strings are
left to right (LTR). Using fansi
function with strings that contain mixed
direction scripts (i.e. both LTR and RTL) may produce undesirable results.
Non-ASCII strings are converted to and returned in UTF-8 encoding. Width calculations will not work properly in R < 3.2.2.
If stop
< start
, the return value is always an empty string.
?fansi
for details on how Control Sequences are
interpreted, particularly if you are getting unexpected results,
normalize_state
for more details on what the normalize
parameter does,
state_at_end
to compute active state at the end of strings,
close_state
to compute the sequence required to close active state.
substr_ctl("\033[42mhello\033[m world", 1, 9) substr_ctl("\033[42mhello\033[m world", 3, 9) ## Positions 2 and 4 are in the middle of the full width W (\uFF37) for ## the `start` and `stop` positions respectively. Use `round` ## to control result: x <- "\uFF37n\uFF37" x substr2_ctl(x, 2, 4, type='width', round='start') substr2_ctl(x, 2, 4, type='width', round='stop') substr2_ctl(x, 2, 4, type='width', round='neither') substr2_ctl(x, 2, 4, type='width', round='both') ## We can specify which escapes are considered special: substr_ctl("\033[31mhello\tworld", 1, 6, ctl='sgr', warn=FALSE) substr_ctl("\033[31mhello\tworld", 1, 6, ctl=c('all', 'c0'), warn=FALSE) ## `carry` allows SGR to carry from one element to the next substr_ctl(c("\033[33mhello", "world"), 1, 3) substr_ctl(c("\033[33mhello", "world"), 1, 3, carry=TRUE) substr_ctl(c("\033[33mhello", "world"), 1, 3, carry="\033[44m") ## We can omit the termination bleed <- substr_ctl(c("\033[41mhello", "world"), 1, 3, terminate=FALSE) writeLines(bleed) # Style will bleed out of string end <- "\033[0m\n" writeLines(end) # Stanch bleeding ## Trailing sequences omitted unless `stop` past end. substr_ctl("ABC\033[42m", 1, 3, terminate=FALSE) substr_ctl("ABC\033[42m", 1, 4, terminate=FALSE) ## Replacement functions x0<- x1 <- x2 <- x3 <- c("\033[42mABC", "\033[34mDEF") substr_ctl(x1, 2, 2) <- "_" substr_ctl(x2, 2, 2) <- "\033[m_" substr_ctl(x3, 2, 2) <- "\033[45m_" writeLines(c(x0, end, x1, end, x2, end, x3, end)) ## With `carry = TRUE` strings look like original x0<- x1 <- x2 <- x3 <- c("\033[42mABC", "\033[34mDEF") substr_ctl(x0, 2, 2, carry=TRUE) <- "_" substr_ctl(x1, 2, 2, carry=TRUE) <- "\033[m_" substr_ctl(x2, 2, 2, carry=TRUE) <- "\033[45m_" writeLines(c(x0, end, x1, end, x2, end, x3, end)) ## Work-around to specify carry strings in replacement mode x <- c("ABC", "DEF") val <- "#" x2 <- c("\033[42m", x) val2 <- c("\033[45m", rep_len(val, length(x))) substr_ctl(x2, 2, 2, carry=TRUE) <- val2 (x <- x[-1])
substr_ctl("\033[42mhello\033[m world", 1, 9) substr_ctl("\033[42mhello\033[m world", 3, 9) ## Positions 2 and 4 are in the middle of the full width W (\uFF37) for ## the `start` and `stop` positions respectively. Use `round` ## to control result: x <- "\uFF37n\uFF37" x substr2_ctl(x, 2, 4, type='width', round='start') substr2_ctl(x, 2, 4, type='width', round='stop') substr2_ctl(x, 2, 4, type='width', round='neither') substr2_ctl(x, 2, 4, type='width', round='both') ## We can specify which escapes are considered special: substr_ctl("\033[31mhello\tworld", 1, 6, ctl='sgr', warn=FALSE) substr_ctl("\033[31mhello\tworld", 1, 6, ctl=c('all', 'c0'), warn=FALSE) ## `carry` allows SGR to carry from one element to the next substr_ctl(c("\033[33mhello", "world"), 1, 3) substr_ctl(c("\033[33mhello", "world"), 1, 3, carry=TRUE) substr_ctl(c("\033[33mhello", "world"), 1, 3, carry="\033[44m") ## We can omit the termination bleed <- substr_ctl(c("\033[41mhello", "world"), 1, 3, terminate=FALSE) writeLines(bleed) # Style will bleed out of string end <- "\033[0m\n" writeLines(end) # Stanch bleeding ## Trailing sequences omitted unless `stop` past end. substr_ctl("ABC\033[42m", 1, 3, terminate=FALSE) substr_ctl("ABC\033[42m", 1, 4, terminate=FALSE) ## Replacement functions x0<- x1 <- x2 <- x3 <- c("\033[42mABC", "\033[34mDEF") substr_ctl(x1, 2, 2) <- "_" substr_ctl(x2, 2, 2) <- "\033[m_" substr_ctl(x3, 2, 2) <- "\033[45m_" writeLines(c(x0, end, x1, end, x2, end, x3, end)) ## With `carry = TRUE` strings look like original x0<- x1 <- x2 <- x3 <- c("\033[42mABC", "\033[34mDEF") substr_ctl(x0, 2, 2, carry=TRUE) <- "_" substr_ctl(x1, 2, 2, carry=TRUE) <- "\033[m_" substr_ctl(x2, 2, 2, carry=TRUE) <- "\033[45m_" writeLines(c(x0, end, x1, end, x2, end, x3, end)) ## Work-around to specify carry strings in replacement mode x <- c("ABC", "DEF") val <- "#" x2 <- c("\033[42m", x) val2 <- c("\033[45m", rep_len(val, length(x))) substr_ctl(x2, 2, 2, carry=TRUE) <- val2 (x <- x[-1])
Finds horizontal tab characters (0x09) in a string and replaces them with the spaces that produce the same horizontal offset.
tabs_as_spaces( x, tab.stops = getOption("fansi.tab.stops", 8L), warn = getOption("fansi.warn", TRUE), ctl = "all" )
tabs_as_spaces( x, tab.stops = getOption("fansi.tab.stops", 8L), warn = getOption("fansi.warn", TRUE), ctl = "all" )
x |
character vector or object coercible to character; any tabs therein will be replaced. |
tab.stops |
integer(1:n) indicating position of tab stops to use when converting tabs to spaces. If there are more tabs in a line than defined tab stops the last tab stop is re-used. For the purposes of applying tab stops, each input line is considered a line and the character count begins from the beginning of the input line. |
warn |
TRUE (default) or FALSE, whether to warn when potentially
problematic Control Sequences are encountered. These could cause the
assumptions |
ctl |
character, which Control Sequences should be treated
specially. Special treatment is context dependent, and may include
detecting them and/or computing their display/character width as zero. For
the SGR subset of the ANSI CSI sequences, and OSC hyperlinks,
|
Since we do not know of a reliable cross platform means of detecting tab stops you will need to provide them yourself if you are using anything outside of the standard tab stop every 8 characters that is the default.
character, x
with tabs replaced by spaces, with elements
possibly converted to UTF-8.
Non-ASCII strings are converted to and returned in UTF-8 encoding. The
ctl
parameter only affects which Control Sequences are considered zero
width. Tabs will always be converted to spaces, irrespective of the ctl
setting.
?fansi
for details on how Control Sequences are
interpreted, particularly if you are getting unexpected results,
unhandled_ctl
for detecting bad control sequences.
string <- '1\t12\t123\t1234\t12345678' tabs_as_spaces(string) writeLines( c( '-------|-------|-------|-------|-------|', tabs_as_spaces(string) ) ) writeLines( c( '-|--|--|--|--|--|--|--|--|--|--|', tabs_as_spaces(string, tab.stops=c(2, 3)) ) ) writeLines( c( '-|--|-------|-------|-------|', tabs_as_spaces(string, tab.stops=c(2, 3, 8)) ) )
string <- '1\t12\t123\t1234\t12345678' tabs_as_spaces(string) writeLines( c( '-------|-------|-------|-------|-------|', tabs_as_spaces(string) ) ) writeLines( c( '-|--|--|--|--|--|--|--|--|--|--|', tabs_as_spaces(string, tab.stops=c(2, 3)) ) ) writeLines( c( '-|--|-------|-------|-------|', tabs_as_spaces(string, tab.stops=c(2, 3, 8)) ) )
Outputs ANSI CSI SGR formatted text to screen so that you may visually inspect what color capabilities your terminal supports.
term_cap_test()
term_cap_test()
The three tested terminal capabilities are:
"bright" for bright colors with SGR codes in 90-97 and 100-107
"256" for colors defined by "38;5;x" and "48;5;x" where x is in 0-255
"truecolor" for colors defined by "38;2;x;y;z" and "48;x;y;x" where x, y, and z are in 0-255
Each of the color capabilities your terminal supports should be displayed with a blue background and a red foreground. For reference the corresponding CSI SGR sequences are displayed as well.
You should compare the screen output from this function to
getOption('fansi.term.cap', dflt_term_cap)
to ensure that they are self
consistent.
By default fansi
assumes terminals support bright and 256 color
modes, and also tests for truecolor support via the $COLORTERM system
variable.
Functions with the term.cap
parameter like substr_ctl
will warn if they
encounter 256 or true color SGR sequences and term.cap
indicates they are
unsupported as such a terminal may misinterpret those sequences. Bright
codes and OSC hyperlinks in terminals that do not support them will likely be
silently ignored, so fansi
functions do not warn about those.
character the test vector, invisibly
term_cap_test()
term_cap_test()
Interprets CSI SGR sequences and OSC hyperlinks to produce strings with
the state reproduced with SPAN elements, inline CSS styles, and A anchors.
Optionally for colors, the SPAN elements may be assigned classes instead of
inline styles, in which case it is the user's responsibility to provide a
style sheet. Input that contains special HTML characters ("<", ">", "&",
"'", and "\"") likely should be escaped with html_esc
, and to_html
will
warn if it encounters the first two.
to_html( x, warn = getOption("fansi.warn", TRUE), term.cap = getOption("fansi.term.cap", dflt_term_cap()), classes = FALSE, carry = getOption("fansi.carry", TRUE) )
to_html( x, warn = getOption("fansi.warn", TRUE), term.cap = getOption("fansi.term.cap", dflt_term_cap()), classes = FALSE, carry = getOption("fansi.carry", TRUE) )
x |
a character vector or object that can be coerced to such. |
warn |
TRUE (default) or FALSE, whether to warn when potentially
problematic Control Sequences are encountered. These could cause the
assumptions |
term.cap |
character a vector of the capabilities of the terminal, can
be any combination of "bright" (SGR codes 90-97, 100-107), "256" (SGR codes
starting with "38;5" or "48;5"), "truecolor" (SGR codes starting with
"38;2" or "48;2"), and "all". "all" behaves as it does for the |
classes |
FALSE (default), TRUE, or character vector of either 16, 32, or 512 class names. Character strings may only contain ASCII characters corresponding to letters, numbers, the hyphen, or the underscore. It is the user's responsibility to provide values that are legal class names.
|
carry |
TRUE, FALSE (default), or a scalar string, controls whether to
interpret the character vector as a "single document" (TRUE or string) or
as independent elements (FALSE). In "single document" mode, active state
at the end of an input element is considered active at the beginning of the
next vector element, simulating what happens with a document with active
state at the end of a line. If FALSE each vector element is interpreted as
if there were no active state when it begins. If character, then the
active state at the end of the |
Only "observable" formats are translated. These include colors, background-colors, and basic styles (CSI SGR codes 1-6, 8, 9). Style 7, the "inverse" style, is implemented by explicitly switching foreground and background colors, if there are any. Styles 5-6 (blink) are rendered as "text-decoration" but likely will do nothing in the browser. Style 8 (conceal) sets the color to transparent.
Parameters in OSC sequences are not copied over as they might have different semantics in the OSC sequences than they would in HTML (e.g. the "id" parameter is intended to be non-unique in OSC).
Each element of the input vector is translated into a stand-alone valid HTML
string. In particular, any open tags generated by fansi
are closed at the
end of an element and re-opened on the subsequent element with the same
style. This allows safe combination of HTML translated strings, for example
by paste
ing them together. The trade-off is that there may be redundant
HTML produced. To reduce redundancy you can first collapse the input vector
into one string, being mindful that very large strings may exceed maximum
string size when converted to HTML.
fansi
-opened tags are closed and new ones open anytime the "observable"
state changes. to_html
never produces nested tags, even if at times
that might produce more compact output. While it would be possible to
match a CSI/OSC encoded state with nested tags, it would increase the
complexity of the code substantially for little gain.
A character vector of the same length as x
with all escape
sequences removed and any basic ANSI CSI SGR escape sequences applied via
SPAN HTML tags.
Non-ASCII strings are converted to and returned in UTF-8 encoding.
to_html
always terminates as not doing so produces
invalid HTML. If you wish for the last active SPAN to bleed into
subsequent text you may do so with e.g. sub("</span>(?:</a>)?$", "", x)
or similar. Additionally, unlike other functions, the default is
carry = TRUE
for compatibility with semantics of prior versions of
fansi
.
Other HTML functions:
html_esc()
,
in_html()
,
make_styles()
to_html("hello\033[31;42;1mworld\033[m") to_html("hello\033[31;42;1mworld\033[m", classes=TRUE) ## Input contains HTML special chars x <- "<hello \033[42m'there' \033[34m &\033[m \"moon\"!" writeLines(x) ## Not run: in_html( c( to_html(html_esc(x)), # Good to_html(x) # Bad (warning)! ) ) ## End(Not run) ## Generate some class names for basic colors classes <- expand.grid( "myclass", c("fg", "bg"), c("black", "red", "green", "yellow", "blue", "magenta", "cyan", "white") ) classes # order is important! classes <- do.call(paste, c(classes, sep="-")) ## We only provide 16 classes, so Only basic colors are ## mapped to classes; others styled inline. to_html( "\033[94mhello\033[m \033[31;42;1mworld\033[m", classes=classes ) ## Create a whole web page with a style sheet for 256 colors and ## the colors shown in a table. class.256 <- do.call(paste, c(expand.grid(c("fg", "bg"), 0:255), sep="-")) sgr.256 <- sgr_256() # A demo of all 256 colors writeLines(sgr.256[1:8]) # SGR formatting ## Convert to HTML using classes instead of inline styles: html.256 <- to_html(sgr.256, classes=class.256) writeLines(html.256[1]) # No inline colors ## Generate different style sheets. See `?make_styles` for details. default <- make_styles(class.256) mix <- matrix(c(.6,.2,.2, .2,.6,.2, .2,.2,.6), 3) desaturated <- make_styles(class.256, mix) writeLines(default[1:4]) writeLines(desaturated[1:4]) ## Embed in HTML page and diplay; only CSS changing ## Not run: in_html(html.256) # no CSS in_html(html.256, css=default) # default CSS in_html(html.256, css=desaturated) # desaturated CSS ## End(Not run)
to_html("hello\033[31;42;1mworld\033[m") to_html("hello\033[31;42;1mworld\033[m", classes=TRUE) ## Input contains HTML special chars x <- "<hello \033[42m'there' \033[34m &\033[m \"moon\"!" writeLines(x) ## Not run: in_html( c( to_html(html_esc(x)), # Good to_html(x) # Bad (warning)! ) ) ## End(Not run) ## Generate some class names for basic colors classes <- expand.grid( "myclass", c("fg", "bg"), c("black", "red", "green", "yellow", "blue", "magenta", "cyan", "white") ) classes # order is important! classes <- do.call(paste, c(classes, sep="-")) ## We only provide 16 classes, so Only basic colors are ## mapped to classes; others styled inline. to_html( "\033[94mhello\033[m \033[31;42;1mworld\033[m", classes=classes ) ## Create a whole web page with a style sheet for 256 colors and ## the colors shown in a table. class.256 <- do.call(paste, c(expand.grid(c("fg", "bg"), 0:255), sep="-")) sgr.256 <- sgr_256() # A demo of all 256 colors writeLines(sgr.256[1:8]) # SGR formatting ## Convert to HTML using classes instead of inline styles: html.256 <- to_html(sgr.256, classes=class.256) writeLines(html.256[1]) # No inline colors ## Generate different style sheets. See `?make_styles` for details. default <- make_styles(class.256) mix <- matrix(c(.6,.2,.2, .2,.6,.2, .2,.2,.6), 3) desaturated <- make_styles(class.256, mix) writeLines(default[1:4]) writeLines(desaturated[1:4]) ## Embed in HTML page and diplay; only CSS changing ## Not run: in_html(html.256) # no CSS in_html(html.256, css=default) # default CSS in_html(html.256, css=desaturated) # desaturated CSS ## End(Not run)
Removes any whitespace before the first and/or after the last non-Control
Sequence character. Unlike with the base::trimws
, only the default
whitespace
specification is supported.
trimws_ctl( x, which = c("both", "left", "right"), whitespace = "[ \t\r\n]", warn = getOption("fansi.warn", TRUE), term.cap = getOption("fansi.term.cap", dflt_term_cap()), ctl = "all", normalize = getOption("fansi.normalize", FALSE) )
trimws_ctl( x, which = c("both", "left", "right"), whitespace = "[ \t\r\n]", warn = getOption("fansi.warn", TRUE), term.cap = getOption("fansi.term.cap", dflt_term_cap()), ctl = "all", normalize = getOption("fansi.normalize", FALSE) )
x |
a character vector |
which |
a character string specifying whether to remove both
leading and trailing whitespace (default), or only leading
( |
whitespace |
must be set to the default value, in the future it may become possible to change this parameter. |
warn |
TRUE (default) or FALSE, whether to warn when potentially
problematic Control Sequences are encountered. These could cause the
assumptions |
term.cap |
character a vector of the capabilities of the terminal, can
be any combination of "bright" (SGR codes 90-97, 100-107), "256" (SGR codes
starting with "38;5" or "48;5"), "truecolor" (SGR codes starting with
"38;2" or "48;2"), and "all". "all" behaves as it does for the |
ctl |
character, which Control Sequences should be treated
specially. Special treatment is context dependent, and may include
detecting them and/or computing their display/character width as zero. For
the SGR subset of the ANSI CSI sequences, and OSC hyperlinks,
|
normalize |
TRUE or FALSE (default) whether SGR sequence should be
normalized out such that there is one distinct sequence for each SGR code.
normalized strings will occupy more space (e.g. "\033[31;42m" becomes
"\033[31m\033[42m"), but will work better with code that assumes each SGR
code will be in its own escape as |
The input with white space removed as described.
Control Sequences are non-printing characters or sequences of characters.
Special Sequences are a subset of the Control Sequences, and include CSI
SGR sequences which can be used to change rendered appearance of text, and
OSC hyperlinks. See fansi
for details.
Several factors could affect the exact output produced by fansi
functions across versions of fansi
, R
, and/or across systems.
In general it is best not to rely on exact fansi
output, e.g. by
embedding it in tests.
Width and grapheme calculations depend on locale, Unicode database
version, and grapheme processing logic (which is still in development), among
other things. For the most part fansi
(currently) uses the internals of
base::nchar(type='width')
, but there are exceptions and this may change in
the future.
How a particular display format is encoded in Control Sequences is
not guaranteed to be stable across fansi
versions. Additionally, which
Special Sequences are re-encoded vs transcribed untouched may change.
In general we will strive to keep the rendered appearance stable.
To maximize the odds of getting stable output set normalize_state
to
TRUE
and type
to "chars"
in functions that allow it, and
set term.cap
to a specific set of capabilities.
trimws_ctl(" \033[31m\thello world\t\033[39m ")
trimws_ctl(" \033[31m\thello world\t\033[39m ")
Will return position and types of unhandled Control Sequences in a
character vector. Unhandled sequences may cause fansi
to interpret strings
in a way different to your display. See fansi for details. Functions that
interpret Special Sequences (CSI SGR or OSC hyperlinks) might omit bad
Special Sequences or some of their components in output substrings,
particularly if they are leading or trailing. Some functions are more
tolerant of bad inputs than others. For example nchar_ctl
will not
report unsupported colors because it only cares about counts or widths.
unhandled_ctl
will report all potentially problematic sequences.
unhandled_ctl(x, term.cap = getOption("fansi.term.cap", dflt_term_cap()))
unhandled_ctl(x, term.cap = getOption("fansi.term.cap", dflt_term_cap()))
x |
character vector |
term.cap |
character a vector of the capabilities of the terminal, can
be any combination of "bright" (SGR codes 90-97, 100-107), "256" (SGR codes
starting with "38;5" or "48;5"), "truecolor" (SGR codes starting with
"38;2" or "48;2"), and "all". "all" behaves as it does for the |
To work around tabs present in input, you can use tabs_as_spaces
or the
tabs.as.spaces
parameter on functions that have it, or the strip_ctl
function to remove the troublesome sequences. Alternatively, you can use
warn=FALSE
to suppress the warnings.
This is a debugging function that is not optimized for speed and the precise
output of which might change with fansi
versions.
The return value is a data frame with five columns:
index: integer the index in x
with the unhandled sequence
start: integer the start position of the sequence (in characters)
stop: integer the end of the sequence (in characters), but note that if there are multiple ESC sequences abutting each other they will all be treated as one, even if some of those sequences are valid.
error: the reason why the sequence was not handled:
unknown-substring: SGR substring with a value that does not correspond to a known SGR code or OSC hyperlink with unsupported parameters.
invalid-substr: SGR contains uncommon characters in ":<=>", intermediate bytes, other invalid characters, or there is an invalid subsequence (e.g. "ESC[38;2m" which should specify an RGB triplet but does not). OSCs contain invalid bytes, or OSC hyperlinks contain otherwise valid OSC bytes in 0x08-0x0d.
exceed-term-cap: contains color codes not supported by the terminal (see term_cap_test). Bright colors with color codes in the 90-97 and 100-107 range in terminals that do not support them are not considered errors, whereas 256 or truecolor codes in terminals that do not support them are. This is because the latter are often misinterpreted by terminals that do not support them, whereas the former are typically silently ignored.
CSI/OSC: a non-SGR CSI sequence, or non-hyperlink OSC sequence.
CSI/OSC-bad-substr: a CSI or OSC sequence containing invalid characters.
malformed-CSI/OSC: a malformed CSI or OSC sequence, typically one that never encounters its closing sequence before the end of a string.
non-CSI/OSC: a non-CSI or non-OSC escape sequence, i.e. one where the ESC is followed by something other than "[" or "]". Since we assume all non-CSI sequences are only 2 characters long include the ESC, this type of sequence is the most likely to cause problems as some are not actually two characters long.
malformed-ESC: a malformed two byte ESC sequence (i.e. one not ending in 0x40-0x7e).
C0: a "C0" control character (e.g. tab, bell, etc.).
malformed-UTF8: illegal UTF8 encoding.
non-ASCII: non-ASCII bytes in escape sequences.
translated: whether the string was translated to UTF-8, might be helpful in
odd cases were character offsets change depending on encoding. You should
only worry about this if you cannot tie out the start
/stop
values to
the escape sequence shown.
esc: character the unhandled escape sequence
Data frame with as many rows as there are unhandled escape sequences and columns containing useful information for debugging the problem. See details.
Non-ASCII strings are converted to UTF-8 encoding.
?fansi
for details on how Control Sequences are
interpreted, particularly if you are getting unexpected results,
unhandled_ctl
for detecting bad control sequences.
string <- c( "\033[41mhello world\033[m", "foo\033[22>m", "\033[999mbar", "baz \033[31#3m", "a\033[31k", "hello\033m world" ) unhandled_ctl(string)
string <- c( "\033[41mhello world\033[m", "foo\033[22>m", "\033[999mbar", "baz \033[31#3m", "a\033[31k", "hello\033m world" ) unhandled_ctl(string)