# he [![Build status](https://travis-ci.org/mathiasbynens/he.svg?branch=master)](https://travis-ci.org/mathiasbynens/he) [![Code coverage status](https://codecov.io/github/mathiasbynens/he/coverage.svg?branch=master)](https://codecov.io/github/mathiasbynens/he?branch=master) [![Dependency status](https://gemnasium.com/mathiasbynens/he.svg)](https://gemnasium.com/mathiasbynens/he)

_he_ (for “HTML entities”) is a robust HTML entity encoder/decoder written in JavaScript. It supports [all standardized named character references as per HTML](https://html.spec.whatwg.org/multipage/syntax.html#named-character-references), handles [ambiguous ampersands](https://mathiasbynens.be/notes/ambiguous-ampersands) and other edge cases [just like a browser would](https://html.spec.whatwg.org/multipage/syntax.html#tokenizing-character-references), and has an extensive test suite. Contrary to many other JavaScript solutions, _he_ also handles astral Unicode symbols just fine. [An online demo is available.](https://mothereff.in/html-entities)

## Installation

Via [npm](https://www.npmjs.com/):

```bash
npm install he
```

Via [Bower](http://bower.io/):

```bash
bower install he
```

Via [Component](https://github.com/component/component):

```bash
component install mathiasbynens/he
```

In a browser:

```html
<script src="he.js"></script>
```

In [Node.js](https://nodejs.org/), [io.js](https://iojs.org/), [Narwhal](http://narwhaljs.org/), and [RingoJS](http://ringojs.org/):

```js
var he = require('he');
```

In [Rhino](http://www.mozilla.org/rhino/):

```js
load('he.js');
```

Using an AMD loader like [RequireJS](http://requirejs.org/):

```js
require(
  {
    'paths': {
      'he': 'path/to/he'
    }
  },
  ['he'],
  function(he) {
    console.log(he);
  }
);
```

## API

### `he.version`

A string representing the semantic version number.

### `he.encode(text, options)`

This function takes a string of text and, by default, encodes any symbols that aren’t printable ASCII symbols, as well as `&`, `<`, `>`, `"`, `'`, and `` ` ``, replacing them with character references.

```js
he.encode('foo © bar ≠ baz 𝌆 qux');
// → 'foo &#xA9; bar &#x2260; baz &#x1D306; qux'
```

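The default behaviour can be approximated with a short sketch. This is not _he_’s implementation: `encodeHexSketch` is a hypothetical helper that turns every code point outside printable ASCII (U+0020 to U+007E), plus the six unsafe symbols, into an uppercase hexadecimal reference. It ignores edge cases that _he_ handles, such as control characters and the invalid code points discussed in the next paragraph. Iterating with `for...of` walks the string by code point rather than by UTF-16 code unit, which is why the astral symbol 𝌆 becomes one `&#x1D306;` reference instead of two surrogate halves.

```js
// Hypothetical sketch of default hexadecimal encoding; not he's source.
const UNSAFE = new Set(['&', '<', '>', '"', "'", '`']);

function toHexReference(codePoint) {
  // 0xA9 -> '&#xA9;'
  return '&#x' + codePoint.toString(16).toUpperCase() + ';';
}

function encodeHexSketch(text) {
  let output = '';
  // for...of yields whole code points, so surrogate pairs stay together.
  for (const symbol of text) {
    const codePoint = symbol.codePointAt(0);
    const isPrintableAscii = codePoint >= 0x20 && codePoint <= 0x7E;
    output += isPrintableAscii && !UNSAFE.has(symbol)
      ? symbol
      : toHexReference(codePoint);
  }
  return output;
}

console.log(encodeHexSketch('foo © bar ≠ baz 𝌆 qux'));
// prints foo &#xA9; bar &#x2260; baz &#x1D306; qux
```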
As long as the input string contains [allowed code points](https://html.spec.whatwg.org/multipage/parsing.html#preprocessing-the-input-stream) only, the return value of this function is always valid HTML. Any [(invalid) code points that cannot be represented using a character reference](https://html.spec.whatwg.org/multipage/syntax.html#table-charref-overrides) in the input are not encoded:

```js
he.encode('foo \0 bar');
// → 'foo \0 bar'
```

However, enabling [the `strict` option](https://github.com/mathiasbynens/he#strict) causes invalid code points to throw an exception. With `strict` enabled, `he.encode` either throws (if the input contains invalid code points) or returns a string of valid HTML.

The `options` object is optional. It recognizes the following properties:

#### `useNamedReferences`

The default value for the `useNamedReferences` option is `false`. This means that `encode()` will not use any named character references (e.g. `&copy;`) in the output; hexadecimal escapes (e.g. `&#xA9;`) will be used instead. Set it to `true` to enable the use of named references.

**Note that if compatibility with older browsers is a concern, this option should remain disabled.**

```js
// Using the global default setting (defaults to `false`):
he.encode('foo © bar ≠ baz 𝌆 qux');
// → 'foo &#xA9; bar &#x2260; baz &#x1D306; qux'

// Passing an `options` object to `encode`, to explicitly disallow named references:
he.encode('foo © bar ≠ baz 𝌆 qux', {
  'useNamedReferences': false
});
// → 'foo &#xA9; bar &#x2260; baz &#x1D306; qux'

// Passing an `options` object to `encode`, to explicitly allow named references:
he.encode('foo © bar ≠ baz 𝌆 qux', {
  'useNamedReferences': true
});
// → 'foo &copy; bar &ne; baz &#x1D306; qux'
```

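A named-reference lookup with a numeric fallback can be sketched as follows. `NAMED` is a hypothetical five-entry excerpt (the HTML spec defines well over 2,000 named references), and `encodeNamedSketch` only illustrates the decision this option controls; it is not how _he_ builds or consults its tables.

```js
// Hypothetical excerpt of a code point -> named reference table.
const NAMED = new Map([
  [0x26, '&amp;'],
  [0x3C, '&lt;'],
  [0x3E, '&gt;'],
  [0xA9, '&copy;'],
  [0x2260, '&ne;'],
]);

function encodeSymbol(symbol, useNamedReferences) {
  const codePoint = symbol.codePointAt(0);
  if (useNamedReferences && NAMED.has(codePoint)) {
    return NAMED.get(codePoint); // prefer the name when one is known
  }
  // Fall back to a hexadecimal reference, which works for any code point.
  return '&#x' + codePoint.toString(16).toUpperCase() + ';';
}

function encodeNamedSketch(text, options = {}) {
  const useNamed = Boolean(options.useNamedReferences);
  let output = '';
  for (const symbol of text) {
    const codePoint = symbol.codePointAt(0);
    const passThrough = codePoint >= 0x20 && codePoint <= 0x7E &&
      !'&<>"\'`'.includes(symbol);
    output += passThrough ? symbol : encodeSymbol(symbol, useNamed);
  }
  return output;
}

console.log(encodeNamedSketch('foo © bar ≠ baz 𝌆 qux', { useNamedReferences: true }));
// prints foo &copy; bar &ne; baz &#x1D306; qux
```

A numeric reference is understood for any code point, whereas a named reference only works in a parser that knows that name, which is presumably why the note above recommends keeping the option disabled for older browsers.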
#### `decimal`

The default value for the `decimal` option is `false`. If the option is enabled, `encode` will generally use decimal escapes (e.g. `&#169;`) rather than hexadecimal escapes (e.g. `&#xA9;`). Apart from this substitution, the behavior when combined with other options stays the same. For example, if both `useNamedReferences` and `decimal` are enabled, named references (e.g. `&copy;`) are preferred over decimal escapes, and symbols without a named reference are encoded using decimal escapes.

```js
// Using the global default setting (defaults to `false`):
he.encode('foo © bar ≠ baz 𝌆 qux');
// → 'foo &#xA9; bar &#x2260; baz &#x1D306; qux'

// Passing an `options` object to `encode`, to explicitly disable decimal escapes:
he.encode('foo © bar ≠ baz 𝌆 qux', {
  'decimal': false
});
// → 'foo &#xA9; bar &#x2260; baz &#x1D306; qux'

// Passing an `options` object to `encode`, to explicitly enable decimal escapes:
he.encode('foo © bar ≠ baz 𝌆 qux', {
  'decimal': true
});
// → 'foo &#169; bar &#8800; baz &#119558; qux'

// Passing an `options` object to `encode`, to explicitly allow named references and decimal escapes:
he.encode('foo © bar ≠ baz 𝌆 qux', {
  'useNamedReferences': true,
  'decimal': true
});
// → 'foo &copy; bar &ne; baz &#119558; qux'
```

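The only difference between the two numeric forms is the radix used to print the code point. The hypothetical `numericReference` helper below prints both forms for the three symbols used in the examples.

```js
// Hypothetical helper: format a code point as a numeric character reference.
function numericReference(codePoint, decimal) {
  return decimal
    ? '&#' + codePoint.toString(10) + ';'
    : '&#x' + codePoint.toString(16).toUpperCase() + ';';
}

for (const symbol of ['©', '≠', '𝌆']) {
  const codePoint = symbol.codePointAt(0);
  console.log(symbol, numericReference(codePoint, false), numericReference(codePoint, true));
}
// prints:
// © &#xA9; &#169;
// ≠ &#x2260; &#8800;
// 𝌆 &#x1D306; &#119558;
```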
#### `encodeEverything`

The default value for the `encodeEverything` option is `false`. This means that `encode()` will not use any character references for printable ASCII symbols that don’t need escaping. Set it to `true` to encode every symbol in the input string. When set to `true`, this option takes precedence over `allowUnsafeSymbols` (i.e. setting the latter to `true` in such a case has no effect).

```js
// Using the global default setting (defaults to `false`):
he.encode('foo © bar ≠ baz 𝌆 qux');
// → 'foo &#xA9; bar &#x2260; baz &#x1D306; qux'

// Passing an `options` object to `encode`, to explicitly encode all symbols:
he.encode('foo © bar ≠ baz 𝌆 qux', {
  'encodeEverything': true
});
// → '&#x66;&#x6F;&#x6F;&#x20;&#xA9;&#x20;&#x62;&#x61;&#x72;&#x20;&#x2260;&#x20;&#x62;&#x61;&#x7A;&#x20;&#x1D306;&#x20;&#x71;&#x75;&#x78;'

// This setting can be combined with the `useNamedReferences` option:
he.encode('foo © bar ≠ baz 𝌆 qux', {
  'encodeEverything': true,
  'useNamedReferences': true
});
// → '&#x66;&#x6F;&#x6F;&#x20;&copy;&#x20;&#x62;&#x61;&#x72;&#x20;&ne;&#x20;&#x62;&#x61;&#x7A;&#x20;&#x1D306;&#x20;&#x71;&#x75;&#x78;'
```

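With this option the output grows considerably, because every code point, including each ASCII letter and space, becomes a reference of its own. A minimal sketch of that rule, using a hypothetical `encodeEverythingSketch` that ignores named references:

```js
// Hypothetical sketch: encode every code point as a hexadecimal reference.
function encodeEverythingSketch(text) {
  // Array.from splits a string by code point, keeping astral symbols whole.
  return Array.from(text, (symbol) =>
    '&#x' + symbol.codePointAt(0).toString(16).toUpperCase() + ';'
  ).join('');
}

console.log(encodeEverythingSketch('foo ©'));
// prints &#x66;&#x6F;&#x6F;&#x20;&#xA9;
```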
#### `strict`

The default value for the `strict` option is `false`. This means that `encode()` will encode any HTML text content you feed it, even if it contains symbols that cause [parse errors](https://html.spec.whatwg.org/multipage/parsing.html#preprocessing-the-input-stream). To throw an error when such invalid HTML is encountered, set the `strict` option to `true`. This option makes it possible to use _he_ as part of HTML parsers and HTML validators.

```js
// Using the global default setting (defaults to `false`, i.e. error-tolerant mode):
he.encode('\x01');
// → '&#x1;'

// Passing an `options` object to `encode`, to explicitly enable error-tolerant mode:
he.encode('\x01', {
  'strict': false
});
// → '&#x1;'

// Passing an `options` object to `encode`, to explicitly enable strict mode:
he.encode('\x01', {
  'strict': true
});
// → Parse error
```

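To show what kind of check strict mode implies, here is a hypothetical `assertValidSketch` that throws on the code points the HTML spec’s input-stream preprocessing flags as parse errors: controls other than ASCII whitespace, noncharacters, and lone surrogates. It also rejects U+0000 NULL, in line with the `\0` example earlier in this section. The exact set of code points _he_ rejects, and the wording of its error, may differ from this sketch.

```js
// Hypothetical check based on the spec's input-stream parse errors.
function isInvalidCodePoint(cp) {
  // C0 controls (including NULL) and the C1 range, minus ASCII whitespace.
  const isControl = cp <= 0x1F || (cp >= 0x7F && cp <= 0x9F);
  const isAsciiWhitespace = cp === 0x09 || cp === 0x0A || cp === 0x0C || cp === 0x0D;
  // U+FDD0..U+FDEF, plus U+xFFFE and U+xFFFF in every plane.
  const isNoncharacter = (cp >= 0xFDD0 && cp <= 0xFDEF) || (cp & 0xFFFE) === 0xFFFE;
  const isSurrogate = cp >= 0xD800 && cp <= 0xDFFF; // only reachable when unpaired
  return (isControl && !isAsciiWhitespace) || isNoncharacter || isSurrogate;
}

function assertValidSketch(text) {
  for (const symbol of text) {
    const cp = symbol.codePointAt(0);
    if (isInvalidCodePoint(cp)) {
      const hex = cp.toString(16).toUpperCase().padStart(4, '0');
      throw new Error('Parse error: invalid code point U+' + hex);
    }
  }
  return text;
}

try {
  assertValidSketch('\x01');
} catch (error) {
  console.log(error.message); // prints Parse error: invalid code point U+0001
}
```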
#### `allowUnsafeSymbols`

The default value for the `allowUnsafeSymbols` option is `false`. This means that characters that are unsafe for use in HTML content (`&`, `<`, `>`, `"`, `'`, and `` ` ``) will be encoded. When set to `true`, only non-ASCII characters will be encoded. If the `encodeEverything` option is set to `true`, this option will be ignored.

```js
he.encode('foo © and & ampersand', {
  'allowUnsafeSymbols': true
});
// → 'foo &#xA9; and & ampersand'
```

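The interplay between `allowUnsafeSymbols` and `encodeEverything` comes down to one decision per symbol. The hypothetical `shouldEncode` predicate below restates the rules from this section and the `encodeEverything` section; for brevity it leaves ASCII control characters unencoded, which is a simplification.

```js
// Hypothetical restatement of the encode options as a per-symbol predicate.
const UNSAFE_SYMBOLS = new Set(['&', '<', '>', '"', "'", '`']);

function shouldEncode(symbol, options = {}) {
  const { encodeEverything = false, allowUnsafeSymbols = false } = options;
  if (encodeEverything) {
    return true; // takes precedence: allowUnsafeSymbols is ignored
  }
  if (symbol.codePointAt(0) > 0x7F) {
    return true; // non-ASCII is always encoded
  }
  if (UNSAFE_SYMBOLS.has(symbol)) {
    return !allowUnsafeSymbols;
  }
  return false; // remaining ASCII, including controls in this simplified sketch
}

console.log(shouldEncode('&')); // prints true
console.log(shouldEncode('&', { allowUnsafeSymbols: true })); // prints false
console.log(shouldEncode('&', { allowUnsafeSymbols: true, encodeEverything: true })); // prints true
console.log(shouldEncode('©', { allowUnsafeSymbols: true })); // prints true
```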
#### Overriding default `encode` options globally

The global default setting can be overridden by modifying the `he.encode.options` object. This saves you from passing in an `options` object for every call to `encode` if you want to use a non-default setting.

```js
// Read the global default setting:
he.encode.options.useNamedReferences;
// → `false` by default

// Override the global default setting:
he.encode.options.useNamedReferences = true;

// Using the global default setting, which is now `true`:
he.encode('foo © bar ≠ baz 𝌆 qux');
// → 'foo &copy; bar &ne; baz &#x1D306; qux'
```

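Keeping defaults on a property of the function is a common JavaScript pattern. The sketch below uses a hypothetical `describeEncoding` function rather than `he.encode`, and assumes that options passed to a call take priority over the shared defaults object, which is how `Object.assign` merges them here.

```js
// Hypothetical stand-in for an API that keeps its defaults on the function.
function describeEncoding(options) {
  // Object.assign copies left to right, so per-call options override defaults;
  // an undefined `options` argument is simply skipped.
  const settings = Object.assign({}, describeEncoding.options, options);
  return settings.useNamedReferences ? 'named references' : 'hexadecimal escapes';
}
describeEncoding.options = { useNamedReferences: false };

console.log(describeEncoding()); // prints hexadecimal escapes
describeEncoding.options.useNamedReferences = true; // global override
console.log(describeEncoding()); // prints named references
console.log(describeEncoding({ useNamedReferences: false })); // prints hexadecimal escapes
```

Because Node.js caches modules loaded with `require`, such a defaults object is shared by every caller in the same process, so a global override also changes the behaviour seen by other code using the library. Passing an `options` object per call keeps the change local.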
### `he.decode(html, options)`

This function takes a string of HTML and decodes any named and numerical character references in it using [the algorithm described in section 12.2.4.69 of the HTML spec](https://html.spec.whatwg.org/multipage/syntax.html#tokenizing-character-references).

```js
he.decode('foo &copy; bar &ne; baz &#x1D306; qux');
// → 'foo © bar ≠ baz 𝌆 qux'
```

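Numeric references are the simpler half of decoding. The hypothetical `decodeNumericSketch` below handles only decimal and hexadecimal references that end in a semicolon, and applies three of the spec’s replacement rules: a zero value, a value above U+10FFFF, and a surrogate all decode to U+FFFD REPLACEMENT CHARACTER. It omits named references, references without a semicolon, and the spec’s remapping of 0x80 to 0x9F to Windows-1252 characters, all of which _he_ handles.

```js
// Hypothetical sketch: decode semicolon-terminated numeric references only.
function decodeNumericSketch(html) {
  return html.replace(/&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));/g, (match, hex, dec) => {
    const codePoint = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
    const isSurrogate = codePoint >= 0xD800 && codePoint <= 0xDFFF;
    if (codePoint === 0 || codePoint > 0x10FFFF || isSurrogate) {
      return '\uFFFD'; // U+FFFD REPLACEMENT CHARACTER
    }
    // String.fromCodePoint builds a surrogate pair for astral code points.
    return String.fromCodePoint(codePoint);
  });
}

console.log(decodeNumericSketch('foo &#xA9; bar &#8800; baz &#x1D306; qux'));
// prints foo © bar ≠ baz 𝌆 qux
```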
The `options` object is optional. It recognizes the following properties:

#### `isAttributeValue`

The default value for the `isAttributeValue` option is `false`. This means that `decode()` will decode the string as if it were used in [a text context in an HTML document](https://html.spec.whatwg.org/multipage/syntax.html#data-state). HTML has different rules for [parsing character references in attribute values](https://html.spec.whatwg.org/multipage/syntax.html#character-reference-in-attribute-value-state); set this option to `true` to treat the input string as if it were used as an attribute value.

```js
// Using the global default setting (defaults to `false`, i.e. HTML text context):
he.decode('foo&ampbar');
// → 'foo&bar'

// Passing an `options` object to `decode`, to explicitly assume an HTML text context:
he.decode('foo&ampbar', {
  'isAttributeValue': false
});
// → 'foo&bar'

// Passing an `options` object to `decode`, to explicitly assume an HTML attribute value context:
he.decode('foo&ampbar', {
  'isAttributeValue': true
});
// → 'foo&ampbar'
```

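The difference comes from one rule in the spec’s named character reference state. Once a name such as `amp` has matched, the reference is left undecoded if it sits in an attribute value, the match did not end in `;`, and the next character is `=` or an ASCII alphanumeric. The hypothetical `decodesInContext` helper restates only that decision; it assumes the longest matching name has already been found.

```js
// Hypothetical restatement of the attribute-value exception.
// matchedName: the reference text that matched, e.g. 'amp' or 'amp;'.
// nextChar: the character right after the match ('' at end of input).
function decodesInContext(matchedName, nextChar, isAttributeValue) {
  const endsWithSemicolon = matchedName.endsWith(';');
  const nextIsBlocking = nextChar === '=' || /^[A-Za-z0-9]$/.test(nextChar);
  if (isAttributeValue && !endsWithSemicolon && nextIsBlocking) {
    return false; // left as-is "for historical reasons", per the spec
  }
  return true;
}

// 'foo&ampbar': the name 'amp' matches and is followed by 'b'.
console.log(decodesInContext('amp', 'b', false)); // prints true
console.log(decodesInContext('amp', 'b', true)); // prints false
console.log(decodesInContext('amp;', 'b', true)); // prints true
```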
#### `strict`

The default value for the `strict` option is `false`. This means that `decode()` will decode any HTML text content you feed it, even if it contains entities that cause [parse errors](https://html.spec.whatwg.org/multipage/syntax.html#tokenizing-character-references). To throw an error when such invalid HTML is encountered, set the `strict` option to `true`. This option makes it possible to use _he_ as part of HTML parsers and HTML validators.

```js
// Using the global default setting (defaults to `false`, i.e. error-tolerant mode):
he.decode('foo&ampbar');
// → 'foo&bar'

// Passing an `options` object to `decode`, to explicitly enable error-tolerant mode:
he.decode('foo&ampbar', {
  'strict': false
});
// → 'foo&bar'

// Passing an `options` object to `decode`, to explicitly enable strict mode:
he.decode('foo&ampbar', {
  'strict': true
});
// → Parse error
```

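In a text context, a legacy reference such as `&amp` that is recognized without its trailing semicolon is what the spec calls a missing-semicolon-after-character-reference parse error; strict mode turns such errors into exceptions. A hypothetical `findMissingSemicolons` sketch of that one check, using a tiny illustrative `LEGACY_NAMES` list in place of the spec’s full table:

```js
// Hypothetical excerpt of names HTML still recognizes without a semicolon.
const LEGACY_NAMES = ['amp', 'lt', 'gt', 'quot', 'copy', 'nbsp'];

function findMissingSemicolons(html) {
  const errors = [];
  let index = html.indexOf('&');
  while (index !== -1) {
    const rest = html.slice(index + 1);
    // The longest matching name wins, as in the spec's tokenizer.
    const name = LEGACY_NAMES
      .filter((candidate) => rest.startsWith(candidate))
      .sort((a, b) => b.length - a.length)[0];
    if (name && rest[name.length] !== ';') {
      errors.push({ index, name });
    }
    index = html.indexOf('&', index + 1);
  }
  return errors;
}

console.log(findMissingSemicolons('foo&ampbar')); // prints [ { index: 3, name: 'amp' } ]
console.log(findMissingSemicolons('foo&amp;bar')); // prints []
```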
#### Overriding default `decode` options globally

The global default settings for the `decode` function can be overridden by modifying the `he.decode.options` object. This saves you from passing in an `options` object for every call to `decode` if you want to use a non-default setting.

```js
// Read the global default setting:
he.decode.options.isAttributeValue;
// → `false` by default

// Override the global default setting:
he.decode.options.isAttributeValue = true;

// Using the global default setting, which is now `true`:
he.decode('foo&ampbar');
// → 'foo&ampbar'
```

### `he.escape(text)`

This function takes a string of text and escapes it for use in text contexts in XML or HTML documents. Only the following characters are escaped: `&`, `<`, `>`, `"`, `'`, and `` ` ``.

```js
he.escape('<img src=\'x\' onerror="prompt(1)">');
// → '&lt;img src=&#x27;x&#x27; onerror=&quot;prompt(1)&quot;&gt;'
```

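Because escaping touches only those six characters, it can be sketched as a single regular-expression replacement over a lookup table. The hypothetical `escapeSketch` below is not _he_’s source, and the exact reference strings chosen here (such as `&#x27;` for the apostrophe and `&#x60;` for the backtick) are assumptions; check `he.escape`’s actual output if you depend on the exact form.

```js
// Hypothetical lookup table for the six escaped characters.
const ESCAPE_MAP = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#x27;',
  '`': '&#x60;',
};

function escapeSketch(text) {
  // Every other character, including non-ASCII, is left untouched.
  return text.replace(/[&<>"'`]/g, (symbol) => ESCAPE_MAP[symbol]);
}

console.log(escapeSketch('<img src=\'x\' onerror="prompt(1)">'));
// prints &lt;img src=&#x27;x&#x27; onerror=&quot;prompt(1)&quot;&gt;
```

Since only these six characters are replaced, non-ASCII text such as `©` passes through unchanged; that is the main practical difference from `he.encode` with its default options.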
### `he.unescape(html, options)`

`he.unescape` is an alias for `he.decode`. It takes a string of HTML and decodes any named and numerical character references in it.

### Using the `he` binary

To use the `he` binary in your shell, simply install _he_ globally using npm:

```bash
npm install -g he
```

After that, you will be able to encode/decode HTML entities from the command line:

```bash
$ he --encode 'föo ♥ bår 𝌆 baz'
f&#xF6;o &#x2665; b&#xE5;r &#x1D306; baz

$ he --encode --use-named-refs 'föo ♥ bår 𝌆 baz'
f&ouml;o &hearts; b&aring;r &#x1D306; baz

$ he --decode 'f&ouml;o &hearts; b&aring;r &#x1D306; baz'
föo ♥ bår 𝌆 baz
```

Read a local text file, encode it for use in an HTML text context, and save the result to a new file:

```bash
$ he --encode < foo.txt > foo-escaped.html
```

Or do the same with an online text file:

```bash
$ curl -sL "http://git.io/HnfEaw" | he --encode > escaped.html
```

Or, the opposite: read a local file containing a snippet of HTML in a text context, decode it back to plain text, and save the result to a new file:

```bash
$ he --decode < foo-escaped.html > foo.txt
```

Or do the same with an online HTML snippet:

```bash
$ curl -sL "http://git.io/HnfEaw" | he --decode > decoded.txt
```

See `he --help` for the full list of options.

## Support

_he_ has been tested in at least:

* Chrome 27–50
* Firefox 3–45
* Safari 4–9
* Opera 10–12, 15–37
* IE 6–11
* Edge
* Narwhal 0.3.2
* Node.js v0.10, v0.12, v4, v5
* PhantomJS 1.9.0
* Rhino 1.7RC4
* RingoJS 0.8–0.11

## Unit tests & code coverage

After cloning this repository, run `npm install` to install the dependencies needed for _he_ development and testing. You may want to install Istanbul _globally_ using `npm install istanbul -g`.

Once that’s done, you can run the unit tests in Node using `npm test` or `node tests/tests.js`. To run the tests in Rhino, Ringo, Narwhal, and web browsers as well, use `grunt test`.

To generate the code coverage report, use `grunt cover`.

|
---|
| 367 | ## Acknowledgements
|
---|
| 368 |
|
---|
| 369 | Thanks to [Simon Pieters](https://simon.html5.org/) ([@zcorpan](https://twitter.com/zcorpan)) for the many suggestions.
|
---|
| 370 |
|
---|
| 371 | ## Author
|
---|
| 372 |
|
---|
| 373 | | [![twitter/mathias](https://gravatar.com/avatar/24e08a9ea84deb17ae121074d0f17125?s=70)](https://twitter.com/mathias "Follow @mathias on Twitter") |
|
---|
| 374 | |---|
|
---|
| 375 | | [Mathias Bynens](https://mathiasbynens.be/) |
|
---|
| 376 |
|
---|
| 377 | ## License
|
---|
| 378 |
|
---|
| 379 | _he_ is available under the [MIT](https://mths.be/mit) license.
|
---|