# Regenerate [![Build status](https://travis-ci.org/mathiasbynens/regenerate.svg?branch=master)](https://travis-ci.org/mathiasbynens/regenerate) [![Code coverage status](https://img.shields.io/codecov/c/github/mathiasbynens/regenerate.svg)](https://codecov.io/gh/mathiasbynens/regenerate)

_Regenerate_ is a Unicode-aware regex generator for JavaScript. It allows you to easily generate ES5-compatible regular expressions based on a given set of Unicode symbols or code points. (This is trickier than you might think, because of [how JavaScript deals with astral symbols](https://mathiasbynens.be/notes/javascript-unicode).)

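
To illustrate why astral symbols are tricky, here is a minimal sketch in plain JavaScript (it does not use Regenerate). The variable names are only for illustration. It shows that an astral symbol is stored as two code units, so a naive character class cannot match it as a whole in an ES5-style regular expression.

```js
// U+1D306 is stored as two UTF-16 code units (a surrogate pair).
var symbol = '\uD834\uDF06';

var codeUnits = symbol.length;
// → 2

// Without the `u` flag, a character class matches a single code unit,
// so this pattern cannot match the two-unit symbol as a whole.
var naiveMatch = /^[\uD834\uDF06]$/.test(symbol);
// → false

// Spelling out the surrogate pair as a sequence works in ES5.
var pairMatch = /^\uD834\uDF06$/.test(symbol);
// → true
```

Writing such sequences by hand for whole ranges of code points gets tedious quickly, which is the kind of work Regenerate automates.
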
## Installation

Via [npm](https://npmjs.org/):

```bash
npm install regenerate
```

Via [Bower](http://bower.io/):

```bash
bower install regenerate
```

In a browser:

```html
<script src="regenerate.js"></script>
```

In [Node.js](https://nodejs.org/), [io.js](https://iojs.org/), and [RingoJS ≥ v0.8.0](http://ringojs.org/):

```js
var regenerate = require('regenerate');
```

In [Narwhal](http://narwhaljs.org/) and [RingoJS ≤ v0.7.0](http://ringojs.org/):

```js
var regenerate = require('regenerate').regenerate;
```

In [Rhino](http://www.mozilla.org/rhino/):

```js
load('regenerate.js');
```

Using an AMD loader like [RequireJS](http://requirejs.org/):

```js
require(
  {
    'paths': {
      'regenerate': 'path/to/regenerate'
    }
  },
  ['regenerate'],
  function(regenerate) {
    console.log(regenerate);
  }
);
```

## API

### `regenerate(value1, value2, value3, ...)`

The main Regenerate function. Calling this function creates a new set with a chainable API.

```js
var set = regenerate()
  .addRange(0x60, 0x69) // add U+0060 to U+0069
  .remove(0x62, 0x64) // remove U+0062 and U+0064
  .add(0x1D306); // add U+1D306
set.valueOf();
// → [0x60, 0x61, 0x63, 0x65, 0x66, 0x67, 0x68, 0x69, 0x1D306]
set.toString();
// → '[`ace-i]|\\uD834\\uDF06'
set.toRegExp();
// → /[`ace-i]|\uD834\uDF06/
```

Any arguments passed to `regenerate()` will be added to the set right away. Both code points (numbers) and symbols (strings consisting of a single Unicode symbol) are accepted, as well as arrays containing values of these types.

```js
regenerate(0x1D306, 'A', '©', 0x2603).toString();
// → '[A\\xA9\\u2603]|\\uD834\\uDF06'

var items = [0x1D306, 'A', '©', 0x2603];
regenerate(items).toString();
// → '[A\\xA9\\u2603]|\\uD834\\uDF06'
```

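
The `\uD834\uDF06` sequence in the output above is the UTF-16 surrogate pair for U+1D306. As a rough sketch of the arithmetic behind it, the standard UTF-16 formula can be written as follows. The helpers `toSurrogatePair` and `toEscape` are hypothetical names for this example and are not part of Regenerate's API.

```js
// Hypothetical helper: split an astral code point (U+10000 to U+10FFFF)
// into its high and low surrogate code units.
function toSurrogatePair(codePoint) {
  var offset = codePoint - 0x10000;
  var high = 0xD800 + (offset >> 10); // top 10 bits of the offset
  var low = 0xDC00 + (offset & 0x3FF); // bottom 10 bits of the offset
  return [high, low];
}

// Hypothetical helper: format a surrogate code unit as a `\uXXXX` escape.
function toEscape(codeUnit) {
  return '\\u' + codeUnit.toString(16).toUpperCase();
}

var surrogates = toSurrogatePair(0x1D306);
// → [0xD834, 0xDF06]
var escaped = surrogates.map(toEscape).join('');
// → '\\uD834\\uDF06'
```

This only covers a single code point; the sketch makes no claim about how Regenerate itself handles ranges or optimizes its output.
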
### `regenerate.prototype.add(value1, value2, value3, ...)`

Any arguments passed to `add()` are added to the set. Both code points (numbers) and symbols (strings consisting of a single Unicode symbol) are accepted, as well as arrays containing values of these types.

```js
regenerate().add(0x1D306, 'A', '©', 0x2603).toString();
// → '[A\\xA9\\u2603]|\\uD834\\uDF06'

var items = [0x1D306, 'A', '©', 0x2603];
regenerate().add(items).toString();
// → '[A\\xA9\\u2603]|\\uD834\\uDF06'
```

It’s also possible to pass in a Regenerate instance. Doing so adds all code points in that instance to the current set.

```js
var set = regenerate(0x1D306, 'A');
regenerate().add('©', 0x2603).add(set).toString();
// → '[A\\xA9\\u2603]|\\uD834\\uDF06'
```

Note that the initial call to `regenerate()` acts like `add()`. This allows you to create a new Regenerate instance and add some code points to it in one go:

```js
regenerate(0x1D306, 'A', '©', 0x2603).toString();
// → '[A\\xA9\\u2603]|\\uD834\\uDF06'
```

### `regenerate.prototype.remove(value1, value2, value3, ...)`

Any arguments passed to `remove()` are removed from the set. Both code points (numbers) and symbols (strings consisting of a single Unicode symbol) are accepted, as well as arrays containing values of these types.

```js
regenerate(0x1D306, 'A', '©', 0x2603).remove('☃').toString();
// → '[A\\xA9]|\\uD834\\uDF06'
```

It’s also possible to pass in a Regenerate instance. Doing so removes all code points in that instance from the current set.

```js
var set = regenerate('☃');
regenerate(0x1D306, 'A', '©', 0x2603).remove(set).toString();
// → '[A\\xA9]|\\uD834\\uDF06'
```

### `regenerate.prototype.addRange(start, end)`

Adds a range of code points from `start` to `end` (inclusive) to the set. Both code points (numbers) and symbols (strings consisting of a single Unicode symbol) are accepted.

```js
regenerate(0x1D306).addRange(0x00, 0xFF).toString();
// → '[\\0-\\xFF]|\\uD834\\uDF06'

regenerate().addRange('A', 'z').toString();
// → '[A-z]'
```

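
Because ranges are defined by code points, `'A'` to `'z'` also covers the punctuation between `Z` (U+005A) and `a` (U+0061). The following sketch uses hard-coded pattern strings so that it runs without the library; the variable names are only for illustration.

```js
// Pattern source reported above for `regenerate().addRange('A', 'z')`,
// hard-coded so this sketch runs without the library.
var wideRange = new RegExp('^[A-z]$');

var matchesLetter = wideRange.test('Q');
// → true
var matchesUnderscore = wideRange.test('_'); // U+005F lies between `Z` and `a`
// → true
var matchesBracket = wideRange.test('['); // so does U+005B
// → true

// A letters-only class keeps the two ranges separate.
var lettersOnly = new RegExp('^[A-Za-z]$');
var lettersMatchUnderscore = lettersOnly.test('_');
// → false
```

If only letters are wanted, adding the ranges separately, for example with `regenerate().addRange('A', 'Z').addRange('a', 'z')`, avoids the extra symbols.
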
### `regenerate.prototype.removeRange(start, end)`

Removes a range of code points from `start` to `end` (inclusive) from the set. Both code points (numbers) and symbols (strings consisting of a single Unicode symbol) are accepted.

```js
regenerate()
  .addRange(0x000000, 0x10FFFF) // add all Unicode code points
  .removeRange('A', 'z') // remove all symbols from `A` to `z`
  .toString();
// → '[\\0-@\\{-\\uD7FF\\uE000-\\uFFFF]|[\\uD800-\\uDBFF][\\uDC00-\\uDFFF]|[\\uD800-\\uDBFF](?![\\uDC00-\\uDFFF])|(?:[^\\uD800-\\uDBFF]|^)[\\uDC00-\\uDFFF]'

regenerate()
  .addRange(0x000000, 0x10FFFF) // add all Unicode code points
  .removeRange(0x0041, 0x007A) // remove all code points from U+0041 to U+007A
  .toString();
// → '[\\0-@\\{-\\uD7FF\\uE000-\\uFFFF]|[\\uD800-\\uDBFF][\\uDC00-\\uDFFF]|[\\uD800-\\uDBFF](?![\\uDC00-\\uDFFF])|(?:[^\\uD800-\\uDBFF]|^)[\\uDC00-\\uDFFF]'
```

### `regenerate.prototype.intersection(codePoints)`

Removes any code points from the set that are not present in both the set and the given `codePoints` array. `codePoints` must be an array of numeric code point values, i.e. numbers.

```js
regenerate()
  .addRange(0x00, 0xFF) // add the code points U+0000 to U+00FF
  .intersection([0x61, 0x69]) // remove all code points from the set except for these
  .toString();
// → '[ai]'
```

Instead of the `codePoints` array, it’s also possible to pass in a Regenerate instance.

```js
var whitelist = regenerate(0x61, 0x69);

regenerate()
  .addRange(0x00, 0xFF) // add the code points U+0000 to U+00FF
  .intersection(whitelist) // remove all code points from the set except for those in the `whitelist` set
  .toString();
// → '[ai]'
```

### `regenerate.prototype.contains(value)`

Returns `true` if the given value is part of the set, and `false` otherwise. Both code points (numbers) and symbols (strings consisting of a single Unicode symbol) are accepted.

```js
var set = regenerate().addRange(0x00, 0xFF);
set.contains('A');
// → true
set.contains(0x1D306);
// → false
```

### `regenerate.prototype.clone()`

Returns a clone of the current code point set. Any actions performed on the clone won’t mutate the original set.

```js
var setA = regenerate(0x1D306);
var setB = setA.clone().add(0x1F4A9);
setA.toArray();
// → [0x1D306]
setB.toArray();
// → [0x1D306, 0x1F4A9]
```

### `regenerate.prototype.toString(options)`

Returns a string representing (part of) a regular expression that matches all the symbols mapped to the code points within the set.

```js
regenerate(0x1D306, 0x1F4A9).toString();
// → '\\uD834\\uDF06|\\uD83D\\uDCA9'
```

If the `bmpOnly` property of the optional `options` object is set to `true`, the output matches surrogates individually, regardless of whether they’re lone surrogates or just part of a surrogate pair. This simplifies the output, but it should only be used if you’re certain the strings it will be used on don’t contain any astral symbols.

```js
var highSurrogates = regenerate().addRange(0xD800, 0xDBFF);
highSurrogates.toString();
// → '[\\uD800-\\uDBFF](?![\\uDC00-\\uDFFF])'
highSurrogates.toString({ 'bmpOnly': true });
// → '[\\uD800-\\uDBFF]'

var lowSurrogates = regenerate().addRange(0xDC00, 0xDFFF);
lowSurrogates.toString();
// → '(?:[^\\uD800-\\uDBFF]|^)[\\uDC00-\\uDFFF]'
lowSurrogates.toString({ 'bmpOnly': true });
// → '[\\uDC00-\\uDFFF]'
```

Note that lone low surrogates cannot be matched accurately using regular expressions in JavaScript without the use of [lookbehind assertions](https://mathiasbynens.be/notes/es-regexp-proposals#lookbehinds), which aren’t yet widely supported. Regenerate’s output takes a best-effort approach, but [there can be false negatives in this regard](https://github.com/mathiasbynens/regenerate/issues/28#issuecomment-72224808).

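
To make the `bmpOnly` trade-off concrete, the sketch below runs the two high-surrogate patterns shown above against a lone surrogate and against an astral symbol. The pattern strings are copied from the example output and hard-coded so that the snippet runs without the library; the variable names are only for illustration.

```js
// Default output: a high surrogate that is NOT followed by a low surrogate.
var defaultPattern = new RegExp('[\\uD800-\\uDBFF](?![\\uDC00-\\uDFFF])');
// `bmpOnly` output: any high surrogate, paired or not.
var bmpOnlyPattern = new RegExp('[\\uD800-\\uDBFF]');

var loneHigh = '\uD834'; // a lone high surrogate
var astral = '\uD834\uDF06'; // U+1D306, a complete surrogate pair

var defaultOnLone = defaultPattern.test(loneHigh);
// → true
var bmpOnlyOnLone = bmpOnlyPattern.test(loneHigh);
// → true

var defaultOnAstral = defaultPattern.test(astral);
// → false (the pair is an astral symbol, not a lone surrogate)
var bmpOnlyOnAstral = bmpOnlyPattern.test(astral);
// → true (matches half of the astral symbol)
```

The last result is the reason the simplified output is only appropriate for input without astral symbols.
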
If the `hasUnicodeFlag` property of the optional `options` object is set to `true`, the output makes use of Unicode code point escapes (`\u{…}`) where applicable. This simplifies the output at the cost of compatibility and portability, since it means the output can only be used as a pattern in a regular expression with [the ES6 `u` flag](https://mathiasbynens.be/notes/es6-unicode-regex) enabled.

```js
var set = regenerate().addRange(0x0, 0x10FFFF);

set.toString();
// → '[\\0-\\uD7FF\\uE000-\\uFFFF]|[\\uD800-\\uDBFF][\\uDC00-\\uDFFF]|[\\uD800-\\uDBFF](?![\\uDC00-\\uDFFF])|(?:[^\\uD800-\\uDBFF]|^)[\\uDC00-\\uDFFF]'

set.toString({ 'hasUnicodeFlag': true });
// → '[\\0-\\u{10FFFF}]'
```

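
As a small sketch of what that compatibility cost means in practice, the snippet below compiles the `hasUnicodeFlag` output shown above with and without the `u` flag. The pattern string is hard-coded so that the snippet runs without the library, and it assumes an engine with ES6 `u` flag support. The variable names are only for illustration.

```js
// Output of `toString({ 'hasUnicodeFlag': true })` from the example above.
var source = '[\\0-\\u{10FFFF}]';
var astralSymbol = '\uD834\uDF06'; // U+1D306

// With the `u` flag, `\u{10FFFF}` is a code point escape and the class
// matches whole code points, including astral ones.
var withFlag = new RegExp('^' + source + '$', 'u');
var matchesWithFlag = withFlag.test(astralSymbol);
// → true

// Without the `u` flag, the same source is parsed differently and no
// longer matches the astral symbol as a single unit.
var withoutFlag = new RegExp('^' + source + '$');
var matchesWithoutFlag = withoutFlag.test(astralSymbol);
// → false
```
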
### `regenerate.prototype.toRegExp(flags = '')`

Returns a regular expression that matches all the symbols mapped to the code points within the set. Optionally, you can pass [flags](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp#Parameters) to be added to the regular expression.

```js
var regex = regenerate(0x1D306, 0x1F4A9).toRegExp();
// → /\uD834\uDF06|\uD83D\uDCA9/
regex.test('𝌆');
// → true
regex.test('A');
// → false

// With flags:
var regex = regenerate(0x1D306, 0x1F4A9).toRegExp('g');
// → /\uD834\uDF06|\uD83D\uDCA9/g
```

**Note:** This probably shouldn’t be used. Regenerate is intended as a tool that is used as part of a build process, not at runtime.

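
One way to follow that advice is to emit the generated pattern as source text during the build and ship only that text. The sketch below is an assumption about how such a step might look, not a documented workflow: `buildModuleText` is a hypothetical helper, and the pattern string is hard-coded where a real build script would use the result of `regenerate(...).toString()`.

```js
// Hypothetical build-step helper: turn a pattern source string into the
// text of a small CommonJS module that exports a regex literal.
function buildModuleText(source, flags) {
  // Compile once at build time; this throws a SyntaxError early if the
  // pattern is invalid.
  var regex = new RegExp(source, flags || '');
  return 'module.exports = ' + regex.toString() + ';\n';
}

// In a real build script this string would come from
// `regenerate(0x1D306, 0x1F4A9).toString()`; it is hard-coded here.
var patternSource = '\\uD834\\uDF06|\\uD83D\\uDCA9';

var moduleText = buildModuleText(patternSource, 'g');
// → 'module.exports = /\\uD834\\uDF06|\\uD83D\\uDCA9/g;\n'
```

The resulting text could then be written to a file (for example with Node’s `fs.writeFileSync`), so that the shipped code contains a plain regex literal and does not need Regenerate at runtime.
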
### `regenerate.prototype.valueOf()` or `regenerate.prototype.toArray()`

Returns a sorted array of unique code points in the set.

```js
regenerate(0x1D306)
  .addRange(0x60, 0x65)
  .add(0x59, 0x60) // note: 0x59 is added after 0x65, and 0x60 is a duplicate
  .valueOf();
// → [0x59, 0x60, 0x61, 0x62, 0x63, 0x64, 0x65, 0x1D306]
```

### `regenerate.version`

A string representing the semantic version number.

## Combine Regenerate with other libraries

Regenerate gets even better when combined with other libraries such as [Punycode.js](https://mths.be/punycode). Here’s an example where [Punycode.js](https://mths.be/punycode) is used to convert a string into an array of code points, which is then passed on to Regenerate:

```js
var regenerate = require('regenerate');
var punycode = require('punycode');

var string = 'Lorem ipsum dolor sit amet.';
// Get an array of all code points used in the string:
var codePoints = punycode.ucs2.decode(string);

// Generate a regular expression that matches any of the symbols used in the string:
regenerate(codePoints).toString();
// → '[ \\.Ladeilmopr-u]'
```

In ES6 you can do something similar with [`Array.from`](https://mths.be/array-from), which uses [the string’s iterator](https://mathiasbynens.be/notes/javascript-unicode#iterating-over-symbols) to split the given string into an array of strings that each contain a single symbol. [`regenerate()`](#regenerateprototypeaddvalue1-value2-value3-) accepts both strings and code points, remember?

```js
var regenerate = require('regenerate');

var string = 'Lorem ipsum dolor sit amet.';
// Get an array of all symbols used in the string:
var symbols = Array.from(string);

// Generate a regular expression that matches any of the symbols used in the string:
regenerate(symbols).toString();
// → '[ \\.Ladeilmopr-u]'
```

## Support

Regenerate supports at least Chrome 27+, Firefox 3+, Safari 4+, Opera 10+, IE 6+, Node.js v0.10.0+, io.js v1.0.0+, Narwhal 0.3.2+, RingoJS 0.8+, PhantomJS 1.9.0+, and Rhino 1.7RC4+.

## Unit tests & code coverage

After cloning this repository, run `npm install` to install the dependencies needed for Regenerate development and testing. You may want to install Istanbul _globally_ using `npm install istanbul -g`.

Once that’s done, you can run the unit tests in Node using `npm test` or `node tests/tests.js`. To run the tests in Rhino, Ringo, Narwhal, and web browsers as well, use `grunt test`.

To generate the code coverage report, use `grunt cover`.

## Author

| [![twitter/mathias](https://gravatar.com/avatar/24e08a9ea84deb17ae121074d0f17125?s=70)](https://twitter.com/mathias "Follow @mathias on Twitter") |
|---|
| [Mathias Bynens](https://mathiasbynens.be/) |

## License

Regenerate is available under the [MIT](https://mths.be/mit) license.