source: imaps-frontend/node_modules/graphemer/README.md@ 0c6b92a

Last change on this file since 0c6b92a was d565449, checked in by stefan toskovski <stefantoska84@…>, 12 months ago
Update repo after prototype presentation

# Graphemer: Unicode Character Splitter 🪓

## Introduction

This library continues the work of [Grapheme Splitter](https://github.com/orling/grapheme-splitter) and supports the following Unicode versions:

- Unicode 15 and below `[v1.4.0]`
- Unicode 14 and below `[v1.3.0]`
- Unicode 13 and below `[v1.1.0]`
- Unicode 11 and below `[v1.0.0]` (Unicode 10 supported by `grapheme-splitter`)

In JavaScript there is not always a one-to-one relationship between string characters (UTF-16 code units) and what a user would call a separate visual "letter". Some symbols are represented by several characters. This can cause bugs when splitting a string inadvertently cuts a multi-char letter in half, or when you need the actual number of letters in a string.

For example, emoji characters like "🌷","🎁","💩","😜" and "👍" are represented by two JavaScript characters each (high surrogate and low surrogate). That is,

```javascript
'🌷'.length == 2;
```

The combined emoji are even longer:

```javascript
'🏳️‍🌈'.length == 6;
```
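
To see where these lengths come from, the following sketch uses only built-in JavaScript string features (no `graphemer` calls) to compare UTF-16 code units with Unicode code points. The helper names `codeUnits` and `codePoints` are made up for illustration.

```javascript
// Illustrative helpers (hypothetical names, not part of graphemer).

// Number of UTF-16 code units, i.e. what String.prototype.length reports.
function codeUnits(str) {
  return str.length;
}

// Number of Unicode code points; string iteration walks code points,
// so a surrogate pair is counted once.
function codePoints(str) {
  return [...str].length;
}

const tulip = '\u{1F337}'; // 🌷
// Rainbow flag: white flag + variation selector-16 + zero-width joiner + rainbow
const rainbowFlag = '\u{1F3F3}\uFE0F\u200D\u{1F308}';

console.log(codeUnits(tulip), codePoints(tulip)); // 2 1
console.log(codeUnits(rainbowFlag), codePoints(rainbowFlag)); // 6 4

// Slicing by code unit cuts the surrogate pair in half.
const broken = tulip.slice(0, 1);
console.log(broken === '\uD83C'); // true (a lone high surrogate)
```

Counting code points is enough for the tulip, but the flag still counts as 4, so code points alone do not match what a user perceives as one symbol.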

What's more, some languages use combining marks: characters that modify the letters before them. Common examples are the German letter ü and the Spanish letter ñ. Such letters can sometimes be represented either as a single character or as a letter plus a combining mark, and both forms are equally valid:

```javascript
var two = 'n\u0303'; // unnormalized two-char n+◌̃, renders as "ñ"
var one = '\u00F1'; // normalized single-char "ñ"

console.log(one != two); // prints 'true'
```

Unicode normalization, as performed by ECMAScript 6's `String.prototype.normalize`, can sometimes fix those differences and turn two-char sequences into single characters. But it is not enough in all cases. Some languages like Hindi make extensive use of combining marks on their letters, and these combinations have no dedicated single-codepoint Unicode representation, due to the sheer number of possible combinations.
For example, the Hindi word "अनुच्छेद" is composed of 5 letters and 3 combining marks:

अ + न + ु + च + ् + छ + े + द

which is in fact just 5 user-perceived letters:

अ + नु + च् + छे + द

and which Unicode normalization would not combine properly.
There are also the unusual letter+combining mark combinations which have no dedicated Unicode codepoint. The string Z͑ͫ̓ͪ̂ͫ̽͏̴̙̤̞͉͚̯̞̠͍A̴̵̜̰͔ͫ͗͢L̠ͨͧͩ͘G̴̻͈͍͔̹̑͗̎̅͛́Ǫ̵̹̻̝̳͂̌̌͘ obviously has 5 separate letters, but is in fact composed of 58 JavaScript characters, most of which are combining marks.
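
The limits of normalization described above can be checked with built-ins alone. The sketch below also includes a naive regex that attaches combining marks to the preceding character; the helper name `groupCombiningMarks` is hypothetical, and the regex is only a rough approximation for illustration, not the UAX #29 rules and not how `graphemer` is implemented.

```javascript
// Sketch only: built-in normalization plus a naive regex, no graphemer calls.

const decomposed = 'n\u0303'; // n + combining tilde
const hindi = '\u0905\u0928\u0941\u091A\u094D\u091B\u0947\u0926'; // अनुच्छेद

// NFC composes n + tilde into the single code point U+00F1 ...
console.log(decomposed.normalize('NFC') === '\u00F1'); // true
// ... but there are no precomposed forms for the Hindi word to collapse into.
console.log(hindi.normalize('NFC').length); // 8

// Hypothetical helper: attach each run of combining marks (\p{M})
// to the character before it. A rough approximation, NOT UAX #29.
function groupCombiningMarks(str) {
  return str.match(/\P{M}\p{M}*|\p{M}+/gu) || [];
}

console.log(groupCombiningMarks(hindi).length); // 5
console.log(groupCombiningMarks(decomposed).length); // 1
```

This approximation would still break apart emoji joined with a zero-width joiner (such as the rainbow flag) and decomposed Hangul Jamo, because neither relies on combining marks. Cases like these are where a full grapheme cluster boundary table matters.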

Enter the `graphemer` library. It can be used to properly split JavaScript strings into what a human user would call separate letters (or "extended grapheme clusters" in Unicode terminology), no matter what their internal representation is. It is an implementation of the [Default Grapheme Cluster Boundary](http://unicode.org/reports/tr29/#Default_Grapheme_Cluster_Table) rules of [UAX #29](http://www.unicode.org/reports/tr29/).

## Installation

Install `graphemer` using the npm command below:

```
$ npm i graphemer
```

## Usage

If you're using [TypeScript](https://www.typescriptlang.org/) or a compiler like [Babel](https://babeljs.io/) (or something like Create React App) things are pretty simple; just import, initialize and use!

```javascript
import Graphemer from 'graphemer';

const splitter = new Graphemer();

// `string` below stands for the text you want to split

// split the string into an array of grapheme clusters (one string each)
const graphemes = splitter.splitGraphemes(string);

// get an iterable iterator over the string's grapheme clusters (one string each)
const graphemeIterator = splitter.iterateGraphemes(string);

// or do this if you just need their number
const graphemeCount = splitter.countGraphemes(string);
```
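
As one possible use of the iterator form, the sketch below truncates a string to a maximum number of grapheme clusters without cutting one in half. `truncateGraphemes` is a hypothetical helper, not part of `graphemer`; it only assumes that the object passed in exposes `iterateGraphemes(str)` as shown above.

```javascript
// Hypothetical helper, not part of graphemer: keep at most `max` grapheme clusters.
// `splitter` is assumed to be any object exposing iterateGraphemes(str),
// such as a Graphemer instance.
function truncateGraphemes(splitter, str, max) {
  let result = '';
  let count = 0;
  for (const grapheme of splitter.iterateGraphemes(str)) {
    if (count >= max) break; // stop before exceeding the limit
    result += grapheme;
    count += 1;
  }
  return result;
}

// Intended usage with the library (requires `graphemer` to be installed):
//   const splitter = new Graphemer();
//   const preview = truncateGraphemes(splitter, text, 3);
```

Iterating and stopping early avoids building the full array when only the first few clusters are needed.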

If you're using vanilla Node you can use the `require()` method.

```javascript
const Graphemer = require('graphemer').default;

const splitter = new Graphemer();

const graphemes = splitter.splitGraphemes(string);
```

## Examples

```javascript
import Graphemer from 'graphemer';

const splitter = new Graphemer();

// plain latin alphabet - nothing spectacular
splitter.splitGraphemes('abcd'); // returns ["a", "b", "c", "d"]

// two-char emojis and six-char combined emoji
splitter.splitGraphemes('🌷🎁💩😜👍🏳️‍🌈'); // returns ["🌷","🎁","💩","😜","👍","🏳️‍🌈"]

// diacritics as combining marks, 10 JavaScript chars
splitter.splitGraphemes('Ĺo͂řȩm̅'); // returns ["Ĺ","o͂","ř","ȩ","m̅"]

// individual Korean characters (Jamo), 4 JavaScript chars
splitter.splitGraphemes('뎌쉐'); // returns ["뎌","쉐"]

// Hindi text with combining marks, 8 JavaScript chars
splitter.splitGraphemes('अनुच्छेद'); // returns ["अ","नु","च्","छे","द"]

// demonic multiple combining marks, 75 JavaScript chars
splitter.splitGraphemes('Z͑ͫ̓ͪ̂ͫ̽͏̴̙̤̞͉͚̯̞̠͍A̴̵̜̰͔ͫ͗͢L̠ͨͧͩ͘G̴̻͈͍͔̹̑͗̎̅͛́Ǫ̵̹̻̝̳͂̌̌͘!͖̬̰̙̗̿̋ͥͥ̂ͣ̐́́͜͞'); // returns ["Z͑ͫ̓ͪ̂ͫ̽͏̴̙̤̞͉͚̯̞̠͍","A̴̵̜̰͔ͫ͗͢","L̠ͨͧͩ͘","G̴̻͈͍͔̹̑͗̎̅͛́","Ǫ̵̹̻̝̳͂̌̌͘","!͖̬̰̙̗̿̋ͥͥ̂ͣ̐́́͜͞"]
```

## TypeScript

Graphemer is built with TypeScript and, of course, includes type declarations.

```typescript
import Graphemer from 'graphemer';

const splitter = new Graphemer();

const split: string[] = splitter.splitGraphemes('Z͑ͫ̓ͪ̂ͫ̽͏̴̙̤̞͉͚̯̞̠͍A̴̵̜̰͔ͫ͗͢L̠ͨͧͩ͘G̴̻͈͍͔̹̑͗̎̅͛́Ǫ̵̹̻̝̳͂̌̌͘!͖̬̰̙̗̿̋ͥͥ̂ͣ̐́́͜͞');
```

## Contributing

See [Contribution Guide](./CONTRIBUTING.md).

## Acknowledgements

This library is a fork of the incredible work done by Orlin Georgiev and Huáng Jùnliàng at https://github.com/orling/grapheme-splitter.

The original library was heavily influenced by Devon Govett's excellent [grapheme-breaker](https://github.com/devongovett/grapheme-breaker) CoffeeScript library.
