chardet [![Build Status](https://travis-ci.org/runk/node-chardet.png)](https://travis-ci.org/runk/node-chardet)
=====

Chardet is a character encoding detection module for Node.js written in pure JavaScript.
The module is based on the ICU project (http://site.icu-project.org/), which uses character
occurrence analysis to determine the most probable encoding.

## Installation

```
npm i chardet
```

## Usage

To return the encoding with the highest confidence:
```javascript
var chardet = require('chardet');
// Buffer.from() creates a buffer from a string; Buffer.alloc() expects a size.
chardet.detect(Buffer.from('hello there!'));
// or
chardet.detectFile('/path/to/file', function(err, encoding) {});
// or
chardet.detectFileSync('/path/to/file');
```
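
Once an encoding name is known, a common next step is decoding the bytes into a string. The following is a minimal sketch, assuming the detected name is a label accepted by Node's built-in `TextDecoder` (which is not part of chardet, and whose supported labels depend on how Node was built). `decodeWith` is a hypothetical helper, and the encoding name is hard-coded in place of a real `chardet.detect` call:

```javascript
// Hypothetical helper, not part of chardet: decode a buffer using an
// encoding name such as the one returned by chardet.detect().
function decodeWith(buffer, encodingName) {
  // TextDecoder throws for labels it does not recognise,
  // so return null instead of crashing.
  try {
    return new TextDecoder(encodingName).decode(buffer);
  } catch (err) {
    return null;
  }
}

// 'UTF-8' stands in for the value chardet.detect(buffer) would return.
var buffer = Buffer.from('hello there!');
console.log(decodeWith(buffer, 'UTF-8')); // hello there!
```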

To return the full list of possible encodings:
```javascript
var chardet = require('chardet');
// Buffer.from() creates a buffer from a string; Buffer.alloc() expects a size.
chardet.detectAll(Buffer.from('hello there!'));
// or
chardet.detectFileAll('/path/to/file', function(err, encodings) {});
// or
chardet.detectFileAllSync('/path/to/file');

// The returned value is an array of objects sorted by confidence in descending order,
// e.g. [{ confidence: 90, name: 'UTF-8' }, { confidence: 20, name: 'windows-1252', lang: 'fr' }]
```
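
Because the list is sorted by confidence, it can be filtered before use. The sketch below assumes only the `{ confidence, name }` result shape from the README's own example; `confidentEncodings` and the threshold of 50 are hypothetical, and the sample array stands in for a real `chardet.detectAll` result:

```javascript
// Hypothetical helper, not part of chardet: keep only the candidates whose
// confidence is at least `minConfidence` and return their names.
function confidentEncodings(matches, minConfidence) {
  return matches
    .filter(function (match) { return match.confidence >= minConfidence; })
    .map(function (match) { return match.name; });
}

// Sample data taken from the README's example output, standing in for the
// result of chardet.detectAll(buffer).
var matches = [
  { confidence: 90, name: 'UTF-8' },
  { confidence: 20, name: 'windows-1252', lang: 'fr' }
];
console.log(confidentEncodings(matches, 50)); // [ 'UTF-8' ]
```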

## Working with large data sets

When the data set is huge and you want to optimize performance (at the cost of some accuracy),
you can sample only the first N bytes of the buffer:

```javascript
chardet.detectFile('/path/to/file', { sampleSize: 32 }, function(err, encoding) {});
```
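
To illustrate what sampling the first N bytes means, the sketch below reads only the start of a file with Node's `fs` module. It is not chardet's implementation; `readSample` is a hypothetical helper, and the temporary file exists only for the demonstration:

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');

// Hypothetical helper, not part of chardet: read at most `sampleSize`
// bytes from the start of a file.
function readSample(filePath, sampleSize) {
  var fd = fs.openSync(filePath, 'r');
  try {
    var sample = Buffer.alloc(sampleSize);
    var bytesRead = fs.readSync(fd, sample, 0, sampleSize, 0);
    // The file may be shorter than sampleSize, so trim the unused bytes.
    return sample.subarray(0, bytesRead);
  } finally {
    fs.closeSync(fd);
  }
}

// Demonstration on a throwaway file in the OS temp directory.
var demoPath = path.join(os.tmpdir(), 'chardet-sample-demo-' + process.pid + '.txt');
fs.writeFileSync(demoPath, 'hello there!');
console.log(readSample(demoPath, 5).toString()); // hello
fs.unlinkSync(demoPath);
```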

## Supported Encodings:

* UTF-8
* UTF-16 LE
* UTF-16 BE
* UTF-32 LE
* UTF-32 BE
* ISO-2022-JP
* ISO-2022-KR
* ISO-2022-CN
* Shift-JIS
* Big5
* EUC-JP
* EUC-KR
* GB18030
* ISO-8859-1
* ISO-8859-2
* ISO-8859-5
* ISO-8859-6
* ISO-8859-7
* ISO-8859-8
* ISO-8859-9
* windows-1250
* windows-1251
* windows-1252
* windows-1253
* windows-1254
* windows-1255
* windows-1256
* KOI8-R

Currently only these encodings are supported; more will be added soon.