# svg/sax

A maintained fork of the [sax-js](https://github.com/isaacs/sax-js) SAX-style parser for XML and HTML.

Designed with [node](http://nodejs.org/) in mind, but should work fine in
the browser or other CommonJS implementations.

## What This Is

* A very simple tool to parse through an XML string.
* A stepping stone to a streaming HTML parser.
* A handy way to deal with RSS and other mostly-ok-but-kinda-broken XML
  docs.

## What This Is (probably) Not

* An HTML Parser - That's a fine goal, but this isn't it. It's just
  XML.
* A DOM Builder - You can use it to build an object model out of XML,
  but it doesn't do that out of the box.
* XSLT - No DOM = no querying.
* 100% Compliant with (some other SAX implementation) - Most SAX
  implementations are in Java and do a lot more than this does.
* An XML Validator - It does a little validation when in strict mode, but
  not much.
* A Schema-Aware XSD Thing - Schemas are an exercise in fetishistic
  masochism.
* A DTD-aware Thing - Fetching DTDs is a much bigger job.

## Regarding `<!DOCTYPE`s and `<!ENTITY`s

The parser will handle the basic XML entities in text nodes and attribute
values: `&amp; &lt; &gt; &apos; &quot;`. It's possible to define additional
entities in XML by putting them in the DTD. This parser doesn't do anything
with that. If you want to listen to the `ondoctype` event, and then fetch
the doctypes, and read the entities and add them to `parser.ENTITIES`, then
be my guest.

Unknown entities will fail in strict mode, and in loose mode, will pass
through unmolested.

## Usage

```javascript
var sax = require("./lib/sax"),
    strict = true, // set to false for html-mode
    parser = sax.parser(strict);

parser.onerror = function (e) {
  // an error happened.
};
parser.ontext = function (t) {
  // got some text. t is the string of text.
};
parser.onopentag = function (node) {
  // opened a tag. node has "name" and "attributes"
};
parser.onattribute = function (attr) {
  // an attribute. attr has "name" and "value"
};
parser.onend = function () {
  // parser stream is done, and ready to have more stuff written to it.
};

parser.write('<xml>Hello, <who name="world">world</who>!</xml>').close();
```

## Arguments

Pass the following arguments to the parser function. All are optional.

`strict` - Boolean. Whether or not to be a jerk. Default: `false`.

`opt` - Object bag of settings. All default to `false`, except `position`,
which defaults to `true`.

Settings supported:

* `trim` - Boolean. Whether or not to trim text and comment nodes.
* `normalize` - Boolean. If true, then collapse each run of whitespace into a
  single space.
* `lowercase` - Boolean. If true, then lowercase tag names and attribute names
  in loose mode, rather than uppercasing them.
* `xmlns` - Boolean. If true, then namespaces are supported.
* `position` - Boolean. If false, then don't track line/col/position.
* `strictEntities` - Boolean. If true, only parse [predefined XML
  entities](http://www.w3.org/TR/REC-xml/#sec-predefined-ent)
  (`&amp;`, `&apos;`, `&gt;`, `&lt;`, and `&quot;`).

## Methods

`write` - Write a chunk of data onto the stream. You don't have to do this
all at once; you can keep writing as much as you want.

`close` - Close the stream. Once closed, no more data may be written until
it is done processing the buffer, which is signaled by the `end` event.

`resume` - To gracefully handle errors, assign a listener to the `error`
event. Then, when the error is taken care of, you can call `resume` to
continue parsing. Otherwise, the parser will not continue while in an error
state: the next call to `write` or `close` throws the pending error.

## Members

At all times, the parser object will have the following members:

`line`, `column`, `position` - Indications of the position in the XML
document where the parser is currently looking.

`startTagPosition` - Indicates the position where the current tag starts.

`closed` - Boolean indicating whether or not the parser can be written to.
If it's `true`, then wait for the `ready` event to write again.

`strict` - Boolean indicating whether or not the parser is a jerk.

`opt` - Any options passed into the constructor.

`tag` - The current tag being dealt with.

And a bunch of other stuff that you probably shouldn't touch.

## Events

Every event is emitted with at most a single argument. To listen to an
event, assign a function to `on<eventname>`. Handlers are called with the
parser object as `this`. The list of supported events is also available in
the exported `EVENTS` array.

`error` - Indication that something bad happened. The error will be hanging
out on `parser.error`, and must be cleared (for example with `resume()`)
before parsing can continue. By listening to this event, you can keep an eye
on that kind of stuff. Note: this happens *much* more in strict mode.
Argument: instance of `Error`.

`text` - Text node. Argument: string of text.

`doctype` - The `<!DOCTYPE` declaration. Argument: doctype string.

`processinginstruction` - Stuff like `<?xml foo="blerg" ?>`. Argument:
object with `name` and `body` members. Attributes are not parsed, as
processing instructions have implementation-dependent semantics.

`sgmldeclaration` - Random SGML declarations. Stuff like `<!ENTITY p>`
would trigger this kind of event. This is a weird thing to support, so it
might go away at some point. SAX isn't intended to be used to parse SGML,
after all.

`opentagstart` - Emitted immediately when the tag name is available,
but before any attributes are encountered. Argument: object with a
`name` field and an empty `attributes` set. Note that this is the
same object that will later be emitted in the `opentag` event.

`opentag` - An opening tag. Argument: object with `name` and `attributes`.
In non-strict mode, tag names are uppercased, unless the `lowercase`
option is set. If the `xmlns` option is set, then it will contain
namespace binding information on the `ns` member, and will have
`local`, `prefix`, and `uri` members.

`closetag` - A closing tag. In loose mode, tags are auto-closed if their
parent closes. In strict mode, well-formedness is enforced. Note that
self-closing tags will have `closetag` emitted immediately after `opentag`.
Argument: tag name.

`attribute` - An attribute node. Argument: object with `name` and `value`.
In non-strict mode, attribute names are uppercased, unless the `lowercase`
option is set. If the `xmlns` option is set, it will also contain namespace
information.

`comment` - A comment node. Argument: the string of the comment.

`opencdata` - The opening tag of a `<![CDATA[` block.

`cdata` - The text of a `<![CDATA[` block. Since `<![CDATA[` blocks can get
quite large, this event may fire multiple times for a single block, if it
is broken up into multiple `write()`s. Argument: the string of random
character data.

`closecdata` - The closing tag (`]]>`) of a `<![CDATA[` block.

`opennamespace` - If the `xmlns` option is set, then this event will
signal the start of a new namespace binding.

`closenamespace` - If the `xmlns` option is set, then this event will
signal the end of a namespace binding.

`end` - Indication that the closed stream has ended.

`ready` - Indication that the stream has reset, and is ready to be written
to.

`script` - In non-strict mode, `<script>` tags trigger a `"script"`
event, and their contents are not checked for special XML characters.
If you pass `noscript: true`, then this behavior is suppressed.

## Reporting Problems

It's best to write a failing test if you find an issue. I will always
accept pull requests with failing tests if they demonstrate intended
behavior, but it is very hard to figure out what issue you're describing
without a test. Writing a test is also the best way for you yourself
to figure out if you really understand the issue you think you have with
sax-js.