source: node_modules/autolinker/dist/commonjs/htmlParser/parse-html.d.ts

main
Last change on this file was d24f17c, checked in by Aleksandar Panovski <apano77@…>, 15 months ago

Initial commit

  • Property mode set to 100644
File size: 2.6 KB
RevLine 
[d24f17c]1/**
2 * Parses an HTML string, calling the callbacks to notify of tags and text.
3 *
4 * ## History
5 *
6 * This file previously used a regular expression to find html tags in the input
7 * text. Unfortunately, we ran into a bunch of catastrophic backtracking issues
8 * with certain input text, causing Autolinker to either hang or just take a
9 * really long time to parse the string.
10 *
11 * The current code is intended to be a O(n) algorithm that walks through
12 * the string in one pass, and tries to be as cheap as possible. We don't need
13 * to implement the full HTML spec, but rather simply determine where the string
14 * looks like an HTML tag, and where it looks like text (so that we can autolink
15 * that).
16 *
17 * This state machine parser is intended just to be a simple but performant
18 * parser of HTML for the subset of requirements we have. We simply need to:
19 *
20 * 1. Determine where HTML tags are
21 * 2. Determine the tag name (Autolinker specifically only cares about <a>,
22 * <script>, and <style> tags, so as not to link any text within them)
23 *
24 * We don't need to:
25 *
26 * 1. Create a parse tree
27 * 2. Auto-close tags with invalid markup
28 * 3. etc.
29 *
30 * The other intention behind this is that we didn't want to add external
31 * dependencies on the Autolinker utility which would increase its size. For
32 * instance, adding htmlparser2 adds 125kb to the minified output file,
33 * increasing its final size from 47kb to 172kb (at the time of writing). It
34 * also doesn't work exactly correctly, treating the string "<3 blah blah blah"
35 * as an HTML tag.
36 *
37 * Reference for HTML spec:
38 *
39 * https://www.w3.org/TR/html51/syntax.html#sec-tokenization
40 *
41 * @param {String} html The HTML to parse
42 * @param {Object} callbacks
43 * @param {Function} callbacks.onOpenTag Callback function to call when an open
44 * tag is parsed. Called with the tagName as its argument.
45 * @param {Function} callbacks.onCloseTag Callback function to call when a close
46 * tag is parsed. Called with the tagName as its argument. If a self-closing
47 * tag is found, `onCloseTag` is called immediately after `onOpenTag`.
48 * @param {Function} callbacks.onText Callback function to call when text (i.e
49 * not an HTML tag) is parsed. Called with the text (string) as its first
50 * argument, and offset (number) into the string as its second.
51 */
52export declare function parseHtml(html: string, { onOpenTag, onCloseTag, onText, onComment, onDoctype, }: {
53 onOpenTag: (tagName: string, offset: number) => void;
54 onCloseTag: (tagName: string, offset: number) => void;
55 onText: (text: string, offset: number) => void;
56 onComment: (offset: number) => void;
57 onDoctype: (offset: number) => void;
58}): void;
Note: See TracBrowser for help on using the repository browser.