libxml2
HTMLparser.h File Reference

HTML parser, doesn't support HTML5.

Typedefs

typedef xmlParserCtxt htmlParserCtxt
 Same as xmlParserCtxt.
typedef xmlSAXHandler htmlSAXHandler
 Same as xmlSAXHandler.
typedef xmlParserInput htmlParserInput
 Same as xmlParserInput.

Enumerations

enum  htmlParserOption
 This is the set of HTML parser options that can be passed to htmlReadDoc, htmlCtxtSetOptions and other functions.
enum  htmlStatus
 Deprecated content model.

Functions

void htmlInitAutoClose (void)
const htmlElemDesc * htmlTagLookup (const xmlChar *tag)
 Lookup the HTML tag in the ElementTable.
const htmlEntityDesc * htmlEntityLookup (const xmlChar *name)
 Lookup the given entity in EntitiesTable.
const htmlEntityDesc * htmlEntityValueLookup (unsigned int value)
 Lookup the given entity in EntitiesTable.
int htmlIsAutoClosed (xmlDoc *doc, xmlNode *elem)
 The HTML DTD allows a tag to implicitly close other tags.
int htmlAutoCloseTag (xmlDoc *doc, const xmlChar *name, xmlNode *elem)
 The HTML DTD allows a tag to implicitly close other tags.
const htmlEntityDesc * htmlParseEntityRef (htmlParserCtxt *ctxt, const xmlChar **str)
int htmlParseCharRef (htmlParserCtxt *ctxt)
void htmlParseElement (htmlParserCtxt *ctxt)
 This is kept for compatibility with previous code versions.
htmlParserCtxt * htmlNewParserCtxt (void)
 Allocate and initialize a new HTML parser context.
htmlParserCtxt * htmlNewSAXParserCtxt (const htmlSAXHandler *sax, void *userData)
 Allocate and initialize a new HTML SAX parser context.
htmlParserCtxt * htmlCreateMemoryParserCtxt (const char *buffer, int size)
 Create a parser context for an HTML in-memory document.
int htmlParseDocument (htmlParserCtxt *ctxt)
 Parse an HTML document and invoke the SAX handlers.
xmlDoc * htmlSAXParseDoc (const xmlChar *cur, const char *encoding, htmlSAXHandler *sax, void *userData)
 Parse an HTML in-memory document.
xmlDoc * htmlParseDoc (const xmlChar *cur, const char *encoding)
 Parse an HTML in-memory document and build a tree.
htmlParserCtxt * htmlCreateFileParserCtxt (const char *filename, const char *encoding)
 Create a parser context to read from a file.
xmlDoc * htmlSAXParseFile (const char *filename, const char *encoding, htmlSAXHandler *sax, void *userData)
 Parse an HTML file and build a tree.
xmlDoc * htmlParseFile (const char *filename, const char *encoding)
 Parse an HTML file and build a tree.
int htmlUTF8ToHtml (unsigned char *out, int *outlen, const unsigned char *in, int *inlen)
 Take a block of UTF-8 chars in and try to convert it to an ASCII plus HTML entities block of chars out.
int htmlEncodeEntities (unsigned char *out, int *outlen, const unsigned char *in, int *inlen, int quoteChar)
 Take a block of UTF-8 chars in and try to convert it to an ASCII plus HTML entities block of chars out.
int htmlIsScriptAttribute (const xmlChar *name)
 Check if an attribute is of content type Script.
int htmlHandleOmittedElem (int val)
 Set and return the previous value for handling HTML omitted tags.
htmlParserCtxt * htmlCreatePushParserCtxt (htmlSAXHandler *sax, void *user_data, const char *chunk, int size, const char *filename, xmlCharEncoding enc)
 Create a parser context for using the HTML parser in push mode.
int htmlParseChunk (htmlParserCtxt *ctxt, const char *chunk, int size, int terminate)
 Parse a chunk of memory in push parser mode.
void htmlFreeParserCtxt (htmlParserCtxt *ctxt)
 Free all the memory used by a parser context.
void htmlCtxtReset (htmlParserCtxt *ctxt)
 Reset a parser context.
int htmlCtxtSetOptions (htmlParserCtxt *ctxt, int options)
 Applies the options to the parser context.
int htmlCtxtUseOptions (htmlParserCtxt *ctxt, int options)
 Applies the options to the parser context.
xmlDoc * htmlReadDoc (const xmlChar *cur, const char *URL, const char *encoding, int options)
 Convenience function to parse an HTML document from a zero-terminated string.
xmlDoc * htmlReadFile (const char *URL, const char *encoding, int options)
 Convenience function to parse an HTML file from the filesystem, the network or a global user-defined resource loader.
xmlDoc * htmlReadMemory (const char *buffer, int size, const char *URL, const char *encoding, int options)
 Convenience function to parse an HTML document from memory.
xmlDoc * htmlReadFd (int fd, const char *URL, const char *encoding, int options)
 Convenience function to parse an HTML document from a file descriptor.
xmlDoc * htmlReadIO (xmlInputReadCallback ioread, xmlInputCloseCallback ioclose, void *ioctx, const char *URL, const char *encoding, int options)
 Convenience function to parse an HTML document from I/O functions and context.
xmlDoc * htmlCtxtParseDocument (htmlParserCtxt *ctxt, xmlParserInput *input)
 Parse an HTML document and return the resulting document tree.
xmlDoc * htmlCtxtReadDoc (xmlParserCtxt *ctxt, const xmlChar *cur, const char *URL, const char *encoding, int options)
 Parse an HTML in-memory document and build a tree.
xmlDoc * htmlCtxtReadFile (xmlParserCtxt *ctxt, const char *filename, const char *encoding, int options)
 Parse an HTML file from the filesystem, the network or a user-defined resource loader.
xmlDoc * htmlCtxtReadMemory (xmlParserCtxt *ctxt, const char *buffer, int size, const char *URL, const char *encoding, int options)
 Parse an HTML in-memory document and build a tree.
xmlDoc * htmlCtxtReadFd (xmlParserCtxt *ctxt, int fd, const char *URL, const char *encoding, int options)
 Parse an HTML document from a file descriptor and build a tree.
xmlDoc * htmlCtxtReadIO (xmlParserCtxt *ctxt, xmlInputReadCallback ioread, xmlInputCloseCallback ioclose, void *ioctx, const char *URL, const char *encoding, int options)
 Parse an HTML document from I/O functions and source and build a tree.
htmlStatus htmlAttrAllowed (const htmlElemDesc *, const xmlChar *, int)
int htmlElementAllowedHere (const htmlElemDesc *, const xmlChar *)
htmlStatus htmlElementStatusHere (const htmlElemDesc *, const htmlElemDesc *)
htmlStatus htmlNodeStatus (xmlNode *, int)

Detailed Description

HTML parser, doesn't support HTML5.

This module originally implemented an HTML parser based on the (underspecified) HTML 4.0 spec. As of 2.14, the tokenizer conforms to HTML5. Tree construction still follows a custom, unspecified algorithm with many differences from HTML5.

The parser defaults to ISO-8859-1, the default encoding of HTTP/1.0.

Author
Daniel Veillard

Enumeration Type Documentation

◆ htmlParserOption

This is the set of HTML parser options that can be passed to htmlReadDoc, htmlCtxtSetOptions and other functions.

Enumerator
HTML_PARSE_RECOVER 

No effect as of 2.14.0.

HTML_PARSE_NODEFDTD 

Do not default to a doctype if none was found.

HTML_PARSE_NOERROR 

Disable error and warning reports to the error handlers.

Errors are still accessible with xmlCtxtGetLastError().

HTML_PARSE_NOWARNING 

Disable warning reports.

HTML_PARSE_PEDANTIC 

No effect.

HTML_PARSE_NOBLANKS 

Remove some text nodes containing only whitespace from the result document.

Which nodes are removed depends on a conservative heuristic. The reindenting feature of the serialization code relies on this option to be set when parsing. Use of this option is DISCOURAGED.

HTML_PARSE_NONET 

No effect.

HTML_PARSE_NOIMPLIED 

Do not add implied html, head or body elements.

HTML_PARSE_COMPACT 

Store small strings directly in the node struct to save memory.

HTML_PARSE_HUGE 

Relax some internal limits.

See XML_PARSE_HUGE in xmlParserOption.

Since
2.14.0

Use XML_PARSE_HUGE with older versions.

HTML_PARSE_IGNORE_ENC 

Ignore the encoding in the HTML declaration.

This option is mostly unneeded these days. The only effect is to enforce ISO-8859-1 decoding of ASCII-like data.

HTML_PARSE_BIG_LINES 

Enable reporting of line numbers larger than 65535.

Since
2.14.0

Use XML_PARSE_BIG_LINES with older versions.

HTML_PARSE_HTML5 

Make the tokenizer emit a SAX callback for each token.

This results in unbalanced invocations of startElement and endElement.

For now, this is only usable to tokenize HTML5 with custom SAX callbacks. A tree builder isn't implemented yet.

Since
2.14.0

Function Documentation

◆ htmlAttrAllowed()

htmlStatus htmlAttrAllowed ( const htmlElemDesc * elt,
const xmlChar * attr,
int legacy )
Deprecated
Don't use.
Parameters
elt: HTML element
attr: HTML attribute
legacy: whether to allow deprecated attributes
Returns
HTML_VALID

◆ htmlAutoCloseTag()

int htmlAutoCloseTag ( xmlDoc * doc,
const xmlChar * name,
xmlNode * elem )

The HTML DTD allows a tag to implicitly close other tags.

The list is kept in the htmlStartClose array. This function checks if the element or one of its children would autoclose the given tag.

Deprecated
Internal function, don't use.
Parameters
doc: the HTML document
name: the tag name
elem: the HTML element
Returns
1 if autoclose, 0 otherwise

◆ htmlCreateFileParserCtxt()

htmlParserCtxt * htmlCreateFileParserCtxt ( const char * filename,
const char * encoding )

Create a parser context to read from a file.

Deprecated
Use htmlNewParserCtxt and htmlCtxtReadFile.

A non-NULL encoding overrides encoding declarations in the document.

Automatic support for ZLIB/Compress compressed documents is provided by default if found at compile-time.

Parameters
filename: the filename
encoding: optional encoding
Returns
the new parser context or NULL if a memory allocation failed.

◆ htmlCreateMemoryParserCtxt()

htmlParserCtxt * htmlCreateMemoryParserCtxt ( const char * buffer,
int size )

Create a parser context for an HTML in-memory document.

The input buffer must not contain any terminating null bytes.

Deprecated
Use htmlNewParserCtxt and htmlCtxtReadMemory.
Parameters
buffer: a pointer to a char array
size: the size of the array
Returns
the new parser context or NULL

◆ htmlCreatePushParserCtxt()

htmlParserCtxt * htmlCreatePushParserCtxt ( htmlSAXHandler * sax,
void * user_data,
const char * chunk,
int size,
const char * filename,
xmlCharEncoding enc )

Create a parser context for using the HTML parser in push mode.

Parameters
sax: a SAX handler (optional)
user_data: the user data returned on SAX callbacks (optional)
chunk: a pointer to an array of chars (optional)
size: number of chars in the array
filename: only used for error reporting (optional)
enc: encoding (deprecated, pass XML_CHAR_ENCODING_NONE)
Returns
the new parser context or NULL if a memory allocation failed.

◆ htmlCtxtParseDocument()

xmlDoc * htmlCtxtParseDocument ( htmlParserCtxt * ctxt,
xmlParserInput * input )

Parse an HTML document and return the resulting document tree.

Since
2.13.0
Parameters
ctxt: an HTML parser context
input: parser input
Returns
the resulting document tree or NULL

◆ htmlCtxtReadDoc()

xmlDoc * htmlCtxtReadDoc ( xmlParserCtxt * ctxt,
const xmlChar * str,
const char * URL,
const char * encoding,
int options )

Parse an HTML in-memory document and build a tree.

See htmlCtxtUseOptions for details.

Parameters
ctxt: an HTML parser context
str: a pointer to a zero terminated string
URL: only used for error reporting (optional)
encoding: the document encoding (optional)
options: a combination of htmlParserOption values
Returns
the resulting document tree

◆ htmlCtxtReadFd()

xmlDoc * htmlCtxtReadFd ( xmlParserCtxt * ctxt,
int fd,
const char * URL,
const char * encoding,
int options )

Parse an HTML document from a file descriptor and build a tree.

See htmlCtxtUseOptions for details.

NOTE that the file descriptor will not be closed when the context is freed or reset.

Parameters
ctxt: an HTML parser context
fd: an open file descriptor
URL: only used for error reporting (optional)
encoding: the document encoding (optional)
options: a combination of htmlParserOption values
Returns
the resulting document tree

◆ htmlCtxtReadFile()

xmlDoc * htmlCtxtReadFile ( xmlParserCtxt * ctxt,
const char * filename,
const char * encoding,
int options )

Parse an HTML file from the filesystem, the network or a user-defined resource loader.

See htmlCtxtUseOptions for details.

Parameters
ctxt: an HTML parser context
filename: a file or URL
encoding: the document encoding (optional)
options: a combination of htmlParserOption values
Returns
the resulting document tree

◆ htmlCtxtReadIO()

xmlDoc * htmlCtxtReadIO ( xmlParserCtxt * ctxt,
xmlInputReadCallback ioread,
xmlInputCloseCallback ioclose,
void * ioctx,
const char * URL,
const char * encoding,
int options )

Parse an HTML document from I/O functions and source and build a tree.

See htmlCtxtUseOptions for details.

Parameters
ctxt: an HTML parser context
ioread: an I/O read function
ioclose: an I/O close function
ioctx: an I/O handler
URL: the base URL to use for the document
encoding: the document encoding, or NULL
options: a combination of htmlParserOption values
Returns
the resulting document tree

◆ htmlCtxtReadMemory()

xmlDoc * htmlCtxtReadMemory ( xmlParserCtxt * ctxt,
const char * buffer,
int size,
const char * URL,
const char * encoding,
int options )

Parse an HTML in-memory document and build a tree.

The input buffer must not contain any terminating null bytes.

See htmlCtxtUseOptions for details.

Parameters
ctxt: an HTML parser context
buffer: a pointer to a char array
size: the size of the array
URL: only used for error reporting (optional)
encoding: the document encoding (optional)
options: a combination of htmlParserOption values
Returns
the resulting document tree

◆ htmlCtxtReset()

void htmlCtxtReset ( htmlParserCtxt * ctxt)

Reset a parser context.

Same as xmlCtxtReset.

Parameters
ctxt: an HTML parser context

◆ htmlCtxtSetOptions()

int htmlCtxtSetOptions ( htmlParserCtxt * ctxt,
int options )

Applies the options to the parser context.

Unset options are cleared.

Since
2.14.0

With older versions, you can use htmlCtxtUseOptions.

Parameters
ctxt: an HTML parser context
options: a bitmask of htmlParserOption values
Returns
0 in case of success, the set of unknown or unimplemented options in case of error.

◆ htmlCtxtUseOptions()

int htmlCtxtUseOptions ( htmlParserCtxt * ctxt,
int options )

Applies the options to the parser context.

Deprecated
Use htmlCtxtSetOptions.

The following options are never cleared and can only be enabled:

  • HTML_PARSE_NODEFDTD
  • HTML_PARSE_NOERROR
  • HTML_PARSE_NOWARNING
  • HTML_PARSE_NOIMPLIED
  • HTML_PARSE_COMPACT
  • HTML_PARSE_HUGE
  • HTML_PARSE_IGNORE_ENC
  • HTML_PARSE_BIG_LINES
Parameters
ctxt: an HTML parser context
options: a combination of htmlParserOption values
Returns
0 in case of success, the set of unknown or unimplemented options in case of error.

◆ htmlElementAllowedHere()

int htmlElementAllowedHere ( const htmlElemDesc * parent,
const xmlChar * elt )
Deprecated
Don't use.
Parameters
parent: HTML parent element
elt: HTML element
Returns
1

◆ htmlElementStatusHere()

htmlStatus htmlElementStatusHere ( const htmlElemDesc * parent,
const htmlElemDesc * elt )
Deprecated
Don't use.
Parameters
parent: HTML parent element
elt: HTML element
Returns
HTML_VALID

◆ htmlEncodeEntities()

int htmlEncodeEntities ( unsigned char * out,
int * outlen,
const unsigned char * in,
int * inlen,
int quoteChar )

Take a block of UTF-8 chars in and try to convert it to an ASCII plus HTML entities block of chars out.

Deprecated
Only supports HTML 4.
Parameters
out: a pointer to an array of bytes to store the result
outlen: the length of out
in: a pointer to an array of UTF-8 chars
inlen: the length of in
quoteChar: the quote character to escape (' or ") or zero.
Returns
0 on success, -2 if the transcoding fails, or -1 otherwise. The value of inlen after return is the number of octets consumed if the return value is positive, else unpredictable. The value of outlen after return is the number of octets written to out.

◆ htmlEntityLookup()

const htmlEntityDesc * htmlEntityLookup ( const xmlChar * name)

Lookup the given entity in EntitiesTable.

Deprecated
Only supports HTML 4.

TODO: the linear scan is really ugly, a hash table is really needed.

Parameters
name: the entity name
Returns
the associated htmlEntityDesc if found, NULL otherwise.

◆ htmlEntityValueLookup()

const htmlEntityDesc * htmlEntityValueLookup ( unsigned int value)

Lookup the given entity in EntitiesTable.

Deprecated
Only supports HTML 4.

TODO: the linear scan is really ugly, a hash table is really needed.

Parameters
value: the entity's Unicode value
Returns
the associated htmlEntityDesc if found, NULL otherwise.

◆ htmlFreeParserCtxt()

void htmlFreeParserCtxt ( htmlParserCtxt * ctxt)

Free all the memory used by a parser context.

However the parsed document in ctxt->myDoc is not freed.

Parameters
ctxt: an HTML parser context

◆ htmlHandleOmittedElem()

int htmlHandleOmittedElem ( int val)

Set and return the previous value for handling HTML omitted tags.

Deprecated
Use HTML_PARSE_NOIMPLIED
Parameters
val: int 0 or 1
Returns
the previous value: 0 for no handling, 1 for auto insertion.

◆ htmlInitAutoClose()

void htmlInitAutoClose ( void )
Deprecated
This is a no-op.

◆ htmlIsAutoClosed()

int htmlIsAutoClosed ( xmlDoc * doc,
xmlNode * elem )

The HTML DTD allows a tag to implicitly close other tags.

The list is kept in the htmlStartClose array. This function checks if a tag is autoclosed by one of its children.

Deprecated
Internal function, don't use.
Parameters
doc: the HTML document
elem: the HTML element
Returns
1 if autoclosed, 0 otherwise

◆ htmlIsScriptAttribute()

int htmlIsScriptAttribute ( const xmlChar * name)

Check if an attribute is of content type Script.

Deprecated
Only supports HTML 4.
Parameters
name: an attribute name
Returns
1 if the attribute is a script, 0 otherwise

◆ htmlNewParserCtxt()

htmlParserCtxt * htmlNewParserCtxt ( void )

Allocate and initialize a new HTML parser context.

This can be used to parse HTML documents into DOM trees with functions like htmlCtxtReadFile or htmlCtxtReadMemory.

See htmlCtxtUseOptions for parser options.

See xmlCtxtSetErrorHandler for advanced error handling.

See htmlNewSAXParserCtxt for custom SAX parsers.

Returns
the htmlParserCtxt or NULL in case of allocation error

◆ htmlNewSAXParserCtxt()

htmlParserCtxt * htmlNewSAXParserCtxt ( const htmlSAXHandler * sax,
void * userData )

Allocate and initialize a new HTML SAX parser context.

If userData is NULL, the parser context will be passed as user data.

Since
2.11.0

If you want to support older versions, it's best to invoke htmlNewParserCtxt and set ctxt->sax with struct assignment.

Also see htmlNewParserCtxt.

Parameters
sax: SAX handler
userData: user data
Returns
the htmlParserCtxt or NULL in case of allocation error

◆ htmlNodeStatus()

htmlStatus htmlNodeStatus ( xmlNode * node,
int legacy )
Deprecated
Don't use.
Parameters
node: an xmlNode in a tree
legacy: whether to allow deprecated elements (YES is faster here for Element nodes)
Returns
HTML_VALID

◆ htmlParseCharRef()

int htmlParseCharRef ( htmlParserCtxt * ctxt)
Deprecated
Internal function, don't use.
Parameters
ctxt: an HTML parser context
Returns
0

◆ htmlParseChunk()

int htmlParseChunk ( htmlParserCtxt * ctxt,
const char * chunk,
int size,
int terminate )

Parse a chunk of memory in push parser mode.

Assumes that the parser context was initialized with htmlCreatePushParserCtxt.

The last chunk, which will often be empty, must be marked with the terminate flag. With the default SAX callbacks, the resulting document will be available in ctxt->myDoc. This pointer will not be freed by the library.

If the document isn't well-formed, ctxt->myDoc is set to NULL.

Since 2.14.0, xmlCtxtGetDocument can be used to retrieve the result document.

Parameters
ctxt: an HTML parser context
chunk: chunk of memory
size: size of chunk in bytes
terminate: last chunk indicator
Returns
an xmlParserErrors code (0 on success).

◆ htmlParseDoc()

xmlDoc * htmlParseDoc ( const xmlChar * cur,
const char * encoding )

Parse an HTML in-memory document and build a tree.

Deprecated
Use htmlReadDoc.

This function uses deprecated global parser options.

Parameters
cur: a pointer to an array of xmlChar
encoding: the encoding (optional)
Returns
the resulting document tree

◆ htmlParseDocument()

int htmlParseDocument ( htmlParserCtxt * ctxt)

Parse an HTML document and invoke the SAX handlers.

This is useful if you're only interested in custom SAX callbacks. If you want a document tree, use htmlCtxtParseDocument.

Parameters
ctxt: an HTML parser context
Returns
0 on success, -1 in case of error.

◆ htmlParseElement()

void htmlParseElement ( htmlParserCtxt * ctxt)

This is kept for compatibility with previous code versions.

Deprecated
Internal function, don't use.
Parameters
ctxt: an HTML parser context

◆ htmlParseEntityRef()

const htmlEntityDesc * htmlParseEntityRef ( htmlParserCtxt * ctxt,
const xmlChar ** str )
Deprecated
Internal function, don't use.
Parameters
ctxt: an HTML parser context
str: location to store the entity name
Returns
NULL.

◆ htmlParseFile()

xmlDoc * htmlParseFile ( const char * filename,
const char * encoding )

Parse an HTML file and build a tree.

Parameters
filename: the filename
encoding: encoding (optional)
Returns
the resulting document tree

◆ htmlReadDoc()

xmlDoc * htmlReadDoc ( const xmlChar * str,
const char * url,
const char * encoding,
int options )

Convenience function to parse an HTML document from a zero-terminated string.

See htmlCtxtReadDoc for details.

Parameters
str: a pointer to a zero terminated string
url: only used for error reporting (optional)
encoding: the document encoding (optional)
options: a combination of htmlParserOption values
Returns
the resulting document tree.

◆ htmlReadFd()

xmlDoc * htmlReadFd ( int fd,
const char * url,
const char * encoding,
int options )

Convenience function to parse an HTML document from a file descriptor.

NOTE that the file descriptor will not be closed when the context is freed or reset.

See htmlCtxtReadFd for details.

Parameters
fd: an open file descriptor
url: only used for error reporting (optional)
encoding: the document encoding, or NULL
options: a combination of htmlParserOption values
Returns
the resulting document tree

◆ htmlReadFile()

xmlDoc * htmlReadFile ( const char * filename,
const char * encoding,
int options )

Convenience function to parse an HTML file from the filesystem, the network or a global user-defined resource loader.

See htmlCtxtReadFile for details.

Parameters
filename: a file or URL
encoding: the document encoding (optional)
options: a combination of htmlParserOption values
Returns
the resulting document tree.

◆ htmlReadIO()

xmlDoc * htmlReadIO ( xmlInputReadCallback ioread,
xmlInputCloseCallback ioclose,
void * ioctx,
const char * url,
const char * encoding,
int options )

Convenience function to parse an HTML document from I/O functions and context.

See htmlCtxtReadIO for details.

Parameters
ioread: an I/O read function
ioclose: an I/O close function (optional)
ioctx: an I/O handler
url: only used for error reporting (optional)
encoding: the document encoding (optional)
options: a combination of htmlParserOption values
Returns
the resulting document tree

◆ htmlReadMemory()

xmlDoc * htmlReadMemory ( const char * buffer,
int size,
const char * url,
const char * encoding,
int options )

Convenience function to parse an HTML document from memory.

The input buffer must not contain any terminating null bytes.

See htmlCtxtReadMemory for details.

Parameters
buffer: a pointer to a char array
size: the size of the array
url: only used for error reporting (optional)
encoding: the document encoding, or NULL
options: a combination of htmlParserOption values
Returns
the resulting document tree

◆ htmlSAXParseDoc()

xmlDoc * htmlSAXParseDoc ( const xmlChar * cur,
const char * encoding,
htmlSAXHandler * sax,
void * userData )

Parse an HTML in-memory document.

If sax is not NULL, use the SAX callbacks to handle parse events. If sax is NULL, fall back to the default DOM behavior and return a tree.

Deprecated
Use htmlNewSAXParserCtxt and htmlCtxtReadDoc.
Parameters
cur: a pointer to an array of xmlChar
encoding: a free form C string describing the HTML document encoding, or NULL
sax: the SAX handler block
userData: if using SAX, this pointer will be provided on callbacks.
Returns
the resulting document tree unless SAX is NULL or the document is not well formed.

◆ htmlSAXParseFile()

xmlDoc * htmlSAXParseFile ( const char * filename,
const char * encoding,
htmlSAXHandler * sax,
void * userData )

Parse an HTML file and build a tree.

Automatic support for ZLIB/Compress compressed documents is provided by default if found at compile-time. It uses the given SAX function block to handle the parsing callbacks. If sax is NULL, it falls back to the default DOM tree building routines.

Deprecated
Use htmlNewSAXParserCtxt and htmlCtxtReadFile.
Parameters
filename: the filename
encoding: encoding (optional)
sax: the SAX handler block
userData: if using SAX, this pointer will be provided on callbacks.
Returns
the resulting document tree unless SAX is NULL or the document is not well formed.

◆ htmlTagLookup()

const htmlElemDesc * htmlTagLookup ( const xmlChar * tag)

Lookup the HTML tag in the ElementTable.

Deprecated
Only supports HTML 4.
Parameters
tag: the tag name in lowercase
Returns
the related htmlElemDesc or NULL if not found.

◆ htmlUTF8ToHtml()

int htmlUTF8ToHtml ( unsigned char * out,
int * outlen,
const unsigned char * in,
int * inlen )

Take a block of UTF-8 chars in and try to convert it to an ASCII plus HTML entities block of chars out.

Deprecated
Internal function, don't use.
Parameters
out: a pointer to an array of bytes to store the result
outlen: the length of out
in: a pointer to an array of UTF-8 chars
inlen: the length of in
Returns
0 on success, -2 if the transcoding fails, or -1 otherwise. The value of inlen after return is the number of octets consumed if the return value is positive, else unpredictable. The value of outlen after return is the number of octets written to out.