Class SimpleHtmlSaxParser

Description

Converts HTML tokens into selected SAX events.

Located in /parser.php (line 543)

Variable Summary
 mixed $_attributes
 mixed $_current_attribute
 mixed $_lexer
 mixed $_listener
 mixed $_tag
Method Summary
 static SimpleLexer &createLexer (SimpleSaxParser &$parser)
 static string decodeHtml (string $html)
 static string normalise (string $html)
 SimpleHtmlSaxParser SimpleHtmlSaxParser (SimpleSaxListener &$listener)
 boolean acceptAttributeToken (string $token, integer $event)
 boolean acceptEndToken (string $token, integer $event)
 boolean acceptEntityToken (string $token, integer $event)
 boolean acceptStartToken (string $token, integer $event)
 boolean acceptTextToken (string $token, integer $event)
 boolean ignore (string $token, integer $event)
 boolean parse (string $raw)
Variables
mixed $_attributes (line 547)
mixed $_current_attribute (line 548)
mixed $_lexer (line 544)
mixed $_listener (line 545)
mixed $_tag (line 546)
Methods
static createLexer (line 581)

Sets up the matching lexer. Starts in 'text' mode.

  • return: Lexer suitable for this parser.
  • access: public
static SimpleLexer &createLexer (SimpleSaxParser &$parser)
  • SimpleSaxParser &$parser: Event generator, usually $this.
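createLexer() only promises a lexer that starts in 'text' mode and calls back into the parser. A minimal sketch of that idea in current PHP follows: a mode stack whose base is 'text', plus a map from mode to parser method. The class ModalLexerSketch, the factory createLexerSketch() and the 'ignore' mode mapping are hypothetical; only the 'text' and 'tag' mode names and the acceptor method names come from this page.

```php
<?php
// Hypothetical sketch: a modal lexer that starts in 'text' mode and routes
// each token to the parser method mapped to the current mode.
class ModalLexerSketch
{
    private $parser;
    private $modeStack = ['text'];   // lexing always begins in 'text' mode
    private $handlers = [];          // mode name => parser method name

    public function __construct($parser)
    {
        $this->parser = $parser;
    }

    public function mapHandler(string $mode, string $method): void
    {
        $this->handlers[$mode] = $method;
    }

    public function enterMode(string $mode): void
    {
        $this->modeStack[] = $mode;
    }

    public function leaveMode(): void
    {
        // Never pop the base 'text' mode.
        if (count($this->modeStack) > 1) {
            array_pop($this->modeStack);
        }
    }

    public function currentMode(): string
    {
        return end($this->modeStack);
    }

    // Sends a token to the handler for the current mode; false signals a parse error.
    public function dispatch(string $token, int $event): bool
    {
        $method = $this->handlers[$this->currentMode()] ?? null;
        if ($method === null) {
            return false;
        }
        return (bool) $this->parser->$method($token, $event);
    }
}

// Factory in the spirit of createLexer(): wire modes to acceptor methods.
function createLexerSketch($parser): ModalLexerSketch
{
    $lexer = new ModalLexerSketch($parser);
    $lexer->mapHandler('text', 'acceptTextToken');
    $lexer->mapHandler('tag', 'acceptStartToken');
    $lexer->mapHandler('ignore', 'ignore'); // hypothetical mode name
    return $lexer;
}
```

Passing the parser in, rather than having the lexer construct it, lets the same lexer drive any object that provides the acceptor methods.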
static decodeHtml (line 693)

Decodes any HTML entities.

  • return: Outgoing plain text.
  • access: public
static string decodeHtml (string $html)
  • string $html: Incoming HTML.
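A minimal sketch of entity decoding, assuming PHP's built-in html_entity_decode() does the work with ENT_QUOTES (so that both &quot; and &#39; are decoded) and UTF-8 output. The function name decodeHtmlSketch is hypothetical.

```php
<?php
// Sketch: decode HTML entities using PHP's html_entity_decode().
// ENT_QUOTES decodes both double- and single-quote entities, since
// attribute values may use either quoting style.
function decodeHtmlSketch(string $html): string
{
    return html_entity_decode($html, ENT_QUOTES, 'UTF-8');
}

echo decodeHtmlSketch('&lt;b&gt;Tom &amp; Jerry&lt;/b&gt;'), "\n"; // prints <b>Tom & Jerry</b>
```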
static normalise (line 706)

Turns HTML into the text a text-mode browser would display. Images are converted to their alt text and tags are suppressed.

Entities are converted to their visible representation.

  • return: Plain text.
  • access: public
static string normalise (string $html)
  • string $html: HTML to convert.
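The steps described for normalise() can be sketched with a few regular expressions. This is an illustration, not the library's code: the name normaliseSketch is hypothetical, and removing comments and script blocks and collapsing whitespace (including non-breaking spaces) are assumptions about what "visible text" means. Only quoted alt values are handled.

```php
<?php
// Sketch: reduce HTML to roughly what a text-mode browser would show.
// Dropping comments/scripts and collapsing whitespace are assumptions.
function normaliseSketch(string $html): string
{
    // Comments and script bodies are never displayed, so remove them first.
    $text = preg_replace('/<!--.*?-->/s', '', $html);
    $text = preg_replace('/<script\b[^>]*>.*?<\/script>/is', '', $text);
    // Replace each image with its alt text (quoted alt values only).
    $text = preg_replace('/<img\b[^>]*\balt\s*=\s*"([^"]*)"[^>]*>/i', ' $1 ', $text);
    $text = preg_replace('/<img\b[^>]*\balt\s*=\s*\'([^\']*)\'[^>]*>/i', ' $1 ', $text);
    // Suppress every remaining tag.
    $text = preg_replace('/<[^>]*>/', '', $text);
    // Convert entities to the characters they stand for.
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
    // Collapse whitespace, including non-breaking spaces, to single spaces.
    $text = preg_replace('/[\s\x{00A0}]+/u', ' ', $text);
    return trim($text);
}

echo normaliseSketch('<p>Hello <img src="a.png" alt="big"> &amp; <b>bye</b></p>'), "\n";
// prints: Hello big & bye
```

Tags are stripped before entities are decoded, so an escaped &lt;b&gt; in the page survives as the visible text "<b>" rather than being removed as a tag.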
Constructor SimpleHtmlSaxParser (line 555)

Sets the listener.

  • access: public
SimpleHtmlSaxParser SimpleHtmlSaxParser (SimpleSaxListener &$listener)
  • SimpleSaxListener &$listener: Receives the SAX events generated by the parser.
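The constructor only stores the listener. A sketch in current PHP, assuming a hypothetical listener interface: the names SaxListenerSketch, HtmlSaxParserSketch, startElement, endElement and addContent are illustrative and are not taken from this page. The private fields mirror the documented $_tag, $_attributes and $_current_attribute variables.

```php
<?php
// Hypothetical listener contract; the method names are assumptions for this sketch.
interface SaxListenerSketch
{
    public function startElement(string $name, array $attributes): bool;
    public function endElement(string $name): bool;
    public function addContent(string $text): bool;
}

class HtmlSaxParserSketch
{
    private $listener;
    private $tag = '';              // name of the start tag being assembled
    private $attributes = [];       // attributes collected for that tag
    private $currentAttribute = ''; // attribute whose value is being read

    // Objects are passed by handle in PHP 5+, so no explicit & is needed here.
    public function __construct(SaxListenerSketch $listener)
    {
        $this->listener = $listener;
    }

    public function getListener(): SaxListenerSketch
    {
        return $this->listener;
    }
}
```

The &$listener in the documented signature dates from PHP 4, where objects were copied on assignment unless passed by reference.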
acceptAttributeToken (line 639)

Accepts a token that is part of the tag data: an attribute value.

  • return: False if parse error.
  • access: public
boolean acceptAttributeToken (string $token, integer $event)
  • string $token: Incoming characters.
  • integer $event: Lexer event type.
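A sketch of how attribute-value tokens might be accumulated, under stated assumptions: the lexer is assumed to have already stripped the '=' and the surrounding quotes, so every token is raw value text that may contain entities. The class AttributeCollectorSketch and its openAttribute() helper are hypothetical, and this sketch does not inspect $event.

```php
<?php
// Sketch: accumulate an attribute value from lexer tokens. Assumes the lexer
// has already stripped the '=' and surrounding quotes, so every token passed
// here is raw value text that may contain entities.
class AttributeCollectorSketch
{
    public $attributes = [];
    private $current = '';

    // Hypothetical helper: called when an attribute name is read in tag mode.
    public function openAttribute(string $name): void
    {
        $this->current = strtolower($name);
        $this->attributes[$this->current] = '';
    }

    // $event is accepted for signature parity but not inspected in this sketch.
    public function acceptAttributeToken(string $token, int $event): bool
    {
        if ($this->current !== '') {
            $this->attributes[$this->current] .= html_entity_decode($token, ENT_QUOTES, 'UTF-8');
        }
        return true;
    }
}

$c = new AttributeCollectorSketch();
$c->openAttribute('HREF');
$c->acceptAttributeToken('/search?a=1&amp;', 0);
$c->acceptAttributeToken('b=2', 0);
// $c->attributes is now ['href' => '/search?a=1&b=2']
```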
acceptEndToken (line 625)

Accepts a token from the end tag mode.

The element name is converted to lower case.

  • return: False if parse error.
  • access: public
boolean acceptEndToken (string $token, integer $event)
  • string $token: Incoming characters.
  • integer $event: Lexer event type.
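A sketch of the end-tag acceptor under stated assumptions: the token is assumed to be a whole end tag such as </TD>, and a plain callable stands in for the listener's end-element callback, whose real name is not given on this page. The function name acceptEndTokenSketch is hypothetical.

```php
<?php
// Sketch: turn an end-tag token such as '</TD>' into a lower-case element
// name for the listener. The callable $endElement stands in for the
// listener's end-element callback (a hypothetical name).
function acceptEndTokenSketch(string $token, int $event, callable $endElement): bool
{
    if (!preg_match('/^<\/\s*([^\s>]+)\s*>$/', $token, $matches)) {
        return false; // not an end tag: report a parse error
    }
    return (bool) $endElement(strtolower($matches[1]));
}
```

Returning the listener's own result lets a listener abort the parse by returning false.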
acceptEntityToken (line 660)

Accepts a character entity token.

  • return: False if parse error.
  • access: public
boolean acceptEntityToken (string $token, integer $event)
  • string $token: Incoming characters.
  • integer $event: Lexer event type.
acceptStartToken (line 597)

Accepts a token from the tag mode. If the starting element completes, the element is dispatched and the current attributes are reset to empty. The element or attribute name is converted to lower case.

  • return: False if parse error.
  • access: public
boolean acceptStartToken (string $token, integer $event)
  • string $token: Incoming characters.
  • integer $event: Lexer event type.
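The behaviour described for acceptStartToken() amounts to a small state machine: open a tag, record attribute names, and dispatch and reset when the tag closes. A sketch follows; the class StartTagSketch and its event codes (ENTER, MATCHED, EXIT) are hypothetical stand-ins for the lexer's real events, and instead of calling a listener the sketch records each dispatched element in an array.

```php
<?php
// Sketch of the start-tag state machine. The event codes are hypothetical
// stand-ins for the lexer's enter / attribute-name / exit events.
class StartTagSketch
{
    const ENTER = 1;     // token is '<name'
    const MATCHED = 2;   // token is an attribute name or '='
    const EXIT = 4;      // token is the closing '>'

    public $dispatched = [];    // [name, attributes] pairs sent on completion
    private $tag = '';
    private $attributes = [];

    public function acceptStartToken(string $token, int $event): bool
    {
        if ($event === self::ENTER) {
            // Drop the leading '<' and lower-case the element name.
            $this->tag = strtolower(substr($token, 1));
            return true;
        }
        if ($event === self::EXIT) {
            // The start element is complete: dispatch it, then reset state.
            $this->dispatched[] = [$this->tag, $this->attributes];
            $this->tag = '';
            $this->attributes = [];
            return true;
        }
        if ($token !== '=') {
            // An attribute name: lower-case it and start with an empty value.
            $this->attributes[strtolower(trim($token))] = '';
        }
        return true;
    }
}
```

Resetting the attributes on exit is what keeps one element's attributes from leaking into the next.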
acceptTextToken (line 671)

Accepts character data between tags that is regarded as important.

  • return: False if parse error.
  • access: public
boolean acceptTextToken (string $token, integer $event)
  • string $token: Incoming characters.
  • integer $event: Lexer event type.
ignore (line 682)

Incoming data to be ignored.

  • return: False if parse error.
  • access: public
boolean ignore (string $token, integer $event)
  • string $token: Incoming characters.
  • integer $event: Lexer event type.
parse (line 570)

Runs the content through the lexer, which should call back to the acceptors.

  • return: False if parse error.
  • access: public
boolean parse (string $raw)
  • string $raw: Page text to parse.
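To show the callback flow of parse(), the sketch below replaces the real lexer with a single preg_split() over tags. The function name parseSketch and the placeholder event code 0 are hypothetical. Unlike the documented lexer, it passes whole start tags to acceptStartToken() and does not handle comments, attributes or entities.

```php
<?php
// Sketch: a deliberately tiny stand-in for the lexer. It splits the page
// into end tags, start tags and text, calls the matching acceptor on
// $parser, and stops with false as soon as an acceptor reports an error.
function parseSketch($parser, string $raw): bool
{
    $pieces = preg_split('/(<\/?[A-Za-z][^>]*>)/', $raw, -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
    foreach ($pieces as $piece) {
        if (preg_match('/^<\/[A-Za-z][^>]*>$/', $piece)) {
            $ok = $parser->acceptEndToken($piece, 0);
        } elseif (preg_match('/^<[A-Za-z][^>]*>$/', $piece)) {
            $ok = $parser->acceptStartToken($piece, 0);
        } else {
            $ok = $parser->acceptTextToken($piece, 0);
        }
        if (!$ok) {
            return false; // propagate the acceptor's parse error
        }
    }
    return true;
}
```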

Documentation generated on Sun, 04 May 2008 09:21:55 -0500 by phpDocumentor 1.3.0