lxml – Extractors for extracting data from XML or HTML.

class data_extractor.lxml.AttrCSSExtractor(expr: str, attr: str)

Bases: data_extractor.abc.SimpleExtractorBase

Use a CSS selector to extract attribute values from subelements of XML or HTML data.

Before extracting, parse the XML or HTML text into a data_extractor.lxml.Element object.

Parameters
  • expr – CSS Selector Expression.

  • attr – Target attribute name.

extract(root: lxml.etree._Element) → List[str]

Extract the attribute values of subelements from XML or HTML data.

Parameters

element – data_extractor.lxml.Element object.

Returns

Data or subelement.

Raises

data_extractor.exceptions.ExprError – CSS Selector Expression Error.

extract_first(element: Any, default: Any = sentinel) → Any

Extract the first data item or subelement from the result of calling extract.

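Parameters

element – The target data node element.

default – Default value when nothing is found. Default: sentinel.

Returns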

Data or subelement.

Raises

data_extractor.exceptions.ExtractError – Raised when the extractor extracts wrong data.
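The relationship between extract and extract_first can be sketched without the library. The following is a standalone illustration, not data_extractor's source: EqualsExtractor, _SENTINEL and the local ExtractError class are hypothetical stand-ins, and the rule that default is returned when extract yields nothing (with ExtractError raised when no default was given) is an assumption inferred from the signature above.

```python
from typing import Any, List

_SENTINEL = object()  # hypothetical stand-in for the library's sentinel


class ExtractError(Exception):
    """Local stand-in for data_extractor.exceptions.ExtractError."""


class EqualsExtractor:
    """Hypothetical extractor: extract returns every item equal to expr."""

    def __init__(self, expr: str) -> None:
        self.expr = expr

    def extract(self, element: List[str]) -> List[str]:
        return [item for item in element if item == self.expr]

    def extract_first(self, element: Any, default: Any = _SENTINEL) -> Any:
        results = self.extract(element)
        if results:
            return results[0]
        if default is _SENTINEL:
            # Assumed behaviour: no result and no default is an error.
            raise ExtractError(f"nothing matched {self.expr!r}")
        return default


extractor = EqualsExtractor("a")
first = extractor.extract_first(["b", "a", "a"])         # first match
fallback = extractor.extract_first(["b"], default=None)  # default is used

try:
    extractor.extract_first(["b"])  # no match and no default
    raised = False
except ExtractError:
    raised = True
```

Comparing against a dedicated sentinel object, rather than None, lets a caller pass default=None deliberately and still get None back instead of an exception.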

class data_extractor.lxml.CSSExtractor(expr: str)

Bases: data_extractor.abc.SimpleExtractorBase

Use a CSS selector to extract subelements from XML or HTML data.

Before extracting, parse the XML or HTML text into a data_extractor.lxml.Element object.

Parameters

expr – CSS Selector Expression.

extract(element: lxml.etree._Element) → List[lxml.etree._Element]

Extract subelements from XML or HTML data.

Parameters

element – data_extractor.lxml.Element object.

Returns

Data or subelement.

Raises

data_extractor.exceptions.ExprError – CSS Selector Expression Error.

extract_first(element: Any, default: Any = sentinel) → Any

Extract the first data item or subelement from the result of calling extract.

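Parameters

element – The target data node element.

default – Default value when nothing is found. Default: sentinel.

Returns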

Data or subelement.

Raises

data_extractor.exceptions.ExtractError – Raised when the extractor extracts wrong data.

data_extractor.lxml.Element

alias of lxml.etree._Element

class data_extractor.lxml.TextCSSExtractor(expr: str)

Bases: data_extractor.abc.SimpleExtractorBase

Use a CSS selector to extract the text of subelements from XML or HTML data.

Before extracting, parse the XML or HTML text into a data_extractor.lxml.Element object.

Parameters

expr – CSS Selector Expression.

extract(element: lxml.etree._Element) → List[str]

Extract subelements’ text from XML or HTML data.

Parameters

element – data_extractor.lxml.Element object.

Returns

Data or subelement.

Raises

data_extractor.exceptions.ExprError – CSS Selector Expression Error.

extract_first(element: Any, default: Any = sentinel) → Any

Extract the first data item or subelement from the result of calling extract.

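Parameters

element – The target data node element.

default – Default value when nothing is found. Default: sentinel.

Returns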

Data or subelement.

Raises

data_extractor.exceptions.ExtractError – Raised when the extractor extracts wrong data.
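CSSExtractor, TextCSSExtractor and AttrCSSExtractor differ mainly in what they return for each matched subelement: the element itself, its text, or one of its attribute values. The sketch below mirrors those three result shapes using only the standard library. It does not call data_extractor or lxml: xml.etree.ElementTree stands in for the parsed document, and root.iter("li") stands in for evaluating the CSS selector "li", so treat it as an approximation of the documented return types rather than the library's behaviour.

```python
import xml.etree.ElementTree as ET

# Standard-library stand-in for a parsed document. The real extractors
# expect a data_extractor.lxml.Element (an lxml element) instead.
root = ET.fromstring(
    '<ul><li class="first">one</li><li class="second">two</li></ul>'
)

# Stand-in for evaluating the CSS selector "li" against the root.
matched = list(root.iter("li"))

# Result shape of CSSExtractor.extract: the matched subelements.
elements = matched

# Result shape of TextCSSExtractor.extract: one string per subelement.
texts = [item.text for item in matched]

# Result shape of AttrCSSExtractor.extract with attr="class":
# one attribute value per subelement.
classes = [item.get("class") for item in matched]
```

Every matched element in this sample carries the attribute, so the sketch makes no claim about how the library treats subelements that lack it.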

class data_extractor.lxml.XPathExtractor(expr: str)

Bases: data_extractor.abc.SimpleExtractorBase

Use XPath to extract data from XML or HTML.

Before extracting, parse the XML or HTML text into a data_extractor.lxml.Element object.

Parameters

expr – XPath Expression.

extract(element: lxml.etree._Element) → Union[List[lxml.etree._Element], List[str]]

Extract subelements or data from XML or HTML data.

Parameters

element – data_extractor.lxml.Element object.

Returns

Data or subelement.

Raises

data_extractor.exceptions.ExprError – XPath Expression Error.
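The return type of extract is either a list of elements or a list of strings, which presumably depends on whether the XPath expression selects nodes or values such as text and attributes. The sketch below shows the two shapes with the standard library only. It is a stand-in, not the library's code: ElementTree's findall supports only a subset of XPath and always returns elements, so the string list is derived by hand here, whereas lxml evaluates expressions ending in text() or @attr to strings directly.

```python
import xml.etree.ElementTree as ET

# Standard-library stand-in for a parsed document.
root = ET.fromstring(
    '<ul><li class="first">one</li><li class="second">two</li></ul>'
)

# Node-selecting expression: the result is a list of elements.
elements = root.findall(".//li[@class='first']")

# Value-style result: a list of strings. ElementTree cannot evaluate
# a trailing text() step, so the text is collected manually.
strings = [item.text for item in root.findall(".//li")]
```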

extract_first(element: Any, default: Any = sentinel) → Any

Extract the first data item or subelement from the result of calling extract.

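Parameters

element – The target data node element.

default – Default value when nothing is found. Default: sentinel.

Returns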

Data or subelement.

Raises

data_extractor.exceptions.ExtractError – Raised when the extractor extracts wrong data.