Package Home

Zend Framework 2 Documentation (Manual)

PHK Home

File: /_sources/modules/zend.dom.query.txt

Size:6963
Storage flags:no_autoload,compress/gzip (36%)

.. _zend.dom.query:

Zend\\Dom\\Query
================

``Zend\Dom\Query`` provides mechanisms for querying *XML* and (X) *HTML* documents utilizing either XPath or *CSS*
selectors. It was developed to aid with functional testing of *MVC* applications, but could also be used for rapid
development of screen scrapers.

*CSS* selector notation is provided as a simpler and more familiar notation for web developers to utilize when
querying documents with *XML* structures. The notation should be familiar to anybody who has developed Cascading
Style Sheets or who utilizes Javascript toolkits that provide functionality for selecting nodes utilizing *CSS*
selectors (`Prototype's $$()`_ and `Dojo's dojo.query`_ were both inspirations for the component).

.. _zend.dom.query.operation:

Theory of Operation
-------------------

To use ``Zend\Dom\Query``, you instantiate a ``Zend\Dom\Query`` object, optionally passing a document to query (a
string). Once you have a document, you can use either the ``execute()`` or ``queryXpath()`` methods; each method will
return a ``Zend\Dom\NodeList`` object with any matching nodes.

The primary difference between ``Zend\Dom\Query`` and using `DOMDocument`_ + `DOMXPath`_ is the ability to select
against *CSS* selectors. You can utilize any of the following, in any combination:

- **element types**: provide an element type to match: 'div', 'a', 'span', 'h2', etc.

- **style attributes**: *CSS* style attributes to match: '``.error``', '``div.error``', '``label.required``', etc.
  If an element defines more than one style, this will match as long as the named style is present anywhere in the
  style declaration.

- **id attributes**: element ID attributes to match: '#content', 'div#nav', etc.

- **arbitrary attributes**: arbitrary element attributes to match. Three different types of matching are provided:

  - **exact match**: the attribute exactly matches the string: 'div[bar="baz"]' would match a div element with a
    "bar" attribute that exactly matches the value "baz".

  - **word match**: the attribute contains a word matching the string: 'div[bar~="baz"]' would match a div element
    with a "bar" attribute that contains the word "baz". '<div bar="foo baz">' would match, but '<div bar="foo
    bazbat">' would not.

  - **substring match**: the attribute contains the string: 'div[bar*="baz"]' would match a div element with a
    "bar" attribute that contains the string "baz" anywhere within it.

- **direct descendents**: utilize '>' between selectors to denote direct descendents. 'div > span' would select
  only 'span' elements that are direct descendents of a 'div'. Can also be used with any of the selectors above.

- **descendents**: string together multiple selectors to indicate a hierarchy along which to search. '``div .foo
  span #one``' would select an element of id 'one' that is a descendent of arbitrary depth beneath a 'span'
  element, which is in turn a descendent of arbitrary depth beneath an element with a class of 'foo', that is an
  descendent of arbitrary depth beneath a 'div' element. For example, it would match the link to the word 'One' in
  the listing below:

  .. code-block:: html
     :linenos:

     <div>
     <table>
         <tr>
             <td class="foo">
                 <div>
                     Lorem ipsum <span class="bar">
                         <a href="/foo/bar" id="one">One</a>
                         <a href="/foo/baz" id="two">Two</a>
                         <a href="/foo/bat" id="three">Three</a>
                         <a href="/foo/bla" id="four">Four</a>
                     </span>
                 </div>
             </td>
         </tr>
     </table>
     </div>

Once you've performed your query, you can then work with the result object to determine information about the
nodes, as well as to pull them and/or their content directly for examination and manipulation.
``Zend\Dom\NodeList`` implements ``Countable`` and ``Iterator``, and stores the results internally as a
`DOMDocument`_ and `DOMNodeList`_. As an example, consider the following call, that selects against the *HTML*
above:

.. code-block:: php
   :linenos:

   use Zend\Dom\Query;

   $dom = new Query($html);
   $results = $dom->execute('.foo .bar a');

   $count = count($results); // get number of matches: 4
   foreach ($results as $result) {
       // $result is a DOMElement
   }

``Zend\Dom\Query`` also allows straight XPath queries utilizing the ``queryXpath()`` method; you can pass any valid
XPath query to this method, and it will return a ``Zend\Dom\NodeList`` object.

.. _zend.dom.query.methods:

Methods Available
-----------------

The ``Zend\Dom\Query`` family of classes have the following methods available.

.. _zend.dom.query.methods.zenddomquery:

Zend\\Dom\\Query
^^^^^^^^^^^^^^^^

The following methods are available to ``Zend\Dom\Query``:

- ``setDocumentXml($document, $encoding = null)``: specify an *XML* string to query against.

- ``setDocumentXhtml($document, $encoding = null)``: specify an *XHTML* string to query against.

- ``setDocumentHtml($document, $encoding = null)``: specify an *HTML* string to query against.

- ``setDocument($document, $encoding = null)``: specify a string to query against; ``Zend\Dom\Query`` will then
  attempt to autodetect the document type.

- ``setEncoding($encoding)``: specify an encoding string to use. This encoding will be passed to `DOMDocument's
  constructor`_ if specified.

- ``getDocument()``: retrieve the original document string provided to the object.

- ``getDocumentType()``: retrieve the document type of the document provided to the object; will be one of the
  ``DOC_XML``, ``DOC_XHTML``, or ``DOC_HTML`` class constants.

- ``getEncoding()``: retrieves the specified encoding.

- ``execute($query)``: query the document using *CSS* selector notation.

- ``queryXpath($xPathQuery)``: query the document using XPath notation.

.. _zend.dom.query.methods.zenddomnodelist:

Zend\\Dom\\NodeList
^^^^^^^^^^^^^^^^^^^

As mentioned previously, ``Zend\Dom\NodeList`` implements both ``Iterator`` and ``Countable``, and as such can be
used in a ``foreach()`` loop as well as with the ``count()`` function. Additionally, it exposes the following
methods:

- ``getCssQuery()``: return the *CSS* selector query used to produce the result (if any).

- ``getXpathQuery()``: return the XPath query used to produce the result. Internally, ``Zend\Dom\Query`` converts
  *CSS* selector queries to XPath, so this value will always be populated.

- ``getDocument()``: retrieve the DOMDocument the selection was made against.



.. _`Prototype's $$()`: http://prototypejs.org/api/utility/dollar-dollar
.. _`Dojo's dojo.query`: http://api.dojotoolkit.org/jsdoc/dojo/HEAD/dojo.query
.. _`DOMDocument`: http://php.net/domdocument
.. _`DOMXPath`: http://php.net/domxpath
.. _`DOMNodeList`: http://php.net/domnodelist
.. _`DOMDocument's constructor`: http://php.net/domdocument.construct

For more information about the PHK package format: http://phk.tekwire.net