.. _introduction: Introduction ============ `urlwatch` monitors the output of webpages or arbitrary shell commands. Every time you run `urlwatch`, it: - retrieves the output and processes it - compares it with the version retrieved the previous time ("diffing") - if it finds any differences, generates a summary "report" that can be displayed or sent via one or more methods, such as email :ref:`Jobs` ----------- Each website or shell command to be monitored consitutes a "job". The instructions for each such job are contained in a config file in the `YAML format`_, accessible with the ``urlwatch --edit`` command. If you get an error, set your ``$EDITOR`` (or ``$VISUAL``) environment variable in your shell with a command such as ``export EDITOR=/bin/nano``. .. _YAML format: https://yaml.org/spec/ Typically, the first entry ("key") in a job is a ``name``, which can be anything you want and helps you identify what you're monitoring. The second key is one of either ``url``, ``navigate`` or ``command``: - ``url`` retrieves what is served by the web server, - ``navigate`` handles more web pages requiring JavaScript to display the content to be monitored, and - ``command`` runs a shell command. You can then use optional keys to finely control various job's parameters. Finally, you often use the ``filter`` key to select one or more :ref:`filters` to apply to the data after it is retrieved, to: - select HTML: ``css``, ``xpath``, ``element-by-class``, ``element-by-id``, ``element-by-style``, ``element-by-tag`` - make HTML more readable: ``html2text``, ``beautify`` - make PDFs readable: ``pdf2text`` - make JSON more readable: ``format-json`` - make iCal more readable: ``ical2text`` - make binary readable: ``hexdump`` - just detect changes: ``sha1sum`` - edit text: ``grep``, ``grepi``, ``strip``, ``sort`` These :ref:`filters` can be chained. As an example, after retrieving an HTML document by using the ``url`` key, you can extract a selection with the ``xpath`` filter, convert this to text with ``html2text``, use ``grep`` to extract only lines matching a specific regular expression, and then ``sort`` them: .. code-block:: yaml name: "Sample urlwatch job definition" url: "https://example.dummy/" https_proxy: "http://dummy.proxy/" max_tries: 2 filter: - xpath: '//section[@role="main"]' - html2text: method: pyhtml2text unicode_snob: true body_width: 0 inline_links: false ignore_links: true ignore_images: true pad_tables: false single_line_break: true - grep: "lines I care about" - sort: --- If you have more than one job, per `YAML specifications `__, you separate them with a line containing only ``---``. :ref:`Reporters` ---------------- `urlwatch` can be configured to do something with its report besides (or in addition to) the default of displaying it on the console, such as one or more of: - ``email`` (using SMTP) - email using ``mailgun`` - ``slack`` - ``pushbullet`` - ``telegram`` - ``matrix`` - ``pushover`` - ``stdout`` - ``xmpp`` Reporters are configured in a separate file, see :ref:`configuration`.