Introduction

urlwatch monitors the output of web pages or arbitrary shell commands.

Every time you run urlwatch, it:

  • retrieves the output and processes it
  • compares it with the version retrieved the previous time (“diffing”)
  • if it finds any differences, generates a summary “report” that can be displayed or sent via one or more methods, such as email

Jobs

Each website or shell command to be monitored constitutes a “job”.

The instructions for each job are stored in a configuration file in YAML format, which you can open with the urlwatch --edit command. If that command reports an error, set the $EDITOR (or $VISUAL) environment variable in your shell, for example with export EDITOR=/bin/nano.

Typically, the first entry (“key”) in a job is a name, which can be anything you want and helps you identify what you’re monitoring.

The second key is one of url, navigate or command:

  • url retrieves what is served by the web server,
  • navigate handles web pages that require JavaScript to display the content to be monitored, and
  • command runs a shell command.
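
A minimal job therefore needs little more than a name and one of these three keys. As a sketch, with a placeholder name and command chosen purely for illustration:

```yaml
# Hypothetical job: both values are placeholders, not taken from a real setup.
name: "Free disk space"
command: "df -h"
```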

You can then use optional keys to fine-tune the job’s parameters.

Finally, you will often use the filter key to apply one or more filters to the data after it is retrieved, in order to:

  • select HTML: css, xpath, element-by-class, element-by-id, element-by-style, element-by-tag
  • make HTML more readable: html2text, beautify
  • make PDFs readable: pdf2text
  • make JSON more readable: format-json
  • make iCal more readable: ical2text
  • make binary readable: hexdump
  • just detect changes: sha1sum
  • edit text: grep, grepi, strip, sort

These filters can be chained. As an example, after retrieving an HTML document by using the url key, you can extract a selection with the xpath filter, convert this to text with html2text, use grep to extract only lines matching a specific regular expression, and then sort them:

name: "Sample urlwatch job definition"
url: "https://example.dummy/"
https_proxy: "http://dummy.proxy/"
max_tries: 2
filter:
  - xpath: '//section[@role="main"]'
  - html2text:
      method: pyhtml2text
      unicode_snob: true
      body_width: 0
      inline_links: false
      ignore_links: true
      ignore_images: true
      pad_tables: false
      single_line_break: true
  - grep: "lines I care about"
  - sort:
---

If you have more than one job, separate them with a line containing only ---, the YAML document separator.
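
For instance, a file holding two jobs might look like the following sketch. The names and URLs are placeholders, and the single filter is only one possible choice:

```yaml
# First job: fetch a page and only detect whether it changed (sha1sum).
name: "Example page (placeholder)"
url: "https://example.dummy/"
filter:
  - sha1sum:
---
# Second job: a page that needs JavaScript to display its content.
name: "Example JavaScript page (placeholder)"
navigate: "https://example.dummy/app"
```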

Reporters

By default, urlwatch displays its report on the console. It can be configured to deliver the report through one or more of the following, instead of or in addition to that default:

  • email (using SMTP)
  • email (using mailgun)
  • slack
  • discord
  • pushbullet
  • telegram
  • matrix
  • pushover
  • stdout
  • xmpp

Reporters are configured in a separate file; see Configuration.