Jobs
Jobs are the kind of things that urlwatch(1) can monitor.
The list of jobs to run is contained in the configuration file urls.yaml,
which can be opened for editing with the command urlwatch --edit. Jobs are
separated from each other by a line containing only ---. The command
urlwatch --list prints the name of each job along with its index number
(1, 2, 3, …), which is assigned automatically according to the job's
position in the configuration file.
While optional, it is recommended that each job start with a name entry:
name: "This is a human-readable name/label of the job"
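As a minimal sketch of how several jobs sit together in urls.yaml, the following reuses two example jobs from later in this document and separates them with a line containing only ---. The choice of jobs is purely illustrative.

```yaml
# First job (index 1 in the output of urlwatch --list)
name: "urlwatch homepage"
url: "https://thp.io/2008/urlwatch/"
---
# Second job (index 2), separated from the first by a line containing only ---
name: "What is in my Home Directory?"
command: "ls -al ~"
```

Because index numbers follow the position in the file, reordering the jobs in urls.yaml would also change the numbers shown by urlwatch --list.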
The following job types are available:
URL
This is the main job type – it retrieves a document from a web server:
name: "urlwatch homepage"
url: "https://thp.io/2008/urlwatch/"
Required keys:
url: The URL to the document to watch for changes
Job-specific optional keys:
cookies: Cookies to send with the request (see Advanced Topics)
method: HTTP method to use (default: GET)
data: HTTP POST/PUT data
ssl_no_verify: Do not verify SSL certificates (true/false)
ignore_cached: Do not use cache control (ETag/Last-Modified) values (true/false)
http_proxy: Proxy server to use for HTTP requests (might be http:// or socks5://)
https_proxy: Proxy server to use for HTTPS requests (might be http:// or socks5://)
headers: HTTP header to send along with the request
encoding: Override the character encoding from the server (see Advanced Topics)
timeout: Override the default socket timeout (see Advanced Topics)
ignore_connection_errors: Ignore (temporary) connection errors (see Advanced Topics)
ignore_http_error_codes: List of HTTP errors to ignore (see Advanced Topics)
ignore_timeout_errors: Do not report errors when the timeout is hit
ignore_too_many_redirects: Ignore redirect loops (see Advanced Topics)
ignore_incomplete_reads: Ignore incomplete HTTP responses (see Advanced Topics)
(Note: url implies kind: url)
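To show how the optional keys combine with url, here is a hedged sketch of a single job. The endpoint, header and values are hypothetical placeholders. It also assumes that data accepts a plain string, that headers is written as a mapping, and that timeout is given in seconds. See Advanced Topics for the exact value formats.

```yaml
name: "Example status endpoint (hypothetical)"
url: "https://example.org/api/status"   # placeholder URL
method: "POST"                          # overrides the default GET
data: "query=status"                    # assumed: request body given as a plain string
headers:                                # assumed: mapping of header name to value
  Accept: "application/json"
timeout: 30                             # assumed: seconds
ignore_timeout_errors: true             # do not report errors when the timeout is hit
ssl_no_verify: false                    # keep SSL certificate verification enabled
```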
Browser
This job type is a resource-intensive variant of “URL” to handle web pages that require JavaScript to render the content being monitored.
The optional playwright package must be installed in order to run Browser jobs
(see Dependencies). You will also need to install the browsers using
playwright install (see Playwright Installation for details).
name: "A page with JavaScript"
navigate: "https://example.org/"
Required keys:
navigate: URL to navigate to with the browser
Job-specific optional keys:
wait_until: Either load, domcontentloaded, networkidle, or commit (see Advanced Topics)
wait_for: A CSS or XPath selector following the Playwright Locator spec (https://playwright.dev/python/docs/locators#locate-by-css-or-xpath). The job waits up to the default timeout of 30 seconds for the selector to match.
useragent: User-Agent header used for requests (otherwise browser default is used)
browser: Either chromium, chrome, chrome-beta, msedge, msedge-beta, msedge-dev, firefox, or webkit (must be installed with playwright install)
Because this job uses Playwright to render the page in a headless browser
instance, it uses massively more resources than a “URL” job. Use it only on
pages where url does not return the correct results. In many cases, instead of
using a “Browser” job, you can monitor the output of an API that the page calls
as it loads and that contains the information you’re looking for, using the
much faster “URL” job type.
(Note: navigate implies kind: browser)
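As an illustration of the optional Browser keys, the sketch below extends the example job above. The selector #content is a hypothetical placeholder, and the chosen browser must already have been installed with playwright install.

```yaml
name: "A page with JavaScript"
navigate: "https://example.org/"
wait_until: networkidle      # one of: load, domcontentloaded, networkidle, commit
wait_for: "#content"         # hypothetical CSS selector; quoted so YAML does not read # as a comment
browser: firefox             # one of the browser names listed above
```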
Shell
This job type allows you to watch the output of arbitrary shell commands, which is useful, for example, for monitoring an FTP uploader folder or the output of scripts that query external devices (RPi GPIO).
name: "What is in my Home Directory?"
command: "ls -al ~"
Required keys:
command: The shell command to execute
Job-specific optional keys:
stderr: Change how standard error is treated, see below
(Note: command implies kind: shell)
Configuring stderr behavior for shell jobs
By default urlwatch captures stderr for error reporting (non-zero exit
code), but ignores the output when the shell job exits with exit code 0.
This behavior can be customized using the stderr key:
ignore: Capture stderr, report on non-zero exit code, ignore otherwise (default)
urlwatch: stderr of the shell job is sent to stderr of the urlwatch process; any error message on stderr will not be visible in the error message from the reporter (legacy default behavior of urlwatch 2.24 and older)
fail: Treat the job as failed if there is any output on stderr, even with exit status 0
stdout: Merge stderr output into stdout, which means stderr output is also considered for the change detection/diff part of urlwatch (this is similar to 2>&1 in a shell)
For example, this job definition will make the job appear as failed, even though the script exits with exit code 0:
command: |
  echo "Normal standard output."
  echo "Something goes to stderr, which makes this job fail." 1>&2
  exit 0
stderr: fail
On the other hand, if you want to diff both stdout and stderr of the job, use this:
command: |
  echo "An important line on stdout."
  echo "Another important line on stderr." 1>&2
stderr: stdout
Optional keys for all job types
name: Human-readable name/label of the job
tags: Array of tags, or a single tag as a string
filter: Filters (if any) to apply to the output (can be tested with --test-filter)
max_tries: After this many sequential failed runs, the error will be reported rather than ignored
diff_tool: Command to a custom tool for generating diff text
diff_filter: Filters (if any) to apply to the diff result (can be tested with --test-diff-filter)
treat_new_as_changed: Will treat jobs that don’t have any historic data as CHANGED instead of NEW (and create a diff for new jobs)
compared_versions: Number of versions to compare for similarity
kind (redundant): Either url, shell or browser. Automatically derived from the unique key (url, command or navigate) of the job type
user_visible_url: Different URL to show in reports (e.g. when watched URL is a REST API URL, and you want to show a webpage)
enabled: Can be set to false to disable an individual job (default is true)
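The following sketch shows several of these common keys on one job. The URLs and tag names are hypothetical, and html2text is used only as an example filter; check the Filters documentation for the filters available in your version.

```yaml
name: "Release feed (hypothetical)"
url: "https://example.org/api/releases.json"      # placeholder REST API URL that is watched
user_visible_url: "https://example.org/releases"  # placeholder page shown in reports instead
tags:                                             # an array of tags; a single string also works
  - releases
  - example
filter:
  - html2text                                     # example filter applied to the retrieved output
max_tries: 3                                      # report the error only after 3 sequential failed runs
enabled: true                                     # set to false to disable this job
```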
Setting keys for all jobs at once
The main Configuration file has a job_defaults key that can be used to configure keys for all jobs at once.
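As a rough sketch only: assuming job_defaults is organized into sub-sections for all jobs and for each job kind (verify this layout against the Configuration documentation for your version), the defaults could look like the following. The specific keys and values are illustrative.

```yaml
# In the main configuration file, not in urls.yaml
job_defaults:
  all:                              # assumed sub-section: applies to every job
    max_tries: 3
  url:                              # assumed sub-section: applies to "URL" jobs only
    ignore_connection_errors: true
  browser:                          # assumed sub-section: applies to "Browser" jobs only
    wait_until: networkidle
  shell:                            # assumed sub-section: applies to "Shell" jobs only
    stderr: fail
```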