urlwatch 2
... monitors webpages for you
urlwatch is intended to help you watch changes in webpages and get notified (via e-mail, in your terminal or through various third-party services) of any changes. The change notification will include the URL that has changed and a unified diff of what has changed.
Quick Start
- Run urlwatch once to migrate your old data or start fresh
- Use urlwatch --edit to customize your job list (this will create/edit urls.yaml)
- Use urlwatch --edit-config if you want to set up e-mail sending
- Add urlwatch to your crontab (crontab -e) to monitor webpages periodically
The checking interval is defined by how often you run urlwatch. You can use e.g. crontab.guru to figure out the schedule expression for the checking interval; we recommend checking no more often than every 30 minutes (this would be */30 * * * *). If you have never used cron before, check out the crontab command help.
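As a minimal sketch, the recommended 30-minute schedule could look like the following line in the file opened by crontab -e. It assumes that the urlwatch executable is on the PATH that cron uses, which is often shorter than your interactive shell's PATH; if it is not, replace urlwatch with the full path to the executable on your system.

```
# min  hour day-of-month month day-of-week  command
*/30   *    *            *     *            urlwatch
```

The five fields are minute, hour, day of month, month and day of week, so */30 in the first field runs the command at minute 0 and minute 30 of every hour.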
On Windows, cron is not installed by default. Use the Windows Task Scheduler instead, or see this Stack Overflow question for alternatives.
The Handbook
- Introduction
- Dependencies
- Jobs
- Filters
- Built-in filters
- Picking out elements from a webpage
- Chaining multiple filters
- Extracting only the <body> tag of a page
- Filtering based on an XPath expression
- Filtering based on CSS selectors
- Using XPath and CSS filters with XML and exclusions
- Limiting the returned items from a CSS Selector or XPath
- Filtering PDF documents
- Sorting of webpage content
- Reversing of lines or separated items
- Watching GitHub releases
- Remove or replace text using regular expressions
- Using a shell script as a filter
- Converting text in images to plaintext
- Configuration
- Reporters
- Advanced Topics
- Adding URLs from the command line
- Using word-based differences
- Ignoring connection errors
- Overriding the content encoding
- Changing the default timeout
- Supplying cookie data
- Comparing with several latest snapshots
- Receiving a report every time urlwatch runs
- Using Redis as a cache backend
- Watching changes on .onion (Tor) pages
- Watching Facebook Page Events
- Only show added or removed lines
- Pass diff output to a custom script
- Setting the content width for html2text (lynx method)
- Comparing web pages visually
- Configuring how long browser jobs wait for pages to load
- Treating NEW jobs as CHANGED
- Monitoring the same URL in multiple jobs
- Deprecated Features
- Migration from 1.x