diff options
Diffstat (limited to 'docs')
-rw-r--r-- | docs/05_Bridge_API/04_WebDriverAbstract.md | 83 | ||||
-rw-r--r-- | docs/05_Bridge_API/05_XPathAbstract.md (renamed from docs/05_Bridge_API/04_XPathAbstract.md) | 0 | ||||
-rw-r--r-- | docs/05_Bridge_API/index.md | 3 |
3 files changed, 85 insertions, 1 deletions
diff --git a/docs/05_Bridge_API/04_WebDriverAbstract.md b/docs/05_Bridge_API/04_WebDriverAbstract.md new file mode 100644 index 00000000..60b5e99d --- /dev/null +++ b/docs/05_Bridge_API/04_WebDriverAbstract.md @@ -0,0 +1,83 @@ +`WebDriverAbstract` extends [`BridgeAbstract`](./02_BridgeAbstract.md) and adds functionality for generating feeds +from active websites that use XMLHttpRequest (XHR) to load content and / or JavaScript to +modify content. +It highly depends on the php-webdriver library which offers Selenium WebDriver bindings for PHP. + +- https://github.com/php-webdriver/php-webdriver (Project Repository) +- https://php-webdriver.github.io/php-webdriver/latest/ (API) + +Please note that this class is intended as a solution for websites _that cannot be covered +by the other classes_. The WebDriver starts a browser and is therefore very resource-intensive. + +# Configuration + +You need a running WebDriver to use bridges that depend on `WebDriverAbstract`. +The easiest way is to start the Selenium server from the project of the same name: +``` +docker run -d -p 4444:4444 --shm-size="2g" docker.io/selenium/standalone-chrome:latest +``` + +- https://github.com/SeleniumHQ/docker-selenium + +With these parameters only one browser window can be started at a time. +On a multi-user site, Selenium Grid should be used +and the number of sessions should be adjusted to the number of processor cores. + +Finally, the `config.ini.php` file must be adjusted so that the WebDriver +can find the Selenium server: +``` +[webdriver] + +selenium_server_url = "http://localhost:4444" +``` + +# Development + +While you are programming a new bridge, it is easier to start a local WebDriver because then you can see what is happening and where the errors are. I've also had good experience recording the process with a screen video to find any timing problems. + +``` +chromedriver --port=4444 +``` + +- https://chromedriver.chromium.org/ + +If you start rss-bridge from a container, then Chrome driver is only accessible +if you call it with the `--allowed-ips` option so that it binds to all network interfaces. + +``` +chromedriver --port=4444 --allowed-ips=192.168.1.42 +``` + +The **most important rule** is that after an event such as loading the web page +or pressing a button, you often have to explicitly wait for the desired elements to appear. + +A simple example is the bridge `ScalableCapitalBlogBridge.php`. +A more complex and relatively complete example is the bridge `GULPProjekteBridge.php`. + +# Template + +Use this template to create your own bridge. + +```PHP +<?php + +class MyBridge extends WebDriverAbstract +{ + const NAME = 'My Bridge'; + const URI = 'https://www.example.org'; + const DESCRIPTION = 'Further description'; + const MAINTAINER = 'your name'; + + public function collectData() + { + parent::collectData(); + + try { + // TODO + } finally { + $this->cleanUp(); + } + } +} + +```
\ No newline at end of file diff --git a/docs/05_Bridge_API/04_XPathAbstract.md b/docs/05_Bridge_API/05_XPathAbstract.md index fd697995..fd697995 100644 --- a/docs/05_Bridge_API/04_XPathAbstract.md +++ b/docs/05_Bridge_API/05_XPathAbstract.md diff --git a/docs/05_Bridge_API/index.md b/docs/05_Bridge_API/index.md index 06445246..ea6fd315 100644 --- a/docs/05_Bridge_API/index.md +++ b/docs/05_Bridge_API/index.md @@ -8,6 +8,7 @@ Base class | Description -----------|------------ [`BridgeAbstract`](./02_BridgeAbstract.md) | This class is intended for standard _Bridges_ that need to filter HTML pages for content. [`FeedExpander`](./03_FeedExpander.md) | Expand/modify existing feed urls -[`XPathAbstract`](./04_XPathAbstract.md) | This class is meant as an alternative base class for bridge implementations. It offers preliminary functionality for generating feeds based on _XPath expressions_. +[`WebDriverAbstract`](./04_WebDriverAbstract) | +[`XPathAbstract`](./05_XPathAbstract) | This class is meant as an alternative base class for bridge implementations. It offers preliminary functionality for generating feeds based on _XPath expressions_. For more information about how to create a new _Bridge_, read [How to create a new Bridge?](./01_How_to_create_a_new_bridge.md)
\ No newline at end of file |