Diffstat (limited to 'docs/05_Bridge_API/04_WebDriverAbstract.md')
-rw-r--r-- | docs/05_Bridge_API/04_WebDriverAbstract.md | 83 |
1 files changed, 83 insertions, 0 deletions
diff --git a/docs/05_Bridge_API/04_WebDriverAbstract.md b/docs/05_Bridge_API/04_WebDriverAbstract.md
new file mode 100644
index 00000000..60b5e99d
--- /dev/null
+++ b/docs/05_Bridge_API/04_WebDriverAbstract.md
@@ -0,0 +1,83 @@
+`WebDriverAbstract` extends [`BridgeAbstract`](./02_BridgeAbstract.md) and adds functionality for generating feeds
+from active websites that use XMLHttpRequest (XHR) to load content and/or JavaScript to
+modify content.
+It depends heavily on the php-webdriver library, which offers Selenium WebDriver bindings for PHP.
+
+- https://github.com/php-webdriver/php-webdriver (Project Repository)
+- https://php-webdriver.github.io/php-webdriver/latest/ (API)
+
+Please note that this class is intended as a solution for websites _that cannot be covered
+by the other classes_. The WebDriver starts a browser and is therefore very resource-intensive.
+
+# Configuration
+
+You need a running WebDriver to use bridges that depend on `WebDriverAbstract`.
+The easiest way is to start the Selenium server from the project of the same name:
+```
+docker run -d -p 4444:4444 --shm-size="2g" docker.io/selenium/standalone-chrome:latest
+```
+
+- https://github.com/SeleniumHQ/docker-selenium
+
+With these parameters only one browser window can be started at a time.
+On a multi-user site, Selenium Grid should be used
+and the number of sessions should be adjusted to the number of processor cores.
+
+Finally, the `config.ini.php` file must be adjusted so that the WebDriver
+can find the Selenium server:
+```
+[webdriver]
+
+selenium_server_url = "http://localhost:4444"
+```
+
+# Development
+
+While you are programming a new bridge, it is easier to start a local WebDriver because then you can see what is happening and where the errors are. I have also found it helpful to record the process as a screen video to spot timing problems.
+
+```
+chromedriver --port=4444
+```
+
+- https://chromedriver.chromium.org/
+
+If you start rss-bridge from a container, then ChromeDriver is only accessible
+if you call it with the `--allowed-ips` option so that it binds to all network interfaces.
+
+```
+chromedriver --port=4444 --allowed-ips=192.168.1.42
+```
+
+The **most important rule** is that after an event such as loading the web page
+or pressing a button, you often have to explicitly wait for the desired elements to appear.
+
+A simple example is the bridge `ScalableCapitalBlogBridge.php`.
+A more complex and relatively complete example is the bridge `GULPProjekteBridge.php`.
+
+# Template
+
+Use this template to create your own bridge.
+
+```PHP
+<?php
+
+class MyBridge extends WebDriverAbstract
+{
+    const NAME = 'My Bridge';
+    const URI = 'https://www.example.org';
+    const DESCRIPTION = 'Further description';
+    const MAINTAINER = 'your name';
+
+    public function collectData()
+    {
+        parent::collectData();
+
+        try {
+            // TODO
+        } finally {
+            $this->cleanUp();
+        }
+    }
+}
+
+```
\ No newline at end of file
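
Note on the "most important rule" in the Development section: the sketch below shows one way the template's `// TODO` might be filled in with an explicit wait. It is not part of the patch. The class name `ExampleWaitBridge` and the selector `article a` are hypothetical, and the code assumes that `WebDriverAbstract` exposes the driver through `getDriver()` and that `parent::collectData()` has already opened the page at `URI`. The `wait()`, `until()`, `WebDriverBy` and `WebDriverExpectedCondition` calls come from php-webdriver.

```PHP
<?php

use Facebook\WebDriver\WebDriverBy;
use Facebook\WebDriver\WebDriverExpectedCondition;

class ExampleWaitBridge extends WebDriverAbstract
{
    const NAME = 'Example Wait Bridge';
    const URI = 'https://www.example.org';
    const DESCRIPTION = 'Sketch: wait for dynamically loaded links';
    const MAINTAINER = 'your name';

    public function collectData()
    {
        // Assumption: this starts the browser session and loads self::URI.
        parent::collectData();

        try {
            // Hypothetical selector; adjust it to the target site.
            $links = WebDriverBy::cssSelector('article a');

            // Wait up to 10 seconds until at least one matching element
            // is present in the DOM; a timeout throws an exception.
            $this->getDriver()->wait(10)->until(
                WebDriverExpectedCondition::presenceOfElementLocated($links)
            );

            // Only read the elements after the wait has succeeded.
            foreach ($this->getDriver()->findElements($links) as $link) {
                $this->items[] = [
                    'title' => $link->getText(),
                    'uri' => $link->getAttribute('href'),
                ];
            }
        } finally {
            // Always close the browser session, even on errors.
            $this->cleanUp();
        }
    }
}
```

Keeping the wait inside the `try` block means a timeout still reaches `cleanUp()`, so no browser session is left running on the Selenium server.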