Parsoid/Developer Setup

From mediawiki.org
This page describes Parsoid/JS, which has been replaced by Parsoid/PHP in MW 1.35 and newer.

This page describes installation of Parsoid/JS from source. This is primarily useful for developers of Parsoid, but if there are no prebuilt packages of Parsoid for your operating system, you might also find this useful.

Quick start

node -v # should be version 6.x or higher
git clone https://gerrit.wikimedia.org/r/mediawiki/services/parsoid
cd parsoid
git review -s # optional, see below
npm install
npm test # might as well - requires nsp, eslint to be installed
cp config.example.yaml config.yaml
edit config.yaml
npm start

For backwards compatibility, and to continue supporting non-static configs, localsettings.js can be configured as follows:

edit localsettings.js
// Note that a config.yaml is still required
edit config.yaml
// Add a path to the file as, "localsettings: ./localsettings.js"
// See the comments in config.yaml for details
npm start

See Parsoid/Setup#Configuration for more details on the "edit config.yaml" and "edit localsettings.js" steps. See the Gerrit 'getting started' docs for more help with "git review", which is only necessary if you plan to contribute code changes back to us.

If the above commands don't immediately make sense to you, keep reading for more detailed instructions.

Ensure you have a recent node

Before you install Parsoid, you should ensure that you've got a recent version of node installed.

Parsoid requires node v6.x or higher, and we run v6.9.1 in production.

If you do not have new-enough node installed, follow the instructions at Parsoid/Installing Node and then come back here.

Installation from source on Linux or Mac OS X

Option 1. Clone the mediawiki/services/parsoid/deploy repository

This is perhaps the simplest way to install Parsoid if you just want to play around for a bit and not have to deal with npm install.

$ git clone --recursive https://gerrit.wikimedia.org/r/mediawiki/services/parsoid/deploy

This installs the version of Parsoid that is currently deployed in production along with all the node dependencies. The Parsoid code itself will be in the src/ subdirectory. The npm modules will be in the node_modules/ subdirectory.

Option 2. Clone the mediawiki/services/parsoid repository

You can install the Parsoid code anywhere; it doesn't have to be installed or run as the root user.

Check out the sources:

git clone https://gerrit.wikimedia.org/r/mediawiki/services/parsoid

Or if you plan to hack on Parsoid, follow the Gerrit 'getting started' docs and set up git-review in your new checkout. (This will also create an authenticated remote named gerrit in your repository.)

cd parsoid
git-review -s

Check your version of node: type node --version (or nodejs --version on Debian/Ubuntu) and it should print v10.x. (Higher is fine, too.) See Parsoid/Installing Node if that's not right.

Install the JS dependencies.

Run this command in the Parsoid directory (containing package.json):

npm install

Configuration

If you would like to point the Parsoid web service to your own wiki, go to the parsoid directory and edit the config.yaml file.

Use the uri parameter to point to the MediaWiki instance(s) you want to use, like this:

        mwApis:
        - # This is the only required parameter, the URL of your MediaWiki API endpoint.
          uri: 'http://yoursite.com/w/api.php'
          # The "domain" is used for communication with Visual Editor and RESTBase.
          # It defaults to the hostname portion of the `uri` property above, but you can manually set it to an arbitrary string.
          domain: 'yoursite.com'  # optional

If you would rather configure your wiki through a localsettings.js file, uncomment the localsettings path in config.yaml like this:

        # For backwards compatibility, and to continue to support non-static configs for the time being, optionally provide a path to a localsettings.js file.
        # See localsettings.example.js
        localsettings: ./localsettings.js

and comment out the mwApis, uri and domain parameters like this:

        #mwApis:
        #- # This is the only required parameter, the URL of your MediaWiki API endpoint.
          #uri: 'http://localhost/w/api.php'
          # The "domain" is used for communication with Visual Editor and RESTBase.
          # It defaults to the hostname portion of the `uri` property above, but you can manually set it to an arbitrary string.
          #domain: 'localhost'  # optional

Then go to the parsoid directory and create a localsettings.js file based on localsettings.example.js. Use parsoidConfig.setMwApi to point to the MediaWiki instance(s) you want to use, like this:

parsoidConfig.setMwApi({ uri: 'http://yoursite.com/w/api.php', domain: 'yoursite.com', prefix: 'someuniqueid' });

Currently Parsoid supports public wikis, and private wikis using cookie forwarding. (See bug T69313 for some more hints on getting this working. Also see this Talk thread for a workaround.)

You can then access pages of your wiki in Parsoid with the relative URL '/yoursite.com/v3/page/html/<page-title>/'
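
The URL pattern above can be assembled programmatically. This is a rough sketch; the function name pageHtmlUrl is hypothetical, and encoding the title as a single path segment with underscores instead of spaces is an assumption about what your setup expects.

```javascript
'use strict';

// Build the URL of Parsoid's HTML rendering for one page.
// `base` is where Parsoid listens; `domain` must match the domain in mwApis.
function pageHtmlUrl(base, domain, title) {
  // MediaWiki titles conventionally use underscores in place of spaces.
  const normalized = title.replace(/ /g, '_');
  // Strip trailing slashes from the base so the path is joined cleanly.
  return base.replace(/\/+$/, '') + '/' + domain + '/v3/page/html/' +
    encodeURIComponent(normalized);
}

console.log(pageHtmlUrl('http://localhost:8000', 'yoursite.com', 'Main Page'));
// prints "http://localhost:8000/yoursite.com/v3/page/html/Main_Page"
```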

See Parsoid/Setup#Configuration for more details.

Parsoid may not be able to communicate with an API if it is behind a local virtual-host. In such cases, use a non-virtual-host URI for the mwApis config values (this will typically be a localhost URI instead).
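
To find out whether a given API URI is reachable from the machine running Parsoid, you can query it directly. The sketch below uses MediaWiki's standard siteinfo query; the script name check-api.js and the function names are hypothetical, and this is a diagnostic aid rather than anything Parsoid itself runs.

```javascript
// check-api.js: hypothetical diagnostic helper.
'use strict';

// Build a lightweight siteinfo query for the endpoint used as `uri` in mwApis.
function siteinfoUrl(apiUri) {
  return apiUri + '?action=query&meta=siteinfo&format=json';
}

// Fetch the siteinfo and pass the wiki's site name to the callback.
function fetchSitename(apiUri, callback) {
  const url = siteinfoUrl(apiUri);
  const client = url.indexOf('https:') === 0 ? require('https') : require('http');
  client.get(url, (res) => {
    let body = '';
    res.setEncoding('utf8');
    res.on('data', (chunk) => { body += chunk; });
    res.on('end', () => {
      let sitename;
      try {
        sitename = JSON.parse(body).query.general.sitename;
      } catch (e) {
        return callback(e); // not JSON, or not a MediaWiki API response
      }
      callback(null, sitename);
    });
  }).on('error', (err) => callback(err));
}

// Only contact a wiki when a URI is given on the command line.
if (/^https?:\/\//.test(process.argv[2] || '')) {
  fetchSitename(process.argv[2], (err, sitename) => {
    if (err) {
      console.error('API not reachable: ' + err.message);
      process.exitCode = 1;
    } else {
      console.log('API reachable, wiki name: ' + sitename);
    }
  });
}
```

For example, node check-api.js http://localhost/w/api.php tests the localhost URI suggested above.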

Run the server

You should be able to start the Parsoid web service from the parsoid directory using:

node bin/server.js

On Ubuntu 14.04, where the node binary is called nodejs, run this from the parsoid directory instead:

nodejs bin/server.js

This will start the Parsoid HTTP service, which listens on port 8000 by default (as set in localsettings.js). To test it, point your browser to http://localhost:8000/. If you configured Parsoid correctly, you should be able to parse pages via http://localhost:8000/yoursite.com/v3/page/html/<pagename>. Note that this test can also fail if your hosting provider has disabled port 8000 for your account.

Two environment variables are available to control binding to a specific interface and/or port:

export INTERFACE=127.0.0.1
export PORT=8142 
nodejs bin/server.js
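
A small smoke test can confirm that the service answers on the configured address. The sketch below reads the same INTERFACE and PORT variables; falling back to localhost and 8000 is an assumption based on the default mentioned above, and the script name smoke.js and the --check flag are hypothetical.

```javascript
// smoke.js: hypothetical helper for checking a locally started Parsoid.
'use strict';
const http = require('http');

// Work out where Parsoid should be reachable from an environment-like object.
function serviceUrl(env) {
  const host = env.INTERFACE || 'localhost';
  const port = env.PORT || '8000';
  return 'http://' + host + ':' + port + '/';
}

// Request the root page and report the HTTP status code via the callback.
function ping(url, callback) {
  http.get(url, (res) => {
    res.resume(); // discard the body; only the status matters here
    callback(null, res.statusCode);
  }).on('error', (err) => callback(err));
}

// Run the check only when asked explicitly: node smoke.js --check
if (process.argv.indexOf('--check') !== -1) {
  const url = serviceUrl(process.env);
  ping(url, (err, status) => {
    if (err) {
      console.error(url + ' is not reachable: ' + err.message);
      process.exitCode = 1;
    } else {
      console.log(url + ' answered with HTTP ' + status);
    }
  });
}
```

Run it from a shell in which INTERFACE and PORT are exported with the same values used to start the server.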

Starting the Parsoid service automatically

There are many ways to start services automatically; consult your server's operating system documentation.

Upstart (Ubuntu)

On Ubuntu and other operating systems using Upstart, one approach is:

sudo ln -s /lib/init/upstart-job /etc/init.d/parsoid
sudo vi /etc/init/parsoid.conf

where /etc/init/parsoid.conf contains configuration similar to MediaWiki-Vagrant's parsoid.conf:

# vim: set ft=upstart:

# Upstart job configuration for Parsoid

description "Parsoid HTTP service"

start on (local-filesystems and net-device-up IFACE!=lo)
stop on runlevel [!2345]

setuid "www-data"
setgid "www-data"

env VCAP_APP_PORT="8000"
env NODE_PATH="/js/node_modules"

chdir "/path/to/parsoid"
exec nodejs bin/server.js

respawn

To test your configuration, type:

init-checkconf /etc/init/parsoid.conf

If the answer is "syntax ok" you can start the service:

sudo service parsoid start

To check whether the service is running, type:

service parsoid status

To stop the Parsoid service, type:

sudo service parsoid stop

You can find more instructions on running node as a service on Ubuntu in this article: The Upstart Event System: What It Is And How To Use It

Fedora

On recent versions of Fedora and other operating systems using systemd, use a parsoid.service unit file similar to the following template (modify the file paths as appropriate):

[Unit]
Description=MediaWiki Parsoid web service on node.js
Documentation=https://www.mediawiki.org/wiki/Parsoid
Wants=local-fs.target network.target
After=local-fs.target network.target

[Install]
WantedBy=multi-user.target

[Service]
Type=simple
# or better, use a dedicated user "parsoid"
User=nobody
WorkingDirectory=/path/to/parsoid
EnvironmentFile=-/etc/parsoid/parsoid.env
ExecStart=/usr/bin/nodejs /path/to/parsoid/bin/server.js
PrivateTmp=true
PrivateDevices=true
ProtectSystem=full
ProtectHome=true
NoNewPrivileges=true
CapabilityBoundingSet=
ReadOnlyPaths=/

The optional EnvironmentFile directive above can specify the path to a file similar to the following template:

PORT=8000
NODE_PATH=/path/to/parsoid/node_modules

You can also use PM2 to daemonize the server.js application.

Install using npm:

npm install -g pm2

Start server.js through PM2:

pm2 start /path/to/parsoid/bin/server.js

The parsoid server is now running and managed by PM2. Save the process list:

pm2 save

Now, whenever PM2 starts, the Parsoid server application will run and be managed by PM2. The final step is to have PM2 start automatically on system boot:

# Render startup-script for a specific platform, the [platform] could be one of:
#   ubuntu|centos|redhat|gentoo|systemd|darwin|amazon
$ pm2 startup [platform]

For later Ubuntu releases that use systemd, pass systemd as the [platform] rather than ubuntu.

See bug T69313 for packaging plans that should make the general installation easier.

Automatically Starting on macOS

On macOS, you can create a plist. This is an example of a suitable plist (adapt the parsoid and config.yaml paths to your system):

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN"
"http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
  <dict>
  <key>Label</key>
  <string>org.mediawiki.parsoid.start</string>
  <key>ProgramArguments</key>
  <array>
    <string>/usr/local/bin/node</string>
    <string>/path/to/parsoid/bin/server.js</string>
    <string>--config</string>
    <string>/path/to/config.yaml</string>
  </array>
  <key>RunAtLoad</key>
  <true/>
  <key>KeepAlive</key>
  <true/>
  <key>UserName</key>
  <string>root</string>
  </dict>
</plist>

Save it as org.mediawiki.parsoid.start.plist in /Library/LaunchDaemons, change its owner to the root user and wheel group, correct the permissions, and add it as a persistent launchd job (so that it will start again on the next reboot):

sudo chown root:wheel /Library/LaunchDaemons/org.mediawiki.parsoid.start.plist
sudo chmod o-w /Library/LaunchDaemons/*
sudo launchctl load -w /Library/LaunchDaemons/org.mediawiki.parsoid.start.plist

Passenger

This is probably not recommended. If you use Passenger, make sure that num_workers in config.yaml is set to 0; otherwise Parsoid won't bind to Passenger's socket properly.

Gentoo Linux, Funtoo Linux

emerge -av '>=net-libs/nodejs-6' # install nodejs in the 6.x series or higher
git clone https://gerrit.wikimedia.org/r/mediawiki/services/parsoid # check out source
cd parsoid # enter checked-out source
npm install -g # download parsoid's nodejs library dependencies and install system-wide

Add a config.yaml file in the install location (/usr/lib64/node_modules/parsoid); see config.example.yaml for a template.

The following init.d file assumes that node is installed in its default location and that Parsoid is installed system-wide (npm install -g):

pidfile="/var/run/parsoid.pid"
command="/usr/bin/node"
command_args="/usr/lib64/node_modules/parsoid/bin/server.js"
command_background="true"

depend() {
   need net
}

A better idea would be to run Parsoid without root permissions. Let's create an unprivileged system account:

useradd -r -s /sbin/nologin -d /dev/null -M -c 'Unprivileged system account for Parsoid' parsoid

And our init.d script would change accordingly (assuming default locations, system-wide installation):

#!/sbin/runscript

PARSOID_PIDFILE="/var/run/parsoid.pid"
NODE="/usr/bin/node"
NODE_FOLDER="/usr/lib64/node_modules/parsoid"
NODE_OPTS="/usr/lib64/node_modules/parsoid/bin/server.js"

depend() {
        need net
}

start() {
        ebegin "Starting parsoid"
        start-stop-daemon --start --quiet \
                --pidfile "${PARSOID_PIDFILE}" \
                --chdir ${NODE_FOLDER} \
                --make-pidfile --background \
                --user parsoid --group parsoid \
                --exec ${NODE} -- ${NODE_OPTS}
        eend $?
}

stop() {
        ebegin "Stopping parsoid"
        start-stop-daemon --stop --quiet \
                --pidfile "${PARSOID_PIDFILE}"
        eend $?
}

FreeBSD

Startup script from https://www.reddit.com/r/freebsd/comments/4ft79b/best_practice_for_daemonizing_nodejs/d2earq1/:

#!/bin/sh
#
# PROVIDE: parsoid
# REQUIRE: LOGIN nginx
# KEYWORD: shutdown

# parsoid_enable="yes"

. /etc/rc.subr

name="parsoid"
rcvar=parsoid_enable

parsoid_pid=/var/run/parsoid.pid

# Can also be -9 etc. -HUP usually will only cause sadness.
sig_stop=-KILL

start_cmd=parsoid_start
stop_cmd=parsoid_kill
#restart_cmd=parsoid_restart    # didn't bother to implement

load_rc_config ${name}

# Set to either 'node' to use default or set EXPLICIT e.g. '/usr/local/bin/node1.10'
command_interpreter="node"
# Command path must always be explicit.
command="/opt/extl/parsoid/api/server.js"
# Contain arguments in quotes, use ' and \"
command_args=""

parsoid_start()
{
    # See daemon(8) for more details.
    daemon -f -p $parsoid_pid $command_interpreter $command $command_args
    # This is another favorite method. Select only one of these.
    #app_root="/opt/extl/parsoid/api"
    #nginx_user="nginx"
    #daemon -p $parsoid_pid -c $app_root -f -r -u $nginx_user $command_interpreter $command $command_args
    if [ $? -ne 0 ]; then
        echo "Error starting Parsoid."
        exit 1
    fi
    echo "Starting Parsoid."
}

parsoid_kill()
{
    echo "Stopping Parsoid."
    kill $sig_stop `cat $parsoid_pid`
}

run_rc_command "$1"

Windows setup

These steps are the same as the installation of Parsoid on Linux:

git clone https://gerrit.wikimedia.org/r/mediawiki/services/parsoid
cd parsoid
npm install

If npm install fails because npm is an unknown command, try adding the nodejs folder to the PATH, as explained earlier in the Install prerequisite software section, and run npm install directly from the parsoid folder created by the git clone command.

If the installation fails again, you may try to disable your router firewall.

When the installation is complete, configure parsoid and run (server.js might be located in the folder bin\ and not in api\ in newer versions of Parsoid):

node bin\server.js

To run Parsoid in the background, create a batch file called parsoid.bat in the parsoid directory and set up a scheduled task to run it on startup. Alternatively, several "run batch file as a service" tools can be found on the internet.

@echo off
"%ProgramFiles(x86)%\nodejs\node.exe" bin\server.js

When using the batch file as a scheduled task, it might be necessary to use the full path to the server.js file (e.g. C:\www\parsoid\bin\server.js) instead of bin\server.js.

Git will fail to download if you have to go through a corporate proxy, so you need to do the following first:

git config --global -e

This will launch an editor. Press i to enter insert mode, then type:

[http]
    proxy = http://proxy.company.com:8080

Finally, press ESC, type :wq and press Enter to save the changes. The proxy is now enabled.

Windows Server 2008 R2

MediaWiki must be installed, along with the VisualEditor extension.

Troubleshooting

If things are still not working, then see our troubleshooting page.
