14 November 2016

Playing with OpenCL

I spent last week reading up on modern C++ developments, including some great essays from Herb Sutter. I was particularly struck by his prescient series on Moore's Law, The Free Lunch Is Over and Welcome to the Jungle. The latter essay portrays all possible computer architectures on a 2D plane of CPU versus memory architecture. The axes are a bit tricky, but the general idea is that a platform at the "origin" is predictable and easy to program for, whereas things get trickier as you move up and/or right.

[Figure: Sutter's "Welcome to the Jungle" chart, mapping architectures on a 2D plane of CPU versus memory architecture]

This figure describes everything from cellphones and game consoles to supercomputers and communications satellites. It also got me wondering how hard a simple "hello world" OpenCL program would be to get running on my Intel-only laptop. Can I do this in a day, or perhaps just an evening?

Personally, I don't want to fuss with hardware right now - I just want to see how the GPU/C++ pieces fit together. Conveniently, my laptop contains a low-end embedded GPU (24 execution units) on its Broadwell chip. Running Debian, I was able to easily download the requisite packages and get started. I quickly discovered that consumer Intel GPUs, at present, do *not* support double-precision floating point at all, making this a less-than-ideal test-bed for scientific programming.
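If you want to check a device yourself, the cl_khr_fp64 extension string is what advertises double support. Here's a minimal sketch using the OpenCL C++ bindings (my own quick check, not code from the test repo; it assumes the OpenCL headers and an ICD/driver package are installed):

// fp64check.cpp - list devices and whether they advertise double support
// build: g++ -Wall -std=c++11 fp64check.cpp -lOpenCL -o fp64check
#include <CL/cl.hpp>
#include <iostream>
#include <string>
#include <vector>

int main() {
    std::vector<cl::Platform> platforms;
    cl::Platform::get(&platforms);
    for (const auto &p : platforms) {
        std::vector<cl::Device> devices;
        p.getDevices(CL_DEVICE_TYPE_ALL, &devices);
        for (const auto &d : devices) {
            std::string exts = d.getInfo<CL_DEVICE_EXTENSIONS>();
            bool fp64 = exts.find("cl_khr_fp64") != std::string::npos;
            std::cout << d.getInfo<CL_DEVICE_NAME>() << ": "
                      << (fp64 ? "supports" : "does not support")
                      << " cl_khr_fp64 (double)" << std::endl;
        }
    }
    return 0;
}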

My test case was inspired by a performance issue that I ran into in my work. In a scientific simulation program that I use & help develop, Valgrind revealed that pow(double, double) was taking fully half of the total computational time. Poking around a bit, I see that pow() and log() really are quite complex to compute, particularly for doubles (since the total effort is a function of precision). With this in mind, I set up a simple example using both OpenCL and straight C++, and compared timings. Note - I strongly recommend using a sample size of greater than one to draw any conclusions with real-life consequences!
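For flavor, here's roughly what the plain-C++ side of such a comparison looks like (a sketch only - the kernel function and sizes are illustrative, not the ones from my repo):

// straight-pow.cpp - time a pow()-heavy loop on the CPU
// build: g++ -Wall -std=c++11 straight-pow.cpp -o straight-pow
#include <chrono>
#include <cmath>
#include <iostream>
#include <vector>

int main() {
    std::vector<float> v(100000, 1.0001f);
    auto start = std::chrono::steady_clock::now();
    for (int rep = 0; rep < 1000; ++rep) {
        for (auto &x : v) {
            x = std::pow(x, 1.01f);   // the expensive call
        }
    }
    auto stop = std::chrono::steady_clock::now();
    std::cout << "v[0] = " << v[0] << ", elapsed "
              << std::chrono::duration<double>(stop - start).count()
              << " s" << std::endl;
    return 0;
}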

In the full example (output below), the vanilla C++ is clean and easy to read, but is ~20x slower than the OpenCL version. Worried about the possibility of "unintended optimizations", I tried using a different kernel function. I used float for both examples to keep the total computational complexity the same. The speed difference remained, but the new test revealed different answers. To the best of my understanding, this highlights differences in precision-sensitive operations between the OpenCL and standard C++ implementations. This is a pretty tricky area - just know that it's something to keep an eye on if you require perfect concordance between platforms.

EDIT: I also added an example using Boost.Compute today, which brings the best of both the C++ and OpenCL worlds. Boost.Compute has straightforward docs, and includes a nice closure macro that allows the direct incorporation of C++ code in kernel functions. The resulting code is *way* less verbose than vanilla OpenCL. The only downsides are the extra dependency and some *very* noisy compiler warnings.
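To give a sense of the closure macro, here's a minimal Boost.Compute sketch (again illustrative, not the repo code): a host-side variable is captured into a kernel that transform() runs on the GPU.

// boost-compute-sketch.cpp
// build: g++ -Wall -std=c++11 boost-compute-sketch.cpp -lOpenCL -o bc-sketch
#include <boost/compute.hpp>
#include <iostream>
#include <vector>

namespace compute = boost::compute;

int main() {
    compute::device gpu = compute::system::default_device();
    compute::context ctx(gpu);
    compute::command_queue queue(ctx, gpu);

    // host-side variable captured by the closure below
    float scale = 2.0f;
    BOOST_COMPUTE_CLOSURE(float, scaled_sqrt, (float x), (scale),
    {
        return scale * sqrt(x);
    });

    // fill a host vector, copy to the device, transform in place, copy back
    std::vector<float> host(100000);
    for (std::size_t i = 0; i < host.size(); ++i) host[i] = float(i);

    compute::vector<float> dev(host.size(), ctx);
    compute::copy(host.begin(), host.end(), dev.begin(), queue);
    compute::transform(dev.begin(), dev.end(), dev.begin(), scaled_sqrt, queue);
    compute::copy(dev.begin(), dev.end(), host.begin(), queue);

    std::cout << host[0] << " ... " << host.back() << std::endl;
    return 0;
}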

Here's the output of the full example, which can be found in my github test code repo (Boost.Compute version not shown; its timings are comparable to OpenCL's):

$ make; time ./opencl; time ./straight

g++ -Wall -std=c++11 -lOpenCL -o opencl opencl.cpp
g++ -Wall -std=c++11 -o straight straight.cpp
Using platform: Intel Gen OCL Driver
Using device: Intel(R) HD Graphics 5500 BroadWell U-Processor GT2

 result:
200 201 202 203 204 205 206 207 208 209
99990 99991 99992 99993 99994 99995 99996 99997 99998 99999
./opencl  0.20s user 0.09s system 97% cpu 0.295 total

 result:
200 201 202 203 204 205 206 207 208 209
99990 99991 99992 99993 99994 99995 99996 99997 99998 99999
./straight  6.03s user 0.00s system 99% cpu 6.032 total

Hopefully this example helps you get started experimenting with GPU computing. As Herb Sutter points out, we can expect more and greater hardware parallelism in the near future. Discrete GPUs are now commonly used in scientific computing, and Intel is selling a massively multicore add-on card, the Xeon Phi processor. Finally, floating-point precision remains an interesting question to keep an eye on in this domain.

15 February 2016

Shiny on Webfaction: VPS installation without root

I've been using Webfaction (plug) as an inexpensive managed VPS. Part of me wants root access, but I'm mostly happy to leave the administrative details to others. Webfaction seems to be a good example of a common VPS plan: user-only access in a rich development environment. Compilers, zsh, and even tmux are available from the shell, making this a very comfortable dev environment overall.

Most of the time root doesn't matter, but sometimes it complicates new software installs. I've been looking forward to testing R's webapp package Shiny, but all of the docs assume root access (and some even state that it's required). I set off without knowing whether this would work, just to see how far I could get. What follows is a (hopefully) reproducible account of a user-land install of R & Shiny via ssh on a Webfaction slice. To the best of my knowledge, this requires only standard development tools, and so should(??) work on similar hosts.

In the following I use [tab] to indicate hitting the tab key for auto-completion. The VPS login username is [user]. [edit] means call your editor of choice (vim, emacs, or, god forbid, nano). This assumes you are using bash (which seems to be the default shell on most VPSs).

Prepare the build environment

## ssh to webhost
## make directories, set paths, etc
## source build dir
mkdir ~/src
## software install dir
mkdir ~/local
## personal content dir
CONTENTDIR=~/var
mkdir $CONTENTDIR
## some hosts have /tmp set noexec?
mkdir ~/src/tmp
## install software here
## (CONTENTDIR and INSTPREFIX are plain shell variables used in the steps
##  below - re-set them if you come back in a new login session)
INSTPREFIX=$HOME/local

## set paths:  
##
echo 'export PATH=$PATH:~/local/bin:~/local/shiny-server/bin' >> ~/.bashrc
echo 'export TMPDIR=$HOME/src/tmp' >>~/.bashrc

## check that all is well
[edit] ~/.bashrc
## update env
. ~/.bashrc
[Ref: temp dir and R packages]

Install R from source: fast and (mostly) easy

cd ~/src
wget http://cran.us.r-project.org/src/base/R-3/R-3.2.3.tar.gz
tar xzf R-3.2.3.tar.gz
cd R-[tab]
./configure --prefix=$INSTPREFIX
## missing header: search for it, then add its directory (note the -I)
CPPFLAGS=-I/usr/lib/jvm/java/include make
make install
cd ~

Prep R environment

## start R, install the needed packages in R, then quit
R
install.packages(c('shiny', 'rmarkdown'))
q(save='no')
## back in the shell:
## on a headless / no-X11 box, need cairo for png
echo "options(bitmapType='cairo')" >> ~/.Rprofile
## check that all is well
[edit] ~/.Rprofile
[Ref: R png without X11]

Install cmake (if needed)

## first install cmake - skip if it's already available
which cmake
## nothing?  continue
## NOTE - I'm using the source tarball here, not binaries
cd ~/src
wget https://cmake.org/files/v3.4/cmake-3.4.3.tar.gz
tar xzf cmake-[tab]
cd cmake-[tab]
./configure --prefix=$INSTPREFIX
gmake
make install

Install Shiny Server

## From shell
cd ~/src
git clone https://github.com/rstudio/shiny-server.git
cd shiny-server
cmake -DCMAKE_INSTALL_PREFIX=$INSTPREFIX .
make
## "make install" complains about a missing build dir
## I'm not sure exactly what happens here, but the following seems to work
PYTHON=`which python`
mkdir build
./bin/npm --python="$PYTHON" rebuild 
./bin/node ./ext/node/lib/node_modules/npm/node_modules/node-gyp/bin/node-gyp.js --python="$PYTHON" rebuild 
make install
[Ref: shiny build docs]

Configure Shiny Server

All of the Shiny Server docs assume the config file lives in /etc/, which I can't write to. There's _zero_ documentation on launching the server by hand, and neither shiny-server -h nor shiny-server --help provides any hints. Trial and error, plus reading source code on github, finally leads to shiny-server path-to-config-file. So, let's make a shiny site!
## Nest content in ~/var
mkdir $CONTENTDIR/shiny
cp -rp ~/src/shiny-server/samples $CONTENTDIR/shiny/apps
mkdir $CONTENTDIR/shiny/logs
## copy the packaged settings template to the content dir
cp ~/src/shiny-server/config/default.config $CONTENTDIR/shiny/server.conf
[edit] $CONTENTDIR/shiny/server.conf
##
## server.conf content follows:
run_as [user];
## leave location as-is
## substitute var with $CONTENTDIR if needed
    site_dir /home/[user]/var/shiny/apps;
    log_dir /home/[user]/var/shiny/logs;    
## save file
## back at shell, run shiny, put in background
shiny-server ~/var/shiny/server.conf &
[Ref: Shiny-server docs]

Testing

Shiny should give messages about Starting listener on 0.0.0.0:3838. First up, let's use ssh to forward remote port 3838 to a local port. This allows local testing without deployment. As an aside, if you're not using ~/.ssh/config on your local machine to manage keys and hostname shortcuts, you should (example below)!
## on local machine:
ssh -nNT -L 9000:127.0.0.1:3838 [user]@webhost
Now, if all went well, you should be able to navigate to the welcome page in a browser on your local machine:
http://127.0.0.1:9000
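As for that ~/.ssh/config tip, a minimal entry on the local machine looks something like this (the server hostname and key path are placeholders - use your own):
## ~/.ssh/config (local machine)
Host webhost
    HostName [webfaction-server-hostname]
    User [user]
    IdentityFile ~/.ssh/id_rsa
With that in place, the port-forwarding command shortens to ssh -nNT -L 9000:127.0.0.1:3838 webhost.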

Once shiny is working, don't forget to take a look at your logs:
ls -alh $CONTENTDIR/shiny/logs

I had trouble with the packaged rmd example app (which renders a .Rmd file). Reading logs showed install issues with pandoc, and I had to manually fiddle with the links:

ln -s $INSTPREFIX/shiny-server/ext/pandoc/static/pandoc $INSTPREFIX/shiny-server/ext/pandoc/
[Ref: port forwarding]

Wrap-up

For a full production environment, you would want a process monitor to keep shiny-server running, as well as a public-facing web server in front of it. See your webhost's documentation for process monitors. More details on shiny-server and apache are here (I haven't tried these proxy methods).

Finally, a more conventional approach using root access on a VPS (such as DigitalOcean) is available here.

Update - 17 Feb 2016: Deployment Logistics

After a day of kicking the tires, I'm happy to report Shiny-server is working well on Webfaction in production mode. Two points:

Making a webapp. In the Webfaction control panel, I added a custom application. In what follows, [appname] stands for the value entered in the Name field. For App category I selected "Websockets", and then clicked "Save". Copy the port number. Edit the server.conf file from above, replacing the number in listen 3838; with the port number copied from Webfaction (see the config sketch below). Finally, create a website, add a name (which can be the same as [appname]), and a domain. It typically takes a few minutes for DNS changes to propagate.

Creating the application also creates a directory named $HOME/webapps/[appname]. I placed the server.conf file there, created app and log directories, and then updated server.conf to reflect the new locations:

## Create the following directories
## add these paths to server.conf, 
## and don't forget the trailing ; 
mkdir $HOME/webapps/[appname]/logs
## shiny app files go here:
mkdir $HOME/webapps/[appname]/app
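For reference, here's roughly what the relevant parts of server.conf look like after these edits (a sketch - the port number, [user], and [appname] are placeholders for your own values):

run_as [user];
server {
  ## port copied from the Webfaction control panel
  listen [port];
  location / {
    site_dir /home/[user]/webapps/[appname]/app;
    log_dir /home/[user]/webapps/[appname]/logs;
    directory_index on;
  }
}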

[Ref: Webfaction custom application]
[Ref: Webfaction Applications and Websites]

Running the server. Shiny-server will use a PID file, which makes job-spawning a simple shell script + cron job. If shiny-server is already running, it will recognize the PID file and not start another process. I made the following script:

#!/bin/sh
## executable shell script named $HOME/bin/my.shiny.sh
## make sure to run: chmod +x $HOME/bin/my.shiny.sh
APPROOT=$HOME/webapps/[appname]                                                                               
PIDFN=$APPROOT/shiny-server.pid                   
## using full path                                                        
$HOME/local/shiny-server/bin/shiny-server $APPROOT/server.conf --pidfile=$PIDFN >> $APPROOT/logs/server.log 2>&1 &
Now run crontab -e and add an entry for the script (above):
## try once an hour, on the 10th minute of the hour
10 * * * * /home/[user]/bin/my.shiny.sh
Finally, keep an eye on memory usage. If you exceed your plan's memory limit, Webfaction automatically kills everything. And R's memory use grows with the number of connections (which themselves persist, because websockets). Webfaction distributes a nice python script that shows per-process and total memory usage.
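For a quick manual check from the shell, a generic ps invocation (not the Webfaction script) also does the job:
## per-process resident memory (RSS, in KB) for your user, largest first
ps -u [user] -o pid,rss,etime,comm --sort=-rss | head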

[Ref: shiny-server systemd script (shows commandline usage)]
[Ref: Webfaction cron]

I should point out that I like Webfaction (plug) well enough to pay them money. Their intro plan is $10/month for 1GB RAM + 100GB full SSD, with a 1-month free trial. I like that the webfaction user-base is big enough that lots of my questions are already answered, but small enough that staff actually answer new questions.

I've done my best to document exactly what I did, but I'm sure there are typos. Let me know if you encounter any issues!