Sunday, February 26, 2012

C++ Iterators Like Python Generators

I just showed a colleague some Python written in the style of Dave Beazley's generators for system programming, and he wondered whether we could use something similar in C++. Copying Python's syntax would be tortuous in C++, but I think we can match the spirit.

Instead of loading a logfile in Python and creating a new copy of it every time the code transforms it, code written with generators works through the logfile line by line, sparing memory while keeping each transformation in its own clearly separated stage. Consider a short example:


import re

def matches_numeric(file_lines):
    for line in file_lines:
        if re.match(r'^[0-9 \t\.]+$', line):
            yield line

# split_columns, basic_checks, and search_error are generators
# written in the same style as matches_numeric.
if __name__ == '__main__':
    file_handle = open('infile.txt', 'r')
    data = matches_numeric(file_handle)
    columnar = split_columns(data)
    checked = basic_checks(columnar)
    red_flags = search_error(checked)
    for line in red_flags:
        print line

This way, the transformations are represented clearly and are easy to mix and match.

C++ doesn't have a yield keyword, and its iterators don't signal that they are complete by throwing StopIteration. Instead, an iterator has to compare equal to an end-of-stream iterator, so that's what we construct in C++. The moral, but not syntactic, equivalent of a Python generator is therefore a function that returns a pair of iterators.

template<class SOURCE>
boost::array<split_iterator<SOURCE>, 2> split_line(boost::array<SOURCE, 2>& begin_end) {
    boost::array<split_iterator<SOURCE>, 2> iters = {{
        split_iterator<SOURCE>(begin_end), split_iterator<SOURCE>()
    }};
    return iters;
}

These iterators are packaged in a boost::array, but you could use a std::pair, or not package them at all, as you please. The goal is the same: a clean way to express a series of transformations.

std::ifstream in_file("z.txt");
auto file_line = file_by_line(in_file);
auto splits = split_line(file_line);
while (splits[0] != splits[1]) {
    for (auto word = begin(*splits[0]); word != end(*splits[0]); ++word) {
        std::cout << *word << ":";
    }
    std::cout << std::endl;
    ++splits[0];  // advance to the next line's worth of splits
}

The C++ looks similar to the Python, but each transformation builds on the type of the previous transformation, so it ends up doing type chaining in a less explicit way than boost::accumulators.
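The file_by_line used above isn't shown. As a sketch of how such a source stage might be written, here is a minimal line-reading iterator pair; the class name and its details are my illustration under the same pattern, not the code from the repository.

#include <fstream>
#include <iostream>
#include <string>
#include <boost/array.hpp>

// Illustrative input iterator that pulls one line at a time from a stream,
// the way a Python generator yields lines on demand.
class line_iterator {
    std::istream* in_;   // null marks the end-of-stream iterator
    std::string line_;
public:
    line_iterator() : in_(0) {}
    explicit line_iterator(std::istream& in) : in_(&in) { advance(); }
    const std::string& operator*() const { return line_; }
    line_iterator& operator++() { advance(); return *this; }
    bool operator!=(const line_iterator& other) const { return in_ != other.in_; }
private:
    void advance() {
        if (in_ && !std::getline(*in_, line_)) {
            in_ = 0;   // exhausted, so this iterator now equals the end iterator
        }
    }
};

// Same shape as split_line: hand back begin and end in one package.
boost::array<line_iterator, 2> file_by_line(std::istream& in) {
    boost::array<line_iterator, 2> iters = {{ line_iterator(in), line_iterator() }};
    return iters;
}

Each later stage wraps the previous pair the same way, which is where the type chaining comes from.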

The code is on github.

Friday, January 13, 2012

A Quick Check for C++11 Features

The new C++11 features are exciting, but I need to know which of them my various compilers support, so I wrote an SCons script that tries to compile samples of the new features. Watching the samples compile is quicker, and more understandable to me, than looking up Intel's C++11 list or GCC's C++0x support page. Plus, I noticed that Intel's online list is almost all Yes's, but it omits some entries from the Wikipedia list that would be No's.

The SCons script, called Cpp11check, is on Github. Edit local.cfg to specify your compiler. Then run "scons" to see a summary or "scons --echo" to see every test snippet it compiles.
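The snippets themselves are tiny programs. As an illustration (the real ones live in the repository, so take this one as representative rather than exact), the initializer-list test could be as small as:

#include <vector>
#include <string>

int main() {
    // Compiles only where C++11 initializer lists are supported.
    std::vector<std::string> words = {"I", "love", "those", "things"};
    return words.size() == 4 ? 0 : 1;
}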

Looking at sample output for Intel's C++ 12.1, it looks to me like they concentrated on language features and have yet to include, in the std namespace, the functionality that already exists in Boost. Seems like a decent choice. I just wish initializer lists worked. {{I, love, those, things.}}

-bash-3.2$ scons
scons: Reading SConscript files ...
INFO:SconsRoot:running with 2 threads
ERROR:SconsRoot:Could not find g++ with a version.
INFO:SconsRoot:Testing C++ compiler: /opt/intel/composer_xe_2011_sp1.6.233/bin/intel64/icpc
Checking whether the C++ compiler works...yes
Checking for c++0x conformance...-std=c++11?...-std=c++0x?...yes
Checking snippet alias templates...yes
Checking snippet alternative function syntax...yes
Checking snippet explicit final...no
Checking snippet explicit override...no
Checking snippet explicitly defaulted special member functions...yes
Checking snippet explicitly deleted member functions...yes
Checking snippet generalized_constant...no
Checking snippet hash tables...no
Checking snippet initializer lists...no
Checking snippet lambda functions...yes
Checking snippet long long int...yes
Checking snippet new string literals...no
Checking snippet nullptr...yes
Checking snippet object construction constructors calling constructors...no
Checking snippet object construction improvement using base constructor...no
Checking snippet polymorphic wrappers for function objects...no
Checking snippet random numbers...no
Checking snippet range-based for-loop...no
Checking snippet regex...no
Checking snippet right angle brackets...yes
Checking snippet shared_ptr...no
Checking snippet sizeof on member objects...yes
Checking snippet static_assert...yes
Checking snippet strongly-typed enum...yes
Checking snippet templates with variable number of values...yes
Checking snippet tuple...no
Checking snippet type inference auto...yes
Checking snippet type inference decltype...yes
Checking snippet type traits metaprogramming...no
Checking snippet uniform initialization...no
Checking snippet unrestricted unions...no
Checking snippet user-defined literals...no
Checking snippet using syntax instead of typedefs...yes
Checking snippet wrapper reference...no
scons: done reading SConscript files.
scons: Building targets ...
scons: `.' is up to date.
scons: done building targets.
Build succeeded.

HTH,
Drew

Monday, October 03, 2011

Installation of Boost.Python on Mac OS X

With the current MacPorts version of Boost, 1.47.0, I can't follow the Boost.Python installation instructions. I installed the three relevant ports: boost +python27, boost-build, and boost-jam. The installation instructions recommend using the python/quickstart directory, but the include paths in that Jamfile don't exist, and the Jamfile in the port is even missing the "import python" statement necessary to load the python-extension rule. The lesson is that it's OK to give up on a MacPorts installation when you are using an unusual feature.

Instead, install Boost 1.47.0 from source. Given that there is already a MacPorts installation, whose default install directory is /opt/local, put the boost_1_47_0 directory directly into /opt as /opt/boost_1_47_0. Follow the build instructions to make Boost's bjam and b2, so that the whole lot ends up in /opt/bin, /opt/include, and /opt/lib. My build command was:

sudo ./bootstrap.sh --with-bjam=/opt/local/bin/bjam --with-toolset=darwin --with-python=/opt/local/bin/python2.7 --prefix=/opt --without-libraries=mpi,regex

I was trying to use the MacPorts bjam, but Boost built its own anyway, which turns out to be good because it builds the newer b2 version of bjam. Boost.Python defaults to the Mac OS X default Python, so why not specify your favorite version? Then return to the Boost.Python installation instructions. I had to set the paths so that the newer Boost comes first:

export DYLD_LIBRARY_PATH=/opt/lib
export PATH=/opt/bin:$PATH

There are still going to be errors about conflicts with isspace and other functions in localefwd.h. These come from a conflict with newer definitions in pyport.h that are designed to handle UTF-8. I got around them by installing the MacPorts port for gcc45. Then, in the Jamroot of the sample directory, add:

using darwin : 4.5.3 : g++-mp-4.5 ;

The darwin toolset is derived from the gcc toolset, so you can give it pretty much the same options; in this case, it points directly to g++. Once this is done, you can run "sudo bjam" in the quickstart directory to build Boost.Python. That builds the libboost_python library, and from then on you can run bjam without sudo in your own project's working directory.
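If you want a smoke test beyond the quickstart, a minimal Boost.Python module of my own (not part of the quickstart) is only a few lines:

#include <boost/python.hpp>

const char* greet() {
    return "hello from Boost.Python";
}

// Builds a Python extension module named hello_ext with one function.
BOOST_PYTHON_MODULE(hello_ext) {
    boost::python::def("greet", greet);
}

Once bjam builds it, python -c "import hello_ext; print hello_ext.greet()" confirms the toolchain works end to end.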

Still not done. If you are using NumPy arrays in Boost.Python, then you need to include the correct headers, meaning '#include "numpy/arrayobject.h"'. These are installed in a separate place on my Mac, again likely by MacPorts, but can be found with a change to the python-extension rule.

python-extension myproject : file.cpp
    : <include>/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/include ;

How do you find these things? Use the Mac's whole-machine find command on the command-line.

mdfind -name arrayobject.h -onlyin /opt

In the end, the boost-build.jam contains "boost-build /opt/boost_1_47_0/tools/build/v2 ;" and the project Jamroot uses "use-project boost : /opt/boost_1_47_0".

HTH,
Drew

Wednesday, March 09, 2011

skipping incompatible in /usr/lib64

I was building a 32-bit executable on a 64-bit machine, using the -m32 switch, and thought I knew how to deal with this error from the g++ or gcc compiler.

/usr/bin/ld: skipping incompatible /usr/lib64/libelf.so when searching for -lelf
/usr/bin/ld: skipping incompatible /usr/lib64/libelf.a when searching for -lelf
/usr/bin/ld: cannot find -lelf

I checked the link line for any binaries that might be 64-bit by running the file command, as in "file tau_run.o". They all looked 32-bit. I checked my LIBRARY_PATH environment variable, which tells the linker in which directories to find libraries.

The problem turned out to be not that the compiler was looking in the wrong place but that it was looking for a file that did not exist. The dyninstAPI distribution has a file called libelf.so.1, but it does not include a symlink named libelf.so pointing to libelf.so.1, and libelf.so is what the linker wants to find. The compiler was doing a great job of rejecting 64-bit libraries, and its error message was actually rather clear that it could not find the -lelf it was looking for.
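The fix is to create the missing symlink yourself; the directory below stands in for wherever dyninstAPI installed its libraries.

# Adjust the path to the directory that holds libelf.so.1.
cd /path/to/dyninstAPI/lib
ln -s libelf.so.1 libelf.so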

As usual, HTH.
Drew

Tuesday, February 08, 2011

How Does R Build a Package with C Source Code?

According to Writing R Extensions, I put C files into a package subdirectory and supply a configure.ac and Makevars.in, and the command “R CMD INSTALL” will compile the extensions. I’m working in Linux, so it uses configure somehow, but what exactly does it do? Where do the defaults come from and how can I change them? As of R version 2.12.1, the build scripts are in the R tools package, no longer written in Perl.

There are a lot of files to put in an R package, from help files to a general description of the library. We are concerned with these three, as an example, for a library called foo.
  • foo/configure.ac - Input to create a configure script.
  • foo/src/Makevars.in - The configure script will create Makevars from this file.
  • foo/src/foo.c - This is our C source to compile.
There are three commands R makes available to work with source while you develop it.
  • R CMD build foo - This creates a tar file of the directory suitable for installing later.
  • R CMD INSTALL foo - This compiles and installs foo into a directory.
  • R CMD check foo - This does all kinds of detailed checks on the health of the R package, all listed in Writing R Extensions, and it also calls R CMD INSTALL, so it’s a good way to smoke test compilation.
If you call “R CMD check foo”, then it calls the library tools:::.check_packages() which eventually installs the library into a local directory by calling R again:

R CMD INSTALL -l '/home/username/Documents/rlib/foo.Rcheck' --no-html --no-multiarch '/Users/ajd27/Documents/rlib/foo'

As an aside, calling “R CMD blah” sets R environment variables and then invokes a shell script which looks for a script called blah in R’s bin directory. If it doesn’t find one, it just executes whatever you passed it. Try “R CMD ls -la .” or “R CMD env|sort” to see what environment variables R defines.

The install command is implemented in tools:::.install_packages(). The easiest way to see what it does is to look at the R source code, in src/library/tools/R/install.R. It executes these steps on your behalf.
  1. Define R-specific variables. These are listed below for one sample.
  2. Call autoconf to create foo/configure from foo/configure.ac.
  3. Call foo/configure, whose main goal is to make foo/src/Makevars from foo/src/Makevars.in.
  4. Look for makefiles in foo/src and call make in foo/src to create shared libraries.
We can specify arguments to configure on the INSTALL command line, with --configure-args and --configure-vars. For instance, typing

R CMD INSTALL "--configure-args=--enable-lizards --disable-frogs"

will call

./configure --enable-lizards --disable-frogs

Use of quotation marks varies depending on the shell. The only way R modifies the execution of the configure command is to define the variables listed at the end of this post. The Guide to Writing R Extensions, however, recommends that authors of configure scripts use R to set defaults using R’s config command. Try

R CMD config --help

to see a list of variables R remembers from when it was configured and compiled.

When R calls make, it tacks a few files together. The first is the Makevars that configure just customized. The next is a list of variables, mostly from when R itself was configured. The last, shlib.mk, supplies the target to build a shared library.

make -f Makevars -f /opt/local/lib/R/etc/x86_64/Makeconf -f /opt/local/lib/R/share/make/shlib.mk SHLIB='foo.so' OBJECTS='foo.o'

The only variables not defined explicitly in Makeconf are
  • PKG_CFLAGS - Where includes go.
  • PKG_CPPFLAGS - For the C preprocessor, if relevant.
  • PKG_CXXFLAGS - For the C++ compiler.
  • PKG_OBJCFLAGS - Objective C’s CFLAGS.
  • PKG_OBJCXXFLAGS - Objective C++’s CFLAGS.
  • PKG_LIBS - Where we put libraries and the directories that hold them.
These are the only variables we should bother to define within Makevars.in. Everything else, from CXX to CFLAGS, is already defined explicitly within Makeconf, so tough cookies if we want to change it, unless we are willing to write a custom target in our own Makefile in the src directory.

For our package, foo, we want to give the person installing the software a way to customize the include directories and library locations, so we probably want our configure.ac to check for the existence of FOO_CFLAGS and FOO_LIBS and assign those values to PKG_CFLAGS and PKG_LIBS, as sketched below. Using package-specific naming helps when there are multiple packages installed, and using variables at all spares the people installing the package from figuring out how to pass command-line arguments to R.
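A minimal sketch of that pattern follows; FOO_CFLAGS and FOO_LIBS are illustrative names, not variables any real package defines.

# foo/configure.ac
AC_INIT([foo], [1.0])
# Pick up FOO_CFLAGS and FOO_LIBS from the installer's environment
# and substitute them into src/Makevars.
AC_SUBST(FOO_CFLAGS)
AC_SUBST(FOO_LIBS)
AC_CONFIG_FILES([src/Makevars])
AC_OUTPUT

# foo/src/Makevars.in
PKG_CFLAGS = @FOO_CFLAGS@
PKG_LIBS = @FOO_LIBS@

Someone installing the package can then write, for example, R CMD INSTALL --configure-vars='FOO_CFLAGS=-I/opt/foo/include FOO_LIBS=-lfoo' foo.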

Sample Variables Defined Before Configure and Make

AWK=awk
DYLD_LIBRARY_PATH=/opt/local/lib/R/lib/x86_64
EGREP=/usr/bin/grep -E
LN_S=ln -s
MAKE=make
PAGER=/usr/bin/less
PERL=/opt/local/bin/perl
R_ARCH=/x86_64
R_BROWSER=/opt/local/bin/kfmclient
R_BZIPCMD=/opt/local/bin/bzip2
R_DOC_DIR=/opt/local/lib/R/doc
R_GZIPCMD=/opt/local/bin/gzip
R_HOME=/opt/local/lib/R
R_INCLUDE_DIR=/opt/local/lib/R/include
R_LIBRARY_DIR=/Users/username/Documents/rlib/foo.Rcheck
R_LIBS=/Users/ajd27/Documents/rlib/foo.Rcheck
R_LIBS_SITE=
R_LIBS_USER=~/R/x86_64-apple-darwin10.6.0-library/2.12
R_PACKAGE_DIR=/Users/ajd27/Documents/rlib/foo.Rcheck/foo
R_PACKAGE_NAME=foo
R_PAPERSIZE=letter
R_PDFVIEWER=/opt/local/bin/ggv
R_PLATFORM=x86_64-apple-darwin10.6.0
R_PRINTCMD=lpr
R_RD4DVI=ae
R_RD4PDF=times,hyper
R_SESSION_TMPDIR=/var/folders/4Z/4ZJIk-FGFl4hDOSWoQMBmU+++TI/-Tmp-//Rtmp3r18dL
R_SHARE_DIR=/opt/local/lib/R/share
R_TEXI2DVICMD=/opt/local/bin/texi2dvi
R_UNZIPCMD=/usr/bin/unzip
R_ZIPCMD=/usr/bin/zip
SED=/usr/bin/sed
TAR=/usr/bin/gnutar
TR=/usr/bin/tr
WHICH=/usr/bin/which

Saturday, December 04, 2010

Installing Rmpi under Linux on Ranger at TACC

Rmpi is a package for the R language to allow it to use MPI to run in parallel. Installation on Ranger is complicated by Ranger's selection of compilers, MPI libraries, and default installed packages. I got it to work, so here are hints.

R compiles installed packages with the same compiler options that were used to compile R itself. The Ranger copy of R was compiled with gcc, while the MPI libraries on Ranger are compiled with either PGI or Intel compilers, and there can be incompatibilities loading shared libraries built by PGI or Intel into gcc code (especially with the version of PGI on Ranger). An easy way to start from a healthy state is to recompile R with the Intel compilers, so "module swap pgi intel" and configure a home-directory installation of R using mpicc, mpicxx, and mpif90 with the known-good Intel options for Ranger, "-O2 -xW -fPIC".
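A sketch of that R build might look like the following; the prefix and the exact flag placement are illustrative, not a transcript from Ranger.

module swap pgi intel
# Home-directory build of R with the MPI compiler wrappers.
./configure CC=mpicc CXX=mpicxx F77=mpif90 FC=mpif90 \
    CFLAGS="-O2 -xW -fPIC" CXXFLAGS="-O2 -xW -fPIC" \
    FFLAGS="-O2 -xW -fPIC" FCFLAGS="-O2 -xW -fPIC" \
    --prefix=$HOME/R
make && make install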

The Rmpi library depends on the R wrapper for the SPRNG library which, of course, depends on SPRNG itself. It also needs gmp. Ranger's gmp library is just fine, so "module add gmp". If you want to know the includes and libs for any Ranger library, use "module help <module-name>". The SPRNG library has a problem, though, because it wasn't compiled with -fPIC, which is necessary to build shared libraries on this platform.

Rmpi needs SPRNG 2.0, no other version. The Ranger sprng2.0 library is compiled without -fPIC, so we make our own. Download it with "curl http://sprng.cs.fsu.edu/Version2.0/sprng2.0b.tar.gz|tar zxf -". Build the MPI version with the Intel compilers, again using mpicc, mpicxx, and mpif90 with the known-good Intel options, "-O2 -xW -fPIC", but add one more option to the Fortran flags to help Fortran compatibility, "-assume 2underscores". This works around a little bug in the SPRNG autoconf, and it only affects Fortran builds. You set this information in sprng2.0/make.CHOICES and sprng2.0/src/make.INTEL.

SPRNG doesn't build a shared library. No worries. We can repack the static one as a shared library. Go to the directory containing libsprng.a and repack it with a reference to the gmp library.

ar -x libsprng.a
icc -shared -fPIC -Wl,-rpath,$TACC_GMP_LIB -Wl,-zmuldefs *.o -L${TACC_GMP_LIB} -lgmp -o libsprng.so

Were you to forget to add the reference to -lgmp, you would later see that libsprng.so cannot be loaded by R because it cannot find a function __gmp_cmp.

To build rsprng, first download the tar file. Then invoke R's installer with hints about the location of sprng. Mine looked as follows.

~/R/bin/R CMD INSTALL --configure-vars='CFLAGS="-I/opt/apps/gmp/4.2.4/include" LDFLAGS="-L/opt/apps/gmp/4.2.4/lib"' --configure-args='--with-sprng=/share/home/00933/tg459569/sprng' rsprng_1.0.tar.gz

If you get the paths correct, that should build. Now the Rmpi library wants the same attention as Rsprng. First download it. Then let R build it.

~/R/bin/R CMD INSTALL --configure-vars='CFLAGS="-O2 -xW -fPIC" LDFLAGS="-O2 -xW -fPIC"' --configure-args='--with-Rmpi-type=MPICH' Rmpi_0.5-9.tar.gz

When R installs Rmpi, it will fail its first test because mpirun won't work on a Ranger login node. Just go to http://math.acadiau.ca/ACMMaC/Rmpi/index.html and try the tutorial in a batch job.

HTH
Drew

Friday, September 03, 2010

A Standard Way to Use Python's Logging Module

The documentation for Python's logging module shows off its great features, but it doesn't describe standard usage for a small application or a library.

Make a separate logger for each file, and name the loggers to match the hierarchy of modules. The file acert/hfesr/Runner.py, for example, starts with:

import logging

logger = logging.getLogger('acert.hfesr.Runner')

class Runner(object):
    def __init__(self):
        logger.debug('Initializing Runner')

That's quite simple. Then you can enable and disable logging by file or by module.
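For example, a client application might tune modules individually; the module names here follow the example above.

import logging

# Configure a handler once, in the application rather than the library.
logging.basicConfig(level=logging.INFO)
# Turn up one subtree and quiet a single file.
logging.getLogger('acert.hfesr').setLevel(logging.DEBUG)
logging.getLogger('acert.hfesr.Runner').setLevel(logging.WARNING)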

The second tip: if you have written a library, don't call logging.basicConfig() in that library, because it creates logging handlers that are difficult for client applications to quiet later.
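One common alternative, assuming Python 2.7 or newer, is to attach a do-nothing handler at the top of the library's logger hierarchy, so the library stays silent until an application configures logging itself:

import logging

# In the library's top-level __init__.py.
logging.getLogger('acert').addHandler(logging.NullHandler())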

HTH

/dev/null for C++ ostream

I often make C++ classes that write to some stream given to them:

void Bear::Save(std::ostream& out) {
    out << "Fur is " << color << std::endl;
    out << "Age is " << age << std::endl;
}


Editing somebody's code today, I needed an ostream equivalent of /dev/null, some stream into which a class could write without printing anything. This can be done by creating a stream buffer that never prints.

#include <ostream>
#include <streambuf>

// A stream buffer that discards everything written to it.
class nullbuf : public std::basic_streambuf<char>
{
protected:
    // Returning c, rather than EOF, reports success, so the stream
    // never enters a failed state.
    virtual int overflow(int c) { return c; }
};

class nullstream : public std::basic_ostream<char, std::char_traits<char> > {
    nullbuf _streambuf;
public:
    nullstream() :
        std::basic_ostream<char, std::char_traits<char> >(&_streambuf)
    { clear(); }
};

Using it looks like any other stream:

nullstream dev_null;
dev_null << "Nobody will ever see this.";

std::ostream* output = new nullstream();
(*output) << "This will be invisible.";

As usual, hope this helps.