Object-oriented Python Interfaces to NAG Routines

30/10/2013

There have been a number of tutorials about how to call a NAG Library from Python: for example, Mike Croucher’s pages over at Walking Randomly or some of the Python posts on the NAG blog. For this post I’ve dabbled with a complementary example, developed mainly as a proof of concept for how object-oriented wrappers to a NAG Library might look. Potentially this is the kind of interface design that you might want to put on top of Mike Croucher’s ctypes setup. You could then customize it yourself to provide the kind of functionality you need in your own application.

Here’s the code:

#!/usr/bin/env python
"""
An example Python interface to a NAG (Fortran) Library, using ctypes.
"""
class NagFunctionError(Exception):
    "A base NAG exception class."
    def __init__(self, function_name, code, message=None):
        "Here, message - if supplied - should be list."
        self.code = code
        self.function_name = function_name
        self.message = message

    def __str__(self):
        if (self.message is None):
            error_message = []
        else:
            # Take a copy so that repeated calls do not mutate self.message:
            error_message = list(self.message)
        error_message.append('ABNORMAL EXIT, code ' + str(self.code) +
                             ', from NAG function ' + self.function_name + '.')
        return '\n** ' + '\n** '.join(error_message)

class NagLicenceError(NagFunctionError):
    "A NAG exception class for a missing licence."
    def __init__(self, function_name, code):
        super(self.__class__, self).__init__(function_name, code,
                                             ['Function call not licensed.'])

class NagTypeError(NagFunctionError):
    "A NAG exception class for an invalid argument instance."
    def __init__(self, function_name, msg):
        super(self.__class__, self).__init__(function_name, -199, msg)

class NagFunction(object):
    "A root NAG function class."
    from ctypes import c_int, Structure
    # The default error-handling mode for each (Fortran) Library call:
    ifail = c_int(1)

    class Complex(Structure):
        "A class to emulate a complex value using ctypes."
        from ctypes import c_double
        _fields_ = [("re", c_double), ("im", c_double)]

    def check_fortran_ifail(self):
        """
        Takes action based on the Fortran ifail value returned by a NAG Fortran
        Library call.
        """
        if (self.ifail.value == 0):
            return
        elif (self.ifail.value == -399):
            raise NagLicenceError(self.__class__.__name__, self.ifail.value)
        else:
            raise NagFunctionError(self.__class__.__name__, self.ifail.value)

    def check_type(self,
                   argument,
                   argument_name,
                   expected_type):
        "Verifies that argument is an instance of expected_type."
        if (isinstance(argument, expected_type)):
            return
        raise NagTypeError(self.__class__.__name__,
                           ['Invalid type ' + argument.__class__.__name__ +
                            ' for ' + argument_name + '.',
                            'Expected type ' + expected_type.__name__ + '.'])

class NagRootsLambertWComplex(NagFunction):
    "A NAG class for the complex Lambert W function, c05bb."
    def __init__(self, naglib_h):
        super(self.__class__, self).__init__()
        self.fcn_h = naglib_h.c05bbf_
        self.fcn_h.restype = None

    def evaluate(self, branch, offset, z):
        "The wrapper to the NAG Library call."
        from ctypes import byref, c_double, c_int
        self.check_type(branch, 'branch', int)
        self.check_type(offset, 'offset', bool)
        self.check_type(z, 'z', complex)
        branch_f = c_int(branch)
        if (offset):
            offset_f = c_int(1)
        else:
            offset_f = c_int(0)
        z_f = self.Complex(z.real, z.imag)
        w_f = self.Complex()
        resid_f = c_double()
        self.fcn_h(byref(branch_f),
                   byref(offset_f),
                   byref(z_f),
                   byref(w_f),
                   byref(resid_f),
                   byref(self.ifail))
        self.check_fortran_ifail()
        return complex(w_f.re, w_f.im), resid_f.value

def c_example():
    """
    Calls the Lambert W wrapper for a range of input values and prints the
    results.
    """
    def format_complex(z):
        "Formats a complex z for output as an (a, b) pair."
        return ''.join(['(',
                        format(z.real, '15.8e'),
                        ', ',
                        format(z.imag, '15.8e'),
                        ')'])

    from ctypes import cdll
    import sys
    naglib = '/path/to/libnag_nag.so'
    naglib_h = cdll.LoadLibrary(naglib)
    branch = 0
    offset = False
    sys.stdout.write('Lambert W function values for branch = ' + str(branch) +
                     ', offset = ' + str(offset) + ':\n')
    for z in [
            complex(0.5, -1.0),
            complex(1.0, 2.3),
            complex(4.5, -0.1),
            complex(6.0, 6.0)
    ]:
        w, resid = NagRootsLambertWComplex(naglib_h).evaluate(branch, offset, z)
        sys.stdout.write('z = ' + format_complex(z) +
                         ', W(z) = ' + format_complex(w) +
                         ', residual = ' + format(resid, '15.8e') + '\n')

if (__name__ == '__main__'):
    c_example()

The Nag*Error classes, as you can see, are independent of the platform and NAG Library language being used. They serve as suggestions for how you could map the mechanism of the Fortran ifail integer or C NagError structure to something Pythonic.
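
For instance, a caller might trap the failure modes like this (a usage sketch, assuming naglib_h has been loaded as in c_example above; note that NagLicenceError must be caught before its base class):

import sys
try:
    w, resid = NagRootsLambertWComplex(naglib_h).evaluate(0, False,
                                                          complex(0.5, -1.0))
except NagLicenceError as exc:
    sys.stderr.write(str(exc) + '\n')  # Raised when ifail comes back as -399.
except NagFunctionError as exc:
    sys.stderr.write(str(exc) + '\n')  # Any other nonzero ifail, or a NagTypeError.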

The core NagFunction class would presumably contain all the utility methods used when wrapping the NAG calls. Since the NAG routines are statically typed, you may want to provide a unified NAG look-and-feel by checking for correctly-typed input with the check_type method. If not, you could just rely on Python’s own checking in the ctypes constructors.

At this point you could perhaps break the provided Library functionality into its conceptual Chapters by using a class hierarchy to reflect the grouping. I’ve chosen to subclass from NagFunction without any further indirection.
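
If you did want that extra level of indirection, the grouping might look something like this sketch (the NagRoots class here is hypothetical, not part of the listing above):

class NagRoots(NagFunction):
    "A NAG class grouping the Chapter C05 (roots of equations) wrappers."

class NagRootsLambertWComplex(NagRoots):
    "A NAG class for the complex Lambert W function, c05bb."
    # Body exactly as in the listing above.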

The work for wrapping a Library call is done in an evaluate method in a class for the Library routine. For the Lambert W function in this example the wrapping amounts simply to massaging the method’s input to be more C like, calling the Library routine, checking its exit status and then returning the relevant output data as Python objects.

Source code: https://github.com/matcross/blog/tree/master/object-oriented-python-interfaces-to-nag-routines

Customizing svn diff

26/10/2013

A few years ago I wrote about how my discovery of colordiff had improved my svn diff experience. Since then I’ve made a number of other tweaks to better customize my diffing:

#!/bin/sh -u
exclude_file=/path/to/exclude/file
svn diff --no-diff-deleted --diff-cmd diff -x "--ignore-all-space --text --unified=0" "$@" | \
  filterdiff --verbose --exclude-from-file ${exclude_file} | \
  grep -v --file=${exclude_file} | colordiff | less -RS

I never want to see full diffs for deleted files, hence I give the --no-diff-deleted option to svn diff. (I would also use --no-diff-added if it existed!)

I also regularly diff artifacts archived from automatic jobs which build and test NAG Fortran and C Libraries. There are some differences there that, although they need to be archived for possible future reference, I never need to see on a day-to-day basis — diffs for non-repeatable RNGs, for example. I use filterdiff (from patchutils) to remove these from the svn diff output. I have the patterns to exclude listed in a separate file (pointed to by the exclude_file in the script). These patterns are shell wildcards, so look something like *g05kg*e.x for ignoring the results differences from the Fortran (respectively C) example program for the NAG non-repeatable RNG initializer g05kgf (respectively g05kgc).

A block of differences excluded by filterdiff leaves behind the separator Index: filename from svn diff, so in the script above a call to grep filters these out too. Unfortunately that means each exclusion needs to appear twice in the exclude_file: once as a shell wildcard for filterdiff and once as a basic regular expression for grep – so as both *g05kg*e.x and g05kg.e\.x, say. I haven’t yet worked out how to unify this (although one possibility is sketched below). You also end up with orphaned ==== separator lines, but I don’t feel too inconvenienced by this: I just jump between the resulting diffs according to the presence of the ^Index separator for the diffs that haven’t been excluded. These diffs come out colorized via colordiff and less -R as discussed in the older post.
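
One possibility for that unification might be to replace the grep stage with a small Python filter that reuses the filterdiff wildcards. Here’s a sketch (the name strip_index.py is made up); fnmatch.translate does exactly the wildcard-to-regex conversion needed:

#!/usr/bin/env python
"""strip_index.py: drop 'Index: filename' separators whose filename
matches any of the filterdiff shell wildcards in the exclude file."""
import fnmatch
import re
import sys

exclude_file = '/path/to/exclude/file'

with open(exclude_file) as patterns_fo:
    # Compile each shell wildcard into an equivalent regular expression:
    regexes = [re.compile(fnmatch.translate(line.strip()))
               for line in patterns_fo if line.strip()]

for line in sys.stdin:
    if (line.startswith('Index: ')):
        filename = line[len('Index: '):].strip()
        if (any(regex.match(filename) for regex in regexes)):
            continue
    sys.stdout.write(line)

The grep -v --file=${exclude_file} stage in the script would then become python strip_index.py, and each exclusion need only be listed once, in wildcard form.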

Note also that Index lines for deleted files, filtered by svn diff --no-diff-deleted, still appear in the final output. I don’t exclude these at the grep step because I still like to see that a file has been deleted, without needing to see what has been deleted. And although the facility exists in the [helpers] section of ~/.subversion/config to set the script above to override Subversion’s diff implementation, I like to run my diffs by invoking this script explicitly and then I can still use plain svn diff as a sanity check or fallback, especially if filtering is not required or whitespace in the diff is significant.

Source code: https://github.com/matcross/blog/tree/master/customizing-svn-diff

Jenkins comes to NAG

15/07/2013

At NAG we build a lot of Fortran and C code as part of our Libraries and Compiler. We have over 40 years’ experience in software engineering and in numerical analysis and algorithms. However, not being a large organization means that it can be hard to keep up with the vanguard of new SE methodologies and frameworks. There are some approaches, though, that seem a no-brainer to have as part of the workflow in a modern software company. Convergent evolution can lead to their development and adoption without any prompting from the greater community.

For decades we’ve followed and benefited from many of the principles of continuous integration (for example, version control and build automation—but no longer with SCCS or shell scripts!) and we’ve been running a self-hosted and self-maintained manager for automatic builds since the early 2000s. But we began to find the system showing its age and straining at the seams. The build manager uses dumb ssh-ing to launch each build on a slave, so there is no communication channel between a build and the driver. The build indexer has become quite unmaintainable and is, shall we say, rustic: it resembles a web page from 1995.

[Screenshot: NAG Library automatic build implementation reports]

We don’t have the resources to pump any significant improvements into these systems, so a couple of years ago we had a look around for replacements.

Buildbot is nice. We liked the visualizations it comes with out of the box, and some of us know Python so we felt confident we’d be able to set up the software. There’s no way of configuring it through a GUI though, which restricts the pool of potential maintainers for us.

Hudson seemed too Java focused to fit easily into our model.

Then we lost some personnel and so the investigation stalled. It remained 1995 for a few more years.

In 2011 we audited and codified many of our processes; in December 2011 we were awarded ISO 9001 certification. There was a renewed interest in enhancing the reporting that we use for measuring code health. At around this time we also made major changes to our release schedules. We needed the greatest amount of automation, the smoothest development process, the most informative reporting of health that we could manage.

In the time since we looked at Buildbot and Hudson we’d heard good things about Jenkins, a fork of Hudson (but no relation to Leeroy). I tried Jenkins out on some small projects and I saw good things. The GUI makes it a doddle to experiment with a set up. There are lots of nice, relatively-mature and well-maintained plugins for email notifications, tracking violations of user-defined coding standards, reporting on code coverage, Chuck Norris, … So we set up a small group to prototype a new configuration for use by the whole company, and then, all going well, to port over the old system.

The NAG Jenkins is getting off the ground at the moment. Here are some things we’ve implemented and discovered (i.e., been burned by).

  • We want as many of the slave workspaces as possible to be on an NFS disk. In practice this means for all Unix-like slaves. With this configuration we can easily, at a low level, peek at the builds if necessary – using find, or whatever. Initially we went as far as using a single workspace for every such node: something like /users/jenkins/workspace/. This is bad! The Remote FS root for a node must be unique to that node. Otherwise every node that launches will unpack the Jenkins .jar files into the same directory. If there are different Java runtimes accessing these same files then sporadic End of File errors will occur (presumably because of the different runtime states becoming corrupted or confused).
  • As a fun exercise in getting to know more about how Jenkins plugins work and how to develop them, we wrote a warnings parser for the NAG Fortran Compiler and added it to the Jenkins warnings plugin.
  • Some of our projects are pretty monolithic. We were hoping to use the HTML Publisher plugin to publish our build reports for easy access from a job’s page. As far as we could see from its source code, this plugin does recursive directory traversals and accumulates its findings in memory. On some of our older slaves, with the workspace on an NFS disk, this was just taking an intolerable amount of time. As a compromise a post-build Archive the artifacts is good enough for us instead.
  • There were resource problems when trying to report code coverage using gcov, gcovr and Cobertura as described by SEMIPOL. Our gcov-instrumented build generates 20,000 *.gcda files and 23,000 *.gcno files. This is too much data for gcovr to aggregate, which it tries to do by building an XML DOM tree in memory. In the end we just use lcov to scan the gcov files, and then we use the HTML Publisher to publish the report index.
  • Mac slaves need to run headlessly.
  • Controlling Windows slaves as a service requires you to modify the user rights settings (although, as Jenkins tells us on the configure page for such a node, we probably deserve grief if we use this launch method). The Windows user launching the slave needs to be assigned the right to Log on as a service: start secpol and add the user under Local Policies -> User Rights Assignments -> Log on as a service.
  • We have a version-controlled $JENKINS_HOME, which Jenkins itself backs up as a periodic job (although I understand that there’s a plugin which does a similar thing). Inspired by Keeping your configuration and data in Subversion we use a parametrized job that runs

    #!/bin/tcsh
    cd $JENKINS_HOME
    svn add -q --parents *.xml jobs/*/config.xml users/*/config.xml userContent/* >& /dev/null
    svn pd -q svn:mime-type *.xml jobs/*/config.xml users/*/config.xml userContent/*.xml
    echo "warnlog\n*.log\n*.tmp\n*.old\n*.bak\n*.jar\n*.json\n*.lck\n.owner" > myignores
    echo "identity.key\njenkins\njenkins.security*\nlog*\nplugins*\nsecret*\nupdates" >> myignores
    svn ps -q svn:ignore -F myignores .
    rm myignores
    echo "builds\nlast*\nnext*\n*.txt\n*.log\nworkspace*" > myjobignores
    echo "cobertura\njavadoc\nhtmlreports\nncover\ndoclinks" >> myjobignores
    svn ps -q svn:ignore -F myjobignores jobs/*
    rm myjobignores
    svn status | grep '!' | awk '{print $2;}' | xargs -r svn rm
    svn ci --non-interactive --username=jenkins -m "$1"
    if ($status != 0) then
      exit 1
    endif
    svn st
    

    Source code: https://github.com/matcross/blog/tree/master/jenkins-comes-to-nag.

  • The facility for multi-configuration projects seemed attractive for one of our jobs, which parametrizes especially easily over the source-code branch and target architecture. We need to have all builds in the matrix share the same workspace and to build in a shared checkout. By default each build in a matrix project runs in its own workspace, but as with a free-style project this is easy to configure: select Use custom workspace in the Advanced Project Options of the matrix job, which will then uncover a Directory for sub-builds field. Also by default the master job performs the designated source-code step (e.g., Emulate clean checkout), but then so does each child job in the matrix! So all of the jobs in our matrix clash as they try to manipulate the checkout at the same time. There doesn’t appear to be a way to customize this short of doing the work you want explicitly using job commands. So for the time being we’re going to maintain individual and extremely similar jobs for this matrix. With some strong command-line fu we can easily batch modify all the configurations for these jobs when necessary (see the sketch after this list).
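
To give a flavour of that batch modification, here’s a sketch using Python’s standard xml.etree module; the library-* job-name prefix is hypothetical, it assumes $JENKINS_HOME is set in the environment, and Jenkins needs a Reload Configuration from Disk (under Manage Jenkins) afterwards to pick up the edited files:

#!/usr/bin/env python
"""Apply the same tweak to every config.xml for a family of jobs."""
import glob
import os
import xml.etree.ElementTree as ET

jobs_glob = os.path.join(os.environ['JENKINS_HOME'], 'jobs',
                         'library-*', 'config.xml')

for config in glob.glob(jobs_glob):
    tree = ET.parse(config)
    # As an example edit, give every matching job a five-minute quiet period:
    quiet = tree.find('quietPeriod')
    if (quiet is None):
        quiet = ET.SubElement(tree.getroot(), 'quietPeriod')
    quiet.text = '300'
    tree.write(config, xml_declaration=True, encoding='UTF-8')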

Currently we have a roster of 24 Library builds in different configurations, plus a further 20 or so jobs for additional tests of our development infrastructure and for bookkeeping. There are a few next steps we’d like to try out.

  • We want to see how far we can go towards configuring a full implementation of a Library, from a clean checkout all the way through to a final tarball to send for review and QA. The nature of some of the algorithms we are testing makes it difficult to enable entirely automatic verification, but clearly we should be able to use Jenkins to create a workable first approximation.
  • Our canonical sanity-checking Library build is made using the NAG Fortran Compiler. All the current Library builds in Jenkins run nightly, so we need to look at making the checking build more continuous, after each commit (modulo some quiet period).
  • Other than the main job page for our projects we don’t have any clever notifications enabled for tracking or visualizing the health of the jobs. Hopefully we can sort out some nice graphics to show on a TV.

We’re pretty excited about what we can do with Jenkins and what we can get from it. For automatic builds at NAG the future is looking bright; the future is looking blue (well, green).

Python arborealis

02/01/2013

I’ve been implementing a k-ary circularly linked tree (that is, a tree where each node could have a previous, a next, an up, or a down node) in Python.

The page Trees – How to Think Like a Computer Scientist is a nice introduction to some of the background concepts.

To __init__ each node of my TreeNode class you just need to provide the node’s contents and its links

class TreeNode:
    
    def __init__(self,
                 body=None,
                 Prev=None,
                 Next=None,
                 Up=None,
                 Down=None):
        self.body = body
        self.Prev = Prev
        self.Next = Next
        self.Up = Up
        self.Down = Down

I thought hard about getting __repr__ and __str__ right for this class (see Built-in Functions – repr, for example), but __repr__ is a bit tricky if you want to avoid too much recursion. I wrote a small getRepr function for doing this on any class. Members that are instances of the same class are recursed into, to a depth of one.

def getRepr(self,
            depth=0):
    keys = list(self.__dict__.keys())
    keys.sort()

    fields = []

    for key in keys:

        if (isinstance(self.__dict__[key], self.__class__)):

            if (depth < 1):
                value = getRepr(self.__dict__[key], depth=depth+1)
            else:
                value = ''.join(['<',
                                 self.__class__.__name__,
                                 '(...)>'])

        else:
            value = str(self.__dict__[key])

        fields.append(''.join([key,
                               '=',
                               value]))

    return ''.join([self.__class__.__name__,
                    '(',
                    ', '.join(fields),
                    ')'])

Hence my TreeNode.__repr__ is then really simple, and my class’s __str__ just returns the node’s body

    def __repr__(self):
        return getRepr(self)

    def __str__(self):
        return str(self.body)

Traversing a node goes all the way down to the tip of each branch and then backtracks to the point that a sibling exists. By default my traverse method prints each node, indented according to its depth.

    def isLeaf(self):
        return (self.Down is None)

    def traverse(self,
                 process_node=None):
        depth = 0
        node = self

        while (node is not None):

            if (process_node is None):
                import sys
                sys.stdout.write(' '*depth*2 + str(node) + '\n')
            else:
                process_node(node)

            if (node.isLeaf()):
            
                while (node is not None and
                       node != self and
                       node.Next is None):
                    node = node.Up
                    depth = depth - 1

                if (node is None or
                    node == self):
                    break

                node = node.Next
            else:
                node = node.Down
                depth = depth + 1

When it comes to actually populating a tree, my specific application is unusual in that I have a tree already, which is output by an external C program. So I currently have no functions for creating (or deleting) trees – I just read in my external tree from a file and set the links for each node accordingly. As an example (albeit somewhat clunky) of setting up a small tree directly, here’s the tree for the pseudocode 7 = 1 + 2 * 3; end (with the correct operator precedence!)

root = TreeNode()
asgn = TreeNode('=')
root.Down = asgn; asgn.Up = root
lhs = TreeNode(7)
asgn.Down = lhs; lhs.Up = asgn
plus = TreeNode('+')
lhs.Next = plus; plus.Prev = lhs; plus.Up = asgn
one = TreeNode(1)
plus.Down = one; one.Up = plus
times = TreeNode('*')
one.Next = times; times.Prev = one; times.Up = plus
two = TreeNode(2)
times.Down = two; two.Up = times
three = TreeNode(3)
two.Next = three; three.Prev = two; three.Up = times
end = TreeNode('end')
asgn.Next = end; end.Prev = asgn
print(repr(root))
root.traverse()

giving

TreeNode(Down=TreeNode(Down=<TreeNode(...)>, Next=<TreeNode(...)>, Prev=None, Up=<TreeNode(...)>, body==),
  Next=None, Prev=None, Up=None, body=None)
None
  =
    7
    +
      1
      *
        2
        3
  end
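
The process_node hook makes other traversals one-liners. For example, collecting the leaf bodies of the tree just built, in traversal order:

leaves = []
root.traverse(process_node=lambda node:
              (leaves.append(str(node)) if node.isLeaf() else None))
print(leaves)
# Prints: ['7', '1', '2', '3', 'end']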

Source code: https://github.com/matcross/blog/tree/master/python-arborealis

In at the Dep End

30/10/2012

The NAG Fortran Compiler has been updated to Release 5.3.1, which includes a few fixes to the module-dependency analyzer. Fortran modules require compilation before they can be Used so generating the correct dependencies for a (GNU) makefile is pretty vital (and tricky).

Here’s an example project having a non-trivial tree of module dependencies

$ cat main.f90
Program main
  Use module6
  Use module5
  Call msub6
  Call msub5
End Program
$ cat module6.f90
Module module6
Contains
  Subroutine msub6
    Use module7
    Call msub7
  End Subroutine
End Module
$ cat module5.f90
Module module5
Contains
  Subroutine msub5
    Use module4
    Call msub4
  End Subroutine
End Module
...
$ cat module9.f90
Module module9
Contains
  Subroutine msub9
    Print *, "OK9"
  End Subroutine
End Module
$ cat module0.f90
Module module0
Contains
  Subroutine msub0
    Print *, "OK0"
  End Subroutine
End Module

and here’s a dumb Python script to make those files

#!/usr/bin/env python

nfiles = 10

for i in range(nfiles):
    file_fo = open('module' + str(i) + '.f90',
                   'w')
    file_fo.writelines(['Module module' + str(i) + '\n',
                        'Contains\n',
                        '  Subroutine msub' + str(i) + '\n'])

    if (i in [0, nfiles - 1]):
        file_fo.write('    Print *, "OK' + str(i) + '"\n')
    else:

        # Use // so that the division is integral under both Python 2 and 3:
        if (i > nfiles // 2):
            m_no = i + 1
        else:
            m_no = i - 1

        file_fo.writelines(['    Use module' + str(m_no) + '\n',
                            '    Call msub' + str(m_no) + '\n'])

    file_fo.writelines(['  End Subroutine\n',
                        'End Module\n'])
    file_fo.close()

file_fo = open('main.f90',
               'w')
file_fo.writelines(['Program main\n',
                    '  Use module' + str(nfiles // 2 + 1) + '\n',
                    '  Use module' + str(nfiles // 2) + '\n',
                    '  Call msub' + str(nfiles // 2 + 1) + '\n',
                    '  Call msub' + str(nfiles // 2) + '\n',
                    'End Program\n'])
file_fo.close()

To create an accurate view of the project’s dependencies for a makefile you first need a dependency analyzer. With nagfor =depend this is quite easy:

$ nagfor =depend -otype=make *.f90
NAG Fortran Dependency Analyser Release 5.3.1(909)
NI_EQ==
NI_SC=\;
main.o:	main.f90
main.o:	module6.mod
main.o:	module5.mod
module0.o:	module0.f90
module0.mod:	module0.f90
module1.o:	module1.f90
module1.o:	module0.mod
module1.mod:	module1.f90
module2.o:	module2.f90
module2.o:	module1.mod
module2.mod:	module2.f90
module3.o:	module3.f90
module3.o:	module2.mod
module3.mod:	module3.f90
module4.o:	module4.f90
module4.o:	module3.mod
module4.mod:	module4.f90
module5.o:	module5.f90
module5.o:	module4.mod
module5.mod:	module5.f90
module6.o:	module6.f90
module6.o:	module7.mod
module6.mod:	module6.f90
module7.o:	module7.f90
module7.o:	module8.mod
module7.mod:	module7.f90
module8.o:	module8.f90
module8.o:	module9.mod
module8.mod:	module8.f90
module9.o:	module9.f90
module9.mod:	module9.f90

Note that no .mod files are required to pre-exist.

For the project above a simple makefile might look like

$ cat Makefile
all: main.r

main.r: main.exe
        ./$< > $@ 2>&1
        cat $@

SOURCES := $(sort $(wildcard module*.f90))
OBJECTS := $(SOURCES:.f90=.o)

main.exe: main.o $(OBJECTS)
        nagfor $^ -o $@

%.o %.mod: %.f90
        nagfor -c $<

clean:
        rm -f *.r *.exe *.o *.mod

but of course that doesn’t take the module dependencies into account yet, so that trying to make results in something like

$ make
...
Fatal Error: main.f90, line 2: Cannot find module MODULE6
...

There are several great discussions of advanced auto-dependency generation around, including one (primarily for C source, although some of the ideas are transferable to Fortran) by Paul D. Smith that inspired the approach below.

Essentially we use GNU make‘s include statement to build in a ‘pre-pass’ that generates a dependency file for all Fortran source in the project:

DEPS := $(SOURCES:.f90=.P) main.P

%.P: %.f90
        nagfor =depend -otype=make $< -o $@

include Depends

Depends: $(DEPS)
        cat $^ > $@

Thus the file Depends is automatically built first for a clean make and it’s updated and re-included into the makefile if any of the dependent Fortran source changes. It will work with -jN parallel make.

Then we see

$ make
...
 OK9
 OK0

The scheme is also reasonably portable, so that other compilers can be used for building the executable – as long as they output .mod files at all. Of course, you may need to postprocess the output from nagfor =depend for compilers that uppercase the names of modules when they create .mod files. Plus it goes without saying that if you’re using make you’ll probably want to follow the rule of one module per file.
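
Such a postprocessing step could be as simple as the following sketch (the name upper_mods.py is made up, and it assumes the compiler upper-cases the module name but keeps a lower-case .mod extension):

#!/usr/bin/env python
"""Upper-case the basename of each .mod file in nagfor =depend output."""
import re
import sys

def upper_mod(match):
    return match.group(1).upper() + '.mod'

for line in sys.stdin:
    sys.stdout.write(re.sub(r'([A-Za-z]\w*)\.mod', upper_mod, line))

The %.P rule would then pipe through the filter: something like nagfor =depend -otype=make $< | python upper_mods.py > $@.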

Source code: https://github.com/matcross/blog/tree/master/in-at-the-dep-end

How to set up Fedora for work

14/07/2012

I got a new desktop machine at work. Our helpful sysadmin installed Fedora 17 64-bit for me (as a NIS client and all that jazz). This post is a note-to-self about the additional configuration I had to do to finish getting it ready.

I gave myself sudo privileges: as the local admin user who was added during the install,

sudo visudo
# Added me ALL=(ALL) ALL to the who-what-which section

(Reminder: <ESC>:wq saves and quits in vi! I always forget that syntax…)

Alternatively I guess I could have added myself to the wheel group…

I often need to build 32-bit code, and from this 64-bit environment with gcc -m32 I saw

/usr/include/gnu/stubs.h:7:27: fatal error: gnu/stubs-32.h: No such file or directory

To resolve that I needed to install glibc-devel.i686.

I ran into other missing 32-bit components too, namely

/usr/bin/ld: skipping incompatible /usr/lib/gcc/x86_64-redhat-linux/4.7.0/libgcc_s.so when searching for -lgcc_s

which, it turns out, is a somewhat-unhelpful way of saying that I don’t have the 32-bit libgcc package installed. yum provides was helpful here:

sudo yum provides "libgcc_s*"
...
sudo yum install libgcc.i686

Also, when doing a link using -static -lm -lutil, I got other terse messages

#/usr/bin/ld: cannot find -lm
#/usr/bin/ld: cannot find -lutil
#/usr/bin/ld: cannot find -lc

because I didn’t have glibc-static installed.

Other extra packages and setup I needed were:

  • colordiff: I like to have colorised svn diffs;
  • Jenkins. The documented installation instructions are good

    sudo wget -O /etc/yum.repos.d/jenkins.repo http://pkg.jenkins-ci.org/redhat/jenkins.repo
    sudo rpm --import http://pkg.jenkins-ci.org/redhat/jenkins-ci.org.key
    sudo yum install jenkins

    That gave me a sandbox installation to mess with;
  • sudo gnome-control-center to set VPN routing (using the IPv4 tab) to access the new machine when I’m outside the work firewall. The resulting configuration is written to /etc/sysconfig/network-scripts/route-Wired_connection_1;
  • edited /etc/exports to export my home drive on the machine to the work network (/mydir *.workdomain.co.uk(rw,insecure,async)) and to the VPN IP. Installed nfs-utils and enabled (at boot) and started the mount daemon to complete the export

    sudo yum install nfs-utils
    sudo systemctl enable nfs-mountd.service
    sudo systemctl start nfs-mountd.service
  • additional Gnome configuration:

    sudo yum install gnome-tweak-tool gnome-font-viewer gnome-shell-extension-alternative-status-menu gnome-shell-extension-remove-accessibility-icon gnome-shell-extension-remove-bluetooth-icon gnome-shell-extension-dock gnome-shell-extension-alternate-tab
  • enabled RPM Fusion free and nonfree repositories using RPM Fusion Configuration and enabled MP3 support in Rhythmbox by installing gstreamer-plugins-ugly;
  • installed dependencies for Spark IM Client

    sudo yum install libX11.i686 libXext.i686 libXi.i686 libXp.i686 libXtst.i686 alsa-lib.i686 unixODBC.i686 compat-libstdc++-33.i686

    (although the Spark 2.6.3 RPM has a broken dependency on libodbc and libodbcinst);
  • installed python3, rdesktop, libusb-devel, lcov, octave, mpfr-devel, libmpc-devel.

Fedora-ing on a Rainy Afternoon

01/05/2012

The weather last Sunday afternoon was purgatorially dreary. I decided I’d try installing the Fedora 17 beta on my feeble and unmiraculous Dell OptiPlex GX260 at home in advance of upgrading other more important machines when the full release is available, so I downloaded the full i386 install DVD .iso. (Actually, being accustomed to having a poor broadband connection I had set this to download the night before.)

I don’t have any blank DVDs to hand at home, but I do have a nice 16GB SanDisk Cruzer Blade USB stick that I wanted to try installing from. (That’s bound to be kinder on the environment too, right?!?)

There is a good page on the Fedora wiki covering how to create and use Live USB that seemed relevant to this exercise. However, I made the initial mistake of only reading half the page and blindly followed the instructions to make a Live USB image using the LiveUSB Creator.

After RTFW (Reading The Fedora Wiki) a little more, I saw that these weren’t the instruction droids I was looking for; I wanted the section how to make a bootable USB drive to install Fedora instead of using a physical DVD, further down the page. I followed this advice without a hitch.

Nearly done? Well, I’d forgotten that I couldn’t boot the OptiPlex from USB (and began dimly remembering this rigmarole from Fedora-16 time). It was still dank outside, so I had a look around the web and found some great instructions from Dell which made me realise that if I’d bothered to update my BIOS since 2005 I would have support for booting from USB. Oops.

With the BIOS upgraded and the USB drive inserted I was given the option in the boot menu to boot from USB, and it worked, and the installation (well, upgrade really) seemed to go fine, and here I am writing up these notes using Fedora 17 Beta. Ta-da!

[Screenshot: Fedora 17 wallpaper]

I’m pretty certain I used to be able to take screenshots differently with Fedora 16 though, and there are some odd messages when I try to shut down, … Let’s report some bugs!

Installing Oracle Solaris Studio 12.3 (on SPARC)

03/04/2012

I only just noticed that Oracle Solaris Studio 12.3 was released last December. I’ve installed the past few releases on Solaris pretty easily (each Installation Guide is helpful), but I thought it would be good to keep a more permanent record of the process here.

First: what’s my machine and OS?

$ cat /etc/release

shows Solaris 10 on SPARC for me.

As root I downloaded the SVR4 *-pkg.tar.bz2 for the correct system from Oracle Solaris Studio Downloads and, naturally, bunzip2ed and untarred it. I did this in /tmp/.

Then I ran the included install_patches.sh script. This gave a warning

For patch 147436-01, required patch 118833-36 does not exist.

The machine I’m installing on is probably about 7 years old, and I don’t think many system patches have been applied to it in the past. According to We Sun Solve!, patch 118833-36 is a massive system patch from 2007, while patch 147436-01 is a smaller fix to the linker from 2011. Hopefully we can live without it! Indeed, there seems to be some self-contradictory information from Oracle in the Installing the Required Oracle Solaris OS Patches section of the Studio 12.3 Installation Guide, which says that

…patch 147436-01…is required only on systems running Oracle Solaris 8/11

even though earlier on the page it doesn’t seem to be talking about this particular patch. Hmm. Otherwise, the patches installed fine.

Anyway, we’re ready to run the Studio installer now. For me, /tmp/ is too small, so I see

The /tmp temporary directory does not have enough free disk space for the installer.

plus Java is installed in a non-standard location, so I also see

Java installation was not found on this computer

All this means I have to invoke the installation script as

$ ./solarisstudio.sh --non-interactive --verbose --tempdir /export/home/OSS12.3 --javahome /opt/jre1.6.0_23

which runs successfully.

Dear disk – don’t die

30/03/2012

My trusty crusty home desktop (a Dell OptiPlex GX260) is nearly ten years old, and it’s been a bit flaky on recent occasions when booting up Fedora 16. One time this week I let it have about six attempts before I gave up; it would get stuck – after quite a while – and say something like

udevadm settle - timeout of 120 seconds reached, the event queue contains:
/sys/devices/pci0001:00/0001:00:02.0/0001:01:0b.1/usb3/3-2 (623)
/sys/devices/pci0001:00/0001:00:02.0/0001:01:0b.1/usb3/3-2/3-2:1.0 (624)
/sys/devices/pci0001:00/0001:00:02.0/0001:01:0b.1/usb3/3-2/3-2:1.0/0003:05AC:1000.0001 (625)
/sys/devices/pci0001:00/0001:00:02.0/0001:01:0b.1/usb3/3-2/3-2:1.0/0003:05AC:1000.0001/hidraw/hidraw0 (626)

(Unfortunately I don’t have the exact details to hand. I’m writing this from memory, but the OP’s question at udev reaching timeout is pretty much what was happening to me.)

The Disk Utility in the Accessories area

[Screenshot: Accessories, Disk Utility]

told me in the SMART Data section for the drive that there are bad sectors:

[Screenshot: SMART Data, bad sectors]

For any repair work to be able to happen the drive must not be mounted, so I booted from an old Fedora 15 Beta LiveCD I had lying around. I ran

$ sudo lvdisplay

to find my root partition (since it’s on a logical volume), and then ran

$ sudo e2fsck -c my_root_partition

The -c option makes e2fsck run badblocks(8) over the device and add any bad blocks it finds to the filesystem’s bad-block list, so they won’t be used again. After several days of usage following this, booting seems a lot more stable. Hopefully there are another few years left in the old girl.

libc ya later, alligator

28/03/2012

Yesterday, like a n00by n00b, I deleted my Fedora’s /lib64/libc.so.6 symbolic link. It was intentional – not an act of deliberate vandalism – though in retrospect I don’t think it would actually have helped me achieve what I was trying to do. The details of what I was attempting aren’t important. What shocked me is how destructive this simple mistake was: remember – I’ve only deleted the symlink, not the actual target library.

Try it yourself!

$ sudo rm /lib64/libc.so.6
$ ls
ls: error while loading shared libraries: libc.so.6: cannot open shared object file: No such file or directory

It’s all gone wrong. You can get some functionality back by setting LD_PRELOAD, e.g.

$ setenv LD_PRELOAD /lib64/libc-2.13.90.so
$ ls
(Stuff)

Yay. Ish: we really want to restore that little symlink though.

$ su
su: error while loading shared libraries: libc.so.6: cannot open shared object file: No such file or directory

I guess that LD_PRELOAD trick isn’t supposed to work here with setuid programs like su, otherwise we could load in any old junk library to bypass security, etc.

Is there really no way this can be fixed while the system is still live?

The superuser question How to restore /lib/libc.so.6? helped with the above, but I was hoping to avoid having to continue with the advice there, which is to restore using a LiveCD. The problem has such a tantalizingly simple cause that I was rooting for a simple solution. (With hindsight, glibc ships a statically linked /sbin/sln – ‘static ln’ – for exactly this situation, if your installation includes it.)

In the end I dug out an old installation CD, but not without first seeing if the system would reboot. Ha!

Kernel panic - not syncing: Attempt to kill init!

and so on and so forth. But in the end sorting the problem out with the installation CD was simple. The rescue option mounted my installation under /mnt/sysimage/, so I just

$ cd /mnt/sysimage/lib64
$ ln -s libc-2.13.90.so libc.so.6

and with a reboot again, everything is back to normal.

