Autorandr: automatically adjust screen layout

Like many laptop users, I often plug my laptop into different monitor setups (multiple monitors at my desk, a projector when presenting, etc.). Running xrandr commands or clicking through interfaces gets tedious, and writing scripts isn't much better.

Recently, I ran across autorandr, which detects attached monitors using EDID (and other settings), saves xrandr configurations, and restores them. It can also run arbitrary scripts when a particular configuration is loaded. I've packaged it, and it is currently waiting in NEW. If you can't wait, the deb is here and the git repo is here.

To use it, simply install the package, and create your initial configuration (in my case, undocked):

 autorandr --save undocked

then, dock your laptop (or plug in your external monitor(s)), change the configuration using xrandr (or whatever you use), and save your new configuration (in my case, workstation):

 autorandr --save workstation

repeat for any additional configurations you have (or as you find new configurations).

Autorandr has udev, systemd, and pm-utils hooks, and autorandr --change should be run any time new displays appear. You can also run autorandr --change or autorandr --load workstation manually if you need to. Additionally, you can add your own ~/.config/autorandr/$PROFILE/postswitch script to be run after a configuration is loaded. Since I run i3, my workstation postswitch looks like this:

 #!/bin/bash

 xrandr --dpi 92
 xrandr --output DP2-2 --primary
 i3-msg '[workspace="^(1|4|6)"] move workspace to output DP2-2;'
 i3-msg '[workspace="^(2|5|9)"] move workspace to output DP2-3;'
 i3-msg '[workspace="^(3|8)"] move workspace to output DP2-1;'

which sets the DPI appropriately, sets the primary screen (possibly not needed?), and moves the i3 workspaces about. You can also arrange for a configuration to never be run by adding a block hook in its profile directory.
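
For example, you could create a minimal block hook like this (the profile name projector is hypothetical; see autorandr's documentation for the hook's exact semantics):

 # prevent autorandr from ever automatically loading this profile
 mkdir -p ~/.config/autorandr/projector
 printf '#!/bin/sh\nexit 0\n' > ~/.config/autorandr/projector/block
 chmod +x ~/.config/autorandr/projector/block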

Check it out if you change your monitor configuration regularly!

Fixing a reading light

One of our cats (Haru) chews small wires, and recently chewed through the USB Type A to barrel connector cord for my LED reading light. No biggie, I thought, I'll just buy a replacement for it, and move on. But wait, an entirely new reading light is just as cheap! I'll buy that so I'll have two, and I won't have to worry about buying the wrong connector.

When the new reading light arrived, I found that it now used a micro USB connector instead of the barrel and, more importantly, wouldn't run off of the battery. After some disassembly, the reason became pretty obvious: the battery was slightly bulgy, had almost no resistance, and read zero voltage. All signs of a battery which shorted out at some point very early in its lifetime.

Luckily, I had a working battery from the older light... so why not swap? Some dodgy soldering work later, and voilà! One working light with more universal connectors and some extra parts. My rudimentary soldering and electronic troubleshooting skills keep coming in handy.

Core Transcriptome of Mammalian Placentas

Our paper describing the components of the placental transcriptome that are conserved among all placental mammals just came out today in Placenta. More important than the results and the text of the paper, though, is the fact that all of the code and results behind it, from the very first work I did two years ago to its publication today, are present in git and (in theory) reproducible.

You can see where our paper was rejected from Genome Biology and Genes & Development and radically refocused before submission to Placenta. But more importantly, you can know where every single result mentioned in the paper came from, the precise code to generate it, and how we arrived at the final paper which was published. [And you've also got all of the hooks to branch off from our analysis and do your own analysis based on our data!]

This is what open, reproducible science should look like.

Shrinking lists of gene names in R

I've been trying to finish a paper where I compare gene expression in the placentas of 14 different species. One of the supplemental figures compares median expression in gene trees across all 14 species, but tree ids like ENSGT00840000129673 aren't very expressive, and names like "COL11A2, COL5A3, COL4A1, COL1A1, COL2A1, COL1A2, COL4A6, COL4A5, COL7A1, COL27A1, COL11A1, COL4A4, COL4A3, COL3A1, COL4A2, COL5A2, COL5A1, COL24A1" take up too much space, so I wanted a function which could collapse the gene names into a more succinct form using bash glob syntax, like: COL{11A{1,2},1A{1,2},24A1,27A1,2A1,3A1,4A{1,2,3,4,5,6},5A{1,2,3},7A1}.

Thus, a crazy function which uses lcprefix from Biostrings and some looping was born:

## collapse a vector of gene names into a bash-glob-style string by
## factoring out common prefixes (found with Biostrings::lcprefix)
collapse.gene.names <- function(x,min.collapse=2) {
    if (is.null(x) || length(x)==0) {
        return(as.character(NA))
    }
    x <- sort(unique(x))
    str_collapse <- function(y,len) {
        if (len == 1 || length(y) < 2) {
            return(y)
        }
        y.tree <-
            gsub(paste0("^(.{",len,"}).*$"),"\\1",y[1])
        y.rem <-
            gsub(paste0("^.{",len,"}"),"",y)
        y.rem.prefix <-
            sum(combn(y.rem,2,function(x){Biostrings::lcprefix(x[1],x[2])}) >= 2)
        if (length(y.rem) > 3 &&
            y.rem.prefix >= 2
            ) {
            y.rem <- 
                collapse.gene.names(y.rem,min.collapse=1)
        }
        paste0(y.tree,
               "{",paste(collapse=",",
                         y.rem),"}")
    }
    i <- 1
    ret <- NULL
    while (i <= length(x)) {
        ## length of the longest common prefix between x[i] and each name
        col.pmin <-
            sapply(x,Biostrings::lcprefix,x[i])
        collapseable <-
            which(col.pmin > min.collapse)
        if (length(collapseable) == 0) {
            ret <- c(ret,x[i])
            i <- i+1
        } else {
            ret <- c(ret,
                     str_collapse(x[collapseable],
                                  min(col.pmin[collapseable]))
                     )
            i <- max(collapseable)+1
        }
    }
    return(paste0(collapse=",",ret))
}
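
For example, applied to the collagen gene list from above (with Biostrings installed), it should produce the compact form shown earlier:

genes <- c("COL11A2","COL5A3","COL4A1","COL1A1","COL2A1","COL1A2",
           "COL4A6","COL4A5","COL7A1","COL27A1","COL11A1","COL4A4",
           "COL4A3","COL3A1","COL4A2","COL5A2","COL5A1","COL24A1")
collapse.gene.names(genes)
## [1] "COL{11A{1,2},1A{1,2},24A1,27A1,2A1,3A1,4A{1,2,3,4,5,6},5A{1,2,3},7A1}"
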
H3ABioNet Hackathon (Workflows)

I'm in Pretoria, South Africa at the H3ABioNet hackathon, which is developing workflows for Illumina chip genotyping, imputation, 16S rRNA sequencing, and population structure/association testing. Currently, I'm working with the imputation stream, and we're using Nextflow to deploy an IMPUTE-based imputation workflow with Docker and NCSA's OpenStack-based cloud (Nebula) underneath.

The OpenStack command line clients (nova and cinder) seem to be pretty usable for automating bringing up a fleet of VMs, and the cloud-init package present in the images makes configuring them pretty simple.
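
As a sketch, bringing up a small fleet might look like this (the flavor, image, and file names are all hypothetical):

 # boot five workers; cloud-init reads the user-data file on first boot
 for i in $(seq 1 5); do
   nova boot --flavor m1.large --image ubuntu-cloud \
     --user-data cloud-init.yaml "impute-worker-$i"
 done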

Now if I just knew of a better shared object store supported by Nextflow on OpenStack besides mounting an NFS share, I'd be set.

You can follow our progress in our git repo: https://github.com/h3abionet/chipimputation

Bioinformatic Supercomputer Wishlist

Many bioinformatic problems require large amounts of memory and processor time to complete. For example, running WGCNA across 10⁶ CpG sites requires 10⁶ choose 2 (roughly 5×10¹¹) pairwise comparisons, and the resulting matrix takes about 4 TB to store at 8 bytes per entry. While embarrassingly parallel, the dataset upon which the regressions are calculated is very large, and cannot fit into the main memory of most existing supercomputers, which are often tuned for small-data, fast-interconnect problems.

Another problem which I am interested in is computing ancestral trees from whole human genomes. This involves running maximum likelihood calculations across 10⁹ bases and thousands of samples. The matrix itself could potentially take 1 TB, and calculating the likelihood across that many positions is computationally expensive. Furthermore, an exhaustive search of trees for 2000 individuals requires 2000!! comparisons, or 10²⁸⁶⁸; even searching a small fraction of that space requires lots of computational time.
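
(For the curious: 2000!! = 2000·1998·⋯·2 = 2¹⁰⁰⁰·1000!, so log₁₀(2000!!) = 1000·log₁₀2 + log₁₀(1000!) ≈ 301 + 2568 ≈ 2868.6, which is where the 10²⁸⁶⁸ figure comes from.)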

Some things that a future supercomputer could have that would enable better solutions to bioinformatic problems include:

  1. Fast local storage.
  2. Better hierarchical storage with smarter caching. Data should ideally move easily between local memory, shared memory, local storage, and remote storage.
  3. Fault-tolerant, storage-affinity-aware schedulers.
  4. GPUs and/or other coprocessors with larger memory and faster memory interconnects.
  5. Larger memory (at least on some nodes).
  6. Support for Docker (or similar) images.
  7. Better bioinformatics software which can actually take advantage of advances in computer architecture.

Essential Data Science: Git

Having a new student join me to work in the lab reminded me that I should collect some of the many resources for getting started in bioinformatics, and in any data-based science in general. Towards this end, one of the first essential tools for any data scientist is a knowledge of git.

Start first with Code School's simple introduction to git, which gives you the basics of using git from the command line.

Then, check out the set of lectures on Git and GitHub from a Data Science course, which goes into setting up git and using it with GitHub.

Finally, I'd check out the set of resources on GitHub for even more information, and then learn to love the git manpages.
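
The daily core of git is small; as a minimal sketch (file names hypothetical):

 git init project && cd project   # create a repository
 git add analysis.R               # stage a file
 git commit -m "Initial analysis" # record a snapshot in history
 git log --oneline                # review what has been done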

Introducing dqsub

I've been using qsub for a while now on the cluster here at the IGB at UofI. qsub is a command line program which is used to submit jobs to a scheduler to eventually be run on one (or more) nodes of a cluster.

Unfortunately, qsub's interface is horrible. It requires that you write a shell script for every single little thing you run, and doesn't do simple things like providing defaults or running multiple jobs at once with slightly different arguments. I've dealt with this for a while using some rudimentary shell scripting, but I finally had enough.

So instead, I wrote a wrapper around qsub called dqsub.

What used to require a complicated invocation like:

echo -e '#!/bin/bash\nmake foo'| \
 qsub -q default -S /bin/bash -d $(pwd) \
  -l mem=8G,nodes=1:ppn=4 -;

can now be run with:

dqsub --mem 8G --ppn 4 make foo;

Want to run some command in every single directory which starts with SRX? That's easy:

ls -1 SRX*|dqsub --mem 8G --ppn 4 --array chdir make bar;
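
For comparison, here is roughly the manual loop that one line replaces (illustrative only; dqsub itself presumably submits these through qsub's array support):

 # one job per SRX* directory, each running make bar inside that directory
 for dir in SRX*; do
   echo -e '#!/bin/bash\ncd '"$dir"' && make bar' | \
    qsub -q default -S /bin/bash -d "$(pwd)" \
     -l mem=8G,nodes=1:ppn=4 -;
 done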

Want to do the same thing, but have it behave like xargs, appending each directory to the command?

ls -1 SRX*|dqsub --mem 8G --ppn 4 --array xargs make bar -C;

Now, this wrapper isn't complete yet, but it's already more than enough to do what I require, and has saved me quite a bit of time already.

You can steal dqsub for yourself.

Feel free to request specific features, too.

Adding a Table of Contents to PDFs from R

I routinely generate very large PDFs from R which have hundreds (or thousands) of pages, and navigating these pages can be very difficult. Unfortunately, neither R's pdf() nor its cairo_pdf() driver supports creating a table of contents (or index) while plots are being written out. In the case of cairo, the underlying library doesn't support it either, so this isn't something that can easily be added to R directly. For months I had been thinking about sitting down and writing the support into cairo and R's Cairo package... but real life kept getting in the way.

Fast forward to a week ago, when I realized that pdftk does support dumping and updating a PDF's metadata, including its bookmarks, using dump_data_utf8 and update_info_utf8! Armed with that knowledge, and a bit of hackery, we can save an index while plotting, and then update the pdf once it's been closed.
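
At the shell, those two pdftk operations look like this (file names hypothetical):

 # dump existing metadata (including bookmarks) to a text file
 pdftk input.pdf dump_data_utf8 output metadata.txt
 # write edited metadata back, producing a new PDF
 pdftk input.pdf update_info_utf8 metadata.txt output output.pdf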

The R code then looks like the following:

 ..device.set.up <- FALSE
 ..current.page <- 0

 save.bookmark <- function(text,bookmarks=list(),level=1,page=NULL) {
     if (!..device.set.up) {
         Cairo.onSave(device = dev.cur(),
                      onSave=function(device,page){
                          ..current.page <<- page
                      })
         ..device.set.up <<- TRUE
     }
     if (missing(page) || is.null(page)) {
         page <- ..current.page+1
     }
     bookmarks[[length(bookmarks)+1]] <-
         list(text=text,
              level=level,
              page=page)
     return(bookmarks)
 }

 write.bookmarks <- function(pdf.file,bookmarks=list()) {
     pdf.bookmarks <- ""
     for (bookmark in seq_along(bookmarks)) {
         pdf.bookmarks <-
             paste0(pdf.bookmarks,
                    "BookmarkBegin\n",
                    "BookmarkTitle: ",bookmarks[[bookmark]]$text,"\n",
                    "BookmarkLevel: ",bookmarks[[bookmark]]$level,"\n",
                    "BookmarkPageNumber: ",bookmarks[[bookmark]]$page,"\n")
     }
     temp.pdf <- tempfile(pattern=basename(pdf.file))
     temp.pdf.info <- tempfile(pattern=paste0(basename(pdf.file),"info_utf8"))
     cat(file=temp.pdf.info,pdf.bookmarks)
     system2("pdftk",c(pdf.file,'update_info_utf8',temp.pdf.info,'output',temp.pdf))
     if (file.exists(temp.pdf)) {
         file.rename(temp.pdf,pdf.file)
     } else {
         warning("unable to properly create bookmarks")
     }
 }

and can be used like so:

 library(Cairo)               # save.bookmark relies on Cairo.onSave
 CairoPDF(file="testing.pdf")
 bookmarks <- list()
 bookmarks <- save.bookmark("First plot",bookmarks)
 plot(1:5,6:10)
 bookmarks <- save.bookmark("Second plot",bookmarks)
 plot(6:10,1:5)
 dev.off()
 write.bookmarks("testing.pdf",bookmarks)

et voilà! Bookmarks and a table of contents for PDFs.

This basic methodology can be extended to any language which writes PDFs and does not have a built-in method for generating a Table of Contents. Currently, the usage of Cairo.onSave is a horrible hack, and may conflict with anything else which uses the onSave hook, but hopefully R will report the current page number from Cairo in the future.

Adding a newcomer (⎈) tag to the BTS

Some of you may already be aware of the gift tag, which has been used for a while to indicate bugs which are suitable entry points for new contributors to start working on specific packages. Unfortunately, some of us (including me!) were unaware that this tag even existed.

Luckily, Lucas Nussbaum clued me in to the existence of this tag, and after a brief bike-shed naming thread and some voting using pocket_devotee, we decided to name the new tag newcomer. I have now added this tag to the BTS documentation, and tagged all of the bugs which were usertagged gift with it.

If you have bugs in your package which you think are ideal for new contributors to Debian (or to your package) to fix, please tag them newcomer. If you're getting started in Debian and working on bugs to fix, please search for the newcomer tag, grab the helm (⎈), and contribute to Debian.
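
Tagging happens through the usual BTS control interface; for example (bug number hypothetical), send a mail to control@bugs.debian.org containing:

 tags 123456 + newcomer
 thanks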
