Category Archives: Uncategorized

Fixing a Chronically Corrupt Git Index File

A while back I was working on a project where the website was under version control using git. Whenever I would try to run git status before checking out new changes, I would always get an error:

$ git status
error: bad index file sha1 signature 
fatal: index file corrupt 
fatal: 'git status --porcelain' failed

A quick internet search brought up what I thought was the solution:

$ rm -f .git/index
$ git reset

I say I thought this was the solution. But it wasn’t. I’d try to run git status after this, and the index file would still be corrupt. I struggled with this for a long time, until I finally realized what the problem was. It turned out that there were two folders in the git repo that contained .git directories—in other words, what we had was nested git repositories. This wasn’t intentional, and I had no idea that these .git directories were there. Once they were deleted, everything worked again.

TL;DR

Check to make sure that none of the non-submodule directories under your project have .git directories in them.

The Living Computer

I’m a programmer, but I’m also a nature lover, and I enjoy learning more about all of the sciences, especially biology. Recently, I’ve come to realize how much programming and biology share in common.

The basic building block of life is the cell. Actually, cells don’t have to just be building blocks. Single-celled creatures are just one single cell. And yet they have to confront all of the same basic challenges to life that you and I do.

Are Cells Computers?

Are cells living computers? No. They are so much more than that. But, just like you and I have a brain that has amazing computational power, cells have some aspects that are computer-like as well. Cells don’t have brains of course, or anything analogous to a nervous system. But they do have something else, an aspect that we don’t even understand yet in regard to the brain. They have software. Actually, we can go further than that. Cells have a complete OS.

There are several programming languages involved; one of the most well known is DNA. “But wait, isn’t DNA for storing information?” Thanks for asking! Actually, yes, you are correct, DNA is used by the cell to store huge volumes of information, which includes the blueprint not only for the cell’s structure, but also for its development. In your cells’ nuclei is all of the information needed to construct and maintain your body. How much is this? Over 3.2 billion base-pairs of DNA.

The Biotic Byte

Let’s convert that number to something more familiar. Instead of base-pairs, we could use bytes. Let’s take a minute to talk about bytes, just to show that this is a valid comparison. Bytes are actually groups of smaller units, called bits. Bits are binary; they can only be one of two things, a zero or a one. A byte is a string of exactly 8 bits. There are 2^8 or 256 different possible combinations of 8 bits, and so there are 256 unique bytes.

A strand of DNA is made up of base-pairs. These are in groups of three, called codons. We can think of these codons like bytes, and like bytes they are also made up of smaller units, the base-pairs. Unlike bits, which come in only two types, DNA is made bases that come in 4 different letters, A, C, T, and G. That means that twice as much information can be stored in a single letter as can be represented by a bit. So 4 letters of DNA can store the same amount of information as one byte.

Now that we know how to convert codons to bytes, we can do the math. We have 3.2 billion base-pairs or letters, so to get the number of bytes we just divide by 4: 3.2 billion / 4 ≈ 0.8 billion. So the size of the human genome is approximately 800 million bytes, or 763 megabytes.

Now think of this: Each cell in your body has two copies of the genome (except for red blood cells, which have none). And it’s estimated that there are 37.2 trillion cells in the average adult human body. Even if we assume that 17 trillion of these are red blood cells, that means that your body contains 23 trillion gigabytes of DNA. That could also be written as 22 million petabytes, or 21 zettabytes. To put this in perspective, the world’s total effective two-way telecommunications capacity was “only” 65,000 petabytes per-day in 2007. At that rate, to transmit all of the information encoded on all of the DNA in your body, it would take almost a whole year.

A year. And yet all of that information fits inside of you. Despite the fact that the strands of DNA in a single cell would stretch out to about 2 m (6 ft) long if laid end to end, in the nucleus they packed into a whopping diameter of just 6-10 millionths of a meter. That means all of the DNA in your body could fit into a 22 cm (8.5 in) cube. Let’s compare that size to how much room it would take to store the same amount of information on computers. Let’s imagine we put it all onto 1 terabyte hard drives that measure 3 in by 4 in by 0.5 in. They would make a cube about 424 ft (130 m) on a side. A building of that size would have a volume of 76 million cu ft, which would make it the eighth largest building in the world.

Not Just For Information Storage

DNA is obviously an extremely efficient medium of information storage. We’ve looked at it from the angle of just how much your body contains. But we can also look at it from the other angle. A single copy of the entire human genome takes up only 0.8 gigabytes. Compare that with the raw size of OS X Yosemite, which is 5.18 gigabytes. Windows 8 requires about 6–8 gigabytes. In other words, modern computer operating systems take almost 10 times as much code as it takes to create and run your body.

DNA is like a computer program but far, far more advanced than any software ever created.—Bill Gates, founder of Microsoft, in The Road Ahead

The really amazing thing about DNA—and this is what I started out to say a while back—is that it isn’t just a blueprint. Most of it doesn’t encode genes. Not even close. The protein-coding portion takes up less than 2% of your DNA, or about 15 megabytes. So what does the rest of the DNA do? Lot’s of things, actually. It does so much, in fact, that we aren’t even beginning to understand it all. But we do know enough to know that DNA is far more than a blueprint. Is it a computer program? Sort of. It really goes beyond that, but that’s the closest thing to it we’ve ever created.

Beyond Programming

As a programmer, it is amazing how much DNA is like a programming language. However, it is even more amazing how much DNA goes beyond modern programming.

How can DNA program for so much in such little space? We can’t yet fully answer that question, but we’re starting to find clues. One is that DNA isn’t just one programming language. It is several, all at once. The same DNA strand can code for several different codes, in both directions. I can’t imagine trying to write code that has to do one thing when read forwards and another when read backwards. Most of our languages couldn’t possibly do that, because of their syntax. They are inherently one-way.

Take PHP for example. Its syntax requires the code to be interpreted from left to right. It’s not just that you couldn’t interpret it backwards as PHP, but it would be really difficult even to create a language with inverted PHP syntax. The same goes for JavaScript.

Of course, some languages are simpler (like BASIC), and could potentially work forwards and backwards. These languages are also far less human-readable. They are already hard for us to grok as it is, so how in the world would we ever be able to write meaningful two-way code like that? It might seem like it would be easy to do, if we just wrote the one-way code and used computer algorithms to compress it into two-way code. But that’s far easier said than done.

The Modular Genome

Among programming best practices is that of writing modular code. Instead of creating one huge, garbled, interconnected whole, a project can be split into discrete parts that are interoperable.

While I was contemplating writing this post, I happened to come across an article that revealed that some genomes are like this. Actually, all genomes are modular, in the sense that they are made up of discrete genes. But what has been discovered in this case is something different. The DNA isn’t just modular, it is actually split into discrete packages.

The genome of the unicellular ciliate Stylonychia lemnae is really astounding. These creatures actually maintain two copies of their genome in separate nuclei. In one nucleus, called the micronucleus, all of the DNA is stored in a single chromosome. In the other nucleus the DNA is split into thousands of different chromosomes. More than 16 thousand, to be exact. This type of nucleus is much larger than the other, and is called the macronucleus.

The moment I read this, I thought of packagist.org. Thousands of different discrete modules maintained in a single repository. Actually though, it is much more like the plugin repository on WordPress.org, which isn’t just a listing directory, but actually holds all of the code for the 37,000+ plugins in a single SVN repository.

The fascinating thing is that the macronulceus is about 10 times larger than the micronucleus. In effect, this means that the copy of the genome which is used in genetic transmission is kept under 10x compression. 10x! It is amazing that the genome can be compressed this much, and yet still be usable for genetic recombination.

Compile-time Optimization

Languages like PHP get compiled into machine code. Some compilers have features that modify the compiled code in various ways to try to improve its performance. This is called compile-time optimization. It’s usually not trivial to do this, because the compiler is risking the possibility of introducing a bug instead of an optimization. It can also mean compilation itself is much less performant, because the compiler has to run sophisticated algorithms over the code.

In the genome, we might think of the transcription of DNA to RNA as compilation. It’s been known for some time that the nucleus sometimes makes modifications to the RNA after transcription. That’s kind of like compile-time optimization. But in fact, it is much more than that. Sometimes the changes are very simple, and affect just a single base. It’s been recently discovered that this type of RNA editing may be very common. But it has also been known for some time that much more complex forms of RNA editing occur as well. This is called alternative splicing, and it involves taking a gene and splitting it into its modular components. These are then rearranged from their usual configuration, with some being doubled or removed. Then they might be combined with pieces of a completely different gene.

This goes beyond our conventional compile-time optimizations. It’d be like compiling two different components of a program, breaking them down into smaller pieces, and rearranging them to create something entirely new.

Living Programmer

As a programmer, all of this is fascinating. I can sit here and write computer programs because of the trillions of programs being run inside of my body’s cells. This naturally leads us to a question: where did those programs come from? Who wrote them?

You might answer, “I don’t know.” But a staunch evolutionist will tell you that is the wrong answer. (Unless you catch him off guard.) They will tell you no-one wrote the program. As a programmer, that’s unbelievable. As a programmer, I know that programs don’t just happen, they take intelligence. And just being “smart” isn’t enough: you have to have skill too, you have to know the language. Even with high intelligence and superb skill, how often do we get it right the first time? How often do we have to do lot’s of testing to make sure the thing really works?

Yet evolutionists would have us believe that the unimaginable complexity of the genome happened by accident, that a programming language just created itself, and that, over time, a program was shaped through typos in the code.

Of course, as a programmer, I know that is ludicrous. One typo or mistake can easily kill a program. Even if a typo isn’t syntactically invalid, it can still cause the program to stop working properly. And even if that doesn’t happen, it’s still highly probable that a small bug has been introduced by it—and those small bugs are the real killers. You can argue that natural selection will, in effect, “weed out” those really bad bugs. And that’s true (though the reproduction rate isn’t high enough to sustain that level of mutation for millions of years). But you can’t say that about the small bugs. They’re little changes that don’t really seem to have much effect—most of the time. Instead, they’ll build up in the population until it is driven to the point of extinction.

Just imagine a program you’ve written being eroded this way over time. Before long, it would cease to do anything useful at all.

As a programmer, it is obvious: someone programmed me. And not just anyone either. Someone who has unbelievable intelligence, skill, and artistry. Someone who can build something infinitely more complex than Microsoft Windows, using less code, and even have that thing reproduce itself. Do you know anyone like that? It clearly wasn’t one of us. It clearly wasn’t any other form of biological life either (from here or elsewhere), because all life is based on programs. All life requires a Programmer.

As one living programmer, let me ask you: have you met the Programmer of all life? Have you met the living Programmer?

A Week with HackerOne

About three months ago I signed up with HackerOne, and created a bug bounty program for my WordPoints plugin. I’m writing this post to document my experience with HackerOne, for anyone else who may be thinking of using it.

When you first create your program, it is private. This gives you time to tweak things and gain some familiarity with the system. You also have the opportunity to invite up to 100 of the top hackers to participate and the private pre-launch program. I did invite all 100 (though not all at once), but there wasn’t any activity. That is probably because I hadn’t set a minimum bounty amount yet.

Last Friday I decided to make the program public. This timing roughly corresponded with the release of WordPoints 1.7.0, which included some security fixes that I’d discovered on my own.

What should you expect when you launch publicly? I got 15 bug reports in the first 24 hours, and about a third of them were probably in the first couple hours after launching.

The reason for the immediate spike in activity (of which there had been none previously), is probably due at least in part to my having set a minimum bounty (though this was only $25).

Of those 15 reports, most of them were low quality. The reporter obviously hadn’t read the program description, and didn’t know what kind of bugs I was looking for and what sort of vulnerabilities I would consider invalid. Of those 15, only two were vulnerabilities that actually needed fixing. I’ve received 4 more reports this week, but none of them have been valid source code vulnerabilities either.

So, now you know what you can expect with your first week after launching a bug bounty program on HackerOne. I suspect that if you wanted to avoid the first-minute slew of reports, you could wait until later to set a minimum bounty amount.

All in all, I am very pleased with HackerOne. The UI is great and has the tools you need to respond quickly. I think also that report quality will probably increase as better researchers join in in an unhurried manner. Well, at least if I decide to increase the bounty in the future. :-)

The end of PHP 5.3

PHP 5.3 is dead.

In some ways it’s sad when a version of PHP dies. But in other ways, it’s great. I mean, more time can go into making it a better language instead of backporting security fixes to old versions. Really, the only thing that makes it sad is that so many people will be using these dead and insecure versions for years to come.

The other day I wrote a post about how WordPress supports PHP back to 5.2. In fact, until recently most WordPress sites were running on 5.2. Now, it’s an even split between that and 5.3:

WordPress PHP versions pie chart

Only 21.8% of sites are now running on a living version of PHP.

I think that it is interesting to compare these stats to the number of sites running WordPress versions that receive security updates:

WordPress versions pie chart

39% of sites are running a living version of WordPress.

Why is there such a discrepancy here? Obviously, WordPress is web software, whereas PHP is a programming language, but is there another reason that more WordPress sites are up to date? I doubt that users are more conscientious about updating than web hosts are.

It’s not only that more WordPress sites are up to date, but more of them are running the latest version, 3.9. Whereas PHP 5.5 has a much smaller slice than 5.4.

I think the reason for this difference is probably that WordPress is easier to update than PHP. Not just that there is a difference in technical skill required, but that updating PHP is more likely to break things than updating WordPress. WordPress always aims for complete backward compatibility, but there are sometimes changes in PHP that can definitely break things. It could be that the difference here is the obvious: the less painful it is to update, the more people will do it.

WordPress Documentation Lookup Within TextWrangler

I use TextWrangler as my code editor. This is one reason why.

Sometimes you want to look up something in the WordPress documentation, straight from your code editor. And with WordPress, that might be a function, a filter, or an action. It’s easy to add that feature to TextWrangler, just by adding an AppleScript file to the scripts folder (~/Library/Application Support/TextWrangler/Scripts/). Then you can access it from the scripts menu, which is the one with the little script icon. (To open the scripts folder, click on “Open Scripts Folder” under the scripts menu).

Here’s what you can put in the file to look up a function on the WordPress codex:

[applescript]
tell application "TextWrangler"
set function to selection of window 1 as string
if function = "" then
set function to text returned of (display dialog "WordPress Function:" default answer "")
end if

if function is not "" then
set target_URL to "http://codex.wordpress.org/Function_Reference/" & function
open location target_URL
end if
end tell
[/applescript]

Simple, huh?

This will open a dialog prompt for you to type whatever function you want to look up, or if you have already selected the function’s name on the page, it will use that. Either way, the codex page for that function will open in your favorite browser.

You can easily customize this script to look up anything you want anywhere you want. Just change the target URL and dialog text accordingly.

Here’s a zip of my folder which includes filter and action look-up in addition to the function look-up. You can copy the entire folder to TextWrangler’s scripts folder, and it will create a new menu item named WordPress.