The Living Computer

I’m a programmer, but I’m also a nature lover, and I enjoy learning more about all of the sciences, especially biology. Recently, I’ve come to realize how much programming and biology have in common.

The basic building block of life is the cell. Actually, cells don’t have to just be building blocks. Single-celled creatures are just one single cell. And yet they have to confront all of the same basic challenges to life that you and I do.

Are Cells Computers?

Are cells living computers? No. They are so much more than that. But, just like you and I have a brain that has amazing computational power, cells have some aspects that are computer-like as well. Cells don’t have brains of course, or anything analogous to a nervous system. But they do have something else, an aspect that we don’t even understand yet in regard to the brain. They have software. Actually, we can go further than that. Cells have a complete OS.

There are several programming languages involved; one of the most well known is DNA. “But wait, isn’t DNA for storing information?” Thanks for asking! Actually, yes, you are correct, DNA is used by the cell to store huge volumes of information, which includes the blueprint not only for the cell’s structure, but also for its development. In your cells’ nuclei is all of the information needed to construct and maintain your body. How much is this? Over 3.2 billion base-pairs of DNA.

The Biotic Byte

Let’s convert that number to something more familiar. Instead of base-pairs, we could use bytes. Let’s take a minute to talk about bytes, just to show that this is a valid comparison. Bytes are actually groups of smaller units, called bits. Bits are binary; they can only be one of two things, a zero or a one. A byte is a string of exactly 8 bits. There are 2^8 or 256 different possible combinations of 8 bits, and so there are 256 unique bytes.

A strand of DNA is made up of base-pairs. These are read in groups of three, called codons. We can think of these codons like bytes, and like bytes they are also made up of smaller units, the base-pairs. Unlike bits, which come in only two types, DNA is made of bases that come in 4 different letters: A, C, T, and G. That means that twice as much information can be stored in a single letter as can be represented by a bit: each letter is worth 2 bits. So 4 letters of DNA can store the same amount of information as one byte.
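Here is a rough sketch of that idea in PHP: packing four DNA letters into a single byte at 2 bits per letter. The function name pack_bases() and the particular mapping (A=00, C=01, G=10, T=11) are my own arbitrary choices for illustration; nothing in biology dictates them.

```php
<?php
// Pack a DNA string into bytes at 2 bits per base.
// The mapping A=0, C=1, G=2, T=3 is an arbitrary choice for this sketch.
function pack_bases( $dna ) {
	$codes = array( 'A' => 0, 'C' => 1, 'G' => 2, 'T' => 3 );
	$bytes = '';

	foreach ( str_split( $dna, 4 ) as $chunk ) {
		// Pad a short final chunk with 'A' so every chunk fills one byte.
		$chunk = str_pad( $chunk, 4, 'A' );
		$byte  = 0;

		for ( $i = 0; $i < 4; $i++ ) {
			// Shift the previous letters left and append this letter's 2 bits.
			$byte = ( $byte << 2 ) | $codes[ $chunk[ $i ] ];
		}

		$bytes .= chr( $byte );
	}

	return $bytes;
}

$packed = pack_bases( 'GATTACAT' );
echo strlen( $packed ), "\n";  // 2 (eight letters fit in two bytes)
echo bin2hex( $packed ), "\n"; // 8f13
```

Eight letters go in and two bytes come out, which is the 4-to-1 ratio described above.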

Now that we know how to convert base-pairs to bytes, we can do the math. We have 3.2 billion base-pairs or letters, so to get the number of bytes we just divide by 4: 3.2 billion / 4 = 0.8 billion. So the size of the human genome is approximately 800 million bytes, or about 763 megabytes (counting 1,048,576 bytes per megabyte).
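If you want to check the arithmetic yourself, a few lines of PHP will do it. This is only a sketch under the assumptions already stated (3.2 billion base-pairs, 2 bits per base); genome_bytes() is a name I made up for the example.

```php
<?php
// Convert a base-pair count to bytes, assuming 2 bits per base,
// which works out to 4 bases per byte.
function genome_bytes( $base_pairs ) {
	return $base_pairs / 4;
}

$bytes = genome_bytes( 3200000000 ); // the 3.2 billion figure used above

printf( "%.0f bytes\n", $bytes );                     // 800000000 bytes
printf( "%.0f megabytes\n", $bytes / ( 1024 * 1024 ) ); // 763 megabytes
```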

Now think of this: Each cell in your body has two copies of the genome (except for red blood cells, which have none). And it’s estimated that there are 37.2 trillion cells in the average adult human body. Even if we assume that 17 trillion of these are red blood cells, that means that your body contains 23 trillion gigabytes of DNA. That could also be written as 22 million petabytes, or 21 zettabytes. To put this in perspective, the world’s total effective two-way telecommunications capacity was “only” 65,000 petabytes per-day in 2007. At that rate, to transmit all of the information encoded on all of the DNA in your body, it would take almost a whole year.

A year. And yet all of that information fits inside of you. Despite the fact that the strands of DNA in a single cell would stretch out to about 2 m (6 ft) long if laid end to end, in the nucleus they are packed into a whopping diameter of just 6-10 millionths of a meter. That means all of the DNA in your body could fit into a 22 cm (8.5 in) cube. Let’s compare that size to how much room it would take to store the same amount of information on computers. Let’s imagine we put it all onto 1 terabyte hard drives that measure 3 in by 4 in by 0.5 in. They would make a cube about 424 ft (130 m) on a side. A building of that size would have a volume of 76 million cu ft, which would make it the eighth largest building in the world.

Not Just For Information Storage

DNA is obviously an extremely efficient medium of information storage. We’ve looked at it from the angle of just how much your body contains. But we can also look at it from the other angle. A single copy of the entire human genome takes up only 0.8 gigabytes. Compare that with the raw size of OS X Yosemite, which is 5.18 gigabytes. Windows 8 requires about 6–8 gigabytes. In other words, modern computer operating systems take roughly 6 to 10 times as much code as it takes to create and run your body.

DNA is like a computer program but far, far more advanced than any software ever created.—Bill Gates, founder of Microsoft, in The Road Ahead

The really amazing thing about DNA (and this is what I started out to say a while back) is that it isn’t just a blueprint. Most of it doesn’t encode genes. Not even close. The protein-coding portion takes up less than 2% of your DNA, or about 15 megabytes. So what does the rest of the DNA do? Lots of things, actually. It does so much, in fact, that we aren’t even beginning to understand it all. But we do know enough to know that DNA is far more than a blueprint. Is it a computer program? Sort of. It really goes beyond that, but that’s the closest thing to it we’ve ever created.

Beyond Programming

As a programmer, I find it amazing how much DNA is like a programming language. However, I find it even more amazing how much DNA goes beyond modern programming.

How can DNA program for so much in such little space? We can’t yet fully answer that question, but we’re starting to find clues. One is that DNA isn’t just one programming language. It is several, all at once. The same DNA strand can code for several different codes, in both directions. I can’t imagine trying to write code that has to do one thing when read forwards and another when read backwards. Most of our languages couldn’t possibly do that, because of their syntax. They are inherently one-way.
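To get a feel for what "both directions" means, remember that DNA is double-stranded, and the second strand is read as the reverse complement of the first (A pairs with T, C pairs with G). Here is a small PHP sketch of that; reverse_complement() is my own helper name, and the sequence is just a made-up example.

```php
<?php
// Return the sequence as it would be read on the opposite strand:
// swap each base for its pairing partner, then reverse the order.
function reverse_complement( $dna ) {
	return strrev( strtr( $dna, 'ACGT', 'TGCA' ) );
}

$forward  = 'GATTACA';
$backward = reverse_complement( $forward );

echo $forward, "\n";  // GATTACA
echo $backward, "\n"; // TGTAATC
```

The same stretch of letters yields two different strings depending on which strand is read, and both have to make sense to the cell.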

Take PHP for example. Its syntax requires the code to be interpreted from left to right. It’s not just that you couldn’t interpret it backwards as PHP, but it would be really difficult even to create a language with inverted PHP syntax. The same goes for JavaScript.

Of course, some languages are simpler (like BASIC), and could potentially work forwards and backwards. These languages are also far less human-readable. They are already hard for us to grok as it is, so how in the world would we ever be able to write meaningful two-way code like that? It might seem like it would be easy to do, if we just wrote the one-way code and used computer algorithms to compress it into two-way code. But that’s far easier said than done.

The Modular Genome

Among programming best practices is that of writing modular code. Instead of creating one huge, garbled, interconnected whole, a project can be split into discrete parts that are interoperable.

While I was contemplating writing this post, I happened to come across an article that revealed that some genomes are like this. Actually, all genomes are modular, in the sense that they are made up of discrete genes. But what has been discovered in this case is something different. The DNA isn’t just modular, it is actually split into discrete packages.

The genome of the unicellular ciliate Stylonychia lemnae is really astounding. These creatures actually maintain two copies of their genome in separate nuclei. In one nucleus, called the micronucleus, all of the DNA is stored in a single chromosome. In the other nucleus the DNA is split into thousands of different chromosomes. More than 16 thousand, to be exact. This type of nucleus is much larger than the other, and is called the macronucleus.

The moment I read this, I thought of packagist.org. Thousands of different discrete modules maintained in a single repository. Actually though, it is much more like the plugin repository on WordPress.org, which isn’t just a listing directory, but actually holds all of the code for the 37,000+ plugins in a single SVN repository.

The fascinating thing is that the macronucleus is about 10 times larger than the micronucleus. In effect, this means that the copy of the genome which is used in genetic transmission is kept under 10x compression. 10x! It is amazing that the genome can be compressed this much, and yet still be usable for genetic recombination.

Compile-time Optimization

Languages like C get compiled into machine code, and even PHP gets compiled into opcodes before it runs. Some compilers have features that modify the compiled code in various ways to try to improve its performance. This is called compile-time optimization. It’s usually not trivial to do this, because the compiler is risking the possibility of introducing a bug instead of an optimization. It can also mean compilation itself is much less performant, because the compiler has to run sophisticated algorithms over the code.
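To make that concrete, here is a minimal sketch of one classic compile-time optimization, constant folding. It is my own toy example, not how any particular compiler implements it: the function name fold_constants() and the array-based expression tree are made up for illustration, and it only understands + and *.

```php
<?php
// Fold constant sub-expressions in a tiny expression tree.
// A node is an integer, a variable name (string), or array( op, left, right ),
// where op is '+' or '*'. Anything else is outside the scope of this sketch.
function fold_constants( $node ) {
	if ( ! is_array( $node ) ) {
		return $node; // A leaf: nothing to fold.
	}

	list( $op, $left, $right ) = $node;

	$left  = fold_constants( $left );
	$right = fold_constants( $right );

	// If both sides are known numbers, compute the result now.
	if ( is_int( $left ) && is_int( $right ) ) {
		return '+' === $op ? $left + $right : $left * $right;
	}

	return array( $op, $left, $right );
}

// seconds * ( 60 * 60 ) becomes seconds * 3600 before the program ever runs.
$tree = array( '*', 'seconds', array( '*', 60, 60 ) );
print_r( fold_constants( $tree ) );
```

Even in this tiny case the optimizer has to be certain that both operands really are constants, which hints at why real optimizers are so careful.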

In the genome, we might think of the transcription of DNA to RNA as compilation. It’s been known for some time that the nucleus sometimes makes modifications to the RNA after transcription. That’s kind of like compile-time optimization. But in fact, it is much more than that. Sometimes the changes are very simple, and affect just a single base. It’s been recently discovered that this type of RNA editing may be very common. But it has also been known for some time that much more complex forms of RNA editing occur as well. This is called alternative splicing, and it involves taking a gene and splitting it into its modular components. These are then rearranged from their usual configuration, with some being doubled or removed. Then they might be combined with pieces of a completely different gene.

This goes beyond our conventional compile-time optimizations. It’d be like compiling two different components of a program, breaking them down into smaller pieces, and rearranging them to create something entirely new.

Living Programmer

As a programmer, all of this is fascinating. I can sit here and write computer programs because of the trillions of programs being run inside of my body’s cells. This naturally leads us to a question: where did those programs come from? Who wrote them?

You might answer, “I don’t know.” But a staunch evolutionist will tell you that is the wrong answer. (Unless you catch him off guard.) They will tell you no one wrote the program. As a programmer, I find that unbelievable. As a programmer, I know that programs don’t just happen, they take intelligence. And just being “smart” isn’t enough: you have to have skill too, you have to know the language. Even with high intelligence and superb skill, how often do we get it right the first time? How often do we have to do lots of testing to make sure the thing really works?

Yet evolutionists would have us believe that the unimaginable complexity of the genome happened by accident, that a programming language just created itself, and that, over time, a program was shaped through typos in the code.

Of course, as a programmer, I know that is ludicrous. One typo or mistake can easily kill a program. Even if a typo isn’t syntactically invalid, it can still cause the program to stop working properly. And even if that doesn’t happen, it’s still highly probable that a small bug has been introduced by it—and those small bugs are the real killers. You can argue that natural selection will, in effect, “weed out” those really bad bugs. And that’s true (though the reproduction rate isn’t high enough to sustain that level of mutation for millions of years). But you can’t say that about the small bugs. They’re little changes that don’t really seem to have much effect—most of the time. Instead, they’ll build up in the population until it is driven to the point of extinction.

Just imagine a program you’ve written being eroded this way over time. Before long, it would cease to do anything useful at all.

As a programmer, it is obvious: someone programmed me. And not just anyone either. Someone who has unbelievable intelligence, skill, and artistry. Someone who can build something infinitely more complex than Microsoft Windows, using less code, and even have that thing reproduce itself. Do you know anyone like that? It clearly wasn’t one of us. It clearly wasn’t any other form of biological life either (from here or elsewhere), because all life is based on programs. All life requires a Programmer.

As one living programmer, let me ask you: have you met the Programmer of all life? Have you met the living Programmer?

Travis CI, Composer, and PHP 5.2

Once I’ve written some PHP unit tests for my plugins, I like to make sure I put them to good use. I develop the plugins on GitHub, so with the right tools, it’s easy to set up Travis CI to run my tests. This will let me run the tests against all of the PHP versions I need too without the hassle of trying to do this locally.

The only problem is that WordPress still supports PHP 5.2, and while I want to run my tests against that version, I’m using composer to install some of my dev dependencies. And as you probably know, composer requires PHP 5.3. So I searched around the internet to see if anyone had a solution to this dilemma. I did find one project on GitHub, but it requires you to have a separate config file for PHP 5.2, and doesn’t appear to be maintained at this time.

What I was really hoping for was a way to run composer using PHP 5.3 even when the tests are running on 5.2, since all of the PHP versions are installed on the Travis test box. I couldn’t find any helpful information about switching PHP versions on Travis, but with a little research into phpenv (which Travis uses to manage the PHP environment), I was able to figure something out.

It’s actually as easy as this:

phpenv global 5.3
composer install
phpenv global "$TRAVIS_PHP_VERSION"

Just drop that into the before_install section of your .travis.yml, and you’re ready to go!

Improving Plugin Security One Day at a Time

As a security-conscious developer, I like to inspect a WordPress plugin’s source code before I install it on one of my sites. I didn’t do this, formerly. But after the first time I did (and found a vulnerability), I believe more strongly in its importance.

I also keep tabs on WordPress related security reports on sites like Packet Storm, Secunia, Exploit Database, and Bugtraq. I use IFTTT for this:

IFTTT Recipe: Email me WordPress security reports from Packet Storm (feed to email)

IFTTT Recipe: Email me WordPress security reports on Secunia (feed to email)

IFTTT Recipe: Email me WordPress security reports on Exploit DB (feed to email)

IFTTT Recipe: Email me WordPress security reports from Bugtraq (feed to email)

Over the last year I’ve come to realize something: almost every plugin has vulnerabilities. Okay, all code has vulnerabilities. But almost every plugin has glaringly obvious vulnerabilities, or at least ones that can be found without a great deal of effort.

That seems a little bit scary. Of course, many of these aren’t particularly serious. But they demonstrate that the people creating the plugins often don’t understand basic WordPress security.

I know I’m not the only one who realizes this. Probably, most experienced WordPress developers come to this realization sooner or later. But just realizing it and thinking, “I wish it weren’t so,” isn’t going to make things any better. So I’ve decided to do something about it. I’m going to improve plugin security one day at a time. I’m going to try to do something, every day, that will make WordPress plugins more secure.

How will I do this? In many ways. I said that I’m not the only one who understands the situation, and I’m not the only one who’s doing things about it either. The folks on the plugin review team for WordPress.org try to catch the vulnerabilities when a plugin is first submitted to the repo. They also handle security reports and make sure they make it to the plugin authors. There are also the great folks on the WordPress docs team contributing to the plugin developer handbook. Hopefully the security-related things in there will help to educate the next generation of plugin developers about these issues better right from the start.

One of the greatest things I can do is to try to help educate plugin devs about security. I’ll also try to make sure that reports of vulnerabilities make their way to the plugin developer. And I’ll continue to review plugins that I use, and report vulnerabilities that I find. I might then move on to investigating other popular plugins as well. I have even pondered creating a PHP source code security scanner, but that would be quite a project. (There are many of these out there, but none of them are intelligent enough for me.)

Regardless of how, I want to try to do a little something every day to improve WordPress plugin security. If just a few folks did this, how different might things look in a few years? We’ll just have to wait and see.

A Week with HackerOne

About three months ago I signed up with HackerOne, and created a bug bounty program for my WordPoints plugin. I’m writing this post to document my experience with HackerOne, for anyone else who may be thinking of using it.

When you first create your program, it is private. This gives you time to tweak things and gain some familiarity with the system. You also have the opportunity to invite up to 100 of the top hackers to participate in the private pre-launch program. I did invite all 100 (though not all at once), but there wasn’t any activity. That is probably because I hadn’t set a minimum bounty amount yet.

Last Friday I decided to make the program public. This timing roughly corresponded with the release of WordPoints 1.7.0, which included some security fixes that I’d discovered on my own.

What should you expect when you launch publicly? I got 15 bug reports in the first 24 hours, and about a third of them were probably in the first couple hours after launching.

The immediate spike in activity (of which there had been none previously) is probably due at least in part to my having set a minimum bounty (though this was only $25).

Of those 15 reports, most of them were low quality. The reporter obviously hadn’t read the program description, and didn’t know what kind of bugs I was looking for and what sort of vulnerabilities I would consider invalid. Of those 15, only two were vulnerabilities that actually needed fixing. I’ve received 4 more reports this week, but none of them have been valid source code vulnerabilities either.

So, now you know what you can expect with your first week after launching a bug bounty program on HackerOne. I suspect that if you wanted to avoid the first-minute slew of reports, you could wait until later to set a minimum bounty amount.

All in all, I am very pleased with HackerOne. The UI is great and has the tools you need to respond quickly. I think also that report quality will probably increase as better researchers join in in an unhurried manner. Well, at least if I decide to increase the bounty in the future. :-)

Creating Your Own WordPress Unit Test Factories

WordPress has these things in its PHPUnit test library called factories. Their purpose is to allow you to easily create things, like posts.

You might wonder why that’d be so helpful, since after all, WordPress already provides functions like wp_insert_post(). If you are wondering that, maybe you haven’t written very many unit tests.

The problem with wp_insert_post() et al. is that you have to make up a lot of the post’s attributes, like its title and content. While this can be amusing, it can quickly become boring and time consuming. This is especially so when those fields don’t matter in your test in the first place.

WordPress’s solution to this is to provide these factories in its test cases. When your test case extends WP_UnitTestCase, you have access to the factory property, which is a WP_UnitTest_Factory instance. The factory itself has several properties, like post, which is a WP_UnitTest_Factory_For_Post instance.

So you can create a post just by calling $this->factory->post->create(). You don’t have to worry about the post’s attributes, because they will be generated as needed. And if you do need to set the title, for example, you can easily do that:

$this->factory->post->create( array( 'post_title' => 'My Title' ) );

There are other factories as well, for users, attachments, comments, etc. They pretty much cover everything you’d want a factory for in WordPress.

But sometimes a plugin has its own entities that it needs to create in its unit tests. WooCommerce orders, for example. This can be achieved by creating custom factories. You just need to create your own child of WP_UnitTest_Factory_For_Thing, which all of the factories extend. It has just three abstract methods that you’ll need to create: create_object, update_object, and get_object_by_id. It’s pretty simple to implement these, and they do exactly what you’d expect based on their names.

__construct()

Oh, did I forget to mention the constructor? That is actually one of the most important parts. In your constructor is where you have the opportunity to set up the default values for each of the entities’ properties. For example, in the post factory, the constructor looks like this:

	function __construct( $factory = null ) {
		parent::__construct( $factory );
		$this->default_generation_definitions = array(
			'post_status' => 'publish',
			'post_title' => new WP_UnitTest_Generator_Sequence( 'Post title %s' ),
			'post_content' => new WP_UnitTest_Generator_Sequence( 'Post content %s' ),
			'post_excerpt' => new WP_UnitTest_Generator_Sequence( 'Post excerpt %s' ),
			'post_type' => 'post'
		);
	}

The post status and post type default to scalar values, so that shouldn’t be unfamiliar to you. The really interesting part here is the WP_UnitTest_Generator_Sequences. As you can see, these are constructed with a string that contains a %s placeholder. The string will be used as the content for the default created posts, but the placeholder will be replaced with an integer. That number is from an iterator in the generator that gets incremented each time the field needs to be generated. So the first post title generated will be ‘Post title 1’ and the second will be ‘Post title 2’. This means that the generated fields will be unique, which can be especially good when debugging.
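If it helps to see the idea in isolation, here is a simplified stand-in that behaves the way I’ve just described. To be clear, this is not WordPress’s actual WP_UnitTest_Generator_Sequence class; Example_Sequence is a made-up name, and the sketch only covers the template-plus-counter behavior.

```php
<?php
// Simplified stand-in for illustration only; not WordPress's actual class.
class Example_Sequence {
	private $template;
	private $next;

	public function __construct( $template = '%s', $start = 1 ) {
		$this->template = $template;
		$this->next     = $start;
	}

	// Fill the placeholder with the current number, then advance the counter.
	public function next() {
		$generated = sprintf( $this->template, $this->next );
		$this->next++;

		return $generated;
	}
}

$titles = new Example_Sequence( 'Post title %s' );
echo $titles->next(), "\n"; // Post title 1
echo $titles->next(), "\n"; // Post title 2
```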

create_object()

The create_object() method is called by the higher-level methods create(), create_and_get(), and create_many(). It is passed an array of arguments, and is expected to return the ID of the object that’s created, or false or a WP_Error if it fails. In the post factory, it looks like this:

	function create_object( $args ) {
		return wp_insert_post( $args );
	}

That’s very simple, isn’t it? The $args have already been merged with the defaults we defined in the constructor before they get passed in, so all we need to do is insert the post.

update_object()

The update_object() method is called by the create() method, strangely enough. It is used to update the object’s fields after applying any callbacks that have been registered (that’s another story for another time). It gets passed the ID of the object to update, and an array of fields to update. On failure, it should again return false or a WP_Error object.

In the post factory it looks like this:

	function update_object( $post_id, $fields ) {
		$fields['ID'] = $post_id;
		return wp_update_post( $fields );
	}

Again, really simple. We just set the ID as one of the fields, since that’s the way wp_update_post() takes its arguments.

get_object_by_id()

The get_object_by_id() method is called by create_and_get() to retrieve the object once it has been created. It is passed an ID, and is expected to return that object.

In the post factory, it looks like this:

	function get_object_by_id( $post_id ) {
		return get_post( $post_id );
	}

There’s not much to see here either. Just retrieve the post and return it.

Conclusion

All in one piece, the post factory looks like this:

class WP_UnitTest_Factory_For_Post extends WP_UnitTest_Factory_For_Thing {

	function __construct( $factory = null ) {
		parent::__construct( $factory );
		$this->default_generation_definitions = array(
			'post_status' => 'publish',
			'post_title' => new WP_UnitTest_Generator_Sequence( 'Post title %s' ),
			'post_content' => new WP_UnitTest_Generator_Sequence( 'Post content %s' ),
			'post_excerpt' => new WP_UnitTest_Generator_Sequence( 'Post excerpt %s' ),
			'post_type' => 'post'
		);
	}

	function create_object( $args ) {
		return wp_insert_post( $args );
	}

	function update_object( $post_id, $fields ) {
		$fields['ID'] = $post_id;
		return wp_update_post( $fields );
	}

	function get_object_by_id( $post_id ) {
		return get_post( $post_id );
	}
}

You might be amazed that this little bit of code could really be so helpful, but I assure you that it is. It is fully worth creating factories for custom objects if you need to use them much in your tests. They may not all be as simple as this one, but that is only another reason to build it and keep your code DRY.
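To show the shape of a custom factory without needing WordPress loaded, here is a self-contained sketch. Everything in it is hypothetical: Example_Factory_For_Thing is a bare-bones stand-in for WP_UnitTest_Factory_For_Thing (the real class does more, such as generators, callbacks, and create_many()), and the "note" entity is an imaginary plugin object kept in an in-memory array. In a real plugin you would extend WP_UnitTest_Factory_For_Thing itself and call your plugin’s own functions.

```php
<?php
// Bare-bones stand-in for WP_UnitTest_Factory_For_Thing, so this runs alone.
abstract class Example_Factory_For_Thing {
	public $default_generation_definitions = array();

	abstract public function create_object( $args );
	abstract public function update_object( $id, $fields );
	abstract public function get_object_by_id( $id );

	// Merge the caller's args over the defaults, then create the object.
	public function create( $args = array() ) {
		return $this->create_object(
			array_merge( $this->default_generation_definitions, $args )
		);
	}

	public function create_and_get( $args = array() ) {
		return $this->get_object_by_id( $this->create( $args ) );
	}
}

// Hypothetical plugin entity: a "note" stored in an in-memory array.
class Example_Factory_For_Note extends Example_Factory_For_Thing {
	private $notes = array();

	public function __construct() {
		$this->default_generation_definitions = array(
			'title'  => 'Note title',
			'status' => 'draft',
		);
	}

	// Store the note and return its new ID.
	public function create_object( $args ) {
		$id                 = count( $this->notes ) + 1;
		$this->notes[ $id ] = $args;

		return $id;
	}

	// Return false if the note does not exist, true once it is updated.
	public function update_object( $id, $fields ) {
		if ( ! isset( $this->notes[ $id ] ) ) {
			return false;
		}

		$this->notes[ $id ] = array_merge( $this->notes[ $id ], $fields );

		return true;
	}

	public function get_object_by_id( $id ) {
		return isset( $this->notes[ $id ] ) ? $this->notes[ $id ] : null;
	}
}

$factory = new Example_Factory_For_Note();
$note    = $factory->create_and_get( array( 'title' => 'My Title' ) );
echo $note['title'], ' / ', $note['status'], "\n"; // My Title / draft
```

The test only had to name the one field it cared about; the factory filled in the rest.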