Whack Data     About     Archive     RSS

Simulate Draining the Ocean in Scala with GeoTrellis

When I first saw Randall Munroe’s Drain the Oceans in his book What If? I was blown away. I could tell immediately that it wasn’t guesswork and certainly wasn’t done with a trivial approach like finding all areas on earth below some elevation, or else all the water would have been drained. I had a hunch about how it was done, and recently I followed that hunch and wrote some software that generated the above animation.

In this post, I’m going to give a overview of how it was done. To make it more digestible, I’ll break it into two parts: the first part will describe how it was done without using code, and the second part will go a bit deeper into the technical side for those looking to repeat it. If I don’t explain something clearly, just leave a comment below and I’ll try to clarify it.

If you'd like to dive straight into the source ahead of time, see: https://github.com/Brideau/DrainTheOcean

Without Code

To get an intuition for how this works, imagine a bath tub that you fill with objects like this, so that some of the objects above the surface of the water but have small lakes, some are fully submerged, and others are surrounded by water like an island:

When you initially pull the plug, what water drains? Well, if you’re looking at it from the top-down, it’s any water on the surface that you could reach with a toy boat starting over top of the drain. Any water that is “in-land” and unreachable would become landlocked and stay:

So, the water that drains needs to have two properties: it needs to be at the same elevation or below the current water level, and it needs to be “reachable” from the drain.

Once the water in the tub has drained 10cm, what does the new surface look like? Well any water that doesn’t drain stays, and new land will be exposed as the water goes down. Combine those two facts and you get this:

Finally, if you wanted to create an animation that showed the bath tub draining, but didn’t have a video camera handy (work with me here…), you’d let the water drain 1cm at a time and snap a photograph each time. You’d just stitch together the photos like a movie and play it forward. That’s really all there is to it.

To estimate the time it would take to drain, you first calculate the approximate volume of water that would drain at each 1cm step. That would just be 1cm * the surface area that drained. You can get the flow rate by using the Bernoulli incompressible flow equation, which applies in scenarios where a tub drains into outside air and takes into account how deep the water is in the tub, since that affects the pressure.

To get time, you just divide the volume you calculated by the rate. If you do this for each step, you can add up the time as you go and include it on each photograph in your video to show how the time progresses as the water decreases.

Now, apply this to the actual planet and you get the fancy video I posted above. Instead of “objects in a bathtub” you can use the actual surface of the Earth above and below water as compiled in a data set like __. And instead of dropping the water level by 1cm at each step, you can drop it by 10m (or less if you can afford a faster computer or have more time).

Is the real world more complicated? It sure is. There’s weather and tides and plate tectonics, though it’s debatable how much these matter. There’d also be some friction in the drain you’d need to account for, and the “place” where it’s draining to might not have the same pressure as the air near Earth’s surface, and some lakes lose more water to evaporation than they get from rivers and rainfall, among other things. But my goal here wasn’t to build a perfect physics model of the earth. It was to get a rough approximation to a visualization I’d seen in a book as a side project.

That all being said, there’s nothing preventing someone taking this model and extending it to account for these things. It’s extremely flexible, and isn’t stretched to its limits by any means. Let’s get into that next.

With Code

For this section, I’ll cover things at a relatively high level as it’d be impossible to both explain how this was done and give an introduction to geospatial analysis in a reasonably short blog post. If you want more detail about how something was done, comment below and I can link you to some resources.

To begin, we need to an elevation surface for Earth in the form of a geospatial raster image. There’s a number of these out there, but I went with the ___ because it had a good resolution and included elevations both above and below the water surface. If you’re unfamiliar with raster files, they are basically images, but instead of recording color on each pixel, they record some piece of information - in this case, elevation.

Next, we need to know where there’s water on the earth currently. You can get that from the Natural Earth data sets for oceans and lakes. As a first step, I filtered out many of the smaller lakes by loading the data set into QGIS, querying all those with an _____ larger than a certain value, and saving this as a new file.


I then used the gdal_rasterize tool to convert each of these from vector format to raster, so that they could be used with the elevation data. In this case, instead of recording elevation, each pixel records a value of 0 or 2 representing the absence or presence of water (I used 2 instead of 1 because I’ll use 1 later to represent the presence of land). To make sure that the pixels of this image aligned with those of the elevation raster, I looked at the metadata of the elevation raster and copied its parameters to use with the gdal_rasterize tool.


The get a raster that combined both oceans and lakes, I used GeoTrellis to generate a new raster that contains water if either of these rasters had water at a given pixel.

To build the draining simulation, we need to figure out a way to see if a certain “pixel” of water would drain. You can do this in two stages: 1) find all parts of the earth less than or equal to the elevation you’re currently at, and 2) of the areas you found in step 1, find out which would be “reachable” from the drain the Marianas trench.

The first step is easy. You can use GeoTrellis to generate a new raster that categorizes each pixel into 0 or 1 based on if the elevation at that pixel is above or below a threshold:


The second part is trickier. What you need to do is use a Flood Fill algorithm to find out what pixels are reachable from a given location. If you ever used MS Paint’s paint bucket tool, you have experience with one of these: you click a point, and it fills all the area that is touching that point that is the same color as it with paint. It’s the same idea here, only you’re “painting” the raster above that was categorized into elevations above and below a given level.

Unfortunately, GeoTrellis doesn’t come with one of these so I had to write one myself, which you can see here. It was based on the work of __ and has a few properties that make it different from many examples you’ll see online:

  • it does not use recursion, which means you can use it on images that are very large without having stack overflow issues
  • it wraps around the image on the left and right side, so that it fills areas on the opposite side of the map, which in reality would be geographically next to each other
  • it is an 8-way flood fill, so it fills pixels that touch on the corners


Ryan Brideau