# TerraFlow Background

Flow Direction | Flow Accumulation | Scalability

Terrains are represented as digital elevation models (DEMs). Typically, a DEM represents the spatial distribution of elevations in a terrain, but in general it can represent any of a number of attributes. Much of the terrain data encountered in GIS applications is obtained from remote sensing devices and is in raster or grid form, in which the coordinates of the data correspond to a uniform grid, and the elevations (or other data) are given for each cell in the grid. In addition to grids, the two other ways of representing DEMs are triangulated irregular networks (TINs) and contour lines. Among the terrain models, grids are the most commonly used because of their simplicity and because data is readily available in this form. On the other hand, TINs often use less space than grid-based DEMs.

A grid-based DEM is a 2D-array of elevations sampled uniformly from the terrain.We refer to each value in the grid as a cell. The size of a cell in the grid is the resolution at which the grid was sampled, and its perimeter is four times its size. The neighbors of a grid cell p are the 8 (or fewer, if the cell is on the edge of the grid) cells around p. The gradient of the terrain towards each neighbor is denoted as tan beta, and can be computed as the ratio between the height difference of the cells and the horizontal distance between them. A downslope neighbor is a neighbor with strictly lower elevation. The steepest downslope neighbor is the downslope neighbor with the largest gradient.

An index of the terrain is a function computed for each cell of the terrain. Some basic indices in terrain analysis are the flow directions and flow accumulation. The flow directions of a cell correspond to the directions in which water would flow if poured at that cell onto the terrain. Flow accumulation depends on the flow directions, which need to be known prior to computing flow accumulation.

Once flow direction and flow accumulation have been computed, many other indices of the terrain can be estimated based on them, including watersheds, drainage network and topographic convergence index. The drainage network consists of all grid cells for which the flow accumulation is higher than a certain threshold, and the topographic convergence index (TCI), which quantifies the liklihood of saturation, is defined as the logarithm of the ratio of the flow accumulation to the local slope at a cell.

#### Flow Direction

The flow directions of a cell correspond to the directions in which water would flow if poured at that cell onto the terrain. The flow routing problem can be defined as the problem of assigning flow directions to all cells in the DEM such that: (1) flow directions do not induce any cycles; (2) every cell has a flow path off the edge of the terrain. Although different flow models may define flow direction differently, the flow directions of a cell are always towards neighbor cells which are downslope or at the same height. In other words, water cannot go uphill.

The single-flow direction (SFD) model defines the flow direction to be the direction of the steepest downslope neighbor. The multi-flow direction (MFD) model allows multiple flow directions for a cell, most commonly to all downslope neighbors. The single flow approach has the disadvantage that it discretizes the flow into only one 8 possible directions. Multiple flow routing has the disadvantage that the flow from a cell is dispersed to all neighbors of lower elevation, resulting in a more diffuse flow of water.

Flow direction to steepest downslope neighbor (SFD). Flow direction to all downslope neighbors (MFD).  A cell with downslope neighbors may determine its flow direction(s) by simply examining adjacent cells. A cell without any strictly downslope neighbors is flat. It can not determine its flow direction just by looking at adjacent cells; a more complicated procedure is required. A cell is called flat if: (1) it has height less than or equal to all its neighbors; or (2) it has a neighbor of same height which satisfies (1). A flat area is a maximal set of adjacent flat cells. Note that one-cell-wide ridges are not flat areas, but one-cell-wide channels are. The flat area has a spill-point if it contains a cell which has a downslope neighbor.

The intuition is that flow should be routed such that water flows towards the spill-points of the flat area. If the flat area has no spill-points then we call it a sink. For instance, the bottom of a bowl is a sink. Sinks are dead-ends for flow because it cannot be routed further. The most commonly used approach is to preprocess the terrain in order to ``fill'' all the sinks such that each cell has a valid flow path off the border of the terrain. Thus sinks become lakes. This step is part of flow routing process and is refered to as flooding.

#### Flow Accumulation

The flow accumulation of a terrain is an index which estimates the surface runoff for each cell in the terrain. The flow accumulation of a cell is defined as the total area of the grid cells which flow through that cell per unit width of contour. To compute the flow accumulation we assume that every grid cell initially has one unit of flow (water) and that the flow, initial as well as incoming, at a grid cell is distributed according to the precomputed flow directions. The flow accumulation can then be computed as the total amount of water flowing through each grid cell of the terrain multiplied by some constant factor.

Kaweah Basin DEM. Kaweah Basin Flow Accumulation.  #### Scalability

The availability of geospatial data for larger areas at higher resolutions exposes scalability problems with existing GIS algorithms. When processing such large amounts of data (bigger than main memory of the machine being used) the Input/Output communication (or I/O) between fast internal memory and slow external storage such as disks, rather than internal computation time, becomes the bottleneck in the computation. Unfortunately, most GIS algorithms, including the ones in commercial GIS packages, are designed to minimize internal computation time and consequently they often do not scale to large datasets.

<laura@cs.duke.edu>