Working with Spatial Data

Chapter 10: Working with Spatial Data

The addition of spatial capabilities was one of the most exciting new features introduced in SQL Server 2008. Although generally a novel concept for many SQL developers, the principles of working with spatial data have been well established for many years. Dedicated geographic information systems (GISs), such as ARC/INFO from ESRI, have existed since the 1970s. However, until recently, spatial data analysis has been regarded as a distinct, niche subject area, and knowledge and usage of spatial data have remained largely confined within that realm rather than being integrated with mainstream development.

The truth is that there is hardly any corporate database that does not store spatial information of some sort. Customers’ addresses, sales regions, the area targeted by a local marketing campaign, and the routes taken by delivery and logistics vehicles all represent spatial data that can be found in many common applications.

In this chapter, I’ll first describe some of the fundamental principles involved in working with spatial data, and then discuss some of the important features of the geometry and geography datatypes, which are the specific datatypes used to represent and perform operations on spatial data in SQL Server. After demonstrating how to use the methods of these datatypes to answer some common spatial questions, I’ll then concentrate on the elements that need to be considered to create high-performance spatial applications.

Note: Working with spatial data presents a unique set of challenges, and in many cases requires techniques and understanding that differ from those used with traditional datatypes. If you’re interested in a more thorough introduction to spatial data in SQL Server, I recommend reading Beginning Spatial with SQL Server 2008, one of my previous books (Apress, 2008).

Modeling Spatial Data

Spatial data describes the position, shape, and orientation of objects in space.
These objects might be tangible, physical things, like an office building, railroad, or mountain, or they might be abstract features such as the imaginary line marking the political boundary between countries or the area served by a particular store.

SQL Server adopts a vector model of spatial data, in which every object is represented using one or more geometries: primitive shapes that approximate the shape of the real-world object they represent. There are three basic types of geometry that may be used with the geometry and geography datatypes:

• A Point is the most fundamental type of geometry, representing a single location in space. A Point geometry is zero-dimensional, meaning that it has no associated area or length.
• A LineString consists of a series of two or more distinct points, together with the line segments that connect them. LineStrings have a length, but no associated area. A simple LineString is one in which the path drawn between the points does not cross itself. A closed LineString is one that starts and ends at the same point. A LineString that is both simple and closed is known as a ring.
• A Polygon consists of an exterior ring, which defines the perimeter of the area of space contained within the Polygon. A Polygon may also specify one or more interior rings, which define areas of space contained within the exterior ring but excluded from the Polygon. Interior rings can be thought of as “holes” cut out of the Polygon. Polygons are two-dimensional: they have a length, measured as the total length of all defined rings, and an area, measured as the space contained within the exterior ring (and not excluded by any interior rings).

Note: The word geometry has two distinct meanings when dealing with spatial data in SQL Server.
To make the distinction clear, I will use the word geometry (regular font) as the generic name to describe Points, LineStrings, and Polygons, and geometry (code font) to refer to the geometry datatype.

Sometimes, a single feature may be represented by more than one geometry, in which case it is known as a GeometryCollection. GeometryCollections may be homogeneous or heterogeneous. For example, the Great Wall of China is not a single contiguous wall; rather, it is made up of several distinct sections of wall. As such, it could be represented as a MultiLineString, a homogeneous collection of LineString geometries. Similarly, many countries, such as Japan, may be represented as a MultiPolygon: a GeometryCollection consisting of several Polygons, each one representing a distinct island. It is also possible to have a heterogeneous GeometryCollection, such as a collection containing a Point, three LineStrings, and two Polygons.

Figure 10-1 illustrates the three basic types of geometries used in SQL Server 2008 and some examples of situations in which they are commonly used.

Having chosen an appropriate type of geometry to represent a given feature, we need some way of relating each point in the geometry definition to the real-world position it represents. For example, to use a Polygon geometry to represent the US Department of Defense Pentagon building, we need to specify that the five points that define the boundary of the Polygon geometry relate to the locations of the five corners of the building. So how do we do this? You are probably familiar with the terms longitude and latitude, in which case you may be thinking that it is simply a matter of listing the relevant latitude and longitude coordinates for each point in the geometry. Unfortunately, it’s not quite that simple.

Figure 10-1. Different types of geometries and their common uses

What many people don’t realize is that any particular point on the earth’s surface does not have only one unique latitude or longitude associated with it. There are, in fact, many different systems of latitude and longitude, and the coordinates of a given point on the earth will vary depending on which system is used. Furthermore, latitude and longitude coordinates are not the only way of expressing positions on the earth: there are other types of coordinates that define the location of an object without using latitude and longitude at all. In order to understand how to specify the coordinates of a geometry, we first need to examine how different spatial reference systems work.

Spatial Reference Systems

A spatial reference system is a system designed to unambiguously identify and describe the location of any point in space. This ability is essential to enable spatial data to store the coordinates of geometries used to represent features on the earth.

To describe the positions of points in space, every spatial reference system is based on an underlying coordinate system. There are many different types of coordinate systems used in various fields of mathematics, but when defining geospatial data in SQL Server 2008, you are most likely to use a spatial reference system based on either a geographic coordinate system or a projected coordinate system.

Geographic Coordinate Systems

In a geographic coordinate system, any position on the earth’s surface can be defined using two angular coordinates:

• The latitude coordinate of a point measures the angle between the plane of the equator and a line drawn perpendicular to the surface of the earth at that point.
• The longitude coordinate measures the angle in the equatorial plane between a line drawn from the center of the earth to the point and a line drawn from the center of the earth to the prime meridian.
Typically, geographic coordinates are measured in degrees. As such, latitude can vary between -90° (at the South Pole) and +90° (at the North Pole). Longitude values extend from -180° to +180°. Figure 10-2 illustrates how a geographic coordinate system can be used to identify a point on the earth’s surface.

Projected Coordinate Systems

In contrast to the geographic coordinate system, which defines positions on a three-dimensional, round model of the earth, a projected coordinate system describes positions on the earth’s surface on a flat, two-dimensional plane (i.e., a projection of the earth’s surface). In simple terms, a projected coordinate system describes positions on a map rather than positions on a globe.

If we consider all of the points on the earth’s surface to lie on a flat plane, we can define positions on that plane using familiar Cartesian coordinates of x and y (sometimes referred to as Easting and Northing), which represent the distance of a point from an origin along the x axis and y axis, respectively. Figure 10-3 illustrates how the same point illustrated in Figure 10-2 could be defined using a projected coordinate system.

Figure 10-2. Describing a position on the earth using a geographic coordinate system

Figure 10-3. Describing a position on the earth using a projected coordinate system

Applying Coordinate Systems to the Earth

A set of coordinates from either a geographic or projected coordinate system does not, on its own, uniquely identify a position on the earth. We need to know additional information, such as where to measure those coordinates from and in what units, and what shape to use to model the earth. Therefore, in addition to specifying the coordinate system used, every spatial reference system must also contain a datum, a prime meridian, and a unit of measurement.

Datum

A datum contains information about the size and shape of the earth.
Specifically, it contains the details of a reference ellipsoid and a reference frame, which are used to create a geodetic model of the earth onto which a coordinate system can be applied.

The reference ellipsoid is a three-dimensional shape that is used as an approximation of the shape of the earth. Although described as a reference ellipsoid, most models of the earth are actually oblate spheroids: squashed spheres that can be described exactly by two parameters, the length of the semimajor axis (which represents the radius of the earth at the equator) and the length of the semiminor axis (the radius of the earth at the poles), as shown in Figure 10-4. The degree by which the spheroid is squashed may be stated as the ratio of the semimajor axis to the difference between the two axes, which is known as the inverse-flattening ratio.

Different reference ellipsoids provide different approximations of the shape of the earth, and there is no single reference ellipsoid that provides a best fit across the whole surface of the globe. For this reason, spatial applications that operate at a regional level tend to use a spatial reference system based on whatever reference ellipsoid provides the best approximation of the earth’s surface for the area in question. In Britain, for example, this is the Airy 1830 ellipsoid, which has a semimajor axis of 6,377,563m and a semiminor axis of 6,356,257m. In North America, the GRS80 ellipsoid of the NAD83 datum is most commonly used, which has a semimajor axis of 6,378,137m and a semiminor axis of 6,356,752m.

The reference frame defines a set of locations in the real world that are assigned known coordinates relative to the reference ellipsoid. Once a set of points with known coordinates has been established, these points can be used to correctly line up the coordinate system with the reference ellipsoid so that the coordinates of other, unknown points can be determined.
Reference points are normally places on the earth’s surface itself, but they can also be assigned to the positions of satellites in orbit around the earth, which is how the WGS84 datum used by global positioning system (GPS) units is realized.

Prime Meridian

As defined earlier, the geographic coordinate of longitude is the angle in the equatorial plane between the line drawn from the center of the earth to a point and the line drawn from the center of the earth to the prime meridian. Therefore, any spatial reference system must state its prime meridian: the axis from which the angle of longitude is measured. It is a common misconception that there is a single prime meridian based on some inherent fundamental property of the earth. In fact, the prime meridian of any spatial reference system is arbitrarily chosen simply to provide a line of zero longitude from which all other coordinates of longitude can be measured. One commonly used prime meridian passes through Greenwich, London, but there are many others. If you were to choose a different prime meridian, the value of every longitude coordinate in a given spatial reference system would change.

Figure 10-4. Properties of a reference ellipsoid

Projection

A projected coordinate reference system allows you to describe positions on the earth on a flat, two-dimensional image of the world, created as a result of projection. There are many ways of creating such map projections, and each one results in a different image of the world. Some common map projections include Mercator, Bonne, and equirectangular projections, but there are many more.

It is very important to realize that, in order to represent a three-dimensional model of the earth on a flat plane, every map projection distorts the features of the earth in some way. Some projections attempt to preserve the relative area of features, but in doing so distort their shape.
Other projections preserve the properties of features that are close to the equator, but grossly distort features toward the poles. Some compromise projections attempt to balance distortion in order to create a map in which no one aspect is distorted too significantly.

The magnitude of distortion of features portrayed on the map is normally related to the extent of the area projected. For this reason, projected spatial reference systems tend to work best when applied only to a single country or smaller area, rather than a full world view.

Since the method of projection affects the features on the resulting map image, coordinates from a projected coordinate system are only valid for a given projection.

Spatial Reference Identifiers

The most common spatial reference system in global usage uses a geographic coordinate system based on the WGS84 datum, which has a reference ellipsoid with a semimajor axis of 6,378,137m and an inverse-flattening ratio of 298.257223563. Coordinates are measured in degrees, based on a prime meridian of Greenwich. This system is used by handheld GPS devices, as well as many consumer mapping products, including the Google Earth and Bing Maps APIs.
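
As a rough illustration of how these two ellipsoid parameters relate to the semimajor and semiminor axes described in the Datum section, the following T-SQL sketch derives the semiminor axis from the WGS84 figures quoted above (6,378,137m and 298.257223563). The variable names are hypothetical, chosen only for this illustration, and the calculation simply rearranges the definition of the inverse-flattening ratio given earlier.

```sql
-- Illustrative sketch only: variable names are hypothetical.
DECLARE @SemiMajor float = 6378137;               -- WGS84 semimajor axis, in meters
DECLARE @InverseFlattening float = 298.257223563; -- WGS84 inverse-flattening ratio

-- inverse flattening = a / (a - b), so b = a * (1 - 1 / inverse flattening)
SELECT @SemiMajor * (1 - 1 / @InverseFlattening) AS SemiMinor;
```

The result is approximately 6,356,752m, so under this model the polar radius is roughly 21km shorter than the equatorial radius.
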
Using the Well-Known Text (WKT) format, which is the industry standard for such information (and the format SQL Server uses in the well_known_text column of the sys.spatial_reference_systems table), the properties of this spatial reference system can be expressed as follows:

    GEOGCS[
      "WGS 84",
      DATUM[
        "World Geodetic System 1984",
        ELLIPSOID["WGS 84", 6378137, 298.257223563]
      ],
      PRIMEM["Greenwich", 0],
      UNIT["Degree", 0.0174532925199433]
    ]

Returning to the example at the beginning of this chapter, using this spatial reference system, we can describe the approximate location of each corner of the US Pentagon building as a pair of latitude and longitude coordinates as follows:

    38.870, -77.058
    38.869, -77.055
    38.871, -77.053
    38.873, -77.055
    38.872, -77.058

Note that, since we are describing points that lie to the west of the prime meridian, the longitude coordinate in each case is negative.

Now let’s consider another spatial reference system: the Universal Transverse Mercator (UTM) Zone 18N system, which is a projected coordinate system used in parts of North America. This spatial reference system is based on the 1983 North American datum, which has a reference ellipsoid with a semimajor axis of 6,378,137m and an inverse-flattening ratio of 298.257222101. This geodetic model is projected using a transverse Mercator projection, centered on the meridian of longitude 75°W, and coordinates based on the projected image are measured in meters.
The full properties of this system are expressed in WKT format as follows:

    PROJCS[
      "NAD_1983_UTM_Zone_18N",
      GEOGCS[
        "GCS_North_American_1983",
        DATUM[
          "D_North_American_1983",
          SPHEROID["GRS_1980", 6378137, 298.257222101]
        ],
        PRIMEM["Greenwich", 0],
        UNIT["Degree", 0.0174532925199433]
      ],
      PROJECTION["Transverse_Mercator"],
      PARAMETER["False_Easting", 500000.0],
      PARAMETER["False_Northing", 0.0],
      PARAMETER["Central_Meridian", -75.0],
      PARAMETER["Scale_Factor", 0.9996],
      PARAMETER["Latitude_of_Origin", 0.0],
      UNIT["Meter", 1.0]
    ]

Using this spatial reference system, the same five points of the Pentagon building can instead be described using the following coordinates:

    321460, 4304363
    321718, 4304246
    321896, 4304464
    321728, 4304690
    321465, 4304585

Comparing these results clearly demonstrates that a coordinate pair only describes a unique location on the earth when stated together with the details of the coordinate system from which it was obtained. However, it would be quite cumbersome if we had to write out the full details of the datum, prime meridian, unit of measurement, and projection every time we wanted to quote a pair of coordinates. Fortunately, there is an established set of spatial reference identifiers (SRIDs) that provide a unique integer code associated with each spatial reference system. The two spatial reference systems used in the preceding examples are represented by SRID 4326 and SRID 26918, respectively.

Every time you state an item of spatial data using the geography or geometry types in SQL Server 2008, you must state the corresponding SRID from which the coordinate values were obtained. What’s more, since SQL Server does not provide any mechanism for converting between spatial reference systems, if you want to perform any calculations involving two or more items of spatial data, each one must be defined using the same SRID.
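
The requirement to state an SRID can be sketched in T-SQL as follows. This is a minimal illustration rather than an example from the original text: the variable names are hypothetical, the coordinates are those of the first Pentagon corner listed above, and SRID 26917 (the neighboring UTM zone) is used only to provoke a mismatch. Note that WKT lists longitude before latitude for geography data.

```sql
-- Metadata SQL Server holds for SRID 4326, including its WKT definition.
-- geography data must use an SRID that is listed in this view.
SELECT spatial_reference_id, well_known_text, unit_of_measure
FROM sys.spatial_reference_systems
WHERE spatial_reference_id = 4326;

-- The same Pentagon corner stated in each system; the SRID is the
-- second argument. Variable names are hypothetical.
DECLARE @CornerWgs84 geography =
    geography::STGeomFromText('POINT(-77.058 38.870)', 4326);  -- longitude latitude
DECLARE @CornerUtm geometry =
    geometry::STGeomFromText('POINT(321460 4304363)', 26918);  -- x y, in meters

SELECT @CornerWgs84.STSrid AS GeographySrid,
       @CornerUtm.STSrid   AS GeometrySrid;

-- Methods that combine two instances return NULL when the SRIDs differ.
DECLARE @OtherZone geometry =
    geometry::STGeomFromText('POINT(321718 4304246)', 26917);
SELECT @CornerUtm.STDistance(@OtherZone) AS Distance;  -- NULL: SRIDs do not match
```

Had both geometry instances been created with SRID 26918, STDistance() would instead return the planar distance between them in meters.
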
If you don’t know the SRID associated with a set of coordinates (say, you looked up some latitude and longitude coordinates from a web site that didn’t state the system used), it is very likely that they are geographic coordinates based on SRID 4326, the system used by GPS devices.

Note: To find out the SRID associated with any given spatial reference system, you can use the search facility provided at www.epsg-registry.org.

Geography vs. Geometry

Early Microsoft promotional material for SQL Server 2008 introduced the geography datatype as suitable for “round-earth” data, whereas the geometry datatype was for “flat-earth” data. These terms have since been repeated verbatim by a number of commentators, with little regard for explaining the practical meaning of “flat” or “round.” A simple analogy might be that, in terms of geospatial data, the geometry datatype operates on a map, whereas the geography datatype operates on a globe. With that distinction in mind, one obvious difference between the datatypes concerns the types of coordinates that can be used with each:

• The geography datatype requires data to be expressed using latitude and longitude coordinates, obtained from a geographic coordinate system. Furthermore, since SQL Server needs to know the parameters of the ellipsoidal model onto which those coordinates should be applied, all geography data must be based on one of the spatial reference systems listed in the sys.spatial_reference_systems system table.
• The geometry datatype operates on a flat plane, which makes it ideal for dealing with geospatial data from projected coordinate systems, including Universal Transverse Mercator (UTM) grid coordinates, national grid coordinates, or state plane coordinates. However, there are occasions when you may wish to store latitude and longitude coordinates using the geometry datatype, as I’ll demonstrate later in this chapter.
The geometry datatype can also be used to store any abstract nonspatial data that can be modeled as a pair of floating-point x, y coordinates, such as the nodes of a graph.

This distinction between coordinate types is not the only property that distinguishes the two datatypes. In the following sections I’ll analyze some of the other differences in more detail.

Note: Both the flat plane used by the geometry datatype and the curved ellipsoidal surface of the geography datatype are two-dimensional surfaces, and a position on those surfaces can be described using exactly two coordinates (latitude and longitude for the geography datatype, or x and y for the geometry datatype). SQL Server 2008 also allows you to store Z and M coordinates, which can represent two further dimensions associated with each point (typically, Z is elevation above the surface, and M is a measure of time). However, while these values can be stored and retrieved, none of the methods provided by the geography or geometry datatypes account for the value of Z and M coordinates in their calculations.

[...]

... associated with each point can be obtained using the Lat and Long properties of the location column instead.

Querying Spatial Data

Now that we’ve imported a good-sized set of sample data, we can start to write some spatial queries against it, but there are a few important considerations when writing queries that involve geometry or geography data. Because spatial data ...

... subject, or refer to Books Online. To see how the database engine uses the grid to satisfy a spatial query, consider the example from the previous section to retrieve all those points contained within the bounding box of a map:

Figure 10-11. The multilevel grid used by spatial indexes

    DECLARE @BoundingBox geography;
    SET @BoundingBox...
... importing that data in the first place. Since SQL Server cannot import invalid geography data, you may have to rely on external tools to validate and fix any erroneous data prior to importing it.

Creating Spatial Data

The first challenge presented to many users new to the spatial features in SQL Server 2008 is how to get spatial data into the database. Unfortunately, the most commonly used spatial format, ...

... to preview the data in the file, which should appear as shown in Figure 10-7. Then click Advanced from the left pane. On the Advanced pane, click each column name in turn, and configure the column properties to match the values shown in Table 10-1.

Figure 10-7. Previewing data downloaded from the Geonames web site

Table 10-1 ...

... conventional datatypes. In fact, columns of geometry and geography data can only be added to a spatial index, and spatial indexes can only be used for those two types of data. To understand why, in this section I’ll first provide an overview of spatial indexing, and then look at some of the ways of optimizing a spatial index to provide optimal performance for your spatial applications.

How Does a Spatial Index ...

... geometry datatype will become less accurate than results obtained using the geography datatype over large distances. In global spatial applications, the geography datatype is a more suitable choice, as there are few projected systems that can be used for general global purposes with sufficient accuracy. For storing spatial data contained within a single country or smaller area, the geometry datatype ...
    ... allCountries WITH (INDEX(idxallCountries))
    WHERE location.Filter(@SearchArea) = 1;

    SELECT TOP 3 * FROM @Candidates ORDER BY Distance;

Note: Notice that in the preceding example query, I explicitly included an index hint, WITH (INDEX(idxallCountries)), to ensure that the query optimizer chooses a plan using the idxallCountries spatial index. The cost-based estimates for spatial ...

... functionality afforded by the geometry datatype offsets its associated loss of accuracy.

However, you should exercise great caution when using the geometry datatype to store geographic data in this way, because you may receive surprising results from certain operations. Consider the fact that the STArea() and STLength() methods of the geometry datatype return results in the ...

... developers we have to reluctantly accept that spatial data, like any other data, is rarely as perfect as we would like. This means that you will frequently encounter invalid data where, for example, Polygons do self-intersect. Rather perversely, perhaps, the geometry datatype, which conforms to OGC standards, is also the datatype that provides options for dealing with data that fails to meet those standards ...

... intended to contain, and including everything else. In geometry data, ring orientation is not significant, as there is no ambiguity as to the area contained within a Polygon ring on a flat plane.

A final technical difference concerns invalid geometries. In an ideal world, we would always want our spatial data to be “valid”, that is, meeting all the OGC specifications ...
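
As an illustration of the point about invalid data, the sketch below builds a deliberately self-intersecting “bow-tie” Polygon using the geometry datatype and then repairs it. The coordinates and variable name are hypothetical and do not come from the chapter; STIsValid() and MakeValid() are methods of the geometry datatype, and this is only one way such data might be checked.

```sql
-- Hypothetical example: the exterior ring crosses itself at (5, 5),
-- so the Polygon does not meet the OGC validity rules.
DECLARE @BowTie geometry =
    geometry::STGeomFromText('POLYGON((0 0, 10 10, 10 0, 0 10, 0 0))', 0);

SELECT @BowTie.STIsValid()             AS ValidBefore,  -- 0: ring self-intersects
       @BowTie.MakeValid().STIsValid() AS ValidAfter,   -- 1: repaired instance
       @BowTie.MakeValid().STAsText()  AS Repaired;     -- WKT of the repaired shape
```

The geometry datatype accepts the invalid instance on input, which is what makes it possible to inspect and repair the data inside the database.
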

Date posted: 05/10/2013, 08:48
