Geoscheme: A Database

The Geoscheme overview showing the 22 of the 26 different areas.

The purpose of this weblog post is to introduce the concept of a database primary key, Without it, a database will not work because data conflicts will arise. A primary key is a single attribute or group of attributes that can uniquely identify an individual record. Part of the challenge of database design, is determining what to use as a primary key.

Often, the solution is simply to generate a number in sequence. In many cases this is effective. In other cases, it might be more expedient to use an existing code, that can access more information, when that is needed. In many cases, geographical information is wanted, that is not contained in a database. such as the name of a mayor of a city, or a chronological list of mayors.

Geoscheme is my way of organizing geographical data. The land area of the Earth is divided into 26 different regions, labelled A to Z. Each region is populated by countries that can be further sub-divided. This process can continue into smaller and smaller units.

When a data set about a geographic area is being assembled for inclusion in a database, it is important to assess existing keys to see if one distinguishes itself from others. For geographical jurisdictions, this can be an agreed upon international code, for subdivisions of countries national or other codes may be appropriate.

I have used Geoscheme for so many years, that its origins are lost in the depths of time. Needless to say, the map with regions did not originate with me. I have simply appropriated it for my own purposes. The alphabetic coding, on the other hand, is something I recall creating.

Africa: [A] Southern Africa; [B] Eastern Africa; [C] Middle Africa; [D] Western Africa; [E] Northern Africa. Europe: [F] Southern Europe; [G] Western Europe; [H] Northern Europe; [I] Eastern Europe + North Asia. Asia: [J] Western Asia; [K] Central Asia; [L] Southern Asia; [M] Eastern Asia; [N] South-Eastern Asia. Oceania: [O] Australia and New Zealand; [P] Melanesia; [Q] Micronesia; [R] Polynesia. Americas: [S] Northern America; [T] Caribbean; [U] Central America; [V] South America. Other: [W] Antarctica; [X] Atlantic Ocean; [Y] Pacific Ocean; [Z] Indian Ocean.

My current work with Geoscheme is the collection of outline maps and flags for most countries, often using Fortnight Insider as a source of black on white maps. It provides answers to the Worldle game, that I lost interest in playing, that offer white on black maps.

ISO 3166-1 alpha-3 codes are currently used in Geoscheme as a primary key for countries. These three-letter country codes are defined in the ISO 3166 standard published by the International Organization for Standardization (ISO), to represent countries, dependent territories, and special areas of geographical interest. There is also a two-letter coding system, referred to as ISO 3166-1 alpha-2 codes. The 3 letter codes give a better visual association between the codes and the country names than the two-letter codes. There is also a purely numeric code that offers no visual association. ISO 3166 became a standard in 1974. It is updated at irregular intervals. Some of the codes used are: CAN (Canada); MUS (Mauritius); NOR (Norway); TWN (Taiwan); UKA (Ukraine); URY (Uruguay); USA (United States of America).

Since ISO does not allow duplicate 3166 codes to be used, there are no issues using them as primary keys.

As noted, it is possible to expand these areas to use codes to define areas that are smaller than an unit with a 3-letter code. ISO 3166-2 defines codes for identifying the principal subdivisions. These use two-letter country codes, as well as two-character subdivisions. Thus, the province of British Columbia in Canada is CA-BC; The Moka district in Mauritius is MU-MO; Trøndelag county in Norway is NO-50. It was the result of an amalgamation of North (-17) and South (-16 ) Trøndelag on 2018-01-01; the state of Michigan in the United States is US-MI.

This approach can also work at lower levels. Inderøy municipality has its own municipality number. These representations are not written in stone. Municipality numbers were first introduced with the Norwegian census of 1946. Even municipalities that had dissolved before then, were given municipality numbers, that could be used for statistical purposes. Municipality numbers use four digits, with the first two being the county number.

Inderøy municipality was officially founded in 1837. The municipalities of Hustad and Røra were established on 1907-01-01 when the old municipality of Inderøy was divided into three municipalities: Røra (population: 866) with municipality number 1730, in the southeast, Hustad (population: 732) with municipality number 1728, in the north, and Inderøy (population: 2 976) with municipality number 1730, in the west. In 1912, Hustad changed its name to Sandvollan, but retained municipality number 1728. During the 1960s, there were many municipal mergers across Norway. On 1962-01-01, the three neighboring municipalities of Røra (population: 1003), Sandvollan (population: 750), and Inderøy (population: 3 194) to form a new, larger municipality of Inderøy.

Mosvik and Verran formed a municipality in 1867 that lasted until 1901, when Verran (population: 1 456) became its own municipality. Mosvik (population: 969) had retained the old municipality number, 1723. Adding to the confusion, 1968, the Framverran area on the south side of the Verrasundet strait (population: 395) was transferred from Verran municipality to Mosvik municipality. When Mosvik (population: 811) joined Inderøy in 2012, this newest iteration of Inderøy were assigned municipality number 1756. This lasted until 2018, when it became municipality number 5053.

These amalgamations, splits and transfers are mentioned in detail, because this is the reality of geography in the world. Situations change, and people interested in geographic realities have to be aware of the changes and their consequences. One cannot assume that boundaries are fixed.

Since primary keys are generally confined to database operations, there is no problem making artificial constructs as keys. One example is combining a 3-letter country code with a 2-letter subdivision code, even if this is not an acceptable international standard.

Geographical information about a country/ sub-division can contain a variety of information, that have to be formatted correctly. A jurisdiction name, or the name of its capital are generally a sequence of letters. Its population and its area in square kilometers are often integers. Typically when information about a country is assembled it occupies a single row in a table, but where every column will be formatted to accommodate the data collected.

Some people ask, why not just use longitude and latitude as a primary key? In such a system, the prime meridian and the equator dividing the world into four Eurocentric mathematical quadrants. So that: lines of longitude north of the equator are positive (+) from 0 at the equator to 90° at the north pole, while those south of the equator are negative (-) from 0 at the equator to 90° at the south pole; lines of latitude east of the prime meridian are positive (+) from 0 to 180° in the middle of the Pacific ocean , while those west of it are negative (-) from 0 to 180°at that same position in the middle of the Pacific ocean. One of the major problems with a geographical jurisdiction, is that it occupies an area not a point. So point data is uninteresting, and difficult to specify.

Another approach is to codify a small area. Because of radio interference issues, amateur radio operators are less interested in precision than a short code that gives an approximate position that is gudenuf. John Morris G4ANB originally devised such a system and it was adopted at a meeting of the International Amateur Radio Union (IARU) Very High Frequency (VHF) Working Group in Maidenhead, England, in 1980. The Maidenhead locator has an interesting historical development. A sub-square can be described using two letters, then two digits, ending with two more letters. Two points within the same Maidenhead sub-square are always less than 10.4 km (6.5 mi) apart, which means a Maidenhead locator can give adequate precision from only six easily transmissible characters. There is no guarantee that a Maidenhead sub-square will be located in the same country. EN82lh is such an example. In the north of this map, one finds Detroit, Michigan, USA while the south of the map is in Windsor, Ontario, Canada.

This map shows the EN82 Maidenhead square, that is divided between USA (more to the left) and Canada (more to the right). A blue line indicates the boundary. The EN82lh sub-square is found where lower parts of the letters etro are found in Detroit. The map origin is at the bottom left, with the first square labelled aa. The first lower-case letter (l) indicates the position of a sub-square to the right of origin, while the second lower-case letter (h) indicates its position above the origin. So the sub-square lh occupies the 12th column from the left, and the 8th row from the bottom.

Another approach is to use what3words, which has given every 3m square (9 m2) in the world a unique 3 word address. The words are randomly assigned, but will always remain the same.

Cliff Cottage is located at 63° 50′ 31.596” N and 11° 5′ 26.178” E which converts to 63.8421098 N and 11.0906046 E in decimal format. It occupies Maidenhead sub-square JP53nu. Its What3words are casual.year.messaging (in the middle of the living room), conqueror.lawn.consented (in the middle of the kitchen), popular.feuds.positives (in Trish’s work room) and hides.lake.proclaims (in my work area). The multiplicity of codes for a single dwelling creates its own problems.

While a well designed database-engine can ease the workload of creating data-structures and algorithms, and running a database, database administrators study the types of data that are needed. Some of the most difficult decisions involve finding ways to structure the database content so that a collection of data values, relationships between them, and operations/ manipulations/ functions that can be applied to them, work for the benefit of users. Once that is done, users can concentrate their time on adding/ editing/ deleting data that can go inside a database, and transforming data into valuable information.

Leave a Reply

Your email address will not be published.