Markup language

In computer text processing, a markup language is a system for annotating a document in a way that is syntactically distinguishable from the content. The markup guides how the text is formatted or otherwise processed; when the document is rendered for display, the markup itself does not appear. The idea and terminology evolved from the "marking up" of paper manuscripts (i.e., the revision instructions by editors), which are traditionally written with a red pen or blue pencil on authors' manuscripts. Such "markup" typically includes both content corrections (such as spelling, punctuation, or movement of content) and typographic instructions, such as to make a heading larger or boldface.

In digital media, this "blue pencil instruction text" was replaced by tags, which ideally indicate what the parts of the document are rather than the details of how they might be shown on some display. This lets authors avoid formatting every instance of the same kind of thing redundantly (and possibly inconsistently). It also avoids specifying fonts and dimensions that may not suit many users, such as those with different-size displays, those with impaired vision, and those who rely on screen-reading software.
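As a minimal sketch of this separation, the Python fragment below renders one tagged document under two presentation profiles. The element names (`doc`, `title`, `para`) and both profiles are hypothetical, chosen only for illustration; the point is that the source names its parts once and each display decides how to show them.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the tags say what each part is, not how it looks.
SOURCE = """<doc>
<title>Markup</title>
<para>First paragraph.</para>
<title>History</title>
<para>Second paragraph.</para>
</doc>"""

# Two hypothetical presentation profiles, kept outside the document itself.
WIDE_SCREEN = {"title": lambda s: s.upper(), "para": lambda s: "    " + s}
SCREEN_READER = {"title": lambda s: "Heading: " + s, "para": lambda s: s}


def render(xml_text, profile):
    """Apply one formatting rule per element name, in document order."""
    root = ET.fromstring(xml_text)
    return [profile[child.tag](child.text) for child in root]


wide = render(SOURCE, WIDE_SCREEN)
spoken = render(SOURCE, SCREEN_READER)
```

Because every `title` passes through the same rule, all headings come out consistent within a profile, and changing the look of headings means editing one rule rather than every heading.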

Early markup systems typically included typesetting instructions, as troff, TeX and LaTeX do. Scribe and most modern markup systems instead name the components of a document and later process those names to apply formatting or other processing, as in the case of XML.
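The "other processing" that named components allow need not involve formatting at all. As a rough illustration, the sketch below builds a chapter outline from the element names alone; the names `book`, `chapter`, `heading` and `para` are assumptions made up for this example and do not belong to any particular markup language.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names; no real schema is implied.
SOURCE = """<book>
<chapter><heading>Origins</heading><para>Text.</para></chapter>
<chapter><heading>Tags</heading><para>More text.</para></chapter>
</book>"""


def outline(xml_text):
    """Collect numbered chapter headings, a non-formatting use of the names."""
    root = ET.fromstring(xml_text)
    headings = root.findall("./chapter/heading")
    return [f"{n}. {h.text}" for n, h in enumerate(headings, start=1)]


toc = outline(SOURCE)
```

The same lookup would be hard to write against typesetting instructions, where a heading is recognizable only by its font size or weight.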

Some markup languages, such as the widely used HTML, have pre-defined presentation semantics, meaning that their specification prescribes some aspects of how to present the structured data on particular media. HTML, like DocBook, Open eBook, JATS and many others, is a specific application of the markup meta-languages SGML and XML. That is, SGML and XML enable users to specify particular schemas, which determine just what elements, attributes, and other features are permitted, and where.
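To make the role of a schema concrete, here is a deliberately simplified sketch. The `SCHEMA` table and the `violations` helper are hypothetical stand-ins, not a real schema language: actual DTDs, XML Schema and RELAX NG can also constrain ordering, repetition and data types, which this toy version ignores.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy schema: element name -> (allowed child elements, allowed attributes).
SCHEMA = {
    "article": ({"title", "para"}, {"lang"}),
    "title": (set(), set()),
    "para": ({"emph"}, set()),
    "emph": (set(), set()),
}


def violations(elem, schema=SCHEMA):
    """Return a message for each element or attribute the toy schema forbids."""
    if elem.tag not in schema:
        return [f"unknown element <{elem.tag}>"]
    problems = []
    children, attributes = schema[elem.tag]
    for name in elem.attrib:
        if name not in attributes:
            problems.append(f"attribute '{name}' not allowed on <{elem.tag}>")
    for child in elem:
        if child.tag not in children:
            problems.append(f"<{child.tag}> not allowed inside <{elem.tag}>")
        else:
            problems.extend(violations(child, schema))
    return problems


good = ET.fromstring(
    '<article lang="en"><title>T</title><para>A <emph>b</emph> c</para></article>'
)
bad = ET.fromstring("<article><title><para>misplaced</para></title></article>")

good_report = violations(good)
bad_report = violations(bad)
```

Both inputs are well-formed XML; only the second breaks the rules about what may appear where, which is the distinction a schema adds on top of the meta-language.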

One important characteristic of most markup languages is that they allow markup to be mixed directly into text streams. This happens all the time in documents: a few words in a sentence must be emphasized, or identified as a proper name, defined term, or other special item. This is quite different structurally from traditional databases, where it is by definition impossible to have data that is (for example) within a record, but not within any field. Likewise, markup for natural language texts must maintain ordering: it would not suffice to make each paragraph of a book into a "paragraph" record, where those records do not maintain order.
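Mixed content and ordering can both be seen in a short sketch. It relies on the way Python's `xml.etree.ElementTree` stores character data: text before the first child sits in an element's `text` attribute, and text following a child sits in that child's `tail`. The sample sentence and the element names `term` and `name` are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical paragraph with inline markup mixed into the running text.
PARA = (
    "<para>The <term>blue pencil</term> was used by "
    "<name>editors</name> for years.</para>"
)


def segments(elem):
    """Yield (label, text) pairs in document order; label is None for plain text."""
    if elem.text:
        yield (None, elem.text)
    for child in elem:
        yield (child.tag, "".join(child.itertext()))
        if child.tail:
            yield (None, child.tail)


parts = list(segments(ET.fromstring(PARA)))
sentence = "".join(text for _, text in parts)
```

Here `parts` alternates between unlabeled text and labeled spans, so some of the paragraph's data lies inside the record but in no field. Reassembling `sentence` works only because the segments keep their original order.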