The Unicode Standard, Version 4.0
List Price: CDN$ 81.99
Price: CDN$ 51.65 & eligible for FREE Super Saver Shipping on orders over $39.
Availability: Not yet published
Ships from and sold by Amazon.ca
6 new or used available from CDN$ 37.18
Product Description
Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language. Fundamentally, computers just deal with numbers. They store letters and other characters by assigning a number for each one. Before Unicode was invented, there were hundreds of different encoding systems for assigning these numbers. No single encoding could contain enough characters: for example, the European Union alone requires several different encodings to cover all its languages. Even for a single language like English no single encoding was adequate for all the letters, punctuation, and technical symbols in common use. Unicode is changing all that! The Unicode Standard has been adopted by such industry leaders as Apple, HP, IBM, JustSystem, Microsoft, Oracle, SAP, Sun, Sybase, Unisys and many others. Unicode is required by modern standards such as XML, Java, ECMAScript (JavaScript), LDAP, CORBA 3.0, WML, etc. It is supported in many operating systems, all modern browsers, and many other products.
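To make "computers just deal with numbers" concrete, here is a minimal Python sketch that uses only the standard library. The sample character and the legacy code pages (Latin-1, IBM PC code page 437, Mac Roman) were chosen for illustration. é has exactly one Unicode code point, but each legacy encoding stores it as a different byte, so the same byte decodes to different characters depending on which encoding the reader assumes.

```python
import unicodedata

# One character, one Unicode code point: é is U+00E9 (decimal 233).
ch = "é"
code_point = ord(ch)
assert code_point == 0xE9 and chr(code_point) == ch

# The same character becomes different bytes under different encodings.
for enc in ("utf-8", "latin-1", "cp437", "mac_roman"):
    print(f"{enc:10} {ch.encode(enc)!r}")

# A lone byte means nothing without its encoding: 0xE9 is é in Latin-1
# but an unrelated character in the old IBM PC code page 437.
legacy_byte = b"\xe9"
print(unicodedata.name(legacy_byte.decode("latin-1")))
print(unicodedata.name(legacy_byte.decode("cp437")))
```

This is the mess described above. Without agreement on the encoding, bytes alone cannot be read back reliably.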
Product Details
- Amazon Sales Rank: #629424 in Books
- Published on: 2003-09-06
- Original language: English
- Binding: Hardcover
- 1504 pages
Editorial Reviews
From the Inside Flap
This book, The Unicode Standard, Version 4.0, is the authoritative source of information on the Unicode character encoding standard.
- extensive additions of CJK characters to cover dictionaries and historic usage
- many new symbols for mathematical and technical publication
- substantially improved specification of conformance requirements, incorporating the character encoding model
- encoding of supplementary characters
- formalized policies for stability of the standard
- clarification of semantics of special characters, including the byte order mark
- major expansion of Unicode Character Database properties and of specifications for text boundaries and casing
- more minority scripts, including Limbu, Tai Le, Osmanya, and Philippine scripts
- more historic scripts, including Linear B, Cypriot, and Ugaritic
- tightened definition of encoding terms, including UTF-32
- U+0416 is the Unicode code point for the character named CYRILLIC CAPITAL LETTER ZHE.
- The range U+0900-U+097F contains 128 Unicode code points.
- The Plane 16 private use characters are in the range 100000..10FFFD.
- A literal code point
- A range of literal code points
- A set of code points having a given Unicode character property value, as defined in the Unicode Character Database (see PropertyAliases.txt and PropertyValueAliases.txt)
  - Non-boolean properties given as an expression property = property_value or property ≠ property_value, such as "General_Category=Titlecase_Letter"
  - Boolean properties given as an expression property = true or property ≠ true, such as "Uppercase=true"
- Combinations of logical operations on classes
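The notation examples in the excerpt (U+0416, the range U+0900-U+097F, and the Plane 16 private-use range) can be checked mechanically. This is a minimal Python sketch using the standard `unicodedata` module. `u_plus` is a hypothetical helper added for illustration. Python bundles a newer Unicode Character Database than Version 4.0. That does not matter here, because character names are frozen by Unicode's stability policy and these code points keep the same values. The last loop illustrates the point behind UTF-32: it spends four bytes on every code point, while UTF-8 and UTF-16 use a variable number.

```python
import unicodedata

def u_plus(cp: int) -> str:
    """Hypothetical helper: format a code point in U+ notation (at least four hex digits)."""
    return f"U+{cp:04X}"

# U+0416 and its character name.
print(u_plus(0x0416), unicodedata.name(chr(0x0416)))  # U+0416 CYRILLIC CAPITAL LETTER ZHE

# An inclusive range first..last holds last - first + 1 code points.
first, last = 0x0900, 0x097F
print(last - first + 1)  # 128

# The plane number is the code point shifted right by 16 bits; "Co" marks private use.
for cp in (0x100000, 0x10FFFD):
    print(u_plus(cp), "plane", cp >> 16, unicodedata.category(chr(cp)))

# Bytes per code point in the three encoding forms (UTF-8, UTF-16, UTF-32).
for cp in (0x0041, 0x0416, 0x10000):
    s = chr(cp)
    sizes = [len(s.encode(form)) for form in ("utf-8", "utf-16-be", "utf-32-be")]
    print(u_plus(cp), sizes)
```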
char_class := "[" item_list "]"
           := "[" property ("=" | "≠") property_value "]"
item_list := item (","? item)*
item := code_point // either literal or escaped
:= code_point - code_point // inclusive range
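The literal and range parts of the grammar above are simple enough to sketch directly. The following Python is a hypothetical, deliberately simplified parser. The names `parse_class` and `matches` are illustrative and do not come from the standard. It accepts only a bracketed item_list of literal code points and inclusive ranges with optional commas. It treats a backslash as an escape for the next character and, as an assumption based on the standard's rule that a literal space must be escaped, ignores unescaped spaces. Property expressions, nested classes, and set operations are left out.

```python
from __future__ import annotations


def parse_class(spec: str) -> list[tuple[int, int]]:
    """Parse a simplified class such as "[a-z, A-Z]" into inclusive (first, last) code point ranges."""
    if len(spec) < 2 or spec[0] != "[" or spec[-1] != "]":
        raise ValueError("a class must be enclosed in square brackets")
    body = spec[1:-1]

    # Tokenize: each token is either a code point (int) or the range operator "-".
    tokens: list = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\":              # escaped literal: take the next character as-is
            i += 1
            if i == len(body):
                raise ValueError("dangling escape at end of class")
            tokens.append(ord(body[i]))
        elif ch == "-":
            tokens.append("-")
        elif ch not in ", ":        # commas are optional separators; bare spaces are ignored
            tokens.append(ord(ch))
        i += 1

    # Group tokens into single code points and inclusive ranges.
    ranges = []
    j = 0
    while j < len(tokens):
        first = tokens[j]
        if first == "-":
            raise ValueError("range operator without a start")
        if j + 1 < len(tokens) and tokens[j + 1] == "-":
            if j + 2 >= len(tokens) or tokens[j + 2] == "-":
                raise ValueError("range operator without an end")
            last = tokens[j + 2]
            if last < first:
                raise ValueError("range end precedes range start")
            ranges.append((first, last))
            j += 3
        else:
            ranges.append((first, first))
            j += 1
    return ranges


def matches(ranges: list[tuple[int, int]], ch: str) -> bool:
    """True if the single character ch falls inside any of the ranges."""
    cp = ord(ch)
    return any(first <= cp <= last for first, last in ranges)


letters = parse_class("[a-z, A-Z]")
print(matches(letters, "q"), matches(letters, "5"))
print(parse_class("[\\-, \\ ]"))  # an escaped hyphen and an escaped space
```

Representing a class as a list of inclusive ranges keeps membership tests cheap. It also extends naturally to the union operation of the full grammar (concatenate the lists), though that part is not implemented here.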
Whenever any character could be interpreted as a syntax character, it must be escaped. Where no ambiguity would result (with normal operator precedence), extra square brackets can be discarded. If a space character is used as a literal, it is escaped. Examples are found in Table 0-2, Character Class Examples.
For more information about character classes, see Unicode Technical Report #18, "Unicode Regular Expression Guidelines."
Operators
Operators used in this standard are listed in Table 0-3, Operators.
0.4 Resources
The Unicode Consortium provides a number of online resources for obtaining information and data about the Unicode Standard, as well as updates and corrigenda. They are listed below.
Box 391476
Mountain View, CA 94039-1476
USA
Please check the Web site for up-to-date contact information, including telephone, fax, and courier delivery address.
From the Back Cover
The authoritative guide to universal character encoding
The official way to implement ISO/IEC 10646
The key to advancing global interoperability in information technology products
The Unicode Standard provides a unique code number for every character in electronic text, no matter what the platform, no matter what the application, no matter what the language. It is required for XML and is at the core of modern software products. Unicode 4.0 contains 96,248 characters covering languages of the world. The Unicode Standard contains extensive descriptions of each writing system, as well as definitions of character properties and detailed conformance requirements. It is the complete and definitive user's guide for novices and experts alike.
This edition, The Unicode Standard, Version 4.0, adds 47,188 new characters for minority and historic scripts, several sets of symbols, and a very large collection of additional CJK ideographs. It provides updated specifications covering structure, conformance, character behavior and semantics, as well as implementation guidelines, detailed discussions of writing systems, comprehensive charts, and an extensive glossary. The accompanying CD-ROM includes the text of all the Unicode Standard Annexes and the entire Unicode Character Database.
About the Author
The Unicode Consortium is a non-profit organization founded to develop, extend, and promote the use of the Unicode Standard. The membership of the Consortium represents a broad spectrum of corporations and organizations in the computer and information processing industry. The Unicode Consortium actively cooperates with many of the leading standards development organizations, including ISO/IEC JTC1, W3C, IETF, and ECMA.
Customer Reviews
New version of one of the most-used standards
One reason for the wide acceptance of the Unicode standard is that the Unicode consortium has made it so freely available. There's no point in my discussing in detail what is in this volume when you can peruse PDF files of the entire work on the Unicode website (minus only chapter division graphics).
Browse through the book just as you would in a bookstore or library. Print out parts of it or all of it for free if you want. Well, it is free if you don't count the cost of paper (about 1,500 sheets for simplex printing, or half that double-sided), the cost of a binder (or maybe two binders) and the time you would have to spend punching the holes.
If you are mainly or only interested in particular sections of the standard then printing only those sections may be a reasonable thing to do.
On the other hand the price is *very* reasonable for an 8½" × 11" hardbound book with 1,462 pages. If it's the sort of book you know you want for browsing and for reference then it is likely you will want it in this nicely bound copy.
Like the previously published versions of the Unicode standard, this is a beautiful book, and it is useful to those who don't need or want to get into the technical details of character properties, the rules for bidirectional display and the other rules needed to display the characters. But for the actual use of many characters you will have to consult sources outside the Unicode book and data files, e.g. dictionaries and grammars of various languages or explanations of symbols used in various fields of mathematics.
Language and writing systems are messy and inconsistent, and handling them systematically and coherently cannot be made easy. Accordingly, the rules and explanations in this standard are by necessity often long and involved and couched in technical language. It can't be avoided that, for example, one must sometimes distinguish carefully between _characters_, _glyphs_, _graphemes_, _grapheme clusters_, _ligatures_ and _digraphs_, and must determine whether one character is a _canonical equivalent_ of another character or sequence of characters, a _compatibility equivalent_ of one, or just similar to one.
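The difference between canonical and compatibility equivalence shows up directly in Unicode normalization. Here is a minimal sketch with Python's standard `unicodedata` module. That module implements a later Unicode version than 4.0, but the decompositions used here are covered by the normalization stability policy. `same_text` is a hypothetical helper. The NFC and NFD forms respect only canonical equivalence, while NFKC and NFKD also fold compatibility equivalents together.

```python
import unicodedata


def same_text(a: str, b: str, form: str) -> bool:
    """Hypothetical helper: compare two strings after normalizing both to NFC, NFD, NFKC or NFKD."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# Canonical equivalence: precomposed é vs. e + COMBINING ACUTE ACCENT.
print(same_text("\u00e9", "e\u0301", "NFC"))  # True

# Canonical singleton: ANGSTROM SIGN decomposes to LATIN CAPITAL LETTER A WITH RING ABOVE.
print(unicodedata.decomposition("\u212b"))    # 00C5
print(same_text("\u212b", "\u00c5", "NFC"))   # True

# Compatibility equivalence: the fi ligature matches "fi" only under the K forms.
print(unicodedata.decomposition("\ufb01"))    # <compat> 0066 0069
print(same_text("\ufb01", "fi", "NFC"))       # False
print(same_text("\ufb01", "fi", "NFKC"))      # True
```

A search index would typically use a K form so that "ﬁle" finds "file." Storage usually keeps NFC so that the ligature the author typed is not lost.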
The Unicode character set is still a work in progress. Version 4.0 may not even approach the half-way mark in encoding every character that has been used in normal text records by human beings for which a meaning is known. No-one has ever tried to produce a list of characters on this scale before. No-one yet knows how many distinct characters there are.
But 4.0 covers 96,382 characters from *almost* every script currently used for modern languages and from some ancient scripts as well including Ugaritic cuneiform, Cretan Linear B and the ancient Cypriot syllabary. (Sumerian/Akkadian cuneiform is being worked on and Egyptian hieroglyphics will eventually follow.)
Included are a plethora of technical symbol characters including mathematical characters, chess pieces, die faces, characters needed for modern western music notation, characters needed for Byzantine music notation, ornamental dingbats and so much more. All of it is now at the fingertips of every computer user -- that is, if fonts that contain the characters are installed.
Finding fonts that display some of these characters is still a problem. :-(
But it would be a worse problem if these characters weren't assigned to a common character set. The past practice of numerous special fonts for various symbols and scripts which disagreed with one another on how the characters were encoded produced a horrible mess.
Large as it is, with 40% more pages than version 3.0, the book doesn't contain the whole standard. Increasingly, as the standard has expanded, tabular material has been dropped from the printed volumes and replaced with references to data files available on the website or on the CD that comes with the book.
The end of section 3.2 specifies six files found as Annexes on the website and on the CD which "are essential parts of version 4.0" including an explanation of the bidirectional algorithm which appeared in the printed text for earlier releases. And there are many mentions in the printed standard of other files available on the CD or website. A binder containing printouts of this material is necessary if you want a truly complete hardcopy of the entire 4.0 standard.
Unfortunately the 4.0 HTML files are carelessly laid down on the CD with external links pointing to files on the Unicode website and not to the corresponding files on the CD. Graphics are sometimes missing though the only file I think this matters with is StandardizedVariants.html which has a number of variant character images. (The data in this short file should have been in the book).
If you work online you probably won't notice anything wrong but you also are likely not to notice that after clicking on a link you are viewing a file from the Unicode website instead of a file on the CD. That may matter in the future if you need to reference a 4.0 file and don't observe that the file you are actually looking at is from the website and is a "latest version" file that has been updated beyond 4.0. If you are working offline you can avoid this, but it is annoying to have to manually search for the file by name because the link fails.
Also, although the Readme.txt file on the CD mentions "mapping tables" and files with "the extension .UNI", these useful conversion tables, which were included on the CDs with previous releases, are missing from the 4.0 CD. But they are available on the website.
This is a minor caveat. I suspect most people will use the website in any case rather than the CD.
An indispensable resource
This book is one that every programmer should have access to. Packed with all of the information concerning the latest standards, with explanations, this is the reference that I use whenever I need data regarding Unicode mappings. I recommend it to all of my students and have asked all libraries where I have influence to add it to their collection.
There is also a CD included with the book. It contains a database of the current and all past versions of the Unicode mappings, a series of Unicode technical reports and an installable version of the Unibook Character Browser, a small utility for viewing character charts and properties. Invaluable if you prefer electronic versions of the data.
All the Languages of Man
Anyone dealing with XML or Java soon runs into Unicode, because it is the standard for representing characters in electronic form in those computer languages. Java, for instance, was designed from its inception to use Unicode. Earlier computer languages like C and C++ can have routines added to handle Unicode, while C# strings, like Java's, are Unicode by design.
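One practical consequence: a Java `char` (and a C# `char`) is a 16-bit UTF-16 code unit. Any character above U+FFFF, outside the Basic Multilingual Plane, therefore takes two `char`s, known as a surrogate pair. The arithmetic is short. Here it is as a Python sketch, checked against Python's own UTF-16 encoder. `to_surrogates` is a hypothetical name used for illustration.

```python
from __future__ import annotations


def to_surrogates(cp: int) -> tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into a UTF-16 high/low surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = cp - 0x10000              # a 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits
    return high, low


# U+10000 LINEAR B SYLLABLE B008 A and U+1D11E MUSICAL SYMBOL G CLEF.
for cp in (0x10000, 0x1D11E):
    high, low = to_surrogates(cp)
    encoded = chr(cp).encode("utf-16-be")  # big-endian, no byte order mark
    assert encoded == high.to_bytes(2, "big") + low.to_bytes(2, "big")
    print(f"U+{cp:04X} -> {high:04X} {low:04X}")  # U+1D11E -> D834 DD1E
```

The same arithmetic explains why, in Java, `String.length()` returns 2 for a string holding a single G clef, while `s.codePointCount(0, s.length())` returns 1.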
But chances are, when you deal with Unicode, you only deal with a subset. Often only a small subset at that, unless you are using Chinese/Japanese. Typically you work with ASCII plus the codes for your spoken language, if that is not a Western European language. Very few of us deal with much more than this.
Which illustrates the appeal of the book. The Big Picture. ALL of Unicode. The breadth is stunning. It shows the written form of every major spoken language and many minor ones. Has the pictograms for Chinese [of course]. But also the symbols for Khmer, Canadian Aboriginal, Tamil, Syriac, et cetera, et cetera. Thumbing through this, you may encounter languages that you did not even know existed. It is one thing to say that we live in a multilingual world. But it is another to actually see it expressed comprehensively at the most basic level.
There are two audiences for this book. The first is any computer person who has to deal with issues of internationalisation.
But another audience is every Department of Languages or Cultural Anthropology in a university. If this describes your background, then you should know that you do not need facility in computing to appreciate the significance of this book. You can use it as a standard reference, akin to the Oxford English Dictionary vis-a-vis the English language. Look, ignore the computer stuff in the text. Yes, you can do this. The book groups related languages into common chapters. The explanatory text is lucid and the graphics for the languages let you easily cross compare. Of course, at a higher level of meaning like sentences, you will need specialised texts in those languages. But to understand a language, you need to start at its letters or pictograms.
Think of this book as an index into all the languages of man.
