Introduction to Uniscribe
Design & Implementation of a Win32 Text Editor
Uniscribe is a low-level Win32 API that provides a high degree of control over the processing and display of Unicode text. The API is designed to provide a generic interface to all forms of Unicode text (complex or otherwise), and transparently handles properties such as bi-directional text and combining character sequences.
Uniscribe is a single DLL called USP10.DLL, which contains all of the Uniscribe APIs. This DLL is present on Windows 2000 and above, or any computer with Internet Explorer 5.0 (or greater) installed. Two Platform SDK files (USP10.H and USP10.LIB) are provided by Microsoft to allow an application to make use of this complex-script support. An important point about Uniscribe is that it doesn't just handle complex scripts - it can be used to process and display all Unicode text - so it can be used as a direct replacement for existing text-output routines such as DrawText and TextOut.
The Uniscribe API is divided into two categories - the low-level API itself, and a wrapper library called ScriptString which hides much of the complexity of dealing with Uniscribe directly. The purpose of this tutorial is to give a brief introduction to the world of Uniscribe before we start delving in properly.
Uniscribe in Windows
When I first started Neatpad I was unsure exactly what Uniscribe entailed, and it was only after researching Unicode that I fully appreciated the issues surrounding the display of Unicode text. Although Uniscribe occupies its own section within the MSDN documentation, other than the occasional reference it is very easy to miss unless you already know of its existence.
MSDN states that since Windows 2000, the ExtTextOut function (and others like TextOut, DrawText etc.) have been extended to support complex scripts. Although this is true, it gives the impression that an application can call ExtTextOutW (the Unicode version) at any time with a buffer of UTF-16 text and it will always display correctly.
Unless Windows has been configured to do so, functions such as ExtTextOut do not automatically support complex scripts. The image above shows the "Regional and Language Options" dialog. The two settings which have been highlighted are not normally enabled by a default installation of American/English Windows.
Enabling complex-script support installs a number of extra libraries, after which ExtTextOut will use Uniscribe when necessary to display complex scripts.
BOOL ExtTextOut (
    HDC       hdc,         // handle to DC
    int       X,           // x-coordinate of reference point
    int       Y,           // y-coordinate of reference point
    UINT      fuOptions,   // text-output options (ETO_GLYPH_INDEX etc)
    RECT    * lprc,        // optional dimensions
    LPCTSTR   lpString,    // string
    UINT      cbCount,     // number of characters in string
    INT     * lpDx         // array of spacing values
);
ExtTextOut is most commonly used to display a string of text. However it can do much more than this. When the ETO_GLYPH_INDEX option is specified, ExtTextOut can be used to display a buffer of glyph indices instead of characters, and the ETO_PDY option additionally lets the lpDx array carry a vertical offset alongside each advance width. This feature of ExtTextOut is used when displaying a string containing complex scripts, as the diagram below illustrates.
Text drawing in Windows 2000 and above
For any string containing complex scripts, ExtTextOut makes use of Uniscribe to display it. Uniscribe breaks the string down into groups of glyphs and then calls ExtTextOut again, this time with the ETO_GLYPH_INDEX option and a buffer of glyph indices instead of the original character values. For regular Unicode text which doesn't require any special processing, ExtTextOut behaves exactly the same as it did under previous Windows versions.
You may be wondering why Uniscribe is necessary if routines such as DrawText and TextOut can for the most part render complex scripts quite successfully. For applications which just output single strings of text, Uniscribe is not necessary.
It is only when a string must be broken up (for the purposes of styling/formatting) that Uniscribe is required. It is simply not safe to split a Unicode string into arbitrary sections and draw each one independently (as we have been doing with Neatpad up 'til now). Doing so breaks all kinds of things such as contextual shaping behaviours and bidirectional support. A modern text-editor simply must support Unicode and all the various scripts that come along with that - we have no other choice than to move to Uniscribe.
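To make the problem a little more concrete, here is a minimal sketch in portable C (it is not Uniscribe code) that checks for two of the ways a naive split can go wrong in a UTF-16 buffer: cutting a surrogate pair in half, and separating a combining mark from its base character. The function names are hypothetical, and the combining-mark test is an assumption that only covers the Combining Diacritical Marks block (U+0300 to U+036F). Contextual shaping and bidirectional reordering cannot be detected with a simple check like this at all, which is exactly why a proper analysis pass is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* UTF-16 surrogate ranges: high (leading) U+D800..U+DBFF, low (trailing) U+DC00..U+DFFF */
static int is_high_surrogate(uint16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
static int is_low_surrogate(uint16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

/* ASSUMPTION: only the Combining Diacritical Marks block is tested here.
   Real cluster boundaries need a full analysis of the text. */
static int is_combining_mark(uint16_t u) { return u >= 0x0300 && u <= 0x036F; }

/* Hypothetical helper: returns 1 if the buffer can be cut immediately
   before text[pos] without splitting a surrogate pair or detaching a
   combining mark from its base character, otherwise 0. */
int is_safe_split(const uint16_t *text, size_t len, size_t pos)
{
    assert(text != NULL);

    /* Cutting at either end of the buffer never splits anything */
    if (pos == 0 || pos >= len)
        return 1;

    /* The cut would fall between the two halves of a surrogate pair */
    if (is_high_surrogate(text[pos - 1]) && is_low_surrogate(text[pos]))
        return 0;

    /* The cut would leave a combining mark without its base character */
    if (is_combining_mark(text[pos]))
        return 0;

    return 1;
}
```

For the three-unit string 'e', U+0301, 'x' a cut before index 1 is rejected because it would strip the accent from the 'e', whereas a cut before index 2 is accepted.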
The ScriptString API
The ScriptString API is designed for applications which want to display text in a single font and colour. Notepad (and the standard Windows EDIT control) is a prime example of the ScriptString API in use. One of the nice features of this API is that it allows you to display a string of text, with a portion of that string optionally displayed as 'selected'. This is actually a very nice touch as it saves a tremendous amount of effort.
The ScriptStringAnalyse function is the starting point with Uniscribe. It is a pretty intimidating function to look at. However its purpose is simply to perform shaping and glyph-generation for any string of Unicode text, and it returns a SCRIPT_STRING_ANALYSIS structure when complete.
HRESULT WINAPI ScriptStringAnalyse (
    HDC                      hdc,
    void                   * pString,
    int                      cString,
    int                      cGlyphs,
    int                      iCharset,
    DWORD                    dwFlags,
    int                      iReqWidth,
    SCRIPT_CONTROL         * psControl,
    SCRIPT_STATE           * psState,
    int                    * piDx,
    SCRIPT_TABDEF          * pTabdef,
    BYTE                   * pbInClass,
    SCRIPT_STRING_ANALYSIS * pssa
);
SCRIPT_STRING_ANALYSIS is an opaque structure - there is no documentation which details what it contains. This is not important though, as this structure is simply passed to the rest of the ScriptString API without requiring any further knowledge.
HRESULT WINAPI ScriptStringOut (
    SCRIPT_STRING_ANALYSIS   ssa,
    int                      iX,
    int                      iY,
    UINT                     uOptions,
    RECT                   * prc,
    int                      iMinSel,
    int                      iMaxSel,
    BOOL                     fDisabled
);
ScriptStringOut is used to display a string of text that was previously analyzed. Note that a text-string is not specified with this call - only the SCRIPT_STRING_ANALYSIS structure is passed, which contains all the necessary information to display the original string.
HRESULT WINAPI ScriptStringXtoCP (
    SCRIPT_STRING_ANALYSIS   ssa,
    int                      iX,
    int                    * piCh,
    int                    * piTrailing
);
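ScriptStringXtoCP is an interesting function. It converts an x-coordinate into a character position (plus a flag saying whether the coordinate fell on the leading or trailing side of that character), which provides a mechanism for caret and mouse positioning within a string of Unicode text.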
HRESULT WINAPI ScriptStringCPtoX (
    SCRIPT_STRING_ANALYSIS   ssa,
    int                      icp,
    BOOL                     fTrailing,
    int                    * pX
);
ScriptStringCPtoX is the counterpart to ScriptStringXtoCP. It performs the opposite task - converting a string-position to a display-coordinate.
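The idea behind this pair of conversions can be illustrated with a deliberately simplified model in portable C. This is only a sketch under strong assumptions: the text runs left-to-right, every character maps to exactly one glyph, and the advance widths are already known. Uniscribe itself has to cope with clusters, ligatures and right-to-left runs, so none of the function names below (x_to_cp, cp_to_x, caret_from_x) are part of the real API and the edge-case behaviour is not claimed to match it.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model: widths[i] is the advance width (in pixels) of
   character i, text is left-to-right, one glyph per character. */

/* Map an x-coordinate to a character index. *trailing is set to 1 when
   x lies in the right-hand half of the character cell, otherwise 0.
   Returns 0 on success, or -1 if x lies outside the string. */
int x_to_cp(const int *widths, int count, int x, int *ch, int *trailing)
{
    int left = 0;
    assert(widths != NULL && ch != NULL && trailing != NULL);

    if (x < 0)
        return -1;

    for (int i = 0; i < count; i++) {
        int right = left + widths[i];
        if (x < right) {
            *ch = i;
            *trailing = ((x - left) * 2 >= widths[i]);
            return 0;
        }
        left = right;
    }
    return -1;
}

/* Map a character index to the x-coordinate of its leading edge
   (trailing == 0) or trailing edge (trailing != 0).
   Returns -1 if cp is out of range. */
int cp_to_x(const int *widths, int count, int cp, int trailing)
{
    int x = 0;
    assert(widths != NULL);

    if (cp < 0 || cp >= count)
        return -1;

    for (int i = 0; i < cp; i++)
        x += widths[i];

    return trailing ? x + widths[cp] : x;
}

/* Caret index for a mouse click: snap to the nearest character boundary.
   Clicks left of the string give 0, clicks right of it give count. */
int caret_from_x(const int *widths, int count, int x)
{
    int ch, trailing;
    if (x_to_cp(widths, count, x, &ch, &trailing) == 0)
        return ch + trailing;
    return (x < 0) ? 0 : count;
}
```

The trailing flag is what makes mouse selection feel natural: a click in the right-hand half of a character places the caret after that character rather than before it.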
HRESULT WINAPI ScriptStringFree (
    SCRIPT_STRING_ANALYSIS * pssa
);
When an application has finished displaying the string, the ScriptStringFree function can be used to clean up. There are more ScriptString functions than I have listed here, but with just these five an application can implement the front-end to a fully-functional text-editor with minimal effort.
The image above shows a simple application I wrote which demonstrates the ScriptString API. The source-code and demo executable can be downloaded at the top of this article.
An oddity of ScriptString is this: ScriptStringOut fails if the device-context used to render is not the same as the one that was used when analyzing the string with ScriptStringAnalyse!
Introducing UspLib
The main problem with the ScriptString API is its inability to display text in more than one font and colour. This makes it particularly unsuitable for our purposes with Neatpad. Our only option is to make use of the low-level Uniscribe functions directly.
UspLib is a library I have written to provide a far richer capability than ScriptString can offer. This new library provides a wrapper around the low-level Uniscribe API that we will be discussing over the next couple of tutorials. UspLib is very similar in approach to the ScriptString Uniscribe wrapper, but goes a lot further in terms of text-colouring and formatting.
USPDATA * USP_Allocate();
The first API is USP_Allocate. This function returns a pointer to a USPDATA object which must be used for subsequent UspLib operations.
BOOL USP_Analyze (
    USPDATA * uspData,
    HDC       hdc,
    WCHAR   * wstr,
    int       wlen,
    ATTR    * attrRunList,
    UINT      flags,
    USPFONT * uspFont
);
USP_Analyze is similar to ScriptStringAnalyse. The difference is that a string of text can be re-analyzed using an existing USPDATA object.
void USP_ApplyAttributes (
    USPDATA * uspData,
    ATTR    * attrRunList
);
Once a string has been analyzed (i.e. itemized and shaped etc.), colour-attributes can be reapplied at any time using USP_ApplyAttributes. The font-information stored in the ATTR run-list is ignored.
void USP_ApplySelection (
    USPDATA * uspData,
    int       selStart,
    int       selEnd
);
USP_ApplySelection performs a similar task to USP_ApplyAttributes. However this time only the selection-flags are modified in the USPDATA object.
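The reason for keeping these updates separate from analysis is that colours and selection state do not affect shaping, so they can be changed cheaply without touching the glyph data. The sketch below illustrates that idea in portable C. Everything in it is hypothetical: CharAttr, apply_colours and apply_selection are illustrative names, the per-character layout is an assumption rather than UspLib's actual ATTR or USPDATA layout, and the half-open selection range [selStart, selEnd) is also an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* HYPOTHETICAL per-character attribute record (not UspLib's ATTR) */
typedef struct {
    unsigned long fg;        /* foreground colour, 0x00BBGGRR like a COLORREF */
    unsigned long bg;        /* background colour */
    int           selected;  /* non-zero if the character is selected */
} CharAttr;

/* Recolour 'len' characters starting at 'start'. Only the colour fields
   are written; selection flags are left alone. Out-of-range parts of
   the run are ignored. */
void apply_colours(CharAttr *attrs, int count, int start, int len,
                   unsigned long fg, unsigned long bg)
{
    assert(attrs != NULL || count == 0);

    for (int i = start; i < start + len && i < count; i++) {
        if (i < 0)
            continue;
        attrs[i].fg = fg;
        attrs[i].bg = bg;
    }
}

/* Mark the characters in [selStart, selEnd) as selected and clear the
   flag everywhere else. Only the selection flags are written. A reversed
   range (selStart > selEnd) is accepted and normalised. */
void apply_selection(CharAttr *attrs, int count, int selStart, int selEnd)
{
    assert(attrs != NULL || count == 0);

    if (selStart > selEnd) {
        int tmp  = selStart;
        selStart = selEnd;
        selEnd   = tmp;
    }

    for (int i = 0; i < count; i++)
        attrs[i].selected = (i >= selStart && i < selEnd);
}
```

Because each function writes a disjoint set of fields, a selection change caused by a mouse-drag never disturbs the syntax colouring, and neither operation requires the text to be itemized or shaped again.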
int USP_TextOut (
    USPDATA * uspData,
    HDC       hdc,
    int       xpos,
    int       ypos,
    RECT    * rect
);
USP_TextOut is the counterpart to ScriptStringOut. It takes as input the USPDATA object which was previously analyzed, and draws it to the specified location. Any fonts, colours and selection-highlights are applied to the text as it is drawn.
void USP_Free (
    USPDATA * uspData
);
USP_Free should be called when the USPDATA object is no longer needed. Over the course of the next two or three tutorials I will be detailing how I have implemented UspLib, and will provide details and examples of using Uniscribe directly.
I have designed UspLib in isolation from Neatpad. My intention is that it is a completely standalone library, which can be used by any application to add complex-script support. It should certainly be possible to import UspLib into your projects and use it straight away, because it contains no dependencies other than the Uniscribe DLL.
Further Reading
There is very little information available about Uniscribe other than what is available in MSDN.
Uniscribe Platform SDK Reference
Supporting Multilanguage Text Layout and Complex Scripts with Windows NT 5.0
Globalization Step-by-Step - Complex Scripts Awareness
Windows Glyph Processing
There is also the CSSamp example program from the Platform SDK, in the Samples sub-directory: \PlatformSDK\Samples\winui\globaldev\CSSamp
Alternatives to Uniscribe
Not every editor uses Uniscribe. If open-source is your thing then there are currently two very impressive efforts available which offer a very strong alternative to Uniscribe. There is also an equivalent version of Uniscribe available for Apple's OSX called ATSUI.
International Components for Unicode (ICU) is IBM's open-source Unicode support library. It contains a lot of functionality, including character conversion, analysis, searching and layout.
Pango is an open-source library for laying out and rendering Unicode text. It appears to sit on top of the GTK display library and can specify either Cairo or Win32 (Uniscribe) rendering back-ends. It offers a more complete solution than Uniscribe and appears to be very well designed and implemented. However Pango is UTF-8 based so this may be a consideration if the rest of your application is UTF-16.
Apple Type Services For Unicode Imaging (ATSUI) is Apple's own version of Uniscribe, although it appears to be higher-level than Microsoft's effort. A brief look at the documentation for ATSUI indicated a much easier-to-use design, and substantially better documentation than Microsoft had managed for Uniscribe.
Coming up in Part 12
This was just a short introduction to Uniscribe - hopefully you are a little more aware of what Uniscribe is capable of, and have downloaded and tested the ScriptString sample program.
Part 12 will focus on the first two Uniscribe functions: ScriptItemize and ScriptLayout. There is a lot of detail to cover with just these two APIs and it won't be until Part 13 that we actually see any text being drawn this way with Neatpad.
Lastly, I've not had much feedback in the last few months about Neatpad - did you read this tutorial and find it useful?