MissingCodeRune is the rune returned when there is no matching glyph. It was previously '?'.
const (
    // MissingCodeRune replaces runes that can't be decoded.
    MissingCodeRune = '\ufffd' // �

    // MissingCodeString replaces strings that can't be decoded.
    MissingCodeString = string(MissingCodeRune)
)
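As an illustration of how a replacement rune of this kind is typically used, here is a minimal sketch. It does not call this package: `missingCodeRune`, `decodeWithFallback`, and the byte-to-rune table are hypothetical names introduced for the example, and only the value `'\ufffd'` is taken from the documentation above.

```go
package main

import "fmt"

// missingCodeRune mirrors the documented value of MissingCodeRune.
// It is a local stand-in, not the package constant.
const missingCodeRune = '\ufffd'

// decodeWithFallback maps each byte through table and substitutes
// missingCodeRune for any byte that has no entry.
func decodeWithFallback(raw []byte, table map[byte]rune) string {
	out := make([]rune, 0, len(raw))
	for _, b := range raw {
		r, ok := table[b]
		if !ok {
			r = missingCodeRune
		}
		out = append(out, r)
	}
	return string(out)
}

func main() {
	// Hypothetical two-entry code table.
	table := map[byte]rune{0x41: 'A', 0x42: 'B'}
	fmt.Printf("%q\n", decodeWithFallback([]byte{0x41, 0x00, 0x42}, table))
}
```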
func ExpandLigatures(runes []rune) string
ExpandLigatures returns `runes` as a string with ligatures expanded.
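The idea of ligature expansion can be sketched as follows. This is not the package's implementation: the `ligatures` table and the `expandLigatures` helper are hypothetical, and the table covers only the five Latin f-ligatures from the Unicode Alphabetic Presentation Forms block.

```go
package main

import (
	"fmt"
	"strings"
)

// ligatures is a hypothetical table mapping single ligature code points
// to the letter sequences they stand for.
var ligatures = map[rune]string{
	'\ufb00': "ff",
	'\ufb01': "fi",
	'\ufb02': "fl",
	'\ufb03': "ffi",
	'\ufb04': "ffl",
}

// expandLigatures returns runes as a string, replacing every rune found
// in the ligatures table with its expansion and copying all others.
func expandLigatures(runes []rune) string {
	var sb strings.Builder
	for _, r := range runes {
		if s, ok := ligatures[r]; ok {
			sb.WriteString(s)
		} else {
			sb.WriteRune(r)
		}
	}
	return sb.String()
}

func main() {
	fmt.Println(expandLigatures([]rune("o\ufb03ce"))) // office
}
```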
func FromFontDifferences(diffList *core.PdfObjectArray) (map[CharCode]GlyphName, error)
FromFontDifferences converts `diffList` (a /Differences array from an /Encoding object) to a map representing character code to glyph mappings.
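For orientation, a /Differences array in the PDF specification is a sequence of integer codes and glyph names: each integer sets the current code, and each following name is assigned to the current code, which is then incremented. The sketch below models that rule on a plain `[]interface{}` rather than on `*core.PdfObjectArray`; `fromDifferences` and the 0 to 255 range check are assumptions made for the example and may differ from what this package does.

```go
package main

import "fmt"

// fromDifferences walks a /Differences-style list in which an int sets the
// current character code and each following string names the glyph for the
// current code, after which the code is incremented.
func fromDifferences(diffList []interface{}) (map[uint16]string, error) {
	out := make(map[uint16]string)
	var code uint16
	seenCode := false
	for _, item := range diffList {
		switch v := item.(type) {
		case int:
			if v < 0 || v > 255 {
				return nil, fmt.Errorf("character code %d out of range", v)
			}
			code = uint16(v)
			seenCode = true
		case string:
			if !seenCode {
				return nil, fmt.Errorf("glyph name %q before first code", v)
			}
			out[code] = v
			code++
		default:
			return nil, fmt.Errorf("unexpected element %v", v)
		}
	}
	return out, nil
}

func main() {
	diffs := []interface{}{39, "quotesingle", 96, "grave", "a"}
	m, err := fromDifferences(diffs)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(m[39], m[96], m[97])
}
```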
func GlyphToRune(glyph GlyphName) (rune, bool)
GlyphToRune returns the rune corresponding to glyph `glyph` if there is one.
TODO: Can we return a string here? e.g. When we are extracting text, we want to get "ffi" rather than 'ﬃ'. We only need a glyph ➞ rune map when we need to convert back to glyphs. We are currently applying RuneToString to the output of functions that call GlyphToRune. While this gives the same result, it makes the calling code complex and fragile.
TODO: Can we combine all the tables glyphAliases, glyphlistGlyphToRuneMap,
texGlyphlistGlyphToStringMap, additionalGlyphlistGlyphToRuneMap and ".notdef"?
func RegisterSimpleEncoding(name string, fnc func() SimpleEncoder)
RegisterSimpleEncoding registers a SimpleEncoder constructor by PDF encoding name.
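Registering constructors by name is a common Go pattern. The following sketch shows the general shape under stated assumptions: `encoder`, `namedEncoder`, `registry`, `register`, and `lookup` are all hypothetical, and the choice to panic on a duplicate name is the sketch's own, not documented behavior of RegisterSimpleEncoding.

```go
package main

import "fmt"

// encoder is a hypothetical stand-in for SimpleEncoder.
type encoder interface {
	BaseName() string
}

type namedEncoder struct{ name string }

func (e namedEncoder) BaseName() string { return e.name }

// registry maps an encoding name to a constructor for that encoding.
var registry = map[string]func() encoder{}

// register stores fnc under name. Panicking on a duplicate name is an
// assumption of this sketch.
func register(name string, fnc func() encoder) {
	if _, dup := registry[name]; dup {
		panic("encoding already registered: " + name)
	}
	registry[name] = fnc
}

// lookup builds a fresh encoder for name, if one has been registered.
func lookup(name string) (encoder, bool) {
	fnc, ok := registry[name]
	if !ok {
		return nil, false
	}
	return fnc(), true
}

func main() {
	register("DemoEncoding", func() encoder { return namedEncoder{"DemoEncoding"} })
	if enc, ok := lookup("DemoEncoding"); ok {
		fmt.Println(enc.BaseName())
	}
}
```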
func RuneToString(r rune) string
RuneToString converts rune `r` to a string. It unpacks `ligatures`.
CMapEncoder encodes/decodes strings based on CMap mappings.
type CMapEncoder struct {
// contains filtered or unexported fields
}
func NewCMapEncoder(baseName string, codeToCID, cidToUnicode *cmap.CMap) CMapEncoder
NewCMapEncoder returns a new CMapEncoder based on the predefined encoding `baseName`. If `codeToCID` is nil, Identity encoding is assumed. `cidToUnicode` must not be nil.
func (enc CMapEncoder) CharcodeToRune(code CharCode) (rune, bool)
CharcodeToRune converts PDF character code `code` to a rune. The bool return flag is true if there was a match, and false otherwise.
func (enc CMapEncoder) Decode(raw []byte) string
Decode converts PDF encoded string to a Go unicode string.
func (enc CMapEncoder) Encode(str string) []byte
Encode converts the Go unicode string to a PDF encoded string.
func (enc CMapEncoder) RuneToCharcode(r rune) (CharCode, bool)
RuneToCharcode converts rune `r` to a PDF character code. The bool return flag is true if there was a match, and false otherwise.
func (enc CMapEncoder) String() string
String returns a string that describes `enc`.
func (enc CMapEncoder) ToPdfObject() core.PdfObject
ToPdfObject returns a PDF Object that represents the encoding.
CharCode is a character code used in the specific encoding.
type CharCode uint16
GID is a glyph index.
type GID uint16
GlyphName is a name of a glyph.
type GlyphName string
func RuneToGlyph(r rune) (GlyphName, bool)
RuneToGlyph is the reverse of the table lookups in GlyphToRune.
IdentityEncoder represents a 2-byte identity encoding.
NOTE: In many cases this is just used to encode/decode to glyph index and does not have a unicode meaning, except via the ToUnicode maps.
TODO: The use of runes as indicators for glyph indices and not-utf8 runes is not good and confusing. Might be better to combine the Identity encoder with a ToUnicode map and keep track of the actual runes and character codes, CMaps together.
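A 2-byte identity encoding can be pictured as writing each 16-bit code as two bytes and reading it back unchanged. The sketch below assumes big-endian byte order and silently ignores a trailing odd byte; `encodeIdentity` and `decodeIdentity` are hypothetical helpers, not methods of IdentityEncoder.

```go
package main

import "fmt"

// encodeIdentity writes each 16-bit code as two bytes, high byte first.
func encodeIdentity(codes []uint16) []byte {
	out := make([]byte, 0, 2*len(codes))
	for _, c := range codes {
		out = append(out, byte(c>>8), byte(c))
	}
	return out
}

// decodeIdentity reads consecutive byte pairs as 16-bit codes.
// A trailing unpaired byte is ignored in this sketch.
func decodeIdentity(raw []byte) []uint16 {
	codes := make([]uint16, 0, len(raw)/2)
	for i := 0; i+1 < len(raw); i += 2 {
		codes = append(codes, uint16(raw[i])<<8|uint16(raw[i+1]))
	}
	return codes
}

func main() {
	raw := encodeIdentity([]uint16{0x0041, 0x1234})
	fmt.Printf("% x\n", raw)
	fmt.Println(decodeIdentity(raw))
}
```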
type IdentityEncoder struct {
// contains filtered or unexported fields
}
func NewIdentityTextEncoder(baseName string) *IdentityEncoder
NewIdentityTextEncoder returns a new IdentityEncoder based on predefined encoding `baseName`.
func (enc *IdentityEncoder) CharcodeToRune(code CharCode) (rune, bool)
CharcodeToRune converts PDF character code `code` to a rune. The bool return flag is true if there was a match, and false otherwise. TODO: Here the `r` is not necessarily an actual rune but a glyph index (unless both).
func (enc *IdentityEncoder) Decode(raw []byte) string
Decode converts PDF encoded string to a Go unicode string.
func (enc *IdentityEncoder) Encode(str string) []byte
Encode converts the Go unicode string to a PDF encoded string.
func (enc *IdentityEncoder) GlyphToRune(glyph GlyphName) (rune, bool)
GlyphToRune returns the rune corresponding to glyph name `glyph`. The bool return flag is true if there was a match, and false otherwise.
func (enc *IdentityEncoder) RegisteredRunes() []rune
RegisteredRunes returns the slice of runes that have been registered as used by the encoder.
func (enc *IdentityEncoder) RuneToCharcode(r rune) (CharCode, bool)
RuneToCharcode converts rune `r` to a PDF character code. The bool return flag is true if there was a match, and false otherwise. TODO: Here the `r` is an actual rune.
func (enc *IdentityEncoder) RuneToGlyph(r rune) (GlyphName, bool)
RuneToGlyph returns the glyph name for rune `r`. The bool return flag is true if there was a match, and false otherwise.
func (enc *IdentityEncoder) String() string
String returns a string that describes `enc`.
func (enc *IdentityEncoder) ToPdfObject() core.PdfObject
ToPdfObject returns nil, because the encoder is not truly a PDF object and should not be stored in a file.
SimpleEncoder represents a 1 byte encoding.
type SimpleEncoder interface {
    TextEncoder
    BaseName() string
    Charcodes() []CharCode
}
func ApplyDifferences(base SimpleEncoder, differences map[CharCode]GlyphName) SimpleEncoder
ApplyDifferences modifies or wraps the base encoding and overlays differences over it.
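Overlaying differences on a base encoding amounts to a map merge in which the differences win. The sketch below shows the copying variant on plain maps; `applyDifferences` here is a hypothetical helper working on `map[uint16]string`, whereas the documented function takes and returns a SimpleEncoder and may modify or wrap the base instead.

```go
package main

import "fmt"

// applyDifferences returns a new code-to-glyph map containing every entry
// of base, with entries from differences overriding or extending it.
// The base map is left untouched.
func applyDifferences(base, differences map[uint16]string) map[uint16]string {
	out := make(map[uint16]string, len(base)+len(differences))
	for code, glyph := range base {
		out[code] = glyph
	}
	for code, glyph := range differences {
		out[code] = glyph
	}
	return out
}

func main() {
	base := map[uint16]string{65: "A", 66: "B"}
	out := applyDifferences(base, map[uint16]string{66: "Beta", 67: "C"})
	fmt.Println(out[65], out[66], out[67])
}
```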
func NewCustomSimpleTextEncoder(encoding, differences map[CharCode]GlyphName) (SimpleEncoder, error)
NewCustomSimpleTextEncoder returns a SimpleEncoder based on map `encoding` and difference map `differences`.
func NewMacExpertEncoder() SimpleEncoder
NewMacExpertEncoder returns a SimpleEncoder that implements MacExpertEncoding.
func NewMacRomanEncoder() SimpleEncoder
NewMacRomanEncoder returns a SimpleEncoder that implements MacRomanEncoding.
func NewPdfDocEncoder() SimpleEncoder
NewPdfDocEncoder returns a SimpleEncoder that implements PdfDocEncoding.
func NewSimpleTextEncoder(baseName string, differences map[CharCode]GlyphName) (SimpleEncoder, error)
NewSimpleTextEncoder returns a SimpleEncoder based on predefined encoding `baseName` and difference map `differences`.
func NewStandardEncoder() SimpleEncoder
NewStandardEncoder returns a SimpleEncoder that implements StandardEncoding.
func NewSymbolEncoder() SimpleEncoder
NewSymbolEncoder returns a SimpleEncoder that implements SymbolEncoding.
func NewWinAnsiEncoder() SimpleEncoder
NewWinAnsiEncoder returns a SimpleEncoder that implements WinAnsiEncoding.
func NewZapfDingbatsEncoder() SimpleEncoder
NewZapfDingbatsEncoder returns a SimpleEncoder that implements ZapfDingbatsEncoding.
TextEncoder defines the common methods that a text encoder implementation must have in UniDoc.
type TextEncoder interface {
    // String returns a string that describes the TextEncoder instance.
    String() string

    // Encode converts the Go unicode string to a PDF encoded string.
    Encode(str string) []byte

    // Decode converts PDF encoded string to a Go unicode string.
    Decode(raw []byte) string

    // RuneToCharcode returns the PDF character code corresponding to rune `r`.
    // The bool return flag is true if there was a match, and false otherwise.
    // This is usually implemented as RuneToGlyph->GlyphToCharcode
    RuneToCharcode(r rune) (CharCode, bool)

    // CharcodeToRune returns the rune corresponding to character code `code`.
    // The bool return flag is true if there was a match, and false otherwise.
    // This is usually implemented as CharcodeToGlyph->GlyphToRune
    CharcodeToRune(code CharCode) (rune, bool)

    // ToPdfObject returns a PDF Object that represents the encoding.
    ToPdfObject() core.PdfObject
}
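To show how Encode and Decode can be layered on the two per-character lookups, here is a toy encoder written against a local copy of the interface. Everything in it is an assumption for illustration: `charCode`, `textEncoder`, and `asciiEncoder` are hypothetical, ToPdfObject is left out so the example needs no `core` import, and dropping unmappable runes in Encode is the sketch's choice rather than documented behavior.

```go
package main

import "fmt"

type charCode uint16

// textEncoder is a reduced local copy of the documented interface.
type textEncoder interface {
	String() string
	Encode(str string) []byte
	Decode(raw []byte) string
	RuneToCharcode(r rune) (charCode, bool)
	CharcodeToRune(code charCode) (rune, bool)
}

// asciiEncoder maps the 7-bit ASCII range onto itself.
type asciiEncoder struct{}

func (asciiEncoder) String() string { return "asciiEncoder" }

func (asciiEncoder) RuneToCharcode(r rune) (charCode, bool) {
	if r < 0 || r > 0x7f {
		return 0, false
	}
	return charCode(r), true
}

func (asciiEncoder) CharcodeToRune(code charCode) (rune, bool) {
	if code > 0x7f {
		return 0, false
	}
	return rune(code), true
}

// Encode emits one byte per mappable rune and drops the rest.
func (e asciiEncoder) Encode(str string) []byte {
	out := make([]byte, 0, len(str))
	for _, r := range str {
		if code, ok := e.RuneToCharcode(r); ok {
			out = append(out, byte(code))
		}
	}
	return out
}

// Decode substitutes U+FFFD for bytes that have no mapping.
func (e asciiEncoder) Decode(raw []byte) string {
	out := make([]rune, 0, len(raw))
	for _, b := range raw {
		r, ok := e.CharcodeToRune(charCode(b))
		if !ok {
			r = '\ufffd'
		}
		out = append(out, r)
	}
	return string(out)
}

// Compile-time check that asciiEncoder satisfies textEncoder.
var _ textEncoder = asciiEncoder{}

func main() {
	var enc textEncoder = asciiEncoder{}
	fmt.Println(enc.Decode(enc.Encode("hello")))
}
```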
TrueTypeFontEncoder handles text encoding for composite TrueType fonts. It performs mapping between character ids and glyph ids. It has a preloaded rune (unicode code point) to glyph index map that has been loaded from a font. Corresponds to Identity-H CMap and Identity encoding.
type TrueTypeFontEncoder struct {
// contains filtered or unexported fields
}
func NewTrueTypeFontEncoder(runeToGIDMap map[rune]GID) *TrueTypeFontEncoder
NewTrueTypeFontEncoder creates a new text encoder for TTF fonts with a runeToGlyphIndexMap that has been preloaded from the font file. The new instance is preloaded with a CMapIdentityH (Identity-H) CMap which maps 2-byte charcodes to CIDs (glyph index).
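The rune to glyph index lookup combined with 2-byte Identity-H character codes can be sketched as below. `encodeGIDs` and the sample map are hypothetical; the sketch assumes big-endian byte order and skips runes that are missing from the map, which may differ from how TrueTypeFontEncoder handles them.

```go
package main

import "fmt"

// encodeGIDs looks up each rune of s in runeToGID and writes the glyph
// index as a 2-byte big-endian character code. Runes without a glyph
// are skipped in this sketch.
func encodeGIDs(s string, runeToGID map[rune]uint16) []byte {
	out := make([]byte, 0, 2*len(s))
	for _, r := range s {
		gid, ok := runeToGID[r]
		if !ok {
			continue
		}
		out = append(out, byte(gid>>8), byte(gid))
	}
	return out
}

func main() {
	// Hypothetical glyph indices as they might be read from a font file.
	runeToGID := map[rune]uint16{'A': 36, 'B': 37}
	fmt.Printf("% x\n", encodeGIDs("AB", runeToGID))
}
```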
func (enc *TrueTypeFontEncoder) CharcodeToRune(code CharCode) (rune, bool)
CharcodeToRune converts PDF character code `code` to a rune. The bool return flag is true if there was a match, and false otherwise.
func (enc *TrueTypeFontEncoder) Decode(raw []byte) string
Decode converts PDF encoded string to a Go unicode string.
func (enc *TrueTypeFontEncoder) Encode(str string) []byte
Encode converts the Go unicode string to a PDF encoded string.
func (enc *TrueTypeFontEncoder) GlyphToCharcode(glyph GlyphName) (CharCode, bool)
GlyphToCharcode returns character code matching the glyph name `glyph`. The bool return flag is true if there was a match, and false otherwise.
func (enc *TrueTypeFontEncoder) RegisteredRunes() []rune
RegisteredRunes returns the slice of runes that have been registered as used by the encoder.
func (enc *TrueTypeFontEncoder) RuneToCharcode(r rune) (CharCode, bool)
RuneToCharcode converts rune `r` to a PDF character code. The bool return flag is true if there was a match, and false otherwise.
func (enc *TrueTypeFontEncoder) String() string
String returns a string that describes `enc`.
func (enc *TrueTypeFontEncoder) SubsetRegistered()
SubsetRegistered subsets `enc` to only the runes that have been registered through encoding. NOTE: Call this shortly before writing, once all needed runes have been registered.
func (enc *TrueTypeFontEncoder) ToPdfObject() core.PdfObject
ToPdfObject returns nil, because the encoder is not truly a PDF object and should not be stored in a file.
UTF16Encoder represents UTF-16 encoding.
type UTF16Encoder struct {
// contains filtered or unexported fields
}
func NewUTF16TextEncoder(baseName string) UTF16Encoder
NewUTF16TextEncoder returns a new UTF16Encoder based on the predefined encoding `baseName`.
func (enc UTF16Encoder) CharcodeToRune(code CharCode) (rune, bool)
CharcodeToRune converts PDF character code `code` to a rune. The bool return flag is true if there was a match, and false otherwise.
func (enc UTF16Encoder) Decode(raw []byte) string
Decode converts PDF encoded string to a Go unicode string.
func (enc UTF16Encoder) Encode(str string) []byte
Encode converts the Go unicode string to a PDF encoded string.
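A UTF-16 round trip can be sketched with the standard `unicode/utf16` package. The helpers `encodeUTF16BE` and `decodeUTF16BE` are hypothetical, and the sketch assumes big-endian byte order with no byte order mark; it is not a description of UTF16Encoder's internals.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// encodeUTF16BE converts s to UTF-16 code units and writes each unit
// as two bytes, high byte first.
func encodeUTF16BE(s string) []byte {
	units := utf16.Encode([]rune(s))
	out := make([]byte, 0, 2*len(units))
	for _, u := range units {
		out = append(out, byte(u>>8), byte(u))
	}
	return out
}

// decodeUTF16BE reads byte pairs as UTF-16 code units and decodes them,
// combining surrogate pairs. A trailing unpaired byte is ignored.
func decodeUTF16BE(raw []byte) string {
	units := make([]uint16, 0, len(raw)/2)
	for i := 0; i+1 < len(raw); i += 2 {
		units = append(units, uint16(raw[i])<<8|uint16(raw[i+1]))
	}
	return string(utf16.Decode(units))
}

func main() {
	raw := encodeUTF16BE("A\u20ac")
	fmt.Printf("% x\n", raw)
	fmt.Println(decodeUTF16BE(raw))
}
```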
func (enc UTF16Encoder) RuneToCharcode(r rune) (CharCode, bool)
RuneToCharcode converts rune `r` to a PDF character code. The bool return flag is true if there was a match, and false otherwise.
func (enc UTF16Encoder) String() string
String returns a string that describes `enc`.
func (enc UTF16Encoder) ToPdfObject() core.PdfObject
ToPdfObject returns a PDF Object that represents the encoding.