All Tesseract Parameters

Source: Internet. Editor: 程序博客网. Time: 2024/04/30 00:52

(All parameters printed by running tesseract --print-parameters on the command line)

Tesseract parameters:

(Parameter name, default value, short description)
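The three-field layout described above can be read back into structured records. The following is a minimal sketch, assuming the raw CLI output separates the fields with tab characters (in this listing they appear as spaces); the `TessParam` type and the `parse_print_parameters` helper are hypothetical names introduced only for illustration.

```python
from typing import List, NamedTuple


class TessParam(NamedTuple):
    """One parameter record: name, default value (as text), description."""
    name: str
    default: str
    description: str


def parse_print_parameters(text: str) -> List[TessParam]:
    """Parse lines of the assumed form name<TAB>default<TAB>description."""
    params = []
    # Split on "\n" only: str.splitlines() would also split on a form feed,
    # which may appear as a parameter's default value.
    for line in text.split("\n"):
        # Skip blank lines and the "Tesseract parameters:" banner (no tab).
        if not line.strip() or "\t" not in line:
            continue
        parts = line.split("\t", 2)
        # Pad so that a line with fewer fields still yields three values.
        parts += [""] * (3 - len(parts))
        params.append(TessParam(parts[0].strip(), parts[1].strip(), parts[2].strip()))
    return params


# Hypothetical sample mimicking the assumed raw output.
sample = (
    "Tesseract parameters:\n"
    "editor_image_xpos\t590\tEditor image X Pos\n"
    "debug_file\t\tFile to send tprintf output to\n"
)
for p in parse_print_parameters(sample):
    print(p.name, repr(p.default), p.description)
```

String parameters with an empty default, such as debug_file, come back with an empty `default` field rather than being dropped.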

editor_image_xpos 590 Editor image X Pos

editor_image_ypos 10 Editor image Y Pos

editor_image_menuheight 50 Add to image height for menu bar

editor_image_word_bb_color 7 Word bounding box colour

editor_image_blob_bb_color 4 Blob bounding box colour

editor_image_text_color 2 Correct text colour

editor_dbwin_xpos 50 Editor debug window X Pos

editor_dbwin_ypos 500 Editor debug window Y Pos

editor_dbwin_height 24 Editor debug window height

editor_dbwin_width 80 Editor debug window width

editor_word_xpos 60 Word window X Pos

editor_word_ypos 510 Word window Y Pos

editor_word_height 240 Word window height

editor_word_width 655 Word window width

classify_num_cp_levels 3 Number of Class Pruner Levels

textord_debug_tabfind 0 Debug tab finding

textord_debug_bugs 0 Turn on output related to bugs in tab finding

textord_testregion_left -1 Left edge of debug reporting rectangle

textord_testregion_top -1 Top edge of debug reporting rectangle

textord_testregion_right 2147483647 Right edge of debug rectangle

textord_testregion_bottom 2147483647 Bottom edge of debug rectangle

textord_tabfind_show_partitions 0 Show partition bounds, waiting if >1

devanagari_split_debuglevel 0 Debug level for split shiro-rekha process.

edges_max_children_per_outline 10 Max number of children inside a character outline

edges_max_children_layers 5 Max layers of nested children inside a character outline

edges_children_per_grandchild 10 Importance ratio for chucking outlines

edges_children_count_limit 45 Max holes allowed in blob

edges_min_nonhole 12 Min pixels for potential char in box

edges_patharea_ratio 40 Max lensq/area for acceptable child outline

textord_fp_chop_error 2 Max allowed bending of chop cells

textord_tabfind_show_images 0 Show image blobs

textord_skewsmooth_offset 4 For smooth factor

textord_skewsmooth_offset2 1 For smooth factor

textord_test_x -2147483647 coord of test pt

textord_test_y -2147483647 coord of test pt

textord_min_blobs_in_row 4 Min blobs before gradient counted

textord_spline_minblobs 8 Min blobs in each spline segment

textord_spline_medianwin 6 Size of window for spline segmentation

textord_max_blob_overlaps 4 Max number of blobs a big blob can overlap

textord_min_xheight 10 Min credible pixel xheight

textord_lms_line_trials 12 Number of linew fits to do

oldbl_holed_losscount 10 Max lost before fallback line used

pitsync_linear_version 6 Use new fast algorithm

pitsync_fake_depth 1 Max advance fake generation

textord_tabfind_show_strokewidths 0 Show stroke widths

textord_dotmatrix_gap 3 Max pixel gap for broken pixed pitch

textord_debug_block 0 Block to do debug on

textord_pitch_range 2 Max range test on pitch

textord_words_veto_power 5 Rows required to outvote a veto

equationdetect_save_bi_image 0 Save input bi image

equationdetect_save_spt_image 0 Save special character image

equationdetect_save_seed_image 0 Save the seed image

equationdetect_save_merged_image 0 Save the merged image

poly_debug 0 Debug old poly

poly_wide_objects_better 1 More accurate approx on wide things

wordrec_display_splits 0 Display splits

textord_debug_printable 0 Make debug windows printable

textord_space_size_is_variable 0 If true, word delimiter spaces are assumed to have variable width, even though characters have fixed pitch.

textord_tabfind_show_initial_partitions 0 Show partition bounds

textord_tabfind_show_reject_blobs 0 Show blobs rejected as noise

textord_tabfind_show_columns 0 Show column bounds

textord_tabfind_show_blocks 0 Show final block bounds

textord_tabfind_find_tables 1 run table detection

textord_tabfind_show_color_fit 0 Show stroke widths

devanagari_split_debugimage 0 Whether to create a debug image for split shiro-rekha process.

textord_show_fixed_cuts 0 Draw fixed pitch cell boundaries

edges_use_new_outline_complexity 0 Use the new outline complexity module

edges_debug 0 turn on debugging for this module

edges_children_fix 0 Remove boxy parents of char-like children

gapmap_debug 0 Say which blocks have tables

gapmap_use_ends 0 Use large space at start and end of rows

gapmap_no_isolated_quanta 0 Ensure gaps not less than 2quanta wide

textord_heavy_nr 0 Vigorously remove noise

textord_show_initial_rows 0 Display row accumulation

textord_show_parallel_rows 0 Display page correlated rows

textord_show_expanded_rows 0 Display rows after expanding

textord_show_final_rows 0 Display rows after final fitting

textord_show_final_blobs 0 Display blob bounds after pre-ass

textord_test_landscape 0 Tests refer to land/port

textord_parallel_baselines 1 Force parallel baselines

textord_straight_baselines 0 Force straight baselines

textord_old_baselines 1 Use old baseline algorithm

textord_old_xheight 0 Use old xheight algorithm

textord_fix_xheight_bug 1 Use spline baseline

textord_fix_makerow_bug 1 Prevent multiple baselines

textord_debug_xheights 0 Test xheight algorithms

textord_biased_skewcalc 1 Bias skew estimates with line length

textord_interpolating_skew 1 Interpolate across gaps

textord_new_initial_xheight 1 Use test xheight mechanism

textord_debug_blob 0 Print test blob information

textord_really_old_xheight 0 Use original wiseowl xheight

textord_oldbl_debug 0 Debug old baseline generation

textord_debug_baselines 0 Debug baseline generation

textord_oldbl_paradef 1 Use para default mechanism

textord_oldbl_split_splines 1 Split stepped splines

textord_oldbl_merge_parts 1 Merge suspect partitions

oldbl_corrfix 1 Improve correlation of heights

oldbl_xhfix 0 Fix bug in modes threshold for xheights

textord_ocropus_mode 0 Make baselines for ocropus

textord_tabfind_only_strokewidths 0 Only run stroke widths

textord_tabfind_show_initialtabs 0 Show tab candidates

textord_tabfind_show_finaltabs 0 Show tab vectors

textord_show_tables 0 Show table regions

textord_tablefind_show_mark 0 Debug table marking steps in detail

textord_tablefind_show_stats 0 Show page stats used in table finding

textord_tablefind_recognize_tables 0 Enables the table recognizer for table layout and filtering.

textord_all_prop 0 All doc is proportial text

textord_debug_pitch_test 0 Debug on fixed pitch test

textord_disable_pitch_test 0 Turn off dp fixed pitch algorithm

textord_fast_pitch_test 0 Do even faster pitch algorithm

textord_debug_pitch_metric 0 Write full metric stuff

textord_show_row_cuts 0 Draw row-level cuts

textord_show_page_cuts 0 Draw page-level cuts

textord_pitch_cheat 0 Use correct answer for fixed/prop

textord_blockndoc_fixed 0 Attempt whole doc/block fixed pitch

textord_show_initial_words 0 Display separate words

textord_show_new_words 0 Display separate words

textord_show_fixed_words 0 Display forced fixed pitch words

textord_blocksall_fixed 0 Moan about prop blocks

textord_blocksall_prop 0 Moan about fixed pitch blocks

textord_blocksall_testing 0 Dump stats when moaning

textord_test_mode 0 Do current test

textord_pitch_scalebigwords 0 Scale scores on big words

textord_restore_underlines 1 Chop underlines & put back

textord_fp_chopping 1 Do fixed pitch chopping

textord_force_make_prop_words 0 Force proportional word segmentation on all rows

textord_chopper_test 0 Chopper is being tested.

wordrec_display_all_blobs 0 Display Blobs

wordrec_display_all_words 0 Display Words

wordrec_blob_pause 0 Blob pause

stream_filelist 0 Stream a filelist from stdin

editor_image_win_name EditorImage Editor image window name

editor_dbwin_name EditorDBWin Editor debug window name

editor_word_name BlnWords BL normalized word window

editor_debug_config_file Config file to apply to single words

debug_file File to send tprintf output to

classify_font_name UnknownFont Default font name to be used in training

classify_training_file MicroFeatures Training file

fx_debugfile FXDebug Name of debugfile

classify_cp_angle_pad_loose 45 Class Pruner Angle Pad Loose

classify_cp_angle_pad_medium 20 Class Pruner Angle Pad Medium

classify_cp_angle_pad_tight 10 CLass Pruner Angle Pad Tight

classify_cp_end_pad_loose 0.5 Class Pruner End Pad Loose

classify_cp_end_pad_medium 0.5 Class Pruner End Pad Medium

classify_cp_end_pad_tight 0.5 Class Pruner End Pad Tight

classify_cp_side_pad_loose 2.5 Class Pruner Side Pad Loose

classify_cp_side_pad_medium 1.2 Class Pruner Side Pad Medium

classify_cp_side_pad_tight 0.6 Class Pruner Side Pad Tight

classify_pp_angle_pad 45 Proto Pruner Angle Pad

classify_pp_end_pad 0.5 Proto Prune End Pad

classify_pp_side_pad 2.5 Proto Pruner Side Pad

classify_min_slope 0.414214 Slope below which lines are called horizontal

classify_max_slope 2.41421 Slope above which lines are called vertical
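The two slope defaults above coincide numerically with tan 22.5° and tan 67.5°, which would place the "horizontal" and "vertical" cutoffs 22.5° away from each axis. That reading is an inference from the numbers, not something the printed output states; the short check below only verifies the arithmetic.

```python
import math

# Assumption: the defaults correspond to angles of 22.5 and 67.5 degrees.
min_slope = math.tan(math.radians(22.5))
max_slope = math.tan(math.radians(67.5))

# Compare with the listed defaults 0.414214 and 2.41421.
print(round(min_slope, 6), round(max_slope, 5))

# The two angles are complementary, so the slopes are reciprocals.
print(math.isclose(min_slope * max_slope, 1.0))
```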

classify_norm_adj_midpoint 32 Norm adjust midpoint ...

classify_norm_adj_curl 2 Norm adjust curl ...

classify_pico_feature_length 0.05 Pico Feature Length

textord_underline_threshold 0.5 Fraction of width occupied

edges_childarea 0.5 Min area fraction of child outline

edges_boxarea 0.875 Min area fraction of grandchild for box

textord_fp_chop_snap 0.5 Max distance of chop pt from vertex

gapmap_big_gaps 1.75 xht multiplier

textord_spline_shift_fraction 0.02 Fraction of line spacing for quad

textord_spline_outlier_fraction 0.1 Fraction of line spacing for outlier

textord_skew_ile 0.5 Ile of gradients for page skew

textord_skew_lag 0.02 Lag for skew on row accumulation

textord_linespace_iqrlimit 0.2 Max iqr/median for linespace

textord_width_limit 8 Max width of blobs to make rows

textord_chop_width 1.5 Max width before chopping

textord_expansion_factor 1 Factor to expand rows by in expand_rows

textord_overlap_x 0.375 Fraction of linespace for good overlap

textord_minxh 0.25 fraction of linesize for min xheight

textord_min_linesize 1.25 * blob height for initial linesize

textord_excess_blobsize 1.3 New row made if blob makes row this big

textord_occupancy_threshold 0.4 Fraction of neighbourhood

textord_underline_width 2 Multiple of line_size for underline

textord_min_blob_height_fraction 0.75 Min blob height/top to include blob top into xheight stats

textord_xheight_mode_fraction 0.4 Min pile height to make xheight

textord_ascheight_mode_fraction 0.08 Min pile height to make ascheight

textord_descheight_mode_fraction 0.08 Min pile height to make descheight

textord_ascx_ratio_min 1.25 Min cap/xheight

textord_ascx_ratio_max 1.8 Max cap/xheight

textord_descx_ratio_min 0.25 Min desc/xheight

textord_descx_ratio_max 0.6 Max desc/xheight

textord_xheight_error_margin 0.1 Accepted variation

oldbl_xhfract 0.4 Fraction of est allowed in calc

oldbl_dot_error_size 1.26 Max aspect ratio of a dot

textord_oldbl_jumplimit 0.15 X fraction for new partition

pitsync_joined_edge 0.75 Dist inside big blob for chopping

pitsync_offset_freecut_fraction 0.25 Fraction of cut for free cuts

textord_tabvector_vertical_gap_fraction 0.5 max fraction of mean blob width allowed for vertical gaps in vertical text

textord_tabvector_vertical_box_ratio 0.5 Fraction of box matches required to declare a line vertical

textord_projection_scale 0.2 Ding rate for mid-cuts

textord_balance_factor 1 Ding rate for unbalanced char cells

textord_wordstats_smooth_factor 0.05 Smoothing gap stats

textord_width_smooth_factor 0.1 Smoothing width stats

textord_words_width_ile 0.4 Ile of blob widths for space est

textord_words_maxspace 4 Multiple of xheight

textord_words_default_maxspace 3.5 Max believable third space

textord_words_default_minspace 0.6 Fraction of xheight

textord_words_min_minspace 0.3 Fraction of xheight

textord_words_default_nonspace 0.2 Fraction of xheight

textord_words_initial_lower 0.25 Max initial cluster size

textord_words_initial_upper 0.15 Min initial cluster spacing

textord_words_minlarge 0.75 Fraction of valid gaps needed

textord_words_pitchsd_threshold 0.04 Pitch sync threshold

textord_words_def_fixed 0.016 Threshold for definite fixed

textord_words_def_prop 0.09 Threshold for definite prop

textord_pitch_rowsimilarity 0.08 Fraction of xheight for sameness

words_initial_lower 0.5 Max initial cluster size

words_initial_upper 0.15 Min initial cluster spacing

words_default_prop_nonspace 0.25 Fraction of xheight

words_default_fixed_space 0.75 Fraction of xheight

words_default_fixed_limit 0.6 Allowed size variance

textord_words_definite_spread 0.3 Non-fuzzy spacing region

textord_spacesize_ratiofp 2.8 Min ratio space/nonspace

textord_spacesize_ratioprop 2 Min ratio space/nonspace

textord_fpiqr_ratio 1.5 Pitch IQR/Gap IQR threshold

textord_max_pitch_iqr 0.2 Xh fraction noise in pitch

textord_fp_min_width 0.5 Min width of decent blobs

textord_underline_offset 0.1 Fraction of x to ignore

ambigs_debug_level 0 Debug level for unichar ambiguities

tessedit_single_match 0 Top choice only from CP

classify_debug_level 0 Classify debug level

classify_norm_method 1 Normalization Method ...

matcher_debug_level 0 Matcher Debug Level

matcher_debug_flags 0 Matcher Debug Flags

classify_learning_debug_level 0 Learning Debug Level:

matcher_permanent_classes_min 1 Min # of permanent classes

matcher_min_examples_for_prototyping 3 Reliable Config Threshold

matcher_sufficient_examples_for_prototyping 5 Enable adaption even if the ambiguities have not been seen

classify_adapt_proto_threshold 230 Threshold for good protos during adaptive 0-255

classify_adapt_feature_threshold 230 Threshold for good features during adaptive 0-255

classify_class_pruner_threshold 229 Class Pruner Threshold 0-255

classify_class_pruner_multiplier 15 Class Pruner Multiplier 0-255:

classify_cp_cutoff_strength 7 Class Pruner CutoffStrength:

classify_integer_matcher_multiplier 10 Integer Matcher Multiplier 0-255:

il1_adaption_test 0 Don't adapt to i/I at beginning of word

dawg_debug_level 0 Set to 1 for general debug info, to 2 for more details, to 3 to see all the debug messages

hyphen_debug_level 0 Debug level for hyphenated words.

max_viterbi_list_size 10 Maximum size of viterbi list.

stopper_smallword_size 2 Size of dict word to be treated as non-dict word

stopper_debug_level 0 Stopper debug level

tessedit_truncate_wordchoice_log 10 Max words to keep in list

fragments_debug 0 Debug character fragments

max_permuter_attempts 10000 Maximum number of different character choices to consider during permutation. This limit is especially useful when user patterns are specified, since overly generic patterns can result in dawg search exploring an overly large number of options.

repair_unchopped_blobs 1 Fix blobs that aren't chopped

chop_debug 0 Chop debug

chop_split_length 10000 Split Length

chop_same_distance 2 Same distance

chop_min_outline_points 6 Min Number of Points on Outline

chop_seam_pile_size 150 Max number of seams in seam_pile

chop_inside_angle -50 Min Inside Angle Bend

chop_min_outline_area 2000 Min Outline Area

chop_centered_maxwidth 90 Width of (smaller) chopped blobs above which we don't care that a chop is not near the center.

chop_x_y_weight 3 X / Y length weight

segment_adjust_debug 0 Segmentation adjustment debug

wordrec_debug_level 0 Debug level for wordrec

wordrec_max_join_chunks 4 Max number of broken pieces to associate

segsearch_debug_level 0 SegSearch debug level

segsearch_max_pain_points 2000 Maximum number of pain points stored in the queue

segsearch_max_futile_classifications 20 Maximum number of pain point classifications per chunk that did not result in finding a better word choice.

language_model_debug_level 0 Language model debug level

language_model_ngram_order 8 Maximum order of the character ngram model

language_model_viterbi_list_max_num_prunable 10 Maximum number of prunable (those for which PrunablePath() is true) entries in each viterbi list recorded in BLOB_CHOICEs

language_model_viterbi_list_max_size 500 Maximum size of viterbi lists recorded in BLOB_CHOICEs

language_model_min_compound_length 3 Minimum length of compound words

wordrec_display_segmentations 0 Display Segmentations

tessedit_pageseg_mode 6 Page seg mode: 0=osd only, 1=auto+osd, 2=auto, 3=col, 4=block, 5=line, 6=word, 7=char (Values from PageSegMode enum in publictypes.h)

tessedit_ocr_engine_mode 2 Which OCR engine(s) to run (Tesseract, LSTM, both). Defaults to loading and running the most accurate available.
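The page segmentation and engine modes above are normally chosen on the command line with the --psm and --oem options, and other parameters in this list can be overridden with -c name=value. The sketch below only assembles such a command; `build_tesseract_command` is a hypothetical helper, and the file names and chosen values are placeholders rather than recommendations.

```python
from typing import Dict, List, Optional


def build_tesseract_command(
    image: str,
    output_base: str,
    psm: Optional[int] = None,
    oem: Optional[int] = None,
    variables: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Assemble an argument list for the tesseract CLI (hypothetical helper)."""
    cmd = ["tesseract", image, output_base]
    if psm is not None:
        cmd += ["--psm", str(psm)]  # sets tessedit_pageseg_mode
    if oem is not None:
        cmd += ["--oem", str(oem)]  # sets tessedit_ocr_engine_mode
    for name, value in (variables or {}).items():
        cmd += ["-c", f"{name}={value}"]  # override any other parameter
    return cmd


# Placeholder file names; the values mirror the defaults listed above.
cmd = build_tesseract_command(
    "page.png", "page", psm=6, oem=2,
    variables={"preserve_interword_spaces": "1"},
)
print(" ".join(cmd))
# To run it (requires Tesseract on PATH):
#   import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems when a parameter value contains spaces or punctuation.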

pageseg_devanagari_split_strategy 0 Whether to use the top-line splitting process for Devanagari documents while performing page-segmentation.

ocr_devanagari_split_strategy 0 Whether to use the top-line splitting process for Devanagari documents while performing ocr.

bidi_debug 0 Debug level for BiDi

applybox_debug 1 Debug level

applybox_page 0 Page number to apply boxes from

tessedit_bigram_debug 0 Amount of debug output for bigram correction.

debug_noise_removal 0 Debug reassignment of small outlines

noise_maxperblob 8 Max diacritics to apply to a blob

noise_maxperword 16 Max diacritics to apply to a word

debug_x_ht_level 0 Reestimate debug

quality_min_initial_alphas_reqd 2 alphas in a good word

tessedit_tess_adaption_mode 39 Adaptation decision algorithm for tess

tessedit_test_adaption_mode 3 Adaptation decision algorithm for tess

multilang_debug_level 0 Print multilang debug info.

paragraph_debug_level 0 Print paragraph debug info.

tessedit_preserve_min_wd_len 2 Only preserve wds longer than this

crunch_rating_max 10 For adj length in rating per ch

crunch_pot_indicators 1 How many potential indicators needed

crunch_leave_lc_strings 4 Don't crunch words with long lower case strings

crunch_leave_uc_strings 4 Don't crunch words with long lower case strings

crunch_long_repetitions 3 Crunch words with long repetitions

crunch_debug 0 As it says

fixsp_non_noise_limit 1 How many non-noise blbs either side?

fixsp_done_mode 1 What constitues done for spacing

debug_fix_space_level 0 Contextual fixspace debug

x_ht_acceptance_tolerance 8 Max allowed deviation of blob top outside of font data

x_ht_min_change 8 Min change in xht before actually trying it

superscript_debug 0 Debug level for sub & superscript fixer

suspect_level 99 Suspect marker level

suspect_space_level 100 Min suspect level for rejecting spaces

suspect_short_words 2 Don't suspect dict wds longer than this

tessedit_reject_mode 0 Rejection algorithm

tessedit_image_border 2 Rej blbs near image edge limit

min_sane_x_ht_pixels 8 Reject any x-ht lt or eq than this

tessedit_page_number -1 -1 -> All pages, else specific page to process

tessdata_manager_debug_level 0 Debug level for TessdataManager functions.

tessedit_parallelize 0 Run in parallel where possible

tessedit_ok_mode 5 Acceptance decision algorithm

segment_debug 0 Debug the whole segmentation process

language_model_fixed_length_choices_depth 3 Depth of blob choice lists to explore when fixed length dawgs are on

tosp_debug_level 0 Debug data

tosp_enough_space_samples_for_median 3 or should we use mean

tosp_redo_kern_limit 10 No.samples reqd to reestimate for row

tosp_few_samples 40 No.gaps reqd with 1 large gap to treat as a table

tosp_short_row 20 No.gaps reqd with few cert spaces to use certs

tosp_sanity_method 1 How to avoid being silly

textord_max_noise_size 7 Pixel size of noise

textord_baseline_debug 0 Baseline debug level

textord_noise_sizefraction 10 Fraction of size for maxima

textord_noise_translimit 16 Transitions for normal blob

textord_noise_sncount 1 super norm blobs to save row

use_definite_ambigs_for_classifier 0 Use definite ambiguities when running character classifier

use_ambigs_for_adaption 0 Use ambigs for deciding whether to adapt to a character

allow_blob_division 1 Use divisible blobs chopping

prioritize_division 0 Prioritize blob division over chopping

classify_enable_learning 1 Enable adaptive classifier

tess_cn_matching 0 Character Normalized Matching

tess_bn_matching 0 Baseline Normalized Matching

classify_enable_adaptive_matcher 1 Enable adaptive classifier

classify_use_pre_adapted_templates 0 Use pre-adapted classifier templates

classify_save_adapted_templates 0 Save adapted templates to a file

classify_enable_adaptive_debugger 0 Enable match debugger

classify_nonlinear_norm 0 Non-linear stroke-density normalization

disable_character_fragments 1 Do not include character fragments in the results of the classifier

classify_debug_character_fragments 0 Bring up graphical debugging windows for fragments training

matcher_debug_separate_windows 0 Use two different windows for debugging the matching: One for the protos and one for the features.

classify_bln_numeric_mode 0 Assume the input is numbers [0-9].

load_system_dawg 1 Load system word dawg.

load_freq_dawg 1 Load frequent word dawg.

load_unambig_dawg 1 Load unambiguous word dawg.

load_punc_dawg 1 Load dawg with punctuation patterns.

load_number_dawg 1 Load dawg with number patterns.

load_bigram_dawg 1 Load dawg with special word bigrams.

use_only_first_uft8_step 0 Use only the first UTF8 step of the given string when computing log probabilities.

stopper_no_acceptable_choices 0 Make AcceptableChoice() always return false. Useful when there is a need to explore all segmentations

save_raw_choices 0 Deprecated- backward compatibility only

segment_nonalphabetic_script 0 Don't use any alphabetic-specific tricks. Set to true in the traineddata config file for scripts that are cursive or inherently fixed-pitch

save_doc_words 0 Save Document Words

merge_fragments_in_matrix 1 Merge the fragments in the ratings matrix and delete them after merging

wordrec_no_block 0 Don't output block information

wordrec_enable_assoc 1 Associator Enable

force_word_assoc 0 force associator to run regardless of what enable_assoc is. This is used for CJK where component grouping is necessary.

fragments_guide_chopper 0 Use information from fragments to guide chopping process

chop_enable 1 Chop enable

chop_vertical_creep 0 Vertical creep

chop_new_seam_pile 1 Use new seam_pile

assume_fixed_pitch_char_segment 0 include fixed-pitch heuristics in char segmentation

wordrec_skip_no_truth_words 0 Only run OCR for words that had truth recorded in BlamerBundle

wordrec_debug_blamer 0 Print blamer debug messages

wordrec_run_blamer 0 Try to set the blame for errors

save_alt_choices 1 Save alternative paths found during chopping and segmentation search

language_model_ngram_on 0 Turn on/off the use of character ngram model

language_model_ngram_use_only_first_uft8_step 0 Use only the first UTF8 step of the given string when computing log probabilities.

language_model_ngram_space_delimited_language 1 Words are delimited by space

language_model_use_sigmoidal_certainty 0 Use sigmoidal score for certainty

tessedit_resegment_from_boxes 0 Take segmentation and labeling from box file

tessedit_resegment_from_line_boxes 0 Conversion of word/line box file to char box file

tessedit_train_from_boxes 0 Generate training data from boxed chars

tessedit_make_boxes_from_boxes 0 Generate more boxes from boxed chars

tessedit_train_line_recognizer 0 Break input into lines and remap boxes if present

tessedit_dump_pageseg_images 0 Dump intermediate images made during page segmentation

tessedit_ambigs_training 0 Perform training for ambiguities

tessedit_adaption_debug 0 Generate and print debug information for adaption

applybox_learn_chars_and_char_frags_mode 0 Learn both character fragments (as is done in the special low exposure mode) as well as unfragmented characters.

applybox_learn_ngrams_mode 0 Each bounding box is assumed to contain ngrams. Only learn the ngrams whose outlines overlap horizontally.

tessedit_display_outwords 0 Draw output words

tessedit_dump_choices 0 Dump char choices

tessedit_timing_debug 0 Print timing stats

tessedit_fix_fuzzy_spaces 1 Try to improve fuzzy spaces

tessedit_unrej_any_wd 0 Don't bother with word plausibility

tessedit_fix_hyphens 1 Crunch double hyphens?

tessedit_redo_xheight 1 Check/Correct x-height

tessedit_enable_doc_dict 1 Add words to the document dictionary

tessedit_debug_fonts 0 Output font info per char

tessedit_debug_block_rejection 0 Block and Row stats

tessedit_enable_bigram_correction 1 Enable correction based on the word bigram dictionary.

tessedit_enable_dict_correction 0 Enable single word correction based on the dictionary.

enable_noise_removal 1 Remove and conditionally reassign small outlines when they confuse layout analysis, determining diacritics vs noise

debug_acceptable_wds 0 Dump word pass/fail chk

tessedit_minimal_rej_pass1 0 Do minimal rejection on pass 1 output

tessedit_test_adaption 0 Test adaption criteria

tessedit_matcher_log 0 Log matcher activity

test_pt 0 Test for point

paragraph_text_based 1 Run paragraph detection on the post-text-recognition (more accurate)

lstm_use_matrix 1 Use ratings matrix/beam search with lstm

docqual_excuse_outline_errs 0 Allow outline errs in unrejection?

tessedit_good_quality_unrej 1 Reduce rejection on good docs

tessedit_use_reject_spaces 1 Reject spaces?

tessedit_preserve_blk_rej_perfect_wds 1 Only rej partially rejected words in block rejection

tessedit_preserve_row_rej_perfect_wds 1 Only rej partially rejected words in row rejection

tessedit_dont_blkrej_good_wds 0 Use word segmentation quality metric

tessedit_dont_rowrej_good_wds 0 Use word segmentation quality metric

tessedit_row_rej_good_docs 1 Apply row rejection to good docs

tessedit_reject_bad_qual_wds 1 Reject all bad quality wds

tessedit_debug_doc_rejection 0 Page stats

tessedit_debug_quality_metrics 0 Output data to debug file

bland_unrej 0 unrej potential with no checks

unlv_tilde_crunching 1 Mark v.bad words for tilde crunch

hocr_font_info 0 Add font info to hocr output

crunch_early_merge_tess_fails 1 Before word crunch?

crunch_early_convert_bad_unlv_chs 0 Take out ~^ early?

crunch_terrible_garbage 1 As it says

crunch_pot_garbage 1 POTENTIAL crunch garbage

crunch_leave_ok_strings 1 Don't touch sensible strings

crunch_accept_ok 1 Use acceptability in okstring

crunch_leave_accept_strings 0 Don't pot crunch sensible strings

crunch_include_numerals 0 Fiddle alpha figures

tessedit_prefer_joined_punct 0 Reward punctation joins

tessedit_write_block_separators 0 Write block separators in output

tessedit_write_rep_codes 0 Write repetition char code

tessedit_write_unlv 0 Write .unlv output file

tessedit_create_txt 0 Write .txt output file

tessedit_create_hocr 0 Write .html hOCR output file

tessedit_create_tsv 0 Write .tsv output file

tessedit_create_pdf 0 Write .pdf output file

textonly_pdf 0 Create PDF with only one invisible text layer

suspect_constrain_1Il 0 UNLV keep 1Il chars rejected

tessedit_minimal_rejection 0 Only reject tess failures

tessedit_zero_rejection 0 Don't reject ANYTHING

tessedit_word_for_word 0 Make output have exactly one word per WERD

tessedit_zero_kelvin_rejection 0 Don't reject ANYTHING AT ALL

tessedit_consistent_reps 1 Force all rep chars the same

tessedit_rejection_debug 0 Adaption debug

tessedit_flip_0O 1 Contextual 0O O0 flips

rej_trust_doc_dawg 0 Use DOC dawg in 11l conf. detector

rej_1Il_use_dict_word 0 Use dictword test

rej_1Il_trust_permuter_type 1 Don't double check

rej_use_tess_accepted 1 Individual rejection control

rej_use_tess_blanks 1 Individual rejection control

rej_use_good_perm 1 Individual rejection control

rej_use_sensible_wd 0 Extend permuter check

rej_alphas_in_number_perm 0 Extend permuter check

tessedit_create_boxfile 0 Output text with boxes

tessedit_write_images 1 Capture the image from the IPE

interactive_display_mode 0 Run interactively?

tessedit_override_permuter 1 According to dict_word

tessedit_use_primary_params_model 0 In multilingual mode use params model of the primary language

textord_tabfind_show_vlines 0 Debug line finding

textord_use_cjk_fp_model 0 Use CJK fixed pitch model

poly_allow_detailed_fx 0 Allow feature extractors to see the original outline

tessedit_init_config_only 0 Only initialize with the config file. Useful if the instance is not going to be used for OCR but say only for layout analysis.

textord_equation_detect 0 Turn on equation detector

textord_tabfind_vertical_text 1 Enable vertical detection

textord_tabfind_force_vertical_text 0 Force using vertical text page mode

preserve_interword_spaces 0 Preserve multiple interword spaces

include_page_breaks 0 Include page separator string in output text after each image/page.

textord_tabfind_vertical_horizontal_mix 1 find horizontal lines such as headers in vertical page mode

load_fixed_length_dawgs 1 Load fixed length dawgs (e.g. for non-space delimited languages)

permute_debug 0 Debug char permutation process

permute_script_word 0 Turn on word script consistency permuter

segment_segcost_rating 0 incorporate segmentation cost in word rating?

permute_fixed_length_dawg 0 Turn on fixed-length phrasebook search permuter

permute_chartype_word 0 Turn on character type (property) consistency permuter

ngram_permuter_activated 0 Activate character-level n-gram-based permuter

permute_only_top 0 Run only the top choice permuter

use_new_state_cost 0 use new state cost heuristics for segmentation state evaluation

enable_new_segsearch 1 Enable new segmentation search path.

textord_single_height_mode 0 Script has no xheight, so use a single mode

tosp_old_to_method 0 Space stats use prechopping?

tosp_old_to_constrain_sp_kn 0 Constrain relative values of inter and intra-word gaps for old_to_method.

tosp_only_use_prop_rows 1 Block stats to use fixed pitch rows?

tosp_force_wordbreak_on_punct 0 Force word breaks on punct to break long lines in non-space delimited langs

tosp_use_pre_chopping 0 Space stats use prechopping?

tosp_old_to_bug_fix 0 Fix suspected bug in old code

tosp_block_use_cert_spaces 1 Only stat OBVIOUS spaces

tosp_row_use_cert_spaces 1 Only stat OBVIOUS spaces

tosp_narrow_blobs_not_cert 1 Only stat OBVIOUS spaces

tosp_row_use_cert_spaces1 1 Only stat OBVIOUS spaces

tosp_recovery_isolated_row_stats 1 Use row alone when inadequate cert spaces

tosp_only_small_gaps_for_kern 0 Better guess

tosp_all_flips_fuzzy 0 Pass ANY flip to context?

tosp_fuzzy_limit_all 1 Don't restrict kn->sp fuzzy limit to tables

tosp_stats_use_xht_gaps 1 Use within xht gap for wd breaks

tosp_use_xht_gaps 1 Use within xht gap for wd breaks

tosp_only_use_xht_gaps 0 Only use within xht gap for wd breaks

tosp_rule_9_test_punct 0 Don't chng kn to space next to punct

tosp_flip_fuzz_kn_to_sp 1 Default flip

tosp_flip_fuzz_sp_to_kn 1 Default flip

tosp_improve_thresh 0 Enable improvement heuristic

textord_no_rejects 0 Don't remove noise blobs

textord_show_blobs 0 Display unsorted blobs

textord_show_boxes 0 Display unsorted blobs

textord_noise_rejwords 1 Reject noise-like words

textord_noise_rejrows 1 Reject noise-like rows

textord_noise_debug 0 Debug row garbage detector

m_data_sub_dir tessdata/ Directory for data files

tessedit_module_name libtesseract400.dll Module colocated with tessdata dir

classify_learn_debug_str Class str to debug learning

user_words_file A filename of user-provided words.

user_words_suffix A suffix of user-provided words located in tessdata.

user_patterns_file A filename of user-provided patterns.

user_patterns_suffix A suffix of user-provided patterns located in tessdata.

output_ambig_words_file Output file for ambiguities found in the dictionary

word_to_debug Word for which stopper debug information should be printed to stdout

word_to_debug_lengths Lengths of unichars in word_to_debug

tessedit_char_blacklist Blacklist of chars not to recognize

tessedit_char_whitelist Whitelist of chars to recognize

tessedit_char_unblacklist List of chars to override tessedit_char_blacklist
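String parameters such as the whitelist and blacklist above are often set through a Tesseract config file, which holds one "name value" pair per line. The sketch below renders such a file as text; `render_config` is a hypothetical helper and the chosen values are placeholders. It assumes values contain no line breaks.

```python
from typing import Dict


def render_config(variables: Dict[str, str]) -> str:
    """Render parameters as config-file text, one 'name value' per line."""
    return "".join(f"{name} {value}\n" for name, value in variables.items())


# Placeholder values: restrict recognition to digits and request .txt output.
config_text = render_config({
    "tessedit_char_whitelist": "0123456789",
    "tessedit_create_txt": "1",
})
print(config_text, end="")
```

Saved to a file, the text can be passed to the CLI as a trailing config-file argument after the output base; how the file name is resolved (tessdata configs directory versus a plain path) may depend on the installed version, so treat that part as an assumption to verify locally.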

tessedit_write_params_to_file Write all parameters to the given file.

applybox_exposure_pattern .exp Exposure value follows this pattern in the image filename. The name of the image files are expected to be in the form [lang].[fontname].exp[num].tif

chs_leading_punct ('`" Leading punctuation

chs_trailing_punct1 ).,;:?! 1st Trailing punctuation

chs_trailing_punct2 )'`" 2nd Trailing punctuation

outlines_odd %| Non standard number of outlines

outlines_2 ij!?%":; Non standard number of outlines

numeric_punctuation ., Punct. chs expected WITHIN numbers

unrecognised_char | Output char for unidentified blobs

ok_repeated_ch_non_alphanum_wds -?*= Allow NN to unrej

conflict_set_I_l_1 Il1[] Il1 conflict set

file_type .tif Filename extension

tessedit_load_sublangs List of languages to load with this one

page_separator \f Page separator (default is form feed control character)

classify_char_norm_range 0.2 Character Normalization Range ...

classify_min_norm_scale_x 0 Min char x-norm scale ...

classify_max_norm_scale_x 0.325 Max char x-norm scale ...

classify_min_norm_scale_y 0 Min char y-norm scale ...

classify_max_norm_scale_y 0.325 Max char y-norm scale ...

classify_max_rating_ratio 1.5 Veto ratio between classifier ratings

classify_max_certainty_margin 5.5 Veto difference between classifier certainties

matcher_good_threshold 0.125 Good Match (0-1)

matcher_reliable_adaptive_result 0 Great Match (0-1)

matcher_perfect_threshold 0.02 Perfect Match (0-1)

matcher_bad_match_pad 0.15 Bad Match Pad (0-1)

matcher_rating_margin 0.1 New template margin (0-1)

matcher_avg_noise_size 12 Avg. noise blob length

matcher_clustering_max_angle_delta 0.015 Maximum angle delta for prototype clustering

classify_misfit_junk_penalty 0 Penalty to apply when a non-alnum is vertically out of its expected textline position

rating_scale 1.5 Rating scaling factor

certainty_scale 20 Certainty scaling factor

tessedit_class_miss_scale 0.00390625 Scale factor for features not used

classify_adapted_pruning_factor 2.5 Prune poor adapted results this much worse than best result

classify_adapted_pruning_threshold -1 Threshold at which classify_adapted_pruning_factor starts

classify_character_fragments_garbage_certainty_threshold -3 Exclude fragments that do not look like whole characters from training and adaption

speckle_large_max_size 0.3 Max large speckle size

speckle_rating_penalty 10 Penalty to add to worst rating for noise

xheight_penalty_subscripts 0.125 Score penalty (0.1 = 10%) added if there are subscripts or superscripts in a word, but it is otherwise OK.

xheight_penalty_inconsistent 0.25 Score penalty (0.1 = 10%) added if an xheight is inconsistent.

segment_penalty_dict_frequent_word 1 Score multiplier for word matches which have good case and are frequent in the given language (lower is better).

segment_penalty_dict_case_ok 1.1 Score multiplier for word matches that have good case (lower is better).

segment_penalty_dict_case_bad 1.3125 Default score multiplier for word matches, which may have case issues (lower is better).

segment_penalty_ngram_best_choice 1.24 Multiplier for the best choice from the ngram model.

segment_penalty_dict_nonword 1.25 Score multiplier for glyph fragment segmentations which do not match a dictionary word (lower is better).

segment_penalty_garbage 1.5 Score multiplier for poorly cased strings that are not in the dictionary and generally look like garbage (lower is better).

certainty_scale 20 Certainty scaling factor

stopper_nondict_certainty_base -2.5 Certainty threshold for non-dict words

stopper_phase2_certainty_rejection_offset 1 Reject certainty offset

stopper_certainty_per_char -0.5 Certainty to add for each dict char above small word size.

stopper_allowable_character_badness 3 Max certainty variation allowed in a word (in sigma)

doc_dict_pending_threshold 0 Worst certainty for using pending dictionary

doc_dict_certainty_threshold -2.25 Worst certainty for words that can be inserted into the document dictionary

wordrec_worst_state 1 Worst segmentation state

tessedit_certainty_threshold -2.25 Good blob limit

chop_split_dist_knob 0.5 Split length adjustment

chop_overlap_knob 0.9 Split overlap adjustment

chop_center_knob 0.15 Split center adjustment

chop_sharpness_knob 0.06 Split sharpness adjustment

chop_width_change_knob 5 Width change adjustment

chop_ok_split 100 OK split limit

chop_good_split 50 Good split limit

segsearch_max_char_wh_ratio 2 Maximum character width-to-height ratio

language_model_ngram_small_prob 1e-06 To avoid overly small denominators use this as the floor of the probability returned by the ngram model.

language_model_ngram_nonmatch_score -40 Average classifier score of a non-matching unichar.

language_model_ngram_scale_factor 0.03 Strength of the character ngram model relative to the character classifier

language_model_ngram_rating_factor 16 Factor to bring log-probs into the same range as ratings when multiplied by outline length

language_model_penalty_non_freq_dict_word 0.1 Penalty for words not in the frequent word dictionary

language_model_penalty_non_dict_word 0.15 Penalty for non-dictionary words

language_model_penalty_punc 0.2 Penalty for inconsistent punctuation

language_model_penalty_case 0.1 Penalty for inconsistent case

language_model_penalty_script 0.5 Penalty for inconsistent script

language_model_penalty_chartype 0.3 Penalty for inconsistent character type

language_model_penalty_font 0 Penalty for inconsistent font

language_model_penalty_spacing 0.05 Penalty for inconsistent spacing

language_model_penalty_increment 0.01 Penalty increment

noise_cert_basechar -8 Hingepoint for base char certainty

noise_cert_disjoint -1 Hingepoint for disjoint certainty

noise_cert_punc -3 Threshold for new punc char certainty

noise_cert_factor 0.375 Scaling on certainty diff from Hingepoint

quality_rej_pc 0.08 good_quality_doc lte rejection limit

quality_blob_pc 0 good_quality_doc gte good blobs limit

quality_outline_pc 1 good_quality_doc lte outline error limit

quality_char_pc 0.95 good_quality_doc gte good char limit

test_pt_x 100000 xcoord

test_pt_y 100000 ycoord

tessedit_reject_doc_percent 65 %rej allowed before rej whole doc

tessedit_reject_block_percent 45 %rej allowed before rej whole block

tessedit_reject_row_percent 40 %rej allowed before rej whole row

tessedit_whole_wd_rej_row_percent 70 Number of row rejects in whole word rejects which prevents whole row rejection

tessedit_good_doc_still_rowrej_wd 1.1 rej good doc wd if more than this fraction rejected

quality_rowrej_pc 1.1 good_quality_doc gte good char limit

crunch_terrible_rating 80 crunch rating lt this

crunch_poor_garbage_cert -9 crunch garbage cert lt this

crunch_poor_garbage_rate 60 crunch garbage rating lt this

crunch_pot_poor_rate 40 POTENTIAL crunch rating lt this

crunch_pot_poor_cert -8 POTENTIAL crunch cert lt this

crunch_del_rating 60 POTENTIAL crunch rating lt this

crunch_del_cert -10 POTENTIAL crunch cert lt this

crunch_del_min_ht 0.7 Del if word ht lt xht x this

crunch_del_max_ht 3 Del if word ht gt xht x this

crunch_del_min_width 3 Del if word width lt xht x this

crunch_del_high_word 1.5 Del if word gt xht x this above bl

crunch_del_low_word 0.5 Del if word gt xht x this below bl

crunch_small_outlines_size 0.6 Small if lt xht x this

fixsp_small_outlines_size 0.28 Small if lt xht x this

superscript_worse_certainty 2 How many times worse certainty does a superscript position glyph need to be for us to try classifying it as a char with a different baseline?

superscript_bettered_certainty 0.97 What reduction in badness do we think sufficient to choose a superscript over what we'd thought. For example, a value of 0.6 means we want to reduce badness of certainty by at least 40%

superscript_scaledown_ratio 0.4 A superscript scaled down more than this is unbelievably small. For example, 0.3 means we expect the font size to be no smaller than 30% of the text line font size.

subscript_max_y_top 0.5 Maximum top of a character measured as a multiple of x-height above the baseline for us to reconsider whether it's a subscript.

superscript_min_y_bottom 0.3 Minimum bottom of a character measured as a multiple of x-height above the baseline for us to reconsider whether it's a superscript.

suspect_rating_per_ch 999.9 Don't touch bad rating limit

suspect_accept_rating -999.9 Accept good rating limit

tessedit_lower_flip_hyphen 1.5 Aspect ratio dot/hyphen test

tessedit_upper_flip_hyphen 1.8 Aspect ratio dot/hyphen test

rej_whole_of_mostly_reject_word_fract 0.85 if >this fract

min_orientation_margin 7 Min acceptable orientation margin

textord_tabfind_vertical_text_ratio 0.5 Fraction of textlines deemed vertical to use vertical page mode

textord_tabfind_aligned_gap_fraction 0.75 Fraction of height used as a minimum gap for aligned blobs.

bestrate_pruning_factor 2 Multiplying factor of current best rate to prune other hypotheses

segment_reward_script 0.95 Score multiplier for script consistency within a word. Being a 'reward' factor, it should be <= 1. Smaller value implies bigger reward.

segment_reward_chartype 0.97 Score multiplier for char type consistency within a word.

segment_reward_ngram_best_choice 0.99 Score multiplier for ngram permuter's best choice (only used in the Han script path).

heuristic_segcost_rating_base 1.25 base factor for adding segmentation cost into word rating. It's a multiplying factor, the larger the value above 1, the bigger the effect of segmentation cost.

heuristic_weight_rating 1 weight associated with char rating in combined cost of state

heuristic_weight_width 1000 weight associated with width evidence in combined cost of state

heuristic_weight_seamcut 0 weight associated with seam cut in combined cost of state

heuristic_max_char_wh_ratio 2 max char width-to-height ratio allowed in segmentation

segsearch_max_fixed_pitch_char_wh_ratio 2 Maximum character width-to-height ratio for fixed-pitch fonts

tosp_old_sp_kn_th_factor 2 Factor for defining space threshold in terms of space and kern sizes

tosp_threshold_bias1 0 how far between kern and space?

tosp_threshold_bias2 0 how far between kern and space?

tosp_narrow_fraction 0.3 Fract of xheight for narrow

tosp_narrow_aspect_ratio 0.48 narrow if w/h less than this

tosp_wide_fraction 0.52 Fract of xheight for wide

tosp_wide_aspect_ratio 0 wide if w/h less than this

tosp_fuzzy_space_factor 0.6 Fract of xheight for fuzz sp

tosp_fuzzy_space_factor1 0.5 Fract of xheight for fuzz sp

tosp_fuzzy_space_factor2 0.72 Fract of xheight for fuzz sp

tosp_gap_factor 0.83 gap ratio to flip sp->kern

tosp_kern_gap_factor1 2 gap ratio to flip kern->sp

tosp_kern_gap_factor2 1.3 gap ratio to flip kern->sp

tosp_kern_gap_factor3 2.5 gap ratio to flip kern->sp

tosp_ignore_big_gaps -1 xht multiplier

tosp_ignore_very_big_gaps 3.5 xht multiplier

tosp_rep_space 1.6 rep gap multiplier for space

tosp_enough_small_gaps 0.65 Fract of kerns reqd for isolated row stats

tosp_table_kn_sp_ratio 2.25 Min difference of kn & sp in table

tosp_table_xht_sp_ratio 0.33 Expect spaces bigger than this

tosp_table_fuzzy_kn_sp_ratio 3 Fuzzy if less than this

tosp_fuzzy_kn_fraction 0.5 New fuzzy kn alg

tosp_fuzzy_sp_fraction 0.5 New fuzzy sp alg

tosp_min_sane_kn_sp 1.5 Don't trust spaces less than this time kn

tosp_init_guess_kn_mult 2.2 Thresh guess - mult kn by this

tosp_init_guess_xht_mult 0.28 Thresh guess - mult xht by this

tosp_max_sane_kn_thresh 5 Multiplier on kn to limit thresh

tosp_flip_caution 0 Don't autoflip kn to sp when large separation

tosp_large_kerning 0.19 Limit use of xht gap with large kns

tosp_dont_fool_with_small_kerns -1 Limit use of xht gap with odd small kns

tosp_near_lh_edge 0 Don't reduce box if the top left is non blank

tosp_silly_kn_sp_gap 0.2 Don't let sp minus kn get too small

tosp_pass_wide_fuzz_sp_to_context 0.75 How wide fuzzies need context

textord_blob_size_bigile 95 Percentile for large blobs

textord_noise_area_ratio 0.7 Fraction of bounding box for noise

textord_blob_size_smallile 20 Percentile for small blobs

textord_initialx_ile 0.75 Ile of sizes for xheight guess

textord_initialasc_ile 0.9 Ile of sizes for xheight guess

textord_noise_sizelimit 0.5 Fraction of x for big t count

textord_noise_normratio 2 Dot to norm ratio for deletion

textord_noise_syfract 0.2 xh fract height error for norm blobs

textord_noise_sxfract 0.4 xh fract width error for norm blobs

textord_noise_hfract 0.015625 Height fraction to discard outlines as speckle noise

textord_noise_rowratio 6 Dot to norm ratio for deletion

textord_blshift_maxshift 0 Max baseline shift

textord_blshift_xfraction 9.99 Min size of baseline shift
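
Since every entry in this listing follows the same (name, default, description) layout, the raw output can be loaded into a program rather than read by eye. The following is only a minimal sketch: it assumes the raw `tesseract --print-parameters` output separates the three fields with tab characters (the page above shows them flattened to spaces), and the names `Param`, `parse_parameters`, and `override_args` are hypothetical helpers, not part of Tesseract. The second helper builds arguments for the command-line `-c name=value` option, which sets a configuration variable for a single run.

```python
from typing import NamedTuple


class Param(NamedTuple):
    name: str
    default: str
    description: str


def parse_parameters(text: str) -> list[Param]:
    """Parse lines assumed to look like 'name<TAB>default<TAB>description'."""
    params = []
    # Split on "\n" only: str.splitlines() would also break on the form feed
    # character that page_separator uses as its default value.
    for line in text.split("\n"):
        fields = line.split("\t", 2)
        if len(fields) != 3:
            continue  # skip headers, banners, and blank lines
        name, default, description = fields
        # Keep the default untouched, since it may legitimately be whitespace.
        params.append(Param(name.strip(), default, description.strip()))
    return params


def override_args(overrides: dict[str, str]) -> list[str]:
    """Build '-c name=value' command-line arguments from a dict of overrides."""
    args = []
    for name, value in overrides.items():
        args += ["-c", f"{name}={value}"]
    return args


# Hypothetical sample in the assumed tab-separated layout.
sample = (
    "Tesseract parameters:\n"
    "tosp_gap_factor\t0.83\tgap ratio to flip sp->kern\n"
    "user_words_file\t\tA filename of user-provided words.\n"
)
params = parse_parameters(sample)
```

Parameters with an empty default, such as `user_words_file`, come back with `default == ""`. The list returned by `override_args` can be appended to a `tesseract` command line, for example to restrict recognition to digits with `tessedit_char_whitelist`.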
