Book Description

Text classification (TC) is the task of automatically categorizing texts into predefined categories by analyzing their contents. This monograph proposes a framework for a new TC model based on the hidden Markov model (HMM) and demonstrates its implementation in a library and information science application.

The work has two primary objectives. The first is to develop a new HMM-based TC model as a new framework for the TC task. HMMs have been applied to a wide range of text-processing tasks, such as text segmentation and event tracking, information retrieval, and information extraction, but few studies have applied them to TC. The second is to apply the Library of Congress Classification (LCC) as a classification scheme for automatically organizing digital resources.

A general prototype of the HMM-based TC model is proposed and implemented to classify a collection of dissertation abstracts from the ProQuest Digital Dissertations database into LCC classes. The proposed model is compared with a Naïve Bayesian model, which has been used extensively in TC applications. Finally, current challenges and open issues in TC are discussed.