voice recognition - How large must a corpus be to create a language model for Sphinx?



I would like to know how many documents, sentences, or words I need to process in order to build a language model for a domain and use it in voice recognition tools such as CMU Sphinx.

To create a decent language model for a small domain, it is usually enough to have about 100 MB of text. You can mix it with a generic language model to improve the generalization of the resulting model.
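One common way to mix a domain model with a generic one is linear interpolation of their probabilities. The sketch below is only an illustration of that idea on toy unigram models; the function names, the sample sentences, and the weight of 0.7 are all assumptions, and a real setup would interpolate full n-gram models with a language-modeling toolkit rather than Python dictionaries.

```python
from collections import Counter


def unigram_model(text):
    """Maximum-likelihood unigram probabilities from whitespace-separated text."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def interpolate(domain_lm, generic_lm, weight=0.7):
    """Linear mix: P(w) = weight * P_domain(w) + (1 - weight) * P_generic(w)."""
    vocab = set(domain_lm) | set(generic_lm)
    return {
        w: weight * domain_lm.get(w, 0.0) + (1.0 - weight) * generic_lm.get(w, 0.0)
        for w in vocab
    }


# Toy corpora (hypothetical): a small command domain and a bit of general text.
domain_lm = unigram_model("turn on the light turn off the light")
generic_lm = unigram_model("the weather is nice and the light is on")

# The weight is an assumed value; in practice it is tuned on held-out domain text.
mixed_lm = interpolate(domain_lm, generic_lm, weight=0.7)

for word in sorted(mixed_lm, key=mixed_lm.get, reverse=True):
    print(f"{word}\t{mixed_lm[word]:.4f}")
```

Words that never occur in the domain text, such as "weather" here, still receive some probability from the generic model, which is the generalization benefit described above.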

To create a generic language model, developers use very large corpora. For example, there is the Google 1TB corpus, which contains millions of words and a terabyte of data. The trigram part of it is about 40 GB of counts, and the underlying text must amount to around a hundred terabytes.

voice-recognition sphinx4
