voice recognition - How large must a corpus be to create a language model for Sphinx?

May 15, 2014

I would like to know how many documents, sentences, or words need to be processed in order to build a language model for a domain and use it in voice recognition tools such as CMU Sphinx.

To create a decent language model for a small domain, about 100 MB of text is usually plenty. You can mix it with a generic language model to improve the generalization of the resulting model.
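Mixing a domain model with a generic one is commonly done by linear interpolation of the two probability estimates. The sketch below illustrates the idea on toy unigram models only; the function names, the example sentences, and the weight `lam=0.7` are assumptions made for illustration, not something taken from Sphinx or from any particular toolkit.

```python
from collections import Counter


def unigram_probs(tokens):
    """Maximum-likelihood unigram probabilities for a list of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def interpolate(domain, generic, lam=0.7):
    """Linear interpolation: lam * P_domain(w) + (1 - lam) * P_generic(w).

    lam is a hypothetical weight; in practice it would be tuned on
    held-out text from the target domain.
    """
    vocab = set(domain) | set(generic)
    return {
        word: lam * domain.get(word, 0.0) + (1 - lam) * generic.get(word, 0.0)
        for word in vocab
    }


# Toy corpora standing in for the small domain text and the generic text.
domain_probs = unigram_probs("turn on the light turn off the light".split())
generic_probs = unigram_probs("the cat sat on the mat".split())

mixed = interpolate(domain_probs, generic_probs, lam=0.7)
```

Words that never occur in the domain text (here `cat`) still receive some probability from the generic model, which is the generalization benefit described above.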

To create a generic language model, developers use very large corpora. For example, there is the Google 1TB corpus, which contains millions of words and a terabyte of data. The trigram part of it is about 40 GB of counts, but the text behind it must amount to a hundred terabytes.
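To make "bigram counts" and "trigram counts" concrete, here is a minimal sketch of how such counts are collected from tokenized text. The helper name `ngram_counts` and the sample sentence are assumptions for illustration; a real corpus would be streamed from disk and processed with a dedicated language modelling toolkit rather than held in memory like this.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count every contiguous run of n tokens in the input."""
    return Counter(
        tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )


# Toy sentence standing in for a real corpus.
tokens = "turn on the light and turn off the light".split()

bigrams = ngram_counts(tokens, 2)
trigrams = ngram_counts(tokens, 3)
```

A count table stores each distinct n-gram once together with its frequency, so it is much smaller than the raw text it was computed from.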

voice-recognition sphinx4
