c++ - Access Violation while using _tcstok




I'm trying to tokenize lines in a file using _tcstok. I'm able to tokenize the line once, but when I try to tokenize it a second time, I get an access violation. I have a sense it has to do with not actually accessing the values, but their locations instead. I'm not sure how else to do it, though.

Thanks,

Dave

P.S. I'm using TCHAR and _tcstok because the file is UTF-8.

This is the error I'm getting:

First-chance exception at 0x63e866b4 (msvcr90d.dll) in testing.exe: 0xC0000005: Access violation reading location 0x0000006c.

    vector<TCHAR> tabdelimitedsource::getnext() {
        // Returns the next document (a given cell) from the file(s)
        TCHAR row[256]; // Return NULL if no more documents/rows
        vector<TCHAR> document;

        try {
            // Read each line in the file, corresponding to an individual document
            buff_reader->getline(row, 10000);
        }
        catch (ifstream::failure e) {
            ; // Ignore and fall through
        }

        if (_tcslen(row) > 0) {
            this->current_row += 1;
            vector<TCHAR> cells;
            // Separate the line on tabs (id 'tab' document title 'tab' document body)
            TCHAR * pch;
            pch = _tcstok(row, "\t");
            while (pch != NULL) {
                cells.push_back(*pch);
                pch = _tcstok(NULL, "\t");
            }

            // Split the cell into individual words using the Lucene analyzer
            try {
                // Separate the body by spaces
                TCHAR original_document;
                original_document = (cells[column_holding_doc]);
                try {
                    TCHAR * pc;
                    pc = _tcstok((char*)original_document, " ");
                    while (pch != NULL) {
                        document.push_back(*pc);
                        pc = _tcstok(NULL, "\t");
                    }

First up, your code is a mongrel mixture of C string manipulation and C++ containers. This will just dig you into a hole. Ideally, you should tokenize the line into a std::vector<std::wstring>.
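As a rough illustration of that approach, here is a minimal sketch that splits a line on a delimiter into a std::vector<std::wstring>. The helper name split is hypothetical, not something from the question's code. Unlike _tcstok, it keeps empty fields, which is usually what you want for tab-delimited columns.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: split `line` on a single delimiter character.
// Empty fields are kept, so column positions stay stable.
std::vector<std::wstring> split(const std::wstring& line, wchar_t delim) {
    std::vector<std::wstring> fields;
    std::wstring::size_type start = 0;
    while (true) {
        std::wstring::size_type end = line.find(delim, start);
        if (end == std::wstring::npos) {
            // Last field: everything from `start` to the end of the line.
            fields.push_back(line.substr(start));
            break;
        }
        fields.push_back(line.substr(start, end - start));
        start = end + 1;  // Skip past the delimiter.
    }
    return fields;
}
```

Calling split(row, L'\t') would give the cells, and calling split(cells[column_holding_doc], L' ') would give the words of the document body, with no shared tokenizer state between the two passes.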

Also, you're confused about TCHAR and UTF-8. TCHAR is a character type that 'floats' between 8 and 16 bits depending on compile-time flags. UTF-8 files use between one and four bytes to represent each character. So, if you want to hold the text as std::wstring objects, you're going to need to explicitly convert the UTF-8 into wstrings.
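To make the conversion step concrete, here is a hand-rolled sketch of a UTF-8 decoder that produces a std::wstring. The function name utf8_to_wstring is hypothetical. It assumes reasonably well-formed input: it rejects bad lead or continuation bytes and truncated sequences, but does not check for overlong encodings. On Windows you would more likely call the platform's MultiByteToWideChar with CP_UTF8 instead.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: decode UTF-8 bytes into a std::wstring.
// Emits UTF-16 surrogate pairs when wchar_t is 16 bits wide (as on Windows).
std::wstring utf8_to_wstring(const std::string& in) {
    std::wstring out;
    std::string::size_type i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        unsigned long cp;
        std::string::size_type extra;  // Number of continuation bytes.
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (extra >= in.size() - i)
            throw std::runtime_error("truncated UTF-8 sequence");

        for (std::string::size_type k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;

        if (sizeof(wchar_t) == 2 && cp > 0xFFFF) {
            // Code point outside the BMP: split into a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<wchar_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<wchar_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<wchar_t>(cp));
        }
    }
    return out;
}
```

The idea would be to read each line from the file as plain bytes into a std::string, convert it once, and do all tokenizing on the resulting wstring.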

But, if you just want to get anything working, focus on your tokenization. You need to store the address of the start of each token (as a TCHAR*), but your vector is a vector of TCHARs instead. When you try to use the token data, you're casting TCHARs to TCHAR* pointers, with the unsurprising result of access violations. The AV address you give is 0x0000006c, which is the ASCII code for the character l.

    vector<TCHAR*> cells;
    ...
    cells.push_back(pch);

... and then ...

    TCHAR *original_document = cells[column_holding_doc];
    TCHAR *pc = _tcstok(original_document, " ");
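For reference, here is a self-contained sketch of the same fix that compiles outside Visual Studio. It assumes plain char and std::strtok as stand-ins for TCHAR and _tcstok, and the function name words_in_cell is hypothetical. The point is the same: the vector holds the address of each token, not its first character.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: return the space-separated words found in the cell
// at index `column` of a tab-delimited row. `row` is modified in place,
// because std::strtok overwrites each delimiter with '\0'.
std::vector<std::string> words_in_cell(char* row, std::size_t column) {
    // First pass: remember where each tab-separated cell starts.
    std::vector<char*> cells;
    for (char* pch = std::strtok(row, "\t"); pch != NULL;
         pch = std::strtok(NULL, "\t")) {
        cells.push_back(pch);  // Store the pointer, not *pch.
    }

    std::vector<std::string> words;
    if (column >= cells.size()) {
        return words;  // Requested column does not exist in this row.
    }

    // Second pass: the tab pass is finished, so strtok's single internal
    // state can safely be reused on the chosen cell.
    for (char* pc = std::strtok(cells[column], " "); pc != NULL;
         pc = std::strtok(NULL, " ")) {
        words.push_back(pc);
    }
    return words;
}
```

Note that both loops test the pointer they advance. The question's second loop tests pch while advancing pc, and passes "\t" rather than " " on the follow-up calls, so it would misbehave even with the pointer fix in place.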

c++ visual-studio-2008 utf-8 access-violation
