Words are incorrectly encoded after loading with loadGoogleModel; not sure what the problem is. #22

Open
kinglai opened this issue Sep 11, 2016 · 6 comments

Comments

@kinglai

kinglai commented Sep 11, 2016

When I load a Word2Vec model trained with gensim using this program, the words come out with the wrong encoding. What could be causing this?

@xuexingdong

See the new issue I opened; I hope it helps.

@ansjsun
Member

ansjsun commented Dec 12, 2016

It must be in UTF-8 format.

@ansjsun
Member

ansjsun commented Mar 4, 2017

#23

@swy0915

swy0915 commented Nov 24, 2017

At the end of the loadGoogleModel method in the Word2VEC class, comment out dis.read():
wordMap.put(word, vectors);
//dis.read();

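OS: Windows 7 64-bit, Java 1.8.
I exported a binary model file from Python and loaded it with loadGoogleModel, but the words came out wrong. It turned out that every entry is followed by a dis.read(); call, which consumes one byte and leaves the next word one byte short. After commenting out dis.read(); everything loaded correctly.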
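The fix above comes down to the byte layout of the binary file. As far as I can tell, the original C word2vec tool writes a '\n' after each vector while gensim's binary export does not, so an unconditional dis.read() eats the first byte of the next word in a gensim file; commenting it out could in turn leave a stray '\n' in front of each word for files that do have one. Below is a minimal sketch of a reader that tolerates both layouts by skipping newline bytes while collecting a word. The class and method names (WordReaderSketch, readWord, readVector, readWords, sample) are hypothetical and not part of the Word2VEC class; little-endian 32-bit floats and an already-consumed header line are assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class WordReaderSketch {

    // Hypothetical helper (not the Word2VEC API): collects one word's bytes up to
    // the separating space, skipping any '\n' left over from the previous vector.
    // Returns null when the data ends before another word starts.
    static String readWord(DataInputStream dis) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        int b;
        while ((b = dis.read()) != -1 && b != ' ') {
            if (b != '\n') {
                buf.write(b);
            }
        }
        if (b == -1 && buf.size() == 0) {
            return null;
        }
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }

    // Reads dim 32-bit floats; little-endian byte order is an assumption.
    static float[] readVector(DataInputStream dis, int dim) throws IOException {
        byte[] raw = new byte[4 * dim];
        dis.readFully(raw);
        ByteBuffer bb = ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN);
        float[] v = new float[dim];
        for (int i = 0; i < dim; i++) {
            v[i] = bb.getFloat();
        }
        return v;
    }

    // Reads all words from the body of a file whose header line was already consumed.
    static List<String> readWords(byte[] body, int dim) {
        List<String> words = new ArrayList<>();
        try (DataInputStream dis = new DataInputStream(new ByteArrayInputStream(body))) {
            String w;
            while ((w = readWord(dis)) != null) {
                words.add(w);
                readVector(dis, dim); // vector values are ignored in this sketch
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return words;
    }

    // Builds a two-word body with 2-dimensional vectors, with or without a '\n'
    // after each vector, to exercise both layouts.
    static byte[] sample(boolean newlineAfterVector) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        String[] words = {"\u4e2d\u56fd", "\u5317\u4eac"}; // Zhongguo, Beijing as Unicode escapes
        ByteBuffer vec = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN);
        for (String w : words) {
            byte[] wb = w.getBytes(StandardCharsets.UTF_8);
            out.write(wb, 0, wb.length);
            out.write(' ');
            vec.clear();
            vec.putFloat(1.0f).putFloat(-0.5f);
            out.write(vec.array(), 0, 8);
            if (newlineAfterVector) {
                out.write('\n');
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(readWords(sample(true), 2));
        System.out.println(readWords(sample(false), 2));
    }
}
```

Skipping '\n' while collecting the word, rather than reading one byte after every vector, is what makes this sketch indifferent to which tool wrote the file.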

@Arthassssss

Arthassssss commented Jul 13, 2022

Solved.
First, comment out dis.read();
Second, you also need to pass "utf-8" as the charset when calling new String.
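The second point concerns how the collected bytes become a String. new String(bytes) without a charset uses the JVM's default charset, which before JDK 18 follows the OS locale (typically GBK on Chinese Windows), whereas gensim stores words as UTF-8. The sketch below contrasts an explicit UTF-8 decode with decoding the same bytes as another charset; ISO-8859-1 stands in for a non-UTF-8 default because every JVM supports it. decodeWord and decodeWith are hypothetical helper names, and the (bytes, offset, length) form assumes the loader fills a reusable byte buffer, which is a guess about its internals.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class WordDecodeSketch {

    // Hypothetical helper: decode the first len bytes of a word buffer as UTF-8,
    // independent of the JVM's platform default charset.
    static String decodeWord(byte[] raw, int len) {
        return new String(raw, 0, len, StandardCharsets.UTF_8);
    }

    // Hypothetical helper: the same bytes decoded with another charset, which is
    // effectively what new String(raw, 0, len) does when the default is not UTF-8.
    static String decodeWith(byte[] raw, int len, Charset cs) {
        return new String(raw, 0, len, cs);
    }

    public static void main(String[] args) {
        String word = "\u4e2d\u56fd"; // Zhongguo as Unicode escapes, so the source file's encoding does not matter
        byte[] raw = word.getBytes(StandardCharsets.UTF_8); // 6 bytes: 3 per character in UTF-8
        System.out.println(decodeWord(raw, raw.length).equals(word));
        System.out.println(decodeWith(raw, raw.length, StandardCharsets.ISO_8859_1).equals(word));
    }
}
```

Running main should print true for the explicit UTF-8 decode and false for the mismatched charset.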

@dongliuliu

dongliuliu commented Jul 13, 2022 via email

Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants