
pandas.errors.ParserError: Error tokenizing data. C error: Expected 7 fields in line 2658, saw 8 #2

Open
momozzing opened this issue Oct 31, 2021 · 0 comments

I just tried to load the data for a quick test, but I get an error:

data = pd.read_csv("KorSTS/sts-train.tsv", delimiter="\t")

pandas.errors.ParserError: Error tokenizing data. C error: Expected 7 fields in line 2658, saw 8
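To find every affected line rather than stopping at the first one, a small standard-library scan can compare each line's tab-separated field count with the header's. This is a minimal sketch, and `find_bad_lines` is a hypothetical helper, not part of pandas. It assumes line numbers are counted 1-based with the header as line 1, which should line up with the number in pandas' message. It also splits naively on `\t` and ignores quote characters, so its counts can differ from the C parser's view of the file.

```python
def find_bad_lines(lines, sep="\t"):
    """Yield (line_number, field_count, raw_line) for every non-empty line
    whose field count differs from the header's.

    Naive sketch: splits on `sep` only and ignores CSV quoting rules.
    Line numbers are 1-based and the header counts as line 1.
    """
    expected = None
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line:
            continue  # pandas skips blank lines by default, so do the same
        n_fields = line.count(sep) + 1
        if expected is None:
            expected = n_fields  # the header defines the expected width
            continue
        if n_fields != expected:
            yield lineno, n_fields, line


sample = ["a\tb\tc\n", "1\t2\t3\n", "4\t5\t6\t7\n"]
print(list(find_bad_lines(sample)))  # [(3, 4, '4\t5\t6\t7')]
```

On the real file this would be used as `with open("KorSTS/sts-train.tsv", encoding="utf-8") as f:` followed by a loop over `find_bad_lines(f)`; the UTF-8 encoding is an assumption.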

data = pd.read_csv("KorSTS/sts-train.tsv", delimiter="\t", error_bad_lines=False)

With error_bad_lines=False added, only 5,696 of the 5,750 training examples are loaded.

Likewise, only 1,466 of the 1,500 dev examples are loaded. The test set loads without errors.
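Skipping bad lines silently discards data (54 training rows and 34 dev rows here). Also note that `error_bad_lines` was deprecated in pandas 1.3 in favour of `on_bad_lines` and removed in pandas 2.0. The sketch below assumes pandas 1.4 or later, where `on_bad_lines` accepts a callable (only with `engine="python"`), so the skipped rows can be kept for inspection instead of vanishing. `read_tsv_collecting_bad_lines` and the sample data are hypothetical.

```python
import io

import pandas as pd

# Hypothetical 3-column sample; the second data row has one extra field.
SAMPLE = "a\tb\tc\n1\t2\t3\n4\t5\t6\t7\n8\t9\t10\n"


def read_tsv_collecting_bad_lines(source):
    """Read a TSV, dropping rows with too many fields but recording them.

    Assumes pandas >= 1.4 (callable on_bad_lines, python engine only).
    Rows with too *few* fields are not passed to the callable; the python
    engine pads them with NaN instead.
    """
    bad_lines = []

    def handle_bad_line(fields):
        # `fields` is the offending row already split on the separator.
        bad_lines.append(fields)
        return None  # returning None tells pandas to drop the row

    df = pd.read_csv(source, sep="\t", engine="python", on_bad_lines=handle_bad_line)
    return df, bad_lines


df, bad = read_tsv_collecting_bad_lines(io.StringIO(SAMPLE))
print(len(df), "rows kept;", len(bad), "rows skipped")
print(bad)
```

Returning a corrected list from the callable instead of `None` would keep the row, which is one way to repair lines once the cause is known.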

It looks like a space was inserted mid-line instead of a \t. (When I edited one of the failing lines by hand, the error went away.)
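Tabs and spaces look identical in most editors, so one way to check this hypothesis is to print the raw line with `repr()`, where a tab appears as `\t`. A minimal sketch with a hypothetical helper; for a 7-column file, a well-formed line should contain exactly 6 tabs.

```python
def inspect_separators(line):
    """Return (tab_count, space_count, repr_of_line) for one raw TSV line.

    In the repr, a tab is shown as \\t, so it cannot be mistaken for a space.
    """
    stripped = line.rstrip("\r\n")
    return stripped.count("\t"), stripped.count(" "), repr(stripped)


# Hypothetical 7-column line in which one separator is a space, not a tab.
line = "g\tf\t2012\t0001\t5.0 first sentence\tsecond sentence\n"
tabs, spaces, shown = inspect_separators(line)
print(tabs, spaces)  # 5 3
print(shown)
```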
