Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CSV output: bubble up error message so more clear #301

Open
chapmanjacobd opened this issue Oct 15, 2022 · 5 comments
Open

CSV output: bubble up error message so more clear #301

chapmanjacobd opened this issue Oct 15, 2022 · 5 comments

Comments

@chapmanjacobd
Copy link

$ unzstd --memory=2048MB --stdout RS_2022-08.zst | octosql "SELECT * FROM stdin.json" -o csv > RS_2022-08.csv                                                panic: runtime error: invalid memory address or nil pointer dereference                                         [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x908f1b]                                          
goroutine 14 [running]:                                 github.com/cube2222/octosql/datasources/json.getOctoSQLValue({0x7, {}, {}, {}, {}, {}, {}, {}, {0x0}, {{0x0, ...}}, ...}, ...)                                                  /home/xk/octosql/datasources/json/execution.go:196 +0xc1b
github.com/cube2222/octosql/datasources/json.getOctoSQLValue({0x8, {}, {}, {}, {}, {}, {}, {}, {0x0}, {{0xc000638000, ...}}, ...}, ...)
        /home/xk/octosql/datasources/json/execution.go:2
09 +0xa58                                               github.com/cube2222/octosql/datasources/json.getOctoSQLValue({0x7, {}, {}, {}, {}, {}, {}, {}, {0xc00062aa80}, {{0x0, ...}}, ...}, ...)                                         /home/xk/octosql/datasources/json/execution.go:196 +0xc78                                               github.com/cube2222/octosql/datasources/json.glob..func1.1()                                                            /home/xk/octosql/datasources/json/workers.go:68 +0x5b8                                                  created by github.com/cube2222/octosql/datasources/json.glob..func1                                                     /home/xk/octosql/datasources/json/workers.go:44 +0x45                                                   panic: runtime error: invalid memory address or nil poin
ter dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x
0 pc=0x908f1b]
@cube2222
Copy link
Owner

Hey, thanks for the report.

First time I'm seeing this. Do you possibly have the JSON value that is causing this?

@chapmanjacobd
Copy link
Author

It's the same 10GB file as before https://files.pushshift.io/reddit/submissions/RS_2022-08.zst.

Not sure how to get the JSON value. In the code you should be able to add some logging in the stacktrace to print it but maybe with stuff like memory errors that is more difficult?

when I try to generate a trace it only creates a 16 byte file:

unzstd --memory=2048MB --stdout ~/lb/psaw/files.pushshift.io/reddit/submissions/RS_2022-08.zst | octosql "SELECT * FROM stdin.json" -o csv --profile trace
2022/10/15 13:02:41 profile: trace enabled, trace.out
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x908f1b]

goroutine 7 [running]:
github.com/cube2222/octosql/datasources/json.getOctoSQLValue({0x7, {}, {}, {}, {}, {}, {}, {}, {0x0}, {{0x0, ...}}, ...}, ...)
	/home/xk/octosql/datasources/json/execution.go:196 +0xc1b
github.com/cube2222/octosql/datasources/json.getOctoSQLValue({0x8, {}, {}, {}, {}, {}, {}, {}, {0x0}, {{0xc00027aa00, ...}}, ...}, ...)
	/home/xk/octosql/datasources/json/execution.go:209 +0xa58
github.com/cube2222/octosql/datasources/json.getOctoSQLValue({0x7, {}, {}, {}, {}, {}, {}, {}, {0xc0005f9b60}, {{0x0, ...}}, ...}, ...)
	/home/xk/octosql/datasources/json/execution.go:196 +0xc78
github.com/cube2222/octosql/datasources/json.glob..func1.1()
	/home/xk/octosql/datasources/json/workers.go:68 +0x5b8
created by github.com/cube2222/octosql/datasources/json.glob..func1
	/home/xk/octosql/datasources/json/workers.go:44 +0x45
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x908f1b]

goroutine 8 [running]:
github.com/cube2222/octosql/datasources/json.getOctoSQLValue({0x7, {}, {}, {}, {}, {}, {}, {}, {0x0}, {{0x0, ...}}, ...}, ...)
	/home/xk/octosql/datasources/json/execution.go:196 +0xc1b
github.com/cube2222/octosql/datasources/json.getOctoSQLValue({0x8, {}, {}, {}, {}, {}, {}, {}, {0x0}, {{0xc00027aa00, ...}}, ...}, ...)
	/home/xk/octosql/datasources/json/execution.go:209 +0xa58
github.com/cube2222/octosql/datasources/json.getOctoSQLValue({0x7, {}, {}, {}, {}, {}, {}, {}, {0xc0005f9b60}, {{0x0, ...}}, ...}, ...)
	/home/xk/octosql/datasources/json/execution.go:196 +0xc78
github.com/cube2222/octosql/datasources/json.glob..func1.1()
	/home/xk/octosql/datasources/json/workers.go:68 +0x5b8
created by github.com/cube2222/octosql/datasources/json.glob..func1
	/home/xk/octosql/datasources/json/workers.go:44 +0x45

▓▒▓░ lb:/ (main|●2?1) [141/SIGPIPE|2]🍪  ls trace.out 
Permissions Size User Date Modified Git Name
.rw-r--r--@   16 xk   15 Oct 13:02   -N trace.out
████ lb:/ (main|●2?1) 🌍 cat trace.out
go 1.11 trace⏎                    

@chapmanjacobd
Copy link
Author

Interesting... when I only do the first line then I get a more clear error message:

░▒░█ lb:/ (main|●2?1) 🍞 unzstd --memory=2048MB --stdout ~/lb/psaw/files.pushshift.io/reddit/submissions/RS_2022-08.zst | head -1 | octosql "SELECT * FROM stdin.json" -o csv --profile trace
2022/10/15 13:04:57 profile: trace enabled, trace.out
panic: invalid value type to print in CSV: List

goroutine 1 [running]:
github.com/cube2222/octosql/outputs/formats.FormatCSVValue(0xc0003ac480, {0x7, 0x0, 0x0, 0x0, {0x0, 0x0}, {0x0, 0x0, 0x0}, ...})
	/home/xk/octosql/outputs/formats/csv_format.go:65 +0x785
github.com/cube2222/octosql/outputs/formats.(*CSVFormatter).Write(0xc00008b6a0, {0xc0009cc000, 0x5e, 0x0?})
	/home/xk/octosql/outputs/formats/csv_format.go:42 +0x12c
github.com/cube2222/octosql/outputs/eager.(*OutputPrinter).Run.func1({{0xcdd960?, 0x0?}}, {{0xc0009cc000, 0x5e, 0x5e}, 0x0, {0x0, 0x0, 0x0}})
	/home/xk/octosql/outputs/eager/eager.go:42 +0x54
github.com/cube2222/octosql/datasources/json.(*DatasourceExecuting).Run(0xc000497f50, {{0xf63870?, 0xc0003bd3b0?}, 0x0?}, 0xc0001abc50, 0x0?)
	/home/xk/octosql/datasources/json/execution.go:127 +0xc86
github.com/cube2222/octosql/outputs/eager.(*OutputPrinter).Run(0xc0002091c0, {{0xf63870?, 0xc0003bd3b0?}, 0x0?})
	/home/xk/octosql/outputs/eager/eager.go:39 +0x1e4
github.com/cube2222/octosql/cmd.glob..func4(0x157bd60, {0xc00009eb90, 0x1, 0x5?})
	/home/xk/octosql/cmd/root.go:458 +0x3b54
github.com/spf13/cobra.(*Command).execute(0x157bd60, {0xc000032250, 0x5, 0x5})
	/home/xk/go/pkg/mod/github.com/spf13/[email protected]/command.go:856 +0x67c
github.com/spf13/cobra.(*Command).ExecuteC(0x157bd60)
	/home/xk/go/pkg/mod/github.com/spf13/[email protected]/command.go:974 +0x3b4
github.com/spf13/cobra.(*Command).Execute(...)
	/home/xk/go/pkg/mod/github.com/spf13/[email protected]/command.go:902
github.com/spf13/cobra.(*Command).ExecuteContext(...)
	/home/xk/go/pkg/mod/github.com/spf13/[email protected]/command.go:895
github.com/cube2222/octosql/cmd.Execute({0xf637c8?, 0xc000208dc0?})
	/home/xk/octosql/cmd/root.go:471 +0x53
main.main()
	/home/xk/octosql/main.go:24 +0xe8
█░█▒ lb:/ (main|●2?1) [141/SIGPIPE|0|2]🌎 unzstd --memory=2048MB --stdout ~/lb/psaw/files.pushshift.io/reddit/submissions/RS_2022-08.zst | head -1 
{"all_awardings":[],"allow_live_comments":false,"archived":false,"author":"Kresley","author_created_utc":1390870929,"author_flair_background_color":null,"author_flair_css_class":null,"author_flair_richtext":[],"author_flair_template_id":null,"author_flair_text":null,"author_flair_text_color":null,"author_flair_type":"text","author_fullname":"t2_f0c4k","author_patreon_flair":false,"author_premium":false,"awarders":[],"banned_by":null,"can_gild":true,"can_mod_post":false,"category":null,"content_categories":null,"contest_mode":false,"created_utc":1659312011,"discussion_type":null,"distinguished":"moderator","domain":"self.podcasts","edited":false,"event_end":1659326400.0,"event_is_live":false,"event_start":1659312000.0,"gilded":0,"gildings":{},"hidden":false,"hide_score":false,"id":"wcciq1","is_created_from_ads_ui":false,"is_crosspostable":true,"is_meta":false,"is_original_content":false,"is_reddit_media_domain":false,"is_robot_indexable":true,"is_self":true,"is_video":false,"link_flair_background_color":"#7e53c1","link_flair_css_class":"announce","link_flair_richtext":[],"link_flair_template_id":"efc6fb7e-75ae-11ea-baed-0e1ba7588035","link_flair_text":"Announcement","link_flair_text_color":"light","link_flair_type":"text","locked":false,"media":null,"media_embed":{},"media_only":false,"name":"t3_wcciq1","no_follow":false,"num_comments":8,"num_crossposts":1,"over_18":false,"parent_whitelist_status":"all_ads","permalink":"\/r\/podcasts\/comments\/wcciq1\/have_questions_about_recent_supreme_court\/","pinned":false,"post_hint":"self","preview":{"enabled":false,"images":[{"id":"Qalc2eReon-KD6FX7LpDmEuzwjap9_yobQbqDYd2RoQ","resolutions":[{"height":108,"url":"https:\/\/external-preview.redd.it\/aY2_qQw82pmeuXTAHhzUC3BvZrwYhi9GKxP6Gw9ECQU.jpg?width=108&crop=smart&auto=webp&s=01924a24a46b6e764caf58f310678ac69a6c6c8b","width":108},{"height":216,"url":"https:\/\/external-preview.redd.it\/aY2_qQw82pmeuXTAHhzUC3BvZrwYhi9GKxP6Gw9ECQU.jpg?width=216&crop=smart&auto=webp&s=ae46ea72110dc7c060d102b496cfd457ecd6b950","width":216},{"height":320,"url":"https:\/\/external-preview.redd.it\/aY2_qQw82pmeuXTAHhzUC3BvZrwYhi9GKxP6Gw9ECQU.jpg?width=320&crop=smart&auto=webp&s=f6b853bbf5f384084a45c8dde00d5d7b97356d69","width":320}],"source":{"height":500,"url":"https:\/\/external-preview.redd.it\/aY2_qQw82pmeuXTAHhzUC3BvZrwYhi9GKxP6Gw9ECQU.jpg?auto=webp&s=dc05866f0c0b48980e43549ceb0955176f30e3c6","width":500},"variants":{}}]},"pwls":6,"quarantine":false,"removed_by":null,"removed_by_category":null,"retrieved_on":1660025808,"score":16,"secure_media":null,"secure_media_embed":{},"selftext":"We're lucky to have an attorney who is *also* a veteran podcaster kind enough to join here for a live Reddit Talk about recent US Supreme Court decisions, Justices and perhaps some other federal law decisions you may (or may not have) heard about in the news lately.  He may also have some friends and guests join. \n\n\nCharles Star is an attorney in Brooklyn, the host of [Hostile Witness](https:\/\/www.soundcloud.com\/hwpod), a panelist on [ALAB Series](https:\/\/www.alabseries.com\/) and the guy who did most of the talking on the late [Mic Dicta](https:\/\/soundcloud.com\/micdicta). He tweets at [@ugarles](https:\/\/twitter.com\/Ugarles) (and has a reddit account he almost never uses). He writes about legal issues from time to time and has bylines in Vice, Slate and a bunch of places that no longer exist. (He'd probably appear on your podcast.)  \n\n\n**What is Reddit Talk?**\n\nTalk allows us to host interactive sessions with podcasters and users of the community can join in and interact with the speakers in real time from either new.reddit on desktop or though (an official) Reddit app. A Talk can handle up to 30 different speakers and up to 100k listeners in the audience can tune in at the same time.\n\nYou can learn more about Reddit Talk from the user guide [here](https:\/\/reddithelp.com\/hc\/en-us\/articles\/4404349195284-What-is-Reddit-Talk-) .\n\n**Ask Away!**\n\nIf you can't make it live at that time, you may leave any questions you might have  as a comment on this announcement post and we'll try to pick some to read during the Talk. A recording of the Talk should continue to be available here after it concludes.\n\nYou can tap on **Follow** on this post if you'd like to receive a reminder notification when the Talk will begin, if you'd like to come ask a question live.","send_replies":true,"spoiler":false,"stickied":false,"subreddit":"podcasts","subreddit_id":"t5_2qh1d","subreddit_name_prefixed":"r\/podcasts","subreddit_subscribers":2322456,"subreddit_type":"public","suggested_sort":null,"thumbnail":"self","thumbnail_height":null,"thumbnail_width":null,"title":"Have questions about recent Supreme Court decisions? So do we! Join us Sunday @ 8:00 p.m. EST with a lawyer\/podcaster who can help answer how we got here.","top_awarded_type":null,"total_awards_received":0,"treatment_tags":[],"upvote_ratio":0.71,"url":"https:\/\/www.reddit.com\/r\/podcasts\/comments\/wcciq1\/have_questions_about_recent_supreme_court\/","view_count":null,"whitelist_status":"all_ads","wls":6}

@chapmanjacobd
Copy link
Author

chapmanjacobd commented Oct 15, 2022

That makes sense to me. Maybe instead of panic it should just be Error:

Error: typecheck error: invalid type for CSV: List

It works with no problems when specifying all columns except the list and object type columns. 38 million rows in 500 seconds.

unzstd --memory=2048MB --stdout ~/lb/psaw/files.pushshift.io/reddit/submissions/RS_2022-08.zst | octosql "SELECT author_flair_css_class,allow_live_co
mments,archived,author,author_patreon_flair,author_created_utc,author_flair_background_color,author_flair_template_id,author_flair_text,author_flair_text_color,author_flair_type,author_fullnam
e,author_premium,banned_by,can_gild,can_mod_post,category,content_categories,contest_mode,created_utc,discussion_type,distinguished,domain,edited,event_end,event_is_live,event_start,gilded,hid
den,hide_score,is_reddit_media_domain,id,is_created_from_ads_ui,is_crosspostable,is_meta,is_original_content,is_robot_indexable,is_self,is_video,link_flair_background_color,link_flair_css_clas
s,link_flair_template_id,link_flair_text,link_flair_text_color,link_flair_type,locked,media,media_only,name,no_follow,num_comments,num_crossposts,over_18,parent_whitelist_status,permalink,pinn
ed,post_hint,pwls,quarantine,removed_by,removed_by_category,retrieved_on,score,secure_media,send_replies,spoiler,stickied,subreddit,subreddit_id,subreddit_name_prefixed,subreddit_subscribers,s
ubreddit_type,suggested_sort,thumbnail_height,thumbnail,thumbnail_width,title,top_awarded_type,total_awards_received,upvote_ratio,url,view_count,whitelist_status,wls FROM stdin.json" -o csv > 
RS_2022-08.csv

Maybe the only thing to fix here is to bubble up the error somehow so that it does not show up as "invalid memory address or nil pointer dereference"

@chapmanjacobd chapmanjacobd changed the title panic: runtime error: invalid memory address or nil pointer dereference CSV output: bubble up error message so more clear Oct 15, 2022
@cube2222
Copy link
Owner

Thanks for the writeup!

I'll admit I have no idea why you got the nil dereference instead of the List panic there, but I'll look into it (and probably, as you suggested, turn it into a normal error, to avoid further similar issues).

Cheers!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants