forked from xlang-ai/Spider2
spider2.jsonl
{"instance_id": "postgres_chinook001", "instruction": "How can we consolidate invoice data with customer and date details for comprehensive reporting, and how can we aggregate playlist tracks with their metadata for analysis? After data transformation, please convert the target tables into CSV files as the final answer.", "type": "Postgres"}
{"instance_id": "postgres_shopify001", "instruction": "Comprehensively analyze Shopify orders and transactions, incorporating incremental data processing, order adjustments, refunds, discounts, fulfillment details, and classifying customers as new or repeat. After data transformation, please convert the target tables into CSV files as the final answer.", "type": "Postgres"}
{"instance_id": "postgres_airport001", "instruction": "Calculate the distances between Malaysian airports and summarize their arrival flight counts. After data transformation, please convert the target tables into CSV files as the final answer.", "type": "Postgres"}
{"instance_id": "postgres_playbook001", "instruction": "How can we calculate customer attribution using various models to assign revenue from customer conversions to different touchpoints, and then aggregate this data with ad spend information to determine key marketing metrics such as Cost Per Acquisition (CPA) and Return on Advertising Spend (ROAS)? After data transformation, please convert the target tables into CSV files as the final answer.", "type": "Postgres"}
{"instance_id": "postgres_mrr001", "instruction": "How can we analyze customer subscription data to calculate the Monthly Recurring Revenue (MRR) for each customer by month\u2014including changes due to upgrades, downgrades, churns, and reactivations\u2014identify the churn month for customers by adding records for the months after their last active month with zero MRR, and generate monthly records for each customer between their first and last active months to track customer activity and MRR over time? After data transformation, please convert the target tables into CSV files as the final answer.", "type": "Postgres"}
{"instance_id": "postgres_tpch001", "instruction": "How can we identify the top 10 returned parts per month by suppliers, analyze low-cost brass suppliers in Europe including their supply costs, and specifically extract details of such suppliers in the United Kingdom? After data transformation, please convert the target tables into CSV files as the final answer.", "type": "Postgres"}
{"instance_id": "postgres_airbnb001", "instruction": "How can we aggregate user reviews by sentiment on a daily and monthly basis, including month-over-month comparisons, and combine listings with host information to generate unique keys for tracking the latest updates to both listings and hosts? After data transformation, please convert the target tables into CSV files as the final answer.", "type": "Postgres"}
{"instance_id": "ch001", "instruction": "Please help me identify the year with the highest average property prices for the entire UK and for London specifically, as well as finding the town and district with the highest average property prices and at least 100 transactions since 2020.", "type": "Clickhouse"}
{"instance_id": "ch006", "instruction": "Determine the daily trend (increase, decrease, or no change) and percentage change in new confirmed COVID-19 cases for the location identified by 'US_DC'.", "type": "Clickhouse"}
{"instance_id": "ch005", "instruction": "Please calculate monthly power consumption metrics, including total and average usage, for different device categories (coffee machines, printers, projectors, and vending machines) based on hourly averages.", "type": "Clickhouse"}
{"instance_id": "ch003", "instruction": "Please help me find the top five ski resorts in the US with the highest recorded snowfalls from selected weather stations since 2017, where the stations are located within 20 kilometers of the resorts and are situated at elevations above 1800 meters.", "type": "Clickhouse"}
{"instance_id": "ch004", "instruction": "Could you help me identify distinct devices and their types and locations within a building where significant temperature variations, defined as differences of 25 degrees or more in hourly averages, occur during the winter (Dec 2018 - Feb 2019) or summer periods (Jun 2019 - Aug 2019).", "type": "Clickhouse"}
{"instance_id": "playbook001", "instruction": "Complete the project of this database to show the metrics of each traffic source, I believe every touchpoint in the conversion path is equally important, please choose the most suitable attribution method.", "type": "DBT"}
{"instance_id": "analytics_engineering001", "instruction": "Create a comprehensive report that combines customer, employee, product, and purchase order details, including order quantities, costs, and timestamps, all joined together for easy analysis.", "type": "DBT"}
{"instance_id": "workday002", "instruction": "Create a table that aggregates job profile information along with job family and job family group details", "type": "DBT"}
{"instance_id": "synthea001", "instruction": "How can we aggregate financial data related to healthcare events, such as conditions, drug exposures, and procedures, to calculate total costs, charges, and payments for each event?", "type": "DBT"}
{"instance_id": "tickit002", "instruction": "Get detailed information about events and ticket listings, including venue details, event timing, categories, seller information, and pricing for each listing", "type": "DBT"}
{"instance_id": "activity001", "instruction": "How can I compare user activities to see how many users signed up and visited a page, using both the 'aggregate after' and 'aggregate all ever' methods for capturing the visit page activities?", "type": "DBT"}
{"instance_id": "lever001", "instruction": "Pull together the data from multiple tables related to job postings and create a complete report that covers job applications, interviews, requisitions, tags, and the hiring manager details.", "type": "DBT"}
{"instance_id": "greenhouse001", "instruction": "Please generate a report on enhanced data of job and application.", "type": "DBT"}
{"instance_id": "app_reporting002", "instruction": "Please generate an overview report of the app that combines Apple Store and Google Play data.", "type": "DBT"}
{"instance_id": "mrr001", "instruction": "Complete the project on this database to calculate the monthly recurring revenue.", "type": "DBT"}
{"instance_id": "quickbooks003", "instruction": "Pull a table with all balance sheet entries for asset, liability, and equity accounts. Make sure it includes account details, class, parent information, and the monthly period balances.", "type": "DBT"}
{"instance_id": "quickbooks002", "instruction": "Generate a table that includes bill and invoice transaction information, including supplier and customer details, payment status, balance, overdue days, and other financial information.", "type": "DBT"}
{"instance_id": "tpch001", "instruction": "Calculate the lifetime value of a customer by analyzing their total purchases and returns, categorize their status based on the percentage of returns, and combine this data with lost revenue information.", "type": "DBT"}
{"instance_id": "divvy001", "instruction": "Analyze bike trips by combining user data, trip duration, and geo-locational information for start and end stations, while filtering trips based on their duration and associating stations with specific neighborhoods", "type": "DBT"}
{"instance_id": "playbook002", "instruction": "Please assist me in completing the data transformation project of this database.", "type": "DBT"}
{"instance_id": "apple_store001", "instruction": "Please finish the data transformation project to generate source type and territory report for me.", "type": "DBT"}
{"instance_id": "superstore001", "instruction": "How can I generate a dataset that associates sales transactions with their respective regional managers, including details about products, customers, shipping, and geographical data?", "type": "DBT"}
{"instance_id": "marketo001", "instruction": "How can I combine the most recent version of each email template with aggregated metrics for sends, opens, bounces, clicks, deliveries, and unsubscribes?", "type": "DBT"}
{"instance_id": "workday001", "instruction": "Create a table that combines organization roles with worker position", "type": "DBT"}
{"instance_id": "f1003", "instruction": "Create data models to track Formula 1 drivers' podium finishes, fastest laps, and constructors' retirements per season", "type": "DBT"}
{"instance_id": "retail001", "instruction": "Which countries have the highest total revenue based on the number of invoices, and what are the top 10 countries by total revenue?", "type": "DBT"}
{"instance_id": "app_reporting001", "instruction": "Please generate reports for app version and OS version.", "type": "DBT"}
{"instance_id": "mrr002", "instruction": "Please complete this data transformation project to analyze the trends in user subscription changes.", "type": "DBT"}
{"instance_id": "tickit001", "instruction": "Generate a complete sales summary that includes buyer and seller details, event categories, and sales metrics.", "type": "DBT"}
{"instance_id": "maturity001", "instruction": "How can I retrieve detailed information about doctors, including their specialties, and patients, including their medical details, such as diabetes status, from the respective dimension tables?", "type": "DBT"}
{"instance_id": "tpch002", "instruction": "Find low-cost brass part suppliers located in the United Kingdom, including their part availability, retail prices, and contact details", "type": "DBT"}
{"instance_id": "quickbooks001", "instruction": "Please create a table that unions all records from each model within the double_entry_transactions directory. The table should result in a comprehensive general ledger, ensuring each transaction has an offsetting debit and credit entry.", "type": "DBT"}
{"instance_id": "bq229", "instruction": "Can you provide a count of how many image URLs are categorized as \u2018cat\u2019 (with label '/m/01yrx' and full confidence) and how many contain no such cat labels (categorized as \u2018other\u2019) at all?", "type": "Bigquery"}
{"instance_id": "bq216", "instruction": "Identify the top five patents filed in the same year as `US-9741766-B2` that are most similar to it based on technological similarities. Please provide the publication numbers.", "type": "Bigquery"}
{"instance_id": "bq024", "instruction": "For the year 2012, which top 10 evaluation groups have the largest subplot acres when considering only the condition with the largest subplot acres within each group? Please include the evaluation group, evaluation type, condition status code, evaluation description, state code, macroplot acres, and subplot acres.", "type": "Bigquery"}
{"instance_id": "bq012", "instruction": "What is the average balance of the top 10 addresses with the most balance on the Ethereum blockchain, considering both incoming and outgoing transactions with valid addresses, but only those recorded as used on receipt, as well as transaction fees? Only keep successful transactions with no call type or where the call type is 'call'. The average balance, expressed in quadrillions (10^15), is rounded to two decimal places.", "type": "Bigquery"}
{"instance_id": "bq015", "instruction": "Rank the top 10 most discussed tags on Stack Overflow questions that were mentioned on Hacker News since 2014.", "type": "Bigquery"}
{"instance_id": "bq273", "instruction": "Can you list the top 5 months from August 2022 to November 2023 where the profit from Facebook-sourced completed orders showed the largest month-over-month increase? Calculate profit as sales minus costs.", "type": "Bigquery"}
{"instance_id": "bq041", "instruction": "What are the monthly statistics for new StackOverflow users created in 2021, including the percentage of new users who asked questions and the percentage of those who asked questions and then answered questions within their first 30 days?", "type": "Bigquery"}
{"instance_id": "bq425", "instruction": "List all distinct molecules associated with the company 'SanofiAventis,' along with their trade name and approval date, retaining the most recent approval date for each molecule, using data from ChEMBL Release 23.", "type": "Bigquery"}
{"instance_id": "bq422", "instruction": "What are the average series sizes in MiB for the top 3 patients with the highest slice interval difference tolerance and the top 3 patients with the highest maximum exposure difference, considering only CT images from the 'nlst' collection?", "type": "Bigquery"}
{"instance_id": "bq280", "instruction": "Please provide the display name of the user who has answered the most questions on Stack Overflow, considering only users with a reputation greater than 10.", "type": "Bigquery"}
{"instance_id": "bq079", "instruction": "Given the latest evaluations of timberland and forestland plots, which state within each category has the highest total acreage? Please provide the state code, the evaluation group, the state name, and the total acres for the top state in each category.", "type": "Bigquery"}
{"instance_id": "bq070", "instruction": "Could you construct a structured clean dataset from `dicom_all` for me? It should retrieve digital slide microscopy (SM) images from the TCGA-LUAD and TCGA-LUSC datasets and meet the requirements in `dicom_dataset_selection.md`. The target labels are tissue type and cancer subtype.", "type": "Bigquery"}
{"instance_id": "bq414", "instruction": "Retrieve the object id, title, and the formatted metadata date (as a string in 'YYYY-MM-DD' format) for objects in the \"The Libraries\" department where the cropConfidence is greater than 0.5, the object's title contains the word \"book\".", "type": "Bigquery"}
{"instance_id": "bq289", "instruction": "Can you find the shortest distance between any two amenities (either a library, place of worship, or community center) located within Philadelphia?", "type": "Bigquery"}
{"instance_id": "bq083", "instruction": "What is the daily change in the total market value (formatted as a string in USD currency format) of the USDC token (with a target address of \"0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48\") in 2023, considering both Mint (the input starts with 0x42966c68) and Burn (the input starts with 0x40c10f19) transactions?", "type": "Bigquery"}
{"instance_id": "bq077", "instruction": "For each year from 2010 to 2016, what is the highest number of motor thefts in one month?", "type": "Bigquery"}
{"instance_id": "bq219", "instruction": "Which two liquor categories, each contributing an average of at least 1% to monthly sales volume over 24 months, have the lowest Pearson correlation coefficient in their sales percentages?", "type": "Bigquery"}
{"instance_id": "bq226", "instruction": "Can you find me the complete url of the most frequently used sender's address on the Cronos blockchain since January 1, 2023, where transactions were made to non-null addresses and in blocks larger than 4096 bytes?", "type": "Bigquery"}
{"instance_id": "bq221", "instruction": "Identify the CPC technology areas with the highest exponential moving average of patent filings each year (smoothing factor 0.2), and provide the full title and the best year for each CPC group at level 5.", "type": "Bigquery"}
{"instance_id": "bq025", "instruction": "Provide a list of the top 10 countries for the year 2020, ordered by the highest percentage of their population under 20 years old. For each country, include the total population under 20 years old, the total midyear population, and the percentage of the population that is under 20 years old.", "type": "Bigquery"}
{"instance_id": "bq441", "instruction": "Please help me compile the critical details on traffic accidents in 2015, as listed in the info document.", "type": "Bigquery"}
{"instance_id": "bq210", "instruction": "How many US B2 patents granted between 2008 and 2018 contain claims that do not include the word 'claim'?", "type": "Bigquery"}
{"instance_id": "bq022", "instruction": "Given the taxi trip data in Chicago, partition the trips that last no more than 1 hour into 6 quantiles based on trip duration. Please provide the minimum/maximum trip duration (rounded-off to integer minutes), total trips, and average fare for each quantile.", "type": "Bigquery"}
{"instance_id": "bq412", "instruction": "Please provide the page URLs, first shown time, last shown time, removal reason, violation category, and lower and upper bound shown times for the most recent five closed ads in the Croatia region which had shown higher than 10,000 and lower than 25,000, and used at least one audience criterion such as demographics, geographic location, contextual signals, customer lists, or interest topics. The region code of Croatia is HR.", "type": "Bigquery"}
{"instance_id": "bq076", "instruction": "Which month generally has the greatest number of motor vehicle thefts in 2016?", "type": "Bigquery"}
{"instance_id": "bq085", "instruction": "Could you provide the total number of confirmed COVID-19 cases and the number of cases per 100,000 people, based on the 2020 population, on April 20, 2020, for the US, France, China, Italy, Spain, Germany, and Iran?", "type": "Bigquery"}
{"instance_id": "bq281", "instruction": "What is the highest number of electric bike rides lasting more than 10 minutes taken by subscribers with 'Student Membership' in a single day, excluding rides starting or ending at 'Mobile Station' or 'Repair Shop'?", "type": "Bigquery"}
{"instance_id": "bq275", "instruction": "Can you provide a list of visitor IDs for those who made their first transaction on a mobile device on a different day than their first visit?", "type": "Bigquery"}
{"instance_id": "bq078", "instruction": "Retrieve the approved symbol of target genes with the highest overall score that are associated with the disease 'EFO_0000676' from the data source 'IMPC'.", "type": "Bigquery"}
{"instance_id": "bq040", "instruction": "For NYC yellow taxi trips between January 1-7, 2016, excluding pickups from 'EWR' and 'Staten Island', calculate the proportion of trips by tip category for each pickup borough. Show the borough, tip category, and proportion, ensuring trips where the dropoff occurs after the pickup, the passenger count is greater than 0, and trip distance, tip, tolls, MTA tax, fare, and total amount are non-negative.", "type": "Bigquery"}
{"instance_id": "bq424", "instruction": "List the top 10 countries with respect to the total amount of long-term external debt in descending order, excluding those without a specified region.", "type": "Bigquery"}
{"instance_id": "bq286", "instruction": "Can you tell me the name of the most popular female baby in Wyoming for the year 2021, based on the proportion of female babies given that name compared to the total number of female babies given the same name across all states?", "type": "Bigquery"}
{"instance_id": "ga018", "instruction": "I'd like to analyze the appeal of our products to users. Can you calculate the percentage of times users go from browsing the product list pages to clicking into the product detail pages during a single session on January 2nd, 2021?", "type": "Bigquery"}
{"instance_id": "bq300", "instruction": "What is the highest number of answers received for a single Python 2 specific question on Stack Overflow, excluding any discussions that involve Python 3?", "type": "Bigquery"}
{"instance_id": "bq338", "instruction": "Can you find the census tracts in the 36047 area that made both the top 20 lists for biggest population and median income increases from 2011 to 2018, and had over 1000 residents each year?", "type": "Bigquery"}
{"instance_id": "ga020", "instruction": "Which quickplay event type had the lowest user retention rate during the second week after their initial engagement, for users who first engaged between August 1 and August 15, 2018?", "type": "Bigquery"}
{"instance_id": "bq309", "instruction": "Show the top 10 longest Stack Overflow questions where the question has an accepted answer or an answer with a score-to-view ratio above 0.01, including the user's reputation, net votes, and badge count.", "type": "Bigquery"}
{"instance_id": "bq104", "instruction": "Identify which DMA had the highest search scores for the terms that were top rising one year ago", "type": "Bigquery"}
{"instance_id": "bq396", "instruction": "Which top 3 states had the largest differences in the number of traffic accidents between rainy and clear weather during weekends in 2016? Please also provide the respective differences for each state.", "type": "Bigquery"}
{"instance_id": "bq362", "instruction": "Which three companies had the largest increase in trip numbers between two consecutive months in 2018?", "type": "Bigquery"}
{"instance_id": "bq150", "instruction": "Assess whether different genetic variants affect the log10-transformed TP53 expression levels in TCGA-BRCA samples using sequencing and mutation data. Provide the total number of samples, the number of mutation types, the mean square between groups, the mean square within groups, and the F-statistic.", "type": "Bigquery"}
{"instance_id": "bq391", "instruction": "Could you find out which health conditions have the most types of medications per case, for living patients whose last names start with 'A' and have only one unique condition? I'd like to see the top eight conditions and their codes, ranked by the highest number of different meds prescribed to any single patient.", "type": "Bigquery"}
{"instance_id": "bq161", "instruction": "Calculate the net difference between the number of pancreatic adenocarcinoma (PAAD) patients in TCGA's dataset who are confirmed to have mutations in both KRAS and TP53 genes, and those without mutations in either gene. Utilize patient clinical and follow-up data alongside genomic mutation details from TCGA\u2019s cancer genomics database, focusing specifically on PAAD studies where the mutations have passed quality filters.", "type": "Bigquery"}
{"instance_id": "bq398", "instruction": "What are the top three debt indicators for Russia based on the highest debt values?", "type": "Bigquery"}
{"instance_id": "bq354", "instruction": "Could you provide the percentage of participants for standard acne, atopic dermatitis, psoriasis, and vitiligo defined by the International Classification of Diseases 10-CM (ICD-10-CM), including their subcategories? The ICD-10 codes are: Acne (L70), Atopic dermatitis (L20), Psoriasis (L40), and Vitiligo (L80).", "type": "Bigquery"}
{"instance_id": "bq166", "instruction": "Analyze the largest copy number of chromosomal aberrations including amplifications, gains, homozygous deletions, heterozygous deletions, and normal copy states across cytogenetic bands in TCGA-KIRC kidney cancer samples. Use segment allelic data to identify the maximum copy number aberrations within each chromosomal segment, and report their frequencies, sorted by chromosome and cytoband.", "type": "Bigquery"}
{"instance_id": "bq159", "instruction": "Calculate the chi-square value to assess the association between histological types and the presence of CDH1 gene mutations in BRCA patients using data from the PanCancer Atlas. Focus on patients with known histological types and consider only reliable mutation entries. Exclude any histological types or mutation statuses with marginal totals less than or equal to 10. Match clinical and mutation data using ParticipantBarcode", "type": "Bigquery"}
{"instance_id": "bq308", "instruction": "Show the number of Stack Overflow questions asked each day of the week in 2021, and find out how many and what percentage of those were answered within one hour.", "type": "Bigquery"}
{"instance_id": "ga010", "instruction": "Can you give me an overview of our website traffic for December 2020? I'm particularly interested in the channel with the fourth highest number of sessions.", "type": "Bigquery"}
{"instance_id": "ga028", "instruction": "Please perform a 7-day retention analysis for users who first used the app during the week starting on July 2, 2018. Provide the total number of these new users and the number of retained users for each week from Week 0 (the initial week) through Week 4.", "type": "Bigquery"}
{"instance_id": "ga017", "instruction": "How many distinct users viewed the most frequently visited page during January 2021?", "type": "Bigquery"}
{"instance_id": "bq102", "instruction": "Identify which start positions are associated with missense variants in the BRCA1 gene on chromosome 17, where the reference base is 'C' and the alternate base is 'T'.", "type": "Bigquery"}
{"instance_id": "bq339", "instruction": "Which month in 2017 had the largest absolute difference between cumulative bike usage minutes for customers and subscribers?", "type": "Bigquery"}
{"instance_id": "ga021", "instruction": "What is the retention rate for users two weeks after their initial quickplay event within the period from July 2, 2018, to July 16, 2018, calculated separately for each quickplay event type?", "type": "Bigquery"}
{"instance_id": "ga019", "instruction": "Could you determine what percentage of users either did not uninstall our app within seven days or never uninstalled it after installing during August and September 2018?", "type": "Bigquery"}
{"instance_id": "bq301", "instruction": "Retrieve details of accepted answers related to JavaScript security topics such as XSS, cross-site scripting, exploits, and cybersecurity, for questions posted in January 2016 on Stack Overflow. For each accepted answer, include the answer's ID, the answerer's reputation, score, and comment count, along with the associated question's tags, score, answer count, the asker's reputation, view count, and comment count.", "type": "Bigquery"}
{"instance_id": "bq167", "instruction": "Please find the giver-and-recipient pair with the most Kaggle forum upvotes. Display their usernames and the respective number of upvotes they gave to each other.", "type": "Bigquery"}
{"instance_id": "bq355", "instruction": "Please tell me the percentage of participants not using quinapril and related medications (Quinapril RxCUI: 35208).", "type": "Bigquery"}
{"instance_id": "bq193", "instruction": "Help me retrieve the top 5 most frequently occurring non-empty, non-commented lines of text in `readme.md` files from GitHub repositories that primarily use Python for development.", "type": "Bigquery"}
{"instance_id": "bq158", "instruction": "Which top five histological types of breast cancer (BRCA) in the PanCancer Atlas exhibit the highest percentage of CDH1 gene mutations?", "type": "Bigquery"}
{"instance_id": "bq194", "instruction": "What is the second most frequently used module (imported library) across Python, R, and IPython script (.ipynb) files in the GitHub sample dataset?", "type": "Bigquery"}
{"instance_id": "bq399", "instruction": "Which high-income country in each region had the highest average crude birth rate during the 1980s, and what was its corresponding average birth rate?", "type": "Bigquery"}
{"instance_id": "bq390", "instruction": "Please provide the study instance UIDs for studies that include both T2-weighted axial magnetic resonance imaging and anatomical structure segmentations of the peripheral zone, in prostate repeatability collection.", "type": "Bigquery"}
{"instance_id": "bq363", "instruction": "For taxi trips with a duration rounded to the nearest minute, and between 1 and 50 minutes, if the trip durations are divided into 10 quantiles, what are the total number of trips and the average fare for each quantile?", "type": "Bigquery"}
{"instance_id": "bq397", "instruction": "Identify the country with the highest total transactions within each channel grouping, provided that the channel includes transactions from more than one country. What is the transaction total for that country?", "type": "Bigquery"}
{"instance_id": "bq341", "instruction": "Which Ethereum addresses have the top 3 smallest positive balances from transactions involving the token at address \"0xa92a861fc11b99b24296af880011b47f9cafb5ab\"?", "type": "Bigquery"}
{"instance_id": "bq187", "instruction": "What is the total circulating supply balance of the 'BNB' token for all addresses (excluding the zero address), based on the amount they have received (converted by dividing by 10^18) minus the amount they have sent?", "type": "Bigquery"}
{"instance_id": "bq379", "instruction": "Which target approved symbol has the overall association score closest to the mean score for psoriasis?", "type": "Bigquery"}
{"instance_id": "bq346", "instruction": "Which five segmentation categories appear most frequently in publicly accessible DICOM SEG data, where the modality is \"SEG\" and the SOPClassUID is \"1.2.840.10008.5.1.4.1.1.66.4\"?", "type": "Bigquery"}
{"instance_id": "bq377", "instruction": "Extract and count the frequency of all package names listed in the require section of JSON-formatted content", "type": "Bigquery"}
{"instance_id": "bq383", "instruction": "Could you provide the highest recorded precipitation, minimum temperature, and maximum temperature from the last 15 days of each year from 2013 to 2016 at weather station USW00094846? Ensure each value represents the peak measurement for that period, with precipitation in millimeters and temperatures in degrees Celsius, including only valid and high-quality data.", "type": "Bigquery"}
{"instance_id": "ga004", "instruction": "Can you figure out the average difference in pageviews between users who bought something and those who didn't in December 2020? Just label anyone who was involved in purchase events as a purchaser.", "type": "Bigquery"}
{"instance_id": "ga003", "instruction": "I'm trying to evaluate which board types were most effective on September 15, 2018. Can you find out the average scores for each board type from the quick play level completions on that day?", "type": "Bigquery"}
{"instance_id": "bq127", "instruction": "For each publication family whose earliest publication was first published in January 2015, please provide the earliest publication date, the distinct publication numbers, their country codes, the distinct CPC and IPC codes, distinct families (namely, the ids) that cite and are cited by this publication family. Please present all lists as comma-separated values, sorted by the first letter of the code for clarity.", "type": "Bigquery"}
{"instance_id": "bq349", "instruction": "Which OpenStreetMap ID from the planet features corresponds to the administrative boundary, represented as multipolygons, whose total number of 'amenity'-tagged Points of Interest (POIs) is closest to the median count among all such boundaries?", "type": "Bigquery"}
{"instance_id": "bq376", "instruction": "For each neighborhood in San Francisco, list the number of bike share stations and the total number of crime incidents.", "type": "Bigquery"}
{"instance_id": "bq347", "instruction": "Which modality has the highest count of SOP instances, including MR series with SeriesInstanceUID = \"1.3.6.1.4.1.14519.5.2.1.3671.4754.105976129314091491952445656147\" and all associated segmentation data, along with the total count of instances?", "type": "Bigquery"}
{"instance_id": "bq172", "instruction": "For the drug with the highest total number of prescriptions in New York State during 2014, could you list the top five states with the highest total claim counts for this drug? Please also include their total claim counts and total drug costs.", "type": "Bigquery"}
{"instance_id": "bq126", "instruction": "What are the titles, artist names, mediums, and original image URLs of objects with 'Photograph' in their names from the 'Photographs' department, created not by an unknown artist, with an object end date of 1839 or earlier?", "type": "Bigquery"}
{"instance_id": "bq119", "instruction": "Please show information of the hurricane with the third longest total travel distance in the North Atlantic during 2020, including its travel coordinates, the cumulative travel distance at each point, and the maximum sustained wind speed at those times.", "type": "Bigquery"}
{"instance_id": "bq121", "instruction": "How do the average reputation and number of badges vary among Stack Overflow users based on the number of complete years they have been members, considering only those who joined on or before October 1, 2021?", "type": "Bigquery"}
{"instance_id": "ga002", "instruction": "Tell me the most purchased other products and their quantities by customers who bought the Google Red Speckled Tee each month for the three months starting from November 2020.", "type": "Bigquery"}
{"instance_id": "bq128", "instruction": "Tell me the patent title and abstract, as well as the publication date, the backward citation and forward citation count within 5 years for those published in January 2014. The detailed requirements are provided in `forward_backward_citation.md`.", "type": "Bigquery"}
{"instance_id": "bq250", "instruction": "What is the total population living on the geography grid which is the farthest from any hospital in Singapore, based on the most recent population data before 2023? Note that geographic grids and distances are calculated based on geospatial data and GIS related functions.", "type": "Bigquery"}
{"instance_id": "bq268", "instruction": "Identify the longest number of days between the first visit and the last recorded event (either the last visit or the first transaction) for a user where the last recorded event was associated with a mobile device.", "type": "Bigquery"}
{"instance_id": "bq091", "instruction": "In which year did the assignee with the most applications in the patent category 'A61' file the most?", "type": "Bigquery"}
{"instance_id": "bq065", "instruction": "Provide the most recent 10 results of symbols and their corresponding rates, adjusted for the multiplier, from oracle requests with the script ID 3.", "type": "Bigquery"}
{"instance_id": "bq295", "instruction": "Among the repositories from the GitHub Archive which include a Python file with less than 15,000 bytes in size and a keyword 'def' in the content, find the top 3 that have the highest number of watch events in 2017?", "type": "Bigquery"}
{"instance_id": "bq430", "instruction": "Find pairs of different molecules tested in the same assay and standard type, where both have 10\u201315 heavy atoms, fewer than 5 activities in that assay, fewer than 2 duplicate activities, non-null standard values, and pChEMBL values over 10. For each pair, report the maximum heavy atom count, the latest publication date (calculated based on the document's rank within the same journal and year, and map it to a synthetic month and day), the highest document ID, classify the change in standard values as 'increase', 'decrease', or 'no-change' based on their values and relations, and generate UUIDs from their activity IDs and canonical SMILES.", "type": "Bigquery"}
{"instance_id": "bq235", "instruction": "Can you tell me which healthcare provider incurs the highest combined average costs for both outpatient and inpatient services in 2014?", "type": "Bigquery"}
{"instance_id": "bq203", "instruction": "What percentage of subway stations in each New York borough have at least one ADA-compliant entrance?", "type": "Bigquery"}
{"instance_id": "bq031", "instruction": "Show me the daily weather data (temperature, precipitation, and wind speed) in Rochester for the first season of year 2019, converted to Celsius, centimeters, and meters per second, respectively. Also, include the moving averages (window size = 8) and the differences between the moving averages for up to 8 days prior (all values rounded to one decimal place, sorted by date in ascending order, and records starting from 2019-01-09).", "type": "Bigquery"}
{"instance_id": "bq455", "instruction": "Find the top 5 CT scan series IDs, including their series number, patient ID, and series size (in MiB), where the series are not classified as 'LOCALIZER' or have the specific JPEG compressed transfer syntaxes '1.2.840.10008.1.2.4.70' or '1.2.840.10008.1.2.4.51'. The series must have consistent slice intervals, exposure levels, image orientation, pixel spacing, image positions, and pixel dimensions. Additionally, the z-axis of the image orientation must align with the expected plane (dot product between 0.99 and 1.01).", "type": "Bigquery"}
{"instance_id": "bq452", "instruction": "Identify variants on chromosome 12, calculate their chi-squared scores using allele counts in cases and controls, and return the start, end, chi-squared score (after Yates's correction for continuity) of top variants where the chi-squared score is no less than 29.71679, ensuring that each group has expected counts of at least 5 for the chi-squared calculation.", "type": "Bigquery"}
{"instance_id": "bq099", "instruction": "For patent class A01B3, I want to analyze the information of the top 3 assignees based on the total number of applications. Please provide the following five pieces of information: the name of this assignee, total number of applications, the year with the most applications, the number of applications in that year, and the country code with the most applications during that year.", "type": "Bigquery"}
{"instance_id": "bq260", "instruction": "Find the total number of youngest and oldest users separately for each gender in the e-commerce platform created from January 1, 2019, to April 30, 2022.", "type": "Bigquery"}
{"instance_id": "bq052", "instruction": "I wonder which patents within CPC subsection 'C05' or group 'A01G' in the USA have at least one forward or backward citation within one month of their application dates. Give me the ids, titles, application date, forward/backward citation counts and summary texts.", "type": "Bigquery"}
{"instance_id": "bq294", "instruction": "Can you provide the details of the top 5 longest bike share trips that started during the second half of 2017, including the trip ID, duration in seconds, start date, start station name, route (start station to end station), bike number, subscriber type, member's birth year, age, age classification, gender, and the region name of the start station? Please exclude trips where the start station name, member's birth year, or member's gender is not specified.", "type": "Bigquery"}
{"instance_id": "bq037", "instruction": "About the refined human genetic variations collected in phase 3 on 2015-02-20, I want to know the minimum and maximum start positions as well as the proportions of these two respectively for reference bases 'AT' and 'TA'.", "type": "Bigquery"}
{"instance_id": "bq008", "instruction": "What's the most common next page for visitors who were part of the \"Data Share\" campaign, after they accessed the page starting with '/home' in January 2017? And what's the maximum duration time (in seconds) when they visit the corresponding home page?", "type": "Bigquery"}
{"instance_id": "bq233", "instruction": "Can you find the imported Python modules and R libraries from the GitHub sample files and list them along with their occurrence counts? Please sort the results by language and then by the number of occurrences in descending order.", "type": "Bigquery"}
{"instance_id": "bq001", "instruction": "I wonder how many days there were between the first transaction and the first visit, both in February 2017, for each transacting visitor, along with the device used in the transaction.", "type": "Bigquery"}
{"instance_id": "bq421", "instruction": "Can you list all unique pairs of embedding medium and staining substance code meanings, along with the number of occurrences for each pair, based on distinct embedding medium and staining substance codes from the 'SM' modality in the DICOM dataset's un-nested specimen preparation sequences, ensuring that the codes are from the SCT coding scheme?", "type": "Bigquery"}
{"instance_id": "bq248", "instruction": "What is the proportion of files whose paths include 'readme.md' that contain the phrase 'Copyright (c)', among all repositories that do not use any programming language with 'python' in its name?", "type": "Bigquery"}
{"instance_id": "bq284", "instruction": "Can you provide a breakdown of the total number of articles into different categories and the percentage of those articles that mention \"education\" within each category from the BBC News?", "type": "Bigquery"}
{"instance_id": "bq270", "instruction": "What were the monthly add-to-cart and purchase conversion rates, calculated as a percentage of pageviews on product details, from January to March 2017?", "type": "Bigquery"}
{"instance_id": "bq419", "instruction": "Which 5 states had the most storm events from 1980 to 1995, considering only the top 1000 states with the highest event counts each year? Please use state abbreviations.", "type": "Bigquery"}
{"instance_id": "bq246", "instruction": "Can you figure out the number of forward citations within 1 year from the application date for the patent that has the most backward citations within 1 year from application among all U.S. patents?", "type": "Bigquery"}
{"instance_id": "bq279", "instruction": "Can you provide the number of distinct active and closed bike share stations for each year 2013 and 2014 in a chronological view?", "type": "Bigquery"}
{"instance_id": "bq444", "instruction": "Can you pull the blockchain timestamp, block number, and transaction hash for the first five mint and burn events from Ethereum logs for the address '0x8ad599c3a0ff1de082011efddc58f1908eb6e6d8'? Please include mint events identified by the topic '0x7a53080ba414158be7ec69b987b5fb7d07dee101fe85488f0853ae16239d0bde' and burn events by '0x0c396cd989a39f4459b5fa1aed6a9a8dcdbc45908acfd67e028cd568da98982c', and order them by block timestamp from the oldest to the newest.", "type": "Bigquery"}
{"instance_id": "bq224", "instruction": "Which repository with an approved license in `licenses.md` had the highest combined total of forks, issues, and watches in April 2022?", "type": "Bigquery"}
{"instance_id": "bq011", "instruction": "How many pseudo users were active in the last 7 days but inactive in the last 2 days as of January 7, 2021?", "type": "Bigquery"}
{"instance_id": "bq223", "instruction": "Which assignees, excluding DENSO CORP itself, have cited patents assigned to DENSO CORP, and what are the titles of the primary CPC subclasses associated with these citations? Provide the name of each citing assignee, the full title of the CPC subclass, and the count of citations grouped by the assignee and the CPC subclass title. Please focus specifically on the main categories of the CPC codes.", "type": "Bigquery"}
{"instance_id": "bq072", "instruction": "Please tell me the total and Black deaths due to vehicle-related incidents and firearms separately, for each age from 12 to 18.", "type": "Bigquery"}
{"instance_id": "bq429", "instruction": "What are the top 5 states with the highest average median income difference from 2015 to 2018? Also provide the average number of vulnerable employees across various industries for these states, using data from the ACS 5-Year Estimates for 2017.", "type": "Bigquery"}
{"instance_id": "bq081", "instruction": "Find the latest ride data for each region between 2014 and 2017. I want to know the name of each region, the trip ID of this ride, the ride duration, the start time, the starting station, and the gender of the rider.", "type": "Bigquery"}
{"instance_id": "bq278", "instruction": "Can you provide a detailed comparison of the solar potential for each state, distinguishing between postal code and census tract levels? Include the number of buildings available for solar installations, the percentage covered by Project Sunroof, the percentage suitable for solar, total potential panel count, total kilowatt capacity, energy generation potential, carbon dioxide offset, and the gap in potential installations.", "type": "Bigquery"}
{"instance_id": "bq271", "instruction": "Could you generate a report that, for each month in 2021, provides the number of orders, number of unique purchasers, and profit (calculated as total product retail price minus total cost) grouped by country, product department, and product category?", "type": "Bigquery"}
{"instance_id": "bq043", "instruction": "What are the RNA expression levels of the genes MDM2, TP53, CDKN1A, and CCNE1, along with associated clinical information, in bladder cancer patients with CDKN2A mutations in the 'TCGA-BLCA' project? Use clinical data from the Genomic Data Commons Release 39, data about somatic mutations derived from the hg19 human genome reference in Feb 2017.", "type": "Bigquery"}
{"instance_id": "bq285", "instruction": "Could you provide me with the zip code of the location that has the highest number of bank institutions in Florida?", "type": "Bigquery"}
{"instance_id": "bq282", "instruction": "Can you tell me the numeric value of the active council district in Austin which has the highest number of bike trips that start and end within the same district, but not at the same station?", "type": "Bigquery"}
{"instance_id": "bq420", "instruction": "Can you identify the top 5 patents that were initially rejected under section 101 with no allowed claims, based on the length of their granted claims? The patents should have been granted in the US between 2010 and 2023. Additionally, ensure to select the first office action date for each application.", "type": "Bigquery"}
{"instance_id": "bq276", "instruction": "Can you provide me with the details of all ports affected by tropical storms in region number 6585, including the port name, storm names, and average storm categories? Please consider only named storms in the North Atlantic basin with wind speeds of at least 35 knots and at least minimal tropical storm strength on the SSHS scale. Additionally, ensure that each port is located within a U.S. state boundary.", "type": "Bigquery"}
{"instance_id": "bq222", "instruction": "Find the CPC technology areas in Germany with the highest exponential moving average of patent filings each year (smoothing factor 0.1) for patents granted in December 2016. Show me the full title, CPC group and the best year for each CPC group at level 4.", "type": "Bigquery"}
{"instance_id": "bq010", "instruction": "Find the top-selling product among customers who bought 'Youtube Men\u2019s Vintage Henley' in July 2017, excluding itself.", "type": "Bigquery"}
{"instance_id": "bq028", "instruction": "Considering only the latest release versions of NPM packages, which packages are the top 8 most popular based on the GitHub star number, as well as their versions?", "type": "Bigquery"}
{"instance_id": "bq017", "instruction": "What are the five longest types of highways within the multipolygon boundary of Denmark (as defined by Wikidata ID 'Q35') by total length?", "type": "Bigquery"}
{"instance_id": "bq445", "instruction": "Find the start and end positions of the BRCA1 gene, and retrieve the first missense variants based on their protein positions within this region. The variants must have a consequence type of \"missense_variant\". Using data from the gnomAD v2.1.1 version.", "type": "Bigquery"}
{"instance_id": "bq213", "instruction": "What is the most common 4-digit IPC code among US B2 utility patents granted from June to August in 2022?", "type": "Bigquery"}
{"instance_id": "bq442", "instruction": "Please collect the information of the top 6 trade reports with the highest closing prices. Refer to the document for all the information I want.", "type": "Bigquery"}
{"instance_id": "bq392", "instruction": "What are the top 3 dates in October 2009 with the highest average temperature for station number 723758, in the format YYYY-MM-DD?", "type": "Bigquery"}
{"instance_id": "bq359", "instruction": "List the repository names and commit counts for the top two GitHub repositories with JavaScript as the primary language and the highest number of commits.", "type": "Bigquery"}
{"instance_id": "bq395", "instruction": "Which 5 states' percentage change in unsheltered homeless individuals from 2015 to 2018 were top 5 closest to the national average? Please provide the state abbreviation.", "type": "Bigquery"}
{"instance_id": "bq153", "instruction": "Calculate the average log10(normalized_count + 1) expression level of the IGF2 gene for each histology type among LGG patients. Include only patients with valid IGF2 expression data and histology types not enclosed in square brackets. Match gene expression and clinical data using ParticipantBarcode.", "type": "Bigquery"}
{"instance_id": "bq198", "instruction": "What are the top 5 most successful college basketball teams over the seasons from 1900 to 2000, based on the number of times they had the maximum wins in a season?", "type": "Bigquery"}
{"instance_id": "bq350", "instruction": "For the detailed molecule data, please display the drug id, drug type and withdrawal status for approved drugs with a black box warning and known drug type among 'Keytruda', 'Vioxx', 'Premarin', and 'Humira'.", "type": "Bigquery"}
{"instance_id": "bq109", "instruction": "Find the average, variance, max-min difference, and the QTL source (right study) of the maximum log2(h4/h3) for data where right gene id is \"ENSG00000169174\", h4 > 0.8, h3 < 0.02, reported trait includes \"lesterol levels\", right biological feature is \"IPSC\", and the variant is '1_55029009_C_T'.", "type": "Bigquery"}
{"instance_id": "bq303", "instruction": "What are the user IDs and tags for comments, answers, and questions posted by users with IDs between 16712208 and 18712208 on Stack Overflow during July to December 2019?", "type": "Bigquery"}
{"instance_id": "ga012", "instruction": "Find the transaction IDs, total item quantities, and purchase revenues for the item category with the highest tax rate on November 30, 2020.", "type": "Bigquery"}
{"instance_id": "bq100", "instruction": "Find out the most frequently used package in all Go source files.", "type": "Bigquery"}
{"instance_id": "bq155", "instruction": "Help me calculate the t-statistic based on the Pearson correlation coefficient between all possible pairs of gene `SNORA31` in the RNAseq data (Log10 transformation) and unique identifiers in the microRNA data available in TCGA. The cohort for this analysis consists of BRCA patients that are 80 years old or younger at the time of diagnosis and Stage I,II,IIA as pathological state. And only consider samples of size more than 25 and with absolute Pearson correlation at least 0.3, and less than 1.0.", "type": "Bigquery"}
{"instance_id": "bq358", "instruction": "Can you tell me which bike trip in New York City on July 15, 2015, started and ended in ZIP Code areas with the highest average temperature for that day, as recorded by the Central Park weather station '94728'? If there's more than one trip that meets these criteria, I'd like to know about the one that starts in the smallest ZIP Code and ends in the largest ZIP Code.", "type": "Bigquery"}
{"instance_id": "bq334", "instruction": "In my Bitcoin database, there are discrepancies in transaction records. Can you determine the annual differences in average output values calculated from separate input and output records versus a consolidated transactions table, focusing only on the years common to both calculation methods?", "type": "Bigquery"}
{"instance_id": "bq130", "instruction": "Analyze daily new COVID-19 case counts from March to May 2020, identifying the top five states by daily increases. Please compile a ranking based on how often each state appears in these daily top fives. Then, examine the state that ranks fourth overall and identify its top five counties based on their frequency of appearing in the daily top five new case counts.", "type": "Bigquery"}
{"instance_id": "bq108", "instruction": "Calculate the percentage of traffic accidents in 2015 from January to August that involved multiple people and had multiple instances of severe injuries (injury severity 4).", "type": "Bigquery"}
{"instance_id": "ga022", "instruction": "Could you please help me get the weekly customer retention rate in September 2018 for new customers who first used our app within the first week starting from September 1st, 2018 (timezone in Shanghai)? The retention rates should cover the following 3-week period after the initial use and display them in column format.", "type": "Bigquery"}
{"instance_id": "bq115", "instruction": "Which country has the highest percentage of population under the age of 25 in 2017?", "type": "Bigquery"}
{"instance_id": "bq320", "instruction": "What is the total count of StudyInstanceUIDs that have a segmented property type of '15825003' and belong to the 'Community' or 'nsclc_radiomics' collections?", "type": "Bigquery"}
{"instance_id": "bq112", "instruction": "Did the increase in average annual wages for all industries in Allegheny County, Pittsburgh keep pace with inflation of all consumer items between 1998 and 2017? Tell me their growth rates respectively (2 decimals).", "type": "Bigquery"}
{"instance_id": "bq345", "instruction": "How large are the DICOM image files with SEG or RTSTRUCT modalities and the SOP Class UID \"1.2.840.10008.5.1.4.1.1.66.4\", when grouped by collection, study, and series IDs, if they have no references to other series, images, or sources? Can you also provide a viewer URL formatted as \"https://viewer.imaging.datacommons.cancer.gov/viewer/\" followed by the study ID, and list these sizes in kilobytes, sorted from largest to smallest?", "type": "Bigquery"}
{"instance_id": "bq374", "instruction": "Calculate the percentage of new users who, between August 1, 2016, and April 30, 2017, both stayed on the site for more than 5 minutes during their initial visit and made a purchase on a subsequent visit at any later time, relative to the total number of new users in the same period.", "type": "Bigquery"}
{"instance_id": "bq310", "instruction": "What is the title of the most viewed \"how\" question related to Android development on StackOverflow, across specified tags such as 'android-layout', 'android-activity', 'android-intent', and others?", "type": "Bigquery"}
{"instance_id": "ga008", "instruction": "Can you give me the average page views per buyer and total page views among those buyers for each day in November 2020?", "type": "Bigquery"}
{"instance_id": "bq328", "instruction": "Which region has the highest median GDP (constant 2015 US$) value?", "type": "Bigquery"}
{"instance_id": "bq321", "instruction": "How many unique StudyInstanceUIDs are there from the DWI, T2 Weighted Axial, Apparent Diffusion Coefficient series, and T2 Weighted Axial Segmentations in the 'qin_prostate_repeatability' collection?", "type": "Bigquery"}
{"instance_id": "bq114", "instruction": "What are the top three cities where the difference between the PM2.5 measurements in 1990 from the EPA and in 2020 from OpenAQ is the greatest, given that the locations are matched with latitude and longitude rounded to two decimal places?", "type": "Bigquery"}
{"instance_id": "ga001", "instruction": "I want to know the preferences of customers who purchased the Google Navy Speckled Tee in December 2020. What other product was purchased with the highest total quantity alongside this item?", "type": "Bigquery"}
{"instance_id": "bq185", "instruction": "What is the average valid trip duration (in minutes) for yellow taxi rides in Brooklyn with more than 3 passengers and a trip distance of at least 10 miles between February 1 and February 7, 2016?", "type": "Bigquery"}
{"instance_id": "bq176", "instruction": "Identify the case barcodes from the TCGA-LAML study with the highest weighted average copy number in cytoband 15q11 on chromosome 15, using segment data and cytoband overlaps from TCGA's genomic and Mitelman databases.", "type": "Bigquery"}
{"instance_id": "bq182", "instruction": "Which primary programming languages, determined by the highest number of bytes in each repository, have the sum of over 100 pull requests on January 18, 2023 in all its repositories?", "type": "Bigquery"}
{"instance_id": "bq236", "instruction": "What are the top 5 zip codes of the areas in the United States that have experienced the most hail storm events in the past 10 years?", "type": "Bigquery"}
{"instance_id": "bq209", "instruction": "Can you find how many utility patents granted in 2010 have exactly one forward citation within the ten years following their application date?", "type": "Bigquery"}
{"instance_id": "bq458", "instruction": "Please help me calculate normalized document vectors for each article by tokenizing the body text into words, obtaining word vectors, and weighting these vectors by the 0.4th root of word frequency. Then, aggregate these vectors to form an article vector, and normalize them to unit length. Finally, retrieve the ID, date, title, and the computed article vector for each entry.", "type": "Bigquery"}
{"instance_id": "bq451", "instruction": "Extract genotype data for single nucleotide polymorphisms (SNPs) from chromosome X, ensuring that the start positions are not between 59999 and 2699519 nor between 154931042 and 155260559. Output the sample ID, counts of homozygous reference alleles, homozygous alternate alleles, heterozygous alternate alleles, the total number of callable sites, the total number of SNVs, the percentage of heterozygous alternate alleles among all SNVs, and the percentage of homozygous alternate alleles among all SNVs.", "type": "Bigquery"}
{"instance_id": "bq035", "instruction": "What is the total distance traveled by each bike in the San Francisco Bikeshare program? Use data from bikeshare trips and stations to calculate this.", "type": "Bigquery"}
{"instance_id": "bq032", "instruction": "Can you provide the latitude of the final coordinates for the hurricane that traveled the second longest distance in the North Atlantic during 2020?", "type": "Bigquery"}
{"instance_id": "bq200", "instruction": "Show the full name of the fastest pitcher on each team with their maximum valid pitch speed, using both regular and post-season data", "type": "Bigquery"}
{"instance_id": "bq254", "instruction": "Can you find the names of the multipolygons with valid ids that rank in the top two in terms of the number of points within their boundaries, among those multipolygons that do not have a Wikidata tag but are located within the same geographic area as the multipolygon associated with Wikidata item Q191?", "type": "Bigquery"}
{"instance_id": "bq066", "instruction": "Could you assess the relationship between the poverty rates from the previous year's census data and the percentage of births without maternal morbidity for the years 2016 to 2018? Use only data for births where no maternal morbidity was reported and for each year, use the 5-year census data from the year before to compute the Pearson correlation coefficient", "type": "Bigquery"}
{"instance_id": "bq061", "instruction": "Which census tract has witnessed the largest increase in median income between 2015 and 2018 in California? Tell me the tract code.", "type": "Bigquery"}
{"instance_id": "bq253", "instruction": "Find the name of the OpenStreetMap relation that encompasses the most features within the same geographic area as the multipolygon tagged with Wikidata item 'Q1095'. The relation should have a specified name without a 'wikidata' tag, and at least one of its included features must have a 'wikidata' tag. Return the name of this relation", "type": "Bigquery"}
{"instance_id": "bq068", "instruction": "What are the maximum and minimum balances across all addresses for different address types on Bitcoin Cash during March 2014?", "type": "Bigquery"}
{"instance_id": "bq057", "instruction": "Which month (e.g., 3) in 2021 witnessed the highest percent of Bitcoin volume that took place in CoinJoin transactions? Also give me the percentage of CoinJoin transactions, the average input and output UTXOs ratio, and the proportion of CoinJoin transaction volume for that month (all 1 decimal).", "type": "Bigquery"}
{"instance_id": "bq265", "instruction": "Can you provide me with the emails of the top 10 users who have the highest average order value, considering only those users who registered in 2019 and made purchases within the same year?", "type": "Bigquery"}
{"instance_id": "bq291", "instruction": "Can you provide a daily weather summary for July 2019 within a 5 km radius of latitude 26.75 and longitude 51.5? I need the maximum, minimum, and average temperatures; total precipitation; average cloud cover between 10 AM and 5 PM; total snowfall (when average temperature is below 32\u00b0F); and total rainfall (when average temperature is 32\u00b0F or above) for each forecast date. The data should correspond to forecasts created in July 2019 for the following day.", "type": "Bigquery"}
{"instance_id": "bq050", "instruction": "Help me look at the total number of bike trips, average trip duration (in minutes), average daily temperature, wind speed, and precipitation when trip starts (rounded to 1 decimal), as well as the month with the most trips (e.g., `4`), categorized by different starting and ending neighborhoods in New York City for the year 2014.", "type": "Bigquery"}
{"instance_id": "bq033", "instruction": "How many U.S. publications related to IoT (where the abstract includes the phrase 'internet of things') were filed each month from 2008 to 2022, including months with no filings?", "type": "Bigquery"}
{"instance_id": "bq457", "instruction": "Get details of repositories that use specific feature toggle libraries. For each repository, include the full name with owner, hosting platform type, size in bytes, primary programming language, fork source name (if any), last update timestamp, the artifact and library names of the feature toggle used, and the library's programming languages. Include repositories that depend on the specified feature toggle libraries, defined by their artifact names, library names, platforms, and languages.", "type": "Bigquery"}
{"instance_id": "bq034", "instruction": "I want to know the IDs, names of weather stations within a 50 km straight-line distance from the center of Chicago (41.8319\u00b0N, 87.6847\u00b0W)", "type": "Bigquery"}
{"instance_id": "bq208", "instruction": "Can you provide weather stations within a 20-mile radius of Chappaqua, New York (Latitude: 41.197, Longitude: -73.764), and tell me the number of valid temperature observations they have recorded from 2011 to 2020?", "type": "Bigquery"}
{"instance_id": "bq263", "instruction": "Produce a 2023 monthly report for the 'Sleep & Lounge' category detailing total sales, costs, completed order counts, profits, and profit margins, ensuring accurate cost alignment with sales data.", "type": "Bigquery"}
{"instance_id": "bq264", "instruction": "Identify the difference in the number of the oldest and youngest users registered between January 1, 2019, and April 30, 2022, from our e-commerce platform data.", "type": "Bigquery"}
{"instance_id": "bq056", "instruction": "How many different pairs of roads classified as motorway, trunk, primary, secondary, or residential in California overlap each other without sharing nodes and do not have a bridge tag, where these roads are tagged with 'highway'?", "type": "Bigquery"}
{"instance_id": "bq432", "instruction": "Could you provide me with the cleansed data of food events in January 2015 as listed in the cleansing documentation?", "type": "Bigquery"}
{"instance_id": "bq252", "instruction": "Could you please find the name of the repository that contains the most copied non-binary Swift file in the dataset, ensuring each file is uniquely identified by its ID?", "type": "Bigquery"}
{"instance_id": "bq060", "instruction": "Which top 3 countries had the highest net migration in 2017 among those with an area greater than 500 square kilometers? And what are their migration rates?", "type": "Bigquery"}
{"instance_id": "bq255", "instruction": "How many commit messages are there in repositories that use the 'Shell' programming language and 'apache-2.0' license, where the length of the commit message is more than 5 characters but less than 10,000 characters, and the messages do not start with the word 'merge', 'update' or 'test'?", "type": "Bigquery"}
{"instance_id": "bq093", "instruction": "Tell me the maximum and minimum net changes in balances for Ethereum Classic addresses on October 14, 2016, considering debits, credits, and gas fees, while excluding internal calls like 'delegatecall', 'callcode', and 'staticcall'.", "type": "Bigquery"}
{"instance_id": "sf001", "instruction": "Assuming today is April 1, 2024, I would like to know the daily snowfall amounts greater than 6 inches for each U.S. postal code during the week ending after the first two full weeks of the previous year. Show the postal code, date, and snowfall amount.", "type": "Snowflake"}
{"instance_id": "sf012", "instruction": "What were the total amounts of building and contents damage reported under the National Flood Insurance Program in the City of New York for each year from 2010 to 2019?", "type": "Snowflake"}
{"instance_id": "sf040", "instruction": "Find the top 10 northernmost addresses in Florida's largest zip code area. What are their address numbers, street names, and types?", "type": "Snowflake"}
{"instance_id": "sf014", "instruction": "What is the New York State ZIP code with the highest number of commuters traveling over one hour, according to 2021 ACS data? Include the zip code, the total commuters, state benchmark for this duration, and state population.", "type": "Snowflake"}
{"instance_id": "sf002", "instruction": "As of December 31, 2022, list the top 10 active large banks, each with assets over $10 billion, that have the highest percentage of uninsured assets based on quarterly estimates. Provide the names of these banks and their respective percentages of uninsured assets.", "type": "Snowflake"}
{"instance_id": "sf018", "instruction": "Examine user engagement with push notifications within a specified one-hour window on June 1, 2023.", "type": "Snowflake"}
{"instance_id": "sf011", "instruction": "Determine the population distribution within each block group relative to its census tract in New York State using 2021 ACS data. Include block group ID, census value, state county tract ID, total tract population, and the population ratio of each block group.", "type": "Snowflake"}
{"instance_id": "sf044", "instruction": "What was the percentage change in post-market close prices for the Magnificent 7 tech companies from January 1 to June 30, 2024?", "type": "Snowflake"}
{"instance_id": "local015", "instruction": "Help me calculate, respectively, the percentage of motorcycle accident fatalities involving riders who were wearing helmets and those who weren't.", "type": "Local"}
{"instance_id": "local218", "instruction": "Can you calculate the median from the highest season goals of each team?", "type": "Local"}
{"instance_id": "local274", "instruction": "Which products were picked for order 421, and what is the average number of units picked for each product, using FIFO (First-In, First-Out) method?", "type": "Local"}
{"instance_id": "local041", "instruction": "What percentage of trees in the Bronx have a health status of Good?", "type": "Local"}
{"instance_id": "local273", "instruction": "What is the average pick percentage for each product (by name), considering the quantity picked from inventory locations that are ordered by the earliest purchase date and smallest quantity, while ensuring that the picked quantity matches the overlapping range between the order quantity and the available inventory?", "type": "Local"}
{"instance_id": "local022", "instruction": "Show me the names of strikers who scored no less than 100 runs in a match, but their team lost the game?", "type": "Local"}
{"instance_id": "local210", "instruction": "Can you identify the hubs that saw more than a 20% increase in finished orders from February to March?", "type": "Local"}
{"instance_id": "local071", "instruction": "Could you review our records in June 2022 and identify which countries have the longest streak of consecutive inserted city dates? Please list the 2-letter country codes of these countries.", "type": "Local"}
{"instance_id": "local049", "instruction": "Can you help me calculate the average number of new unicorn companies per year in the top industry from 2019 to 2021?", "type": "Local"}
{"instance_id": "local244", "instruction": "Calculate the duration of each track, classify them as short, medium, or long, output the minimum and maximum time for each kind (in minutes) and the total revenue for each category, group by the category.", "type": "Local"}
{"instance_id": "local300", "instruction": "Could you calculate the highest daily balance each customer had within each month? Treat any negative daily balances as zero. Then, for each month, add up these maximum daily balances across all customers to get a monthly total.", "type": "Local"}
{"instance_id": "local132", "instruction": "Show entertainer and customer pairs where both the first and second style preferences of customers match the first and second strengths of entertainers (or vice versa), displaying only the entertainer's stage name and the customer's last name.", "type": "Local"}
{"instance_id": "local336", "instruction": "How many overtakes of each type occurred during the first five laps of the race?", "type": "Local"}
{"instance_id": "local309", "instruction": "For each year, which driver and which constructor scored the most points? I want the full name of each driver.", "type": "Local"}
{"instance_id": "local168", "instruction": "What is the average salary for remote Data Analyst jobs requiring the top three most in-demand skills?", "type": "Local"}
{"instance_id": "local157", "instruction": "For our upcoming meeting, please provide the daily percentage change in trading volume for all tickers from August 1 to August 10, 2021. This trend analysis is crucial for our strategic planning.", "type": "Local"}
{"instance_id": "local354", "instruction": "Which Formula 1 drivers, during the 1950s, had seasons in which they did not change their constructors at the beginning and end of the year and participated in at least two different race rounds within those seasons?", "type": "Local"}
{"instance_id": "local195", "instruction": "Please find out how widespread the appeal of our top five actors is. What percentage of our customers have rented films featuring these actors?", "type": "Local"}
{"instance_id": "local194", "instruction": "Please provide a list of the top three revenue-generating films for each actor, along with the average revenue per actor in those films, calculated by dividing the total film revenue equally among the actors for each film.", "type": "Local"}
{"instance_id": "local355", "instruction": "Calculate the average first and last rounds of races missed by drivers each year. Only include drivers who missed fewer than three races annually and switched teams between their first and last missed races", "type": "Local"}
{"instance_id": "local156", "instruction": "Can you analyze the yearly average cost of Bitcoin purchases by region, excluding the first year's data? Rank the regions based on these averages each year and calculate the annual percentage change in cost.", "type": "Local"}
{"instance_id": "local065", "instruction": "Calculate the total income from Meat Lovers pizzas priced at $12 and Vegetarian pizzas at $10. Include any extra toppings charged at $1 each. Ensure that canceled orders are filtered out. How much money has Pizza Runner earned in total?", "type": "Local"}
{"instance_id": "local054", "instruction": "Could you tell me the first names of customers who spent less than $1 on albums by the best-selling artist, along with the amounts they spent?", "type": "Local"}
{"instance_id": "local259", "instruction": "For each player, list their ID, name, most frequent role across all matches, batting hand, bowling skill, total runs scored, total matches played, total dismissals, batting average, highest score in a single match, number of matches where their score exceeded 30, 50, and 100, total balls faced in their career, strike rate, total wickets taken, economy rate, and their best performance in a single match (most wickets taken, in the format \"wickets taken-runs given\"). Ignore the extra runs data.", "type": "Local"}
{"instance_id": "local038", "instruction": "Could you help me find the actor who appeared most in English G or PG-rated children's movies no longer than 2 hours, released between 2000 and 2010? Give me the full name.", "type": "Local"}
{"instance_id": "local009", "instruction": "What is the distance of the longest route where Abakan is either the departure or destination city (in kilometers)?", "type": "Local"}
{"instance_id": "local031", "instruction": "What is the highest monthly delivered orders volume in the year with the lowest annual delivered orders volume among 2016, 2017, and 2018?", "type": "Local"}
{"instance_id": "local099", "instruction": "I need you to look into the actor collaborations and tell me how many actors have made more films with Yash Chopra than with any other director. This will help us understand his influence on the industry better.", "type": "Local"}
{"instance_id": "local063", "instruction": "Which product has the smallest change in sales share for each product from the top 20% of products by total sales between Q4 in 2019 and 2020 in US without any promotion?", "type": "Local"}
{"instance_id": "local064", "instruction": "What is the difference in average month-end balance between the month with the most and the month with the fewest customers having a positive balance in 2020?", "type": "Local"}
{"instance_id": "local269", "instruction": "What is the average total quantity across all final packaging combinations, considering all items contained within each combination?", "type": "Local"}
{"instance_id": "local030", "instruction": "Can you find the average payments and order counts for the five cities with the lowest total payments from delivered orders?", "type": "Local"}
{"instance_id": "local008", "instruction": "I would like to know the given names of baseball players who have achieved the highest value of games played, runs, hits, and home runs, with their corresponding score values.", "type": "Local"}
{"instance_id": "local039", "instruction": "Please help me find the film category with the highest total rental hours in cities where the city's name either starts with \"A\" or contains a hyphen.", "type": "Local"}
{"instance_id": "local283", "instruction": "Analyze our match data to identify the name, leagues, and countries of the champion team for each season. Include the total points accumulated by each team.", "type": "Local"}
{"instance_id": "local073", "instruction": "Let's generate a report for each pizza order that lists the pizza name followed by \": \", then all the ingredients in alphabetical order. If any ingredient is ordered more than once, indicate it with '2x' directly in front of the ingredient without a space.", "type": "Local"}
{"instance_id": "local081", "instruction": "How many customers were in each spending group in 1998, and what percentage of the total customer base does each group represent?", "type": "Local"}
{"instance_id": "local075", "instruction": "Can you provide a breakdown of how many times each product was viewed, how many times they were added to the shopping cart, and how many times they were left in the cart without being purchased? Also, give me the count of actual purchases for each product. Ensure that products with a page id in (1, 2, 12, 13) are filtered out.", "type": "Local"}
{"instance_id": "local072", "instruction": "Identify the country with data inserted on nine different days in January 2022. Then, find the longest consecutive period with data insertions for this country during January 2022, and calculate the proportion of entries that are from its capital city within this longest consecutive insertion period.", "type": "Local"}
{"instance_id": "local285", "instruction": "For veg whsle data, can you analyze our financial performance over the years 2020 to 2023? I need insights into the average wholesale price, maximum wholesale price, minimum wholesale price, wholesale price difference, total wholesale price, total selling price, average loss rate, total loss, and profit for each category within each year. Round all calculated values to two decimal places.", "type": "Local"}
{"instance_id": "local017", "instruction": "In which year were the two most common causes of traffic accidents different from those in other years?", "type": "Local"}
{"instance_id": "local028", "instruction": "Could you generate a report that shows the number of delivered orders for each month in the years 2016, 2017, and 2018? Each column represents a year, and each row represents a month", "type": "Local"}
{"instance_id": "local010", "instruction": "Distribute all the unique city pairs into the distance ranges 0, 1000, 2000, 3000, 4000, 5000, and 6000+, based on their average distance of all routes between them. Then how many pairs are there in the distance range with the fewest unique city pairs?", "type": "Local"}
{"instance_id": "local026", "instruction": "Please help me find the top 3 bowlers who conceded the maximum runs in a single over, along with the corresponding matches.", "type": "Local"}
{"instance_id": "local019", "instruction": "For the NXT title that had the shortest match (excluding titles with \"title change\"), what were the names of the two wrestlers involved?", "type": "Local"}
{"instance_id": "local131", "instruction": "Could you list each musical style with the number of times it appears as a 1st, 2nd, or 3rd preference in a single row per style?", "type": "Local"}
{"instance_id": "local199", "instruction": "Can you identify the year and month with the highest rental orders created by the store's staff for each store? Please list the store ID, the year, the month, and the total rentals for those dates.", "type": "Local"}
{"instance_id": "local152", "instruction": "Can you provide the top 9 directors by movie count, including their ID, name, number of movies, average inter-movie duration (rounded to the nearest integer), average rating (rounded to 2 decimals), total votes, minimum and maximum ratings, and total movie duration? Sort the output first by movie count in descending order and then by total movie duration in descending order.", "type": "Local"}
{"instance_id": "local360", "instruction": "Identify the sessions with the fewest events lacking both '/detail' clicks and '/complete' conversions, considering only non-empty search types. If multiple sessions share the lowest count, include all of them. For each session, display the associated paths and search types.", "type": "Local"}
{"instance_id": "local311", "instruction": "Which constructors had the top 3 combined points from their best driver and team, and in which years did they achieve them?", "type": "Local"}
{"instance_id": "local329", "instruction": "How many unique sessions visited the /regist/input page and then the /regist/confirm page, in that order?", "type": "Local"}
{"instance_id": "local141", "instruction": "How did each salesperson's annual total sales compare to their annual sales quota? Provide the difference between their total sales and the quota for each year, organized by salesperson and year.", "type": "Local"}
{"instance_id": "local003", "instruction": "According to the RFM definition document, how much is the average sales per order for each customer within distinct RFM segments, considering only 'delivered' orders? Please rank the customers into segments to analyze differences in average sales across these segments", "type": "Local"}
{"instance_id": "local209", "instruction": "What is the ratio of completed orders to total orders for the store with the highest number of orders?", "type": "Local"}
{"instance_id": "local004", "instruction": "Could you tell me the number of orders, average payment per order, and customer lifespan in weeks of the 3 customers with the highest average payment per order? Note: I want the lifespan as a float if it's longer than one week; otherwise set it to 1.0.", "type": "Local"}
{"instance_id": "local035", "instruction": "Please help me find two adjacent cities with the greatest distance between them.", "type": "Local"}
{"instance_id": "local059", "instruction": "For the calendar year 2021, what is the overall average quantity sold of the top three best-selling hardware products (by total quantity sold) in each division?", "type": "Local"}
{"instance_id": "local034", "instruction": "Could you help me calculate the average of the total payment count for the most preferred payment method in each product category?", "type": "Local"}
{"instance_id": "local002", "instruction": "Can you calculate the 5-day symmetric moving average of predicted toy sales for December 5 to 8, 2018, using daily sales data from January 1, 2017, to August 29, 2018, with a simple linear regression model? Provide the total of the moving averages for those four days.", "type": "Local"}
{"instance_id": "local056", "instruction": "Which customer has the highest average monthly change in payment amounts? Provide the customer's full name.", "type": "Local"}
{"instance_id": "local263", "instruction": "Which L1_model has the highest occurrence for each status ('strong,' where the maximum test score for non-'Stack' models is less than the 'Stack' score, and 'soft,' where it equals the 'Stack' score), and how many times does it occur?", "type": "Local"}
{"instance_id": "local067", "instruction": "Can you provide the highest and lowest profits for Italian customers segmented into ten evenly divided tiers based on their December 2021 sales profits?", "type": "Local"}
{"instance_id": "local058", "instruction": "Can you provide a list of hardware product segments along with their unique product counts for 2020 in the output, ordered by the highest percentage increase in unique fact sales products from 2020 to 2021?", "type": "Local"}
{"instance_id": "local299", "instruction": "Could you calculate each user\u2019s average balance over the past 30 days, computed daily? Then, for each month (based on the 1st of each month), find the highest of these daily averages for each user. Add up these maximum values across all users for each month as the final result. Please use the first month as a baseline for previous balances and exclude it from the output.", "type": "Local"}