pg: colinfo, scan vals, and QueryScan funcs
This adds the ability to scan query results into provided variables instead of relying on pgx Row.Values() to choose the type. This provides some foundational components for table statistics collection.

The sql.QueryScanner interface is the advanced version of Execute that uses caller-provided scan values and a function to run for each scanned row:

```go
// QueryScanner represents a type that provides the ability to execute an SQL
// statement, where for each row:
//
//  1. result values are scanned into the variables in the scans slice
//  2. the provided function is then called
//
// The function would typically capture the variables in the scans slice,
// allowing it to operate on the values. For instance, append the values to
// slices allocated by the caller, or perform reduction operations like
// sum/mean/min/etc.
//
// NOTE: This method may end up being included in the Tx interface alongside
// Executor since all of the concrete transaction implementations provided by
// this package implement this method.
type QueryScanner interface {
	QueryScanFn(ctx context.Context, stmt string, scans []any,
		fn func() error, args ...any) error
}
```

Each transaction type in the pg package satisfies the sql.QueryScanner interface.

The pg.QueryRowFunc function executes an SQL statement, handling the rows and returned values as described by the sql.QueryScanner interface.

The pg.QueryRowFuncAny function is similar to pg.QueryRowFunc, except that no scan values slice is provided: the provided function is called for each row of the result, and the caller does not determine the types of the Go variables in the values slice. In this way it behaves similarly to Execute, but with "for each row" semantics so that the entire result set does not need to be loaded into memory.
Table statistics collection: beginning with a simplified sql.Statistics struct based on the types proposed in the initial unmerged query cost branch, the pg package provides the following new methods aimed at the (relatively expensive) collection of ground-truth table statistics:

- RowCount provides an exact row count
- colStats computes column-wise statistics
- TableStats uses the above functions to build a *sql.Statistics for a table

These methods will not be used routinely. We will have incremental updates, but there are cases where a full scan may be needed to obtain the ground-truth statistics.

pg: decimal and uint256 use pgNumericToDecimal helper

Use the pgNumericToDecimal helper to reuse the logic that converts from pgtypes.Numeric to either our decimal.Decimal or types.Uint256 in the recent pgtype decoding added to the query helper for interpreting the values returned by row.Values() in pgx.CollectRows.

types,decimal: sql scan/value for uint256 and decimal, and array nulls with uint256 and decimal

deps: update pgx module from 5.5.5 to 5.6.0
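The scan/value support mentioned above means these types can round-trip through database/sql, including SQL NULL. Here is a minimal, self-contained sketch of that pattern using a toy NullableAmount type (hypothetical; the real types are decimal.Decimal and types.Uint256, whose internals are not shown here):

```go
package main

import (
	"database/sql/driver"
	"fmt"
)

// NullableAmount is a toy stand-in for types gaining sql.Scanner and
// driver.Valuer implementations, including NULL handling.
type NullableAmount struct {
	Val   int64
	Valid bool // false means SQL NULL
}

// Scan implements sql.Scanner: accept NULL or an integer value.
func (a *NullableAmount) Scan(src any) error {
	if src == nil { // SQL NULL
		*a = NullableAmount{}
		return nil
	}
	v, ok := src.(int64)
	if !ok {
		return fmt.Errorf("unsupported source type %T", src)
	}
	*a = NullableAmount{Val: v, Valid: true}
	return nil
}

// Value implements driver.Valuer: emit NULL when not valid.
func (a NullableAmount) Value() (driver.Value, error) {
	if !a.Valid {
		return nil, nil
	}
	return a.Val, nil
}

func main() {
	var a NullableAmount
	_ = a.Scan(nil)
	fmt.Println("valid after NULL scan:", a.Valid) // valid after NULL scan: false
	_ = a.Scan(int64(7))
	v, _ := a.Value()
	fmt.Println("value:", v) // value: 7
}
```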
1 parent a499074, commit 3aecf89. Showing 17 changed files with 1,792 additions and 145 deletions.
@@ -0,0 +1,73 @@

```go
package sql

// NOTE: this file is TRANSITIONAL! These types are lifted from the
// unmerged internal/engine/costs/datatypes package.

import (
	"fmt"
	"strings"
)

// Statistics contains statistics about a table or a Plan. A Statistics can be
// derived directly from the underlying table, or derived from the statistics of
// its children.
type Statistics struct {
	RowCount int64

	ColumnStatistics []ColumnStatistics

	// Selectivity, for plan statistics
}

func (s *Statistics) String() string {
	var st strings.Builder
	fmt.Fprintf(&st, "RowCount: %d", s.RowCount)
	if len(s.ColumnStatistics) > 0 {
		fmt.Fprintln(&st, "")
	}
	for i, cs := range s.ColumnStatistics {
		fmt.Fprintf(&st, " Column %d:\n", i)
		fmt.Fprintf(&st, " - Min/Max = %v / %v\n", cs.Min, cs.Max)
		fmt.Fprintf(&st, " - NULL count = %v\n", cs.NullCount)
	}
	return st.String()
}

type ValCount struct {
	Val   any
	Count int
}

// ColumnStatistics contains statistics about a column.
type ColumnStatistics struct {
	NullCount int64

	Min      any
	MinCount int

	Max      any
	MaxCount int

	// MCVs are the most common values. It should be sorted by the value. It
	// should also be limited capacity, which means scan order has to be
	// deterministic since we have to throw out same-frequency observations.
	// (crap) Solution: multi-pass scan, merge lists, continue until no higher
	// freq values observed? OR when capacity reached, use a histogram? Do not
	// throw away MCVs, just start putting additional observations in to the
	// histogram instead.
	// MCVs []ValCount
	// MCVs map[cmp.Ordered]

	// MCVals []any
	// MCFreqs []int

	// DistinctCount is harder. For example, unless we sub-sample
	// (deterministically), tracking distinct values could involve a data
	// structure with the same number of elements as rows in the table.
	DistinctCount int64

	AvgSize int64 // maybe: length of text, length of array, otherwise not used for scalar?

	// without histogram, we can make uniformity assumption to simplify the cost model
	// Histogram []HistogramBucket
}
```