I have a large dataset (~19 billion rows). If I run VW on the data using 8-10 columns (all but one of which are non-numeric), the process completes in about 9 minutes, even with multiple quadratic terms (not shown) in the pass-through args.
However, if I take the same data and hash those 8-10 columns into a single feature with ~5.5 million distinct values and run the same job, it runs forever (I killed the process after 10 hours).
Is there anything to know about running VW on Spark when a namespace has very large and potentially sparse cardinality?
Spark: 3.2
SynapseML: com.microsoft.azure:synapseml_2.12:0.11.3
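For context, the job is set up roughly as sketched below. This is a minimal reconstruction, not the exact code: the column names, label column, and bit width are placeholders, and the quadratic terms passed via the pass-through args are elided just as they are in the description above.

```python
from pyspark.ml import Pipeline
from synapse.ml.vw import VowpalWabbitFeaturizer, VowpalWabbitClassifier

# Hash the 8-10 (mostly non-numeric) columns into a single feature vector.
# Column names are placeholders; numBits is an assumption -- with ~5.5M
# distinct hashed values, a small bit width would collide heavily.
featurizer = (VowpalWabbitFeaturizer()
    .setInputCols(["col_a", "col_b", "col_c"])   # placeholder column names
    .setOutputCol("features")
    .setNumBits(28))                             # assumed value, not the real one

# VW learner; the actual quadratic (-q) interactions are not shown in the issue.
vw = (VowpalWabbitClassifier()
    .setLabelCol("label")                        # placeholder label column
    .setFeaturesCol("features")
    .setPassThroughArgs("..."))                  # quadratic terms elided, as above

# df is the ~19B-row DataFrame; the fast run uses the raw columns,
# the slow run uses the hashed high-cardinality feature.
model = Pipeline(stages=[featurizer, vw]).fit(df)
```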