Not sure if this is a Faunus issue or a Titan issue (or something else entirely). I've been getting highly inconsistent edge counts with Faunus on a large graph, as well as many timeout exceptions when writing a Titan+Cassandra graph to a sequence file. Having read some mailing list posts and other odds and ends, I ended up adding this setting:
cassandra.range.batch.size=256
and all my timeout problems went away. However, with that setting in place, I get highly inconsistent edge counts: on this particular graph, the out-edge count exceeds the in-edge count by nearly half a billion.
When I remove the setting, I get TimedOutExceptions:
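For context, the setting went into my Faunus job's properties file. A sketch of the relevant fragment (the file name is illustrative; the property itself comes from Cassandra's Hadoop `ConfigHelper.setRangeBatchSize`):

```properties
# faunus.properties (excerpt; file name illustrative)
# Batch size for Cassandra's Hadoop input (ConfigHelper.setRangeBatchSize):
# how much each get_range_slices request pulls back per Thrift call.
cassandra.range.batch.size=256
```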
java.lang.RuntimeException: TimedOutException()
at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.maybeInit(ColumnFamilyRecordReader.java:384)
at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:390)
at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:313)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.getProgress(ColumnFamilyRecordReader.java:103)
at com.thinkaurelius.faunus.formats.titan.cassandra.TitanCassandraRecordReader.getProgress(TitanCassandraRecordReader.java:70)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.getProgress(MapTask.java:513)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:538)
at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: TimedOutException()
at org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:12932)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
at org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:734)
at org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:718)
at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.maybeInit(ColumnFamilyRecordReader.java:346)
... 17 more
Interestingly, cassandra.range.batch.size seems to behave more like a cap than a batch size: when I set it to 256, the degree distribution shows a maximum of roughly that value, yet I've confirmed with regular Gremlin that there are definitely vertices with edge counts exceeding 256.
If it behaved as a true batch size, it would solve my timeout problems. Anyway, I could be way off base on this; I'm just reporting what I'm seeing, as I'm feeling really stuck right now.
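The cap hypothesis is easy to model. Here is a toy Python sketch (entirely illustrative; it uses no Cassandra or Titan code, and the graph shape and numbers are made up) showing how truncating each vertex's row at 256 entries makes the total out-edge and in-edge counts diverge, while a full scan keeps them equal:

```python
import random

random.seed(42)
CAP = 256  # mirrors the cassandra.range.batch.size value from above

# Build a toy directed graph: a few "hub" vertices with degree >> CAP,
# and many low-degree vertices. All shapes/sizes here are arbitrary.
n = 1000
edges = []
for hub in range(10):
    # each hub points at 1000 random targets, so its degree exceeds CAP
    edges += [(hub, random.randrange(n)) for _ in range(1000)]
for v in range(10, n):
    edges += [(v, random.randrange(n)) for _ in range(5)]

# Store each vertex's adjacency as one "row" holding both directions,
# loosely like Titan keeping a vertex's in- and out-edges in its row.
rows = {v: [] for v in range(n)}
for s, t in edges:
    rows[s].append(("out", t))
    rows[t].append(("in", s))

def count(truncate):
    """Count out/in entries, optionally capping each row at CAP entries."""
    out_n = in_n = 0
    for row in rows.values():
        cols = row[:CAP] if truncate else row
        out_n += sum(1 for d, _ in cols if d == "out")
        in_n += sum(1 for d, _ in cols if d == "in")
    return out_n, in_n

print("full scan:     out=%d in=%d" % count(False))  # counts match
print("capped at 256: out=%d in=%d" % count(True))   # counts diverge
```

In this toy layout the counts diverge because high-degree rows lose most of their entries, while the matching reverse entries, spread across many small rows, survive the cap. Which direction the skew goes would depend on how edges are ordered within a row, so this only illustrates that a per-row cap produces an in/out mismatch, not the specific half-billion gap above.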
Interestingly, if I drop the setting, the Faunus job fails, but at the point of failure the discrepancy between in- and out-edge counts is far smaller.
Referencing this issue, as it also seems to show cassandra.range.batch.size acting like a cap:
#99
Using Titan 0.3.1, btw.