Not sure if this is a Faunus issue or a Titan issue (or something else entirely). I've been getting highly inconsistent edge counts with Faunus on a large graph, as well as many timeout exceptions when writing a Titan+Cassandra graph to a sequence file. Having read some mailing list posts and other odds and ends, I ended up adding this setting:
cassandra.range.batch.size=256
and all my timeout problems went away. However, with that setting in place, I get highly inconsistent edge counts: on this particular graph, the out-edge count exceeds the in-edge count by nearly half a billion.
When I remove the setting, I get TimedOutExceptions:
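For context, the setting went into my Faunus job's properties file. A sketch of the relevant fragment (the file name is illustrative; the property itself comes from Cassandra's Hadoop `ConfigHelper.setRangeBatchSize`):

```properties
# faunus.properties (excerpt; file name illustrative)
# Batch size for Cassandra's Hadoop input (ConfigHelper.setRangeBatchSize):
# how much each get_range_slices request pulls back per Thrift call.
cassandra.range.batch.size=256
```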
java.lang.RuntimeException: TimedOutException()
at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.maybeInit(ColumnFamilyRecordReader.java:384)
at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:390)
at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:313)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.getProgress(ColumnFamilyRecordReader.java:103)
at com.thinkaurelius.faunus.formats.titan.cassandra.TitanCassandraRecordReader.getProgress(TitanCassandraRecordReader.java:70)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.getProgress(MapTask.java:513)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:538)
at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: TimedOutException()
at org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:12932)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
at org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:734)
at org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:718)
at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.maybeInit(ColumnFamilyRecordReader.java:346)
... 17 more
Interestingly, cassandra.range.batch.size seems to behave more like a cap than a batch size: when I set it to 256, the degree distribution shows a maximum of roughly that value, yet I've confirmed with regular Gremlin that there are definitely vertices with edge counts exceeding 256.
If it behaved as a true batch size, it would solve my timeout problems. Anyway, I could be way off base on this; I'm just reporting what I'm seeing, as I'm feeling really stuck right now.
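The cap hypothesis is easy to model. Here is a toy Python sketch (entirely illustrative; it uses no Cassandra or Titan code, and the graph shape and numbers are made up) showing how truncating each vertex's row at 256 entries makes the total out-edge and in-edge counts diverge, while a full scan keeps them equal:

```python
import random

random.seed(42)
CAP = 256  # mirrors the cassandra.range.batch.size value from above

# Build a toy directed graph: a few "hub" vertices with degree >> CAP,
# and many low-degree vertices. All shapes/sizes here are arbitrary.
n = 1000
edges = []
for hub in range(10):
    # each hub points at 1000 random targets, so its degree exceeds CAP
    edges += [(hub, random.randrange(n)) for _ in range(1000)]
for v in range(10, n):
    edges += [(v, random.randrange(n)) for _ in range(5)]

# Store each vertex's adjacency as one "row" holding both directions,
# loosely like Titan keeping a vertex's in- and out-edges in its row.
rows = {v: [] for v in range(n)}
for s, t in edges:
    rows[s].append(("out", t))
    rows[t].append(("in", s))

def count(truncate):
    """Count out/in entries, optionally capping each row at CAP entries."""
    out_n = in_n = 0
    for row in rows.values():
        cols = row[:CAP] if truncate else row
        out_n += sum(1 for d, _ in cols if d == "out")
        in_n += sum(1 for d, _ in cols if d == "in")
    return out_n, in_n

print("full scan:     out=%d in=%d" % count(False))  # counts match
print("capped at 256: out=%d in=%d" % count(True))   # counts diverge
```

In this toy layout the counts diverge because high-degree rows lose most of their entries, while the matching reverse entries, spread across many small rows, survive the cap. Which direction the skew goes would depend on how edges are ordered within a row, so this only illustrates that a per-row cap produces an in/out mismatch, not the specific half-billion gap above.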
Interestingly, if I drop the setting, the Faunus job fails, but at the point of failure the discrepancy between in- and out-edge counts is far smaller.
Referencing this issue, as it also seems to show cassandra.range.batch.size acting like a cap:
#99
Using Titan 0.3.1, btw.