Adding CombineInputFileFormat; only a single use case so far #3
base: master
if (inputFile == null) {
for (Path inputPath : inputPaths) {
inputFile = CalvalusProductIO.copyFileToLocal(inputPath, getConfiguration());
setInputFile(inputFile);
if (inputFile == null) {
setInputFile(inputFile);
}
}
Here, the logic seems to have changed unintentionally. The null test used to come before the second assignment from copyFileToLocal; now it comes after that assignment, so it will never be true.
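A minimal sketch of the ordering the comment asks for, assuming the intent is to copy a product to local disk only when no local input file has been set yet: the null test guards the copy instead of following it. `LocalInputResolver`, `getOrCopy` and the `copyToLocal` function (a stand-in for `CalvalusProductIO.copyFileToLocal`) are hypothetical, and how several input paths of a combined split should map to local files is left open here.

```java
import java.io.File;
import java.util.function.Function;

/** Hypothetical sketch: the null test guards the local copy, it does not follow it. */
class LocalInputResolver {
    // Hypothetical stand-in for CalvalusProductIO.copyFileToLocal(path, conf).
    private final Function<String, File> copyToLocal;
    private File inputFile;
    int copyCount = 0; // number of copies performed, for illustration only

    LocalInputResolver(Function<String, File> copyToLocal, File inputFile) {
        this.copyToLocal = copyToLocal;
        this.inputFile = inputFile;
    }

    /** Copies the product locally only if no input file has been set yet. */
    File getOrCopy(String inputPath) {
        if (inputFile == null) {                      // test first ...
            inputFile = copyToLocal.apply(inputPath); // ... then assign
            copyCount++;
        }
        return inputFile;
    }
}
```

Placed after the assignment, as in the diff, the same test only fires when the copy itself returned null.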
/**
 * @author thomas
 */
public class CombineFileInputFormat extends InputFormat {
Is this related to org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat?
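Hadoop's `org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat` packs many input files into fewer splits, bounded by a maximum split size and also taking node and rack locality into account. The Hadoop-free sketch below shows only the size-based packing idea, to help judge how far the class in this PR overlaps with it. `CombineSketch` and `FileEntry` are hypothetical names, and the greedy in-order packing is a simplification, not Hadoop's actual algorithm.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified, Hadoop-free sketch of packing small files into combined splits. */
class CombineSketch {
    /** A file path with its length in bytes. */
    static final class FileEntry {
        final String path;
        final long length;

        FileEntry(String path, long length) {
            this.path = path;
            this.length = length;
        }
    }

    /**
     * Greedily packs files, in the given order, into groups whose total length
     * stays within maxSplitSize; a file larger than maxSplitSize gets its own group.
     * Each group would become one combined split.
     */
    static List<List<FileEntry>> pack(List<FileEntry> files, long maxSplitSize) {
        List<List<FileEntry>> splits = new ArrayList<>();
        List<FileEntry> current = new ArrayList<>();
        long currentSize = 0;
        for (FileEntry file : files) {
            // Close the current group if this file would push it over the limit.
            if (!current.isEmpty() && currentSize + file.length > maxSplitSize) {
                splits.add(current);
                current = new ArrayList<>();
                currentSize = 0;
            }
            current.add(file);
            currentSize += file.length;
        }
        if (!current.isEmpty()) {
            splits.add(current);
        }
        return splits;
    }
}
```

With a 100-byte limit, files of 40, 50, 30 and 100 bytes end up in three splits: {40, 50}, {30} and {100}.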
 * Creates a single split from a given pattern
 */
@Override
public List<InputSplit> getSplits(JobContext context) throws IOException {
What about our other methods of determining inputs, in particular those using the geo-inventory? I know that PatternBasedInputFormat needs refactoring and decomposition, but I think the other ways of determining inputs are still required.
Thinking about how to refactor PatternBasedInputFormat, it may be good to separate the ways inputs are determined (geo-inventory, OpenSearch query, path pattern, ...) into different classes, since they take different parameters anyway; depending on which parameter is specified, the client could then select the right class automatically. We could then either derive a class for CombineFileSplit generation from each of them, or make combining a parameter. In either case, the old PatternBasedInputFormat could delegate its getSplits() call to the new implementations to keep backwards compatibility.
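One way the proposed decomposition could look, as a plain-Java sketch without Hadoop types: each way of determining inputs is a separate strategy class, the strategy is chosen from whichever parameter is present, combining is a boolean parameter rather than a subclass, and a facade standing in for the old PatternBasedInputFormat delegates getSplits() to the new pieces. All class names and the parameter keys `inputPattern`, `geoInventory` and `combineInputs` are hypothetical; the geo-inventory class is only a placeholder, an OpenSearch strategy would be one more implementation of the same interface, and splits are modelled as lists of path strings rather than Hadoop InputSplit objects.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical strategy: one implementation per way of determining inputs. */
interface InputResolver {
    List<String> resolve(Map<String, String> params, List<String> candidates);
}

/** Path-pattern strategy: keeps candidates matching the glob in "inputPattern". */
class PathPatternResolver implements InputResolver {
    @Override
    public List<String> resolve(Map<String, String> params, List<String> candidates) {
        PathMatcher matcher =
                FileSystems.getDefault().getPathMatcher("glob:" + params.get("inputPattern"));
        List<String> selected = new ArrayList<>();
        for (String candidate : candidates) {
            if (matcher.matches(Paths.get(candidate))) {
                selected.add(candidate);
            }
        }
        return selected;
    }
}

/** Placeholder for the geo-inventory strategy; the real lookup is not sketched. */
class GeoInventoryResolver implements InputResolver {
    @Override
    public List<String> resolve(Map<String, String> params, List<String> candidates) {
        throw new UnsupportedOperationException("geo-inventory lookup not sketched");
    }
}

final class InputResolvers {
    private InputResolvers() {}

    /** Selects a strategy from whichever parameter the client specified. */
    static InputResolver select(Map<String, String> params) {
        if (params.containsKey("inputPattern")) {
            return new PathPatternResolver();
        }
        if (params.containsKey("geoInventory")) {
            return new GeoInventoryResolver();
        }
        throw new IllegalArgumentException("no supported input parameter given");
    }

    /** Combining as a parameter: one split for all inputs, or one split per input. */
    static List<List<String>> toSplits(List<String> inputs, boolean combine) {
        List<List<String>> splits = new ArrayList<>();
        if (combine) {
            if (!inputs.isEmpty()) {
                splits.add(new ArrayList<>(inputs));
            }
        } else {
            for (String input : inputs) {
                splits.add(Collections.singletonList(input));
            }
        }
        return splits;
    }
}

/** Stands in for the old PatternBasedInputFormat delegating getSplits() to the new classes. */
class LegacyInputFormatFacade {
    List<List<String>> getSplits(Map<String, String> params, List<String> candidates) {
        InputResolver resolver = InputResolvers.select(params);
        boolean combine = Boolean.parseBoolean(params.getOrDefault("combineInputs", "false"));
        return InputResolvers.toSplits(resolver.resolve(params, candidates), combine);
    }
}
```

With `inputPattern=data/2008/*.N1`, the facade yields one split per matching file; adding `combineInputs=true` yields a single split holding all matches, which corresponds to the single-split behaviour of the class in this PR.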
No description provided.