Use Case 10: Discussion #10

Open
brianthomas opened this issue Jan 13, 2015 · 10 comments

@brianthomas
Member

I've tried to extract requirements from this Use Case, but I fear I may have misunderstood the intent. Currently I have extracted requirements 10 and 11 from this use case (they appear to me to be different requirements). Please review and give feedback on any needed changes or problems.

@migueldvb
Member

I think that it is a good idea to separate these two requirements in Use Case 10. We can also say that being able to select part of a dataset is important for implementing parallel I/O, where independent processes each access their own portion of the data. Perhaps this could be added to requirement 10 or to a new requirement that describes accessing a dataset in parallel.
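As a rough illustration of why partial reads matter for parallel access, here is a minimal sketch. It is not taken from the use case itself: the flat file of little-endian float64 records and the helper names `write_dataset`, `read_slice`, and `parallel_sum` are assumptions made up for this example, and threads stand in for the independent processes mentioned above to keep it short.

```python
import struct
from concurrent.futures import ThreadPoolExecutor

# Assumed layout: a flat file of little-endian float64 records.
RECORD = struct.Struct("<d")


def write_dataset(path, values):
    """Write the values as consecutive fixed-size records."""
    with open(path, "wb") as f:
        for v in values:
            f.write(RECORD.pack(v))


def read_slice(path, start, stop):
    """Read records [start, stop) without touching the rest of the file."""
    with open(path, "rb") as f:
        f.seek(start * RECORD.size)
        raw = f.read((stop - start) * RECORD.size)
    return [v for (v,) in RECORD.iter_unpack(raw)]


def parallel_sum(path, n_records, n_workers=4):
    """Sum the dataset by giving each worker its own contiguous slice."""
    step = max(1, -(-n_records // n_workers))  # ceiling division
    bounds = [(i, min(i + step, n_records)) for i in range(0, n_records, step)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Each worker opens the file itself and reads only its own slice.
        partials = pool.map(lambda b: sum(read_slice(path, *b)), bounds)
        return sum(partials)
```

Because every worker opens the file on its own and seeks straight to its slice, no worker depends on another having read the preceding data, which is the property a partial-read requirement would need to guarantee.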

@brianthomas
Member Author

I've tacked your parallel I/O wording onto Use Case 10, but it seems a little bolted on. Can you write a separate Use Case about parallel I/O access, perhaps around a large dataset scenario? I worry we are missing some important aspects of this functionality and of capturing requirements for large datasets in general. Note that I've also added a parallel I/O requirement.

@brianthomas
Member Author

Quick note that Requirement 12: Parallel I/O Support seems to overlap with Requirement 10: partial read of format. I'm not sure whether these are really different. I've linked them in the wiki so that the issue is highlighted, but opinions here on this matter would be good.

@telegraphic

To add to the discussion, agreed that Parallel I/O is an important requirement.

While parallel I/O would be useful for a lot of things, here's a specific use case for parallel write in radio astronomy: an FX correlator breaks up the cross-correlation into frequency subbands over several compute nodes. To reconstruct the full spectrum, each compute node needs to write its subband to a single file (or file-like object).

And for parallel read: a user wishes to image several subbands of a wide-bandwidth visibility dataset produced by a correlator. Data access should be parallelizable over both time and frequency, so that multiple parallel data reduction pipelines can be run at once on the same dataset.
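To make the two scenarios concrete, here is a minimal sketch under stated assumptions. Nothing in it comes from a real correlator or from the format under discussion: the time-major float64 layout, the helper names (`create_file`, `write_subband`, `write_all_subbands`, `read_selection`), and the use of threads in place of separate compute nodes are all invented for illustration.

```python
import struct
from concurrent.futures import ThreadPoolExecutor

ITEM = 8  # bytes per float64 sample (assumed sample type)


def create_file(path, n_time, n_chan):
    """Pre-size the shared file so every writer can seek to its own region."""
    with open(path, "wb") as f:
        f.truncate(n_time * n_chan * ITEM)


def write_subband(path, n_chan, chan0, block):
    """Write one subband; block holds one row of samples per time step."""
    # Unbuffered, so each write goes straight to its own byte range.
    with open(path, "r+b", buffering=0) as f:
        for t, row in enumerate(block):
            f.seek((t * n_chan + chan0) * ITEM)
            f.write(struct.pack(f"<{len(row)}d", *row))


def write_all_subbands(path, n_time, n_chan, subbands):
    """subbands maps a starting channel to that subband's block of rows."""
    create_file(path, n_time, n_chan)
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(write_subband, path, n_chan, c0, blk)
                   for c0, blk in subbands.items()]
        for fut in futures:
            fut.result()  # re-raise any error from a writer


def read_selection(path, n_chan, t0, t1, c0, c1):
    """Read time steps [t0, t1) and channels [c0, c1) only."""
    width = c1 - c0
    rows = []
    with open(path, "rb") as f:
        for t in range(t0, t1):
            f.seek((t * n_chan + c0) * ITEM)
            rows.append(list(struct.unpack(f"<{width}d", f.read(width * ITEM))))
    return rows
```

The writers never overlap because each subband owns a disjoint set of byte ranges, and `read_selection` can be called concurrently by several pipelines with different time and frequency windows, since each call uses its own file handle.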

@brianthomas
Member Author

@telegraphic Thank you. Those are good details. Do you think you could meld them into Use Case 10?

@migueldvb
Member

I think that it makes sense to have a separate use case for parallel I/O, because selecting part of a large dataset as described in Use Case 10 can have other important applications. The radio astronomy example for parallel data analysis is very good. I can write a use case for distributed data access that will be related to the new Requirement 12, and please feel free to add more details specific to the radio astronomy case.

@telegraphic

Use Case 17 is looking good!

@migueldvb
Member

Thanks @telegraphic, could you add the example of parallel data access in radio astronomy to Use Case 17?

@telegraphic

Added it in; feel free to edit as required.

@migueldvb
Member

Great, thank you!
