HPCC-28555 Add blocked reader for unfiltered serial index reading #17551
@@ -360,6 +360,7 @@ class CRegistryServer : public CSimpleInterface
msg.append(THOR_VERSION_MAJOR).append(THOR_VERSION_MINOR);
processGroup->serialize(msg);
globals->serialize(msg);
getGlobalConfigSP()->serialize(msg);
this and related changes are so that workers have access to global config, and in particular the planes.
https://track.hpccsystems.com/browse/HPCC-28555
The commit title suggests this is for serial index reads, but my reading of the code is that it will be used for random index reads too.
roxie/ccd/ccdfile.cpp
Outdated
current.setown(f->open(IFOread));
if (blockedIndexIOSize)
There doesn't appear to be any code to use the plane settings in Roxie?
That's true, mainly because I thought the use case tends to be different in Roxie, where it is unlikely to have some indexes on a fast/non-blob plane and others on a slow/blob plane, and therefore a per-plane configuration setting would not be needed as much.
now added
system/jlib/jfile.cpp
Outdated
}

static constexpr size32_t defaultBlockedIndexIOKB = 1024;
size32_t getBlockedIndexIOSize(const char *planeName, int configUseBlockedIndexIO, size32_t configBlockedIndeIOSize)
Spelling: configBlockedIndeIOSize should be configBlockedIndexIOSize
will correct
system/jlib/jfile.cpp
Outdated
blockedSize = configBlockedIndeIOSize;
if (0 == blockedSize)
{
// could cache, but not sure it's worth it
Feels like it might be worth it to cache, if Roxie ever called this - it's opening a lot of index files.
caching added.
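A minimal sketch of the kind of caching discussed above, so that a component opening many index files does not repeat the per-plane lookup each time. The class name `PlaneBlockSizeCache` and the injected lookup callback are hypothetical and are not taken from the PR; the real code presumably consults the plane configuration directly.

```cpp
#include <cstdint>
#include <functional>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical cache of per-plane blocked I/O sizes (illustrative only, not the jlib code).
class PlaneBlockSizeCache
{
public:
    // Callback that performs the (assumed expensive) configuration lookup for a plane.
    using Lookup = std::function<uint32_t(const std::string &)>;

    explicit PlaneBlockSizeCache(Lookup _lookup) : lookup(std::move(_lookup)) {}

    // Returns the block size for the plane, calling the lookup only on the first request.
    uint32_t get(const std::string &planeName)
    {
        std::lock_guard<std::mutex> guard(lock); // held across the lookup to keep the sketch simple
        auto it = cache.find(planeName);
        if (it != cache.end())
            return it->second;
        uint32_t size = lookup(planeName);
        cache.emplace(planeName, size);
        return size;
    }

private:
    Lookup lookup;
    std::mutex lock;
    std::unordered_map<std::string, uint32_t> cache;
};
```

A cache like this would need invalidating if the configuration can change at run time, which this sketch does not attempt.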
@@ -2046,9 +2047,15 @@ class CLazyFileIO : public CInterfaceOf<IFileIO>
 return iFileIO.getClear();
 }
 public:
-CLazyFileIO(CFileCache &_cache, const char *_filename, const char *_id, IActivityReplicatedFile *_repFile, bool _compressed, IExpander *_expander, const StatisticsMapping & _statMapping=diskLocalStatistics)
-: cache(_cache), filename(_filename), id(_id), repFile(_repFile), compressed(_compressed), expander(_expander), fileStats(_statMapping)
+CLazyFileIO(CFileCache &_cache, const char *_filename, const char *_id, IActivityReplicatedFile *_repFile, bool _compressed, IExpander *_expander, const StatisticsMapping & _statMapping, size32_t _blockedFileIOSize)
This looks like it will potentially be used for non-index files too. If so, some names may be inappropriate. But not sure you really want to use for non-index files except possibly in FETCH activities, as you are probably already using large buffers anyway.
I hadn't considered the fetch case, but the only use case (in hthor and Thor) is for index reads, where the file is expected to potentially be read serially.
The name here (in the CLazyFileIO ctor and factory) is "blockedFileIOSize", so it is not specific to indexes, and I think the naming is ok(?), although it may never be used for anything but indexes.
only used/passed through for [unfiltered] index reads at the moment.
@@ -7643,3 +7643,110 @@ IAPICopyClient * createApiCopyClient(IStorageApiInfo * source, IStorageApiInfo *
}
return nullptr;
}
I did not review this bit in detail as it looks like it's copied from my prior PR.
yep, your original commit is the 1st of the 2 commits in this PR, I made minor changes only to CBlockedFileIO
I'm not convinced this is quite right. There are two issues:
i) The size that is read when reading something from an index
ii) The size of reads when reading unfiltered indexes
It is not clear what effect (i) will have on random reading. For branches it would be good, but they tend to get loaded quickly and stay in memory. We would need some significant tests before we were convinced it was worthwhile. It may also have a significant memory impact if multiple parts are open at once.
Really (ii) is a special case of "what is the most efficient size to read from this storage". Possibly this should be generalised so that there is a blockedReadSize on a plane, used for both unfiltered indexes and files.
I'll finish reviewing code and add any specific comments.
system/jlib/jfile.cpp
Outdated
return io->read(pos, len, data);
if (!blockedFileIOBuffer)
{
blockedFileIOBuffer = malloc(blockSize);
This has problems if there are multiple CBlockedFileIO objects with different blockSizes. Currently it could corrupt memory. If a thread-local size were used instead you could prevent the corruption, but the block size would then be determined by whichever file was first read on that thread.
I think the idea may be fundamentally flawed for multithreaded access.
9d59f2b to 2f26e7d
9a00180 to 11d18c7
@jakesmith generally looks good. A few minor comments.
system/jlib/jptree.cpp
Outdated
@@ -8878,7 +8878,7 @@ void executeConfigUpdaterCallbacks()
 {
 if (!configFileUpdater) // NB: executeConfigUpdaterCallbacks should always be called after configFileUpdater is initialized
 return;
-configFileUpdater->executeCallbacks(componentConfiguration.getLink(), globalConfiguration.getLink());
+configFileUpdater->executeCallbacks(componentConfiguration, globalConfiguration);
This looks like a separate fix for a leak. Should it be in a separate PR? (It depends if it also affects 8.12.x)
you're right, it is a separate leak fix, and it does affect 8.12.x.
I'll release a separate JIRA/PR: https://track.hpccsystems.com/browse/HPCC-29915
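To illustrate why passing the result of getLink() can leak, here is a toy sketch. It assumes the callee only borrows its arguments and never releases them, which the thread implies but does not show. `Counted`, `borrow` and `leakedRefsAfterCall` are hypothetical stand-ins, not jlib types.

```cpp
#include <cassert>

// Toy reference-counted object (illustrative only, not the jlib implementation).
class Counted
{
public:
    Counted *getLink() { ++refs; return this; } // the caller now owns one extra reference
    void release() { assert(refs > 0); --refs; }
    int refCount() const { return refs; }

private:
    int refs = 1; // the reference held by the original owner
};

// A callee that only borrows its argument and never calls release().
inline void borrow(Counted *) {}

// Returns how many references are left over that nobody will ever release.
inline int leakedRefsAfterCall(bool passLink)
{
    Counted config;
    borrow(passLink ? config.getLink() : &config);
    return config.refCount() - 1;
}
```

With `passLink` set, one reference is left outstanding after the call; passing the plain pointer leaves none.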
thorlcr/graph/thgraphslave.cpp
Outdated
if (blockedFileIOSize) // enabled
{
if (compressed || expander)
throw makeStringExceptionV(0, "CLazyFileIO(%s): blockedFileIO cannot be used with compressed files", filename.get());
This should be ignored rather than throwing an error - in case we start using that default for all sequential file access.
will change.
@jakesmith hasn't been changed.
hm, I made the change somewhere, I'll redo.
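The behaviour requested in this thread (ignore the blocked setting for compressed files instead of throwing) could look roughly like the sketch below. The helper name `effectiveBlockedIOSize` and its parameters are hypothetical; the actual change lives inside the CLazyFileIO code and may be structured differently.

```cpp
#include <cstdint>

// Hypothetical helper: returns the block size to use, where 0 means plain unblocked reads.
constexpr uint32_t effectiveBlockedIOSize(uint32_t requestedSize, bool compressed, bool hasExpander)
{
    // Blocked I/O does not apply to compressed/expanded files, so quietly fall back
    // rather than raising an error. This keeps a global default safe to enable.
    return (compressed || hasExpander) ? 0 : requestedSize;
}

// Compile-time checks of the intended behaviour.
static_assert(effectiveBlockedIOSize(1024 * 1024, false, false) == 1024 * 1024, "uncompressed: use blocked I/O");
static_assert(effectiveBlockedIOSize(1024 * 1024, true, false) == 0, "compressed: setting is ignored");
static_assert(effectiveBlockedIOSize(1024 * 1024, false, true) == 0, "expander: setting is ignored");
```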
@@ -7579,3 +7589,103 @@ IAPICopyClient * createApiCopyClient(IStorageApiInfo * source, IStorageApiInfo *
}
return nullptr;
}


class CBlockedFileIO : public CSimpleInterfaceOf<IFileIO>
Should have a comment that this is not thread safe, so can only be used from a single thread.
will add comment
system/jlib/jfile.cpp
Outdated
}
virtual size32_t read(offset_t pos, size32_t len, void *data) override
{
if (len > blockSize || pos+len > fileSize || len==0)
Do you need the fileSize checking? It means that you may perform an extra stat call which will add latency if it is on blob storage. It should be possible to change the code so it copes with reading at the end of the file.
Also len==0 test is not needed.
agreed, refactored to avoid need for fileSize
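As a rough illustration of a blocked read that needs no file size (and so no extra stat call), the sketch below relies on the underlying read returning a short count at end of file. It is not the refactored CBlockedFileIO code; `BlockedReader` and `RawRead` are hypothetical names. Each instance owns a buffer sized to its own block size, which also avoids the shared-buffer problem raised earlier in the review.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for the underlying file: reads up to len bytes at pos and
// returns the number of bytes actually read (fewer, or 0, at end of file).
using RawRead = std::function<uint32_t(uint64_t pos, uint32_t len, void *data)>;

// Not thread safe: an instance must only be used from a single thread.
class BlockedReader
{
public:
    BlockedReader(RawRead _raw, uint32_t _blockSize)
        : raw(std::move(_raw)), blockSize(_blockSize), buffer(_blockSize)
    {
        assert(blockSize > 0);
    }

    uint32_t read(uint64_t pos, uint32_t len, void *data)
    {
        if (len > blockSize)
            return raw(pos, len, data); // larger than a block: bypass the buffer
        char *out = static_cast<char *>(data);
        uint32_t copied = 0;
        while (copied < len)
        {
            uint64_t cur = pos + copied;
            uint64_t blockStart = cur - (cur % blockSize);
            if (!valid || blockStart != bufferStart)
            {
                // Fetch the whole aligned block; a short count marks end of file.
                bufferLen = raw(blockStart, blockSize, buffer.data());
                bufferStart = blockStart;
                valid = true;
            }
            uint64_t offset = cur - blockStart;
            if (offset >= bufferLen)
                break; // past the data in a short (final) block
            uint32_t n = static_cast<uint32_t>(std::min<uint64_t>(len - copied, bufferLen - offset));
            std::memcpy(out + copied, buffer.data() + offset, n);
            copied += n;
        }
        return copied;
    }

private:
    RawRead raw;
    uint32_t blockSize;
    std::vector<char> buffer; // owned per instance, sized to this instance's block size
    uint64_t bufferStart = 0;
    uint32_t bufferLen = 0;
    bool valid = false;
};
```

Small sequential reads within the same block are served from the buffer, so the number of requests to the storage is roughly the file size divided by the block size.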
0e866b8 to c652d32
@ghalliday - please review new commit changes.
@jakesmith one remaining change. Please squash after fixing so it is ready for merge.
thorlcr/graph/thgraphslave.cpp
Outdated
if (blockedFileIOSize) // enabled
{
if (compressed || expander)
throw makeStringExceptionV(0, "CLazyFileIO(%s): blockedFileIO cannot be used with compressed files", filename.get());
@jakesmith hasn't been changed.
Allow indexes to be read with a large granularity. This is particularly important if reading from storage that charges per read access. Signed-off-by: Jake Smith <[email protected]>
c652d32 to 87d12a6
addressed and squashed.
@jakesmith better not to also rebase at the same time as squashing - it makes it hard to spot the changes.
Allow indexes to be read with a large granularity.
This is particularly important if reading from storage that
charges per read access.
Signed-off-by: Jake Smith [email protected]