HPCC-30911 Store read and write cost as file attributes #18080
Conversation
https://track.hpccsystems.com/browse/HPCC-30911
Branch updated from 75bdb0a to d00eb36.
@shamser - looks good in general, mostly 'future' comments, but 1 issue that needs fixing.
@shamser - looks good, aside from opening a related JIRA for : #18080 (comment)
Please add or link the JIRA and squash the commits.
@shamser generally looks good. A couple of bits of code could be commoned up, but may not be worth it for this PR.
One concern: whether the correct cluster is used when calculating read/write costs, and whether legacy costs for indexes may be wild over-estimates.
// (Costs needs to be calculated from numDiskReads and numDiskWrites for legacy files)
numDiskWrites = attrs->getPropInt64(getDFUQResultFieldName(DFUQRFnumDiskWrites));
numDiskReads = attrs->getPropInt64(getDFUQResultFieldName(DFUQRFnumDiskReads));
doLegacyAccessCostCalc = true;
Possibly only for files, not indexes if the index stats are currently wildly wrong.
dali/ft/filecopy.cpp
Outdated
    readCost = money2cost_type(calcFileAccessCost(distributedSource, 0, totalReads));
}
else
    readCost = money2cost_type(calcFileAccessCost(distributedSource, 0, totalNumReads));
question: Is this tied down to a single cluster by this point? It would be worth defining an expensive plane and a cheap plane in our standard config, and then checking that the costs are recorded correctly depending on the source.
The decision to assume that the source was from the first cluster was made as part of HPCC-27388. I recall there was some discussion, and it was agreed that assuming the first cluster would be good enough. The basis for that decision was the assumption that the cost of each plane would be the same; we can see that this assumption may need to be revisited. However, this PR was supposed to be about storing the read/write cost to ensure that the keyed join read cost calculations are not overwritten by the legacy dynamic calculations. I think it would be worth considering this issue in a separate JIRA: https://track.hpccsystems.com/browse/HPCC-30946
Agreed. I think that at some point the clusters are reordered so that the "best" cluster comes first - but I'm not sure if that will be done at this point. Something to investigate further.
ecl/hthor/hthor.cpp
Outdated
if (!hasReadWriteCostFields(&fileAttr) && fileAttr.hasProp(getDFUQResultFieldName(DFUQRFnumDiskReads)))
{
    // Legacy file: calculate readCost using prev disk reads and new disk reads
    stat_type prevDiskReads = fileAttr.getPropInt64(getDFUQResultFieldName(DFUQRFnumDiskReads), 0);
Similar code is repeated. Consider creating a helper function
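The shape of such a helper might look like the following. This is only a sketch with stand-in types and a made-up flat-rate price; the real calcFileAccessCost/money2cost_type implementations live in the HPCC platform sources, and the helper name here is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins for the HPCC types; the real definitions are in the platform headers.
using stat_type = std::uint64_t;
using cost_type = std::uint64_t;

// Hypothetical flat-rate price per disk access, purely for illustration.
constexpr double accessPrice = 0.5;

// Simplified stand-in for calcFileAccessCost(source, numWrites, numReads).
static double calcFileAccessCost(double numWrites, double numReads)
{
    return accessPrice * (numWrites + numReads);
}

// Stand-in for money2cost_type: money to fixed-point cost units (scale assumed).
static cost_type money2cost_type(double money)
{
    return static_cast<cost_type>(money * 100.0);
}

// The kind of helper the comment suggests: one place that derives the
// legacy read cost from a previously recorded numDiskReads value.
static cost_type calcLegacyReadCost(stat_type prevDiskReads)
{
    return money2cost_type(calcFileAccessCost(0, prevDiskReads));
}
```

Callers in hthor.cpp and thgraphmaster.cpp could then share one such helper instead of each repeating the read-attribute/convert steps.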
thorlcr/graph/thgraphmaster.cpp
Outdated
{
    stat_type curDiskReads = stats.getStatisticSum(StNumDiskReads);
    StringBuffer clusterName;
    file->getClusterName(0, clusterName);
This should be whichever cluster it was actually read from - it is likely to make a difference in the future.
To revisit this: https://track.hpccsystems.com/browse/HPCC-30946
thorlcr/graph/thgraphmaster.cpp
Outdated
diskAccessCost = money2cost_type(calcFileAccessCost(clusterName, numDiskWrites, 0));
cost_type writeCost = money2cost_type(calcFileAccessCost(clusterName, numDiskWrites, 0));
props.setPropInt64(getDFUQResultFieldName(DFUQRFwriteCost), writeCost);
diskAccessCost += writeCost;
I think this will over-accumulate the costs. Should be
diskAccessCost = writeCost
otherwise each time it is called all previous costs will be added again.
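A tiny sketch of the accumulation concern, using hypothetical names: when the write cost is recomputed from the file's cumulative totals on each call, plain assignment is idempotent, while `+=` re-adds all previous cost on every call.

```cpp
#include <cassert>
#include <cstdint>

using cost_type = std::uint64_t;

// Hypothetical stand-in for the stats touched by updateFileWriteCostStats.
struct FileCosts
{
    cost_type diskAccessCost = 0;

    // totalWriteCost is derived from the cumulative numDiskWrites total,
    // so overwriting is correct; accumulating would double-count.
    void updateWriteCost(cost_type totalWriteCost)
    {
        diskAccessCost = totalWriteCost;   // correct: idempotent
        // diskAccessCost += totalWriteCost; // buggy: over-accumulates on repeat calls
    }
};
```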
I believe updateFileWriteCostStats is only ever called once. However, I've changed to set the diskAccessCost. (This is likely to change shortly as part of https://track.hpccsystems.com/browse/HPCC-30937 )
@shamser - noticed a couple of issues which would be best addressed before merged.
@shamser changes look good.
What other jiras need to be opened to ensure that these attributes are available and displayed in eclwatch? They should be linked to the jira for this issue. (I have linked HPCC-30946)
We should also think about how we can regression test these features.
The accessCost is being displayed correctly. This jira is to show read and write cost attributes in eclwatch: https://track.hpccsystems.com/browse/HPCC-30930
@shamser - please see comments.
dali/base/dadfs.cpp
Outdated
    accessCost = legacyReadCost + calcFileAccessCost(nodeGroup, numDiskWrites, 0);
}
double atRestCost = calcFileAtRestCost(nodeGroup, sizeGB, fileAgeDays);
file->setPropReal(getDFUQResultFieldName(DFUQRFcost), atRestCost+accessCost);
file->setPropReal(getDFUQResultFieldName(DFUQRFatRestCost), atRestCost);
file->setPropReal(getDFUQResultFieldName(DFUQRFaccessCost), accessCost);
hm, baffled, I wrote a comment on last review submission, but can't find it now..
As the code stands (in this PR), "readCost" and "writeCost" are published as top-level file attributes in cost_type format
e.g.:
cost_type curReadCost = money2cost_type(calcFileAccessCost(file, 0, curDiskReads));
file->addAttrValue(getDFUQResultFieldName(DFUQRFreadCost), legacyReadCost + curReadCost);
But then here, in the attributes seen by clients using getDFAttributesTreeIterator, it adds "cost", "atRestCost" and "accessCost" as reals (converted via cost_type2money).
The net result is that all these attributes will be visible to a caller of getDFAttributesTreeIterator, but some will be in one format and some will be in another.
We should make sure they are consistently in the same format.
I'm not sure it matters which format as long as consistent (I suspect ideally they'd remain as cost_type, until a client usage case needs them).
Either change the new "readCost" "writeCost" attributes so they're published in "money" format, and change calculations accordingly.
Or, keep them as cost_type, but then existing code which expects them as 'money' (e.g. getCost) and the esp service code, would need to change.
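To illustrate the consistency point: an attribute round-trips cleanly only if writer and reader agree on its representation. A minimal sketch with an assumed fixed-point scale (the real cost_type scaling in the platform may differ):

```cpp
#include <cassert>
#include <cstdint>
#include <cmath>

using cost_type = std::uint64_t;

// Assumed scale: 1 money unit == 1,000,000 cost units (illustrative only).
constexpr double COST_SCALE = 1000000.0;

static cost_type money2cost_type(double money)
{
    return static_cast<cost_type>(std::llround(money * COST_SCALE));
}

static double cost_type2money(cost_type c)
{
    return static_cast<double>(c) / COST_SCALE;
}

// Publishing "readCost" as cost_type but reading it back as money (or vice
// versa) silently mis-scales the value by COST_SCALE; converting at exactly
// one agreed boundary keeps readCost/writeCost consistent with accessCost.
```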
dali/base/dadfs.cpp
Outdated
cost_type legacyReadCost = getLegacyReadCost(*file, nodeGroup);
__int64 numDiskWrites = file->getPropInt64(getDFUQResultFieldName(DFUQRFnumDiskWrites), 0);
accessCost = legacyReadCost + calcFileAccessCost(nodeGroup, numDiskWrites, 0);
shouldn't this be:
accessCost = cost_type2money(legacyReadCost) + calcFileAccessCost(nodeGroup, numDiskWrites, 0);
?
@shamser - mostly trivial comments that would not stop this being merged,
but 1 comment worth discussing.
dali/base/dadfs.cpp
Outdated
@@ -13434,6 +13434,8 @@ IPropertyTreeIterator *deserializeFileAttrIterator(MemoryBuffer& mb, unsigned nu
void setCost(IPropertyTree* file, const char *nodeGroup)
{
    // Set the following dynamic fields: atRestCost, accessCost, cost and for legacy files: readCost, writeCost
    // Dyamically calc and set the atRest cost field
is this 2nd comment line helpful? i.e. isn't it covered in 1st 'atRestCost'?
dali/base/dadfs.cpp
Outdated
// Costs need to be calculated from numDiskReads and numDiskWrites for legacy files
numDiskWrites = attrs->getPropInt64(getDFUQResultFieldName(DFUQRFnumDiskWrites));
doLegacyAccessCostCalc = true;
// But legacy index files numDiskReads is likely to be widely inaccurate, so don't use it
trivial: I guess "numDiskReads is likely to be widely inaccurate" is not quite accurate, numDiskReads should be correct, it's that costs can't be reliably calculated using it in the context of indexes.
Perhaps rephrase to:
// NB: Costs of index reading can not be reliably estimated based on 'numDiskReads'
dali/base/dadfs.hpp
Outdated
inline cost_type getLegacyReadCost(const IPropertyTree & fileAttr, Source source)
{
    // Legacy files do not have @readCost attribute, so calculate from numDiskRead
    // But don't try to calc legacy cost if index file because numDiskReads is likely to be inaccurate
see prev. comment, perhaps:
// NB: Costs of index reading can not be reliably estimated based on 'numDiskReads'
dali/base/dadfs.cpp
Outdated
double atRestCost, accessCost;
calcFileCost(cluster, 0, 0, numDiskWrites, numDiskReads, atRestCost, accessCost);
return accessCost;
Owned<IPropertyTree> costPT = getCostPropTree(cluster);
trivial: could be Owned<const IPropertyTree>
dali/base/dadfs.cpp
Outdated
extern da_decl double calcFileAtRestCost(const char * cluster, double sizeGB, double fileAgeDays)
{
    Owned<IPropertyTree> costPT = getCostPropTree(cluster);
trivial: could be Owned<const IPropertyTree>
dali/base/dadfs.hpp
Outdated
    && !isFileKey(fileAttr))
{
    stat_type prevDiskReads = fileAttr.getPropInt64(getDFUQResultFieldName(DFUQRFnumDiskReads), 0);
    return  money2cost_type(calcFileAccessCost(source, 0, prevDiskReads));
trivial/formatting: double space after 'return'
dali/base/dadfs.hpp
Outdated
if (!hasReadWriteCostFields(fileAttr) && fileAttr.hasProp(getDFUQResultFieldName(DFUQRFnumDiskWrites)))
{
    stat_type prevDiskWrites = fileAttr.getPropInt64(getDFUQResultFieldName(DFUQRFnumDiskWrites), 0);
    return  money2cost_type(calcFileAccessCost(source, prevDiskWrites, 0));
trivial/formatting: double space after 'return'
@@ -3771,7 +3774,13 @@ void FileSprayer::updateTargetProperties()
if (distributedSource)
{
    if (distributedSource->querySuperFile()==nullptr)
not new, but what if it is a super being read?
Also not new, but it's misleading that this code to update distributedSource is in a method called "updateTargetProperties". For clarity (now in this PR), I think it should be moved outside of it.
If it is a superfile, I believe it uses the aggregate of the cost from all the subfiles.
per offline conversation - to be investigated.
What happens when 'distributedSource' is a superfile (is it even supported?). If it is, what happens (if anything currently) re. updating cost attributes on the subfiles read.
@shamser - I've opened a subtask for this (HPCC-30970), under HPCC-30965
lFile->setCost(file.getPropReal(getDFUQResultFieldName(DFUQRFcost)));
{
    cost_type cost = file.getPropInt64(getDFUQResultFieldName(DFUQRFcost));
    lFile->setCost(cost_type2money(cost));
note to myself - there is no backward compatibility issue here, because DFUQRFcost has been dynamically calculated in this version, via getDFAttributesTreeIterator
thorlcr/graph/thgraphmaster.cpp
Outdated
@@ -649,6 +649,18 @@ void CMasterActivity::done()
void CMasterActivity::updateFileReadCostStats()
{
    // Returns updates numDiskReads & readCost in the file attributes and returns the readCost
should say
// Updates numDiskReads & readCost in the file attributes and returns the readCost
?
@shamser - looks good, please squash.
Signed-off-by: Shamser Ahmed <[email protected]>
@shamser - looks good.
}
file->setPropInt64(getDFUQResultFieldName(DFUQRFaccessCost), accessCost);
// Dymically calc and set the total cost field
trivial/typo; Dymically -> Dynamically
RE: What happens when 'distributedSource' is a superfile (is it even supported?). If it is, what happens (if anything currently) re. updating cost attributes on the subfiles read. Initial investigation shows that costs and other stats are not being updated when the distributedSource is a superfile. I created a ticket to work on this: https://track.hpccsystems.com/browse/HPCC-30973