
HPCC-30911 Store read and write cost as file attributes #18080

Merged
merged 1 commit into hpcc-systems:candidate-9.4.x on Dec 7, 2023

Conversation

Contributor

@shamser shamser commented Nov 24, 2023

Type of change:

  • This change is a bug fix (non-breaking change which fixes an issue).
  • This change is a new feature (non-breaking change which adds functionality).
  • This change improves the code (refactor or other change that does not change the functionality)
  • This change fixes warnings (the fix does not alter the functionality or the generated code)
  • This change is a breaking change (fix or feature that will cause existing behavior to change).
  • This change alters the query API (existing queries will have to be recompiled)

Checklist:

  • My code follows the code style of this project.
    • My code does not create any new warnings from compiler, build system, or lint.
  • The commit message is properly formatted and free of typos.
    • The commit message title makes sense in a changelog, by itself.
    • The commit is signed.
  • My change requires a change to the documentation.
    • I have updated the documentation accordingly, or...
    • I have created a JIRA ticket to update the documentation.
    • Any new interfaces or exported functions are appropriately commented.
  • I have read the CONTRIBUTORS document.
  • The change has been fully tested:
    • I have added tests to cover my changes.
    • All new and existing tests passed.
    • I have checked that this change does not introduce memory leaks.
    • I have used Valgrind or similar tools to check for potential issues.
  • I have given due consideration to all of the following potential concerns:
    • Scalability
    • Performance
    • Security
    • Thread-safety
    • Cloud-compatibility
    • Premature optimization
    • Existing deployed queries will not be broken
    • This change fixes the problem, not just the symptom
    • The target branch of this pull request is appropriate for such a change.
  • There are no similar instances of the same problem that should be addressed
    • I have addressed them here
    • I have raised JIRA issues to address them separately
  • This is a user interface / front-end modification
    • I have tested my changes in multiple modern browsers
    • The component(s) render as expected

Smoketest:

  • Send notifications about my Pull Request position in Smoketest queue.
  • Test my draft Pull Request.

Testing:


@shamser shamser marked this pull request as ready for review November 24, 2023 14:51
@shamser shamser requested a review from jakesmith November 24, 2023 14:52
@shamser shamser force-pushed the issue30911 branch 3 times, most recently from 75bdb0a to d00eb36 on November 24, 2023 at 15:25
Member

@jakesmith jakesmith left a comment

@shamser - looks good in general, mostly 'future' comments, but 1 issue that needs fixing.

dali/base/dadfs.cpp (outdated, resolved)
dali/base/dadfs.cpp (resolved)
dali/base/dadfs.cpp (outdated, resolved)
dali/ft/filecopy.cpp (resolved)
dali/ft/filecopy.cpp (resolved)
thorlcr/graph/thgraphmaster.cpp (outdated, resolved)
Member

@jakesmith jakesmith left a comment

@shamser - looks good, aside from opening a related JIRA for: #18080 (comment)

Please add or link the JIRA and squash the commits.

Member

@ghalliday ghalliday left a comment

@shamser generally looks good. A couple of bits of code could be commoned up, but that may not be worth doing in this PR.
My concerns are that the correct cluster is used when calculating read/write costs, and that legacy costs for indexes may be wild over-estimates.

// (Costs needs to be calculated from numDiskReads and numDiskWrites for legacy files)
numDiskWrites = attrs->getPropInt64(getDFUQResultFieldName(DFUQRFnumDiskWrites));
numDiskReads = attrs->getPropInt64(getDFUQResultFieldName(DFUQRFnumDiskReads));
doLegacyAccessCostCalc = true;
Member

Possibly only for files, not indexes if the index stats are currently wildly wrong.

readCost = money2cost_type(calcFileAccessCost(distributedSource, 0, totalReads));
}
else
readCost = money2cost_type(calcFileAccessCost(distributedSource, 0, totalNumReads));
Member

question: Is this tied down to a single cluster by this point? Worth defining an expensive plane and a cheap plane in our standard config, and then checking that the costs are recorded correctly depending on the source.

Contributor Author

@shamser shamser Nov 30, 2023

The decision to assume that the source was from the first cluster was made as part of HPCC-27388. I recall that there was some discussion and it was agreed that it would be good enough to assume it was from the first cluster. I think the basis for that decision was the assumption that the cost of each plane would be the same. We can see that this assumption may need to be revisited. However, this PR was supposed to be about storing the read/write cost to ensure that the keyed join read cost calculations are not overwritten by the legacy dynamic calculations. I think it would be worth considering this issue in a separate jira: https://track.hpccsystems.com/browse/HPCC-30946

Member

Agreed. I think that at some point the clusters are reordered so that the "best" cluster comes first - but I'm not sure if that will be done at this point. Something to investigate further.

if (!hasReadWriteCostFields(&fileAttr) && fileAttr.hasProp(getDFUQResultFieldName(DFUQRFnumDiskReads)))
{
// Legacy file: calculate readCost using prev disk reads and new disk reads
stat_type prevDiskReads = fileAttr.getPropInt64(getDFUQResultFieldName(DFUQRFnumDiskReads), 0);
Member

Similar code is repeated. Consider creating a helper function
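
For reference, a minimal sketch of what such a helper could look like, pieced together from the getLegacyReadCost fragments quoted later in this review (the guard, the attribute names and the calcFileAccessCost call are taken from those fragments; treat it as illustrative rather than the final merged code):

    inline cost_type getLegacyReadCost(const IPropertyTree &fileAttr, Source source)
    {
        // Legacy files have no @readCost attribute, so derive it from numDiskReads,
        // but skip index files because read cost cannot be reliably estimated from numDiskReads
        if (!hasReadWriteCostFields(fileAttr)
            && fileAttr.hasProp(getDFUQResultFieldName(DFUQRFnumDiskReads))
            && !isFileKey(fileAttr))
        {
            stat_type prevDiskReads = fileAttr.getPropInt64(getDFUQResultFieldName(DFUQRFnumDiskReads), 0);
            return money2cost_type(calcFileAccessCost(source, 0, prevDiskReads));
        }
        return 0;
    }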

{
stat_type curDiskReads = stats.getStatisticSum(StNumDiskReads);
StringBuffer clusterName;
file->getClusterName(0, clusterName);
Member

This should be whichever cluster it was actually read from - it is likely to make a difference in the future.


diskAccessCost = money2cost_type(calcFileAccessCost(clusterName, numDiskWrites, 0));
cost_type writeCost = money2cost_type(calcFileAccessCost(clusterName, numDiskWrites, 0));
props.setPropInt64(getDFUQResultFieldName(DFUQRFwriteCost), writeCost);
diskAccessCost += writeCost;
Member

I think this will over-accumulate the costs. Should be
diskAccessCost = writeCost
otherwise each time it is called all previous costs will be added again.
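
In other words, a small sketch of the suggested shape, reusing the names from the hunk above:

    cost_type writeCost = money2cost_type(calcFileAccessCost(clusterName, numDiskWrites, 0));
    props.setPropInt64(getDFUQResultFieldName(DFUQRFwriteCost), writeCost);
    diskAccessCost = writeCost;   // assign rather than accumulate, so repeated calls cannot double-count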

Contributor Author

I believe updateFileWriteCostStats is only ever called once. However, I've changed it to set the diskAccessCost. (This is likely to change shortly as part of https://track.hpccsystems.com/browse/HPCC-30937)

Member

@jakesmith jakesmith left a comment

@shamser - noticed a couple of issues which would be best addressed before this is merged.

dali/base/dadfs.cpp (outdated, resolved)
dali/base/dadfs.cpp (outdated, resolved)
dali/ft/filecopy.cpp (outdated, resolved)
ecl/hthor/hthor.cpp (outdated, resolved)
thorlcr/graph/thgraphmaster.cpp (outdated, resolved)
Member

@ghalliday ghalliday left a comment

@shamser changes look good.
What other jiras need to be opened to ensure that these attributes are available and displayed in eclwatch? They should be linked to the jira for this issue. (I have linked HPCC-30946)

@ghalliday
Member

We should also think about how we can regression test these features.

@shamser
Contributor Author

shamser commented Nov 30, 2023

RE: "@shamser changes look good. What other jiras need to be opened to ensure that these attributes are available and displayed in eclwatch? They should be linked to the jira for this issue. (I have linked HPCC-30946)"

The accessCost is being displayed correctly. This jira is to show read and write cost attributes in eclwatch: https://track.hpccsystems.com/browse/HPCC-30930
@ghalliday

Member

@jakesmith jakesmith left a comment

@shamser - please see comments.


accessCost = legacyReadCost + calcFileAccessCost(nodeGroup, numDiskWrites, 0);
}
double atRestCost = calcFileAtRestCost(nodeGroup, sizeGB, fileAgeDays);
file->setPropReal(getDFUQResultFieldName(DFUQRFcost), atRestCost+accessCost);
file->setPropReal(getDFUQResultFieldName(DFUQRFatRestCost), atRestCost);
file->setPropReal(getDFUQResultFieldName(DFUQRFaccessCost), accessCost);
Member

Hm, baffled, I wrote a comment on the last review submission, but can't find it now...

As the code stands (in this PR), "readCost" and "writeCost" are published as top-level file attributes in cost_type format
e.g.:

        cost_type curReadCost = money2cost_type(calcFileAccessCost(file, 0, curDiskReads));
        file->addAttrValue(getDFUQResultFieldName(DFUQRFreadCost), legacyReadCost + curReadCost);

But then here, in the attributes seen by clients using getDFAttributesTreeIterator, it adds "cost", "atRestCost" and "accessCost" as reals (converted via cost_type2money).
The net result is that all of these attributes will be visible to a caller of getDFAttributesTreeIterator, but some will be in one format and some in another.

We should make sure they are consistently in the same format.
I'm not sure it matters which format, as long as it is consistent (I suspect ideally they'd remain as cost_type until a client usage case needs them).
Either change the new "readCost"/"writeCost" attributes so they're published in "money" format, and change the calculations accordingly.
Or keep them as cost_type, but then existing code which expects them as 'money' (e.g. getCost) and the esp service code would need to change.
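
To illustrate the second option (a sketch only, using the conversion helpers already quoted in this PR): keep the stored attributes as cost_type and convert only at the boundary where a client expects money, e.g.

    // stored as cost_type (integer) in the file attributes...
    cost_type readCost = file.getPropInt64(getDFUQResultFieldName(DFUQRFreadCost));
    // ...and converted only where the caller needs a monetary value
    double readCostMoney = cost_type2money(readCost);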

cost_type legacyReadCost = getLegacyReadCost(*file, nodeGroup);
__int64 numDiskWrites = file->getPropInt64(getDFUQResultFieldName(DFUQRFnumDiskWrites), 0);

accessCost = legacyReadCost + calcFileAccessCost(nodeGroup, numDiskWrites, 0);
Member

shouldn't this be:

 accessCost = cost_type2money(legacyReadCost) + calcFileAccessCost(nodeGroup, numDiskWrites, 0);

?

Member

@jakesmith jakesmith left a comment

@shamser - mostly trivial comments that would not stop this being merged, but one comment worth discussing.

@@ -13434,6 +13434,8 @@ IPropertyTreeIterator *deserializeFileAttrIterator(MemoryBuffer& mb, unsigned nu

void setCost(IPropertyTree* file, const char *nodeGroup)
{
// Set the following dynamic fields: atRestCost, accessCost, cost and for legacy files: readCost, writeCost
// Dyamically calc and set the atRest cost field
Member

is this 2nd comment line helpful? i.e. isn't it covered in 1st 'atRestCost'?

// Costs need to be calculated from numDiskReads and numDiskWrites for legacy files
numDiskWrites = attrs->getPropInt64(getDFUQResultFieldName(DFUQRFnumDiskWrites));
doLegacyAccessCostCalc = true;
// But legacy index files numDiskReads is likely to be widely inaccurate, so don't use it
Member

trivial: I guess "numDiskReads is likely to be widely inaccurate" is not quite accurate, numDiskReads should be correct, it's that costs can't be reliably calculated using it in the context of indexes.
Perhaps rephrase to:
// NB: Costs of index reading can not be reliably estimated based on 'numDiskReads'

inline cost_type getLegacyReadCost(const IPropertyTree & fileAttr, Source source)
{
// Legacy files do not have @readCost attribute, so calculate from numDiskRead
// But don't try to calc legacy cost if index file because numDiskReads is likely to be inaccurate
Member

see prev. comment, perhaps:
// NB: Costs of index reading can not be reliably estimated based on 'numDiskReads'

double atRestCost, accessCost;
calcFileCost(cluster, 0, 0, numDiskWrites, numDiskReads, atRestCost, accessCost);
return accessCost;
Owned<IPropertyTree> costPT = getCostPropTree(cluster);
Member

trivial: could be Owned<const IPropertyTree>
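
i.e. a sketch of the suggested tweak, with everything else unchanged:

    Owned<const IPropertyTree> costPT = getCostPropTree(cluster);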


extern da_decl double calcFileAtRestCost(const char * cluster, double sizeGB, double fileAgeDays)
{
Owned<IPropertyTree> costPT = getCostPropTree(cluster);
Member

trivial: could be Owned<const IPropertyTree>

&& !isFileKey(fileAttr))
{
stat_type prevDiskReads = fileAttr.getPropInt64(getDFUQResultFieldName(DFUQRFnumDiskReads), 0);
return money2cost_type(calcFileAccessCost(source, 0, prevDiskReads));
Member

trivial/formatting: double space after 'return'

if (!hasReadWriteCostFields(fileAttr) && fileAttr.hasProp(getDFUQResultFieldName(DFUQRFnumDiskWrites)))
{
stat_type prevDiskWrites = fileAttr.getPropInt64(getDFUQResultFieldName(DFUQRFnumDiskWrites), 0);
return money2cost_type(calcFileAccessCost(source, prevDiskWrites, 0));
Member

trivial/formatting: double space after 'return'

@@ -3771,7 +3774,13 @@ void FileSprayer::updateTargetProperties()
if (distributedSource)
{
if (distributedSource->querySuperFile()==nullptr)
Member

not new, but what if it is a super file being read?
Also not new, but it's misleading that this code to update distributedSource is in a method called "updateTargetProperties". For clarity (now, in this PR), I think it should be moved out of it.

Contributor Author

If it is a super file, I believe it uses the aggregate of the cost from all the subfiles.

Member

per offline conversation - to be investigated.
What happens when 'distributedSource' is a superfile (is it even supported?). If it is, what happens (if anything currently) re. updating cost attributes on the subfiles read.
@shamser - I've opened a subtask for this (HPCC-30970), under HPCC-30965
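
On the "move it out of updateTargetProperties" point above, a hypothetical sketch of the shape such a refactor could take (the helper name is invented for illustration; the calculation is the one quoted earlier in this review):

    // Hypothetical helper (name invented): keep updateTargetProperties() focused on the
    // target file and publish the source-side read stats/cost from a separate routine.
    static void updateSourceReadCost(IDistributedFile *distributedSource, stat_type totalNumReads)
    {
        if (distributedSource && (distributedSource->querySuperFile() == nullptr))
        {
            cost_type readCost = money2cost_type(calcFileAccessCost(distributedSource, 0, totalNumReads));
            // ... then publish readCost and totalNumReads onto the source file's attributes ...
        }
    }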

lFile->setCost(file.getPropReal(getDFUQResultFieldName(DFUQRFcost)));
{
cost_type cost = file.getPropInt64(getDFUQResultFieldName(DFUQRFcost));
lFile->setCost(cost_type2money(cost));
Member

note to myself - there is no backward compatibility issue here, because DFUQRFcost has been dynamically calculated in this version, via getDFAttributesTreeIterator

@@ -649,6 +649,18 @@ void CMasterActivity::done()

void CMasterActivity::updateFileReadCostStats()
{
// Returns updates numDiskReads & readCost in the file attributes and returns the readCost
Member

should say
// Updates numDiskReads & readCost in the file attributes and returns the readCost

?

Member

@jakesmith jakesmith left a comment

@shamser - looks good, please squash.

@shamser shamser requested a review from jakesmith December 6, 2023 09:11
Member

@jakesmith jakesmith left a comment

@shamser - looks good.

}
file->setPropInt64(getDFUQResultFieldName(DFUQRFaccessCost), accessCost);

// Dymically calc and set the total cost field
Member

trivial/typo; Dymically -> Dynamically

@@ -3771,7 +3774,13 @@ void FileSprayer::updateTargetProperties()
if (distributedSource)
{
if (distributedSource->querySuperFile()==nullptr)
Member

per offline conversation - to be investigated.
What happens when 'distributedSource' is a superfile (is it even supported?). If it is, what happens (if anything currently) re. updating cost attributes on the subfiles read.
@shamser - I've opened a subtask for this (HPCC-30970), under HPCC-30965

@shamser
Contributor Author

shamser commented Dec 6, 2023

RE: What happens when 'distributedSource' is a superfile (is it even supported?). If it is, what happens (if anything currently) re. updating cost attributes on the subfiles read.

Initial investigation shows that costs and other stats are not being updated when the distributedSource is a superfile. I created a ticket to work on this: https://track.hpccsystems.com/browse/HPCC-30973

@ghalliday ghalliday merged commit 1d01b18 into hpcc-systems:candidate-9.4.x Dec 7, 2023
47 checks passed