Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spark rapids tools 1120 fea args #11

Merged
merged 7 commits into from
Jul 2, 2024
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,7 @@ import java.io.{OutputStream, PrintStream}

import scala.collection.mutable
import scala.collection.mutable.ArrayBuffer
import scala.concurrent.duration.{DurationInt, FiniteDuration, NANOSECONDS}
import scala.concurrent.duration.NANOSECONDS

import org.apache.commons.io.output.TeeOutputStream

Expand All @@ -39,9 +39,8 @@ import org.apache.spark.sql.rapids.tool.util.{RuntimeUtil, ToolsTimer}
class Benchmark(
name: String,
valuesPerIteration: Long,
minNumIters: Int = 2,
warmupTime: FiniteDuration = 2.seconds,
minTime: FiniteDuration = 2.seconds,
minNumIters: Int,
warmUpIterations: Int,
outputPerIteration: Boolean = false,
output: Option[OutputStream] = None) {
import Benchmark._
Expand Down Expand Up @@ -121,16 +120,16 @@ class Benchmark(
*/
def measure(num: Long, overrideNumIters: Int)(f: ToolsTimer => Unit): Result = {
System.gc() // ensures garbage from previous cases don't impact this one
val warmupDeadline = warmupTime.fromNow
while (!warmupDeadline.isOverdue) {
var wi = 0
while (wi < warmUpIterations) {
amahussein marked this conversation as resolved.
Show resolved Hide resolved
f(new ToolsTimer(-1))
wi += 1
}
val minIters = if (overrideNumIters != 0) overrideNumIters else minNumIters
val minDuration = if (overrideNumIters != 0) 0 else minTime.toNanos
val runTimes = ArrayBuffer[Long]()
var totalTime = 0L
var i = 0
while (i < minIters || totalTime < minDuration) {
while (i < minIters) {
val timer = new ToolsTimer(i)
f(timer)
val runTime = timer.totalTime()
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,29 @@
package org.apache.spark.rapids.tool.benchmarks
amahussein marked this conversation as resolved.
Show resolved Hide resolved

import org.rogach.scallop.{ScallopConf, ScallopOption}

class BenchmarkArgs(arguments: Seq[String]) extends ScallopConf(arguments) {

banner("""
Benchmarker class for running various benchmarks.
""")
amahussein marked this conversation as resolved.
Show resolved Hide resolved

val iterations: ScallopOption[Int] = opt[Int](name = "iterations", short = 'i', default = Some(5),
descr = "Total iterations to run")
amahussein marked this conversation as resolved.
Show resolved Hide resolved

amahussein marked this conversation as resolved.
Show resolved Hide resolved
val warmupIterations: ScallopOption[Int] = opt[Int](name = "warmupIterations", short = 'w' ,
default = Some(3), descr = "Number of warmup iterations to run")
amahussein marked this conversation as resolved.
Show resolved Hide resolved

val outputDirectory: ScallopOption[String] = opt[String](name = "outputDirectory", short = 'd',
default = Some("."), descr = "Directory to write output to")
amahussein marked this conversation as resolved.
Show resolved Hide resolved

val outputFormat: ScallopOption[String] = opt[String](name = "outputFormat", short = 'o',
default = Some("tbl"), descr = "Format of output ( tbl, json)")
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • More details about the options and what each one of them mean.
  • Suggest changing "tbl" to "text"
  • the short version 'o' should be for the outputDirectory. For this option, "'f'" would be more suitable
  • QQ why do we specify the name in all the options? shouldn't Scallop do that automatically?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Added more description
  • Changed tbl to text. Current PR does not do any processing on output format. Later iteration will merge the JSON output changes
  • The short formats have been updated
  • The name option is just for verbosity. Saw online implementation using name param hence added it

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I belive if we leave it without name then scalaOpt will automatically convert the variable name.
For example, outputDirectory becomes output-directory
It will be better to use the snake-case because the tools main classes are following that standard. QualificationMain and ProfilingMain

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the name param to enable snake-case usage in params


val extraArgs: ScallopOption[String] = opt[String](name = "extraArgs" , short = 'a',
required = false,
descr = "Extra arguments to pass to the benchmark")

verify()

amahussein marked this conversation as resolved.
Show resolved Hide resolved
}
Original file line number Diff line number Diff line change
Expand Up @@ -31,7 +31,10 @@ abstract class BenchmarkBase {
* Implementations of this method are supposed to use the wrapper method `runBenchmark`
* for each benchmark scenario.
*/
def runBenchmarkSuite(mainArgs: Array[String]): Unit
def runBenchmarkSuite(iterations: Int,
warmUpIterations: Int,
outputFormat: String,
mainArgs: Array[String]): Unit

final def runBenchmark(benchmarkName: String)(func: => Any): Unit = {
val separator = "=" * 96
Expand All @@ -49,8 +52,10 @@ abstract class BenchmarkBase {
def afterAll(): Unit = {}

def main(args: Array[String]): Unit = {

val conf = new BenchmarkArgs(args)
// TODO: get the dirRoot from the arguments instead
val dirRoot = ""
val dirRoot = conf.outputDirectory().stripSuffix("/")
amahussein marked this conversation as resolved.
Show resolved Hide resolved
val resultFileName = "results.txt"
val dir = new File(s"$dirRoot/$prefix/")
if (!dir.exists()) {
Expand All @@ -61,7 +66,10 @@ abstract class BenchmarkBase {
file.createNewFile()
}
output = Some(new FileOutputStream(file))
runBenchmarkSuite(args)
runBenchmarkSuite(conf.iterations(),
conf.warmupIterations(),
amahussein marked this conversation as resolved.
Show resolved Hide resolved
conf.outputFormat(),
conf.extraArgs().split("\\s+").filter(_.nonEmpty))
output.foreach { o =>
if (o != null) {
o.close()
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
package org.apache.spark.rapids.tool.benchmarks
amahussein marked this conversation as resolved.
Show resolved Hide resolved

import com.nvidia.spark.rapids.tool.qualification.QualificationArgs
import com.nvidia.spark.rapids.tool.qualification.QualificationMain.mainInternal

object QualificationBenchmark extends BenchmarkBase {
amahussein marked this conversation as resolved.
Show resolved Hide resolved
amahussein marked this conversation as resolved.
Show resolved Hide resolved
override def runBenchmarkSuite(iterations: Int,
warmUpIterations: Int,
outputFormat: String,
mainArgs: Array[String]): Unit = {
runBenchmark("QualificationBenchmark") {
val benchmarker =
new Benchmark(
"QualificationBenchmark",
2,
output = output,
outputPerIteration = true,
warmUpIterations = warmUpIterations,
minNumIters = iterations)
benchmarker.addCase("QualificationBenchmark") { _ =>
mainInternal(new QualificationArgs(mainArgs),
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am little bit confused here. If we initialize QualificationArgs on line 21, then what are those default arguments that are passed to the benchmarker variable?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarified post discussion. Benchmarker arguments control iterations and warm up iterations. The passed ffunction to the benchmarker is what is benchmarked internally

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will iterate on this later.

printStdout = true, enablePB = true)
}
benchmarker.run()
}
}
}