
Outstanding Issues

jpatanooga edited this page Nov 25, 2012 · 10 revisions

ToDo

  • Only a single input file is supported (the system breaks the file into splits based on block size)
    • We need to update the IterativeReduce code to allow N input files, each properly divided into input splits
    • (This is close to being complete)
  • Data Locality Issues
    • Data locality is currently turned off due to YARN issues; a fix should come soon
  • Starting the Knitting Boar job sometimes fails on YARN clusters
    • On certain clusters the framework can't get enough containers
    • We need to make IterativeReduce smarter here with better auto-retry code
    • Output from one such failed run:
[josh@sac01 jars]$ yarn jar iterativereduce-0.1-SNAPSHOT.jar app.properties 
12/11/21 18:05:00 INFO client.Client: Using input path: hdfs:/user/josh/datasets/20news/four_shards/kboar.txt
12/11/21 18:05:00 INFO yarn.ResourceManagerHandler: Connecting to the resource manager (client) at sac01.mtv.cloudera.com/172.29.121.67:8032
12/11/21 18:05:00 INFO yarn.ResourceManagerHandler: Got a new application with id=application_1352770589658_0080
12/11/21 18:05:00 INFO client.Client: Got an application, id=application_1352770589658_0080, appName=IR_SGD_Broski
12/11/21 18:05:00 WARN client.Client: log4j.properties file not found
12/11/21 18:05:00 WARN iterativereduce.Utils: Unable to copy file /tmp/IR_SGD_Broski/application_1352770589658_0080/log4j.properties: File not found.
12/11/21 18:05:00 INFO yarn.ResourceManagerHandler: Submitting application to ASM
12/11/21 18:05:02 INFO client.Client: Got applicaton report for, appId=80, state=RUNNING, amDiag=, masterHost=sac05.mtv.cloudera.com, masterRpcPort=9999, queue=default, startTime=1353549900792, clientToken=null, finalState=UNDEFINED, trackingUrl=sac01.mtv.cloudera.com:8088//some-place.com/some/endpoint, user=josh
12/11/21 18:05:04 INFO client.Client: Got applicaton report for, appId=80, state=RUNNING, amDiag=, masterHost=sac05.mtv.cloudera.com, masterRpcPort=9999, queue=default, startTime=1353549900792, clientToken=null, finalState=UNDEFINED, trackingUrl=sac01.mtv.cloudera.com:8088//some-place.com/some/endpoint, user=josh
12/11/21 18:05:06 INFO client.Client: Got applicaton report for, appId=80, state=RUNNING, amDiag=, masterHost=sac05.mtv.cloudera.com, masterRpcPort=9999, queue=default, startTime=1353549900792, clientToken=null, finalState=UNDEFINED, trackingUrl=sac01.mtv.cloudera.com:8088//some-place.com/some/endpoint, user=josh
12/11/21 18:05:08 INFO client.Client: Got applicaton report for, appId=80, state=RUNNING, amDiag=, masterHost=sac05.mtv.cloudera.com, masterRpcPort=9999, queue=default, startTime=1353549900792, clientToken=null, finalState=UNDEFINED, trackingUrl=sac01.mtv.cloudera.com:8088//some-place.com/some/endpoint, user=josh
12/11/21 18:05:10 INFO client.Client: Got applicaton report for, appId=80, state=RUNNING, amDiag=, masterHost=sac05.mtv.cloudera.com, masterRpcPort=9999, queue=default, startTime=1353549900792, clientToken=null, finalState=UNDEFINED, trackingUrl=sac01.mtv.cloudera.com:8088//some-place.com/some/endpoint, user=josh
12/11/21 18:05:12 INFO client.Client: Got applicaton report for, appId=80, state=FINISHED, amDiag=, masterHost=sac05.mtv.cloudera.com, masterRpcPort=9999, queue=default, startTime=1353549900792, clientToken=null, finalState=KILLED, trackingUrl=, user=josh
12/11/21 18:05:12 INFO client.Client: Application finished in 13489ms
12/11/21 18:05:12 INFO client.Client: Application completed with en error: 
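The auto-retry idea mentioned above could look roughly like the following. This is a minimal, language-neutral sketch in Python, not IterativeReduce code: `submit_with_retry`, `ContainerRequestFailed`, and the backoff parameters are all hypothetical names chosen for illustration, and the real client would need to decide which failures are actually worth retrying.

```python
import time


class ContainerRequestFailed(Exception):
    """Hypothetical error: the cluster could not grant enough containers."""


def submit_with_retry(submit, max_attempts=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep):
    """Call submit() until it succeeds, backing off exponentially between tries.

    submit is any zero-argument callable that raises ContainerRequestFailed
    when the job could not start. The last failure is re-raised once
    max_attempts is exhausted.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return submit()
        except ContainerRequestFailed:
            if attempt == max_attempts:
                raise
            sleep(delay)                       # wait before asking again
            delay = min(delay * 2, max_delay)  # double the wait, up to a cap
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; a production version would probably also log each failed attempt.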
  • Durability Story
    • Currently, if a node fails, the framework/job cannot recover
    • We're working on this one; patches welcome
  • Vastly improve the input format / vectorization story
  • Finish RCV1 unit test for comparison on speed / accuracy
  • Improve overall runtime by removing some "slack" from the IterativeReduce framework
  • Make the loss function, hypothesis, and update function "pluggable" to enable other models to be learned
  • YARN History Server
    • Currently you can only see the log/stdout of the master
    • We'd like to add a YARN history server that knows how to display all of the worker info
  • Non-fixed number of iterations
    • Allow the user to supply a termination-condition function instead of a fixed integer number of iterations
  • A more formal way to develop algorithms in a single-process unit test that simulates the message-passing flow
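
Three of the items above (pluggable loss/update functions, a user-supplied termination condition, and single-process simulation of the message-passing flow) can be illustrated together. The following is a rough Python sketch under stated assumptions, not the IterativeReduce API: every name here (`worker_pass`, `master_average`, `run`, `should_stop`) is hypothetical, and simple parameter averaging stands in for whatever the master actually does with worker updates.

```python
import math


def logistic_gradient(w, x, y):
    """Gradient of log loss for one example under a logistic hypothesis."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    p = 1.0 / (1.0 + math.exp(-z))
    return [(p - y) * xi for xi in x]


def worker_pass(w, shard, gradient, lr):
    """One local SGD pass over a worker's shard, starting from the master's weights."""
    w = list(w)
    for x, y in shard:
        g = gradient(w, x, y)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w


def master_average(updates):
    """Combine worker results by averaging each weight across workers."""
    n = len(updates)
    return [sum(col) / n for col in zip(*updates)]


def run(shards, dim, gradient, should_stop, lr=0.5, max_iterations=1000):
    """Simulate the master/worker loop in one process.

    gradient is the pluggable model piece; should_stop(iteration, delta)
    is the pluggable termination condition, where delta is the largest
    change in any weight during the iteration.
    """
    w = [0.0] * dim
    for iteration in range(1, max_iterations + 1):
        # Each list element plays the role of one worker-to-master message.
        updates = [worker_pass(w, shard, gradient, lr) for shard in shards]
        new_w = master_average(updates)
        delta = max(abs(a - b) for a, b in zip(new_w, w))
        w = new_w
        if should_stop(iteration, delta):
            break
    return w, iteration


# Two tiny hand-made shards of (features, label) pairs; feature 0 is a bias term.
shards = [
    [([1.0, 2.0], 1.0), ([1.0, -2.0], 0.0)],
    [([1.0, 1.5], 1.0), ([1.0, -1.0], 0.0)],
]
weights, iterations = run(
    shards, dim=2, gradient=logistic_gradient,
    should_stop=lambda it, delta: delta < 1e-3 or it >= 50,
)
```

Because the loop is plain function calls, a unit test can swap in a different `gradient` or `should_stop` and check the result deterministically, which is the kind of single-process harness the last item asks for.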