Outstanding Issues
jpatanooga edited this page Nov 25, 2012 · 10 revisions
- Only a single input file is supported (the system breaks the file into splits based on block size)
- We need to update the IterativeReduce code to allow for N input files, all properly divided into input splits
- (This is close to being complete.)
- Data Locality Issues
- Currently turned off due to YARN issues; a fix should come soon
- Starting the Knitting Boar job sometimes fails on YARN clusters
- On certain clusters the framework can't get enough containers
- We need to make IterativeReduce smarter here with more robust auto-retry code
[josh@sac01 jars]$ yarn jar iterativereduce-0.1-SNAPSHOT.jar app.properties
12/11/21 18:05:00 INFO client.Client: Using input path: hdfs:/user/josh/datasets/20news/four_shards/kboar.txt
12/11/21 18:05:00 INFO yarn.ResourceManagerHandler: Connecting to the resource manager (client) at sac01.mtv.cloudera.com/172.29.121.67:8032
12/11/21 18:05:00 INFO yarn.ResourceManagerHandler: Got a new application with id=application_1352770589658_0080
12/11/21 18:05:00 INFO client.Client: Got an application, id=application_1352770589658_0080, appName=IR_SGD_Broski
12/11/21 18:05:00 WARN client.Client: log4j.properties file not found
12/11/21 18:05:00 WARN iterativereduce.Utils: Unable to copy file /tmp/IR_SGD_Broski/application_1352770589658_0080/log4j.properties: File not found.
12/11/21 18:05:00 INFO yarn.ResourceManagerHandler: Submitting application to ASM
12/11/21 18:05:02 INFO client.Client: Got applicaton report for, appId=80, state=RUNNING, amDiag=, masterHost=sac05.mtv.cloudera.com, masterRpcPort=9999, queue=default, startTime=1353549900792, clientToken=null, finalState=UNDEFINED, trackingUrl=sac01.mtv.cloudera.com:8088//some-place.com/some/endpoint, user=josh
12/11/21 18:05:04 INFO client.Client: Got applicaton report for, appId=80, state=RUNNING, amDiag=, masterHost=sac05.mtv.cloudera.com, masterRpcPort=9999, queue=default, startTime=1353549900792, clientToken=null, finalState=UNDEFINED, trackingUrl=sac01.mtv.cloudera.com:8088//some-place.com/some/endpoint, user=josh
12/11/21 18:05:06 INFO client.Client: Got applicaton report for, appId=80, state=RUNNING, amDiag=, masterHost=sac05.mtv.cloudera.com, masterRpcPort=9999, queue=default, startTime=1353549900792, clientToken=null, finalState=UNDEFINED, trackingUrl=sac01.mtv.cloudera.com:8088//some-place.com/some/endpoint, user=josh
12/11/21 18:05:08 INFO client.Client: Got applicaton report for, appId=80, state=RUNNING, amDiag=, masterHost=sac05.mtv.cloudera.com, masterRpcPort=9999, queue=default, startTime=1353549900792, clientToken=null, finalState=UNDEFINED, trackingUrl=sac01.mtv.cloudera.com:8088//some-place.com/some/endpoint, user=josh
12/11/21 18:05:10 INFO client.Client: Got applicaton report for, appId=80, state=RUNNING, amDiag=, masterHost=sac05.mtv.cloudera.com, masterRpcPort=9999, queue=default, startTime=1353549900792, clientToken=null, finalState=UNDEFINED, trackingUrl=sac01.mtv.cloudera.com:8088//some-place.com/some/endpoint, user=josh
12/11/21 18:05:12 INFO client.Client: Got applicaton report for, appId=80, state=FINISHED, amDiag=, masterHost=sac05.mtv.cloudera.com, masterRpcPort=9999, queue=default, startTime=1353549900792, clientToken=null, finalState=KILLED, trackingUrl=, user=josh
12/11/21 18:05:12 INFO client.Client: Application finished in 13489ms
12/11/21 18:05:12 INFO client.Client: Application completed with en error:
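The auto-retry idea mentioned above could look roughly like the following. This is a minimal sketch of retrying with capped exponential backoff, not code from IterativeReduce: the class `RetrySketch`, its methods, and the idea of wrapping the container request in a `Callable` are all assumptions made for illustration, and no YARN API is called.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper (not part of IterativeReduce): retry with capped exponential backoff. */
final class RetrySketch {

    /** Delay before retry number `attempt` (0-based): baseMillis * 2^attempt, capped at maxMillis. */
    static long backoffMillis(int attempt, long baseMillis, long maxMillis) {
        // Keep the shift small so that ordinary base delays do not overflow a long.
        int shift = Math.min(attempt, 20);
        return Math.min(maxMillis, baseMillis << shift);
    }

    /** Runs `task` up to maxAttempts times, sleeping between failed attempts. */
    static <T> T retry(Callable<T> task, int maxAttempts, long baseMillis, long maxMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (InterruptedException e) {
                throw e; // do not retry if the thread was asked to stop
            } catch (Exception e) {
                last = e;
                if (attempt + 1 < maxAttempts) {
                    Thread.sleep(backoffMillis(attempt, baseMillis, maxMillis));
                }
            }
        }
        throw last; // every attempt failed; surface the most recent error
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        // Stand-in for "ask the cluster for containers": fails twice, then succeeds.
        String result = retry(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("not enough containers");
            }
            return "allocated";
        }, 5, 10, 100);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

The delay cap keeps a long outage from producing unbounded waits, and the attempt limit means the client still fails with the last error instead of retrying indefinitely.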
- Durability Story
- Currently, if a node fails, the framework/job cannot recover
- We're working on this one; patches welcome
- Vastly improve the input format / vectorization story
- Finish RCV1 unit test for comparison on speed / accuracy
- Improve overall runtime by removing some "slack" from the IterativeReduce framework
- Make the loss function, hypothesis, and update function "pluggable" to enable other models to be learned
- YARN History Server
- Currently you can only see the log/stdout of the master
- We'd like to add a YARN history server that knows how to display all of the worker info
- Non-fixed number of iterations
- Allow the user to supply a termination condition function
- As opposed to an integer for number of iterations
- A more formal way to develop algorithms in a single-process unit test that simulates the message-passing flow
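The last two items (a user-supplied termination condition and a single-process test harness) can be sketched together. The code below is only an illustration under stated assumptions: `LocalIterativeReduceSketch`, `TerminationCondition`, `Worker`, and `Result` are hypothetical names, not IterativeReduce classes, and the "model" is a single number that toy workers pull toward the mean of their shard. Message passing is simulated with plain method calls inside one process.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical single-process harness; none of these names come from IterativeReduce. */
final class LocalIterativeReduceSketch {

    /** User-supplied stopping rule, used in place of a fixed iteration count. */
    interface TerminationCondition {
        boolean shouldStop(int completedIterations, double parameterChange);
    }

    /** A worker owns one shard and returns its locally updated copy of the parameter. */
    interface Worker {
        double compute(double globalParameter);
    }

    /** Final parameter value plus the number of iterations that were run. */
    static final class Result {
        final double parameter;
        final int iterations;

        Result(double parameter, int iterations) {
            this.parameter = parameter;
            this.iterations = iterations;
        }
    }

    /** Toy worker: moves the parameter halfway toward the mean of its shard. */
    static Worker meanSeekingWorker(double[] shard) {
        double sum = 0;
        for (double v : shard) {
            sum += v;
        }
        final double mean = sum / shard.length;
        return global -> global + 0.5 * (mean - global);
    }

    /** Runs worker/master rounds in-process until the termination condition fires. */
    static Result run(List<Worker> workers, double initial, TerminationCondition stop) {
        if (workers.isEmpty()) {
            throw new IllegalArgumentException("at least one worker is required");
        }
        double global = initial;
        int iteration = 0;
        double change;
        do {
            // "Send" the global parameter to every worker and collect the replies.
            double sum = 0;
            for (Worker w : workers) {
                sum += w.compute(global);
            }
            double averaged = sum / workers.size(); // master step: average the worker results
            change = Math.abs(averaged - global);
            global = averaged;
            iteration++;
        } while (!stop.shouldStop(iteration, change));
        return new Result(global, iteration);
    }

    public static void main(String[] args) {
        List<Worker> workers = Arrays.asList(
                meanSeekingWorker(new double[] {1, 2, 3}),
                meanSeekingWorker(new double[] {5, 7}));
        // A fixed iteration count is just one possible termination condition.
        Result fixed = run(workers, 0.0, (n, delta) -> n >= 3);
        // Convergence-based rule with a safety cap on the number of iterations.
        Result converged = run(workers, 0.0, (n, delta) -> delta < 1e-6 || n >= 1000);
        System.out.println(fixed.iterations + " iterations -> " + fixed.parameter);
        System.out.println(converged.iterations + " iterations -> " + converged.parameter);
    }
}
```

Because the existing integer setting can be expressed as the condition `n >= N`, a design along these lines would not need a separate code path for fixed iteration counts, and the same `run` loop can be driven from an ordinary unit test.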