-
Notifications
You must be signed in to change notification settings - Fork 0
License
GeoGateway/PythonRDAHMM
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
This directory contains data processing, RDAHMM modeling and evaluation workflow for SCRIPPS dataset available at: http://garner.ucsd.edu/pub/timeseries/measures/ats/WesternNorthAmerica/ Each step consists of 2 scripts: - [scripps_ingest|rdahmm_model|rdahmm_eval]_single.py processes a given dataset/tarball; - [scripps_ingest|rdahmm_model|rdahmm_eval].py invokes corresponding _single.py script in Threads to process all datasets in parallel crondownload contains wget command to download all tar balls from SCRIPPS site with striped directory structures. rdahmmcmd is copied from QuakeSim2 setup showing how to use rdahmm binary. properties.py defines global variables that can be adjusted for user setup. The prototype is developed and tested under gf16.ucs.indiana.edu:/home/yuma. Basic workflow: - For each downloaded tarball (i.e. dataset), create a corresponding dataset directory (e.g. WNAM_Clean_DetrendNeuTimeSeries_comb). - All raw data are ingested into a SQLite database named after corresponding tar ball (e.g. WNAM_Clean_DetrendNeuTimeSeries_comb_20120421.sqlite) - In addition, a SQLite db is created for each station (e.g zoa1.sqlite), containing time series of a given station, missing valued filled wth duplicates. - RDAHMM modeling and evaluation are based on individual station databases of a given dataset. - Modeling uses data up to train_epoch (2011-12-31) defined in properties.py, unless it's less than 3 years (365*3) of data, in which case all available data are used. - Modeling start and end time are recorded respectively in daily_project_stationID.input.[start|end]time This record can be used for data re-train later on. - In evaluation, if the result daily_project_stationID_eval-date.Q file contains 0, rdahmm_eval_cmd is re-run with additional "-addstate" parameter. The .Q file is renamed/copied to .all.Q for plotting conventions. To Do: - Generate overall state change XML and txt files of a given dataset, required by plotting functions. Consider reuse existing Java implementation. Per Xiaoming's note, we can write a standalone Java program that scans the evaluation result directory ("eval_path"), use .all.Q and .all.raw file for each station of a given dataset, reuse following functions to generate XML: - writeResToXML(), addResToXMLDoc(), addElePattern(), makeAllStationInput(), addStateChangeNum(), saveStateChangeNums(), plotStateChangeNums() in the DailyRDAHMMThread class - makeFlatInputFile() in the DailyRDAHMMStation class - Set up cron schedule for dataset download and evaluation. This takes some thoughts on directory structures on a production machine. All adjustment can be done by modifying corresponding path in properties.py. - Need to be careful on how to replace the existing evaluation results with the latest one without interrupting data availability through web interface. - May need a script to periodically retrain datasets that have daily_project_stationID.input.endtime newer than train_epoch, i.e. didn't have enough data during the original train. - Existing plotting functions may be out of date, some produce errors on gf16, whose gnuplot library is 5 years newer than gf13. rdahmm_eval_single.py currently has a section at the end to generate silly files just to accommodate existing plotting conventions, it needs to be updated when plotting method changes down the road. Notes: - Looks like the initial train of the complete dataset can take a week or so. gf16.ucs.indiana.edu:/home/yuma/RDAHMM/Model/ results can be used to save time. - checking_missing_data.py script looks for any time series that exists in data_path but not in the corresponding model and evaluation path. It can be used to identify additional datasets to be individually processed without redo the complete model run. - *StrainNeuTimeSeries* contains bad formatted station files, exclude them from processing for now. Emailed Mindy at SCRIPPS to report issues. - Once trained with data up to 2011-12-31, evaluation runs of the complete dataset is pretty fast, took less than an hour.
About
No description, website, or topics provided.
Resources
License
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published