diff --git a/docs/EN_US/HPCCSystemAdmin/HPCCSystemAdministratorsGuide.xml b/docs/EN_US/HPCCSystemAdmin/HPCCSystemAdministratorsGuide.xml index b5e8466344b..b33fc7f033c 100644 --- a/docs/EN_US/HPCCSystemAdmin/HPCCSystemAdministratorsGuide.xml +++ b/docs/EN_US/HPCCSystemAdmin/HPCCSystemAdministratorsGuide.xml @@ -721,6 +721,146 @@ workunits in the event of a loss. In addition it would affect every other Thor/Roxie cluster in the same environment if you lose this node. + + + Restoring Dali from backup + + If configured correctly, Dali creates a backup or mirror + copy to a secondary location on another physical server + (bare-metal only). + + Systems can be configured with their own scheduled backup to + create a snapshot of the primary store files to a custom location. + The same steps apply when using a snapshot copy of a backup set as + when using the mirror copy. In other words, this technique applies + to either bare-metal or k8s deployments. + + The Dali meta files consist of: + + + + store.<NNNN> + (e.g., store.36). This file is a reference to the current Dali + meta file edition. There should never be more than one of + these files. The NNNN is used to determine the current base + and delta files in use. + + + + dalisds<NNNN>.xml + (e.g., dalisds36.xml). This is the main Dali meta info file, + containing all logical file, workunit, and state information. + Sasha (or Dali on save) periodically creates new versions + (with incrementally increasing NNNN values). The last T + copies are kept (default 10), based on the configuration option + “keepStores”. + + + + daliinc<NNNN>.xml + (e.g., daliinc36.xml). This is the delta transaction log. Dali + continuously writes to this file, recording all changes that + are made to any meta data. It is used to play back changes and + apply them to the base meta info from the + dalisds<NNNN>.xml file. 
+ + Specifically, when Sasha creates a new store version, it + loads the base file (e.g., dalisds36.xml), then loads and + applies the delta file (e.g., daliinc36.xml). Sasha then has + its own independent representation of the current state and + saves a new base file (e.g., dalisds(NNNN+1).xml). + + + + dalidet<NNNN>.xml + (e.g., dalidet36.xml). This file is created when + Sasha starts the process of creating a new base file. At that + point, it atomically renames the delta transaction file to a + ‘det’ file (short for ‘detached’). For example, it renames + daliinc36.xml to dalidet36.xml. Dali then continues to write + new transactions to daliinc36.xml. + + + + dalisds_<MMMM>.bv2 files. These + files are in effect part of the main store (part of + dalisdsNNNN.xml). They are single large values that were + deemed too big to keep in Dali memory and were written to disk + separately instead (and are loaded on demand). + + + + If Dali is shut down cleanly and saves its files as expected, + the daliinc*.xml and dalidet*.xml files are not needed, since it + saves the entire state of the store directly from internal memory, + and on startup, there is no daliincNNNN.xml or dalidetNNNN.xml + related to the new version. + + These transaction delta files are only used by Sasha when + creating new versions of the base store or if Dali has been + stopped abruptly (e.g., machine rebooted). If Dali restarts after + an unclean exit, there will be a daliincNNNN.xml (and possibly a + dalidetNNNN.xml file if Sasha was actively creating a new version + at the time). In those cases, Dali will load these files in + addition to the base file. + + By default, Dali’s main data store directory is + /var/lib/HPCCSystems/hpcc-data/dali/. In other words, all meta + data is written to and read from this location. + + When restoring from a backup: + + Make sure Dali is not running. + + + + Make sure the /var/lib/HPCCSystems/hpcc-data/dali + folder is empty. 
+ + + + Copy all pertinent backup files into the + /var/lib/HPCCSystems/hpcc-data/dali folder: + + + + One store.NNNN file + + + + One dalisdsNNNN.xml file + + + + At most one daliincNNNN.xml file (only if + present) + + + + At most one dalidetNNNN.xml file (only if + present) + + + + All dalisds_MMMM.bv2 files. + + + + Other/older dalisds/daliinc/dalidet editions could + be copied, but the above are the only ones that will be used. In + other words, only the NNNN version based on the single store.NNNN + file will be loaded. + + The automatic backup to a mirror location is bare-metal only. + In a cloud deployment, it is assumed that the storage choices + offered by the cloud provider provide redundancy, such as + multi-zone replication. + + In either case, or if a manual strategy has been used to + copy Dali’s files on a schedule, the process of restoring from a + backup should be the same. +
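
The file selection described in the restore steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the HPCC Systems tooling: the function names `select_restore_files` and `restore_dali_files` are hypothetical, the backup is assumed to be a flat directory holding the files named above, and the caller is assumed to have stopped Dali already (the script does not check for a running process).

```python
from __future__ import annotations

import re
import shutil
from pathlib import Path


def select_restore_files(backup_dir: Path) -> list[Path]:
    """Pick the files that belong to the edition named by the single store.NNNN file."""
    stores = [p for p in backup_dir.iterdir() if re.fullmatch(r"store\.\d+", p.name)]
    if len(stores) != 1:
        raise ValueError(f"expected exactly one store.NNNN file, found {len(stores)}")
    edition = stores[0].name.split(".", 1)[1]  # the NNNN part, e.g. "36"

    base = backup_dir / f"dalisds{edition}.xml"
    if not base.is_file():
        raise FileNotFoundError(f"missing base store file: {base}")

    selected = [stores[0], base]
    # The delta and detached transaction logs are optional: copy them only if present.
    for optional_name in (f"daliinc{edition}.xml", f"dalidet{edition}.xml"):
        candidate = backup_dir / optional_name
        if candidate.is_file():
            selected.append(candidate)
    # All externally stored large values, regardless of their MMMM number.
    selected.extend(sorted(backup_dir.glob("dalisds_*.bv2")))
    return selected


def restore_dali_files(backup_dir: Path, dali_dir: Path) -> list[str]:
    """Copy the selected files into an empty Dali data directory; return their names."""
    if any(dali_dir.iterdir()):
        raise RuntimeError(f"{dali_dir} is not empty; clear it before restoring")
    copied = []
    for src in select_restore_files(backup_dir):
        shutil.copy2(src, dali_dir / src.name)  # copy2 also preserves timestamps
        copied.append(src.name)
    return copied
```

A call such as `restore_dali_files(Path("/path/to/backup"), Path("/var/lib/HPCCSystems/hpcc-data/dali"))` would copy only the current edition, where `/path/to/backup` is a placeholder. Older editions in the backup are skipped, which matches the note above that only the NNNN version named by the store.NNNN file is loaded.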