Shared data based xCAT MN HA mini design

Yuan Bai edited this page May 11, 2018 · 18 revisions

General

Use xcatha-setup and xcatha-failover to automate most of the functions in the user's use case.

xcatha-setup: xCAT is currently installed with xcat-automation, which uses Ansible. For setting up and configuring the HA MN, the xcatha-setup script fills in the gaps. xcatha-setup can run standalone, and xcat-automation can integrate it easily. In the future, xcat-automation can be used to set up and configure HA MN nodes.

xcatha-failover: activates or deactivates an HA MN.

User interface:

  1. xcatha-setup -p <shared-data directory path> -i <nic> -v <virtual ip> [-m <netmask>] [-t <database type>]

  2. activate MN: xcatha-failover -a|--activate -p <shared-data directory path> -i <nic> -v <virtual ip> [-m <netmask>] [-t <database type>]

  3. deactivate MN: xcatha-failover -d|--deactivate -i <nic> -v <virtual ip>
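The xcatha-failover usage lines above can be mirrored with a small argument parser. This is only a sketch of the interface as listed, not the script's actual implementation: the function names and the `dest` names are hypothetical, and requiring `-p` only for activation is an assumption read off the usage lines.

```python
import argparse


def build_failover_parser():
    """Build a parser that mirrors the xcatha-failover usage lines."""
    parser = argparse.ArgumentParser(prog="xcatha-failover")
    # Exactly one of activate/deactivate must be given.
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("-a", "--activate", action="store_true")
    mode.add_argument("-d", "--deactivate", action="store_true")
    parser.add_argument("-p", dest="path", help="shared-data directory path")
    parser.add_argument("-i", dest="nic", required=True, help="nic")
    parser.add_argument("-v", dest="vip", required=True, help="virtual ip")
    parser.add_argument("-m", dest="netmask", help="netmask (optional)")
    parser.add_argument("-t", dest="dbtype", help="database type (optional)")
    return parser


def parse_failover_args(argv):
    """Parse argv and enforce that activation names the shared-data path."""
    parser = build_failover_parser()
    args = parser.parse_args(argv)
    if args.activate and not args.path:
        # Assumption: -p is mandatory only when activating.
        parser.error("-p <shared-data directory path> is required with -a")
    return args
```

For simplicity the sketch accepts `-m` and `-t` in both modes, although the usage lines list them only for activation.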

Workflow:

  1. xcatha-setup: set up and configure the HA MN:

    1. check the virtual ip: make sure it is not in use (ping); otherwise, exit with an error message
    2. add the virtual ip to its nic
    3. set the hostname for the virtual ip
    4. check whether xCAT is installed:
      1. if xCAT is not installed, install xCAT
      2. if xCAT is installed, skip this step
    5. check that the master and nameservers attributes in the site table and the tftpserver attribute in the networks table are the virtual ip; if not, correct them
    6. switch the DB to the target type if the current DB type differs from the target type
    7. check whether there is xCAT data in the shared data directories:
      1. if there is no xCAT data in the shared data directories, and the shared data directory permissions and size are adequate, copy the xCAT data into the shared data directories
      2. create symbolic links to the shared data directories
    8. add the MN to the policy table
    9. run xcatha-failover to deactivate this MN node
  2. xcatha-failover -a|--activate

    1. check the virtual ip: make sure it is not in use (ping); otherwise, exit
    2. add the virtual ip to its nic
    3. set the hostname to the virtual ip's hostname
    4. check that the current DB type matches; if not, exit and clean up the env
    5. make symbolic links to the shared data directories, for example:
      /install -> /HA-data/install
      /etc/xcat -> /HA-data/etc/xcat
      /root/.xcat -> /HA-data/root/.xcat
      /var/lib/pgsql -> /HA-data/var/lib/pgsql
      /tftpboot -> /HA-data/tftpboot
      
    6. start/re-configure all related services, as follows:
      a. database (mysql/postgresql/sqlite type)
      b. xcatd
      c. named service (makedns -n)
      d. DHCP service (makedhcp -n, makedhcp -a)
      e. Console Server
      f. ... ...
  3. xcatha-failover -d|--deactivate

    1. make sure all of the following services are down, and that they are configured not to start on reboot:
      a. console service
      b. DHCP service
      c. named service
      d. xcatd
      e. database (mysql/postgresql/sqlite type)
    2. umount/un-link shared data directories on host1
    3. change hostname if needed
    4. remove virtual IP
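The activate and deactivate workflows list the same services in opposite orders: the database comes up first and goes down last. A minimal sketch of that ordering follows. The short labels are hypothetical placeholders rather than real service or unit names, and only the five services named in the workflow are covered.

```python
# Start order taken from the activate workflow. The labels are placeholders
# for illustration, not actual service names on any distribution.
START_ORDER = ["database", "xcatd", "named", "dhcp", "console"]


def service_plan(action):
    """Return (operation, service) pairs in the order they should run."""
    if action == "activate":
        return [("start", name) for name in START_ORDER]
    if action == "deactivate":
        # Stop in the reverse of the start order, so the database goes last.
        return [("stop", name) for name in reversed(START_ORDER)]
    raise ValueError(f"unknown action: {action!r}")
```

Deriving the stop order from the start order keeps the two lists from drifting apart if a service is added later.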

Function modules

config_vip:

  1. check if vip is used or not
  2. configure virtual ip as non-persistent alias IP address
  3. add the alias ip address into the /etc/resolv.conf as the nameserver
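The three config_vip steps could look roughly like the sketch below. It assumes a Linux host with `ping` and the iproute2 `ip` command. The function names, the `<nic>:0` alias label, and placing the new nameserver line first are illustrative assumptions, not details from the design. The command for adding the address is only built here, not executed.

```python
import ipaddress
import subprocess


def vip_in_use(vip, runner=subprocess.run):
    """Return True if a single ping to the virtual ip gets a reply."""
    result = runner(
        ["ping", "-c", "1", "-W", "1", vip],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def add_vip_command(vip, nic, netmask):
    """Build the command that adds the vip as a non-persistent alias address."""
    # Convert a dotted netmask such as 255.255.255.0 into a prefix length.
    prefix = ipaddress.IPv4Network(f"0.0.0.0/{netmask}").prefixlen
    # Assumption: the alias is labelled "<nic>:0".
    return ["ip", "addr", "add", f"{vip}/{prefix}", "dev", nic, "label", f"{nic}:0"]


def with_nameserver(resolv_text, vip):
    """Return resolv.conf text with the vip listed as a nameserver."""
    entry = f"nameserver {vip}"
    lines = resolv_text.splitlines()
    if entry in (line.strip() for line in lines):
        return resolv_text  # already present, nothing to do
    # Assumption: the vip should be the first nameserver consulted.
    return "\n".join([entry] + lines) + "\n"
```

An address added with `ip addr add` does not survive a reboot, which matches the non-persistent requirement in step 2.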

change_hostname:

  1. change the hostname resolution order so that /etc/hosts is used before the name server
  2. change hostname to the hostname that resolves to the specific ip address
  3. add the specific ip address and its hostname into /etc/hosts
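Step 3 of change_hostname can be illustrated with a pure text transformation on the contents of /etc/hosts. This is a sketch under assumptions: the function name is hypothetical, stale entries for the same address are assumed safe to drop, and writing the result back to the file is left out.

```python
def with_hosts_entry(hosts_text, ip, hostname):
    """Return /etc/hosts text with exactly one entry mapping ip to hostname."""
    kept = []
    for line in hosts_text.splitlines():
        # Ignore trailing comments when looking at the address field.
        fields = line.split("#", 1)[0].split()
        if fields and fields[0] == ip:
            continue  # assumption: older entries for this address are stale
        kept.append(line)
    kept.append(f"{ip} {hostname}")
    return "\n".join(kept) + "\n"
```

Because existing entries for the address are replaced rather than duplicated, applying the function twice gives the same result as applying it once, which is convenient when activation is retried.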

unconfig_vip: remove the virtual ip, then call change_hostname to restore the original hostname

check_xcat_attribute: check that each attribute value is the virtual ip (master and nameservers in the site table, tftpserver in the networks table)

run_service: start|stop a service. On success, return [Passed]; otherwise retry. If the service still fails after 5 retries, return [Failed].
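The retry behaviour of run_service can be sketched as follows. The service operation is passed in as a callable so the logic can be exercised without touching real services. Reading "retry 5 times" as one initial attempt plus up to five retries is an assumption, as is the `delay` parameter.

```python
import time


def run_service(action, retries=5, delay=0.0):
    """Run action() until it succeeds or the retries are used up.

    action is a zero-argument callable that returns True on success.
    """
    for attempt in range(1 + retries):  # the initial attempt plus the retries
        if action():
            return "[Passed]"
        if attempt < retries:
            time.sleep(delay)  # hypothetical pause between attempts
    return "[Failed]"
```

In a real script, `action` might wrap a call that starts or stops one service and reports whether it succeeded.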

config_shared_data:

  1. check if xcat data is in shared data directory or not, if not:
    1. check shared data directory permission
    2. check shared data size
    3. create xcat data structure in shared data directory
    4. copy xcat data into shared data directories
  2. create symbolic links to the shared data directories
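A rough sketch of config_shared_data for a single directory is given below. Several details are assumptions rather than part of the design: the function names, the `fs_root` parameter (a hook for trying the code in a scratch directory), and renaming an existing local directory to `<name>.orig` before linking. The directory list is copied from the activate workflow example.

```python
import os
import shutil

# Directories taken from the symbolic link example in the activate workflow.
SHARED_DIRS = ["/install", "/etc/xcat", "/root/.xcat", "/var/lib/pgsql", "/tftpboot"]


def shared_dir_usable(path, needed_bytes):
    """Check that the shared directory is writable and has enough free space."""
    writable = os.access(path, os.W_OK | os.X_OK)
    return writable and shutil.disk_usage(path).free >= needed_bytes


def link_shared_dir(local_dir, shared_root, fs_root="/"):
    """Link local_dir to its copy under shared_root, seeding the copy if absent."""
    rel = local_dir.lstrip("/")
    local = os.path.join(fs_root, rel)
    target = os.path.join(fs_root, shared_root.lstrip("/"), rel)

    if os.path.islink(local):
        if os.readlink(local) == target:
            return target  # already linked to the shared copy
        os.unlink(local)  # stale link pointing somewhere else

    has_local_data = os.path.isdir(local)
    if not os.path.exists(target):
        if has_local_data:
            shutil.copytree(local, target, symlinks=True)  # seed shared data
        else:
            os.makedirs(target)  # create an empty structure
    if has_local_data:
        # Assumption: keep the local copy aside instead of deleting it.
        os.rename(local, local + ".orig")
    os.makedirs(os.path.dirname(local), exist_ok=True)
    os.symlink(target, local)
    return target
```

A caller would loop over `SHARED_DIRS` after `shared_dir_usable` succeeds, for example `link_shared_dir("/install", "/HA-data")`, which corresponds to `/install -> /HA-data/install`.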

unconfig_shared_data: unlink shared data directories

clean_up_env: if a service fails, call unconfig_vip and change_hostname to restore the original hostname
