Ambari / AMBARI-25569

Reassess Ambari Metrics data migration

Details

    Description

      The data migration process of Ambari Metrics, as described at https://docs.cloudera.com/HDPDocuments/Ambari-2.7.5.0/bk_ambari-upgrade-major/content/upgrading_HDP_post_upgrade_tasks.html, causes problems: it does not migrate data that the user would expect to be migrated (e.g. YARN queue metrics for any queue other than the root queue).

      The data migration is usually started with the following command, in which the whitelist file is specified:

      /usr/sbin/ambari-metrics-collector --config /etc/ambari-metrics-collector/conf/ upgrade_start /etc/ambari-metrics-collector/conf/metrics_whitelist "31556952000"

      The migration code only looks for the metrics that are listed in the whitelist file, even when AMS whitelisting is not enabled. As a result, only the whitelisted metrics are migrated, which is usually a subset of what the user needs.

       

      I suggest the following change:

      • If the whitelist file parameter is provided, then
        • migrate only the metrics that are in the whitelist file
      • If --allmetrics is provided in place of the whitelist file parameter, then
        • migrate all metrics regardless of other configuration settings
      • If the whitelist file parameter is not provided (and the time period for data migration is also not provided), then
        • if whitelisting is enabled, then
          • discover the whitelist file configured in AMS and migrate only the metrics that are in the whitelist file
        • if whitelisting is disabled, then
          • migrate all the metrics present in the database
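
      The decision rules above can be sketched in Java roughly as follows. This is only an illustration of the proposal, not the existing AMS code: the class MigrationScopeResolver, its Mode and Scope types, and the parameters whitelistingEnabled and configuredWhitelistPath are hypothetical names, and how the AMS configuration would actually be read is left out.

```java
import java.util.Objects;
import java.util.Optional;

/** Hypothetical sketch of the proposed rules; names are assumptions, not AMS classes. */
final class MigrationScopeResolver {

    /** Either every metric in the database, or only those listed in a whitelist file. */
    enum Mode { ALL_METRICS, WHITELIST_FILE }

    static final class Scope {
        final Mode mode;
        final Optional<String> whitelistPath; // present only for WHITELIST_FILE

        Scope(Mode mode, Optional<String> whitelistPath) {
            this.mode = mode;
            this.whitelistPath = whitelistPath;
        }
    }

    static final String ALL_METRICS_FLAG = "--allmetrics";

    /**
     * @param whitelistArg            first argument after upgrade_start, or null if absent
     * @param whitelistingEnabled     whether whitelisting is enabled in the AMS configuration
     * @param configuredWhitelistPath whitelist file configured in AMS (used only as a fallback)
     */
    static Scope resolve(String whitelistArg,
                         boolean whitelistingEnabled,
                         String configuredWhitelistPath) {
        // --allmetrics overrides every other setting.
        if (ALL_METRICS_FLAG.equals(whitelistArg)) {
            return new Scope(Mode.ALL_METRICS, Optional.empty());
        }
        // An explicit whitelist file wins over the AMS configuration.
        if (whitelistArg != null && !whitelistArg.isEmpty()) {
            return new Scope(Mode.WHITELIST_FILE, Optional.of(whitelistArg));
        }
        // No argument given: fall back to what AMS itself is configured to do.
        if (whitelistingEnabled) {
            Objects.requireNonNull(configuredWhitelistPath,
                    "whitelisting is enabled but no whitelist file is configured");
            return new Scope(Mode.WHITELIST_FILE, Optional.of(configuredWhitelistPath));
        }
        return new Scope(Mode.ALL_METRICS, Optional.empty());
    }
}
```

      Keeping the decision in one pure function makes each of the three cases easy to unit test without a running collector.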

      Examples:

      • Migrate the metrics present in the whitelist file that are not older than one year (365 days)
        /usr/sbin/ambari-metrics-collector --config /etc/ambari-metrics-collector/conf/ upgrade_start /etc/ambari-metrics-collector/conf/metrics_whitelist "365"
      • Migrate the metrics present in the whitelist file that are not older than the default one month (30 days)
        /usr/sbin/ambari-metrics-collector --config /etc/ambari-metrics-collector/conf/ upgrade_start /etc/ambari-metrics-collector/conf/metrics_whitelist
      • Migrate all metrics that are not older than one year (365 days)
        /usr/sbin/ambari-metrics-collector --config /etc/ambari-metrics-collector/conf/ upgrade_start --allmetrics "365"
      • Migrate all metrics that are not older than the default one month (30 days)
        /usr/sbin/ambari-metrics-collector --config /etc/ambari-metrics-collector/conf/ upgrade_start --allmetrics
      • If whitelisting is enabled, migrate the metrics present in the whitelist file configured in Ambari that are not older than the default one month (30 days). If whitelisting is disabled, migrate all metrics that are not older than the default one month (30 days).
        /usr/sbin/ambari-metrics-collector --config /etc/ambari-metrics-collector/conf/ upgrade_start

       

      1. Introduce an '--allmetrics' argument to enforce migration of all metrics regardless of other settings.
      Because the arguments are handled positionally, any argument that comes after the 'whitelist file' argument, such as the 'starttime', can only be given if the 'whitelist file' argument is given as well.
      When the whitelist should not be used because all the metrics need to be migrated, the '--allmetrics' argument can be provided in place of the 'whitelist file'.

      Example: migrate all the metrics from the last year
      /usr/sbin/ambari-metrics-collector --config /etc/ambari-metrics-collector/conf/ upgrade_start --allmetrics "365"
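
      A minimal sketch of how the positional arguments could be parsed once '--allmetrics' is accepted in the whitelist slot. It assumes that the array passed in holds only the arguments that follow upgrade_start and that the second argument is a number of days; the class UpgradeArgs and its fields are hypothetical and do not exist in AMS.

```java
/** Hypothetical holder for the arguments that follow upgrade_start. */
final class UpgradeArgs {
    static final String ALL_METRICS_FLAG = "--allmetrics";
    static final int DEFAULT_DAYS = 30; // default window of one month

    final boolean allMetrics;
    final String whitelistFile; // null when no file was given
    final int days;

    private UpgradeArgs(boolean allMetrics, String whitelistFile, int days) {
        this.allMetrics = allMetrics;
        this.whitelistFile = whitelistFile;
        this.days = days;
    }

    /** Parses "[whitelist file | --allmetrics] [days]". */
    static UpgradeArgs parse(String[] args) {
        String first = args.length > 0 ? args[0] : null;
        boolean all = ALL_METRICS_FLAG.equals(first);
        // The flag occupies the whitelist slot, so it is never treated as a file name.
        String whitelist = (first == null || all) ? null : first;

        int days = DEFAULT_DAYS;
        if (args.length > 1) {
            days = Integer.parseInt(args[1].trim());
            if (days <= 0) {
                throw new IllegalArgumentException("days must be positive: " + days);
            }
        }
        return new UpgradeArgs(all, whitelist, days);
    }
}
```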

      2. The start time handling should be fixed and changed

      • The code is intended to migrate data from the "last x milliseconds". The handling of the default value shows this: the length of the window is subtracted from the current timestamp.
        public static final long DEFAULT_START_TIME = System.currentTimeMillis() - ONE_MONTH_MILLIS; //Last month
        However, when the user provided the startTime value externally, it was not subtracted from the current timestamp but used as-is, which is an error.
      • Also, I suggest using days instead of milliseconds to define the required migration time window, because days are a more realistic and convenient granularity. In the example above, the command migrates data from the last 365 days.
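
      The corrected start time calculation could look roughly like the sketch below: the window is given in days and is always subtracted from the current timestamp, for the default and for user-supplied values alike. The class MigrationWindow and the method startTimeMillis are hypothetical names used for illustration only.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper; shows the proposed "last N days" semantics. */
final class MigrationWindow {
    static final long DEFAULT_DAYS = 30; // default window of one month

    /**
     * Returns the earliest timestamp (epoch milliseconds) to migrate.
     *
     * @param nowMillis current time, e.g. System.currentTimeMillis()
     * @param days      length of the window in days; must be positive
     */
    static long startTimeMillis(long nowMillis, long days) {
        if (days <= 0) {
            throw new IllegalArgumentException("days must be positive: " + days);
        }
        // Same rule for default and user-supplied values: subtract the window from "now".
        return nowMillis - TimeUnit.DAYS.toMillis(days);
    }
}
```

      Passing the current time in as a parameter, instead of reading it in a static initializer, keeps the calculation testable and avoids freezing "now" at class-load time.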

      3. Furthermore, the migration process frequently dies silently while saving the metadata.

      The log message "Saving metadata to store..." is present in the logs, but the "Metadata was saved." message almost never appears, and there are no other error messages. I suggest revising the current solution, in which saving the metadata is triggered in a shutdown hook.
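
      One possible direction, sketched under assumptions: save the metadata explicitly as the last step of the migration instead of in a shutdown hook, and report any failure. The class MigrationRunner, the interface MetadataStore and the "Failed to save metadata" message are hypothetical; only the two existing log messages are taken from the current behavior.

```java
import java.util.function.Consumer;

/** Hypothetical sketch; MetadataStore stands in for whatever AMS uses to persist metadata. */
final class MigrationRunner {

    interface MetadataStore {
        void save() throws Exception;
    }

    /**
     * Runs the migration and then saves the metadata in the normal control flow,
     * so that a failure is logged instead of being lost during JVM shutdown.
     *
     * @return true if the metadata was saved, false otherwise
     */
    static boolean run(Runnable migrateData, MetadataStore store, Consumer<String> log) {
        migrateData.run();
        log.accept("Saving metadata to store...");
        try {
            store.save();
            log.accept("Metadata was saved.");
            return true;
        } catch (Exception e) {
            // Make the failure visible rather than letting the process die silently.
            log.accept("Failed to save metadata: " + e);
            return false;
        }
    }
}
```

      The caller could map the returned value to a non-zero exit code so that a failed save is also visible to whatever invoked the migration.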

          People

            payert Tamas Payer

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 1h 50m