  1. Kylin
  2. KYLIN-508

Too high cardinality is not suitable for dictionary!


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: v0.7.1
    • Component/s: Job Engine

    Description

      Hi!
      Building a cube failed with the following error:
      ```
      [QuartzScheduler_Worker-22]:[2015-01-08 00:21:38,468][INFO][com.kylinolap.dict.DictionaryGenerator.buildDictionaryFromValueList(DictionaryGenerator.java:72)] - Dictionary cardinality 9999956
      [QuartzScheduler_Worker-22]:[2015-01-08 00:21:38,468][ERROR][com.kylinolap.job.hadoop.dict.CreateDictionaryJob.run(CreateDictionaryJob.java:55)] - Too high cardinality is not suitable for dictionary! Are the values stable enough for incremental load??
      java.lang.IllegalArgumentException: Too high cardinality is not suitable for dictionary! Are the values stable enough for incremental load??
      at com.kylinolap.dict.DictionaryGenerator.buildDictionaryFromValueList(DictionaryGenerator.java:75)
      at com.kylinolap.dict.DictionaryGenerator.buildDictionary(DictionaryGenerator.java:110)
      at com.kylinolap.dict.DictionaryManager.buildDictionary(DictionaryManager.java:166)
      at com.kylinolap.cube.CubeManager.buildDictionary(CubeManager.java:171)
      ```
      The relevant source code:
      ```
      /**
       * @author yangli9
       */
      @SuppressWarnings({ "rawtypes", "unchecked" })
      public class DictionaryGenerator {

          private static final Logger logger = LoggerFactory.getLogger(DictionaryGenerator.class);

          private static final String[] DATE_PATTERNS = new String[] { "yyyy-MM-dd" };

          public static Dictionary<?> buildDictionaryFromValueList(DictionaryInfo info, List<byte[]> values) {
              info.setCardinality(values.size());
              ...
              // log a few samples
              StringBuilder buf = new StringBuilder();
              for (Object s : samples) {
                  if (buf.length() > 0)
                      buf.append(", ");
                  buf.append(s.toString()).append("=>").append(dict.getIdFromValue(s));
              }

              logger.info("Dictionary value samples: " + buf.toString());
              logger.info("Dictionary cardinality " + info.getCardinality());

              if (values.size() > 1000000)
                  throw new IllegalArgumentException("Too high cardinality is not suitable for dictionary! Are the values stable enough for incremental load??");

              return dict;
              ...
      ```
      The limit here is hard-coded to 1000000. What does it mean?
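
      To make the question concrete, here is a minimal stand-alone sketch of the check as I read it from the excerpt: the job is rejected once the number of values passed to the generator exceeds the limit, which is what happens with the logged cardinality of 9999956. The class `CardinalityGuard`, the `check` method, and the `maxCardinality` parameter are hypothetical names used only for illustration; they are not part of Kylin, and this is not a proposed fix.

      ```java
      import java.util.ArrayList;
      import java.util.List;

      // Hypothetical stand-alone sketch; not part of the Kylin code base.
      public class CardinalityGuard {

          // Same value as the hard-coded limit in the excerpt above.
          public static final int DEFAULT_MAX_CARDINALITY = 1000000;

          // Throws when the number of values exceeds the given limit,
          // mirroring the size check in the excerpt.
          public static void check(List<byte[]> values, int maxCardinality) {
              if (values.size() > maxCardinality) {
                  throw new IllegalArgumentException(
                          "Too high cardinality is not suitable for dictionary! cardinality="
                                  + values.size() + ", limit=" + maxCardinality);
              }
          }

          public static void main(String[] args) {
              List<byte[]> values = new ArrayList<>();
              for (int i = 0; i < 5; i++) {
                  values.add(Integer.toString(i).getBytes());
              }

              // 5 values against a limit of 10: accepted, no exception.
              check(values, 10);

              // 5 values against a limit of 3: rejected.
              try {
                  check(values, 3);
              } catch (IllegalArgumentException e) {
                  System.out.println("rejected: " + e.getMessage());
              }
          }
      }
      ```

      In this sketch the limit is a parameter instead of a constant, only so that the behavior on both sides of the threshold can be shown with a small list.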

      ---------------- Imported from GitHub ----------------
      Url: https://github.com/KylinOLAP/Kylin/issues/364
      Created by: Yancey1989
      Labels:
      Created at: Thu Jan 08 00:27:15 CST 2015
      State: open

          People

            liyang.gmt8@gmail.com liyang
            lukehan Luke Han