Beam / BEAM-7860

v1new ReadFromDatastore returns duplicates if keys are of mixed types

Details

    • Type: Bug
    • Status: Resolved
    • Priority: P0
    • Resolution: Fixed
    • Affects Version/s: 2.13.0
    • Fix Version/s: 2.15.0
    • Component/s: io-py-gcp
    • Labels: None
    • Environment: Python 2.7, Python 3.7

    Description

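      When a kind contains keys of mixed types (string names alongside integer IDs), v1new ReadFromDatastore may return duplicate entities. The reproduction script in this issue writes three entities to the 'mixed' kind and reads back four records instead of the expected three.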
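      Keys of mixed types are awkward to order on the client: in Python 3, comparing an integer ID with a string name raises TypeError, so any code that sorts such keys (for example, when a reader splits a query into key ranges) needs an explicit ordering. The sketch below is not Beam's actual fix; it assumes Datastore orders numeric IDs before string names within a path element, and the helpers id_or_name_sort_key and path_sort_key are hypothetical.

```python
import numbers


def id_or_name_sort_key(value):
    """Map a key ID (int) or name (str) to a tuple that orders IDs before names.

    Assumption: numeric IDs sort before string names, as in Datastore.
    The tuple layout (rank, id, name) means an int is never compared with a str.
    """
    if isinstance(value, numbers.Integral):
        return (0, value, '')
    return (1, 0, value)


def path_sort_key(path):
    """Sort key for a flat path [kind, id_or_name, kind, id_or_name, ...]."""
    return [(kind, id_or_name_sort_key(id_or_name))
            for kind, id_or_name in zip(path[0::2], path[1::2])]


paths = [
    ['mixed', '10038260-iperm_eservice'],
    ['mixed', 4812224868188160],
    ['mixed', '99152975-pointshop'],
]

# A plain sorted(paths) raises TypeError on Python 3 (int compared with str).
ordered = sorted(paths, key=path_sort_key)
print(ordered)
# [['mixed', 4812224868188160], ['mixed', '10038260-iperm_eservice'], ['mixed', '99152975-pointshop']]
```

      Whatever ordering a splitter uses, it has to match the order in which Datastore actually returns keys; if the two disagree, adjacent key ranges can overlap and the same entity may be read twice.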

      from __future__ import unicode_literals

      import apache_beam as beam
      from apache_beam.io.gcp.datastore.v1new import datastoreio
      from apache_beam.io.gcp.datastore.v1new.types import Entity, Key, Query


      # Replace with your own GCP project; 'test' is the Datastore namespace.
      config = dict(project='your-google-project', namespace='test')


      def test_mixed():
          # Three keys of the same kind: two string names and one integer ID.
          keys = [
              Key(['mixed', '10038260-iperm_eservice'], **config),
              Key(['mixed', 4812224868188160], **config),
              Key(['mixed', '99152975-pointshop'], **config),
          ]

          # Build a list explicitly (map() is lazy on Python 3).
          entities = [Entity(key=key) for key in keys]

          # Write the three entities.
          with beam.Pipeline() as p:
              (p
               | beam.Create(entities)
               | datastoreio.WriteToDatastore(project=config['project']))

          # Read them back, splitting the query into several key ranges.
          query = Query(kind='mixed', **config)
          with beam.Pipeline() as p:
              (p
               | datastoreio.ReadFromDatastore(query=query, num_splits=4)
               | beam.io.WriteToText('tmp.txt', num_shards=1,
                                     shard_name_template=''))

          with open('tmp.txt') as f:
              items = f.read().strip().split('\n')
          # On affected versions this fails: 4 records are returned instead of 3.
          assert len(items) == 3, 'incorrect number of items: %d' % len(items)


      if __name__ == '__main__':
          test_mixed()

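      To confirm that the extra record is a duplicate of one of the three entities rather than an unrelated one, the output lines can be counted. This is a minimal standard-library sketch, assuming each returned entity is written as one line and that duplicates produce identical lines; find_duplicate_lines is a hypothetical helper and the sample lines are placeholders, not real pipeline output.

```python
from collections import Counter


def find_duplicate_lines(lines):
    """Return {line: count} for every non-empty line that occurs more than once."""
    counts = Counter(line for line in lines if line)
    return {line: n for line, n in counts.items() if n > 1}


# Placeholder lines standing in for the contents of tmp.txt (not real output).
sample = ['entity-a', 'entity-b', 'entity-b', 'entity-c']
print(find_duplicate_lines(sample))  # {'entity-b': 2}
```

      In test_mixed, the same check could be applied to items before the length assertion to report which entity was read twice.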


    People

      Assignee: Udi Meiri (udim)
      Reporter: Niels Stender (innohead)
      Votes: 0
      Watchers: 3


    Time Tracking

      Original Estimate: Not Specified
      Remaining Estimate: 0h
      Time Spent: 2h 10m