  Beam / BEAM-7860

v1new ReadFromDatastore returns duplicates if keys are of mixed types


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.13.0
    • Fix Version/s: 2.15.0
    • Component/s: io-py-gcp
    • Labels:
      None
    • Environment:
      Python 2.7
      Python 3.7

      Description

      When a kind contains keys of mixed types (string names and integer IDs), v1new ReadFromDatastore may return duplicate entities. The example below writes 3 entities but reads back 4 records instead of the expected 3.

       

      from __future__ import unicode_literals
      import apache_beam as beam
      from apache_beam.io.gcp.datastore.v1new.types import Key, Entity, Query
      from apache_beam.io.gcp.datastore.v1new import datastoreio
      
      
      config = dict(project='your-google-project', namespace='test')
      
      
      def test_mixed():
          keys = [
              Key(['mixed', '10038260-iperm_eservice'], **config),
              Key(['mixed', 4812224868188160], **config),
              Key(['mixed', '99152975-pointshop'], **config)
          ]
      
          # Build a concrete list so the input behaves the same on Python 2 and 3.
          entities = [Entity(key=key) for key in keys]
      
          with beam.Pipeline() as p:
              (p
                  | beam.Create(entities)
                  | datastoreio.WriteToDatastore(project=config['project'])
              )
      
          query = Query(kind='mixed', **config)
      
          with beam.Pipeline() as p:
              (p
                  | datastoreio.ReadFromDatastore(query=query, num_splits=4)
                  | beam.io.WriteToText('tmp.txt', num_shards=1, shard_name_template='')
              )
      
          # Read the output back; close the file handle once done.
          with open('tmp.txt') as f:
              items = f.read().strip().split('\n')
          assert len(items) == 3, 'incorrect number of items'
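
To see which record is duplicated, rather than only that the count is wrong, the lines read back from the output file can be tallied. The following is a minimal sketch: the helper name `find_duplicates` and the sample values are hypothetical, not part of the original report, and it assumes each output line is the text form of one entity.

```python
from collections import Counter


def find_duplicates(lines):
    """Return a dict mapping each line that occurs more than once to its count."""
    counts = Counter(lines)
    return {line: n for line, n in counts.items() if n > 1}


# Hypothetical sample standing in for the lines read from tmp.txt.
sample = ['entity-a', 'entity-b', 'entity-b', 'entity-c']
print(find_duplicates(sample))
```

In the test above, calling `find_duplicates(items)` before the assertion would show which entity was read twice.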
      
      
      

              People

              • Assignee: Udi Meiri (udim)
              • Reporter: Niels Stender (innohead)
              • Votes: 0
              • Watchers: 3

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Time Tracking

                  • Estimated: Not Specified
                  • Remaining: 0h
                  • Logged: 2h 10m