SPARK-38319

Implement Strict Mode to prevent querying the entire table



    Description

      We are using Spark Thrift Server as a service to run Spark SQL queries along with Hive metastore as the metadata service.

      We would like to restrict users from querying an entire table: force them to filter on a partition column in the WHERE clause (e.g. SELECT * FROM TABLE WHERE partition_column=<column_value>) and to put a LIMIT on the output whenever ORDER BY is used.
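      A minimal sketch of how this could look from the user's side, assuming hypothetical Spark configuration keys that mirror the Hive checks referenced below; the configuration names, table name sales, and partition column dt are illustrative only, not an existing API:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("strict-mode-sketch")
  .enableHiveSupport()
  .getOrCreate()

// Hypothetical switches (not existing Spark configs), mirroring Hive's strict checks.
spark.conf.set("spark.sql.strict.checks.noPartitionFilter", "true")
spark.conf.set("spark.sql.strict.checks.orderByNoLimit", "true")

// Should be rejected: full scan of a partitioned table, no partition predicate.
spark.sql("SELECT * FROM sales")

// Should pass: the query is restricted to a single partition.
spark.sql("SELECT * FROM sales WHERE dt = '2022-02-24'")

// Should be rejected: global ORDER BY without a LIMIT.
spark.sql("SELECT amount FROM sales WHERE dt = '2022-02-24' ORDER BY amount")

// Should pass: the ORDER BY is bounded by a LIMIT.
spark.sql("SELECT amount FROM sales WHERE dt = '2022-02-24' ORDER BY amount LIMIT 100")
```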

      This behaviour is similar to what Hive exposes through the configurations

      hive.strict.checks.no.partition.filter

      hive.strict.checks.orderby.no.limit

      which are described here:

      https://github.com/apache/hive/blob/master/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L1812

      and

      https://github.com/apache/hive/blob/master/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L1816
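      For reference, both Hive settings are plain boolean session configurations. They can be set through a Spark session with Hive support, but, as far as this request is concerned, Spark's own planner does not act on them, which is why an equivalent Spark-side check is being proposed:

```scala
// Setting the Hive strict-check keys via Spark SQL; Spark accepts the keys
// but currently performs no corresponding check on the query plan.
spark.sql("SET hive.strict.checks.no.partition.filter=true")
spark.sql("SET hive.strict.checks.orderby.no.limit=true")
```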

       

      This is a pretty common use case / feature that other tools offer as well, for example BigQuery: https://cloud.google.com/bigquery/docs/querying-partitioned-tables#require_a_partition_filter_in_queries.

      It would be nice to have this feature implemented in Spark when Hive support is enabled in a Spark session.
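      Until such a mode exists natively, a rough prototype of the ORDER BY check can be wired in through SparkSessionExtensions (an existing extension point). The rule body below is a simplified sketch under assumptions, not the proposed implementation; the partition-filter check would follow the same pattern but needs to inspect the scanned relations and their partition columns:

```scala
import org.apache.spark.sql.{AnalysisException, SparkSession}
import org.apache.spark.sql.catalyst.plans.logical.{GlobalLimit, LogicalPlan, Sort}

// Check rule: reject analyzed plans that contain a global Sort (ORDER BY)
// but no GlobalLimit anywhere in the plan. This is a deliberate
// simplification; a production rule would verify that the limit actually
// bounds the sort it is paired with.
val requireLimitWithOrderBy: SparkSession => LogicalPlan => Unit = { _ => plan =>
  val hasGlobalSort = plan.collectFirst { case s: Sort if s.global => s }.isDefined
  val hasLimit = plan.collectFirst { case g: GlobalLimit => g }.isDefined
  if (hasGlobalSort && !hasLimit) {
    throw new AnalysisException("Strict mode: ORDER BY without LIMIT is not allowed")
  }
}

val spark = SparkSession.builder()
  .enableHiveSupport()
  .withExtensions(_.injectCheckRule(requireLimitWithOrderBy))
  .getOrCreate()

// spark.sql("SELECT * FROM sales ORDER BY amount")           // rejected
// spark.sql("SELECT * FROM sales ORDER BY amount LIMIT 100") // allowed
```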


          People

            Assignee: Unassigned
            Reporter: dimtiris kanoute
