Description
We use Spark Thrift Server as a service to run Spark SQL queries, with the Hive metastore as the metadata service.
We would like to prevent users from querying an entire table. Queries should be required to filter on a partition column in the WHERE clause (e.g. SELECT * FROM TABLE WHERE partition_column=<column_value>) and to LIMIT the output whenever ORDER BY is used.
This behaviour is similar to what Hive exposes through the following configuration properties:
hive.strict.checks.no.partition.filter
hive.strict.checks.orderby.no.limit
These properties are described here: https://github.com/apache/hive/blob/master/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L1812
This is a fairly common use case, and other tools offer the same feature. BigQuery, for example: https://cloud.google.com/bigquery/docs/querying-partitioned-tables#require_a_partition_filter_in_queries
It would be nice to have this feature in Spark when Hive support is enabled in a Spark session.
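For reference, a minimal sketch of the Hive behaviour referred to above, as we understand it. The table name `events` and the partition column `dt` are hypothetical and used only for illustration. The exact error messages are not shown because they vary between Hive versions.

```sql
-- Enable the two strict checks for the current Hive session.
SET hive.strict.checks.no.partition.filter=true;
SET hive.strict.checks.orderby.no.limit=true;

-- Assumption: `events` is a table partitioned by `dt` (hypothetical names).

-- Expected to be rejected: no filter on the partition column.
SELECT * FROM events;

-- Expected to be rejected: ORDER BY without LIMIT.
SELECT * FROM events WHERE dt = '2020-01-01' ORDER BY event_time;

-- Expected to be accepted: partition filter present, ORDER BY bounded by LIMIT.
SELECT * FROM events WHERE dt = '2020-01-01' ORDER BY event_time LIMIT 100;
```

The request is for Spark SQL to offer equivalent checks, so that the first two queries would fail during analysis instead of triggering a full table scan or an unbounded sort.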