The jobs.query method should not be used in the client libraries.
Reasons: the query can time out on the client but still run on the server, costing money & resources. Unless you then poll for the results (and it is not obvious that you have to), you never see them.
Also, the jobs/query API does not accept a job ID. This means that a retry on failure could result in duplicate queries being run, causing extra resource usage & charges.
Proposal:
- Remove all uses of jobs/query. Instead use jobs/insert.
- Deprecate BigQuery.query(QueryRequest). Providing a synchronous method without an explicit job ID is dangerous for the reasons listed above. Alternatively, we could generate a job ID client-side, which would protect against duplicate queries from retries from the client libraries.
- Add a method BigQuery.query(QueryRequest, JobId) so users can explicitly provide a job ID for safe retries (even against program crashes if they save the job ID somewhere -- important for really big/expensive queries).
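
To illustrate why a client-supplied job ID makes retries safe, here is a minimal sketch. It does not use the real client library: `FakeJobService` is a hypothetical in-memory stand-in for the backend, and its `insert` and `query` methods only model the behaviour described above (an ID-keyed insert versus an ID-less query). As a simplification, the fake treats a repeated ID as a no-op; the real service is expected to reject a duplicate ID rather than run the query a second time, which the client could interpret as "job already exists".

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

public class JobIdRetrySketch {

    /** Hypothetical in-memory stand-in for the backend; not a real BigQuery API. */
    public static class FakeJobService {
        private final Map<String, String> jobs = new HashMap<>();
        private boolean dropNextResponse;
        public int queriesExecuted = 0;

        /** If dropFirstResponse is true, the first call runs the query but then "times out". */
        public FakeJobService(boolean dropFirstResponse) {
            this.dropNextResponse = dropFirstResponse;
        }

        /** Models jobs/insert: the caller supplies the job ID, so a repeat does not rerun the query. */
        public String insert(String jobId, String sql) {
            if (!jobs.containsKey(jobId)) {
                jobs.put(jobId, sql);
                queriesExecuted++;
            }
            maybeDropResponse();
            return jobId;
        }

        /** Models jobs/query: no job ID is accepted, so every call starts a new query. */
        public String query(String sql) {
            String serverId = "server_" + UUID.randomUUID();
            jobs.put(serverId, sql);
            queriesExecuted++;
            maybeDropResponse();
            return serverId;
        }

        /** Simulates a timeout that happens after the server already accepted the job. */
        private void maybeDropResponse() {
            if (dropNextResponse) {
                dropNextResponse = false;
                throw new RuntimeException("simulated timeout after the job was accepted");
            }
        }
    }

    /** Retries with one client-generated job ID; returns how many queries the service ran. */
    public static int runWithJobId(FakeJobService service, String sql, int maxAttempts) {
        String jobId = "job_" + UUID.randomUUID(); // generated once, reused on every retry
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                service.insert(jobId, sql);
                break;
            } catch (RuntimeException e) {
                // Retry with the same job ID; the service can recognise the duplicate.
            }
        }
        return service.queriesExecuted;
    }

    /** Retries without a job ID; returns how many queries the service ran. */
    public static int runWithoutJobId(FakeJobService service, String sql, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                service.query(sql);
                break;
            } catch (RuntimeException e) {
                // Nothing ties this retry to the first attempt, so the query runs again.
            }
        }
        return service.queriesExecuted;
    }

    public static void main(String[] args) {
        System.out.println("with job ID:    "
                + runWithJobId(new FakeJobService(true), "SELECT 1", 3) + " query run(s)");
        System.out.println("without job ID: "
                + runWithoutJobId(new FakeJobService(true), "SELECT 1", 3) + " query run(s)");
    }
}
```

Persisting the generated job ID before the first attempt would extend the same protection across a program crash, since a restarted process could reuse the ID instead of minting a new one.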