Description
Add provider-maintained deferrable support for invoking and waiting on Google Cloud Functions / HTTP Cloud Run functions from apache-airflow-providers-google.
Today, the Google provider supports deferrable execution for several Google Cloud services, including BigQuery, GCS, Dataflow, Pub/Sub, and Cloud Run Jobs via CloudRunExecuteJobOperator. However, there does not appear to be an equivalent deferrable operator/sensor/trigger pattern for HTTP Cloud Functions or HTTP Cloud Run functions.
CloudFunctionInvokeFunctionOperator is synchronous and documented as intended for testing purposes with limited traffic. For production workflows that trigger a function and then need to wait for asynchronous completion, users currently need to either:
- maintain custom Airflow trigger/sensor/operator code, including Google auth, polling, timeout, retry, and failure semantics; or
- implement an indirect durable-status pattern, such as having the function write completion state to BigQuery/GCS and waiting on that state with an existing deferrable sensor.
It would be useful to have a first-class deferrable pattern in the Google provider for this use case, for example a deferrable Cloud Functions / HTTP Cloud Run function operator or sensor that handles invocation, authenticated HTTP requests, polling/completion checks, timeout handling, retries, and failure propagation.
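To make the requested semantics concrete, here is a minimal sketch of the polling loop such a trigger would need to run on the triggerer. It uses only the standard library and is not an existing provider API: `wait_for_completion`, `FunctionRunFailed`, the `check_status` callable, and the `done` / `failed` / `running` status values are all hypothetical names chosen for illustration. In a real implementation, `check_status` would presumably make an authenticated HTTP request to the function or to a status endpoint.

```python
import asyncio
import time
from typing import Awaitable, Callable

# Hypothetical status contract; a real function would define its own.
DONE, FAILED, RUNNING = "done", "failed", "running"


class FunctionRunFailed(Exception):
    """Raised when the remote work reports failure (hypothetical name)."""


async def wait_for_completion(
    check_status: Callable[[], Awaitable[str]],
    poll_interval: float = 5.0,
    timeout: float = 600.0,
) -> str:
    """Poll check_status until it returns DONE.

    Raises FunctionRunFailed if the status is FAILED, and TimeoutError if
    another poll would not fit before the deadline.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = await check_status()
        if status == DONE:
            return status
        if status == FAILED:
            raise FunctionRunFailed("remote work reported failure")
        if time.monotonic() + poll_interval > deadline:
            raise TimeoutError(f"no completion within {timeout} seconds")
        # Sleeping with asyncio yields control instead of blocking a worker.
        await asyncio.sleep(poll_interval)
```

Because the status check is injected, the same loop could back either an operator that invokes the function and then defers, or a standalone sensor. Retries of individual failed status requests are left out of this sketch.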
Use case/motivation
We have Airflow DAGs that trigger Google Cloud Functions / HTTP Cloud Run functions to perform asynchronous work. The function invocation itself is short, but the downstream processing can take longer, and Airflow needs to wait for the work to complete before continuing the DAG.
Because there is no provider-maintained deferrable Cloud Functions / HTTP Cloud Run function sensor/trigger today, using a synchronous task or regular sensor would occupy worker resources while waiting. To avoid that, we currently use a workaround where the function writes completion/status data to BigQuery, and Airflow waits on that status using an existing deferrable BigQuery sensor.
This works, but it adds extra infrastructure and indirection only to compensate for the missing deferrable Cloud Functions / HTTP function pattern. A provider-supported deferrable operator/sensor would reduce maintenance burden, avoid custom triggerer code, and make this pattern more consistent with other Google provider integrations such as BigQuery, GCS, Pub/Sub, Dataflow, and Cloud Run Jobs, as well as with dbt-style async workflows.
Related issues
No response
Are you willing to submit a PR?
Code of Conduct