Welcome to gcloud-utils’s documentation!¶
User Guide¶
Some GCloud-utils functions can be used in two ways: from the CLI or as a Python package.
CLI¶
Some functions can be used from the CLI.
- Saving query result in a BigQuery table
query_to_table dataset table json_key YYYYMMDD query_file -Aquery_arg1=arg -Aquery_arg2=arg
- The command parameters are:
YYYYMMDD: date of the script (defaults to the current date).
-A: passes an argument to the query or to the query file.
json_key: credentials for the BigQuery service.
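As an illustration of how repeated -A options of the form -Aname=value could be collected into a dictionary, here is a minimal sketch using the standard argparse module. The function name parse_query_args is hypothetical, and this is not necessarily how the gcloud-utils CLI parses its arguments.

```python
import argparse


def parse_query_args(argv):
    """Collect repeated -Aname=value options into a dict (illustrative only)."""
    parser = argparse.ArgumentParser()
    # action="append" gathers every -A occurrence into a single list
    parser.add_argument("-A", dest="query_args", action="append", default=[])
    # parse_known_args ignores anything that is not a -A option
    namespace, _unknown = parser.parse_known_args(argv)
    result = {}
    for item in namespace.query_args:
        # Split only on the first "=", so values may contain "=" themselves
        key, _sep, value = item.partition("=")
        result[key] = value
    return result


query_vars = parse_query_args(["-Aquery_arg1=foo", "-Aquery_arg2=bar"])
print(query_vars)
```

Splitting on the first "=" only keeps values such as filter expressions intact.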
The CLI makes some fixed variables available in queries:
previous_date: the day before the declared current date (YYYYMMDD)
start_date: the declared current date (YYYYMMDD)
next_date: the day after the declared current date (YYYYMMDD)
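The relationship between the three fixed variables can be sketched with the standard datetime module. The helper name fixed_date_variables is hypothetical, and the sketch assumes each variable is rendered in the same YYYYMMDD format as the input date. It is not the library's own implementation.

```python
from datetime import datetime, timedelta

DATE_FORMAT = "%Y%m%d"  # YYYYMMDD


def fixed_date_variables(current_date):
    """Derive previous_date, start_date and next_date from a YYYYMMDD string."""
    start = datetime.strptime(current_date, DATE_FORMAT)
    one_day = timedelta(days=1)
    return {
        "previous_date": (start - one_day).strftime(DATE_FORMAT),
        "start_date": start.strftime(DATE_FORMAT),
        "next_date": (start + one_day).strftime(DATE_FORMAT),
    }


variables = fixed_date_variables("20190301")
print(variables)
```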
- Exporting a table from BigQuery to Cloud Storage
table_to_gcs dataset table bucket cloudstorage_filename json_key YYYYMMDD time_delta export_format compression_format
- Where the parameters are:
dataset: Name of the dataset
bucket: Name of the bucket to save to
cloudstorage_filename: Path in Google Storage to save to
json_key: Path to the Google Credentials Application file
YYYYMMDD: Date of the script
time_delta: Number of days before the current date to get the table
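One way to read time_delta is as an offset, in days, subtracted from the script date. The sketch below shows that arithmetic only. The function name table_date is hypothetical, and how the CLI applies the resulting date to the table is an assumption not covered here.

```python
from datetime import datetime, timedelta


def table_date(script_date, time_delta):
    """Return the date time_delta days before script_date, as YYYYMMDD."""
    parsed = datetime.strptime(script_date, "%Y%m%d")
    return (parsed - timedelta(days=time_delta)).strftime("%Y%m%d")


shifted = table_date("20190105", 7)
print(shifted)
```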
- Loading a file from Cloud Storage into a BigQuery table
gcs_to_table bucket cloudstorage_filename dataset table json_key YYYYMMDD
- Where the parameters are:
YYYYMMDD: Date of the script
json_key: Credentials for the BigQuery service
Python package¶
BigQuery¶
Simple query
from google.cloud import bigquery
from gcloud_utils.bigquery.bigquery import Bigquery
query = "SELECT * FROM bq_table"
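# args.gcs_key_json holds the path to the service account JSON key file
client = bigquery.Client.from_service_account_json(args.gcs_key_json)
bq_client = Bigquery(client)
# Call the method on the instance; self is passed implicitly
result = bq_client.query(query)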
Query with parameters
from google.cloud import bigquery
from gcloud_utils.bigquery.bigquery import Bigquery
from gcloud_utils.bigquery.query_builder import QueryBuilder
query = QueryBuilder("SELECT * FROM ${my_table}")
query.with_vars(my_table="bq_table")
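# args.gcs_key_json holds the path to the service account JSON key file
client = bigquery.Client.from_service_account_json(args.gcs_key_json)
bq_client = Bigquery(client)
# Call the method on the instance; self is passed implicitly
result = bq_client.query(query)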
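To illustrate what substituting a ${my_table} placeholder amounts to, here is a minimal sketch using string.Template from the standard library, which happens to share the same ${name} syntax. This is a stand-in for illustration and makes no claim about how QueryBuilder is implemented. The start_date placeholder is added here only as an example.

```python
from string import Template

# ${name} placeholders are replaced by keyword arguments of the same name
raw_query = "SELECT * FROM ${my_table} WHERE day = '${start_date}'"
rendered = Template(raw_query).substitute(
    my_table="bq_table",
    start_date="20190301",
)
print(rendered)
```

Template.substitute raises KeyError when a placeholder has no value, which surfaces a missing variable before the query is sent.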
Saving a query result in a BigQuery table
from google.cloud import bigquery
from gcloud_utils.bigquery.bigquery import Bigquery
client = bigquery.Client.from_service_account_json(args.gcs_key_json)
bq_client = Bigquery(client)
bq_client.query_to_table(
query_or_object,
dataset_id,
table_id,
write_disposition="WRITE_TRUNCATE",
job_config=None,
)
Saving a BigQuery table in Cloud Storage
from google.cloud import bigquery
from gcloud_utils.bigquery.bigquery import Bigquery
client = bigquery.Client.from_service_account_json(args.gcs_key_json)
bq_client = Bigquery(client)
bq_client.table_to_cloud_storage(
dataset_id,
table_id,
bucket_name,
filename,
job_config=None,
export_format="csv",
compression_format="gz",
location="US",
)
Loading a Cloud Storage file into a BigQuery table
from google.cloud import bigquery
from gcloud_utils.bigquery.bigquery import Bigquery
client = bigquery.Client.from_service_account_json(args.gcs_key_json)
bq_client = Bigquery(client)
bq_client.cloud_storage_to_table(
bucket_name,
filename,
dataset_id,
table_id,
job_config=None,
location="US",
)
Cloud Function¶
Create a function
from gcloud_utils.functions import Functions
functions_handler = Functions('my-project', 'us-central1-a')
function_name = 'my-function-name'
function_path = '/path/to/function.py'
function_runtime = 'python37'
functions_handler.create_function(
function_name,
function_runtime,
function_path,
)
List all functions
from gcloud_utils.functions import Functions
functions_handler = Functions('my-project', 'us-central1-a')
for function in functions_handler.list_functions():
    print(function.name)
Describe a specific function
from gcloud_utils.functions import Functions
functions_handler = Functions('my-project', 'us-central1-a')
function_detail = functions_handler.describe_function('my-function-name')
print('Status: {}'.format(function_detail.status))
print('Last update: {}'.format(function_detail.updateTime))
Trigger a function
import json
from gcloud_utils.functions import Functions
functions_handler = Functions('my-project', 'us-central1-a')
data = json.dumps({'example': 'example'})
functions_handler.call_function('my-function-name', data)
API Reference¶
MLEngine¶
Submit Job to ML Engine
class gcloud_utils.ml_engine.MlEngine(project, bucket_name, region, package_path='packages', job_dir='jobs', http=None, credentials_path=None)
Google-ml-engine handler
- create_model_version(model_name, version, job_id, python_version='', runtime_version='', framework='')
Increase Model version
- delete_older_model_versions(model_name, n_versions_to_keep)
Keep the most recent model versions and delete the older ones. The number of versions to keep is given by the parameter n_versions_to_keep.
- export_model(clf, model_path='model.pkl')
Export a classifier/pipeline to the model path. Supported frameworks: XGBoost booster, Scikit-learn estimators and pipelines.
- increase_model_version(model_name, job_id, python_version='', runtime_version='', framework='')
Increase Model version
- predict_json(project, model, instances, version=None)
Send JSON data to a deployed model for prediction.
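The retention rule described for delete_older_model_versions can be sketched as a sort followed by a split. This is an illustration only: the helper split_versions is hypothetical, and it assumes each version is a dict with a createTime timestamp string that sorts chronologically. The library's actual ordering criterion is not documented here.

```python
def split_versions(versions, n_versions_to_keep):
    """Return (kept, deleted), treating a later createTime as more recent."""
    ordered = sorted(versions, key=lambda v: v["createTime"], reverse=True)
    return ordered[:n_versions_to_keep], ordered[n_versions_to_keep:]


versions = [
    {"name": "v1", "createTime": "2019-01-01T00:00:00Z"},
    {"name": "v2", "createTime": "2019-02-01T00:00:00Z"},
    {"name": "v3", "createTime": "2019-03-01T00:00:00Z"},
]
kept, deleted = split_versions(versions, 2)
print([v["name"] for v in kept], [v["name"] for v in deleted])
```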
Compute¶
Module to handle Google Compute Service
Cloud Function¶
Module to handle Google Cloud Functions Service
BigQuery¶
Module to handle Google BigQuery Service
class gcloud_utils.bigquery.bigquery.Bigquery(client=None, log_level=40)
Google-Bigquery handler
- cloud_storage_to_table(bucket_name, filename, dataset_id, table_id, job_config=None, import_format='csv', location='US', **kwargs)
Extract a table from Google Storage and send it to BigQuery
Dataproc¶
Module to handle Dataproc clusters
class gcloud_utils.dataproc.Dataproc(project, region, http=None)
Module to handle Dataproc clusters
- create_cluster(name, workers, workers_names=None, image_version='1.2.54-deb8', disk_size_in_gb=10, metadata=None, initialization_actions=None)
Create a cluster
Storage¶
Module to download and use files from Google Storage
class gcloud_utils.storage.Storage(bucket, client=None, log_level=40)
Google-Storage handler
- download_file(storage_path, local_path)
Download a Storage file to the local path, creating the path at local_path if needed
- get_files_in_path(path, local_path)
Download all files from a path in Google Storage and return a list of those files
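The "creating the path if needed" behaviour of download_file can be pictured as creating the missing parent directories before the file is written. The sketch below shows only that local step with the standard library. The helper ensure_parent_dir is hypothetical and the actual download is omitted, so this is not the library's implementation.

```python
import os
import tempfile


def ensure_parent_dir(local_path):
    """Create the parent directories of local_path if they do not exist yet."""
    parent = os.path.dirname(local_path)
    if parent:
        # exist_ok=True makes the call safe when the directory already exists
        os.makedirs(parent, exist_ok=True)
    return parent


base = tempfile.mkdtemp()
target = os.path.join(base, "models", "v1", "model.pkl")
created = ensure_parent_dir(target)
print(os.path.isdir(created))
```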