Welcome to gcloud-utils’s documentation!

Quick Start

To install the package, run:

pip install gcloud_utils

User Guide

GCloud-utils can be used in two ways: as a CLI or as a Python package.

CLI

Some functions can be used from the CLI.
  • Saving a query result in a BigQuery table
query_to_table dataset table json_key YYYYMMDD query_file -Aquery_arg1=arg -Aquery_arg2=arg
The command parameters are:
  • YYYYMMDD: date of the script (the current date is the default value).
  • -A: passes extra arguments to the query or to the query file.
  • json_key: credentials for the BigQuery service.

The CLI allows using some fixed variables in queries:

  • previous_date: the day before the declared current date (YYYYMMDD)
  • start_date: the declared current date (YYYYMMDD)
  • next_date: the day after the declared current date (YYYYMMDD)

  • Importing a table from BigQuery to Cloud Storage
table_to_gcs dataset table bucket cloudstorage_filename json_key YYYYMMDD time_delta export_format compression_format
Where the parameters are:
  • dataset: Name of the dataset containing the table
  • bucket: Name of the destination bucket
  • cloudstorage_filename: Path in Google Storage to save the file to
  • json_key: Path to the Google Application Credentials file
  • YYYYMMDD: Date of the script
  • time_delta: Number of days before the current date to get the table

  • Loading a table from Cloud Storage into BigQuery
gcs_to_table bucket cloudstorage_filename dataset table json_key YYYYMMDD
Where the parameters are:
  • YYYYMMDD: Date of the script
  • json_key: Credentials for the BigQuery service

Python package

BigQuery

Simple query

from google.cloud import bigquery
from gcloud_utils.bigquery.bigquery import Bigquery

query = "SELECT * FROM bq_table"

client = bigquery.Client.from_service_account_json(args.gcs_key_json)
bq_client = Bigquery(client)

result = bq_client.query(query)

Query with parameters

from google.cloud import bigquery
from gcloud_utils.bigquery.bigquery import Bigquery
from gcloud_utils.bigquery.query_builder import QueryBuilder

query = QueryBuilder("SELECT * FROM ${my_table}")
query.with_vars(my_table="bq_table")

client = bigquery.Client.from_service_account_json(args.gcs_key_json)
bq_client = Bigquery(client)

result = bq_client.query(query)

Saving Query in BigQuery

from google.cloud import bigquery
from gcloud_utils.bigquery.bigquery import Bigquery

client = bigquery.Client.from_service_account_json(args.gcs_key_json)
bq_client = Bigquery(client)

bq_client.query_to_table(
    query_or_object,
    dataset_id,
    table_id,
    write_disposition="WRITE_TRUNCATE",
    job_config=None,
)

Saving a BigQuery table in Cloud Storage

from google.cloud import bigquery
from gcloud_utils.bigquery.bigquery import Bigquery

client = bigquery.Client.from_service_account_json(args.gcs_key_json)
bq_client = Bigquery(client)

bq_client.table_to_cloud_storage(
    dataset_id,
    table_id,
    bucket_name,
    filename,
    job_config=None,
    export_format="csv",
    compression_format="gz",
    location="US",
)

Saving Cloud Storage data in a BigQuery table

from google.cloud import bigquery
from gcloud_utils.bigquery.bigquery import Bigquery

client = bigquery.Client.from_service_account_json(args.gcs_key_json)
bq_client = Bigquery(client)

bq_client.cloud_storage_to_table(
    bucket_name,
    filename,
    dataset_id,
    table_id,
    job_config=None,
    location="US",
)

Cloud Function

Create a function

from gcloud_utils.functions import Functions

functions_handler = Functions('my-project', 'us-central1-a')

function_name = 'my-function-name'
function_path = '/path/to/function.py'
function_runtime = 'python37'

functions_handler.create_function(
    function_name,
    function_runtime,
    function_path,
)

List all functions

from gcloud_utils.functions import Functions

functions_handler = Functions('my-project', 'us-central1-a')

for function in functions_handler.list_functions():
    print(function.name)

Describe a specific function

from gcloud_utils.functions import Functions

functions_handler = Functions('my-project', 'us-central1-a')
function_detail = functions_handler.describe_function('my-function-name')

print('Status: {}'.format(function_detail.status))
print('Last update: {}'.format(function_detail.updateTime))

Trigger a function

import json

from gcloud_utils.functions import Functions

functions_handler = Functions('my-project', 'us-central1-a')

data = json.dumps({'example': 'example'})
functions_handler.call_function('my-function-name', data)

API Reference

MLEngine

Submit Job to ML Engine

class gcloud_utils.ml_engine.MlEngine(project, bucket_name, region, package_path='packages', job_dir='jobs', http=None, credentials_path=None)[source]

Google-ml-engine handler

create_model_version(model_name, version, job_id, python_version='', runtime_version='', framework='')[source]

Create a new model version

create_new_model(name, description='Ml model')[source]

Create new model

delete_model_version(model_name, version)[source]

Delete Model version

delete_older_model_versions(model_name, n_versions_to_keep)[source]

Keep the most recent model versions and delete the older ones. The number of versions to keep is specified by the parameter n_versions_to_keep

export_model(clf, model_path='model.pkl')[source]

Export a classifier/pipeline to the model path. Supported frameworks: XGBoost booster, scikit-learn estimators and pipelines.

get_job(job_id)[source]

Describes a job

get_model_versions(model_name)[source]

Return all versions

increase_model_version(model_name, job_id, python_version='', runtime_version='', framework='')[source]

Increase Model version

list_jobs(filter_final_state='SUCCEEDED')[source]

List jobs in the project, filtered by their final state

list_models()[source]

List all models in project

predict_json(project, model, instances, version=None)[source]

Send json data to a deployed model for prediction.

set_version_as_default(model, version)[source]

Set a model version as default

start_predict_job(job_id_prefix, model, input_path, output_path)[source]

Start a prediction job

start_training_job(job_id_prefix, package_name, module, extra_packages=None, runtime_version='1.0', python_version='', scale_tier='', master_type='', worker_type='', parameter_server_type='', worker_count='', parameter_server_count='', **args)[source]

Start a training job

wait_job_to_finish(job_id, sleep_time=60, tries=3)[source]

Wait for a job to finish
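
The reference above has no usage example, so here is a minimal sketch of registering a model and submitting a training job. The project, bucket, region, credentials path, package and module names are all placeholders, and the training package is assumed to have already been uploaded to the bucket.

from gcloud_utils.ml_engine import MlEngine

# 'my-project', 'my-bucket', 'us-east1' and 'key.json' are placeholder values
ml_engine = MlEngine('my-project', 'my-bucket', 'us-east1', credentials_path='key.json')

# Register a model (name and description are placeholders)
ml_engine.create_new_model('my_model', description='Example model')

# Start a training job; the package and module names are placeholders and the
# training package is assumed to already be stored under the package_path prefix
ml_engine.start_training_job(
    'my_training',
    'trainer-0.1.tar.gz',
    'trainer.task',
    runtime_version='1.0',
)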

Compute

Module to handle Google Compute Service

class gcloud_utils.compute.Compute(project, zone)[source]

Google-compute-engine handler

start_instance(instance_name)[source]

Start VM by name

stop_instance(instance_name)[source]

Stop VM by name
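
A minimal usage sketch, assuming a VM named 'my-instance' exists in the placeholder project and zone below:

from gcloud_utils.compute import Compute

# project and zone are placeholders
compute = Compute('my-project', 'us-east1-b')

compute.stop_instance('my-instance')   # stop the VM by name
compute.start_instance('my-instance')  # start it again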

Cloud Function

Module to handle Google Cloud Functions Service

class gcloud_utils.functions.Functions(project, zone)[source]

Google-Cloud-Functions handler

call_function(name, data)[source]

Call a Cloud Function

create_function(name, runtime, path='/home/docs/checkouts/readthedocs.org/user_builds/gcloud-utils/checkouts/latest/docs/source')[source]

Create and Deploy a Cloud Function

describe_function(name)[source]

Describe a function

list_functions()[source]

List the cloud functions

BigQuery

Module to handle Google BigQuery Service

class gcloud_utils.bigquery.bigquery.Bigquery(client=None, log_level=40)[source]

Google-Bigquery handler

cloud_storage_to_table(bucket_name, filename, dataset_id, table_id, job_config=None, import_format='csv', location='US', **kwargs)[source]

Extract table from GoogleStorage and send to BigQuery

create_dataset(dataset_id)[source]

Create a dataset

create_table(dataset_id, table_id)[source]

Create a table based on dataset

query(query_or_object, **kwargs)[source]

Execute a query

query_to_table(query_or_object, dataset_id, table_id, write_disposition='WRITE_TRUNCATE', job_config=None, **kwargs)[source]

Execute a query and save the result in a specific table

table_exists(table_id, dataset_id, project_id=None)[source]

Check if a table exists

table_to_cloud_storage(dataset_id, table_id, bucket_name, filename, job_config=None, export_format='csv', compression_format='gz', location='US', **kwargs)[source]

Extract a table from BigQuery and send to GoogleStorage
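
The User Guide above covers queries and exports; here is a short sketch of the remaining helpers. The credentials path, dataset and table names are placeholders.

from google.cloud import bigquery
from gcloud_utils.bigquery.bigquery import Bigquery

# 'key.json', 'my_dataset' and 'my_table' are placeholder values
client = bigquery.Client.from_service_account_json('key.json')
bq_client = Bigquery(client)

# Create the dataset and table only if the table does not exist yet
if not bq_client.table_exists('my_table', 'my_dataset'):
    bq_client.create_dataset('my_dataset')
    bq_client.create_table('my_dataset', 'my_table')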

Dataproc

Module to handle Dataproc clusters

class gcloud_utils.dataproc.Dataproc(project, region, http=None)[source]

Module to handle Dataproc clusters

create_cluster(name, workers, workers_names=None, image_version='1.2.54-deb8', disk_size_in_gb=10, metadata=None, initialization_actions=None)[source]

Create a cluster

delete_cluster(name)[source]

Delete cluster by name

list_clusters()[source]

List all clusters

submit_pyspark_job(cluster_name, gs_bucket, list_args, main_pyspark_file, python_files, archive_uris=None, properties=None)[source]

Submit a PySpark job to the cluster, assuming the .py files in the python_files list have already been uploaded to gs_bucket

submit_spark_job(cluster_name, gs_bucket, list_args, jar_paths, main_class, properties=None)[source]

Submit a Spark job to the cluster, assuming the jars in the jar_paths list have already been uploaded to gs_bucket
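
A minimal sketch of the cluster lifecycle. The project, region, bucket, cluster and file names are placeholders, the second argument to create_cluster is assumed to be the worker count, and the .py files are assumed to have been uploaded to the bucket beforehand.

from gcloud_utils.dataproc import Dataproc

dataproc = Dataproc('my-project', 'us-east1')  # placeholder project and region

dataproc.create_cluster('my-cluster', 2)  # 2 is assumed to be the number of workers
dataproc.submit_pyspark_job(
    'my-cluster',
    'my-bucket',                 # gs_bucket (placeholder)
    ['--date', '20190101'],      # list_args passed to the job
    'main.py',                   # main_pyspark_file
    ['main.py', 'helpers.py'],   # python_files assumed already uploaded to the bucket
)
dataproc.delete_cluster('my-cluster')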

Storage

Module to download and use files from Google Storage

class gcloud_utils.storage.Storage(bucket, client=None, log_level=40)[source]

Google-Storage handler

delete_file(storage_path)[source]

Deletes a blob from the bucket.

delete_path(storage_path)[source]

Deletes all the blobs with storage_path prefix

download_file(storage_path, local_path)[source]

Download a Storage file to a local path, creating the path at local_path if needed

download_files(path, local_path, filter_suffix=None)[source]

Download all files in path

get_abs_path(storage_path)[source]

Get the absolute path from Google Storage

get_file(file_path, local_path)[source]

Get all files from Storage path

get_files_in_path(path, local_path)[source]

Download all files from path in Google Storage and return a list with those files

list_files(path, filter_suffix=None)[source]

List all blobs in path

ls(path)[source]

List files directly under specified path

path_exists_storage(path)[source]

Check if path exists on Storage

rename_files(storage_prefix, new_path)[source]

Renames all the blobs with storage_prefix prefix

upload_file(storage_path, local_path)[source]

Upload one local file to Storage

upload_path(storage_path_base, local_path_base)[source]

Upload all files from a local path to Storage

upload_value(storage_path, value)[source]

Upload a value to Storage
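
A minimal usage sketch; the bucket name and paths are placeholders, and the handler presumably builds a default client when none is passed.

from gcloud_utils.storage import Storage

storage = Storage('my-bucket')  # placeholder bucket name

storage.upload_file('data/file.csv', '/tmp/file.csv')  # storage_path, local_path

if storage.path_exists_storage('data/file.csv'):
    storage.download_file('data/file.csv', '/tmp/copy.csv')

for blob in storage.list_files('data', filter_suffix='.csv'):
    print(blob)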