Welcome to gcloud-utils’s documentation!

Quick Start

To install the package, run:

pip install gcloud_utils

User Guide

GCloud-utils can be used in two ways: as a CLI or as a Python package.

CLI

Some functions can be used from the CLI.
  • Saving a query result in a BigQuery table
query_to_table dataset table json_key YYYYMMDD query_file -Aquery_arg1=arg -Aquery_arg2=arg
The command parameters are:
  • YYYYMMDD: date of the script (the current date is the default value).
  • -A: passes extra arguments to the query or to the query file.
  • json_key: credentials for the BigQuery service.

The CLI allows using some fixed variables in queries:

  • previous_date: the day before the declared current date (YYYYMMDD)
  • start_date: the declared current date (YYYYMMDD)
  • next_date: the day after the declared current date (YYYYMMDD)

  • Importing a table from BigQuery to Cloud Storage
table_to_gcs dataset table bucket cloudstorage_filename json_key YYYYMMDD time_delta export_format compression_format
Where the parameters are:
  • dataset: Name of the dataset containing the table
  • bucket: Name of the destination bucket
  • cloudstorage_filename: Path in Google Storage to save the file to
  • json_key: Path to the Google Application Credentials file
  • YYYYMMDD: Date of the script
  • time_delta: Number of days before the current date to get the table

  • Loading a table from Cloud Storage into BigQuery
gcs_to_table bucket cloudstorage_filename dataset table json_key YYYYMMDD
Where the parameters are:
  • YYYYMMDD: Date of the script
  • json_key: Credentials for the BigQuery service

Python package

BigQuery

Simple query

from google.cloud import bigquery
from gcloud_utils.bigquery.bigquery import Bigquery

query = "SELECT * FROM bq_table"

client = bigquery.Client.from_service_account_json(args.gcs_key_json)
bq_client = Bigquery(client)

result = bq_client.query(query)

Query with parameters

from google.cloud import bigquery
from gcloud_utils.bigquery.bigquery import Bigquery
from gcloud_utils.bigquery.query_builder import QueryBuilder

query = QueryBuilder("SELECT * FROM ${my_table}")
query.with_vars(my_table="bq_table")

client = bigquery.Client.from_service_account_json(args.gcs_key_json)
bq_client = Bigquery(client)

result = bq_client.query(query)

Saving Query in BigQuery

from google.cloud import bigquery
from gcloud_utils.bigquery.bigquery import Bigquery

client = bigquery.Client.from_service_account_json(args.gcs_key_json)
bq_client = Bigquery(client)

bq_client.query_to_table(
    query_or_object,
    dataset_id,
    table_id,
    write_disposition="WRITE_TRUNCATE",
    job_config=None,
)

Saving a BigQuery table in Cloud Storage

from google.cloud import bigquery
from gcloud_utils.bigquery.bigquery import Bigquery

client = bigquery.Client.from_service_account_json(args.gcs_key_json)
bq_client = Bigquery(client)

bq_client.table_to_cloud_storage(
    dataset_id,
    table_id,
    bucket_name,
    filename,
    job_config=None,
    export_format="csv",
    compression_format="gz",
    location="US",
)

Saving Cloud Storage data in a BigQuery table

from google.cloud import bigquery
from gcloud_utils.bigquery.bigquery import Bigquery

client = bigquery.Client.from_service_account_json(args.gcs_key_json)
bq_client = Bigquery(client)

bq_client.cloud_storage_to_table(
    bucket_name,
    filename,
    dataset_id,
    table_id,
    job_config=None,
    location="US",
)

Cloud Function

Create a function

from gcloud_utils.functions import Functions

functions_handler = Functions('my-project', 'us-central1-a')

function_name = 'my-function-name'
function_path = '/path/to/function.py'
function_runtime = 'python37'

functions_handler.create_function(
    function_name,
    function_runtime,
    function_path,
)

List all functions

from gcloud_utils.functions import Functions

functions_handler = Functions('my-project', 'us-central1-a')

for function in functions_handler.list_functions():
    print(function.name)

Describe a specific function

from gcloud_utils.functions import Functions

functions_handler = Functions('my-project', 'us-central1-a')
function_detail = functions_handler.describe_function('my-function-name')

print('Status: {}'.format(function_detail.status))
print('Last update: {}'.format(function_detail.updateTime))

Trigger a function

import json

from gcloud_utils.functions import Functions

functions_handler = Functions('my-project', 'us-central1-a')

data = json.dumps({'example': 'example'})
functions_handler.call_function('my-function-name', data)

API Reference

MLEngine

Submit Job to ML Engine

class gcloud_utils.ml_engine.MlEngine(project, bucket_name, region, package_path='packages', job_dir='jobs', http=None, credentials_path=None)[source]

Google-ml-engine handler

create_model_version(model_name, version, job_id, python_version='', runtime_version='', framework='')[source]

Create a new model version

create_new_model(name, description='Ml model')[source]

Create new model

delete_model_version(model_name, version)[source]

Delete Model version

delete_older_model_versions(model_name, n_versions_to_keep)[source]

Keep the most recent model versions and delete the older ones. The number of versions to keep is specified by the parameter n_versions_to_keep

export_model(clf, model_path='model.pkl')[source]

Export a classifier/pipeline to the model path. Supported frameworks: XGBoost booster, scikit-learn estimators and pipelines.

get_job(job_id)[source]

Describes a job

get_model_versions(model_name)[source]

Return all versions

increase_model_version(model_name, job_id, python_version='', runtime_version='', framework='')[source]

Increase Model version

list_jobs(filter_final_state='SUCCEEDED')[source]

List jobs in the project, filtered by their final state

list_models()[source]

List all models in project

predict_json(project, model, instances, version=None)[source]

Send json data to a deployed model for prediction.

set_version_as_default(model, version)[source]

Set a model version as default

start_predict_job(job_id_prefix, model, input_path, output_path)[source]

Start a prediction job

start_training_job(job_id_prefix, package_name, module, extra_packages=None, runtime_version='1.0', python_version='', scale_tier='', master_type='', worker_type='', parameter_server_type='', worker_count='', parameter_server_count='', **args)[source]

Start a training job

wait_job_to_finish(job_id, sleep_time=60, tries=3)[source]

Wait for a job to finish
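
The reference above has no usage example, so here is a minimal sketch of registering a model and submitting a training job. The project, bucket, region, credentials path, package and module names are all placeholders, and the training package is assumed to have already been uploaded to the bucket.

from gcloud_utils.ml_engine import MlEngine

# 'my-project', 'my-bucket', 'us-east1' and 'key.json' are placeholder values
ml_engine = MlEngine('my-project', 'my-bucket', 'us-east1', credentials_path='key.json')

# Register a model (name and description are placeholders)
ml_engine.create_new_model('my_model', description='Example model')

# Start a training job; the package and module names are placeholders and the
# training package is assumed to already be stored under the package_path prefix
ml_engine.start_training_job(
    'my_training',
    'trainer-0.1.tar.gz',
    'trainer.task',
    runtime_version='1.0',
)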

Compute

Module to handle Google Compute Service

class gcloud_utils.compute.Compute(project, zone)[source]

Google-compute-engine handler

start_instance(instance_name)[source]

Start VM by name

stop_instance(instance_name)[source]

Stop VM by name
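
A minimal usage sketch, assuming a VM named 'my-instance' exists in the placeholder project and zone below:

from gcloud_utils.compute import Compute

# project and zone are placeholders
compute = Compute('my-project', 'us-east1-b')

compute.stop_instance('my-instance')   # stop the VM by name
compute.start_instance('my-instance')  # start it again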

Cloud Function

Module to handle Google Cloud Functions Service

class gcloud_utils.functions.Functions(project, zone)[source]

Google-Cloud-Functions handler

call_function(name, data)[source]

Call a Cloud Function

create_function(name, runtime, path='/home/docs/checkouts/readthedocs.org/user_builds/gcloud-utils/checkouts/latest/docs/source')[source]

Create and Deploy a Cloud Function

describe_function(name)[source]

Describe a function

list_functions()[source]

List the cloud functions

BigQuery

Module to handle Google BigQuery Service

class gcloud_utils.bigquery.bigquery.Bigquery(client=None, log_level=40)[source]

Google-Bigquery handler

cloud_storage_to_table(bucket_name, filename, dataset_id, table_id, job_config=None, import_format='csv', location='US', **kwargs)[source]

Extract table from GoogleStorage and send to BigQuery

create_dataset(dataset_id)[source]

Create a dataset

create_table(dataset_id, table_id)[source]

Create a table based on dataset

query(query_or_object, **kwargs)[source]

Execute a query

query_to_table(query_or_object, dataset_id, table_id, write_disposition='WRITE_TRUNCATE', job_config=None, **kwargs)[source]

Execute a query and save the result in a specific table

table_exists(table_id, dataset_id, project_id=None)[source]

Check if a table exists

table_to_cloud_storage(dataset_id, table_id, bucket_name, filename, job_config=None, export_format='csv', compression_format='gz', location='US', **kwargs)[source]

Extract a table from BigQuery and send to GoogleStorage
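
The User Guide above covers queries and exports; here is a short sketch of the remaining helpers. The credentials path, dataset and table names are placeholders.

from google.cloud import bigquery
from gcloud_utils.bigquery.bigquery import Bigquery

# 'key.json', 'my_dataset' and 'my_table' are placeholder values
client = bigquery.Client.from_service_account_json('key.json')
bq_client = Bigquery(client)

# Create the dataset and table only if the table does not exist yet
if not bq_client.table_exists('my_table', 'my_dataset'):
    bq_client.create_dataset('my_dataset')
    bq_client.create_table('my_dataset', 'my_table')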

Dataproc

Module to handle Dataproc clusters

class gcloud_utils.dataproc.Dataproc(project, region, http=None)[source]

Module to handle Dataproc clusters

create_cluster(name, workers, workers_names=None, image_version='1.2.54-deb8', disk_size_in_gb=10, metadata=None, initialization_actions=None)[source]

Create a cluster

delete_cluster(name)[source]

Delete cluster by name

list_clusters()[source]

List all clusters

submit_pyspark_job(cluster_name, gs_bucket, list_args, main_pyspark_file, python_files, archive_uris=None, properties=None)[source]

Submit a PySpark job to the cluster, assuming the .py files in the python_files list have already been uploaded to gs_bucket

submit_spark_job(cluster_name, gs_bucket, list_args, jar_paths, main_class, properties=None)[source]

Submit a Spark job to the cluster, assuming the jars in the jar_paths list have already been uploaded to gs_bucket
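
A minimal sketch of the cluster lifecycle. The project, region, bucket, cluster and file names are placeholders, the second argument to create_cluster is assumed to be the worker count, and the .py files are assumed to have been uploaded to the bucket beforehand.

from gcloud_utils.dataproc import Dataproc

dataproc = Dataproc('my-project', 'us-east1')  # placeholder project and region

dataproc.create_cluster('my-cluster', 2)  # 2 is assumed to be the number of workers
dataproc.submit_pyspark_job(
    'my-cluster',
    'my-bucket',                 # gs_bucket (placeholder)
    ['--date', '20190101'],      # list_args passed to the job
    'main.py',                   # main_pyspark_file
    ['main.py', 'helpers.py'],   # python_files assumed already uploaded to the bucket
)
dataproc.delete_cluster('my-cluster')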

Storage

Module to download and use files from Google Storage

class gcloud_utils.storage.Storage(bucket, client=None, log_level=40)[source]

Google-Storage handler

delete_file(storage_path)[source]

Deletes a blob from the bucket.

delete_path(storage_path)[source]

Deletes all the blobs with storage_path prefix

download_file(storage_path, local_path)[source]

Download a Storage file to a local path, creating the path at local_path if needed

download_files(path, local_path, filter_suffix=None)[source]

Download all files in path

get_abs_path(storage_path)[source]

Get the absolute path from Google Storage

get_file(file_path, local_path)[source]

Get all files from Storage path

get_files_in_path(path, local_path)[source]

Download all files from path in Google Storage and return a list with those files

list_files(path, filter_suffix=None)[source]

List all blobs in path

ls(path)[source]

List files directly under specified path

path_exists_storage(path)[source]

Check if path exists on Storage

rename_files(storage_prefix, new_path)[source]

Renames all the blobs with storage_prefix prefix

upload_file(storage_path, local_path)[source]

Upload one local file to Storage

upload_path(storage_path_base, local_path_base)[source]

Upload all files from a local path to Storage

upload_value(storage_path, value)[source]

Upload a value to Storage
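
A minimal usage sketch; the bucket name and paths are placeholders, and the handler presumably builds a default client when none is passed.

from gcloud_utils.storage import Storage

storage = Storage('my-bucket')  # placeholder bucket name

storage.upload_file('data/file.csv', '/tmp/file.csv')  # storage_path, local_path

if storage.path_exists_storage('data/file.csv'):
    storage.download_file('data/file.csv', '/tmp/copy.csv')

for blob in storage.list_files('data', filter_suffix='.csv'):
    print(blob)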