Welcome to gcloud-utils’s documentation!¶
User Guide¶
There are two ways to use GCloud-utils: as a CLI or as a Python package.
CLI¶
There are some functions that can be used from the CLI.
- Saving a query result in a BigQuery table
query_to_table dataset table json_key YYYYMMDD query_file -Aquery_arg1=arg -Aquery_arg2=arg
- The command parameters are:
YYYYMMDD
: date of the script (current time is the default value)
-A
: parameter to pass args to the query or the query's file
json_key
: credentials to the BigQuery service
The CLI also makes some fixed variables available in queries:
previous_date
: the date before the declared current date (YYYYMMDD)
start_date
: the declared current date (YYYYMMDD)
next_date
: the date after the declared current date (YYYYMMDD)
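For example, an illustrative invocation (the dataset, table, key path, date and argument below are all placeholders):
query_to_table my_dataset my_table /path/to/key.json 20200101 query.sql -Amin_score=10
Inside query.sql, the fixed variables above and the -A arguments can then be referenced, assuming the same ${var} placeholder syntax shown in the QueryBuilder examples below (e.g. ${start_date}, ${min_score}).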
- Importing a table from BigQuery to Cloud Storage
table_to_gcs dataset table bucket cloudstorage_filename json_key YYYYMMDD time_delta export_format compression_format
- Where the parameters are:
dataset
: name of the dataset to save
table
: name of the table to save
bucket
: name of the bucket to save to
cloudstorage_filename
: path in Google Storage to save to
json_key
: path to the Google Application Credentials file
YYYYMMDD
: date of the script
time_delta
: number of days before the current date to get the table
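For example, an illustrative call (all values are placeholders; csv and gz match the default export and compression formats of table_to_cloud_storage shown below):
table_to_gcs my_dataset my_table my_bucket path/to/filename /path/to/key.json 20200101 1 csv gz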
- Loading a Cloud Storage file into a BigQuery table
gcs_to_table bucket cloudstorage_filename dataset table json_key YYYYMMDD
- Where the parameters are:
YYYYMMDD
: date of the script
json_key
: credentials to the BigQuery service
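For example, an illustrative call (all values are placeholders):
gcs_to_table my_bucket path/to/filename.csv my_dataset my_table /path/to/key.json 20200101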
Python package¶
BigQuery¶
Simple query
from google.cloud import bigquery
from gcloud_utils.bigquery.bigquery import Bigquery

# "path/to/key.json" is a placeholder for your service-account key file
client = bigquery.Client.from_service_account_json("path/to/key.json")
bq_client = Bigquery(client)

query = "SELECT * FROM bq_table"
result = bq_client.query(query)
Query with parameters
from google.cloud import bigquery
from gcloud_utils.bigquery.bigquery import Bigquery
from gcloud_utils.bigquery.query_builder import QueryBuilder

query = QueryBuilder("SELECT * FROM ${my_table}")
query.with_vars(my_table="bq_table")

client = bigquery.Client.from_service_account_json("path/to/key.json")
bq_client = Bigquery(client)
result = bq_client.query(query)
Saving a query result in a BigQuery table
from google.cloud import bigquery
from gcloud_utils.bigquery.bigquery import Bigquery

client = bigquery.Client.from_service_account_json("path/to/key.json")
bq_client = Bigquery(client)

# query_or_object (a SQL string or a QueryBuilder), dataset_id and
# table_id are placeholders for your own values
bq_client.query_to_table(
    query_or_object,
    dataset_id,
    table_id,
    write_disposition="WRITE_TRUNCATE",
    job_config=None,
)
Saving a BigQuery table in Cloud Storage
from google.cloud import bigquery
from gcloud_utils.bigquery.bigquery import Bigquery

client = bigquery.Client.from_service_account_json("path/to/key.json")
bq_client = Bigquery(client)

# dataset_id, table_id, bucket_name and filename are placeholders
bq_client.table_to_cloud_storage(
    dataset_id,
    table_id,
    bucket_name,
    filename,
    job_config=None,
    export_format="csv",
    compression_format="gz",
    location="US",
)
Loading a Cloud Storage file into a BigQuery table
from google.cloud import bigquery
from gcloud_utils.bigquery.bigquery import Bigquery

client = bigquery.Client.from_service_account_json("path/to/key.json")
bq_client = Bigquery(client)

# bucket_name, filename, dataset_id and table_id are placeholders
bq_client.cloud_storage_to_table(
    bucket_name,
    filename,
    dataset_id,
    table_id,
    job_config=None,
    location="US",
)
Cloud Function¶
Create a function
from gcloud_utils.functions import Functions
functions_handler = Functions('my-project', 'us-central1-a')
function_name = 'my-function-name'
function_path = '/path/to/function.py'
function_runtime = 'python37'
functions_handler.create_function(
function_name,
function_runtime,
function_path,
)
List all functions
from gcloud_utils.functions import Functions
functions_handler = Functions('my-project', 'us-central1-a')
for function in functions_handler.list_functions():
    print(function.name)
Describe a specific function
from gcloud_utils.functions import Functions
functions_handler = Functions('my-project', 'us-central1-a')
function_detail = functions_handler.describe_function('my-function-name')
print('Status: {}'.format(function_detail.status))
print('Last update: {}'.format(function_detail.updateTime))
Trigger a function
import json
from gcloud_utils.functions import Functions
functions_handler = Functions('my-project', 'us-central1-a')
data = json.dumps({'example': 'example'})
functions_handler.call_function('my-function-name', data)
API Reference¶
MLEngine¶
Submit Job to ML Engine
class gcloud_utils.ml_engine.MlEngine(project, bucket_name, region, package_path='packages', job_dir='jobs', http=None, credentials_path=None)
: Google ML Engine handler.

create_model_version(model_name, version, job_id, python_version='', runtime_version='', framework='')
: Create a new model version.

delete_older_model_versions(model_name, n_versions_to_keep)
: Keep the most recent model versions and delete the older ones; the number of versions to keep is given by n_versions_to_keep.

export_model(clf, model_path='model.pkl')
: Export a classifier/pipeline to model_path. Supported frameworks: XGBoost booster, scikit-learn estimators and pipelines.

increase_model_version(model_name, job_id, python_version='', runtime_version='', framework='')
: Increase the model version.

predict_json(project, model, instances, version=None)
: Send JSON data to a deployed model for prediction.
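A minimal usage sketch based on the signatures above (project, bucket, region, model and job names are placeholders; the framework string is an assumption following AI Platform naming):
from sklearn.linear_model import LogisticRegression
from gcloud_utils.ml_engine import MlEngine

ml_engine = MlEngine('my-project', 'my-bucket', 'us-east1')

# train a toy scikit-learn model and export it as model.pkl
clf = LogisticRegression()
clf.fit([[0, 0], [1, 1]], [0, 1])
ml_engine.export_model(clf, model_path='model.pkl')

# register a new version of an already-deployed model
ml_engine.increase_model_version('my-model', 'my-job-id',
                                 python_version='3.7',
                                 runtime_version='1.15',
                                 framework='SCIKIT_LEARN')

# keep only the three most recent versions
ml_engine.delete_older_model_versions('my-model', 3)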
Compute¶
Module to handle Google Compute Service
Cloud Function¶
Module to handle Google Cloud Functions Service
BigQuery¶
Module to handle Google BigQuery Service
class gcloud_utils.bigquery.bigquery.Bigquery(client=None, log_level=40)
: Google BigQuery handler.

cloud_storage_to_table(bucket_name, filename, dataset_id, table_id, job_config=None, import_format='csv', location='US', **kwargs)
: Extract a file from Google Storage and load it into a BigQuery table.
Dataproc¶
Module to handle Dataproc clusters

class gcloud_utils.dataproc.Dataproc(project, region, http=None)
: Dataproc cluster handler.

create_cluster(name, workers, workers_names=None, image_version='1.2.54-deb8', disk_size_in_gb=10, metadata=None, initialization_actions=None)
: Create a cluster.
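A minimal usage sketch based on the signature above (project, region and cluster name are placeholders; it assumes workers is the number of worker nodes):
from gcloud_utils.dataproc import Dataproc

dataproc = Dataproc('my-project', 'us-east1')
# 2 workers and 50 GB disks; the other parameters keep their defaults
dataproc.create_cluster('my-cluster', 2, disk_size_in_gb=50)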
Storage¶
Module to download and use files from Google Storage

class gcloud_utils.storage.Storage(bucket, client=None, log_level=40)
: Google Storage handler.

download_file(storage_path, local_path)
: Download a Storage file to local_path, creating the directories at local_path if needed.

get_files_in_path(path, local_path)
: Download all files under path in Google Storage and return a list of those files.
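A minimal usage sketch based on the signatures above (bucket and paths are placeholders; with client=None the handler is assumed to pick up default credentials):
from gcloud_utils.storage import Storage

storage = Storage('my-bucket')

# download a single file, creating local directories if needed
storage.download_file('remote/path/file.csv', 'local/path/file.csv')

# download everything under a prefix and list the downloaded files
files = storage.get_files_in_path('remote/path', 'local/dir')
print(files)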