Toolbox Documentation
=====================

``busy_cluster``
****************

::

    Commands for making a cluster busy with lots of resources

* :doc:`cleanup ` Cleans up namespaces to make a cluster un-busy
* :doc:`create_configmaps ` Creates configmaps and secrets to make a cluster busy
* :doc:`create_deployments ` Creates deployments to make a cluster busy
* :doc:`create_jobs ` Creates jobs to make a cluster busy
* :doc:`create_namespaces ` Creates namespaces to make a cluster busy
* :doc:`status ` Shows the busyness of the cluster

``cluster``
***********

::

    Commands relating to cluster scaling, upgrading and environment capture

* :doc:`build_push_image ` Build and publish an image to quay using either a Dockerfile or git repo.
* :doc:`capture_environment ` Captures the cluster environment
* :doc:`create_htpasswd_adminuser ` Create an htpasswd admin user.
* :doc:`create_osd ` Create an OpenShift Dedicated cluster.
* :doc:`deploy_operator ` Deploy an operator from OperatorHub catalog entry.
* :doc:`destroy_ocp ` Destroy an OpenShift cluster
* :doc:`destroy_osd ` Destroy an OpenShift Dedicated cluster.
* :doc:`dump_prometheus_db ` Dump Prometheus database into a file
* :doc:`fill_workernodes ` Fills the worker nodes with place-holder Pods with the maximum available amount of a given resource name.
* :doc:`preload_image ` Preload a container image on all the nodes of a cluster.
* :doc:`query_prometheus_db ` Query Prometheus with a list of PromQueries read from a file
* :doc:`reset_prometheus_db ` Resets Prometheus database, by destroying its Pod
* :doc:`set_project_annotation ` Set an annotation on a given project, or for any new projects.
* :doc:`set_scale ` Ensures that the cluster has exactly `scale` nodes with instance_type `instance_type`
* :doc:`update_pods_per_node ` Update the maximum number of Pods per Node, and Pods per Core. See also: https://docs.openshift.com/container-platform/4.14/nodes/nodes/nodes-nodes-managing-max-pods.html
* :doc:`upgrade_to_image ` Upgrades the cluster to the given image
* :doc:`wait_fully_awake ` Waits for the cluster to be fully awake after Hive restart

``configure``
*************

::

    Commands relating to TOPSAIL testing configuration

* :doc:`apply ` Applies a preset (or a list of presets) to the current configuration file
* :doc:`enter ` Enter into a custom configuration file for a TOPSAIL project
* :doc:`get ` Gives the value of a given key, in the current configuration file
* :doc:`name ` Gives the name of the current configuration

``cpt``
*******

::

    Commands relating to continuous performance testing management

* :doc:`deploy_cpt_dashboard ` Deploy and configure the CPT Dashboard

``fine_tuning``
***************

::

    Commands relating to RHOAI scheduler testing

* :doc:`ray_fine_tuning_job ` Run a simple Ray fine-tuning Job.
* :doc:`run_fine_tuning_job ` Run a simple fine-tuning Job.
* :doc:`run_quality_evaluation ` Run a simple fine-tuning Job.

``run``
*******

::

    Run `topsail` toolbox commands from a single config file.

``gpu_operator``
****************

::

    Commands for deploying, building and testing the GPU operator in various ways

* :doc:`capture_deployment_state ` Captures the GPU operator deployment state
* :doc:`deploy_cluster_policy ` Creates the ClusterPolicy from the OLM ClusterServiceVersion
* :doc:`deploy_from_bundle ` Deploys the GPU Operator from a bundle
* :doc:`deploy_from_operatorhub ` Deploys the GPU operator from OperatorHub
* :doc:`enable_time_sharing ` Enable time-sharing in the GPU Operator ClusterPolicy
* :doc:`extend_metrics ` Enable time-sharing in the GPU Operator ClusterPolicy
* :doc:`get_csv_version ` Get the version of the GPU Operator currently installed from OLM. Stores the version in the 'ARTIFACT_EXTRA_LOGS_DIR' artifacts directory.
* :doc:`run_gpu_burn ` Runs the GPU burn on the cluster
* :doc:`undeploy_from_operatorhub ` Undeploys a GPU-operator that was deployed from OperatorHub
* :doc:`wait_deployment ` Waits for the GPU operator to deploy
* :doc:`wait_stack_deployed ` Waits for the GPU Operator stack to be deployed on the GPU nodes

``kepler``
**********

::

    Commands relating to kepler deployment

* :doc:`deploy_kepler ` Deploy the Kepler operator and monitor to track energy consumption
* :doc:`undeploy_kepler ` Cleanup the Kepler operator and associated resources

``kserve``
**********

::

    Commands relating to RHOAI KServe component

* :doc:`capture_operators_state ` Captures the state of the operators of the KServe serving stack
* :doc:`capture_state ` Captures the state of the KServe stack in a given namespace
* :doc:`deploy_model ` Deploy a KServe model
* :doc:`extract_protos ` Extracts the protos of an inference service
* :doc:`extract_protos_grpcurl ` Extracts the protos of an inference service, with GRPCurl observe
* :doc:`undeploy_model ` Undeploy a KServe model
* :doc:`validate_model ` Validate the proper deployment of a KServe model

``kubemark``
************

::

    Commands relating to kubemark deployment

* :doc:`deploy_capi_provider ` Deploy the Kubemark Cluster-API provider
* :doc:`deploy_nodes ` Deploy a set of Kubemark nodes

``kwok``
********

::

    Commands relating to KWOK deployment

* :doc:`deploy_kwok_controller ` Deploy the KWOK hollow node provider
* :doc:`set_scale ` Deploy a set of KWOK nodes

``llm_load_test``
*****************

::

    Commands relating to llm-load-test

* :doc:`run ` Load test the wisdom model

``local_ci``
************

::

    Commands to run the CI scripts in a container environment similar to the one used by the CI

* :doc:`run ` Runs a given CI command
* :doc:`run_multi ` Runs a given CI command in parallel from multiple Pods

``nfd``
*******

::

    Commands for NFD related tasks

* :doc:`has_gpu_nodes ` Checks if the cluster has GPU nodes
* :doc:`has_labels ` Checks if the cluster has NFD labels
* :doc:`wait_gpu_nodes ` Wait until NFD finds GPU nodes
* :doc:`wait_labels ` Wait until NFD labels the nodes

``nfd_operator``
****************

::

    Commands for deploying, building and testing the NFD operator in various ways

* :doc:`deploy_from_operatorhub ` Deploys the NFD Operator from OperatorHub
* :doc:`undeploy_from_operatorhub ` Undeploys an NFD-operator that was deployed from OperatorHub

``notebooks``
*************

::

    Commands relating to RHOAI Notebooks

* :doc:`benchmark_performance ` Benchmark the performance of a notebook image.
* :doc:`capture_state ` Capture information about the cluster and the RHODS notebooks deployment
* :doc:`cleanup ` Clean up the resources created along with the notebooks, during the scale tests.
* :doc:`dashboard_scale_test ` End-to-end scale testing of the RHOAI dashboard, at user level.
* :doc:`locust_scale_test ` End-to-end testing of RHOAI notebooks at scale, at API level
* :doc:`ods_ci_scale_test ` End-to-end scale testing of RHOAI notebooks, at user level.

``pipelines``
*************

::

    Commands relating to RHODS

* :doc:`capture_state ` Captures the state of a Data Science Pipeline Application in a given namespace.
* :doc:`deploy_application ` Deploy a Data Science Pipeline Application in a given namespace.
* :doc:`run_kfp_notebook ` Run a notebook in a given notebook image.

``repo``
********

::

    Commands to perform consistency validations on this repo itself

* :doc:`generate_ansible_default_settings ` Generate the `defaults/main/config.yml` file of the Ansible roles, based on the Python definition.
* :doc:`generate_middleware_ci_secret_boilerplate ` Generate the boilerplate code to include a new secret in the Middleware CI configuration
* :doc:`generate_toolbox_related_files ` Generate the rst document and Ansible default settings, based on the Toolbox Python definition.
* :doc:`generate_toolbox_rst_documentation ` Generate the `doc/toolbox.generated/*.rst` file, based on the Toolbox Python definition.
* :doc:`send_job_completion_notification ` Send a *job completion* notification to GitHub and/or Slack about the completion of a test job.
* :doc:`validate_no_broken_link ` Ensure that all the symlinks point to a file
* :doc:`validate_no_wip ` Ensures that none of the commits have the WIP flag in their message title.
* :doc:`validate_role_files ` Ensures that all the Ansible variables defining a filepath (`project/*/toolbox/`) do point to an existing file.
* :doc:`validate_role_vars_used ` Ensure that all the Ansible variables defined are actually used in their role (with an exception for symlinks)

``rhods``
*********

::

    Commands relating to RHODS

* :doc:`capture_state ` Captures the state of the RHOAI deployment
* :doc:`delete_ods ` Forces ODS operator deletion
* :doc:`deploy_addon ` Installs the RHODS OCM addon
* :doc:`deploy_ods ` Deploy ODS operator from its custom catalog
* :doc:`dump_prometheus_db ` Dump Prometheus database into a file
* :doc:`reset_prometheus_db ` Resets RHODS Prometheus database, by destroying its Pod.
* :doc:`undeploy_ods ` Undeploy ODS operator
* :doc:`update_datasciencecluster ` Update RHOAI datasciencecluster resource
* :doc:`wait_odh ` Wait for ODH to finish its deployment
* :doc:`wait_ods ` Wait for ODS to finish its deployment

``scheduler``
*************

::

    Commands relating to RHOAI scheduler testing

* :doc:`cleanup ` Clean up the scheduler load namespace
* :doc:`create_mcad_canary ` Create a canary for MCAD Appwrappers and track the time it takes to be scheduled
* :doc:`deploy_mcad_from_helm ` Deploys MCAD from Helm
* :doc:`generate_load ` Generate scheduler load

``server``
**********

::

    Commands relating to the deployment of servers on OpenShift

* :doc:`deploy_ldap ` Deploy OpenLDAP and LDAP OAuth
* :doc:`deploy_minio_s3_server ` Deploy Minio S3 server
* :doc:`deploy_nginx_server ` Deploy an NGINX HTTP server
* :doc:`deploy_opensearch ` Deploy OpenSearch and OpenSearch-Dashboards
* :doc:`deploy_redis_server ` Deploy a Redis server
* :doc:`undeploy_ldap ` Undeploy OpenLDAP and LDAP OAuth

``storage``
***********

::

    Commands relating to OpenShift file storage

* :doc:`deploy_aws_efs ` Deploy AWS EFS CSI driver and configure AWS accordingly.
* :doc:`deploy_nfs_provisioner ` Deploy NFS Provisioner
* :doc:`download_to_pvc ` Downloads a dataset into a PVC of the cluster