Training
Training jobs
You can also use the Python client to submit training jobs to the ScalarLM server.
```python
import scalarlm

scalarlm.api_url = "https://llama8btensorwave.cray-lm.com"


def get_dataset():
    dataset = []
    count = 5
    for i in range(count):
        dataset.append(
            {"input": f"What is {i} + {i}?", "output": str(i + i)}
        )
    return dataset


llm = scalarlm.SupermassiveIntelligence()

dataset = get_dataset()

status = llm.train(dataset, train_args={"max_steps": 200, "learning_rate": 3e-3})

print(status)
```
You will see command-line output like this:

```
(environment) gregorydiamos@Air-Gregory cray % python test/deployment/train.py
{'job_id': '1', 'status': 'QUEUED', 'message': 'Training job launched', 'dataset_id': 'dataset', 'job_directory': '/app/cray/jobs/69118a251a074f9f9d37a2ddc903243e428d30c3c31ad019cbf62ac777e42e6e', 'model_name': '69118a251a074f9f9d37a2ddc903243e428d30c3c31ad019cbf62ac777e42e6e'}
```
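The returned status is a plain Python dictionary, so you can pull fields such as the generated model name out of it directly. A minimal sketch, using the values from the sample run above (your `job_id`, `job_directory`, and `model_name` will differ):

```python
# Status dictionary as returned by llm.train(), values copied from
# the sample run above.
status = {
    "job_id": "1",
    "status": "QUEUED",
    "message": "Training job launched",
    "dataset_id": "dataset",
    "job_directory": "/app/cray/jobs/69118a251a074f9f9d37a2ddc903243e428d30c3c31ad019cbf62ac777e42e6e",
    "model_name": "69118a251a074f9f9d37a2ddc903243e428d30c3c31ad019cbf62ac777e42e6e",
}

# The model_name identifies the trained model once the job completes,
# so it is worth saving for later use.
model_name = status["model_name"]
print(f"Job {status['job_id']} is {status['status']}; model: {model_name}")
```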
Multi-GPU Training
To use multiple GPUs, pass the "gpus" argument to llm.train. Jobs will automatically be distributed among the GPUs in your ScalarLM deployment.

```python
llm.train(
    dataset,
    train_args={
        "max_steps": 200,
        "learning_rate": 3e-3,
        "gpus": 2,
    },
)
```
Custom Training
When you submit a training job using ScalarLM, it will train an LLM using the source code for the data loader, training loop, and PyTorch model from the ml/ directory.
You can check out the source code to see how it works at: https://github.com/tensorwavecloud/ScalarLM/tree/main/ml
ScalarLM also allows you to use your own custom model code, e.g. if you want to adjust the model or its hyperparameters. If you check out the ml/ directory from the repo and put it in the same directory as your training script, your local version will be uploaded to the ScalarLM server and used for your training job.
For example, consider a directory structure like this:
```
./train.py
./ml/... # custom ml directory containing your custom training code
```
In this case, your custom ./ml directory will be used for training jobs submitted by the ./train.py script.
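Before submitting, it can help to confirm the layout is what the client expects. A minimal illustrative check (this helper is not part of the ScalarLM API; it only inspects the local filesystem):

```python
from pathlib import Path

# Check whether a custom ml/ directory sits next to this script.
# In a real train.py you would use Path(__file__).resolve().parent;
# here we check the current working directory for illustration.
script_dir = Path(".").resolve()
ml_dir = script_dir / "ml"

if ml_dir.is_dir():
    print(f"Custom training code found at {ml_dir}")
else:
    print("No local ml/ directory; the server's default training code will be used.")
```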