

dolly-v2-12b is a 12 billion parameter causal language model created by Databricks that is derived from EleutherAI's pythia-12b and fine-tuned on a ~15K record instruction corpus generated by Databricks employees and released under a permissive license (CC-BY-SA).

To use the model with the transformers library on a machine with GPUs, first make sure you have the transformers and accelerate libraries installed.
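
A minimal sketch of the install step (the version pins below are assumptions, not taken from this document; any reasonably recent releases of both libraries should work):

```bash
pip install "accelerate>=0.12.0" "transformers[torch]"
```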

The instruction following pipeline can be loaded using the pipeline function as shown below. This loads a custom InstructionTextGenerationPipeline found in the model repo, which is why trust_remote_code=True is required. Including torch_dtype=torch.bfloat16 is generally recommended if this type is supported, in order to reduce memory usage; it does not appear to impact output quality.
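
A sketch of that load, using the parameters described above (device_map="auto" and the example prompt are assumptions, not taken from this document):

```python
import torch
from transformers import pipeline

# Load the custom InstructionTextGenerationPipeline shipped in the model repo.
# trust_remote_code=True is required so transformers will run that repo's pipeline code;
# torch_dtype=torch.bfloat16 reduces memory usage on hardware that supports bfloat16.
# device_map="auto" (an accelerate feature) places the weights across available GPUs.
generate_text = pipeline(
    model="databricks/dolly-v2-12b",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)

# The exact return shape depends on the custom pipeline implementation in the repo.
res = generate_text("Explain to me the difference between nuclear fission and fusion.")
print(res)
```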

Please refer to the dolly GitHub repo for tips on running inference for various GPU configurations.