Data format for batch inference of a fine-tuned Phi-3 model
I am using the Azure ML platform. I fine-tuned a phi-3-mini-4k-instruct model on the "eurlex" dataset available on Hugging Face. The training process required the training data in jsonl format; it looked something like the first image, and I converted it into jsonl. I then created a batch endpoint, deployed the model there, and am creating a job from the "Create job" button under the batch endpoint. However, for batch inference, data assets can only be stored in formats such as csv, png, etc. So I simply passed the whole prompt as a string in a single-column dataframe and wrote it out as a .csv file, formatted in the manner shown in the second image.

Running the batch scoring job with this .csv input produces the error: "each data point should be a conversation array". I have also tried a dataframe with three columns (system, assistant, user), but that doesn't work either. All the documentation available online deals with image data, so I am unable to figure out the correct format for my data.
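For reference, the error message suggests that each row must itself be a conversation array (a list of role/content messages) rather than a plain prompt string. Below is a minimal sketch of how such a CSV could be built, assuming each cell holds a JSON-serialized conversation; the column name "input", the system message, and the sample prompts are all hypothetical placeholders, not values confirmed by any Azure documentation:

```python
import csv
import json

# Hypothetical prompts; real rows would come from the eurlex examples.
prompts = [
    "Summarize the following EU regulation: ...",
    "Classify this legal text: ...",
]

with open("batch_input.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["input"])  # assumed column name
    for p in prompts:
        # Each data point is a conversation array: a list of
        # {"role", "content"} messages, serialized as a JSON string
        # so it survives the round trip through CSV.
        conversation = [
            {"role": "system", "content": "You are a helpful legal assistant."},
            {"role": "user", "content": p},
        ]
        writer.writerow([json.dumps(conversation)])
```

Whether the deployment's scoring script accepts JSON strings embedded in CSV cells depends on how that script parses its input, so this is only a sketch of the shape the error message implies.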