Running Pipelines: Using your PlantCV Pipeline to Batch Process Images

Introduction

This document provides instructions for running a script on multiple images (parallelizing a workflow) and for running a PlantCV pipeline over a flat directory of images.

 

How to Run Pipelines

Running Script on Multiple Images (Parallelizing Workflow)

Once a satisfactory workflow has been developed and tested in Jupyter, the next step is to translate it from a Jupyter notebook to a Python script so the workflow can be used on many images (batch image analysis).

Step 1. Convert the Jupyter notebook into a Python script.

To make a script that is compatible with the plantcv-workflow.py program, you first must convert the Jupyter notebook to a Python script.

This can be done through the Jupyter web interface or on the command line.

Option 1. Convert using the Jupyter web interface. If converting to a .py file through the Jupyter web interface, go to File > Download as > Python (.py) (Fig. 1A). The converted file will be saved to your “downloads” folder.

Fig. 1A. File > Download as > Python (.py).

Option 2. Convert using the command line. If converting to a .py file through the command line, open Anaconda and select the green arrow next to the environment you are working in. Then select “Open Terminal” (Fig. 1B).

Fig. 1B. In Anaconda, select the green arrow next to plantCV (or whichever environment you are working in). Then select “Open Terminal”.
  • Use the “cd” command to navigate to the folder that contains your Jupyter notebook.

  • Once in the correct location in the terminal, run the following command: jupyter nbconvert --to python notebook.ipynb

    • In the above example, the resulting Python script will be named “notebook.py” and will be saved in the same location as your Jupyter notebook.
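
If you prefer to script this conversion, the same command can be launched from Python. The sketch below is a minimal illustration, assuming the jupyter command is available on your PATH. The helper names nbconvert_command and convert_notebook are hypothetical and are not part of Jupyter or PlantCV.

```python
import subprocess
from pathlib import Path


def nbconvert_command(notebook):
    """Build the nbconvert command from this step as an argument list."""
    return ["jupyter", "nbconvert", "--to", "python", str(notebook)]


def convert_notebook(notebook):
    """Run the conversion and return the expected path of the new script."""
    # check=True raises CalledProcessError if nbconvert exits with an error.
    subprocess.run(nbconvert_command(notebook), check=True)
    # nbconvert saves the script next to the notebook with a .py extension.
    return Path(notebook).with_suffix(".py")
```

Calling convert_notebook("notebook.ipynb") from the folder that holds the notebook should leave “notebook.py” beside it.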

Step 2. Open the Python script in a text editor.

Open the Python script in a text editor such as IDLE, Sublime Text, etc.

Recommendation: If you don’t have a text editor installed, BI suggests “Sublime Text”, available to download for free at https://www.sublimetext.com.

Step 3. Modify the Python script.

Several modifications to the Python script are needed before you can successfully run it on your images:

  • Remove get_ipython().magic('matplotlib inline') and add import argparse (Fig. 2).

  • Make sure all of your package imports are at the very top of the script (not distributed throughout the script, as they may have been in your Jupyter notebook) (Fig. 2).

  • Remove or comment out (#) every occurrence of pcv.params.debug = "plot" in your script and add pcv.params.debug = None at the head of the script (Fig. 2).

Warning. If you fail to remove or comment out every instance of pcv.params.debug = "plot", the script will run until it reaches a point where a debugging plot is displayed. It will then proceed only after the user manually closes each plot.

  • Next, remove the following lines of code that were in the Jupyter notebook and were necessary to run single images (Fig. 4):

  • Replace all the lines that were removed (Fig. 4) with a function for parsing command line options using argparse, shown in Figure 5 below:

  • Following the calls to import required packages and the code for the parsing function (Fig. 5) that should be at the head of the script, all of the remaining script needs to be indented within a function called main.

    • To do this, add a main function and indent all of the code within main.

Example:

def main():
    # all the code from Jupyter notebook

if __name__ == '__main__':
    main()

 

  • Within the main function, call the options function to get the values of the command-line options, as shown in Figure 6.
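
Two of the modifications listed above are mechanical: removing the get_ipython() line and commenting out pcv.params.debug = "plot". The following minimal sketch, which uses only the Python standard library, shows one way to apply them to the text of a converted script. The function name clean_converted_script is hypothetical (it is not a PlantCV tool). The sketch does not add import argparse or pcv.params.debug = None, so the result should still be reviewed by hand.

```python
import re

# Matches lines such as: pcv.params.debug = "plot" (single or double quotes).
DEBUG_PLOT = re.compile(r"""^\s*pcv\.params\.debug\s*=\s*["']plot["']""")


def clean_converted_script(source):
    """Return script text with notebook-only lines removed or commented out."""
    cleaned = []
    for line in source.splitlines():
        if "get_ipython()" in line:
            # Notebook magics such as 'matplotlib inline' do not work in a plain script.
            continue
        if DEBUG_PLOT.match(line):
            # Commented out so the script never waits for a plot window to be closed.
            line = "# " + line
        cleaned.append(line)
    return "\n".join(cleaned) + "\n"
```

To use it, read the converted script into a string, pass it to clean_converted_script, and write the returned text to a new file.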

View the fully modified script.

The fully modified Python script should be structured like the example below.
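
A minimal sketch of that overall structure follows. It is an illustration assembled from the steps above, not the original script. The command-line flags (--image, --outdir, --result, --writeimg, --debug) follow a pattern commonly used in PlantCV workflow scripts and are assumptions here, so match them to Figure 5 and to your installed PlantCV version. The PlantCV import and the analysis steps are left as comments so that the skeleton runs on its own.

```python
import argparse

# from plantcv import plantcv as pcv   # uncomment in a real workflow
# pcv.params.debug = None              # per Step 3: no plots during batch runs


def options(argv=None):
    """Parse command-line options (flag names are assumptions, not confirmed)."""
    parser = argparse.ArgumentParser(description="PlantCV workflow script.")
    parser.add_argument("-i", "--image", help="Input image file.", required=True)
    parser.add_argument("-o", "--outdir", help="Output directory for image files.")
    parser.add_argument("-r", "--result", help="Results file.")
    parser.add_argument("-w", "--writeimg", help="Write out images.",
                        default=False, action="store_true")
    parser.add_argument("-D", "--debug", help="Debug mode: 'print' or None.",
                        default=None)
    # argv=None makes argparse read sys.argv, which is what a batch run uses.
    return parser.parse_args(argv)


def main(argv=None):
    # Get the values of the command-line options.
    args = options(argv)
    # ... all the code from the Jupyter notebook goes here, indented,
    # using args.image instead of a hard-coded image path ...
    return args


# In the real script, finish with these two lines so main() runs
# when the script is executed:
# if __name__ == '__main__':
#     main()
```

Passing an argument list, for example main(["-i", "plant.png", "-o", "out"]), makes it possible to try the option parsing without a terminal.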

Running a PlantCV Pipeline over a Flat Directory of Images

Once the workflow has been translated and properly formatted into a Python script, the next task is to parallelize that script over a set of images. PlantCV can analyze images in parallel that are stored in a directory (including subdirectories).

To run a pipeline over a directory of images, we need to set up a configuration file. A major purpose of this configuration file is to tell PlantCV how to interpret the file names of images. Consistency in the naming convention of images is key.

Ideally, image filenames are constructed of metadata information separated by a consistent delimiter.
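
To make the idea concrete, the sketch below splits a delimited filename into named metadata fields. It only illustrates the naming convention and is not PlantCV's own parser. The example filename, the field names (camera, plantbarcode, timestamp), the underscore delimiter, and the date format are all assumptions chosen for illustration.

```python
from datetime import datetime


def parse_filename(filename, fields, delimiter="_", timestamp_format="%Y-%m-%d"):
    """Split a delimited image filename into a dict of metadata values."""
    # Drop the extension (text after the last dot), then split on the delimiter.
    stem = filename.rsplit(".", 1)[0]
    parts = stem.split(delimiter)
    if len(parts) != len(fields):
        raise ValueError(f"{filename!r} does not match the naming convention")
    metadata = dict(zip(fields, parts))
    if "timestamp" in metadata:
        # Fails loudly if the date is not in the expected format.
        metadata["timestamp"] = datetime.strptime(metadata["timestamp"], timestamp_format)
    return metadata


meta = parse_filename("cam1_plantA_2020-01-15.png", ["camera", "plantbarcode", "timestamp"])
print(meta["camera"], meta["plantbarcode"], meta["timestamp"].date())
# prints: cam1 plantA 2020-01-15
```

A filename with a missing or extra field raises an error in this sketch, which is why a consistent delimiter and field order matter for batch processing.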

 

Step 1. Set up a configuration file.

In the terminal, set up a configuration file that specifies:

  • where the images are located (input directory; “input_dir”)

  • where to save output (image output directory; “img_outdir”)

  • where the workflow (.py pipeline file) is located (“workflow”)

  • where we want to run this process (“cluster”)

  • the configuration for running this process (“cluster_config”)

  • the image naming convention (“filename_metadata”, “timestampformat”)

Save the configuration file in .json format.
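
As a rough illustration, the configuration file could also be generated with Python's json module, as sketched below. The keys are the setting names listed above. Every value is a placeholder, and "LocalCluster" and the contents of “cluster_config” in particular are assumptions. Your installed PlantCV version may require additional settings, so treat this as a starting point rather than a complete configuration. The function names build_config and write_config are hypothetical.

```python
import json


def build_config(input_dir, img_outdir, workflow):
    """Return a configuration dict using the setting names from Step 1."""
    return {
        "input_dir": input_dir,
        "img_outdir": img_outdir,
        "workflow": workflow,
        # Placeholder values: run locally with a single worker.
        "cluster": "LocalCluster",
        "cluster_config": {"n_workers": 1, "cores": 1, "memory": "1GB"},
        # Order of the metadata fields in each filename,
        # e.g. cam1_plantA_2020-01-15.png (hypothetical naming convention).
        "filename_metadata": ["camera", "plantbarcode", "timestamp"],
        "timestampformat": "%Y-%m-%d",
    }


def write_config(path, config):
    """Save the configuration in .json format."""
    with open(path, "w") as handle:
        json.dump(config, handle, indent=4)
```

For example, write_config("ConfigurationFileName.json", build_config("./images", "./output_images", "./workflow.py")) would write the file into the current folder; the three paths are placeholders.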

Step 2. Verify all resources are located in a single folder.

Check to make sure that you have all of the following resources in the same folder:

  • the .py workflow

  • the .json configuration file

  • all the images to be analyzed
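
This check can also be sketched in a few lines of Python. The example below counts the workflow scripts, configuration files, and images found directly in a folder and reports anything that is missing. The function name check_resources and the list of image extensions are assumptions, so adjust the extensions to match your images.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}  # assumed image types


def check_resources(folder):
    """Count the workflow scripts, configuration files, and images in a folder."""
    files = [p for p in Path(folder).iterdir() if p.is_file()]
    counts = {
        "workflow (.py)": sum(p.suffix.lower() == ".py" for p in files),
        "configuration (.json)": sum(p.suffix.lower() == ".json" for p in files),
        "images": sum(p.suffix.lower() in IMAGE_SUFFIXES for p in files),
    }
    # Any resource type with a count of zero is reported as missing.
    missing = [name for name, count in counts.items() if count == 0]
    return counts, missing
```

Running check_resources(".") from the analysis folder returns the counts together with a list of missing resource types; an empty list means all three are present.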

Step 3. Run the workflow in the terminal.

In a terminal launched from your Anaconda “plantcv” environment, run your pipeline using the following command: plantcv-workflow.py --config ConfigurationFileName.json

  • In the above command, replace “ConfigurationFileName.json” with the name of your configuration file.

 
