
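Writing workflow definitions for HealthOmics workflows

HealthOmics supports workflow definitions written in WDL, Nextflow, or CWL. To learn more about these workflow languages, see the specifications for WDL, Nextflow, or CWL.

HealthOmics supports version management for all three workflow definition languages. For more information, see Version support for HealthOmics workflow definition languages.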

Topics

Writing workflows in WDL
Writing workflows in Nextflow
Writing workflows in CWL
Workflow definition examples
WDL workflow definition example

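Writing workflows in WDL

The following tables show how inputs in WDL map to the matching primitive or complex JSON types. Type coercion is limited, so make types explicit whenever possible.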

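Primitive types
WDL type	JSON type	Example WDL	Example JSON key and value	Notes
`Boolean`	`boolean`	`Boolean b`	`"b": true`	The value must be lowercase and unquoted.
`Int`	`integer`	`Int i`	`"i": 7`	Must be unquoted.
`Float`	`number`	`Float f`	`"f": 42.2`	Must be unquoted.
`String`	`string`	`String s`	`"s": "characters"`	JSON strings that are URIs must be mapped to a WDL `File` to be imported.
`File`	`string`	`File f`	`"f": "s3://amzn-s3-demo-bucket1/path/to/file"`	Amazon S3 and HealthOmics storage URIs are imported as long as the IAM role provided for the workflow has read access to these objects. No other URI schemes (such as `file://`, `https://`, and `ftp://`) are supported. The URI must specify an object. It can't be a directory, which means it can't end with `/`.
`Directory`	`string`	`Directory d`	`"d": "s3://bucket/path/"`	The `Directory` type isn't included in WDL 1.0 or 1.1, so you need to add `version development` to the header of the WDL file. The URI must be an Amazon S3 URI with a prefix that ends in `/`. All contents of the directory are copied recursively to the workflow as a single download. The `Directory` should contain only files that are related to the workflow.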
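The File and Directory rules in the table can be checked before you start a run. The following Python sketch is a hypothetical pre-flight helper (`check_file_uri` and `check_directory_uri` are not part of any HealthOmics API). It encodes only the constraints stated above: File inputs use the `s3://` or `omics://` scheme and must not end in `/`, and Directory inputs must be Amazon S3 prefixes that end in `/`. It can't check whether the workflow's IAM role has read access to the objects.

```python
from urllib.parse import urlparse

# Schemes accepted for WDL File inputs, per the primitive-types table.
# Hypothetical helper code; HealthOmics performs its own validation.
FILE_SCHEMES = {"s3", "omics"}


def check_file_uri(uri: str) -> None:
    """Raise ValueError if uri can't be used as a WDL File input."""
    scheme = urlparse(uri).scheme
    if scheme not in FILE_SCHEMES:
        raise ValueError(f"unsupported scheme {scheme!r} for File input: {uri}")
    if uri.endswith("/"):
        raise ValueError(f"File input must name an object, not a directory: {uri}")


def check_directory_uri(uri: str) -> None:
    """Raise ValueError if uri can't be used as a WDL Directory input."""
    if urlparse(uri).scheme != "s3":
        raise ValueError(f"Directory input must be an Amazon S3 URI: {uri}")
    if not uri.endswith("/"):
        raise ValueError(f"Directory prefix must end with '/': {uri}")


# Both of these pass silently.
check_file_uri("s3://amzn-s3-demo-bucket1/path/to/file")
check_directory_uri("s3://bucket/path/")
```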

Complex types in WDL are data structures composed of primitive types. Data structures such as lists are converted to arrays.

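Complex types
WDL type	JSON type	Example WDL	Example JSON key and value	Notes
`Array`	`array`	`Array[Int] nums`	`"nums": [1, 2, 3]`	The members of the array must follow the format of the WDL array type.
`Pair`	`object`	`Pair[String, Int] str_to_i`	`"str_to_i": {"left": "0", "right": 1}`	Each value of the pair must use the JSON format of its matching WDL type.
`Map`	`object`	`Map[Int, String] int_to_string`	`"int_to_string": { "2": "hello", "1": "goodbye" }`	Each entry in the map must use the JSON format of its matching WDL type. Because JSON object keys are always strings, non-string keys such as `Int` are written as quoted strings.
`Struct`	`object`	`struct SampleBamAndIndex { String sample_name File bam File bam_index } SampleBamAndIndex b_and_i`	`"b_and_i": { "sample_name": "NA12878", "bam": "s3://amzn-s3-demo-bucket1/NA12878.bam", "bam_index": "s3://amzn-s3-demo-bucket1/NA12878.bam.bai" }`	The names of the struct members must exactly match the names of the JSON object keys. Each value must use the JSON format of its matching WDL type.
`Object`	N/A	N/A	N/A	The WDL `Object` type is obsolete and should be replaced by `Struct` in all cases.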
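As a sketch of the JSON shapes in the complex-types table, the following Python builds an input document with the standard `json` module. The helper names `pair_to_json` and `map_to_json` are hypothetical. The point is the shape of each value: a Pair becomes an object with `left` and `right` keys, a Map becomes an object with string keys, and a Struct becomes an object whose keys match the member names exactly.

```python
import json

# Hypothetical helpers that shape Python values into the JSON forms
# listed in the complex-types table.


def pair_to_json(left, right) -> dict:
    """A WDL Pair becomes an object with "left" and "right" keys."""
    return {"left": left, "right": right}


def map_to_json(mapping: dict) -> dict:
    """A WDL Map becomes a JSON object; JSON object keys are always strings."""
    return {str(key): value for key, value in mapping.items()}


inputs = {
    "nums": [1, 2, 3],                                         # Array[Int]
    "str_to_i": pair_to_json("0", 1),                          # Pair[String, Int]
    "int_to_string": map_to_json({2: "hello", 1: "goodbye"}),  # Map[Int, String]
    "b_and_i": {                                               # SampleBamAndIndex struct
        "sample_name": "NA12878",
        "bam": "s3://amzn-s3-demo-bucket1/NA12878.bam",
        "bam_index": "s3://amzn-s3-demo-bucket1/NA12878.bam.bai",
    },
}

print(json.dumps(inputs, indent=2))
```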

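The HealthOmics workflow engine doesn't support qualified or namespaced input parameters. The WDL language doesn't specify how qualified parameters are handled or how they map to WDL parameters, so they can be ambiguous. For these reasons, the best practice is to declare all input parameters in the top-level (main) workflow definition file and pass them down to subworkflow calls by using standard WDL mechanisms.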
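One way to follow this practice is to reject qualified keys before you submit a run. The following Python sketch is a hypothetical check. It assumes that qualified names use the dotted `workflow.call.input` form, as in Cromwell input files, and flags any top-level input key that contains a period.

```python
def find_qualified_keys(inputs: dict) -> list[str]:
    """Return input keys that look namespaced, such as "wf.task.param".

    Hypothetical pre-flight check: it treats any "." in a key as a
    qualified name, because HealthOmics expects plain, top-level names.
    """
    return sorted(key for key in inputs if "." in key)


example = {
    "sample_name": "NA12878",
    "CramToBamFlow.CramToBamTask.sample_name": "NA12878",
}
print(find_qualified_keys(example))  # ['CramToBamFlow.CramToBamTask.sample_name']
```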

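Writing workflows in Nextflow

HealthOmics supports Nextflow DSL1 and DSL2. For details, see Nextflow version support.

Nextflow DSL2 is based on the Groovy programming language, so parameters are dynamic and type coercion can follow the same rules as in Groovy. Parameters and values supplied in the input JSON are available in the workflow's parameters (params) map.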

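Note

HealthOmics supports the nf-schema and nf-validation plugins for Nextflow v23.10 (but not for v22.04).

The following information applies to using these plugins with Nextflow v23.10 workflows:

HealthOmics preinstalls the nf-schema@2.3.0 and nf-validation@1.1.1 plugins. HealthOmics ignores any other plugin versions that you specify in the nextflow.config file.
You can't retrieve additional plugins during a workflow run.
In Nextflow v24.04 and later, the nf-validation plugin is renamed to nf-schema. For more information, see nf-schema in the Nextflow GitHub repository.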

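When you use an Amazon S3 or HealthOmics URI to construct a Nextflow file or path object, the matching object is made available to the workflow, as long as read access is granted. Amazon S3 URIs can use prefixes or directories. For examples, see Amazon S3 input parameter formats.

HealthOmics supports glob patterns in Amazon S3 URIs and HealthOmics storage URIs. Use glob patterns in the workflow definition to create path or file channels.

For workflows written in Nextflow, define a publishDir directive to export task content to your output Amazon S3 bucket. As shown in the following example, set the publishDir value to /mnt/workflow/pubdir. Files are exported to Amazon S3 only if they are in this directory.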


nextflow.enable.dsl=2

workflow {
  CramToBamTask(params.ref_fasta, params.ref_fasta_index, params.ref_dict, params.input_cram, params.sample_name)
  ValidateSamFile(CramToBamTask.out.outputBam)
}

process CramToBamTask {
  container "<account>.dkr.ecr.us-west-2.amazonaws.com/genomes-in-the-cloud"

  publishDir "/mnt/workflow/pubdir"

  input:
      path ref_fasta
      path ref_fasta_index
      path ref_dict
      path input_cram
      val sample_name

  output:
      path "${sample_name}.bam", emit: outputBam
      path "${sample_name}.bai", emit: outputBai

  script:
  """
      set -eo pipefail

      samtools view -h -T $ref_fasta $input_cram |
      samtools view -b -o ${sample_name}.bam -
      samtools index -b ${sample_name}.bam
      mv ${sample_name}.bam.bai ${sample_name}.bai
  """
}

process ValidateSamFile {
  container "<account>.dkr.ecr.us-west-2.amazonaws.com/genomes-in-the-cloud"

  publishDir "/mnt/workflow/pubdir"

  input:
      path input_bam

  output:
      path "validation_report"

  script:
  """
      java -Xmx3G -jar /usr/gitc/picard.jar \
      ValidateSamFile \
      INPUT=${input_bam} \
      OUTPUT=validation_report \
      MODE=SUMMARY \
      IS_BISULFITE_SEQUENCED=false
  """
}

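Writing workflows in CWL

Workflows written in Common Workflow Language (CWL) offer functionality similar to workflows written in WDL and Nextflow. You can use Amazon S3 or HealthOmics storage URIs as input parameters.

If you define an input with secondaryFiles in a subworkflow, add the same definition in the main workflow.

HealthOmics workflows don't support operation processes. To learn more about operation processes in CWL workflows, see the CWL documentation.

To convert an existing CWL workflow definition file to use HealthOmics, make the following changes:

Replace all Docker container URIs with Amazon ECR URIs.
Make sure that all workflow files are declared as inputs in the main workflow, and that all variables are explicitly defined.
Make sure that all JavaScript code is strict-mode compliant.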
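The first change in the list above can be partly automated. The following Python sketch walks a CWL document that has already been parsed into dictionaries and lists (for example, with a YAML library, not shown) and reports every `dockerPull` value that doesn't look like a private Amazon ECR URI of the form `<account_id>.dkr.ecr.<region>.amazonaws.com/<repository>`. The function name and the regular expression are assumptions for illustration. CWL expressions such as `$(inputs.docker_image)` are skipped because they're resolved only at run time.

```python
import re

# Assumed shape of a private Amazon ECR image URI:
#   <12-digit account>.dkr.ecr.<region>.amazonaws.com/<repository>[:tag]
ECR_URI = re.compile(r"^\d{12}\.dkr\.ecr\.[a-z0-9-]+\.amazonaws\.com/.+")


def find_non_ecr_images(node) -> list[str]:
    """Return dockerPull values in a parsed CWL document that aren't ECR URIs.

    Hypothetical helper: recursively visits dicts and lists, so it finds
    DockerRequirement entries under both requirements and hints.
    """
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "dockerPull" and isinstance(value, str):
                is_expression = value.startswith(("$(", "${"))
                if not is_expression and not ECR_URI.match(value):
                    found.append(value)
            else:
                found.extend(find_non_ecr_images(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_non_ecr_images(item))
    return found


tool = {
    "class": "CommandLineTool",
    "requirements": {"DockerRequirement": {"dockerPull": "ubuntu:22.04"}},
}
print(find_non_ecr_images(tool))  # ['ubuntu:22.04']
```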

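Define CWL workflows for each container that you use. Hard-coding the dockerPull entry with a fixed Amazon ECR URI isn't recommended; instead, pass the image URI as an input, as the docker_image parameter does in the following example.

The following is an example of a workflow written in CWL.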



cwlVersion: v1.2
class: Workflow

inputs:
  in_file:
    type: File
    secondaryFiles: [.fai]

  out_filename: string
  docker_image: string


outputs:
  copied_file:
    type: File
    outputSource: copy_step/copied_file

steps:
  copy_step:
    in:
      in_file: in_file
      out_filename: out_filename
      docker_image: docker_image
    out: [copied_file]
    run: copy.cwl
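The `secondaryFiles: [.fai]` entry asks the runner to stage an index file alongside `in_file`. The following Python sketch shows how a CWL string pattern maps to a secondary file path under the CWL specification's rules: each leading `^` removes one extension from the primary path, the rest of the pattern is appended, and a trailing `?` (CWL v1.1 and later) marks the file as optional. The function is hypothetical and doesn't handle expression-valued patterns.

```python
def resolve_secondary(primary: str, pattern: str) -> tuple[str, bool]:
    """Apply a CWL secondaryFiles string pattern to a primary file path.

    Returns (secondary_path, required). Hypothetical helper that follows
    the CWL pattern rules; expressions such as $(...) aren't supported.
    """
    required = True
    if pattern.endswith("?"):          # trailing "?" marks an optional file
        pattern, required = pattern[:-1], False
    path = primary
    while pattern.startswith("^"):     # each caret strips one extension
        pattern = pattern[1:]
        slash = path.rfind("/")
        dot = path.rfind(".")
        if dot > slash:                # only strip a dot inside the file name
            path = path[:dot]
    return path + pattern, required


ref = "s3://amzn-s3-demo-bucket1/inputs/Homo_sapiens_assembly38.fasta"
print(resolve_secondary(ref, ".fai"))
# ('s3://amzn-s3-demo-bucket1/inputs/Homo_sapiens_assembly38.fasta.fai', True)
print(resolve_secondary("s3://amzn-s3-demo-bucket1/NA12878.bam", "^.bai"))
# ('s3://amzn-s3-demo-bucket1/NA12878.bai', True)
```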

The following file defines the copy.cwl task.



cwlVersion: v1.2
class: CommandLineTool
baseCommand: cp

inputs:
  in_file:
    type: File
    secondaryFiles: [.fai]
    inputBinding:
      position: 1

  out_filename:
    type: string
    inputBinding:
      position: 2
  docker_image:
    type: string

outputs:
  copied_file:
    type: File
    outputBinding:
      glob: $(inputs.out_filename)

requirements:
  InlineJavascriptRequirement: {}
  DockerRequirement:
    dockerPull: "$(inputs.docker_image)"

The following CWL example defines a command-line tool with a GPU requirement.


cwlVersion: v1.2
class: CommandLineTool
baseCommand: ["/bin/bash", "docm_haplotypeCaller.sh"]
$namespaces:
  cwltool: http://commonwl.org/cwltool#
requirements:
  cwltool:CUDARequirement:
    cudaDeviceCountMin: 1
    cudaComputeCapability: "nvidia-tesla-t4"
    cudaVersionMin: "1.0"
  InlineJavascriptRequirement: {}
  InitialWorkDirRequirement:
    listing:
    - entryname: 'docm_haplotypeCaller.sh'
      entry: |
        nvidia-smi --query-gpu=gpu_name,gpu_bus_id,vbios_version --format=csv

inputs: []
outputs: []

Workflow definition examples

The following examples show the same workflow definition in WDL, Nextflow, and CWL.

WDL


version 1.1

task my_task {
   runtime { ... }
   input {
       File input_file
       String name
       Int threshold
   }

   command <<<
   my_tool --name ~{name} --threshold ~{threshold} ~{input_file}
   >>>

   output {
       File results = "results.txt"
   }
}

workflow my_workflow {
   input {
       File input_file
       String name
       Int threshold = 50
   }

   call my_task {
       input:
          input_file = input_file,
          name = name,
          threshold = threshold
   }
   output {
       File results = my_task.results
   }
}

Nextflow


nextflow.enable.dsl = 2

params.input_file = null
params.name = null
params.threshold = 50

process my_task {
   // <directives>
   
   input:
     path input_file
     val name
     val threshold
   
   output:
     path 'results.txt', emit: results
   
   script:
     """
     my_tool --name ${name} --threshold ${threshold} ${input_file}
     """
     
   
}

workflow MY_WORKFLOW {
   my_task(
       params.input_file,
       params.name,
       params.threshold
   )
}

workflow {
   MY_WORKFLOW()
}
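In the Nextflow example above, `params.threshold = 50` acts as a default, and values supplied in the run's input JSON populate the params map. The following Python sketch is a conceptual model only, not how Nextflow or HealthOmics is implemented. It assumes that keys from the input JSON override the script's `params.*` assignments, so `threshold` stays at 50 unless the input JSON sets it. It ignores Groovy's type coercion rules, and `resolve_params` is a hypothetical name.

```python
import json

# Script-level defaults, mirroring the Nextflow example:
#   params.input_file = null; params.name = null; params.threshold = 50
SCRIPT_DEFAULTS = {"input_file": None, "name": None, "threshold": 50}


def resolve_params(defaults: dict, input_json: str) -> dict:
    """Conceptual model: input JSON values override script defaults."""
    params = dict(defaults)            # copy so the defaults stay untouched
    params.update(json.loads(input_json))
    return params


params = resolve_params(
    SCRIPT_DEFAULTS,
    '{"input_file": "s3://amzn-s3-demo-bucket1/inputs/sample.txt", "name": "demo"}',
)
print(params["threshold"])  # 50, because the input JSON didn't set it
```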

CWL


cwlVersion: v1.2
class: Workflow

requirements:
    InlineJavascriptRequirement: {}

inputs:
   input_file: File
   name: string
   threshold: int

outputs:
    result:
        type: ...
        outputSource: ...

steps:
    my_task:
        run:
            class: CommandLineTool
            baseCommand: my_tool
            requirements:
                ...
            inputs:
                name:
                    type: string
                    inputBinding:
                        prefix: "--name"
                threshold:
                    type: int
                    inputBinding:
                        prefix: "--threshold"
                input_file:
                    type: File
                    inputBinding: {}
            outputs:
                results:
                    type: File
                    outputBinding:
                        glob: results.txt

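WDL workflow definition example

The following examples show a private workflow definition for converting CRAM to BAM in WDL. The CRAM-to-BAM workflow defines two tasks and uses tools from the genomes-in-the-cloud container, which is shown in the example and is publicly available.

The following example shows how to include the Amazon ECR container as a parameter. This allows HealthOmics to verify the access permissions for your container before it starts the run.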


{
     ...
     "gotc_docker":"<account_id>.dkr.ecr.<region>.amazonaws.com/genomes-in-the-cloud:2.4.7-1603303710"
  }
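Because the container URI is an ordinary input value, you can inspect it before the run starts. The following Python sketch is a hypothetical helper that splits a private Amazon ECR image URI into account ID, Region, repository, and tag. The account ID `111122223333` and Region `us-west-2` in the usage line are illustrative values standing in for the `<account_id>` and `<region>` placeholders. Image digest references aren't handled.

```python
import re

# Assumed URI shape: <account_id>.dkr.ecr.<region>.amazonaws.com/<repository>[:<tag>]
ECR_IMAGE = re.compile(
    r"^(?P<account>\d{12})\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com/"
    r"(?P<repository>[^:@]+)(?::(?P<tag>[^@]+))?$"
)


def parse_ecr_image(uri: str) -> dict:
    """Split a private Amazon ECR image URI into its parts (hypothetical helper)."""
    match = ECR_IMAGE.match(uri)
    if match is None:
        raise ValueError(f"not a private Amazon ECR image URI: {uri}")
    return match.groupdict()   # tag is None when the URI has no tag


print(parse_ecr_image(
    "111122223333.dkr.ecr.us-west-2.amazonaws.com/genomes-in-the-cloud:2.4.7-1603303710"
))
# {'account': '111122223333', 'region': 'us-west-2', 'repository': 'genomes-in-the-cloud', 'tag': '2.4.7-1603303710'}
```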

The following example shows how to specify the files to use in a run when the files are in an Amazon S3 bucket.


{
      "input_cram": "s3://amzn-s3-demo-bucket1/inputs/NA12878.cram",
      "ref_dict": "s3://amzn-s3-demo-bucket1/inputs/Homo_sapiens_assembly38.dict",
      "ref_fasta": "s3://amzn-s3-demo-bucket1/inputs/Homo_sapiens_assembly38.fasta",
      "ref_fasta_index": "s3://amzn-s3-demo-bucket1/inputs/Homo_sapiens_assembly38.fasta.fai",
      "sample_name": "NA12878"
  }
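If your inputs follow a consistent naming scheme, you can generate the input JSON instead of writing it by hand. The following Python sketch is a hypothetical helper that rebuilds the example above from a bucket name, a key prefix, and a sample name. It assumes that the CRAM file is named `<sample>.cram` and that the reference files share the `Homo_sapiens_assembly38` base name.

```python
import json


def cram_to_bam_inputs(bucket: str, prefix: str, sample: str) -> dict:
    """Assemble CramToBamFlow inputs for objects stored under one S3 prefix.

    Hypothetical helper: file names follow the pattern used in the example.
    """
    base = f"s3://{bucket}/{prefix.strip('/')}"
    ref = f"{base}/Homo_sapiens_assembly38"
    return {
        "input_cram": f"{base}/{sample}.cram",
        "ref_dict": f"{ref}.dict",
        "ref_fasta": f"{ref}.fasta",
        "ref_fasta_index": f"{ref}.fasta.fai",
        "sample_name": sample,
    }


print(json.dumps(cram_to_bam_inputs("amzn-s3-demo-bucket1", "inputs", "NA12878"), indent=2))
```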

If you want to specify files from a sequence store, use the URI of the sequence store, as shown in the following example.


{
      "input_cram": "omics://429915189008.storage.us-west-2.amazonaws.com/111122223333/readSet/4500843795/source1",
      "ref_dict": "s3://amzn-s3-demo-bucket1/inputs/Homo_sapiens_assembly38.dict",
      "ref_fasta": "s3://amzn-s3-demo-bucket1/inputs/Homo_sapiens_assembly38.fasta",
      "ref_fasta_index": "s3://amzn-s3-demo-bucket1/inputs/Homo_sapiens_assembly38.fasta.fai",
      "sample_name": "NA12878"
  }

You can then define your workflow in WDL, as shown in the following example.


 version 1.0
  workflow CramToBamFlow {
      input {
          File ref_fasta
          File ref_fasta_index
          File ref_dict
          File input_cram
          String sample_name
          String gotc_docker = "<account>.dkr.ecr.us-west-2.amazonaws.com/genomes-in-the-cloud:latest"
      }
      #Converts CRAM to SAM to BAM and makes BAI.
      call CramToBamTask{
           input:
              ref_fasta = ref_fasta,
              ref_fasta_index = ref_fasta_index,
              ref_dict = ref_dict,
              input_cram = input_cram,
              sample_name = sample_name,
              docker_image = gotc_docker,
       }
       #Validates Bam.
       call ValidateSamFile{
          input:
             input_bam = CramToBamTask.outputBam,
             docker_image = gotc_docker,
       }
       #Outputs Bam, Bai, and validation report to the FireCloud data model.
       output {
           File outputBam = CramToBamTask.outputBam
           File outputBai = CramToBamTask.outputBai
           File validation_report = ValidateSamFile.report
        }
  }
  #Task definitions.
  task CramToBamTask {
      input {
         # Command parameters
         File ref_fasta
         File ref_fasta_index
         File ref_dict
         File input_cram
         String sample_name
         # Runtime parameters
         String docker_image
      }
     #Calls samtools view to do the conversion.
     command {
         set -eo pipefail
  
         samtools view -h -T ~{ref_fasta} ~{input_cram} |
         samtools view -b -o ~{sample_name}.bam -
         samtools index -b ~{sample_name}.bam
         mv ~{sample_name}.bam.bai ~{sample_name}.bai
      }
      
      #Runtime attributes:
      runtime {
          docker: docker_image
      }
  
      #Outputs a BAM and BAI with the same sample name
       output {
           File outputBam = "~{sample_name}.bam"
           File outputBai = "~{sample_name}.bai"
      }
  }
  
  #Validates BAM output to ensure it wasn't corrupted during the file conversion.
  task ValidateSamFile {
     input {
        File input_bam
        Int machine_mem_size = 4
        String docker_image
     }
     String output_name = basename(input_bam, ".bam") + ".validation_report"
     Int command_mem_size = machine_mem_size - 1
     command {
         java -Xmx~{command_mem_size}G -jar /usr/gitc/picard.jar \
         ValidateSamFile \
         INPUT=~{input_bam} \
         OUTPUT=~{output_name} \
         MODE=SUMMARY \
         IS_BISULFITE_SEQUENCED=false
      }
      runtime {
      docker: docker_image
      }
     #A text file is generated that lists errors or warnings that apply.
      output {
          File report = "~{output_name}"
      }
  }
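The ValidateSamFile task derives two values before it runs Picard: the report name, from `basename(input_bam, ".bam")`, and the Java heap size, which is one less than `machine_mem_size`. The following Python sketch reproduces that arithmetic so that you can predict the output file name. `wdl_basename` is a hypothetical stand-in for WDL's `basename()` function, which returns the last path component with the given suffix removed.

```python
import posixpath


def wdl_basename(path: str, suffix: str = "") -> str:
    """Mimic WDL basename(): last path component, minus suffix if present."""
    name = posixpath.basename(path)
    if suffix and name.endswith(suffix):
        name = name[: -len(suffix)]
    return name


def validate_sam_settings(input_bam: str, machine_mem_size: int = 4) -> tuple[str, str]:
    """Reproduce the output name and -Xmx flag that ValidateSamFile computes."""
    output_name = wdl_basename(input_bam, ".bam") + ".validation_report"
    command_mem_size = machine_mem_size - 1   # one GB less than machine_mem_size, as in the task
    return output_name, f"-Xmx{command_mem_size}G"


print(validate_sam_settings("NA12878.bam"))  # ('NA12878.validation_report', '-Xmx3G')
```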
