Excluding or including by awk

I have a GTF file as attached (enter link description here).



With this command I can extract the protein-coding transcripts (the coding parts of the genome):



awk '{if($3=="transcript" && $20=="\"protein_coding\";"){print $0}}' gencode.gtf


How can I exclude the coding parts from this file and keep only the non-coding regions?
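Excluding instead of including mostly comes down to negating the filter. The sketch below assumes a GENCODE-style GTF in which column 9 carries an attribute such as transcript_type "protein_coding";, and the function names are hypothetical. The first variant tests that attribute by name inside the tab-separated column 9, so it does not depend on transcript_type being the 20th whitespace-separated field; the second simply negates the question's own positional test.

```shell
#!/usr/bin/env bash
# Keep "transcript" lines whose transcript_type is NOT protein_coding.
# Assumes GENCODE-style attributes in column 9, e.g.
#   ... transcript_type "protein_coding"; ...
# Matching the attribute by name (rather than by whitespace field $20)
# does not depend on the attribute order of a given release.
exclude_protein_coding() {
    awk -F'\t' '$3 == "transcript" && $9 !~ /transcript_type "protein_coding";/' "$@"
}

# The question's positional test, simply negated.
# Assumes the 20th whitespace-separated field holds the transcript_type
# value, as the question's command implies for its file.
exclude_protein_coding_by_position() {
    awk '$3 == "transcript" && $20 != "\"protein_coding\";"' "$@"
}

# Usage (file names are examples):
#   exclude_protein_coding gencode.gtf > noncoding_transcripts.gtf
#   zcat gencode.gtf.gz | exclude_protein_coding > noncoding_transcripts.gtf
```

Either way the result is the set of non-coding transcripts (lncRNAs, pseudogenes and the like), not the intergenic sequence between genes.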

      linux wgs bash

      asked 2 hours ago by Feresh Teh

          1 Answer

          If you want the non-coding regions of protein-coding transcripts, it sounds like you are looking for the UTRs.



          UTRs have their own feature type (column 3) in the GTF file, so you can do this:



          $ awk -v FS="\t" '$3=="UTR"' gencode.gtf


          If the GTF file is gzip-compressed, use this instead:



          $ zcat gencode.gtf.gz | awk -v FS="\t" '$3=="UTR"'
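If one of these commands prints nothing, a useful first check is which feature types the file actually contains. This is a minimal sketch assuming a tab-separated GTF. The function names are hypothetical, and the Ensembl feature names (five_prime_utr, three_prime_utr) are included only in case the file is an Ensembl GTF rather than a GENCODE one.

```shell
#!/usr/bin/env bash
# Count each feature type in column 3, skipping "#" header lines.
# If "UTR" is not in this list, '$3=="UTR"' can only print nothing.
feature_types() {
    awk -F'\t' '!/^#/ { n[$3]++ } END { for (f in n) print f "\t" n[f] }' "$@" | sort
}

# Print UTR lines under either naming convention:
#   GENCODE GTF: "UTR"
#   Ensembl GTF: "five_prime_utr" / "three_prime_utr"
utr_lines() {
    awk -F'\t' '$3 == "UTR" || $3 == "five_prime_utr" || $3 == "three_prime_utr"' "$@"
}

# Usage (file names are examples):
#   feature_types gencode.gtf
#   zcat gencode.gtf.gz | utr_lines > utr.gtf
```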


          BTW: why are you using such an old release of GENCODE? The current version is v29.

          • Sorry, what I actually need are the non-coding regions of the human genome; I only mentioned the coding parts to frame my question.
            – Feresh Teh
            1 hour ago










          • Sorry, I tried that, but my output is empty.
            – Feresh Teh
            1 hour ago






          • As @Wouter tells you, the non-coding region of a genome is the complement of the coding regions. Coding regions have their own feature type in the GTF file; you can get them with $ awk -v FS="\t" '$3=="CDS"' gencode.gtf. Reading the manual for bedtools complement is up to you.
            – finswimmer
            1 hour ago












          • Thank you, but both of your commands return nothing. :(
            – Feresh Teh
            1 hour ago
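A minimal sketch of the complement approach suggested in the comments, assuming bedtools is installed and a hypothetical two-column genome file genome.txt (chromosome name, length), for example the first two columns of a samtools faidx .fai index, with chromosome names matching the GTF. GTF starts are 1-based and inclusive while BED is 0-based and half-open, hence the $4 - 1. The function names are hypothetical and the pipeline has not been run on a full genome; the bedtools merge and complement documentation has the details.

```shell
#!/usr/bin/env bash
# Convert CDS lines of a GTF to sorted BED.
# GTF is 1-based inclusive, BED is 0-based half-open,
# so only the start coordinate is shifted by one.
cds_to_bed() {
    awk -F'\t' 'BEGIN { OFS = "\t" } $3 == "CDS" { print $1, $4 - 1, $5 }' "$@" \
        | sort -k1,1 -k2,2n
}

# Non-coding = genome minus the merged CDS intervals.
# Assumes bedtools is on PATH and that the genome file lists chromosomes
# in the same (lexicographic) order as the sorted BED, e.g. after
#   sort -k1,1 genome.txt > genome.sorted.txt
# Usage: noncoding_bed gencode.gtf genome.sorted.txt > noncoding.bed
noncoding_bed() {
    local gtf=$1 genome=$2
    cds_to_bed "$gtf" \
        | bedtools merge -i stdin \
        | bedtools complement -i stdin -g "$genome"
}
```

Merging before the complement collapses overlapping CDS from alternative transcripts, so each coding stretch is subtracted once.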

          answered 1 hour ago by finswimmer (edited 1 hour ago)