As I mentioned in the last post, I’m mapping the filtered transcripts to the Mus musculus genome (our lab mice) to remove more reads I’m not interested in before subsampling. As the reference, I used the complete list of annotated Mus musculus genes from the KEGG database we have on axiom. With this step done, the files should be fully curated and ready for the next steps.

Here are the mapping commands, for reference:

Paired-end read alignment:

/home/mljenior/bin/bowtie/bowtie /mnt/EXT/Schloss-data/matt/metatranscriptomes_HiSeq/mus_musculus/mus_db -f -1 ${sample_name}.read1.pool.trim.filt_rRNA.fasta -p 4 -2 ${sample_name}.read2.pool.trim.filt_rRNA.fasta --un ${sample_name}.filter.trimmed.read.fasta
mv ${sample_name}.filter.trimmed.read_1.fasta ${sample_name}.read1.pool.trim.filt_rRNA.filt_mus.fasta
mv ${sample_name}.filter.trimmed.read_2.fasta ${sample_name}.read2.pool.trim.filt_rRNA.filt_mus.fasta
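
With paired input, bowtie's --un option writes the unaligned mates to two parallel files (the _1 and _2 names renamed above), so both outputs should hold the same number of records. A minimal sketch of a sanity check that could follow this step is below. The helper names count_fasta_records and pairs_in_sync are hypothetical and not part of the commands above.

```shell
# Hypothetical helpers: confirm that two mate files are still in sync.

# Count FASTA records by counting header lines (lines starting with ">").
count_fasta_records() {
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true"
    grep -c '^>' "$1" || true
}

# Succeeds (exit status 0) only if both files hold the same number of records.
pairs_in_sync() (
    n1=$(count_fasta_records "$1")
    n2=$(count_fasta_records "$2")
    [ "$n1" -eq "$n2" ]
)

# Example usage (file names follow the mv commands above):
# pairs_in_sync "${sample_name}.read1.pool.trim.filt_rRNA.filt_mus.fasta" \
#               "${sample_name}.read2.pool.trim.filt_rRNA.filt_mus.fasta" \
#     || echo "read1 and read2 differ in record count" >&2
```

This only compares counts. It does not check that the read IDs match line for line.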

Orphaned read alignment:

/home/mljenior/bin/bowtie/bowtie /mnt/EXT/Schloss-data/matt/metatranscriptomes_HiSeq/mus_musculus/mus_db -f ${sample_name}.orphan.pool.trim.filt_rRNA.fasta -p 4 --un ${sample_name}.orphan.pool.trim.filt_rRNA.filt_mus.fasta

Unmapped reads from mapping against mouse genome

# condition1_plus.read1.pool.trim.filt_rRNA.filt_mus.fasta
# Total sequences: 164655029
# Total bases: 8470.99 Mb

# condition1_plus.read2.pool.trim.filt_rRNA.filt_mus.fasta
# Total sequences: 164655029
# Total bases: 8428.80 Mb

# condition1_plus.orphan.pool.trim.filt_rRNA.filt_mus.fasta
# Total sequences: 14957436
# Total bases: 666.06 Mb
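
For anyone wanting to reproduce summaries in this format, here is a rough sketch using awk. The function name fasta_stats is hypothetical, this is not necessarily the tool that produced the numbers above, and it assumes that "Mb" means 10^6 bases.

```shell
# Hypothetical helper: print sequence and base totals for one FASTA file.
fasta_stats() {
    awk '
        /^>/ { seqs++; next }                 # header line: one more sequence
        {
            gsub(/[ \t\r]/, "")               # drop stray whitespace / carriage returns
            bases += length($0)               # sequence line: add its length
        }
        END {
            printf "# Total sequences: %d\n", seqs
            printf "# Total bases: %.2f Mb\n", bases / 1e6   # assumes 1 Mb = 10^6 bases
        }
    ' "$1"
}

# Example usage:
# fasta_stats condition1_plus.orphan.pool.trim.filt_rRNA.filt_mus.fasta
```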

Percent of data removed by mouse filter

Sequences
read 1:  1.26%
read 2:  1.26%
orphan:  1.96%

Bases
read 1:  1.26%
read 2:  1.26%
orphan:  1.89%

This shows that very few mouse transcripts make it into the cecal content, so they are unlikely to mask the signal I’m hoping to get from the datasets.
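
The percentages are presumably computed as (before - after) / before * 100 from the totals before and after the filter. A minimal sketch of that arithmetic follows. The function name percent_removed is hypothetical, and the counts in the usage comment are made up for illustration rather than taken from these datasets.

```shell
# Hypothetical helper: percent of sequences (or bases) removed by a filter.
#   $1 = total before filtering
#   $2 = total after filtering
percent_removed() {
    awk -v before="$1" -v after="$2" 'BEGIN {
        if (before <= 0) { print "NA"; exit }   # avoid dividing by zero
        printf "%.2f\n", (before - after) / before * 100
    }'
}

# Example usage with made-up counts:
# percent_removed 200 150
```

The same function works for bases as long as both totals are in the same unit.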


Total percent of data removed

Sequences
read 1:  1.37%
read 2:  1.37%
orphan:  2.23%

Bases
read 1:  1.37%
read 2:  1.37%
orphan:  2.14%

This is great news. It looks like what remains is primarily bacterial sequence (setting aside any possible viral and archaeal reads).


It’s also worth noting that these numbers are fairly representative of the filtering for all the other experimental groups.
Basically, I’ve lost less than 3% of the data by the end of the two-step filtering process!
