Skip to content

Commit

Permalink
Fixed the way similar sequences were merged together based on cutoff …
Browse files Browse the repository at this point in the history
…distance. (There was an error where sequence numbers were lost.)
  • Loading branch information
dleebhsai committed Jan 29, 2019
1 parent cb9a044 commit 2b28159
Show file tree
Hide file tree
Showing 6 changed files with 99 additions and 46 deletions.
5 changes: 5 additions & 0 deletions PatchInfo.md
Original file line number Diff line number Diff line change
@@ -1,3 +1,8 @@
## Patch notice for version 3.5.7
- Adjusted the filter for non-functional VDJ so that <5 AA and > 30 AA CDR3s are excluded, and any V 3', D 5', D 3', J 5' nt deletion > 15 are suspected as improper annotations.
- Fixed the tree cutting algorithm to preserve the child-parent relation when merging sequence <= Cutoff hamming distances.
- Add flippedLegend for future plotting functions

## Patch notice for version 3.5.5
- Added Rainbow Trout database to BRILIA
- General code fixes
Expand Down
2 changes: 1 addition & 1 deletion README.md
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
# BRILIA v3.5.5
# BRILIA v3.5.7

**B-cell Repertoire Inductive Lineage and Immunosequence Annotator**

Expand Down
24 changes: 11 additions & 13 deletions Src/AnalysisTools/Lineage/Tree/plotTree.m
Original file line number Diff line number Diff line change
Expand Up @@ -200,20 +200,18 @@

%Create the tree search table
VDJdata = joinData(VDJdata, VDJheader, 'stable');
if nargout >= 1
TreeSearchTable = getTreeSearchTable(VDJdata, VDJheader);
[~, ~, TableIdx] = intersect(ImgGrpNums, cell2mat(TreeSearchTable(2:end, 1)));
TreeSearchTable = TreeSearchTable([1; TableIdx+1], :);
for j = 1:length(ImgNames) %Remove the NewFilePath from the TreeFiles
[FilePath, FileName, ~] = parseFileName(ImgNames{j});
ImgNames{j} = FileName;
end
TreeSearchTable(2:end, end) = ImgNames;
writeDlmFile(TreeSearchTable, fullfile(FilePath, 'TreeSearchTable.csv'));
varargout{1} = ImgGrpNums;
varargout{2} = ImgNames;
varargout{3} = TreeSearchTable;
TreeSearchTable = getTreeSearchTable(VDJdata, VDJheader);
[~, ~, TableIdx] = intersect(ImgGrpNums, cell2mat(TreeSearchTable(2:end, 1)));
TreeSearchTable = TreeSearchTable([1; TableIdx+1], :);
for j = 1:length(ImgNames) %Remove the NewFilePath from the TreeFiles
[FilePath, FileName, ~] = parseFileName(ImgNames{j});
ImgNames{j} = FileName;
end
TreeSearchTable(2:end, end) = ImgNames;
writeDlmFile(TreeSearchTable, fullfile(FilePath, 'TreeSearchTable.csv'));
varargout{1} = ImgGrpNums;
varargout{2} = ImgNames;
varargout{3} = TreeSearchTable;

function Gx = plotSingleTree(VDJdata, VDJheader, ExpPu, DistanceUnit, DotMaxSize, DotMinSize, DotScalor, DotColorMap, Legend, LegendFontSize, Xmax, Yincr, FigMaxHeight, FigWidth, FigSpacer, Visible)

Expand Down
19 changes: 14 additions & 5 deletions Src/AnnotationTools/labelSeqQuality.m
Original file line number Diff line number Diff line change
Expand Up @@ -2,12 +2,13 @@
%a productive/nonproductive VDJ, or is invalid/incomplete. The label
%results are:
%'N': Sequence has stop codon in the CDR3 or out of frame error
%'Y': Fully translatable to AA and all annotation values exists
%'M': Sequence has stop codon in V and J. Could be sequencing error or
% pseudogene.
%'I': Invalid or incomplete annotation, mostly likely caused by error,
% partial sequence, or short non-VDJ seq that resembles a junction, or
% excessive mismatches with the predicted germline sequence
% partial sequence, or short non-VDJ seq that resembles a junction,
% excessive mismatches with the predicted germline sequence, or > 15
% deletions on a V/D/J segment, or CDR length < 4
%'Y': Fully translatable to AA and all annotation values exists
%
% VDJdata = labelSeqQuality(VDJdata, Map)
%
Expand All @@ -20,6 +21,7 @@
%
function VDJdata = labelSeqQuality(VDJdata, Map, ThreshHold)
if isempty(VDJdata); return; end
Map = getVDJmapper(Map);
Chain = lower(Map.Chain);
if nargin < 3
ThreshHold = 0.4;
Expand All @@ -33,8 +35,10 @@
GeneNameIdx = Map.([Chain(c) 'GeneName']);
CDR3Idx = Map.([Chain(c) 'CDR3']);
FunctIdx = Map.([Chain(c) 'Funct']);
DelIdx = Map.([Chain(c) 'Del']);

for j = 1:size(VDJdata,1)

%Extract necessary information
Seq = VDJdata{j, SeqIdx};
VMDNJ = cell2mat(VDJdata(j,LengthIdx));
Expand All @@ -43,18 +47,23 @@
CDR3S = VDJdata{j, CDR3Idx(3)};
CDR3E = VDJdata{j, CDR3Idx(4)};
CDR3Len = VDJdata{j, CDR3Idx(2)}; %CDR3 length is >= 5 AA and <= 30 AA including 104C and 118W
Del = cell2mat(VDJdata(j, DelIdx));

%Make sure all necessary information is available
VDJdata{j, FunctIdx} = 'I'; %Start with an incomplete, and then determine if it's a Y

if isempty(Seq); continue; end
if isempty(VNum); continue; end
if isempty(VName); continue; end
if isempty(CDR3S); continue; end
if isempty(CDR3E); continue; end
if isempty(CDR3Len); continue; end
if isempty(Del); continue; end

%Make sure info makes sense (I for invalid)
if min(VMDNJ) < 0 || sum(VMDNJ) ~= length(Seq) || CDR3S < 1 || CDR3E > length(Seq) || CDR3Len < 5 || CDR3Len > 30
VDJdata{j,FunctIdx} = 'I';
%NOTE! CDR3Len must add 2 because this is for the PRE_IMGT_CDR3Length!
if min(VMDNJ) < 0 || sum(VMDNJ) ~= length(Seq) || CDR3S < 1 || CDR3E > length(Seq) || CDR3Len < (5+2) || CDR3Len > (30+2) || max(Del) > 15
VDJdata{j, FunctIdx} = 'I';
continue
end

Expand Down
93 changes: 67 additions & 26 deletions Src/AnnotationTools/mergeSimilarSeq.m
Original file line number Diff line number Diff line change
Expand Up @@ -18,8 +18,8 @@
%
function VDJdata = mergeSimilarSeq(varargin)
VDJdata = [];
%To enable post-processing capabilities
varargin = cleanCommandLineInput(varargin{:});
varargin = cleanCommandLineInput(varargin{:}); %To enable post-processing capabilities

if numel(varargin) >= 3
[VDJdata, Map, Cutoff] = deal(varargin{1:3});
VDJdata = mergeSimilarSeq_Calc(VDJdata, Map, Cutoff);
Expand All @@ -46,18 +46,15 @@
NewNum = size(VDJdata, 1);
if CurNum ~= NewNum
[Path, ~, Ext, Pre] = parseFileName(FileNames{f});
if Cutoff >= 1
SaveName = fullfile(Path, sprintf('%s.Merge%d%s', Pre, Cutoff, Ext));
else
SaveName = fullfile(Path, sprintf('%s.Merge%0.2f%s', Pre, Cutoff, Ext));
end
Ptrn = ternary(Cutoff >= 1, '%s.Merge%d%s', '%s.Merge%0.2f%s');
SaveName = fullfile(Path, sprintf(Ptrn, Pre, Cutoff, Ext));
saveSeqData(SaveName, VDJdata, VDJheader);
fprintf('%s: Finished with "%s".\n Reduced %d similar sequences.\n', mfilename, FileNames{f}, CurNum - NewNum);
else
fprintf('%s: No sequences to merge in "%s".\n', mfilename, FileNames{f});
end
end
VDJdata = []; %Nothing to return;
VDJdata = []; %For multiple files, return nothing.
end

function VDJdata = mergeSimilarSeq_Calc(VDJdata, Map, Cutoff)
Expand All @@ -72,24 +69,38 @@
parfor y = 1:length(VDJdata)
VDJdata{y} = mergeSimilarSeqPerGroup(VDJdata{y}, Map, Cutoff);
end
VDJdata = joinData(VDJdata, Map); %Have to join to ensure cluster-based splicing w/ correct GrpNum
if IsSpliced
VDJdata = spliceData(VDJdata, Map); %Re-splice to ensure cluster-based splicing

if ~IsSpliced %Rejoin to ensure cluster-based splicing is preserved when passing this function
VDJdata = joinData(VDJdata, Map); %Have to join to ensure cluster-based splicing w/ correct GrpNum
end

%CODINGNOTE: Test code for mergSimilarSeqPerGroup
% AncMap = [2 0 2 2; 4 2 2 2 ; 6 4 1 2; 8 4 1 2; 10 8 1 2];
% HSeq = {'ACGTACGTACGT'; 'ACGTCCCTACGT'; 'ACGGACCTCCCT'; 'TTTTACCTACGT'; 'TTCAACCTACGT'}
%
% VDJdata = [num2cell(AncMap) HSeq];
% Map.SeqNum = 1;
% Map.ParNum = 2;
% Map.Ham = 3;
% Map.Template = 4;
% Map.lSeq = 0;
% Map.hSeq = 5;
% CutLen = 2;

function VDJdata = mergeSimilarSeqPerGroup(VDJdata, Map, Cutoff)
if size(VDJdata, 1) <= 1; return; end

AncMap = zeros(size(VDJdata, 1), 4);
AncMap(:, [1:2, 4]) = cell2mat(VDJdata(:, [Map.SeqNum, Map.ParNum, Map.Template]));
OrigSeqNum = AncMap(:, 1); %Need this later at end to preserve SeqNum!
AncMap = renumberAncMap(AncMap); %So you can use relative indexing

%CODING_NOTE: instead of redoing this comparison, consider saving info in VDJdata after lineage inference
SeqIdx = nonzeros([Map.hSeq; Map.lSeq]);
Seq1 = ''; %Keep this to prevent an unassigned Seq1 length later
for j = 1:size(VDJdata, 1)
for j = 1:size(AncMap, 1)
if AncMap(j, 2) > 0
Seq1 = horzcat(VDJdata{j, SeqIdx}); %The children
Seq2 = horzcat(VDJdata{AncMap(j, 2) == AncMap(:, 1), SeqIdx}); %The parent
Seq1 = horzcat(VDJdata{AncMap(j, 1), SeqIdx}); %The children
Seq2 = horzcat(VDJdata{AncMap(j, 2), SeqIdx}); %The parent
AncMap(j, 3) = length(Seq1) - sum(cmprSeqMEX(Seq1, Seq2));
end
end
Expand All @@ -101,16 +112,46 @@
end

%Those that fall under the cutoff will be regrouped
Idx = find(AncMap(:, 3) <= CutLen & AncMap(:, 2) > 0);
for j = 1:length(Idx)
MainPar = AncMap(Idx(j), 2); %This is the sequence that will remain
MainParIdx = find(AncMap(:, 1) == MainPar);
SamePar = AncMap(Idx(j), 1); %This is the sequence that will be lost
MergeLoc = AncMap(:, 2) == SamePar; %These sequences must have new parents
AncMap(MergeLoc, 2) = MainPar; %Reassigned parent to orphans
AncMap(MainParIdx, 4) = AncMap(MainParIdx, 4) + AncMap(Idx(j), 4); %Added up the templates
ChildIdx = find(AncMap(:, 3) <= CutLen & AncMap(:, 2) > 0);
for j = 1:length(ChildIdx)
ParNum = AncMap(ChildIdx(j), 2); %This is the sequence that will remain
ParIdx = find(AncMap(:, 1) == ParNum);

AncMap(ParIdx, 4) = AncMap(ParIdx, 4) + AncMap(ChildIdx(j), 4);
AncMap(ChildIdx(j), 4) = 0;
AncMap(ChildIdx(j), 1) = -ParNum; %use negative to prevent find it

OldChildLoc = AncMap(:, 2) == ChildIdx(j);
AncMap(OldChildLoc, 2) = ParNum;
end
AncMap(ChildIdx, :) = [];
VDJdata(ChildIdx, :) = [];

AncMap(Idx, :) = [];
VDJdata(Idx, :) = [];
VDJdata(:, [Map.SeqNum; Map.ParNum; Map.Template]) = num2cell(AncMap(:, [1:2 4]));
%Need to get the real sequence numbers
Idx = find(AncMap(:, 1:2) > 0);
AncMap(Idx) = OrigSeqNum(AncMap(Idx));
VDJdata(:, [Map.SeqNum; Map.ParNum; Map.Template]) = num2cell(AncMap(:, [1:2 4]));
%
% %
% % SamePar = AncMap(ChildIdx(j), 1); %This is the sequence that will be lost
% % MergeLoc = AncMap(:, 2) == SamePar; %These sequences must have new parents
% % AncMap(MergeLoc, 2) = ParNum; %Reassigned parent to orphans
% % AncMap(ParIdx, 4) = AncMap(ParIdx, 4) + AncMap(ChildIdx(j), 4); %Added up the templates
% % end
% %
%
%
% %Those that fall under the cutoff will be regrouped
% Idx = find(AncMap(:, 3) <= CutLen & AncMap(:, 2) > 0);
% for j = 1:length(Idx)
% MainPar = AncMap(Idx(j), 2); %This is the sequence that will remain
% MainParIdx = find(AncMap(:, 1) == MainPar);
% SamePar = AncMap(Idx(j), 1); %This is the sequence that will be lost
% MergeLoc = AncMap(:, 2) == SamePar; %These sequences must have new parents
% AncMap(MergeLoc, 2) = MainPar; %Reassigned parent to orphans
% AncMap(MainParIdx, 4) = AncMap(MainParIdx, 4) + AncMap(Idx(j), 4); %Added up the templates
% end
%
% AncMap(Idx, :) = [];
% VDJdata(Idx, :) = [];
% VDJdata(:, [Map.SeqNum; Map.ParNum; Map.Template]) = num2cell(AncMap(:, [1:2 4]));
2 changes: 1 addition & 1 deletion Src/BRILIA.m
Original file line number Diff line number Diff line change
Expand Up @@ -92,7 +92,7 @@
% y Exit BRILIA local environment when job completes

function varargout = BRILIA(varargin)
Version = '3.5.5';
Version = '3.5.7';
varargout = cell(1, nargout);
HasShownCredit = false;

Expand Down

0 comments on commit 2b28159

Please sign in to comment.