This vignette will help you analyse haplotype data by dealing with the inflation of rare alleles due to low-frequency genotyping errors.

If you have any issue with it please let me know through my github page: https://github.com/alethere/hapgroup

0. Data used

There are two essential pieces of information necessary for this package to work:

  1. A data.frame of “phases”, SNP data, expressed as a combination of 0s and 1s (0 = reference allele, 1 = alternative allele). Markers are in rows, individuals in columns. Each individual has ploidy columns (so for a diploid, there will be two columns per individual).
  2. A vector of individual names, unrepeated, in the same order as in the “phases” data.frame.

There is a third piece of information that is quite useful, a “position” data.frame(), which contains marker information. This data.frame must have columns “POS”, “CHROM”, “REF” and “ALT”.

These three datasets are provided as example data: exPhase, exInds, exPos.

#Table of phases
knitr::kable(exPhase[1:5,1:10])
V5 V6 V7 V8 V9 V10 V11 V12 V13 V14
1 1 0 0 1 1 1 1 1 1
1 1 0 0 1 1 1 1 1 1
0 0 0 0 1 1 1 1 1 1
0 0 0 0 0 0 0 0 0 0
1 1 0 0 1 1 1 1 1 1
#Vector of individual names
exInds[1:5]
## [1] "Ind0001" "Ind0002" "Ind0003" "Ind0004" "Ind0005"
#Position data.frame
knitr::kable(exPos[1:5,])
CHROM POS REF ALT
chr1 42755 T C
chr1 47251 G A
chr1 47258 C T
chr1 48755 T C
chr1 48757 A G

1. Join markers into blocks

Haploblocks are defined as groups of markers that are “blocked” or “joined together”, which combined can be turned into haplotype alleles. These can be defined in many ways and you can choose your own way. The hapBlock object performs haplotyping one block at a time, so defining which markers fall into a haploblock is totally up to you as the user.

I do provide a simple function that should help you with some easy definitions of haploblocks, the calcBlock() functions. This function can divide markers into blocks based on:

  1. Number of markers within a block (n). Every n markers a new block is started.
  2. Distance between markers (window). Starting from the first marker, all markers within a window of size window are grouped into a block.

You can choose to exclude blocks below a number of markers of exclude.size.

To compute these blocks you must provide a position data.frame as described above.

The result will be a list of data.frames (a split data.frame), with an added column index which can be useful to subset the phase data.

#For example, we can compute blocks with a window size
winBlocks <- calcBlocks(exPos, window = 20000, exclude.size = 2)
## A total of 37 haploblocks found.
## Aerage window size = 16526.73
## Average markers per block = 33.11
## 3 blocks with <3 markers excluded
winBlocks[1:3]
## $chr1.0
##    CHROM   POS REF ALT index
## 1   chr1 42755   T   C     1
## 2   chr1 47251   G   A     2
## 3   chr1 47258   C   T     3
## 4   chr1 48755   T   C     4
## 5   chr1 48757   A   G     5
## 6   chr1 48763   A   G     6
## 7   chr1 49459   C   T     7
## 8   chr1 50326   G   A     8
## 9   chr1 50679   G   C     9
## 10  chr1 50855   C   A    10
## 11  chr1 50856   C   T    11
## 12  chr1 60772   C   A    12
## 13  chr1 60776   G   A    13
## 14  chr1 61157   T   C    14
## 
## $chr1.1
##    CHROM   POS REF ALT index
## 15  chr1 65992   G   A    15
## 16  chr1 66801   C   T    16
## 17  chr1 66804   A   G    17
## 18  chr1 67120   C   T    18
## 19  chr1 67123   A   G    19
## 20  chr1 67521   T   C    20
## 21  chr1 67541   G   A    21
## 22  chr1 68254   G   A    22
## 23  chr1 74706   A   C    23
## 24  chr1 74717   C   T    24
## 25  chr1 74727   C   G    25
## 26  chr1 77829   A   G    26
## 27  chr1 77850   G   A    27
## 28  chr1 77861   G   A    28
## 29  chr1 78974   C   A    29
## 30  chr1 79379   A   G    30
## 
## $chr1.2
##    CHROM    POS REF ALT index
## 31  chr1  93353   C   T    31
## 32  chr1  94496   C   T    32
## 33  chr1  96931   C   T    33
## 34  chr1  97580   T   C    34
## 35  chr1  97585   T   A    35
## 36  chr1  98586   A   G    36
## 37  chr1 100710   A   G    37
## 38  chr1 100713   A   G    38
## 39  chr1 100980   G   T    39
## 40  chr1 101082   T   C    40
## 41  chr1 101103   G   T    41
#We can also compute blocks specifying a specific number of markers within a window
#I also exclude all those below the chosen size
nBlocks <- calcBlocks(exPos, n = 10, exclude.size = 9)
## A total of 121 haploblocks found.
## Aerage window size = 6915.64
## Average markers per block = 10.00
## 3 blocks with <10 markers excluded
nBlocks[1:3]
## $chr1.0
##    CHROM   POS REF ALT index
## 1   chr1 42755   T   C     1
## 2   chr1 47251   G   A     2
## 3   chr1 47258   C   T     3
## 4   chr1 48755   T   C     4
## 5   chr1 48757   A   G     5
## 6   chr1 48763   A   G     6
## 7   chr1 49459   C   T     7
## 8   chr1 50326   G   A     8
## 9   chr1 50679   G   C     9
## 10  chr1 50855   C   A    10
## 
## $chr1.1
##    CHROM   POS REF ALT index
## 11  chr1 50856   C   T    11
## 12  chr1 60772   C   A    12
## 13  chr1 60776   G   A    13
## 14  chr1 61157   T   C    14
## 15  chr1 65992   G   A    15
## 16  chr1 66801   C   T    16
## 17  chr1 66804   A   G    17
## 18  chr1 67120   C   T    18
## 19  chr1 67123   A   G    19
## 20  chr1 67521   T   C    20
## 
## $chr1.2
##    CHROM   POS REF ALT index
## 21  chr1 67541   G   A    21
## 22  chr1 68254   G   A    22
## 23  chr1 74706   A   C    23
## 24  chr1 74717   C   T    24
## 25  chr1 74727   C   G    25
## 26  chr1 77829   A   G    26
## 27  chr1 77850   G   A    27
## 28  chr1 77861   G   A    28
## 29  chr1 78974   C   A    29
## 30  chr1 79379   A   G    30

You may notice that window sizes do not exaclty match what we asked for: this is unavoidable. If you set a start at position x and you take all markers from x to x + 10000 the last marker will not be at a distance of 10000 from the first marker (because the first marker is not at x and the last marker is not at x + 10000).

2. Haplogrouping

The bread and butter of this package is the R6 class hapBlock. You may not have used R6 classes before but they are quite simple to use (you can find more information here ). They are basically a list, with some of its items being functions (which are often called methods), and some hidden items (containing all the provided data).

Let us look at a single, small haploblock.

b <- winBlocks[[4]]

#hapBlock objects (and R6 objects in general) are always started with the function new()
myBlock <- hapBlock$new(phase = exPhase[b$index,], inds = exInds, ploidy = 2, pos = b)

myBlock
## Haploblock object with 31 markers for 1229 individuals (with ploidy 2 = 2458 alleles)

This haploblock right now only contains the alleles as we have defined them, no grouping has been performed yet. To perform the grouping we must use the haplogroup() method. This will group alleles based on an expected error rate (by default 0.01 or 1%).

myBlock$haplogroup()
## With a sequencing error of 0.01  the expected number of mismatches is: 0.61 (CI: -0.91 , 2.13) out of 31
## Number of haplogroups detected:  12

We can check the expected number of mismatches given one (or more) error rates using the mismatch() method.

myBlock$mismatch(c(0.01,0.02,0.05,0.1))
##           lc    exp       uc
## 1 -0.9064915 0.6138 2.134091
## 2 -0.9026561 1.2152 3.333056
## 3 -0.2548013 2.9450 6.144801
## 4  1.3874322 5.5800 9.772568

By default, the upper boundary of the CI is used. For this block, this is 2.13 out of 31. We can override this if necessary with the argument mismatch

myBlock$haplogroup(mismatch = 2)
## Using provided number of mismatches: 2 out of 31 
## Number of haplogroups detected:  12

2.1 Haplogroup evaluation

You can obtain some information about each haplogroup using the function haplostats(). The result provides a data.frame with the following columns:

  • group: haplogroup code, numbered from most to least frequent
  • meanD: mean distance (mismatches) between variants within the group
  • nvars: number of distinct variants within the group
  • groupFreq: number of times the group is observed in the total population
  • consensus: consensus variant, where each position is the most frequent. If less than 3/4 of the observed variants have the same allele in a position, an N is returned instead.
  • freqVar: the most frequent variant within the haplogroup
  • freq: the frequency, within the haplogroup, of the most frequent variant
myBlock$haplostats()
##    group    meanD nvars groupFreq                       consensus
## 1    H01 2.780220    14      1658 1101001001111001011111100100111
## 2    H02 2.757576    12       428 0000000100000000000000000000000
## 3    H03 2.363636    12       132 1101001001011001000001000100011
## 4    H04 3.309942    19       111 111101111101101100010101N101011
## 5    H05 2.666667     9        48 1101001111011001100101000110011
## 6    H06 1.666667     4        25 100000100N0010010000000001000N0
## 7    H07 2.071429     8        23 1101001001111001000111100100N11
## 8    H08 1.600000     5        22 1101001111011001000101000100011
## 9    H09 1.333333     3         8 N0N1001001111001011111100100111
## 10   H10 0.000000     1         1 0000000100000000000101011000000
## 11   H11 0.000000     1         1 0001011111011001000101010101011
## 12   H12 0.000000     1         1 1101000101011001000000000000011
##                            freqVar      freq
## 1  1101001001111001011111100100111 0.9517491
## 2  0000000100000000000000000000000 0.7640187
## 3  1101001001011001000001000100011 0.6893939
## 4  1111011111011011000101011101011 0.3783784
## 5  1101001111011001100101000110011 0.5208333
## 6  1000001001001001000000000100010 0.5200000
## 7  1101001001111001000111100100111 0.3913043
## 8  1101001111011001000101000100011 0.6818182
## 9  0001001001111001011111100100111 0.3750000
## 10 0000000100000000000101011000000 1.0000000
## 11 0001011111011001000101010101011 1.0000000
## 12 1101000101011001000000000000011 1.0000000

You can also visualize the grouping of the found variants using the plotGrouping() method. You may need to adjust the vertex size:

par(mar = c(0,0,0,2))
myBlock$plotGrouping(vertex.size = 8)

Notice that haplogroups that represent a single variant are not plotted in the clustering graph plot.

2.2 Retrieving alleles

Once groups have been established (or before), you may want to obtain the specific alleles for each individuals. You can obtain these alleles in several forms:

  • allele: binary code where 0 means reference allele and 1 means alternative allele. For example: 00010, 00110, 11110 …
  • variant: alphanumeric variant codes, where each unique variant is given a specific name. For example: V010, V011, V123…
  • haplotype: alphanumeric haplotype codes, based on a pre-calculated haplotype grouping using haplogroup(). For example: H01, H02, H03… Haplotype numbers are based on frequency (H01 is always the most frequent).
  • sequence: DNA sequences based on reference and alternative alleles, as well as on a reference sequence. Can only be used after using the method buildSequences(). For example: “AACCTTTG”, “AACCATTG”, “AACCATTC”…
#Notice that each individual has two entries, corresponding to its two alleles

#These are alleles in binary code
myBlock$alleles("al")[1:8]
##                           Ind0001                           Ind0001 
## "1101001001111001011111100100011" "1101001001111001011111100100111" 
##                           Ind0002                           Ind0002 
## "0000000100000000000000000000000" "0000000100000000000000000000000" 
##                           Ind0003                           Ind0003 
## "1101001001111001011111100100111" "1101001001111001011111100100111" 
##                           Ind0004                           Ind0004 
## "1101001001111001011111100100111" "1101001001111001011111100100111"
#In variant codes
myBlock$alleles("var")[1:20]
## Ind0001 Ind0001 Ind0002 Ind0002 Ind0003 Ind0003 Ind0004 Ind0004 Ind0005 Ind0005 
##   "V53"   "V56"   "V03"   "V03"   "V56"   "V56"   "V56"   "V56"   "V53"   "V56" 
## Ind0006 Ind0006 Ind0007 Ind0007 Ind0008 Ind0008 Ind0009 Ind0009 Ind0010 Ind0010 
##   "V56"   "V56"   "V56"   "V56"   "V56"   "V56"   "V56"   "V56"   "V56"   "V56"
#In haplotype code
myBlock$alleles("hap")[1:20]
## Ind0001 Ind0001 Ind0002 Ind0002 Ind0003 Ind0003 Ind0004 Ind0004 Ind0005 Ind0005 
##   "H01"   "H01"   "H02"   "H02"   "H01"   "H01"   "H01"   "H01"   "H01"   "H01" 
## Ind0006 Ind0006 Ind0007 Ind0007 Ind0008 Ind0008 Ind0009 Ind0009 Ind0010 Ind0010 
##   "H01"   "H01"   "H01"   "H01"   "H01"   "H01"   "H01"   "H01"   "H01"   "H01"
#In sequence code we cannot get it yet
myBlock$alleles("seq")[1:10]
## Warning in myBlock$alleles("seq"): Sequences cannot be computed. Please run the 'buildSequences' method.
## Returning variant alleles.
## Ind0001 Ind0001 Ind0002 Ind0002 Ind0003 Ind0003 Ind0004 Ind0004 Ind0005 Ind0005 
##   "V53"   "V56"   "V03"   "V03"   "V56"   "V56"   "V56"   "V56"   "V53"   "V56"

To see the “sequence” alleles directly we must provide a template sequence. In this case we will just use a toy example full of “_” so that you can easily see what is happening under the hood.

start <- min(b$POS)
end <- max(b$POS)

seq <- paste(rep("_", end - start + 10), collapse = "")
myBlock$buildSequences(sequence = seq, start = start - 9)
## New sequences can be found in the 'codes' data.frame. 
## Use alleles('sequence') to obtain allele sequences for each individual

Now we can obtain the sequence allele, you can see that simply, in each position the reference allele is changed by the alternative allele.

res <- myBlock$alleles("seq")[1]
#I split it because it is too long
res <- sapply(seq(1,to = nchar(res), by = 100),function(i) substr(res,i, i + 99))
cat(unname(res),sep = "\n")
## _________C_A___C____________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ________________________________________________C___________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ________________C_________________________________________G_________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## _____________________________________________________________G_______________________C______________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ______________T_T___________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## __________________A_____A___________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ___________________________________________________________________________________________G________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## _________________________________________________________C__________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ______________________________G_____________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________C_____G_____________T___________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## _________________________________________________________________________________________________A__
## _A__________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## _______________________________________________________________A____________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## _________________________________________________________T__TG______________________________________
## _______________________________________________________________________G__________________________T_
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## _______________________________________________________________A____________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________G_______________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## _____________________________________________________________________________________________C______
## _____________G______________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ____________________________________________________________________________________________________
## ________________________________T

2.3 MST plot

There is something called an MST plot, or maximum spanning tree, which is a type of graph often used in haplotype comparisons. I worked a bit on the visualization, although I’m not very happy with it (thus why it is not a part of the R6 hapBlock class). You can use it as shown below, if you give it some alleles you care about (I often remove the low-frequency alleles).

Several of the parameters must be adjusted manually (especially vertex.size and label.size). Play with it if you’re not satisfied with the result (or find a different visualization).

stats <- myBlock$haplostats()

alleles <- stats$freqVar[stats$groupFreq > 1]
names(alleles) <- stats$group[stats$groupFreq > 1]
freqs <- stats$freq[ stats$groupFreq > 1]
haploMST(alleles = alleles, allele.freq = freqs,vertex.size = 30, label.size = 0.8)

3. More data

You can retrieve several pieces of information from the hapBlock object which may be useful to you if you are doing analysis.

#This is the distance matrix between observed variants
myBlock$D[1:10,1:10]
##     V01 V02 V03 V04 V05 V06 V07 V08 V09 V10
## V01   0   1   1   2   2   2   2   3   5   3
## V02   1   0   2   3   1   3   3   4   6   4
## V03   1   2   0   1   1   1   1   2   4   2
## V04   2   3   1   0   2   2   2   3   5   3
## V05   2   1   1   2   0   2   2   3   5   3
## V06   2   3   1   2   2   0   2   1   5   1
## V07   2   3   1   2   2   2   0   1   5   3
## V08   3   4   2   3   3   1   1   0   6   2
## V09   5   6   4   5   5   5   5   6   0   6
## V10   3   4   2   3   3   1   3   2   6   0
#The list of ploidy-repeated alleles
myBlock$inds[1:10]
##  [1] "Ind0001" "Ind0001" "Ind0002" "Ind0002" "Ind0003" "Ind0003" "Ind0004"
##  [8] "Ind0004" "Ind0005" "Ind0005"
#The codes dataframe which relates alleles to vairants (and other information)
#Here I truncate the sequence column, otherwise it's too long
codes <- myBlock$codes[1:10,]
codes$sequence <- substr(codes$sequence,1,15)
knitr::kable(codes)
allele variant group haplo variantFreq sequence
0000000000000000000000000000000 V01 4 H02 65 _________T_G___
0000000000000000000000000000010 V02 4 H02 2 _________T_G___
0000000100000000000000000000000 V03 4 H02 327 _________T_G___
0000000100000000000000000000001 V04 4 H02 3 _________T_G___
0000000100000000000000000000010 V05 4 H02 13 _________T_G___
0000000100000000000000000100000 V06 4 H02 5 _________T_G___
0000000100000000000000100000000 V07 4 H02 4 _________T_G___
0000000100000000000000100100000 V08 4 H02 1 _________T_G___
0000000100000000000101011000000 V09 10 H10 1 _________T_G___
0000000100001000000000000100000 V10 4 H02 1 _________T_G___
#The phase data if you need it
knitr::kable(myBlock$phase[1:10,1:10])
V5 V6 V7 V8 V9 V10 V11 V12 V13 V14
42 1 1 0 0 1 1 1 1 1 1
43 1 1 0 0 1 1 1 1 1 1
44 0 0 0 0 0 0 0 0 0 0
45 1 1 0 0 1 1 1 1 1 1
46 0 0 0 0 0 0 0 0 0 0
47 0 0 0 0 0 0 0 0 0 0
48 1 1 0 0 1 1 1 1 1 1
49 0 0 1 1 0 0 0 0 0 0
50 0 0 0 0 0 0 0 0 0 0
51 1 1 0 0 1 1 1 1 1 1

4. Future developments?

Although I do not plan to develop this further at the moment, I leave these ideas here in case someone (or my future self) wants to develop additional things.

  1. It may be better to group alleles not based on similarity, but based on a comparison of the low-frequency haplotypes to the high-frequency haplotypes, and using mismatch probabilities (a Poisson test) to determine if the variant can be confidently grouped into haplotypes or not.
  2. Calculation of blocks using window-based sizes should not simply take position \(x\) and get all markers until \(x+window\), but rather, starting from position \(x\), find the first marker and then get all markers within the specified window size (thus avoiding a large gap at the beggining of the window).
  3. Block calculation may be enhanced by using genome-location features. How to approach this?
  4. A wrapper to parallelize haplogroup computation across many blocks. The pieces are there, it’s just a matter of intelligently creating a %dopar% loop and returning the relevant results (not the whole hapBlock object.)