SuperMemo had a fertile impact on the research of modeling long-term memory. One of the most interesting fruits of that research is the two component model of long-term memory. Over the last two decades, it has been hinted on many occasions that the model may provide a sound theoretical basis for a better and more universal approach to spaced repetition. Algorithm SM-17 (Alg-SM17 or just SM17 in short) is an attempt to convert the theory into a practical application. Due to its universal nature, the algorithm should, in the future, find its way to all SuperMemo products.
Important! At the moment of writing (April 2016), SuperMemo 17 does not use incremental adjustments to optimization matrices in Algorithm SM-17. This is why you should execute Tools : Memory : 4D Graphs : Stability : Compute from time to time to adjust the algorithm to newly available data. In the future, the adjustments will be made at each repetition.
The two component model of long-term memory underlies Algorithm SM-17. The model asserts that two variables are sufficient to describe the status of unitary memory in a healthy human brain:
In the literature on memory, an ill-defined term of memory strength will often stand for either stability (S) or retrievability (R). This results in a great deal of confusion that slows down the progress of memory research. In 1995 (and earlier), we have proven theoretically that the two variables: R and S are sufficient to describe the status of memory in spaced repetition.
Algorithm SM-17 is the ultimate practical proof for the correctness of the two component model.
All SuperMemo algorithms before the year 2015 took one essential assumption that limited their universal applicability in learning: the review of the learning material should occur at optimum timing determined by SuperMemo. Algorithms employed in SuperMemo 11 through 16 included weak heuristics that would allow of repetition at any time without a noticeable detriment to learning. In particular, those heuristics would help overcome the impact of the spacing effect where frequent reviews lead to less impact or even weakening of memories. New research indicates that the applicability of those heuristics was limited to the less extreme departures from the optimum schedule. The new algorithm, in theory, should make it possible to optimize review for any repetition history, and any repetition timing. It should also make it possible to simulate the status of memory produced with different variants of spaced repetition algorithms (e.g. those used by SuperMemo in remote past).
Here is the list of major weaknesses of Algorithm SM-15 as revealed by comparisons with Algorithm SM-17:
Due to the fact that real-life application of SuperMemo require tackling learning material of varying difficulty, the third variable involved in the model is item difficulty (D). Some of the implication of item difficulty have also be discussed in the above article. In particular, the impact of composite memories with subcomponents of different memory stability (S). The three component model of memory is referred to as the DSR model.
For the purpose of the new algorithm we have defined the three components of memory as follows:
See also: Modeling memory
The three variables of memory were introduced gradually into SuperMemo algorithms:
To assess how well computer models fit reality, SuperMemo uses its own deviation measures. In particular:
SuperMemo converts grades to average retrievability correlated with those grades for a given user (every user has his own system of grading and different levels of average retrievability will correspond with individual grades). The grade-derived deviation (gDev) is the difference between grade-derived retrievability (Rg) and theoretical retrievability (Rt) derived from stability estimate (Sw). The fail-success deviation (fDev) is 1-R for grades>2 and R for grades<3.
In estimating the deviation of grades from predicted grades, SuperMemo uses the formula:
- Dev - prediction deviation
- BGW - binary vs. grade-based weight (currently chosen to be 70%)
- fDev - binary success-failure deviation
- gDev - grade derived retrievability deviation
The deviation for an item repetition history is computed as the square root of the average of squared deviations (Dev) for each repetition.
In least squares comparison, for items or for [[Glossary:Collection|collections], the last squares deviation is:
- LSDev - least squares deviation for all grade predictions (Dev) in a subset of repetition histories
- count - number of repetitions in the set
- sqrt - square root
The assumed weighing (BGW) is a heuristic based on experiments and the fact that grade-retrievability correlation makes for a weak prediction of grades based on R. The said deviation formula is subject to change.
The deviation used in estimating difficulty is weighed further to increase the impact of more recent repetitions on the current difficulty estimate.
Currently, for large collections, the algorithm can generate stability increase matrix which produces a deviation of 18%. Considering the stochastic nature of forgetting, this number would be close to the lowest possible theoretical value if only fail-success deviation (fDev) was used. However, grade-derived deviation (gDev) reduces the span of the deviation (by making R predictions more precise).
When using predicted grade deviations in assessing the strength of particular algorithmic solutions, we need to keep in mind that there is a theoretical limit on the minimum value of the deviation. This means that we will never approach the ideal 0% deviation.
For the formulas used above, the limit will depend on weights (e.g. BGW) and particular user's grade-retrievability averages.
If only binary success-failure deviation (fDev) was used, and all items were scheduled with a perfect interval at the requested forgetting index of 10%, the minimum deviation would correspond with the retrievability prediction of 0.9. That deviation would then be:
In addition to the least square deviation, SuperMemo 17 will provide a few other metrics for assessing the effectiveness of the algorithm. If you want to find your metric in SuperMemo options, please join the discussion: Spaced repetition algorithm metric
SuperMemo uses matrix representations to express various functions, and hill-climbing approximations to find best theoretical symbolic matches for those functions for use in various contexts.
To guide the hill-climbing algorithms, a separate deviation measure is used as an optimization criterion. For each individual matrix entry:
The deviation for the entire matrix is the square root of the average of individual deviations (Dev). SuperMemo expresses that deviation in percent.
SuperMemo uses hill-climbing algorithms to match hypothetical functions that might model data categories of interest for the algorithm. Those functions can later be used to speed up calculations of Algorithm SM-17.Currently, three best fit approximation procedures are used to verify theoretical formulas used in the algorithm. Those approximations determine the following functions:
Those approximation procedures may also be used in research that helps validate the algorithm. For example, to prove the exponential nature of forgetting, take additive and multiplicative variants of approximation formulas for Recall[D,S,R] with separate set of parameters defining power and exponential components along the R dimension. Outcome: despite the fact that heterogeneous forgetting curves may appear to follow the power law, Algorithm SM-17 is filtering items for difficulty well enough to eliminate the power component entirely from the equation. Note that the forgetting curve describing the first review does not form a part of the Recall matrix. This is because a heterogeneous stream of new items might pollute the matrix and initiate the "ripple effect" of distorted estimates for higher stability levels (as in older SuperMemos). That first review forgetting curve follows the power law pretty well and, unlike in older SuperMemos, power regression is used to produce the default startup stability for new items.
There are several hypothetical correlations between approximation parameters and characteristics of user's learning process and knowledge quality. For example:
Algorithm SM-17 computes new intervals using the stability increase function. This function tells SuperMemo how much intervals should increase for all possible combinations of memory status variables: Difficulty, Stability and Retrievability. SuperMemo determines item difficulty by finding which difficulty level provides the best fit to grades scored in repetition history. The first interval is computed in a way that is similar to earlier SuperMemos. The stability increase function is computed using data collected during repetitions.
Central to Algorithm SM-17 is the stability increase function represented by the stability increase matrix (abbreviated to SInc). The simple formula at the core of SuperMemo is:
This means that optimum intervals in SuperMemo increase by SInc in each repetition (assuming the requested forgetting index is 10%). D, S, R stand for 3 memory variables: difficulty, stability and retrievability respectively.
Figure: 3D graph of SInc matrix based on 60,167 repetition cases for items with difficulty=0.5. The increase is dramatic at low stabilities (17-fold increase is unheard of in earlier SuperMemos), and peaks at retrievability of 0.85. In some cases, the SInc drops below 1.0, which corresponds with a drop in stability (i.e. memory lability). Those low values of SInc do not depend on retrievability. Those huge variations in SInc are the main reason why SuperMemo 17 beats SuperMemo 16 in learning metrics by a wide margin..
The relevant formulas used to determine first intervals in SuperMemo are:
Int=SI (for the first repetition)
Int=PLS (for the first review after a failing grade)
The SInc matrix is similar to the optimum factor matrix from earlier SuperMemos (OF matrix in SuperMemo 5.0 through SuperMemo 16.0), but it takes different arguments. In the same way as OF-matrix determined the increase of intervals, SInc is used in the exactly same way except it takes three arguments: Difficulty, Stability and Retrievabilty. OF-matrix was indexed by A-Factors (reflecting Difficulty) and repetition category (roughly related to Stability). However, OF matrix did not include the Retrievability dimension as repetitions were supposed to be executed at optimum intervals determined by the requested forgetting index.
Item difficulty in SuperMemo Algorithm SM-17 is a number from 0 (easiest) to 1 (hardest). Items are linearly mapped in the 0..1 difficulty range by using their maximum possible memory stability increase. Maximum increase in stability occurs in the first review when the length of the interval corresponds with retrievability that produces the greatest increase in the strength of memory. Items with theoretically highest possible stability increase are assigned difficulty of 0. These are the easiest items in the process. Those items are the target of all forms of learning as they are a perfect reflection of the minimum information principle. As the maximum stability increase for very difficult items may approach 1.0 (or even drop below that value, i.e. produce memory liability!), all items at certain lowest acceptable maximum stability increase are classified as having difficulty 1.0. The exact cut off number is the parameter of the algorithm that is subject to change and further optimization (see Value N in Computing difficulty algorithm below)
As of Algorithm SM-8, SuperMemo used the concept of A-Factor to define item difficulty. A-Factor was defined as the increase in optimum interval during the first repetition (RepNo=2 in SuperMemo). In most situations, A-Factor would fall in the 1.0-10.0 range. SuperMemo employed the range from 1.2 to 6.9 where 1.2 limit was set for the sake of filtering out the so-called "leech material", i.e. material whose difficulty would disqualify it from efficient use of spaced repetition. Alg-SM17 might have used that A-Factor definition or strive at more universal concept of maximum increase in memory stability. Those two approaches do not need to differ much as newer data indicates that the maximum stability increase occurs at retrievability (R) that are close to the default levels used in SuperMemo. Alg-SM17 normalizes difficulty in the 0 (easiest) to 1 (hardest) range to roughly correspond with A-Factors 1.1 to 10.0. The assumption is that items that are harder than 1.0 should be eliminated from the learning process, while items that are easier than 10.00 are either extremely rare or can safely be treated along the rest of 10.0 items without a detriment to total workload.
Alg-SM17 requires a full history of repetitions to compute item difficulty (see below for a simplified procedure that goes around that requirement for developers who might wish to develop light SuperMemos that might look at smaller data sets for mobile devices, etc.).
For the full range of possible item difficulties, Alg-SM17 computes the deviations of predicted grades (based on a given assumed difficulty) from actual grades as recorded in repetition history of a given item. The algorithm chooses the difficulty that produces the least deviation.
An important parameter in computing grade deviation in difficulty estimates is the deviation exponent. Least squares approach is prevalent in statistics, however, different values of the deviation exponent yield slightly different results. The ultimate goal is to set all parameters of optimization so that to produce a closer adherence of predicted outcomes to theoretical models.
Note that weighing quantiles by deviations tends to amass most difficulties around 0.5, which shows that this is not a valid approach.
Computing the deviation:
Some important considerations in difficulty calculations:
At the moment, the most likely solution to be adopted in SuperMemo is to use BestSInc() function that transitions from SInc() towards SInc with the gradual inflow of data.
Theoretical SInc() function is the root where the baseline mapping of maximum stability increase to difficulty is established:
- SInc - stability increase
- Difficulty - difficulty in 0..1 range
- f(R,S) - component of SInc that depends on R and S
This baseline formula currently determines the cutoff value for hardest possible items at SInc that is N times less than the maximum value possible for easiest items (with N=(6*f(R,S)+1)/(f(R,S)+1)).
We have also developed a simplified solution for computing item difficulties. This might be useful for developers who look for simplicity, thinner and faster code. Simplified procedure is similar to the approach used in older SuperMemos, esp. SuperMemo 8 through SuperMemo 16.
Item difficulty may be made history-less by relying on the fact that difficulty of an item keeps evolving depending on the context of knowledge and through interference. For that reason, the performance in the most recent repetition counts more than performance in prior repetitions.
Simplified difficulty can then be obtained using the formula:
- sDF - simplified difficulty function
- D[n] - difficulty estimate after the n-th repetition
- Grade - grade scored in the n-th repetition
- Rr - expected recall at the time of the n-th repetition (based on the Recall[D,S,R] matrix)
Unlike best-fit difficulty, simplified difficulty requires no expensive hill-climbing optimization that may slow down SuperMemo (e.g. when computing learning data for long repetition histories). The accuracy of the simplified difficulty estimation is bound to be less and is currently being estimated.
The first interval Int is determined using the first forgetting curve. Unlike the entries of the Recall matrix, the first forgetting curve data is collected using 35 quantiles determined by the interval distribution on a large representative sample collected from an incremental reading collection. Those quantiles span nearly a decade (older SuperMemos collected data for only the first 20 days after memorizing the item). The startup interval (the first optimum interval) is determined by power regression on that forgetting curve data. Power law forgetting is the result of the superposition of exponential forgetting curves of heterogeneous indeterminate difficulty material entering the learning process. The first interval is corrected for the requested forgetting index (rFI) and may be subject to random dispersion that is a standard fixture of SuperMemo algorithms since its inception.
Note that the startup interval (with rFI=10%) is not the same as the startup stability. The interval needs to be determined before the first review (i.e. be a resultant of prior repetitions of other items). The startup stability is determined from repetition histories and may differ after each successive repetition. In short, it represents the best fit to data.
Figure: The first forgetting curve (first repetition for items with no lapses). Unlike it was the case in earlier SuperMemos, where all forgetting curves were exponential, the first forgetting curve in SuperMemo 17 is approximated using power regression. This provides for a more accurate mapping due to the heterogeneity of the learning material introduced in the learning process that results in superposition of exponential forgetting with different decay constants. The use of power regression explains why the first interval might be slightly shorter in Algorithm SM-17. On a semi-log graph, the power regression curve is logarithmic (in yellow), and appearing almost straight. The curve shows that in the presented collection recall drops merely to 58% in four years, which can be explained by a high reuse of the memorized knowledge in real life. In earlier SuperMemos, recall data would only be collected in the span of 20 days, and negatively exponential forgetting curve would make for far lower retrievability predictions. The first optimum interval for the forgetting index of 10% is 3.76 days. The forgetting curve can be described with the formula R=0.987*power(interval,-0.07), where 0.987 is the recall on Day 1, while -0.07 is the decay constant. This is case, the formula yields 89.5% recall after 4 days, which is then used as the first rounded optimum interval. Almost 77,000 repetition cases were used to plot the presented graph. Steeper drop in recall will occur in collections with a higher mix of difficult items, in poorly formulated collections, or in new users with lesser mnemonic skills.
Startup stability S is the stability accomplished with the first single review. Unlike stability computed at later repetitions, this stability cannot be derived from the SInc matrix. Instead, it is computed in a way similar to the way difficulty is computed. Within a range of possible values, S is used to compute memory stabilities after successive repetitions. The startup stability S is taken as the value that produces the least predicted grade deviation from grades in repetition history.
Post-lapse stability (PLS) is the stability resulting from a review with a failing grade. Unlike stability computed after successful repetitions, this stability cannot be derived from the SInc matrix.
In the ideal case, for simple memories, forgetting results in a reset of stability back to near-zero. In theory, only difficult items made of composite memories may show a substantial decrease in the costs of re-learning, however, even that does not show in data (perhaps due to imperfect difficulty estimations or due to SuperMemo's natural tendency to help users eliminate hard-to-learn material in the learning process).
It has been shown long ago that the length of the first post-lapse optimum interval is best correlated with the number of memory lapses afforded an item. Even then, post-lapse interval usually oscillates in the 1-4 days range for forgetting index of 10%, and the correlation is not very useful in adding to the efficiency of learning in optimizing the length of that interval. Some competitive spaced repetition software, as well as SuperMemo in its first years, experimented with re-learning hypotheses based on ancient wisdom of learning psychology, e.g. by halving intervals after a memory lapse. Current data shows clearly that this is a harmful and time-wasting approach.
In addition to memory lapses, there is also a strong correlation between the length of the post-lapse interval and retrievability (R). This should be obvious considering that substantial delay in review will result in "undeserved" lapse on items that might be then quite easy to re-learn. Algorithm SM-17 takes retrievability dimension into account.
In the light of the above, SuperMemo uses a separate matrix for post-lapse stabilities: PLS with Lapse and Retrievability dimensions. The first interval after scoring a failing grade is then determined as follows:
That neat theoretical approach is made a bit more complex when we consider that forgetting may not be perfectly exponential if items are difficult or with mixed difficulty. In addition, forgetting curves in SuperMemo can be marred by user strategies in all situations where grades are not an ideal reflection of recall/retrievability.
Theoretical two-component model may be undermined by imperfect difficulty filtering, user strategies, implementation shortcomings (e.g. poor choice of parameters), and imperfect modelling of reality. For that reason SuperMemo uses three different measures of retrievability:
Recall matrix: Algorithm SM-17 accounts for possible non-exponential decay of memory, as well as for D and S estimate errors, by using the Recall[D,S,R] matrix. Recall matrix takes the same arguments as the SInc matrix. Recall[D,S,R] is the average recall for items of difficulty D, at stability level S, and at retrievability R. Recall matrix keeps the forgetting curve record with the time axis expressed as retrievability. The main functions of Recall matrix are:
The actual recall level statistic is used to amend the theoretical retrievability level derived from stability and time.
Figure: The Recall matrix graph shows that the actual recall differs from predicted retrievability. For higher stabilities and difficulties, it is harder to reach the desired recall level.
Recall-Grade correlation: For individual repetitions, retrievability can be checked against the average recall level registered for individual grades at repetitions. Success and failure of memory, as defined in the grade scale are also used to measure binary recall (0 - failure, 1 - success). Algorithm SM-17 uses both the recall-grade correlation and binary recall to add to the set of information needed to estimate memory stability. Grades carry more information, while binary recall is a better reflection of forgetting. Those two must be weighed up when looking at grade-derived estimate of R.
All the three measures of retrievability can be weighed up to provide for the best estimate of R that is used in further estimates of stability and stability increase. Weights used in the estimate will change with inflow of new information (e.g. repetition cases registered in the Recall matrix). The ultimate goal of weight parameters is to minimize the deviation between the model and the grades scored in learning.
A careful distinction must be made between all measures of recall. Theoretical retrievability (R) concept is essential for indexing optimization matrices as R is the ultimate target when converging onto the theoretical model of memory. At the other extreme, stability estimates are item-specific and need to take into account past recall statistics and actual grades scored by a particular item in the learning process.
The interval and the resulting grade can be used to draw conclusions about the stability estimate. If the interval is greater than stability, good grades indicate stability underestimate. Conversely, short intervals and poor grades may indicate an overestimate. This type of reasoning underlay the first attempt to compute the stability increase function in this 2005 article. However, due to being applicable to a subset of repetition scenarios, this information is insufficient to estimate stability for all items and all repetitions. In addition, conclusions drawn in 2005 differ substantially from the new data based on far more accurate recall-verified retrievability estimate (as described above).
The second, and more reliable source of information about stability is the value derived from retrievability estimate modified along prior recall measurements. Stability can be estimated from retrievability level and the stability estimate value before the repetition. This is the most reliable estimate.
Those three sources of information are weighed up for reliability (e.g. on the number of cases in the Recall matrix used to modify R, or the number of cases in the SInc matrix used to generate pre-repetition stability). The actual parameters used in weighing up information have been computed using optimization methods targeted at minimizing predicted grade deviation in larger collections.
Stability increase function is used to compute intervals in Algorithm SM-17.
The stability increase function is expressed in Algorithm SM-17 using three different methods:
Due to its monotonicity, theoretical formula is used in all hill-climbing procedures which are employed richly in the algorithm. It is also used exclusively in new collections where no learning data is available. The SInc matrix is the truest expression of stability changes in user's particular collection. The SInc matrix has the greatest research value and is used in all contexts where true properties of memory need to be taken into account. The matrix is the key to Alg-SM17 adaptability. Finally, the ultimate compromise BestSInc() function is a statement on what is best known about the memory properties and learning strategies of the user and his or her particular collection at any given moment in time.
The choice between the tree expressions of SInc, and the choice of compromise parameters in BestSInc() are essential for the ultimate efficiency of the algorithm. Those choices and parameters are still in the process of being optimized.
The SInc matrix can be computed using three different methods:
The recommended approach for users of older SuperMemo is to compute a new SInc matrix as the first step and then proceed with incremental changes during a course of standard learning with Algorithm SM-17.
Intervals in Algorithm SM-17 are computed using stability increase function SInc() (the best compromise variant is used as described above as BestSInc).
The following formula is used:
- Int[n] - interval after the n-th repetition
- S[n] - memory stability after n-th repetition
- R[n] - memory retrievability at the time of n-th repetition
- SInc[D,S,R] - stability increase for a given level of difficulty (D), stability (S) and retrievability (R)
- rFI - requested forgetting index expressed in percent
This formula is analogous to similar formulas in all versions of SuperMemo beginning with Algorithm SM-0. The role of the stability increase SInc was played earlier by E-Factors (based on item difficulty) and O-Factors (derived from forgetting curves).
The following procedure can be used to determine the status of memory (DSR status) at any moment of time for a given history of repetitions. Once the DSR status is known, we can determine the next interval using criteria of choice, e.g. forgetting index, maximization of stability, long-term workload minimization, etc.
Note that replacing Int:=Int[n-1]*SInc[D,Sw,Rw] with Int:=Sw*SInc[D,Sw,Rw] frees the user from ever worrying about manual intervention in the length of intervals (e.g. before an exam, or when encountering knowledge in real life, etc.). This is one of the key advantages of Algorithm SM-17 over all prior versions of the algorithm.
Figure: R-Metric graph demonstrates superiority of Algorithm SM-17 over the old Algorithm SM-15 for the presented collection used in the testing period of 12 months before the release of SuperMemo 17. It was plotted using 10,699 data points (i.e. repetition cases with data from both algorithms), and smoothed up to show the trends. One spot beneath the line of 0 at the vertical axis (
R-metric<0) corresponds with a moment in time when the previous version of the algorithm appeared superior (in this case, due to a small sample of repetitions). Some positive and negative trends correspond with changes in the algorithm as data were collected in the new algorithm's testing period. A gradual increase in the metric in the months Feb-May 2016, might be a statistical aberration, or it might be the result of new interval values and a bigger R-metric for intervals departing from the optimum used in earlier SuperMemos. The latter might indicate that the benefits of Algorithm SM-17 might gradually increase over time.
Although the presented algorithm may seem complex, you should find it easier to follow its steps once you understand the evolution of individual concepts such as E-Factor, matrix of optimum intervals, optimum factors, forgetting curves, mid-interval repetition, stability, retrievability, etc.