Mirror of https://github.com/Zenithsiz/ist-pic-report.git, synced 2026-02-03 14:10:07 +00:00
Started work on first few chapters.
parent 0517cb9ceb
commit cdcef1bf1b
.vscode/settings.json (vendored): 8 changes
@@ -1,3 +1,9 @@
 {
-    "cSpell.words": ["biblatex", "hbadness", "printbibliography"]
+    "cSpell.words": [
+        "biblatex",
+        "ftmemsim",
+        "hbadness",
+        "optane",
+        "printbibliography"
+    ]
 }

@@ -15,3 +15,12 @@
     location = {Virtual Event, Germany},
     series = {SOSP '21}
 }
+
+@misc{izraelevitz2019basic,
+    title = {Basic Performance Measurements of the Intel Optane DC Persistent Memory Module},
+    author = {Joseph Izraelevitz and Jian Yang and Lu Zhang and Juno Kim and Xiao Liu and Amirsaman Memaripour and Yun Joon Soh and Zixuan Wang and Yi Xu and Subramanya R. Dulloor and Jishen Zhao and Steven Swanson},
+    year = {2019},
+    eprint = {1903.05714},
+    archiveprefix = {arXiv},
+    primaryclass = {cs.DC}
+}

report.tex: 109 changes
@@ -21,29 +21,124 @@

% Abstract
\begin{abstract}
Citation \cite{10.1145/3477132.3483550}
\lipsum[1]
\end{abstract}

% Motivation
\section{Motivation}
\lipsum[1-2]

In recent years, we have witnessed the introduction of programs
that require extremely large runtime datasets. Despite advances
in main memory capacity, some programs genuinely require
datasets larger than main memory.

Traditionally, the memory system on most machines consists of
main memory, situated on DRAM, alongside swap space, typically
located on non-volatile storage shared with the user's
filesystem partitions.

Once main memory is close to full, the kernel moves pages from
it to swap. When a program later attempts to access such a page,
a page fault occurs and the kernel transparently moves the page
from swap back to RAM and resumes the program.
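
As a brief illustration (this snippet is not part of the
report's tooling), page residency can be observed from userspace
on Linux with \texttt{mincore}:

\begin{verbatim}
/* Illustrative only: check whether a page of anonymous memory
 * is currently resident in RAM, via mincore(2) on Linux. */
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    char *buf = mmap(NULL, page, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return 1;
    buf[0] = 1; /* touch the page so it becomes resident */

    unsigned char vec;
    if (mincore(buf, page, &vec) == 0)
        printf("resident: %d\n", vec & 1); /* bit 0: in RAM */
    return 0;
}
\end{verbatim}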

This handling of memory can incur large costs. This is largely
because swap resides on persistent storage, typically a hard
drive or solid-state drive, which has much lower throughput and
higher latency than DRAM. The underlying device may also host
partitions other than the swap partition, so swap traffic
competes with regular filesystem reads and writes.

Another source of cost is the time the kernel takes to resolve
a page fault whenever a program accesses a swapped-out page.

When a dataset grows beyond main memory capacity, a program will
encounter thrashing, where pages constantly bounce between main
memory and swap.

With the advent of tiered-memory systems, a different and much
more scalable paradigm emerged.

In tiered-memory systems, we typically find two or more
memories, each with different tradeoffs between capacity and
throughput or latency. Each of these can be addressed by the CPU
individually.

By assigning each page to a different memory based on its usage
patterns, it is possible to avoid paying the cost of page
faults, since the program can access the memory directly.
Although this memory is potentially slower than main memory, it
is still much faster than swap and does not require copying a
whole page to main memory on access.

These systems have an additional problem to solve, however: they
need to decide where each page should reside. This is typically
solved by running a classifier in the background that receives
periodic samples of the access patterns of each page. The
classifier can then choose to migrate certain pages to other
memories.
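
As a minimal sketch of this idea (the names, threshold, and
migration primitives here are assumptions for illustration, not
any system's real API), such a background classifier could look
as follows:

\begin{verbatim}
/* Hypothetical sketch of a background page classifier. */
#include <stddef.h>
#include <stdint.h>

#define HOT_THRESHOLD 8 /* accesses per period (assumed) */

struct page_stats {
    uint64_t addr;     /* page base address */
    uint64_t accesses; /* accesses seen this period */
    int      in_fast;  /* currently in the fast tier? */
};

/* Assumed migration primitives. */
void migrate_to_fast(uint64_t addr);
void migrate_to_slow(uint64_t addr);

/* Called once per sampling period with fresh access counts. */
void classify(struct page_stats *pages, size_t n) {
    for (size_t i = 0; i < n; i++) {
        int hot = pages[i].accesses >= HOT_THRESHOLD;
        if (hot && !pages[i].in_fast) {
            migrate_to_fast(pages[i].addr);
            pages[i].in_fast = 1;
        } else if (!hot && pages[i].in_fast) {
            migrate_to_slow(pages[i].addr);
            pages[i].in_fast = 0;
        }
        pages[i].accesses = 0; /* reset for next period */
    }
}
\end{verbatim}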

These migrations are similar to swap-ins and swap-outs, but they
can be much faster, since they do not involve copying to or from
a hard drive or solid-state drive. In some configurations, it is
also possible to perform these page migrations in the
background, via a DMA command to another processing unit in the
system.

% Background
\section{Background}
\lipsum[1-2]

One such implementation of a tiered-memory system is HeMem
\cite{10.1145/3477132.3483550}. It is designed to work with two
memories: a main memory based on DRAM, as well as a secondary
memory based on Intel Optane.

This gives it a fast but small DRAM for frequently accessed
pages, and a slightly slower but larger Optane for less
frequently accessed pages.

Its classifier works by tagging pages as cold or hot, depending
on their access patterns.

A cold page is one that is not accessed very often. This usually
implies it can remain in, or be moved to, Optane memory.

On the other hand, hot pages are ones that are accessed very
often. This usually implies they should remain in, or be moved
to, DRAM.

Pages are allocated in main memory as long as space is
available; otherwise, they are allocated in secondary memory.

When a cold page that resides in secondary memory becomes hot,
it is moved to main memory. This may require migrating colder
pages from main memory to secondary memory beforehand. The
coldest pages are chosen for this migration.

When a hot page that resides in main memory becomes cold, it is
moved to secondary memory.
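
As a rough sketch of this policy (illustrative names, not
HeMem's actual code), promotion with eviction could look like
this:

\begin{verbatim}
/* Hypothetical sketch of HeMem-style promotion: before moving a
 * hot page into a full DRAM tier, demote the coldest DRAM pages
 * to Optane to make room. All names are illustrative. */
#include <stddef.h>

#define PAGE_SIZE 4096

struct page; /* opaque page descriptor (assumed) */

/* Assumed tier-management primitives. */
size_t dram_free_bytes(void);
struct page *coldest_dram_page(void);
void move_to_dram(struct page *p);
void move_to_optane(struct page *p);

void promote(struct page *hot) {
    /* Demote the coldest pages until the hot page fits. */
    while (dram_free_bytes() < PAGE_SIZE)
        move_to_optane(coldest_dram_page());
    move_to_dram(hot);
}
\end{verbatim}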

% Objectives
\section{Objectives}
\lipsum[1-3]

The HeMem classifier will be the main object of study of this
report. We will study how it acts under certain circumstances,
to determine whether anti-patterns occur in which it chooses
actions that ultimately slow down the system as a whole.

HeMem as a whole is a very complex system; however, we are only
interested in studying its classifier. To achieve this, we
create a simulator in which to run just the classifier.

To study how a program's accesses are evaluated by HeMem, we
record all memory accesses of that program to a trace file.
This is done with \texttt{valgrind}, using a custom tool.
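
The exact on-disk format is an implementation detail of the
tool; a record could plausibly look like the following (this
layout is an assumption for illustration, not the tool's real
format):

\begin{verbatim}
/* Hypothetical trace record; the custom tool's real format may
 * differ. */
#include <stdint.h>

struct trace_record {
    uint64_t addr; /* accessed virtual address */
    uint64_t time; /* logical timestamp, e.g. instruction count */
    uint8_t  kind; /* 0 = read, 1 = write */
};
\end{verbatim}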

Given that valgrind observes all accesses, not just those that
reach memory, this tool contains an ideal cache simulator, to
ensure that only accesses that would reach memory are recorded.
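
A minimal version of such a filter (the fully-associative LRU
design and sizes here are assumptions, not necessarily the
tool's) could be:

\begin{verbatim}
/* Sketch of a cache filter: a fully-associative LRU cache of
 * fixed size; only misses would be written to the trace. */
#include <stdint.h>
#include <string.h>

#define NLINES    512
#define LINE_BITS 6 /* 64-byte cache lines */

static uint64_t lines[NLINES]; /* line tags, most recent first */
static int      used;

/* Returns 1 on a miss (access would reach memory), 0 on a hit. */
int cache_access(uint64_t addr) {
    uint64_t tag = addr >> LINE_BITS;
    for (int i = 0; i < used; i++) {
        if (lines[i] == tag) {
            /* Hit: move the tag to the front (LRU update). */
            memmove(&lines[1], &lines[0], i * sizeof(lines[0]));
            lines[0] = tag;
            return 0;
        }
    }
    /* Miss: insert at the front, evicting the LRU tag if full. */
    if (used < NLINES)
        used++;
    memmove(&lines[1], &lines[0], (used - 1) * sizeof(lines[0]));
    lines[0] = tag;
    return 1;
}
\end{verbatim}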

A second tool, called \texttt{ftmemsim}, then accepts these
traces and runs the HeMem classifier on them, outputting a data
file with the statistics it collected.

A third tool, called \texttt{ftmemsim-graphs}, then accepts this
data file and produces graphs to visualize the data intuitively.

% Results
\section{Results}
\lipsum[1-3]

% Conclusion
\section{Conclusion}
\lipsum[1-4]

% Bibliography
% TODO: Investigate if we can fix the hbadness of