mirror of
https://github.com/Zenithsiz/ist-pic-report.git
synced 2026-02-03 05:57:12 +00:00
187 lines
6.4 KiB
TeX
187 lines
6.4 KiB
TeX
\documentclass[twocolumn]{article}
|
|
|
|
% Packages
|
|
\usepackage{lipsum}
|
|
\usepackage{biblatex}
|
|
\usepackage{hyperref}
|
|
|
|
% TODO: DUE DATE: 12/07
|
|
|
|
% `biblatex` resources
|
|
\addbibresource{references.bib}
|
|
|
|
% Meta
|
|
\title{TODO}
|
|
\author{Filipe Rodrigues}
|
|
\date{June 9, 2023}
|
|
|
|
% Document
|
|
\begin{document}
|
|
|
|
% Title
|
|
\maketitle
|
|
|
|
% Abstract
|
|
\begin{abstract}
|
|
\end{abstract}
|
|
|
|
% Introduction
|
|
\section{Introduction}
|
|
|
|
Over recent years, we've witnessed the introduction of
|
|
programs that require extremely large runtime datasets.
|
|
Despite advancements in main memory capacities, some programs
|
|
truly require larger-than-main-memory datasets.
|
|
|
|
Traditionally, the memory system used on most machines is one
|
|
consisting of main-memory, situated on a DRAM chip, alongside
|
|
swap memory, typically situated on non-volatile storage shared
|
|
with other user filesystem partitions.
|
|
|
|
Once main memory is close to full, the kernel moves pages from
|
|
it to swap. Once a program attempts to access the page, a page
|
|
fault occurs and the kernel transparently moves the page from
|
|
swap back to ram and resumes the program.
|
|
|
|
This handling of memory can incur large costs. This is largely due to
|
|
swap residing on persistent memory, typically a hard drive or
|
|
solid-state disk. These have much lower throughput and higher
|
|
latency than DRAM chips. They might also contain other partitions
|
|
than the swap partition, which might imply competition with
|
|
filesystem reads/writes.
|
|
|
|
Another reason for possible high costs of this approach is the
|
|
time it takes for the kernel to resolve a page fault when a program
|
|
accesses a swapped-out page.
|
|
|
|
When datasets become bigger than the main memory capacity, a program
|
|
will encounter trashing, where pages are constantly bouncing between
|
|
main memory and swap.
|
|
|
|
With the advent of tiered-memory systems, we found a different paradigm
|
|
with a much more scalable approach.
|
|
|
|
In tiered-memory systems, we typically find two, or more, memories,
|
|
each with different tradeoffs regarding size and throughput / latency.
|
|
Each of these may be addressed by the CPU individually.
|
|
|
|
By assigning each page to a different memory, based on it's usage patterns,
|
|
it is possible to avoid paying the cost of page faults, since the program
|
|
can just access the memory directly. Despite this memory potentially being
|
|
slower than main memory, it is still much faster than swap and doesn't require
|
|
copying a whole page to main memory on access.
|
|
|
|
These systems have an additional problem to solve, however. They need to
|
|
decide where each page should reside. This is typically solved by running a
|
|
classifier in the background that receives periodic samples of the access
|
|
patterns for each page. This classifier can then chose to migrate certain pages
|
|
to other memories.
|
|
|
|
These migrations are similar to the swap migrations, but they can be much faster,
|
|
since they don't involve copying to/from a hard-drive or solid-state disk.
|
|
It is also possible to, in some configurations, perform these page migrations in
|
|
the background via a DMA command to another processing unit in the system.
|
|
|
|
% TODO: Explain what we're doing
|
|
|
|
% TODO: Problem is somewhat unresolved
|
|
|
|
% TODO: Mention more tired memory systems
|
|
|
|
% TODO: Memory systems aren't very well developed
|
|
|
|
% TODO: Want to contribute to this problem, by studying current systems
|
|
|
|
% TODO: Explain what we solved (Some HeMem results not yet studied + framework)
|
|
|
|
% TODO: Explain swap isn't byte-addressable
|
|
|
|
% TODO: Explain how heterogenous memory models address multiple RAMs (~NUMA)
|
|
|
|
% Background
|
|
\section{Background}
|
|
|
|
% Explain how some memories systems exist
|
|
% Hybrid:
|
|
% TODO: Example: DRAM + Optane
|
|
% TODO: Example: DRAM + CCLX Mention
|
|
|
|
% TODO: Explain how OSes interact with memory systems
|
|
|
|
One such implementation of a tiered-memory system is HeMem \cite{10.1145/3477132.3483550}.
|
|
It is designed to work with two memories: A main memory based on DRAM, as well
|
|
as secondary memory, based on Intel Optane.
|
|
|
|
This gives it a fast, but small DRAM for frequently accessed pages, and then a
|
|
slightly-slower, but larger Optane for less accessed pages.
|
|
|
|
It's classifier works by tagging pages as cold/hot, depending on their access patterns.
|
|
|
|
A cold page is one that isn't accessed very often.
|
|
This usually implies it can remain in / be moved to Optane memory.
|
|
|
|
On the other hand, hot pages are ones that are accessed very often.
|
|
This usually implies it should remain in / be moved to DRAM.
|
|
|
|
Pages are allocated on the main memory, as long as space is available. Else they are
|
|
allocated on the secondary memory.
|
|
|
|
When a cold page that resides in secondary memory becomes hot, it is moved to main memory.
|
|
This may require migrating colder pages in main memory to secondary memory before-hand.
|
|
The coldest memory is chosen for this migration.
|
|
|
|
When a hot page that resides in main memory becomes cold, it is moved to secondary memory.
|
|
|
|
% TODO: Mention approach 1 (JIT recompiler that checks "hot" memory sections)
|
|
|
|
% TODO: Mention approach 2 (Runtime, like HeMem)
|
|
|
|
% Objectives
|
|
\section{Objectives}
|
|
|
|
The HeMem classifier will be the main object of study for this report.
|
|
We'll be studying how it acts under certain circumstances, to determine
|
|
whether some anti-patterns occur, where it may be choosing actions that
|
|
ultimately slow down the system as a whole.
|
|
|
|
HeMem as a whole is a very complex system. However, we're only interested
|
|
in studying it's classifier. In order to achieve this, we create a simulator
|
|
to run just the classifier on.
|
|
|
|
In order to study how a program's accesses are evaluated by HeMem, we record
|
|
all memory accesses of that program to a trace file.
|
|
This is done with \texttt{valgrind}, using a custom tool.
|
|
|
|
Given that valgrind can record all accesses, and not just those that go to memory,
|
|
this tool contains an ideal cache simulator, to ensure only accesses that
|
|
would reach memory are registered.
|
|
|
|
A second tool, called \texttt{ftmemsim}, then accepts these traces and
|
|
runs the HeMem classifier on it, outputting a data file with results on
|
|
several statistics collected.
|
|
|
|
A third tool, called \texttt{ftmemsim-graphs} can then accept the data file and
|
|
produce graphs to intuitively visualize the data.
|
|
|
|
% Results
|
|
\section{Results}
|
|
|
|
% TODO: Ensure replicability
|
|
|
|
% TODO: Mention workloads & explain them (Single-treaded versus Multi-threaded).
|
|
|
|
% TODO: Section should be Q&A.
|
|
|
|
% TODO: Add "conclusion" to each question / graph.
|
|
|
|
% Conclusion
|
|
\section{Conclusion}
|
|
|
|
% Bibliography
|
|
% TODO: Investigate if we can fix the hbadness of
|
|
% `printbibliography`?
|
|
\hbadness 10000
|
|
\printbibliography
|
|
|
|
\end{document}
|