Re: sparsemem support for mips with highmem

To: Dave Hansen
Subject: Re: sparsemem support for mips with highmem
From: C Michael Sundius
Date: Fri, 15 Aug 2008 11:17:21 -0700
Cc: Thomas Bogendoerfer, Andy Whitcroft
In-reply-to: <1218821875.23641.103.camel@nimitz>
References: <> <1218753308.23641.56.camel@nimitz> <> <> <1218815299.23641.80.camel@nimitz> <> <> <> <1218821875.23641.103.camel@nimitz>
User-agent: Thunderbird (X11/20080501)
Dave Hansen wrote:
On Fri, 2008-08-15 at 10:16 -0700, C Michael Sundius wrote:
Ah, you're right. Thanks.  "but it's not necessar*il*y a good idea".
That is to say, we don't put
memory above 2 GiB. No need to make the mem_section[] array bigger
than need be.

This gives further credence for it to be a configurable in Kconfig as

I definitely don't want it to be something that users see.  It is never
enough overhead to really care.  On a 16TB system with 16MB sections,
the mem_section[] array is still only 16MB!!

So, I'd say to just make it as big as the arch needs in the worst case
(smallest SECTION_SIZE_BITS and largest MAX_PHYSMEM_BITS) and leave it.
We might even want to merge the 32 and 64-bit versions.

For your 32-bit version, we now use:
8 bytes (2 32-bit words) for each mem_section[]
2GB/128MB sections = 16
So, that's only 128 bytes.

For the 64-bit version, we now use:
16 bytes (2 64-bit words) for each mem_section[]
32GB/256MB sections = 128
So, that's only 2048 bytes.

If we were to merge the 32 and 64-bit versions to:
#define SECTION_SIZE_BITS       27
#define MAX_PHYSMEM_BITS        35

Your 32-bit version would go to 2048 bytes, and the 64-bit version would
go to 4096 bytes.  The 32-bit version would be able to address more
memory, and the 64-bit version would be able to handle smaller memory
holes more efficiently.
-- Dave

Ah, compromise :] that's why you get paid the big bux dave. thanks.

diff --git a/Documentation/sparsemem.txt b/Documentation/sparsemem.txt
new file mode 100644
index 0000000..89656e3
--- /dev/null
+++ b/Documentation/sparsemem.txt
@@ -0,0 +1,96 @@
+Sparsemem divides up physical memory in your system into N sections of M
+bytes. Page descriptors are created for only those sections that
+actually exist (as far as the sparsemem code is concerned). This allows
+for holes in the physical memory without having to waste space by
+creating page descriptors for those pages that do not exist.
+When page_to_pfn() or pfn_to_page() are called there is a bit of overhead to
+look up the proper memory section to get to the descriptors, but this
+is small compared to the memory you are likely to save. So, it's not the
+default, but should be used if you have big holes in physical memory.
+Note that discontiguous memory is more closely related to NUMA machines,
+so if you have a single-CPU system, use sparsemem and not discontig.
+It's much simpler.
+Once the bootmem allocator is up and running, you should call the
+sparsemem function "memory_present(node, pfn_start, pfn_end)" for each
+block of memory that exists on your system.
+The values of N and M above depend upon your architecture
+and your platform and are specified in the file:
+      include/asm-<your_arch>/sparsemem.h
+where you should add lines similar to the following:
+       #define SECTION_SIZE_BITS       27      /* 128 MiB */
+       #define MAX_PHYSMEM_BITS        31      /* 2 GiB   */
+if they don't already exist, where: 
+ * SECTION_SIZE_BITS            2^M: how big each section will be
+ * MAX_PHYSMEM_BITS             2^N: how much memory we can have in that
+                                     space
+Section size should be less than or equal to the smallest block of
+memory in your system. Max physmem should be greater than or
+equal to the highest physical address of memory in your system.
+You should make sure that you initialize the sparse memory code by calling 
+       bootmem_init();
+  +    sparse_init();
+       paging_init();
+just before you call paging_init() and after the bootmem_allocator is
+turned on in your setup_arch() code.  
+Add a line like this:
+into the config for your platform in arch/<your_arch>/Kconfig. This will
+ensure that sparsemem can be enabled for your platform.
+Run make menuconfig or make gconfig, as you like, and turn on the sparsemem
+memory model under the "Kernel Type" --> "Memory Model" and then build your
+6) Gotchas
+One catch that I encountered when I was turning this on for MIPS was that there
+was some code in mem_init() that set the "reserved" flag for pages that were
+not valid RAM. This caused my kernel to crash when I enabled sparsemem since those
+pages (and page descriptors) didn't actually exist. I changed my code by adding
+lines like below:
+       for (tmp = highstart_pfn; tmp < highend_pfn; tmp++) {
+               struct page *page = pfn_to_page(tmp);
+   +           if (!pfn_valid(tmp))
+   +                   continue;
+   +
+               if (!page_is_ram(tmp)) {
+                       SetPageReserved(page);
+                       continue;
+               }
+               ClearPageReserved(page);
+               init_page_count(page);
+               __free_page(page);
+               physmem_record(PFN_PHYS(tmp), PAGE_SIZE, physmem_highmem);
+               totalhigh_pages++;
+       }
+Once I got that straight, it worked!!!! I saved 10MiB of memory.  
diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index c6a063b..5b1af87 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -408,7 +408,6 @@ static void __init bootmem_init(void)
                /* Register lowmem ranges */
                free_bootmem(PFN_PHYS(start), size << PAGE_SHIFT);
-               memory_present(0, start, end);
@@ -420,6 +419,23 @@ static void __init bootmem_init(void)
         * Reserve initrd memory if needed.
+       /* call memory present for all the ram */
+       for (i = 0; i < boot_mem_map.nr_map; i++) {
+               unsigned long start, end;
+               /*
+                * Mark only usable memory as present.
+                */
+               if (boot_mem_map.map[i].type != BOOT_MEM_RAM)
+                       continue;
+               start = PFN_UP(boot_mem_map.map[i].addr);
+               end   = PFN_DOWN(boot_mem_map.map[i].addr
+                                   + boot_mem_map.map[i].size);
+               memory_present(0, start, end);
+       }
 #endif /* CONFIG_SGI_IP27 */
diff --git a/arch/mips/mm/init.c b/arch/mips/mm/init.c
index 137c14b..31496a1 100644
--- a/arch/mips/mm/init.c
+++ b/arch/mips/mm/init.c
@@ -414,6 +414,9 @@ void __init mem_init(void)
        for (tmp = highstart_pfn; tmp < highend_pfn; tmp++) {
                struct page *page = pfn_to_page(tmp);
+               if (!pfn_valid(tmp))
+                       continue;
                if (!page_is_ram(tmp)) {
diff --git a/include/asm-mips/sparsemem.h b/include/asm-mips/sparsemem.h
index 795ac6c..64376db 100644
--- a/include/asm-mips/sparsemem.h
+++ b/include/asm-mips/sparsemem.h
@@ -6,7 +6,7 @@
  * SECTION_SIZE_BITS           2^N: how big each section will be
  * MAX_PHYSMEM_BITS            2^N: how much memory we can have in that space
-#define SECTION_SIZE_BITS       28
+#define SECTION_SIZE_BITS       27     /* 128 MiB */
 #define MAX_PHYSMEM_BITS        35
 #endif /* CONFIG_SPARSEMEM */