linux-mips
[Top] [All Lists]

Re: sparsemem support for mips with highmem

To: Dave Hansen <dave@linux.vnet.ibm.com>
Subject: Re: sparsemem support for mips with highmem
From: C Michael Sundius <Michael.sundius@sciatl.com>
Date: Fri, 15 Aug 2008 11:17:21 -0700
Authentication-results: sj-dkim-1; header.From=Michael.sundius@sciatl.com; dkim=neutral
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>, linux-mm@kvack.org, linux-mips@linux-mips.org, jfraser@broadcom.com, Andy Whitcroft <apw@shadowen.org>
In-reply-to: <1218821875.23641.103.camel@nimitz>
Original-recipient: rfc822;linux-mips@linux-mips.org
References: <48A4AC39.7020707@sciatl.com> <1218753308.23641.56.camel@nimitz> <48A4C542.5000308@sciatl.com> <20080815080331.GA6689@alpha.franken.de> <1218815299.23641.80.camel@nimitz> <48A5AADE.1050808@sciatl.com> <20080815163302.GA9846@alpha.franken.de> <48A5B9F1.3080201@sciatl.com> <1218821875.23641.103.camel@nimitz>
Sender: linux-mips-bounce@linux-mips.org
User-agent: Thunderbird 2.0.0.14 (X11/20080501)
Dave Hansen wrote:
On Fri, 2008-08-15 at 10:16 -0700, C Michael Sundius wrote:
Ah, your right. thanks.  "but it's not necessar*il*y a good idea".
That is to say, we don't put
memory above 2 GiB. No need to make the mem_section[] array bigger
than need be.

This gives further credence for it to be a configurable in Kconfig as
well.

I definitely don't want it to be something that users see.  It is never
enough overhead to really care.  On a 16TB system with 16MB sections,
the mem_section[] array is still only 16MB!!

So, I'd say to just make it as big as the arch needs in the worst case
(smallest SECTION_SIZE_BITS and largest MAX_PHYSMEM_BITS) and leave it.
We might even want to merge the 32 and 64-bit versions.

For your 32-bit version, we now use:
8 bytes (2 32-bit words) for each mem_section[]
2GB/128MB sections = 16
So, that's only 512 bytes.

For the 64-bit version, we now use:
16 bytes (2 64-bit words) for each mem_section[]
32GB/256MB sections = 128
So, that's only 2048 bytes.

If we were to merge the 32 and 64-bit versions to:
#define SECTION_SIZE_BITS       27
#define MAX_PHYSMEM_BITS        35

Your 32-bit version would go to 2048 bytes, and the 64-bit version would
go to 4096 bytes.  The 32-bit version would we able to address more
memory, and the 64-bit version would be able to handle smaller memory
holes more efficiently.
-- Dave

Ah, compromise :] that's why you get paid the big bux dave. thanks.


diff --git a/Documentation/sparsemem.txt b/Documentation/sparsemem.txt
new file mode 100644
index 0000000..89656e3
--- /dev/null
+++ b/Documentation/sparsemem.txt
@@ -0,0 +1,96 @@
+Sparsemem divides up physical memory in your system into N section of M
+bytes. Page descriptors are created for only those sections that
+actually exist (as far as the sparsemem code is concerned). This allows
+for holes in the physical memory without having to waste space by
+creating page discriptors for those pages that do not exist.
+When page_to_pfn() or pfn_to_page() are called there is a bit of overhead to
+look up the proper memory section to get to the descriptors, but this
+is small compared to the memory you are likely to save. So, it's not the
+default, but should be used if you have big holes in physical memory.
+
+Note that discontiguous memory is more closely related to NUMA machines
+and if you are a single CPU system use sparsemem and not discontig. 
+It's much simpler. 
+
+1) CALL MEMORY_PRESENT()
+Once the bootmem allocator is up and running, you should call the
+sparsemem function "memory_present(node, pfn_start, pfn_end)" for each
+block of memory that exists on your system.
+
+2) DETERMINE AND SET THE SIZE OF SECTIONS AND PHYSMEM
+The size of N and M above depend upon your architecture
+and your platform and are specified in the file:
+
+      include/asm-<your_arch>/sparsemem.h
+
+and you should create the following lines similar to below: 
+
+       #define SECTION_SIZE_BITS       27      /* 128 MiB */
+       #define MAX_PHYSMEM_BITS        31      /* 2 GiB   */
+
+if they don't already exist, where: 
+
+ * SECTION_SIZE_BITS            2^M: how big each section will be
+ * MAX_PHYSMEM_BITS             2^N: how much memory we can have in that
+                                     space
+
+Section size should be equal or less than the smallest block of
+memory in your system. Max physmem should be greater than or 
+equal to the highest physical memory address of memory in your
+system.
+
+3) INITIALIZE SPARSE MEMORY
+You should make sure that you initialize the sparse memory code by calling 
+
+       bootmem_init();
+  +    sparse_init();
+       paging_init();
+
+just before you call paging_init() and after the bootmem_allocator is
+turned on in your setup_arch() code.  
+
+4) ENABLE SPARSEMEM IN KCONFIG
+Add a line like this:
+
+       select ARCH_SPARSEMEM_ENABLE
+
+into the config for your platform in arch/<your_arch>/Kconfig. This will
+ensure that turning on sparsemem is enabled for your platform. 
+
+5) CONFIG
+Run make menuconfig or make gconfig, as you like, and turn on the sparsemem
+memory model under the "Kernel Type" --> "Memory Model" and then build your
+kernel.
+
+
+6) Gotchas
+
+One trick that I encountered when I was turning this on for MIPS was that there
+was some code in mem_init() that set the "reserved" flag for pages that were 
not
+valid RAM. This caused my kernel to crash when I enabled sparsemem since those
+pages (and page descriptors) didn't actually exist. I changed my code by adding
+lines like below:
+
+
+       for (tmp = highstart_pfn; tmp < highend_pfn; tmp++) {
+               struct page *page = pfn_to_page(tmp);
+
+   +           if (!pfn_valid(tmp))
+   +                   continue;
+   +
+               if (!page_is_ram(tmp)) {
+                       SetPageReserved(page);
+                       continue;
+               }
+               ClearPageReserved(page);
+               init_page_count(page);
+               __free_page(page);
+               physmem_record(PFN_PHYS(tmp), PAGE_SIZE, physmem_highmem);
+               totalhigh_pages++;
+       }
+
+
+Once I got that straight, it worked!!!! I saved 10MiB of memory.  
+
+
+
diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index c6a063b..5b1af87 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -408,7 +408,6 @@ static void __init bootmem_init(void)
 
                /* Register lowmem ranges */
                free_bootmem(PFN_PHYS(start), size << PAGE_SHIFT);
-               memory_present(0, start, end);
        }
 
        /*
@@ -420,6 +419,23 @@ static void __init bootmem_init(void)
         * Reserve initrd memory if needed.
         */
        finalize_initrd();
+
+       /* call memory present for all the ram */
+       for (i = 0; i < boot_mem_map.nr_map; i++) {
+               unsigned long start, end;
+
+               /*
+                * memory present only usable memory.
+                */
+               if (boot_mem_map.map[i].type != BOOT_MEM_RAM)
+                       continue;
+
+               start = PFN_UP(boot_mem_map.map[i].addr);
+               end   = PFN_DOWN(boot_mem_map.map[i].addr
+                                   + boot_mem_map.map[i].size);
+
+               memory_present(0, start, end);
+       }
 }
 
 #endif /* CONFIG_SGI_IP27 */
diff --git a/arch/mips/mm/init.c b/arch/mips/mm/init.c
index 137c14b..31496a1 100644
--- a/arch/mips/mm/init.c
+++ b/arch/mips/mm/init.c
@@ -414,6 +414,9 @@ void __init mem_init(void)
        for (tmp = highstart_pfn; tmp < highend_pfn; tmp++) {
                struct page *page = pfn_to_page(tmp);
 
+               if (!pfn_valid(tmp))
+                       continue;
+
                if (!page_is_ram(tmp)) {
                        SetPageReserved(page);
                        continue;
diff --git a/include/asm-mips/sparsemem.h b/include/asm-mips/sparsemem.h
index 795ac6c..64376db 100644
--- a/include/asm-mips/sparsemem.h
+++ b/include/asm-mips/sparsemem.h
@@ -6,7 +6,7 @@
  * SECTION_SIZE_BITS           2^N: how big each section will be
  * MAX_PHYSMEM_BITS            2^N: how much memory we can have in that space
  */
-#define SECTION_SIZE_BITS       28
+#define SECTION_SIZE_BITS       27     /* 128 MiB */
 #define MAX_PHYSMEM_BITS        35
 
 #endif /* CONFIG_SPARSEMEM */
<Prev in Thread] Current Thread [Next in Thread>