Its Gives Latest Tech News Updates

It gives latest mobiles updates laptops updates

It gives latest Teachnology updates

It will gives Latest Opreating System Updates

It will Provides Various materials

It will Provides Smaple Examples Programming Materials.

Tuesday, 23 January 2018

How to Block the Unwanted E Mails


How to Block the Unwanted E Mails


If we want to block the un wanted emails from Gmail we have login to mail id and select which maild to want to block go for the Setting Gear Icon select the settings then it will open another window then go for Filters and Blocked address Click on Create New Filter and fill all the details and click on create filter then it will directly got the spam folder.

  

How to overcome blue screen errors

Over come blue Screen errors 

If we get the blue screen errors when we working with pc suddenly,then we have find errors in our hardware and software first we have check in the software drivers first in insall all the drivers including graphic drivers also and install all the drivers again problem will be sloved if it is getting again then we have go check in that ram,first dos based ram checking software will be avaliable in the internet download and write in CD in bottable,
then it will ask some permissions to check ram give then it will check ram  if it has any errors in the Ram then we have change the Ram,thee are the main cases blue screen will occur.

How to hide hardisk drive

How to hide Hardisk drive

We can hide the hardisk if anybody don't want to see the particular drive or data without effecting any data.First of all close all the files and go to desktop,Right click on my computer and select the option Disk management it will show all the drives select the drive and Right click on select change letter and path and select remove drive letter it will hide the the drive if u want again retrieve the drive then assign the drive letter it will again show drive .

Monday, 22 January 2018

Issues in Paging and Virtual Memory


Issues in Paging and Virtual Memory

Issues in Paging and Virtual Memory
Page Table Structure. Where to store page tables, issues in page table design.

In a real machine, page tables stored in physical memory. Several issues arise:

 How much memory does the page table take up?

How to manage the page table memory. Contiguous allocation? Blocked allocation? What about paging the page table?

On TLB misses, OS must access page tables. Issue: how the page table design affects the TLB miss penalty.

Real operating systems provide the abstraction of sparse address spaces. Issue: how well does a particular page table design support sparse address spaces?

Linear Page Table.



OS now faces variable-sized allocation problem for the page table storage.

Page table may occupy significant amount of physical memory. Page table for 32-bit address space with 4K byte pages has 232 / 212 = 220 entries. If  each entry is 32 bits, need 4M bytes of memory to store page table.

 Does not support sparse address spaces well - too many wasted entries.

TLB miss handler is very simple - just index the page table.

Two-Level Page Table. Page Table itself is broken up into pages. An outer page table indexes pages of page table. Assuming 4K byte pages and 32 byte page table entries, each page can hold 212 / 4 = 210 entries. There are 10 bits left in virtual address for index of outer page table. Virtual address
now has 10 bit outer page table index, 10 bit inner page table offset and 12 bit page offset.

Page table lookup for two level page table.Find physical page containing outer page table for process.  Extract top 10 bits of address. Index outer page table using extracted bits to get physical page number of page table page.

Extract next 10 bits of address. Index page table page using extracted bits to get physical page number of accessed page. Extract final 12 bits of address - the page offset. Index accessed physical page using 12 bit page offset. Evaluation of two level scheme.

Eliminates variable-sized allocation problem for page tables. Have one page for outer page table, and rest of page table is allocated in page-size chunks.

Have internal fragmentation - both for last page table page and for outer page table page. If page table takes up too much memory, can page the pages of the page table. Question: is there anything that OS MUST keep in physical memory?

Supports sparse address spaces - can eliminate page table pages if corresponding parts of address space are not valid. Increases TLB miss time. Have to perform two table lookups instead of
one.

Three level scheme. Like two level scheme, only with one more level. May become scheme of choice for machines with 64 bit address space. On such machines the outer page table can become much larger than one page. SPARC uses three level page tables. Primary job of page table: doing TLB reload. So why map entire address space? Instead, just maintain mappings for pages resident in physical memory.

Inverted Page Table. Has one entry for each physical page frame specifying process that owns the page and the virtual address of page frame. On a TLB miss, search inverted page table data structure to find physical page frame for virtual address of process generating the access. Speed up
the lookup by: Hashing

 Associative table of recently accessed entries.  IBM machines (RS/6000, RT, System 38) and HP Spectrum machines use this scheme.

What if accessed page is not in memory? Must look up the disk location in a data structure that looks much like a standard page table. Since this data structure should not be accessed very often, it can be paged. All of these schemes have advantages and disadvantages. Which one should the hardware implement? Answer: hardware designer does not have to decide! Most modern machines handle TLB misses in software, so the OS can use whatever page table scheme it wants. Hardware only ``knows'' about the TLB.

Sharing code and data. If two page table entries in different processes point to same physical page, the processes share the memory. If one process writes the data, other process will see the changes. Is a very efficient way to communicate.

Can also share code. For example, only one copy of editor or compiler code can be kept in memory, and all editor or compiler processes can execute that one copy of the code. Helps memory utilization.

Concept of reentrant code. Reentrant code cannot modify itself and must make sure that it has a separate copy of per-process global variables. All of your Nachos kernel code should be reentrant.

Complications with sharing code: virtually indexed caches and inverted page tables.Virtual Memory. Basic idea: main memory used as a cache for backing store. Usual solution: demand paging. A page can be resident either on disk or in main memory.

First extension for demand paging: the valid bit. Each page table or TLB entry has a valid bit. If the valid bit is set, the page is in physical memory. In a simple system, page is on disk if valid bit is not set. The operating system manages the transfer of pages to and from the backing store. It manages the valid bit and the page table setup.

What does OS do on a page fault?

Trap to OS.

Save user registers and process state. Determine that exeception was page fault. Check that reference was legal and find page on disk. Find a free page frame. Issue read from disk to free page frame.
Queue up for disk. Program disk controller to read page. Wait for seek and latency. Transfer page into memory. As soon as program controller, allocate CPU to another process. Must schedule the CPU, restore process state. Take disk transfer completed interrupt. Save user registers and process state.
Determine that interrupt was a disk interrupt. Find process and update page tables. Reschedule CPU.
Restore process state and resume execution. How well do systems perform under demand paging? Compute the effective access time. p is proportion of memory accesses that generate a page fault
(0 <= p <= 1). What is effective acess time? If data in memory, between 10 and 200 nanoseconds, depending on if it is cached or not. Call it 100 nanoseconds for purposes of argument. Retrieving page from disk may take 25 milliseconds - latency of 8 milliseconds, seek of 15 milliseconds and
transfer of 1 millisecond. Add on OS time for page fault handling, and get 25 milliseconds.

Effective access time = (1-p) * 100 + p * 25*106. If we want overall effective access time to be 110, less than one memory reference out of 2.5 * 106 can fault.

In the future, difference between local accesses and faulting accesses will only get worse. In practice people simply do not run computations with a lot of paging - they just buy more memory or run smaller computations.

Where to swap. Swap space - a part of disk dedicated to paging. Can usually make swap space accesses go faster than normal file accesses because avoid the overheads associated with a normal file system.

On the other hand, using the file system immediately makes the paging system work with any device that the file system is implemented on. So, can page remotely on a diskless workstation using file system.

May not always use backing store for all of process's data. Executable code. Can just use executable file on disk as backing store. (Problem: recompilation).

Unreferenced pages in uninitialized data segment. Just zero the page on first access - no need to access backing store. Called zero on demand paging.

To get a free page, may need to write a page out to backing store. When write a page out, need to clear the valid bit for the corresponding page table entry. A core map helps this process along. A core map records, for each physical page frame, which process and virtual page occupy that page
frame. Core maps will be useful for other operations.

When invalidate a page, must also clear the TLB to avoid having a stale entry cached. Page replacement algorithms. Which page to swap out? Two considerations: A page that will not be accessed for a long time.

A clean page that does not have to be written back to the backing store. Hardware provides two bits to help the OS develop a reasonable page replacement policy. Use bit. Set every time page accessed.
Dirty bit. Set every time page written.

Hardware with software-managed TLBs only set the bits in the TLB. So, TLB fault handlers must keep TLB entries coherent with page table entries when eject TLB entries. There is another way to synthesize these bits in software that makes TLB reload faster.


How to evaluate a given policy? Consider how well it works with strings of page references. So, the string 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5 represents a sequence of references to pages 1, 2, 3, 4, 1, 2, etc.

FIFO page replacement. When need to replace a page, choose the first page brought in. So, if we have three physical page frames, here is what happens for the above sequence:

Pages 1, 2, 3 brought in to memory. Page 1 ejected, page 4 in - 2, 3, 4 in memory. Page 2 ejected, page 1 in - 3, 4, 1 in memory. Page 3 ejected, page 2 in - 4, 1, 2 in memory. Page 4 ejected, page 5 in - 1, 2, 5 in memory. Pages 1 and 2 accessed in memory. Page 1 ejected, page 3 in - 2, 5, 3 in memory.
Page 2 ejected, page 4 in - 5, 3, 2 in memory. Page 5 accessed in memory. 9 page faults total. What is disadvantage of FIFO? May eject a heavily used page.

Belady's anomaly - adding more physical memory may actually make paging algorithm behave worse! Consider above example with four physical page frames.

Pages 1, 2, 3, 4 brought in to memory. Pages 1 and 2 accessed in memory. Page 1 ejected, page 5 in - 2, 3, 4, 5 in memory. Page 2 ejected, page 1 in - 3, 4, 5, 1 in memory. Page 3 ejected, page 2 in - 4, 5, 1, 2 in memory. Page 4 ejected, page 3 in - 5, 1, 2, 3 in memory. Page 5 ejected, page 4 in - 1, 2, 3, 4 in memory. Page 1 ejected, page 5 in - 2, 3, 4, 5 in memory. 10 page faults total. with strings of page references. So, the string 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5 represents a sequence of references to pages 1, 2,

LRU - eject least recently used page. Consider above example with four physical page frames. Pages 1, 2, 3, 4 brought in to memory. Pages 1 and 2 accessed in memory - 3, 4, 1 2 in memory. Page 3 ejected, page 5 in - 4, 1, 2, 5 in memory. Pages 1 and 2 accessed in memory. Page 4 ejected, page 3 in - 5, 1, 2, 3 in memory. Page 5 ejected, page 4 in - 1, 2, 3, 4 in memory. Page 1 ejected, page 5 in - 2, 3, 4, 5 in memory. 8 page faults total. How to implement LRU? Two strategies: Build a clock and mark each page with the time every time it is accessed. Move page to front of list every time it is accessed. Both strategies are totally impractical on a modern computer - overhead is too large.

So, implement an approximation to LRU. One version is equivalent to LRU with a clock that ticks very slowly. So, many pages may be marked with the same time. Have a low-resolution LRU.

Implement using use bits. Periodically, OS goes through all pages in physical memory and shifts the use bit into a history register for the page. It then clears the use bit. When read in a new page, clear page's history register.

FIFO with second chance. Basically, use a FIFO page ordering. Keep a FIFO queue of pages to be paged out. But, if page at front of FIFO list has its use bit on when it is due to be paged out, clear the use bit and put page at end of FIFO queue.

Can enhance FIFO with second chance to take into account four levels of  page replacement desirability:

use = 0, dirty = 0: Best page to replace. use = 0, dirty = 1: Next best - has not been recently used.
use = 1, dirty = 0: Next best - don't have to write out. use = 1, dirty = 1: Worst. Go through the FIFO list several times. Each time, look for next highest level. Stop when find first suitable page.

Most paging algorithms try to free up pages in advance. So, don't have to write ejected page out when the fault actually happens. Keep a list of modified pages, and write pages out to disk whenever paging device is idle. Then clear the dirty bits. Increases probability that page will be clean when it is ejected, so don't have to write page out.

Keep a pool of free frames, but remember which virtual pages they contain. If get a reference for one of these virtual pages, retrieve from the free frame pool. VAX/VMS uses this with a FIFO replacement policy - use bit didn't work on early versions of VAX!

What if hardware does not implement use or dirty bits. Can the OS? Yes, if  hardware has a valid and readonly bit. Hardware traps if valid bit is not set in a page table or TLB entry. Hardware also traps if readonly bit is set and reference is a write.

To implement a use bit, OS clears valid bit every time it clears use bit. OS keeps track of true state of page in different data structure. When the first reference to the page traps, OS figures out that page is really resident, and sets use and valid bits.

To implement dirty bit, OS sets readonly bit on clean pages. OS keeps track of true state of page in different data structure. When the first write traps, OS sets dirty bit and clears readonly bit.

Many systems use this scheme even if TLB has dirty and use bits - with this scheme, don't have to rewrite page table entries when eject an entry from TLB.

Concept of a working set. Each process has a set of pages that it frequently accesses. For example, if the process is doing a grid relaxation algorithm, the working set would include the pages storing the grd and the pages storing the grid relaxation code.

Working set may change over time. If the process finishes the relaxing one grid and starts relaxing another, the pages for the old grid drop out of the working set and the pages for the new grid come into the working set.

Discussion so far focussed on running program. Two complications: loading a program and extending address space. Invariant: must reserve space in backing store for all pages of a running process. If don't, may get in a situation when need to eject a page, but backing store is full.

When load a process, must reserve space in backing store. Note: do not have to actually write pages to backing store - can just load into memory. Makes startup time faster.

What needs to be initialized? Only init data segment, in principle.

Can use zero on demand pages for uninit data segment.

Can use executable file to hold code.

Makes sense to preload much of code segment to make startup go faster. Of course, must be prepared to page even during startup - what if init data segment does not fit in available physical memory?

Must also allocate more backing store when extend address space via something like malloc.

What is involved to allocate backing store pages? Just manipulate data structure in kernel memory that maintains state for backing store page allocation.

Thrashing. A system thrashes if physical memory is too small to hold the working sets of all the processes running. The upshot of thrashing is that pages always spend their time waiting for the backing store to fetch their pages.

Thrashing is a discrete phenonmenon. Usually, there is a phase transition from no thrashing to thrashing.

Typical way thrashing came about in early batch systems. Scheduler brought in new process whenever CPU utilization goes down. Eventually the size of the working sets become larger than physical memory, and processes start to page. The CPU utilization drops even further, so more processes come in, and the system starts to thrash. Throughput drops like crazy.

Eliminating thrashing. Must drop degree of multiprogramming. So, swap all of a process out to backing store and suspend process.

Come up with a Page Fault Frequency based solution. Basic idea: a process should have an ideal page fault frequency. If it faults too often, need to give it more physical page frames. If it faults too infrequently, it has too many physical page frames and need to take some from it and give to
other process. When all processes fault too frequently, choose one to swap out.

Another swapping algorithm - keep a free list threshold. When processes demand free pages at a high rate, free list will be consumed faster than pages can be swapped out to fill it. When amount of free memory goes above threshold, system starts swapping out processes. They start coming back in
when free memory drops back below threshold.

Page size. How big are current page sizes? 4K bytes or 8K bytes. Why aren't
they bigger?

Tradition and existing code.

Increased fragmentation.

Why aren't they smaller?

Smaller page size has more page faults for same working set size. More TLB entries required.
Larger page tables required. Page sizes have increased as memory has become cheaper and page faults more relatively expensive. Will probably increase in the future.

Pinning pages. Some pages cannot or should not be paged out. There is some data that OS must always have resident to operate correctly. Some memory accessed very frequently - disk buffers.

Sometimes DMA into memory - DMA device usually works with physical addresses, so must lock corresponding page into memory until DMA finishes. 

Implementing Synchronization Operations


Implementing Synchronization Operations

Implementing Synchronization Operations
How do we implement synchronization operations like locks? Can build synchronization operations out of atomic reads and writes. There is a lot of literature on how to do this, one algorithm is called the bakery algorithm. But, this is slow and cumbersome to use. So, most machines have hardware support for synchronization - they provide synchronization instructions.

On a uniprocessor, the only thing that will make multiple instruction sequences not atomic is interrupts. So, if want to do a critical section, turn off interrupts before the critical section and turn on interrupts after the critical section. Guaranteed atomicity. It is also fairly efficient. Early versions of Unix did this.

Why not just use turning off interrupts? Two main disadvantages: can't use in a multiprocessor, and can't use directly from user program for synchronization.

Test-And-Set. The test and set instruction atomically checks if a memory location is zero, and if so, sets the memory location to 1. If the memory location is 1, it does nothing. It returns the old value of the memory location. You can use test and set to implement locks as follows:



The lock state is implemented by a memory location. The location is 0 if  the lock is unlocked and 1 if the lock is locked.

The lock operation is implemented as:

o        while (test-and-set(l) == 1);


The unlock operation is implemented as: *l = 0;

The problem with this implementation is busy-waiting. What if one thread already has the lock, and another thread wants to acquire the lock? The acquiring thread will spin until the thread that already has the lock unlocks it.

What if the threads are running on a uniprocessor? How long will the acquiring thread spin? Until it expires its quantum and thread that will unlock the lock runs. So on a uniprocessor, if can't get the thread the first time, should just suspend. So, lock acquisition looks like this:

·         while (test-and-set(l) == 1) {
·           currentThread->Yield();
·         }

Can make it even better by having a queue lock that queues up the waiting threads and gives the lock to the first thread in the queue. So, threads never try to acquire lock more than once.

On a multiprocessor, it is less clear. Process that will unlock the lock may be running on another processor. Maybe should spin just a little while, in hopes that other process will release lock. To evaluate spinning and suspending strategies, need to come up with a cost for each suspension
algorithm. The cost is the amount of CPU time the algorithm uses to acquire a lock.

There are three components of the cost: spinning, suspending and resuming. What is the cost of spinning? Waste the CPU for the spin time. What is cost of suspending and resuming? Amount of CPU time it takes to suspend the thread and restart it when the thread acquires the lock.

Each lock acquisition algorithm spins for a while, then suspends if it didn't get the lock. The optimal algorithm is as follows:

If the lock will be free in less than the suspend and resume time, spin until acquire the lock.

If the lock will be free in more than the suspend and resume time, suspend immediately.

Obviously, cannot implement this algorithm - it requires knowledge of the future, which we do not in general have.

How do we evaluate practical algorithms - algorithms that spin for a while, then suspend. Well, we compare them with the optimal algorithm in the worst case for the practical algorithm. What is the worst case for any practical algorithm relative to the optimal algorithm? When the lock become free just after the practical algorithm stops spinning.

What is worst-case cost of algorithm that spins for the suspend and resume time, then suspends? (Will call this the SR algorithm). Two times the suspend and resume time. The worst case is when the lock is unlocked just after the thread starts the suspend. The optimal algorithm just spins until the lock is unlocked, taking the suspend and resume time to acquire the lock. The SR algorithm costs twice the suspend and resume time -it first spins for the suspend and resume time, then suspends, then gets the lock, then resumes.

What about other algorithms that spin for a different fixed amount of time then block? Are all worse than the SR algorithm.

If spin for less than suspend and resume time then suspend (call this the LT-SR algorithm), worst case is when lock becomes free just after start the suspend. In this case the the algorithm will cost spinning time plus suspend and resume time. The SR algorithm will just cost the spinning time.

If spin for greater than suspend and resume time then suspend (call this the GR-SR algorithm), worst case is again when lock becomes free just after start the suspend. In this case the SR algorithm will also suspend and resume, but it will spin for less time than the GT-SR algorithm Of course, in practice locks may not exhibit worst case behavior, so best algorithm depends on locking and unlocking patterns actually observed.

Here is the SR algorithm. Again, can be improved with use of queueing locks.

         notDone = test-and-set(l);
         if (!notDone) return;
         start = readClock();
         while (notDone) {
           stop = readClock();
           if (stop - start >= suspendAndResumeTime) {
             currentThread->Yield();
             start = readClock();
           }
           notDone = test-and-set(l);
         }
There is an orthogonal issue. test-and-set instruction typically consumes bus resources every time. But a load instruction caches the data. Subsequent loads come out of cache and never hit the bus. So, can do something like this for inital algorithm:

         while (1) {
           if !test-and-set(l) break;
           while (*l == 1);
         }

Are other instructions that can be used to implement spin locks - swap instruction, for example.

On modern RISC machines, test-and-set and swap may cause implementation headaches. Would rather do something that fits into load/store nature of Architecture. So, have a non-blocking abstraction: Load Linked(LL)/Store Conditional(SC).

Semantics of LL: Load memory location into register and mark it as loaded by this processor. A memory location can be marked as loaded by more than one processor.

Semantics of SC: if the memory location is marked as loaded by this processor, store the new value and remove all marks from the memory location. Otherwise, don't perform the store. Return whether or not the store succeeded.

Here is how to use LL/SC to implement the lock operation:

           while (1) {
             LL  r1, lock
             if (r1 == 0) {
               LI r2, 1
               if (SC r2, lock) break;
             }
           }

Unlock operation is the same as before.

Can also use LL/SC to implement some operations (like increment) directly. People have built up a whole bunch of theory dealing with the difference in power between stuff like LL/SC and test-and-set.

           while (1) {
             LL   r1, lock
             ADDI r1, 1, r1
             if (SC r2, lock) break;
           }

Note that the increment operation is non-blocking. If two threads start to perform the increment at the same time, neither will block - both will complete the add and only one will successfully perform the SC. The other will retry. So, it eliminates problems with locking like: one thread acquires locks and dies, or one thread acquires locks and is suspended for a long time, preventing other threads that need to acquire the lock from proceeding.

Processes And Threads


Processes And Threads

Processes And Threads
I. A process is an execution stream in the context of a particular process state.

  1. An execution stream is a sequence of instructions.

  2. Process state determines the effect of the instructions. It usually
      includes (but is not restricted to):

          Registers
          Stack
          Memory (global variables and dynamically allocated memory)
          Open file tables
          Signal management information

Key concept: processes are separated: no process can directly affect the state of another process.

II. Process is a key OS abstraction that users see - the environment you interact with when you use a computer is built up out of processes.



   1.The shell you type stuff into is a process.

   2.When you execute a program you have just compiled, the OS generates a
      process to run the program.

   3.Your WWW browser is a process.

III. Organizing system activities around processes has proved to be a useful way of separating out different activities into coherent units.

IV. Two concepts: uniprogramming and multiprogramming.

   1.Uniprogramming: only one process at a time. Typical example: DOS. Problem: users often wish to perform more than one activity at a time  (load a remote file while editing a program, for example), and  uniprogramming does not allow this. So DOS and other uniprogrammed  systems put in things like memory-resident programs that invoked asynchronously, but still have separation problems. One key problem with DOS is that there is no memory protection - one program may write  the memory of another program, causing weird bugs.

  2.Multiprogramming: multiple processes at a time. Typical of Unix plus all currently envisioned new operating systems. Allows system to separate out activities cleanly.

V. Multiprogramming introduces the resource sharing problem - which processes get to use the physical resources of the machine when? One  crucial resource: CPU. Standard solution is to use preemptive multitasking - OS runs one process for a while, then takes the CPU away from that
process and lets another process run. Must save and restore process state. Key issue: fairness. Must ensure that all processes get their fair share of the CPU.

VI. How does the OS implement the process abstraction? Uses a context switch to switch from running one process to running another process.

VII. How does machine implement context switch? A processor has a limited amount of physical resources. For example, it has only one register set. But every process on the machine has its own set of registers. Solution: save and restore hardware state on a context switch. Save the state in
Process Control Block (PCB). What is in PCB? Depends on the hardware.

   1.Registers - almost all machines save registers in PCB.

   2.Processor Status Word.

   3.What about memory? Most machines allow memory from multiple processes     to coexist in the physical memory of the machine. Some may require    Memory Management Unit (MMU) changes on a context switch. But, some    early personal computers switched all of process's memory out to disk
   (!!!).

VIII.Operating Systems are fundamentally event-driven systems - they wait for an event to happen, respond appropriately to the event, then wait for the next event. Examples:

  1.User hits a key. The keystroke is echoed on the screen.

  2.A user program issues a system call to read a file. The operating    system figures out which disk blocks to bring in, and generates a    request to the disk controller to read the disk blocks into memory.

  3.The disk controller finishes reading in the disk block and generates    and interrupt. The OS moves the read data into the user program and  restarts the user program.

  4.A Mosaic or Netscape user asks for a URL to be retrieved. This eventually generates requests to the OS to send request packets out over the network to a remote WWW server. The OS sends the packets.

  5.The response packets come back from the WWW server, interrupting the processor. The OS figures out which process should get the packets, then routes the packets to that process.

  6.Time-slice timer goes off. The OS must save the state of the current  process, choose another process to run, the give the CPU to that  process.

IX.When build an event-driven system with several distinct serial  activities, threads are a key structuring mechanism of the OS.

X. A thread is again an execution stream in the context of a thread state. Key difference between processes and threads is that multiple threads share parts of their state. Typically, allow multiple threads to read and write same memory. (Recall that no processes could directly access memory of
another process). But, each thread still has its own registers. Also has its own stack, but other threads can read and write the stack memory.

XI. What is in a thread control block? Typically just registers. Don't need to do anything to the MMU when switch threads, because all threads can access same memory.

XII.Typically, an OS will have a separate thread for each distinct activity. In particular, the OS will have a separate thread for each process, and that thread will perform OS activities on behalf of the
process. In this case we say that each user process is backed by a kernel thread.

  1.When process issues a system call to read a file, the process's thread will take over, figure out which disk accesses to generate, and issue the low level instructions required to start the transfer. It then suspends until the disk finishes reading in the data.

  2.When process starts up a remote TCP connection, its thread handles the low-level details of sending out network packets.

XIII.Having a separate thread for each activity allows the programmer to program the actions associated with that activity as a single serial stream of actions and events. Programmer does not have to deal with the complexity of interleaving multiple activities on the same thread.

XIV.Why allow threads to access same memory? Because inside OS, threads must coordinate their activities very closely.

  1.If two processes issue read file system calls at close to the same    time, must make sure that the OS serializes the disk requests appropriately.

  2.When one process allocates memory, its thread must find some free    memory and give it to the process. Must ensure that multiple threads allocate disjoint pieces of memory.

 Having threads share the same address space makes it much easier to coordinate activities - can build data structures that represent system state and have threads read and write data structures to figure out what to do when they need to process a request.

XV. One complication that threads must deal with: asynchrony. Asynchronous events happen arbitrarily as the thread is executing, and may interfere with the thread's activities unless the programmer does something to limit the asynchrony. Examples:

 1.An interrupt occurs, transferring control away from one thread to an
interrupt handler.

 2.A time-slice switch occurs, transferring control from one thread to
another.

 3.Two threads running on different processors read and write the same
memory.

XVI.Asynchronous events, if not properly controlled, can lead to incorrect behavior.
Examples:
 1.Two threads need to issue disk requests. First thread starts to program disk controller (assume it is memory-mapped, and must issue multiple writes to specify a disk operation). In the meantime, the second thread runs on a different processor and also issues the memory-mapped writes to program the disk controller. The disk controller gets horribly confused and reads the wrong disk block.

 2.Two threads need to write to the display. The first thread starts to build its request, but before it finishes a time-slice switch occurs and the second thread starts its request. The combination of the two threads issues a forbidden request sequence, and smoke starts pouring out of the display.

 3.For accounting reasons the operating system keeps track of how much time is spent in each user program. It also keeps a running sum of the total amount of time spent in all user programs. Two threads increment their local counters for their processes, then concurrently increment the global
counter. Their increments interfere, and the recorded total time spent in all user processes is less than the sum of the local times.

XVII.So, programmers need to coordinate the activities of the multiple threads so that these bad things don't happen. Key mechanism: synchronization operations. These operations allow threads to control the timing of their events relative to events in other threads. Appropriate use allows programmers to avoid problems like the ones outlined above.

Operating System Potpourri


OS Potpourri

OS Potpourri
When does a process need to access OS functionality? Here are several examples

Reading a file. The OS must perform the file system operations required to read the data off of disk.

Creating a child process. The OS must set stuff up for the child process.

Sending a packet out onto the network. The OS typically handles the network interface.

Why have the OS do these things? Why doesn't the process just do them
directly?

Convenience. Implement the functionality once in the OS and encapsulate it behind an interface that everyone uses. So, processes just deal with the simple interface, and don't have to write complicated low-level code to deal with devices.



Portability. OS exports a common interface typically available on many hardware platforms. Applications do not contain hardware-specific code.

  Protection. If give applications complete access to disk or network or whatever, they can corrupt data from other applications, either maliciously or because of bugs. Having the OS do it eliminates security problems between applications. Of course, applications still have to trust the OS.

How do processes invoke OS functionality? By making a system call. Conceptually, processes call a subroutine that goes off and performs the required functionality. But OS must execute in a different protection domain than the application. Typically, OS executes in supervisor mode, which allows it to do things like manipulate the disk directly.

To switch from normal user mode to supervisor mode, most machines provide a system call instruction. This instruction causes an exception to take place. The hardware switches from user mode to supervisor mode and invokes the exception handler inside the operating system. There is typically some kind of convention that the process uses to interact with the OS.

Let's do an example - the Open system call. System calls typically start out with a normal subroutine call. In this case, when the process wants to open a file, it just calls the Open routine in a system library someplace.


    /* Open the Nachos file "name", and return an "OpenFileId" that can
          * be used to read and write to the file.
          */
         OpenFileId Open(char *name);

Inside the library, the Open subroutine executes a syscall instruction, which generates a system call exception.

         Open:
                 addiu $2,$0,SC_Open
                 syscall
                 j       $31
                 .end Open

By convention, the Open subroutine puts a number (in this case SC_Open) into register 2. Inside the exception handler the OS looks at register 2 to figure out what system call it should perform.

The Open system call also takes a parameter - the address of the character string giving the name of the file to open. By convention, the compiler puts this parameter into register 4 when it generates the code that calls the Open routine in the library. So, the OS looks in that register to find the address of the name of the file to open.

More conventions: succeeding parameters are put into register 5, register 6, etc. Any return values from the system call are put into register 2.

Inside the exception handler, the OS figures out what action to take, performs the action, then returns back to the user program.

There are other kinds of exceptions. For example, if the program attempts to deference a NULL pointer, the hardware will generate an exception. The OS will have to figure out what kind of exception took place and handle it accordingly. Another kind of exception is a divide by 0 fault.

Similar things happen on a interrupt. When an interrupt occurs, the hardware puts the OS into supervisor mode and invokes an interrupt handler. The difference between interrupts and exceptions is that interrupts are generated by external events (the disk IO completes, a new character is
typed at the console, etc.) while exceptions are generated by a running program.

Object file formats. To run a process, the OS must load in an executable file from the disk into memory. What does this file contain? The code to run, any initialized data, and a specification for how much space the uninitialized data takes up. May also be other stuff to help debuggers run,
etc.


The compiler, linker and OS must agree on a format for the executable file.
For example, Nachos uses the following format for executables:

    #define NOFFMAGIC       0xbadfad    /* magic number denoting Nachos
                                                  * object code file
                                                  */
         typedef struct segment {
           int virtualAddr;              /* location of segment in virt
addr space */
·           int inFileAddr;               /* location of segment in this
file */
           int size;                     /* size of segment */
         } Segment;
         typedef struct noffHeader {
            int noffMagic;               /* should be NOFFMAGIC */
            Segment code;                /* executable code segment */
            Segment initData;            /* initialized data segment */
            Segment uninitData;          /* uninitialized data segment --
                                          * should be zero'ed before use
                                          */
         } NoffHeader;

What does the OS do when it loads an executable in?

    Reads in the header part of the executable. Checks to see if the magic number matches.

  Figures out how much space it needs to hold the process. This includes space for the stack, the code, the initialized data and the uninitialized data.

    If it needs to hold the entire process in physical memory, it goes off and finds the physical memory it needs to hold the process.

    It then reads the code segment in from the file to physical memory.     It then reads the initialized data segment in from the file to physical memory.

    It zeros the stack and unintialized memory.How does the operating system do IO? First, we give an overview of how the hardware does IO.

There are two basic ways to do IO - memory mapped IO and programmed IO.  Memory mapped IO - the control registers on the IO device are mapped into the memory space of the processor. The processor controls the device by performing reads and writes to the addresses that the IO device is mapped into.

 Programmed IO - the processor has special IO instructions like IN and OUT. These control the IO device directly.

Writing the low level, complex code to control devices can be a very tricky business. So, the OS encapsulates this code inside things called device drivers. There are several standard interfaces that device drivers present to the kernel. It is the job of the device driver to implement its standard
interface for its device. The rest of the OS can then use this interface and doesn't have to deal with complex IO code.

For example, Unix has a block device driver interface. All block device drivers support a standard set of calls like open, close, read and write. The disk device driver, for example, translates these calls into operations that read and write sectors on the disk.

Typically, IO takes place asynchronously with respect to the processor. So, the processor will start an IO operation (like writing a disk sector), then go off and do some other processing. When the IO operation completes, it interrupts the processor. The processor is typically vectored off to an
interrupt handler, which takes whatever action needs to take place.

Here is how Nachos does IO. Each device presents an interface. For example, the disk interface is in disk.h, and has operations to start a read and write request. When the request completes, the "hardware" invokes the Handle Interrupt method.

Only one thread can use each device at a time. Also, threads typically want to use devices synchronously. So, for example, a thread will perform a disk operation then wait until the disk operation completes. Nachos therefore encapsulates the device interface inside a higher level interface that provides synchronous, synchronized access to the device. For the disk device, this interface is in synchdisk.h. This provides operations to read and write sectors, for example.

Each method in the synchronous interface ensures exclusive access to the IO device by acquiring a lock before it performs any operation on the device.

When the synchronous method gets exclusive access to the device, it performs the operation to start the IO. It then uses a semaphore (P operation) to block until the IO operation completes. When the IO operation completes, it invokes an interrupt handler. This handler performs a V operation on the semaphore to unblock the synchronous method. The synchronous method then releases the lock and returns back to the calling thread.

Introduction To Paging


Introduction To Paging

Introduction To Paging
Basic idea: allocate physical memory to processes in fixed size chunks 
called page frames. Present abstraction to application of a single linear address space. Inside machine, break address space of application up into fixed size chunks called pages. Pages and page frames are same size. Store pages in page frames. When process generates an address, dynamically translate to the physical page frame which holds data for that page.

So, a virtual address now consists of two pieces: a page number and an offset within that page. Page sizes are typically powers of 2; this simplifies extraction of page numbers and offsets. To access a piece of data at a given address, system automatically does the following:

Extracts page number.

Extracts offset.

Translate page number to physical page frame id.



Accesses data at offset in physical page frame.

How does system perform translation? Simplest solution: use a page table. Page table is a linear array indexed by virtual page number that gives the physical page frame that contains that page. What is lookup process?

Extract page number.

Extract offset.

Check that page number is within address space of process.

Look up page number in page table.

Add offset to resulting physical page number

Access memory location.

Problem: for each memory access that processor generates, must now generate two physical memory accesses.

Speed up the lookup problem with a cache. Store most recent page lookup values in TLB. TLB design options: fully associative, direct mapped, set associative, etc. Can make direct mapped larger for a given amount of circuit space.

How does lookup work now?

Extract page number.

Extract offset.

Look up page number in TLB.

If there, add offset to physical page number and access memory location. Otherwise, trap to OS. OS performs check, looks up physical page number, and loads translation into TLB. Restarts the instruction.

Like any cache, TLB can work well, or it can work poorly. What is a good and bad case for a direct mapped TLB? What about fully associative TLBs, or set associative TLB?

Fixed size allocation of physical memory in page frames dramatically simplifies allocation algorithm. OS can just keep track of free and used pages and allocate free pages when a process needs memory. There is no fragmentation of physical memory into smaller and smaller allocatable chunks.

But, are still pieces of memory that are unused. What happens if a program's address space does not end on a page boundary? Rest of page goes unused. Book calls this internal fragmentation.

How do processes share memory? The OS makes their page tables point to the same physical page frames. Useful for fast interprocess communication mechanisms. This is very nice because it allows transparent sharing at speed.

What about protection? There are a variety of protections:

Preventing one process from reading or writing another process' memory.

Preventing one process from reading another process' memory.

Preventing a process from reading or writing some of its own memory.

Preventing a process from reading some of its own memory.

How is this protection integrated into the above scheme?



Preventing a process from reading or writing memory: OS refuses to establish a mapping from virtual address space to physical page frame containing the protected memory. When program attempts to access this memory, OS will typically generate a fault. If user process catches the fault, can take action to fix things up.

Preventing a process from writing memory, but allowing a process to read memory. OS sets a write protect bit in the TLB entry. If process attempts to write the memory, OS generates a fault. But, reads go through just fine.


Virtual Memory Introduction.

When a segmented system needed more memory, it swapped segments out to disk and then swapped them back in again when necessary. Page based systems can do something similar on a page basis.

Basic idea: when OS needs to a physical page frame to store a page, and there are none free, it can select one page and store it out to disk. It can then use the newly free page frame for the new page. Some pragmatic considerations:

In practice, it makes sense to keep a few free page frames. When number of  free pages drops below this threshold, choose a page and store it out. This way, can overlap I/O required to store out a page with computation that uses the newly allocated page frame.

In practice the page frame size usually equals the disk block size. Why?

Do you need to allocate disk space for a virtual page before you swap it out? (Not if always keep one page frame free) Why did BSD do this? At some point OS must refuse to allocate a process more memory because has no swap space. When can this happen? (malloc, stack extension, new process
creation).

When process tries to access paged out memory, OS must run off to the disk, find a free page frame, then read page back off of disk into the page frame and restart process.

What is advantage of virtual memory/paging?

Can run programs whose virtual address space is larger than physical memory. In effect, one process shares physical memory with itself. Can also flexibly share machine between processes whose total address space sizes exceed the physical memory size.

Supports a wide range of user-level stuff - See Li and Appel paper.

Disadvantages of VM/paging: extra resource consumption.

Memory overhead for storing page tables. In extreme cases, page table may take up a significant portion of virtual memory. One Solution: page the page table. Others: go to a more complicated data structure for storing virtual to physical translations.

Translation overhead. 

Introduction To Memory Management


Introduction To Memory Management

Introduction To Memory Management

Point of memory management algorithms - support sharing of main memory. We will focus on having multiple processes sharing the same physical memory.
Key issues:

Protection. Must allow one process to protect its memory from access by other processes.

Naming. How do processes identify shared pieces of memory.

Transparency. How transparent is sharing. Does user program have to manage
anything explicitly?

Efficiency. Any memory management strategy should not impose too much of a
performance burden.

Why share memory between processes? Because want to multiprogram the processor. To time share system, to overlap computation and I/O. So, must provide for multiple processes to be resident in physical memory at the same time. Processes must share the physical memory.



Historical Development.

For first computers, loaded one program onto machine and it executed to completion. No sharing required. OS was just a subroutine library, and there was no protection. What addresses does program generate?

Desire to increase processor utilization in the face of long I/O delays drove the adoptation of multiprogramming. So, one process runs until it does I/O, then OS lets another process run. How do processes share memory? Alternatives:

Load both processes into memory, then switch between them under OS control. Must relocate program when load it. Big Problem: Protection. A bug in one process can kill the other process. MS-DOS, MS-Windows use this strategy.

Copy entire memory of process to disk when it does I/O, then copy back when it restarts. No need to relocate when load. Obvious performance problems. Early version of Unix did this.

Do access checking on each memory reference. Give each program a piece of memory that it can access, and on every memory reference check that it stays within its address space. Typical mechanism: base and bounds registers. Where is check done? Answer: in hardware for speed. When OS runs process, loads the base and bounds registers for that process. Cray-1 did this. Note: there is now a translation process. Program generates virtual addresses that get translated into physical addresses. But, no longer have a protection problem: one process cannot access another's memory, because it is outside its address space. If it tries to access it, the hardware will generate an exception.

End up with a model where physical memory of machine is dynamically allocated to processes as they enter and exit the system. Variety of allocation strategies: best fit, first fit, etc. All suffer from external fragmentation. In worst case, may have enough memory free to load a process, but can't use it because it is fragmented into little pieces. What if cannot find a space big enough to run a process? Either because of  fragmentation or because physical memory is too small to hold all address spaces. Can compact and relocate processes (easy with base and bounds hardware, not so easy for direct physical address machines). Or, can swap a process out to disk then restore when space becomes available. In both cases incur copying overhead. When move process within memory, must copy
between memory locations. When move to disk, must copy back and forth to disk.

One way to avoid external fragmentation: allocate physical memory to processes in fixed size chunks called page frames. Present abstraction to application of a single linear address space. Inside machine, break address space of application up into fixed size chunks called pages. Pages and page frames are same size. Store pages in page frames. When process generates an address, dynamically translate to the physical page frame which holds data for that page.

So, a virtual address now consists of two pieces: a page number and an offset within that page. Page sizes are typically powers of 2; this simplifies extraction of page numbers and offsets. To access a piece of data at a given address, system automatically does the following:

Extracts page number.

Extracts offset.

Translate page number to physical page frame id.

Accesses data at offset in physical page frame.

How does system perform translation? Simplest solution: use a page table.Page table is a linear array indexed by virtual page number that gives the physical page frame that contains that page. What is lookup process?

Extract page number.

Extract offset.

Check that page number is within address space of process.

Look up page number in page table.

Add offset to resulting physical page number

Access memory location.

With paging, still have protection. One process cannot access a piece of physical memory unless its page table points to that physical page. So, if the page tables of two processes point to different physical pages, the processes cannot access each other's physical memory.

Fixed size allocation of physical memory in page frames dramatically simplifies allocation algorithm. OS can just keep track of free and used pages and allocate free pages when a process needs memory. There is no fragmentation of physical memory into smaller and smaller allocate chunks.

But, are still pieces of memory that are unused. What happens if a program's address space does not end on a page boundary? Rest of page goes unused. This kind of memory loss is called internal fragmentation.

Operating System Dead Lock


Dead Lock

You may need to write code that acquires more than one lock. This opens up the possibility of deadlock. Consider the following piece of code:

         Lock *l1, *l2;
         void p() {
           l1->Acquire();
           l2->Acquire();
           code that manipulates data that l1 and l2 protect
           l2->Release();
           l1->Release();
         }
         void q() {
           l2->Acquire();
           l1->Acquire();
           code that manipulates data that l1 and l2 protect
           l1->Release();
           l2->Release();
         }



If p and q execute concurrently, consider what may happen. First, p acquires l1 and q acquires l2. Then, p waits to acquire l2 and q waits to acquire l1. How long will they wait? Forever.

This case is called deadlock. What are conditions for deadlock?

Mutual Exclusion: Only one thread can hold lock at a time.Hold and Wait: At least one thread holds a lock and is waiting for another process to release a lock.

No preemption: Only the process holding the lock can release it. Circular Wait: There is a set t1, ..., tn such that t1 is waiting for a lock held by t2, ..., tn is waiting for a lock held by t1.

How can p and q avoid deadlock? Order the locks, and always acquire the locks in that order. Eliminates the circular wait condition. Occasionally you may need to write code that needs to acquire locks in different orders. Here is what to do in this situation.

First, most locking abstractions offer an operation that tries to acquire the lock but returns if it cannot. We will call this operation TryAcquire. Use this operation to try to acquire the lock that you need to acquire out of order.

If the operation succeeds, fine. Once you've got the lock, there is no problem.If the operation fails, your code will need to release all locks that come after the lock you are trying to acquire. Make sure the associated data structures are in a state where you can release the locks without crashing
the system.

Release all of the locks that come after the lock you are trying to acquire, then reacquire all of the locks in the right order. When the code resumes, bear in mind that the data structures might have changed between the time when you released and reacquired the lock.

Here is an example.

         int d1, d2;
         // The standard acquisition order for these two locks
         // is l1, l2.
         Lock *l1, // protects d1
              *l2; // protects d2
         // Decrements d2, and if the result is 0, increments d1
         void increment() {
           l2->Acquire();
           int t = d2;
           t--;
           if (t == 0) {
             if (l1->TryAcquire()) {
               d1++;
             } else {
               // Any modifications to d2 go here - in this case none
               l2->Release();
               l1->Acquire();
               l2->Acquire();
               t = d2;
              t--;
               // some other thread may have changed d2 - must recheck it
               if (t == 0) {
                 d1++;
               }
             }
             l1->Release();
           }
           d2 = t;
           l2->Release();
         }

This example is somewhat contrived, but you will recognize the situation when it occurs.

There is a generalization of the deadlock problem to situations in which processes need multiple resources, and the hardware may have multiple kinds of each resource - two printers, etc. Seems kind of like a batch model - processes request resources, then system schedules process to run when
resources are available.

In this model, processes issue requests to OS for resources, and OS decides who gets which resource when. A lot of theory built up to handle this situation.

Process first requests a resource, the OS issues it and the process uses the resource, then the process releases the resource back to the OS.

Reason about resource allocation using resource allocation graph. Each resource is represented with a box, each process with a circle, and the individual instances of the resources with dots in the boxes. Arrows go from processes to resource boxes if the process is waiting for the resource. Arrows go from dots in resource box to processes if the process holds that instance of the resource.

If graph contains no cycles, is no deadlock. If has a cycle, may or may not have deadlock.System can either Restrict the way in which processes will request resources to prevent deadlock.

Require processes to give advance information about which resources they will require, then use algorithms that schedule the processes in a way that avoids deadlock.

Detect and eliminate deadlock when it occurs. First consider prevention. Look at the deadlock conditions listed above. Mutual Exclusion - To eliminate mutual exclusion, allow everybody to use
the resource immediately if they want to. Unrealistic in general - do you want your printer output interleaved with someone elses?

Hold and Wait. To prevent hold and wait, ensure that when a process requests resources, does not hold any other resources. Either asks for all resources before executes, or dynamically asks for resources in chunks as needed, then releases all resources before asking for more. Two problems -
processes may hold but not use resources for a long time because they will eventually hold them. Also, may have starvation. If a process asks for lots of resources, may never run because other processes always hold some subset of the resources.

Circular Wait. To prevent circular wait, order resources and require processes to request resources in that order. Deadlock avoidance. Simplest algorithm - each process tells max number of resources it will ever need. As process runs, it requests resources but never exceeds max number of resources. System schedules processes and allocates resoures in a way that ensures that no deadlock results.

Example: system has 12 tape drives. System currently running P0 needs max 10 has 5, P1 needs max 4 has 2, P2 needs max 9 has 2.

Can system prevent deadlock even if all processes request the max? Well, right now system has 3 free tape drives. If P1 runs first and completes, it will have 5 free tape drives. P0 can run to completion with those 5 free tape drives even if it requests max. Then P2 can complete. So, this schedule will execute without deadlock.

If P2 requests two more tape drives, can system give it the drives? No, because cannot be sure it can run all jobs to completion with only 1 free drive. So, system must not give P2 2 more tape drives until P1 finishes. If  P2 asks for 2 tape drives, system suspends P2 until P1 finishes.

Concept: Safe Sequence. Is an ordering of processes such that all processes can execute to completion in that order even if all request maximum resources. Concept: Safe State - a state in which there exists a safe sequence. Deadlock avoidance algorithms always ensure that system stays in a safe state.

How can you figure out if a system is in a safe state? Given the current and maximum allocation, find a safe sequence. System must maintain some  information about the resources and how they are used. See OSC 7.5.3.

         Avail[j] = number of resource j available
         Max[i,j] = max number of resource j that process i will use
         Alloc[i,j] = number of resource j that process i currently has
         Need[i,j] = Max[i,j] - Alloc[i,j]

Notation: A<=B if for all processes i, A[i]<=B[i].

Safety Algorithm: will try to find a safe sequence. Simulate evolution of  system over time under worst case assumptions of resource demands.

         1: Work = Avail;
            Finish[i] = False for all i;
         2: Find i such that Finish[i] = False and Need[i] <= Work
            If no such i exists, goto 4
         3: Work = Work + Alloc[i]; Finish[i] = True; goto 2
         4: If Finish[i] = True for all i, system is in a safe state

Now, can use safety algorithm to determine if we can satisfy a given resource demand. When a process demands additional resources, see if can give them to process and remain in a safe state. If not, suspend process until system can allocate resources and remain in a safe state. Need an
additional data structure:

         Request[i,j] = number of j resources that process i requests

Here is algorithm. Assume process i has just requested additional resources.

         1: If Request[i] <= Need[i] goto 2. Otherwise, process has
            violated its maximum resource claim.
         2: If Request[i] <= Avail goto 3. Otherwise, i must wait
            because resources are not available.
         3: Pretend to allocate resources as follows:
            Avail = Avail - Request[i]
            Alloc[i] = Alloc[i] + Request[i]
            Need[i] = Need[i] - Request[i]
            If this is a safe state, give the process the resources.Otherwise, suspend the process and                    restore the old state.

When to check if a suspended process should be given the resources and resumed? Obvious choice - when some other process relinquishes its resources. Obvious problem - process starves because other processes with lower resource requirements are always taking freed resources.

See Example in Section 7.5.3.3.

Third alternative: deadlock detection and elimination. Just let deadlock happen. Detect when it does, and eliminate the deadlock by preempting resources.

Here is deadlock detection algorithm. Is very similar to safe state detection algorithm.

         1: Work = Avail;
            Finish[i] = False for all i;
         2: Find i such that Finish[i] = False and Request[i] <= Work
            If no such i exists, goto 4
         3: Work = Work + Alloc[i]; Finish[i] = True; goto 2
         4: If Finish[i] = False for some i, system is deadlocked.
         Moreover, Finish[i] = False implies that process i is deadlocked.

When to run deadlock detection algorithm? Obvious time: whenever a process requests more resources and suspends. If deadlock detection takes too much time, maybe run it less frequently.

OK, now you've found a deadlock. What do you do? Must free up some resources so that some processes can run. So, preempt resources - take them away from processes. Several different preemption cases:

Can preempt some resources without killing job - for example, main memory. Can just swap out to disk and resume job later.

If job provides rollback points, can roll job back to point before acquired resources. At a later time, restart job from rollback point. Default rollback point - start of job.

For some resources must just kill job. All resources are then free. Can either kill processes one by one until your system is no longer deadlocked. Or, just go ahead and kill all deadlocked processes.

In a real system, typically use different deadlock strategies for CPU Scheduling different situations based on resource characteristics.

This whole topic has a sort of 60's and 70's batch mainframe feel to it. How come these topics never seem to arise in modern Unix systems?

Thread Creation, Manipulation and Synchronization


Thread Creation, Manipulation and Synchronization

We first must postulate a thread creation and manipulation interface.
Will use the one in Nachos:

         class Thread {
           public:
             Thread(char* debugName);
             ~Thread();
             void Fork(void (*func)(int), int arg);
             void Yield();
             void Finish();
         }

The Thread constructor creates a new thread. It allocates a data structure with space for the TCB.

To actually start the thread running, must tell it what function to start running when it runs. The Fork method gives it the function and a parameter to the function.

What does Fork do? It first allocates a stack for the thread. It then sets up the TCB so that when the thread starts running, it will invoke the function and pass it the correct parameter. It then puts the thread on a run queue someplace. Fork then returns, and the thread that called Fork continues.



How does OS set up TCB so that the thread starts running at the function? First, it sets the stack pointer in the TCB to the stack. Then, it sets the PC in the TCB to be the first instruction in the function. Then, it sets the register in the TCB holding the first parameter to the parameter. When
the thread system restores the state from the TCB, the function will magically start to run.

The system maintains a queue of runnable threads. Whenever a processor becomes idle, the thread scheduler grabs a thread off of the run queue and runs the thread.

Conceptually, threads execute concurrently. This is the best way to reason about the behavior of threads. But in practice, the OS only has a finite number of processors, and it can't run all of the runnable threads at once. So, must multiplex the runnable threads on the finite number of processors.

Let's do a few thread examples. First example: two threads that increment a
variable.

         int a = 0;
         void sum(int p) {
         a++;
          printf("%d : a = %d\n", p, a);
         }
         void main() {
           Thread *t = new Thread("child");
           t->Fork(sum, 1);
           sum(0);
         }

The two calls to sum run concurrently. What are the possible results of the program? To understand this fully, we must break the sum subroutine up into its primitive components.

sum first reads the value of a into a register. It then increments the register, then stores the contents of the register back into a. It then reads the values of of the control string, p and a into the registers that
it uses to pass arguments to the printf routine. It then calls printf, which prints out the data.

The best way to understand the instruction sequence is to look at the generated assembly language (cleaned up just a bit). You can have the compiler generate assembly code instead of object code by giving it the -S flag. It will put the generated assembly in the same file name as the .c or

.cc file, but with a .s suffix.

                la      a, %r0
                 ld      [%r0],%r1
                 add     %r1,1,%r1
                 st      %r1,[%r0]
       
                 ld      [%r0], %o3 ! parameters are passed starting with
%o0
                 mov     %o0, %o1
                 la      .L17, %o0
                 call    printf

So when execute concurrently, the result depends on how the instructions
interleave. What are possible results?

         0 : 1                                      0 : 1
         1 : 2                                      1 : 1
       
         1 : 2                                      1 : 1
         0 : 1                                      0 : 1
       
         1 : 1                                      0 : 2
         0 : 2                                      1 : 2
       
         0 : 2                                      1 : 2
         1 : 1                                      0 : 2

So the results are nondeterministic - you may get different results when you run the program more than once. So, it can be very difficult to reproduce bugs. Nondeterministic execution is one of the things that makes writing parallel programs much more difficult than writing serial programs.

Chances are, the programmer is not happy with all of the possible results listed above. Probably wanted the value of a to be 2 after both threads finish. To achieve this, must make the increment operation atomic. That is, must prevent the interleaving of the instructions in a way that would
interfere with the additions.

Concept of atomic operation. An atomic operation is one that executes without any interference from other operations - in other words, it executes as one unit. Typically build complex atomic operations up out of sequences of primitive operations. In our case the primitive operations are the individual machine instructions.

More formally, if several atomic operations execute, the final result is guaranteed to be the same as if the operations executed in some serial order.

In our case above, build an increment operation up out of loads, stores and add machine instructions. Want the increment operation to be atomic.

Use synchronization operations to make code sequences atomic. First synchronization abstraction: semaphores. A semaphore is, conceptually, a counter that supports two atomic operations, P and V. Here is the Semaphore interface from Nachos:

        class Semaphore {
          public:
             Semaphore(char* debugName, int initialValue);     
             ~Semaphore();                                   
             void P();
             void V();
         }

Here is what the operations do:

Semphore(name, count) : creates a semaphore and initializes the counter to count.

   P() : Atomically waits until the counter is greater than 0, then decrements the counter and returns.

   V() : Atomically increments the counter.

Here is how we can use the semaphore to make the sum example work:

         int a = 0;
         Semaphore *s;
         void sum(int p) {
           int t;
           s->P();
           a++;
           t = a;
           s->V();
           printf("%d : a = %d\n", p, t);
         }
         void main() {
           Thread *t = new Thread("child");
           s = new Semaphore("s", 1);
           t->Fork(sum, 1);
           sum(0);
         }

We are using semaphores here to implement a mutual exclusion mechanism. The idea behind mutual exclusion is that only one thread at a time should be allowed to do something. In this case, only one thread should access a. Use mutual exclusion to make operations atomic. The code that performs the
atomic operation is called a critical section.

Semaphores do much more than mutual exclusion. They can also be used to synchronize producer/consumer programs. The idea is that the producer is generating data and the consumer is consuming data. So a Unix pipe has a producer and a consumer. You can also think of a person typing at a keyboard as a producer and the shell program reading the characters as a consumer.

Here is the synchronization problem: make sure that the consumer does not get ahead of the producer. But, we would like the producer to be able to produce without waiting for the consumer to consume. Can use semaphores to do this. Here is how it works:

         Semaphore *s;
         void consumer(int dummy) {
           while (1) {
             s->P();
             consume the next unit of data
           }
         }
         void producer(int dummy) {
           while (1) {
             produce the next unit of data
             s->V();
           }
         }
         void main() {
           s = new Semaphore("s", 0);
           Thread *t = new Thread("consumer");
           t->Fork(consumer, 1);
           t = new Thread("producer");
           t->Fork(producer, 1);
         }

In some sense the semaphore is an abstraction of the collection of data. In the real world, pragmatics intrude. If we let the producer run forever and never run the consumer, we have to store all of the produced data somewhere. But no machine has an infinite amount of storage. So, we want to
let the producer to get ahead of the consumer if it can, but only a given amount ahead. We need to implement a bounded buffer which can hold only N items. If the bounded buffer is full, the producer must wait before it can put any more data in.

         Semaphore *full;
         Semaphore *empty;
         void consumer(int dummy) {
           while (1) {
             full->P();
             consume the next unit of data
             empty->V();
           }
         }
         void producer(int dummy) {
           while (1) {
             empty->P();
            produce the next unit of data
             full->V();
           }
         }
         void main() {
           empty = new Semaphore("empty", N);
           full = new Semaphore("full", 0);
           Thread *t = new Thread("consumer");
           t->Fork(consumer, 1);
           t = new Thread("producer");
           t->Fork(producer, 1);
         }

An example of where you might use a producer and consumer in an operating system is the console (a device that reads and writes characters from and to the system console). You would probably use semaphores to make sure you don't try to read a character before it is typed.

Semaphores are one synchronization abstraction. There is another called locks and condition variables. Locks are an abstraction specifically for mutual exclusion only. Here is the Nachos lock interface:

         class Lock {
           public:
             Lock(char* debugName);   // initialize lock to be FREE
             ~Lock();                // deallocate lock
             void Acquire(); // these are the only operations on a lock
             void Release(); // they are both *atomic*
         }

A lock can be in one of two states: locked and unlocked. Semantics of lock operations:
  Lock(name) : creates a lock that starts out in the unlocked state.

  Acquire() : Atomically waits until the lock state is unlocked, then sets  the lock state to locked.
  Release() : Atomically changes the lock state to unlocked from locked. In assignment 1 you will implement locks in Nachos on top of semaphores.

What are requirements for a locking implementation?

 Only one thread can acquire lock at a time. (safety)

If multiple threads try to acquire an unlocked lock, one of the threads will get it. (liveness)

All unlocks complete in finite time. (liveness)

What are desirable properties for a locking implementation?

Efficiency: take up as little resources as possible.

Fairness: threads acquire lock in the order they ask for it. Are also
 weaker forms of fairness.

Simple to use.

When use locks, typically associate a lock with pieces of data that multiple threads access. When one thread wants to access a piece of data, it first acquires the lock. It then performs the access, then unlocks the lock. So, the lock allows threads to perform complicated atomic operations on each piece of data. Can you implement unbounded buffer only using locks? There is a problem - if the consumer wants to consume a piece of data before the producer produces the data, it must wait. But locks do not allow the consumer to wait until the producer produces the data. So, consumer must loop until the data is ready. This is bad because it wastes CPU resources.

There is another synchronization abstraction called condition variables just for this kind of situation. Here is the Nachos interface:

         class Condition {
           public:
             Condition(char* debugName);
             ~Condition();
             void Wait(Lock *conditionLock);
             void Signal(Lock *conditionLock);
             void Broadcast(Lock *conditionLock);
         }

Semantics of condition variable operations:

    Condition(name) : creates a condition variable.

    Wait(Lock *l) : Atomically releases the lock and waits. When Wait returns the lock will have been reacquired.

    Signal(Lock *l) : Enables one of the waiting threads to run. When Signal returns the lock is still acquired.

    Broadcast(Lock *l) : Enables all of the waiting threads to run. When Broadcast returns the lock is still acquired.

All locks must be the same. In assignment 1 you will implement condition variables in Nachos on top of semaphores.

Typically, you associate a lock and a condition variable with a data structure. Before the program performs an operation on the data structure, it acquires the lock. If it has to wait before it can perform the operation, it uses the condition variable to wait for another operation to bring the data structure into a state where it can perform the operation. In some cases you need more than one condition variable. Let's say that we want to implement an unbounded buffer using locks and
condition variables. In this case we have 2 consumers.

         Lock *l;
         Condition *c;
         int avail = 0;
         void consumer(int dummy) {
           while (1) {
             l->Acquire();
             if (avail == 0) {
               c->Wait(l);
             }
             consume the next unit of data
             avail--;
             l->Release();
           }
         }
         void producer(int dummy) {
           while (1) {
             l->Acquire();
             produce the next unit of data
             avail++;
             c->Signal(l);
             l->Release();
           }
         }
         void main() {
           l = new Lock("l");
           c = new Condition("c");
           Thread *t = new Thread("consumer");
           t->Fork(consumer, 1);
           Thread *t = new Thread("consumer");
           t->Fork(consumer, 2);
           t = new Thread("producer");
           t->Fork(producer, 1);
         }
There are two variants of condition variables: Hoare condition variables and Mesa condition variables. For Hoare condition variables, when one thread performs a Signal, the very next thread to run is the waiting thread. For Mesa condition variables, there are no guarantees when the
signalled thread will run. Other threads that acquire the lock can execute between the signaller and the waiter. The example above will work with Hoare condition variables but not with Mesa condition variables.

What is the problem with Mesa condition variables? Consider the following scenario: Three threads, thread 1 one producing data, threads 2 and 3 consuming data.

 Thread 2 calls consumer, and suspends.     Thread 1 calls producer, and signals thread 2.

 Instead of thread 2 running next, thread 3 runs next, calls consumer, and consumes the element. (Note: with Hoare monitors, thread 2 would always run next, so this would not happen.)

Thread 2 runs, and tries to consume an item that is not there. Depending on the data structure used to store produced items, may get some kind of illegal access error.

How can we fix this problem? Replace the if with a while.

         void consumer(int dummy) {
           while (1) {
             l->Acquire();
             while (avail == 0) {
               c->Wait(l);
             }
             consume the next unit of data
             avail--;
             l->Release();
           }
         }

In general, this is a crucial point. Always put while's around your condition variable code. If you don't, you can get really obscure bugs that show up very infrequently.

In this example, what is the data that the lock and condition variable are associated with? The avail variable. People have developed a programming abstraction that automatically associates locks and condition variables with data. This abstraction is called a monitor. A monitor is a data structure plus a set of operations (sort of like an abstract data type). The monitor also has a lock and, optionally, one or more condition variables. See notes for Lecture 14.

The compiler for the monitor language automatically inserts a lock operation at the beginning of each routine and an unlock operation at the end of the routine. So, programmer does not have to put in the lock operations. Monitor languages were popular in the middle 80's - they are in some sense
safer because they eliminate one possible programming error. But more recent languages have tended not to support monitors explicitly, and expose the locking operations to the programmer. So the programmer has to insert the lock and unlock operations by hand. Java takes a middle ground - it
supports monitors, but also allows programmers to exert finer grain control over the locked sections by supporting synchronized blocks within methods. But synchronized blocks still present a structured model of synchronization, so it is not possible to mismatch the lock acquire and release.

Laundromat Example: A local laudromat has switched to a computerized machine allocation scheme. There are N machines, numbered 1 to N. By the front door there are P allocation stations. When you want to wash your clothes, you go to an allocation station and put in your coins. The allocation station gives you a number, and you use that machine. There are also P deallocation stations. When your clothes finish, you give the number back to one of the deallocation stations, and someone else can use the machine. Here is the alpha release of the machine allocation software:


         allocate(int dummy) {
           while (1) {
             wait for coins from user
             n = get();
             give number n to user
           }
         }
         deallocate(int dummy) {
           while (1) {
             wait for number n from user
             put(i);
           }
         }
         main() {
           for (i = 0; i < P; i++) {
             t = new Thread("allocate");
             t->Fork(allocate, 0);
             t = new Thread("deallocate");
             t->Fork(deallocate, 0);
           }
         }
The key parts of the scheduling are done in the two routines get and put,which use an array data structure a to keep track of which machines are in use and which are free.

         int a[N];
         int get() {
           for (i = 0; i < N; i++) {
             if (a[i] == 0) {
               a[i] = 1;
               return(i+1);
             }
           }
         }
         void put(int i) {
           a[i-1] = 0;
         }

It seems that the alpha software isn't doing all that well. Just looking at the software, you can see that there are several synchronization problems.

The first problem is that sometimes two people are assigned to the same
machine. Why does this happen? We can fix this with a lock:

         int a[N];
         Lock *l;
         int get() {
           l->Acquire();
           for (i = 0; i < N; i++) {
             if (a[i] == 0) {
               a[i] = 1;
               l->Release();
               return(i+1);
             }
           }
           l->Release();
         }
         void put(int i) {
           l->Acquire();
           a[i-1] = 0;
           l->Release();
         }

So now, have fixed the multiple assignment problem. But what happens if someone comes in to the laundry when all of the machines are already taken? What does the machine return? Must fix it so that the system waits until there is a machine free before it returns a number. The situation calls for
condition variables.


int a[N];
Lock *l;
Condition *c;
int get() {
  l->Acquire();
  while (1) {
    for (i = 0; i < N; i++) {
      if (a[i] == 0) {
        a[i] = 1;
        l->Release();
        return(i+1);
      }
    }
    c->Wait(l);
  }
}
void put(int i) {
  l->Acquire();
  a[i-1] = 0;
  c->Signal();
  l->Release();
}

What data is the lock protecting? The a array.

When would you use a broadcast operation? Whenever want to wake up all waiting threads, not just one. For an event that happens only once. For example, a bunch of threads may wait until a file is deleted. The thread that actually deleted the file could use a broadcast to wake up all of the threads.

Also use a broadcast for allocation/deallocation of variable sized units.
Example: concurrent malloc/free.

         Lock *l;
         Condition *c;
         char *malloc(int s) {
           l->Acquire();
           while (cannot allocate a chunk of size s) {
             c->Wait(l);
           }
           allocate chunk of size s;
           l->Release();
           return pointer to allocated chunk
         }
         void free(char *m) {
           l->Acquire();
           deallocate m.
           c->Broadcast(l);
           l->Release();
         }

Example with malloc/free. Initially start out with 10 bytes free.

Time Process 1 Process 2 Process 3
malloc(10) - succeeds malloc(5) - suspends lock malloc(5) suspends lock
1 gets lock - waits
2 gets lock - waits
3 free(10) - broadcast
4 resume malloc(5) - succeeds
5 resume malloc(5) - succeeds
6 malloc(7) - waits
7 malloc(3) - waits
8 free(5) - broadcast
9 resume malloc(7) - waits
10 resume malloc(3) - succeeds



What would happen if changed c->Broadcast(l) to c->Signal(l)? At step 10, process 3 would not wake up, and it would not get the chance to allocate available memory. What would happen if changed while loop to an if?

You will be asked to implement condition variables as part of assignment
1. The following implementation is INCORRECT. Please do not turn this
implementation in.

         class Condition {
           private:
             int waiting;
             Semaphore *sema;
         }
         void Condition::Wait(Lock* l)
         {
           waiting++;
           l->Release();
           sema->P();
           l->Acquire();
         }
         void Condition::Signal(Lock* l)
         {
           if (waiting > 0) {
             sema->V();
             waiting--;
           }
         }

Why is this solution incorrect? Because in some cases the signalling thread may wake up a waiting thread that called Wait after the signalling thread called Signal.