Naming II – Structured naming

Flat vs Structured Naming

  • Flat names: convenient for machines, but not for humans.
  • Alternative: with support for structured names, you can end up with names that are human-readable.
    • A familiar example is file naming in an operating system: with a file system, we use the same approach to name local and remote files.
  • Next: explore how structured names are handled and how they are resolved to addresses.

Namespaces

  • A namespace is the collection of all valid names – anything you access is a resource, and it should stick to a
  • When you use a name service, the server requires names to come from its namespace.
  • A namespace for structured names can be represented as a directed graph with two types of nodes:
    • Leaf nodes, which have no outgoing edges and represent a named entity.
      • A leaf node stores attributes of the entity (such as its address) or the state of the entity itself.
    • Directory nodes, which have labelled outgoing edges, each pointing to another node.
      • Each node in the graph is itself an entity.
      • A directory node stores a directory table with one (node identifier, edge label) pair per outgoing edge.
  • Each path in the naming graph is identified by the sequence of labels on its edges, written N:{L1, L2, ..., Ln}, where N is the first node of the path.
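
The two node types and the N:{L1, ..., Ln} notation can be sketched in a few lines of Python. This is only an illustration: the `Node` class, the `resolve` function, and the three-node graph are made-up names for this example, not part of any real name service.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    identifier: str
    attributes: dict = field(default_factory=dict)  # leaf: e.g. an address
    table: dict = field(default_factory=dict)       # directory: edge label -> node identifier

    def is_leaf(self):
        # A leaf node has no outgoing edges, so its directory table is empty.
        return not self.table

def resolve(graph, start, labels):
    """Resolve the path start:{labels} by following one labelled edge per step."""
    node = graph[start]
    for label in labels:
        node = graph[node.table[label]]
    return node

# Hypothetical graph: r --docs--> d --readme--> f
graph = {
    "r": Node("r", table={"docs": "d"}),
    "d": Node("d", table={"readme": "f"}),
    "f": Node("f", attributes={"address": "block-17"}),
}

target = resolve(graph, "r", ["docs", "readme"])
print(target.identifier, target.is_leaf(), target.attributes["address"])
```

Because the start node is an explicit argument, the same function can resolve a path from any node, for example `resolve(graph, "d", ["readme"])`.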

Example: A File System

  • In the figure, red nodes are directory nodes and green nodes are leaf nodes; the difference is whether the node has outgoing edges.
  • Resolution follows the edges until it reaches a leaf node.
  • An important distinction is between absolute and relative pathnames:
    • Absolute pathname: the first node in the pathname is the root of the graph, e.g. d1-d2-d3-l4.
    • Relative pathname: the pathname starts at some other node, e.g. d3-l3. It cannot be resolved on its own, so you need a mechanism to reach d3 first (here you have to go through d2).
  • In the end, every directory node contains (node identifier, edge label) pairs.

Naming Graph with a single root node

  • The root node, n0, has only outgoing edges and no incoming edges.
  • n0 points to directory node n1, which stores more data: its directory table has entries for the two leaf nodes n2 and n3 and for directory node n4.
  • Note the importance of the edges and their labels: they are what lets us look up a name.
    • /home/steen/mbox is a path, written as the labels on the edges it follows.
  • The directory table is what makes this work: n1 knows the labels of the edges from n1 to n3, from n1 to n4, and so on.
  • A naming graph like this also tells you what kind of entity you are holding. Usually these are files, each with a path name, and the node identifier can be used to access the entity itself.
    • My mailbox is stored in a directory called 'home', in a subdirectory called 'steen'.
  • /home/steen/keys (n0-n1-n4-n5) leads to n5, which is also directly reachable from n0 as /keys (n0-n5). Two path names for the same node is a hard link; a symbolic link instead stores the target's path name in a leaf node.
  • Q: What is the Unix command to create a symbolic link?
    • A: ln -s creates a symbolic link (plain ln creates a hard link).
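
As a rough sketch, the part of this graph that the notes describe can be written down as directory tables and resolved label by label. Only the nodes on the two paths to n5 are included; the identifier "mbox-leaf" is made up, since the notes do not name that node.

```python
# Directory tables: node identifier -> {edge label: node identifier}.
tables = {
    "n0": {"home": "n1", "keys": "n5"},
    "n1": {"steen": "n4"},
    "n4": {"mbox": "mbox-leaf", "keys": "n5"},  # "mbox-leaf" is a made-up identifier
}

def resolve(path, root="n0"):
    """Resolve an absolute path name such as /home/steen/keys to a node identifier."""
    node = root
    for label in (part for part in path.split("/") if part):
        node = tables[node][label]
    return node

# Both path names end at the same node.
print(resolve("/home/steen/keys"), resolve("/keys"))
```

Each step needs only the directory table of the current node, which is why the labels on the edges matter so much.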

Name Resolution

  • Name resolution: the process of looking up a name.
  • Problem: to resolve a name we need a directory node to start from. So how do you find the initial node?
    • Closure mechanism: in practice, we need a mechanism that selects an implicit context, i.e. a starting point. This may be a pointer to an initial node; name resolution carries on from there.
  • E.g.
    • www.distirbuted.net: your computer passes the host name to a DNS server, which returns the IP address of that server. Knowing to start at the DNS server is the closure mechanism.
      • DNS is the first point of entry, which makes it a crucial one.
    • /home/maaten/mbox: start at the local NFS file server.
    • 0031 20 etc.: 31 is the country code of the Netherlands; after that we look up the rest of the number.
    • 77.167.55.6: figure out what 77 refers to, then look up the rest.
  • → Hence the need for a closure mechanism: a mechanism that tells us where to start.
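
One way to picture a closure mechanism is as a rule that looks only at the form of a name and picks the starting context. The function below is a toy illustration of the four examples above; `initial_context` and the returned labels are invented for this sketch, and real systems hard-wire this choice rather than guessing it from the string.

```python
import ipaddress

def initial_context(name):
    """Return a label for where resolution of `name` would start (illustrative only)."""
    if name.startswith("/"):
        return "local NFS file server"
    if name.startswith("00"):
        return "telephone network"
    try:
        ipaddress.ip_address(name)   # raises ValueError if name is not an IP address
        return "IP routing"
    except ValueError:
        return "DNS name server"

for name in ["www.distirbuted.net", "/home/maaten/mbox", "0031 20", "77.167.55.6"]:
    print(name, "->", initial_context(name))
```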

Closure mechanism

  • Closure mechanism: name resolution is possible only if we know how and where to start, so the closure mechanism deals with selecting the initial node.
  • In UNIX, to find a file on a disk:
    • The first point of entry, the initial node, is the inode (index node) of the root directory.
    • The boot block sits at a known, absolute address at the start of the disk.
    • The index nodes (inodes) give the addresses of the disk blocks that belong to each file.
    • How it all works together: each inode gives the relative address of the first block of its file. This assumes we know where the boot block is, since that fixes where the root node can be found.
      • All these addresses are relative to the start of the boot block itself, which is the absolute address.
  • To find /home/steen/mbox, we first locate the boot block and then the index node that gives us the relative address.
  • → Always think in terms of something absolute first, then relative addresses.
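
On a running Unix system you can observe the inode behind each step of a path with `os.stat`, whose `st_ino` field is the inode number. The sketch below assumes a POSIX system; the helper name `inodes_along` is made up for this example. It walks from the root directory to the current working directory one component at a time, which mirrors how a path name is resolved.

```python
import os

def inodes_along(path):
    """Return (prefix, inode number) for the root and each prefix of an absolute path."""
    path = os.path.abspath(path)
    result = [(os.sep, os.stat(os.sep).st_ino)]   # start at the root directory
    current = os.sep
    for part in (p for p in path.split(os.sep) if p):
        current = os.path.join(current, part)
        result.append((current, os.stat(current).st_ino))
    return result

for prefix, inode in inodes_along(os.getcwd()):
    print(inode, prefix)
```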

Linking

  • We can create aliases for a named entity, corresponding to links in the naming graph
  • An alias can be implemented in two ways:
    • Allow multiple absolute pathnames to refer to the same node → hard links
      • n5 can be referred to by two different pathnames: /keys and /home/steen/keys
    • Let a leaf node for the alias store the absolute pathname of the aliased entity → symbolic link
  • Q: How do we resolve a name that lives in a different namespace?
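
The difference between the two kinds of alias shows up clearly on a real file system. The sketch below uses Python's `os.link` and `os.symlink` in a temporary directory; the file names are arbitrary, and it assumes a POSIX file system that supports both kinds of link.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "keys")
    hard = os.path.join(d, "keys_hard")
    soft = os.path.join(d, "keys_soft")
    with open(target, "w") as f:
        f.write("secret")

    os.link(target, hard)      # hard link: a second name for the same inode
    os.symlink(target, soft)   # symbolic link: a separate node storing the path name

    same_inode = os.stat(target).st_ino == os.stat(hard).st_ino
    stored_path = os.readlink(soft)

    os.remove(target)          # remove the original name
    with open(hard) as f:
        hard_still_readable = f.read() == "secret"
    soft_dangling = not os.path.exists(soft)   # exists() follows the link

print(same_inode, hard_still_readable, soft_dangling)
```

After the original name is removed, the hard link still reaches the data because it names the same node, while the symbolic link only holds a path name that no longer resolves.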

Mounting

  • Mounting merges different namespaces in a transparent way.
  • You have a first namespace and a different one that you want to combine, so that from the first namespace you can access the second one transparently.
  • Terminology
    • Foreign namespace: the namespace that needs to be accessed (the RHS in the diagram).
    • Mount point: the node in the current namespace that contains the node identifier of the foreign namespace.
      • This is the node you use to refer to the foreign namespace.
    • Mounting point: the node in the foreign namespace where name resolution continues.
  • Mounting across a network requires:
    • The name of an access protocol - NFS
    • Name of the server – filts.cs.vu.nl
    • Name of the mounting point in the foreign name space - /home/steen/
  • To do this in a file system, you need a protocol that lets your mount point reach the mounting point, bringing the two namespaces together.
    • You start resolving on the first machine; when resolution reaches the mount point, it continues in the foreign namespace. Once the name is resolved, the result goes back to the original machine.
    • → Resolve the name in the current namespace first, then in the foreign namespace.
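
A minimal sketch of resolution across a mount point, using in-memory dictionaries instead of real machines. The local path /remote/vu, the `MountPoint` class, and `fetch_foreign` are assumptions made for this example; in a real system `fetch_foreign` would speak the access protocol (e.g. NFS) to the named server.

```python
# Foreign namespace as seen by the server: dicts are directory nodes,
# strings are leaf contents.
FOREIGN = {"home": {"steen": {"mbox": "contents of mbox"}}}

class MountPoint:
    """Directory node in the local namespace that refers to a foreign namespace."""
    def __init__(self, protocol, server, mounting_point):
        self.protocol = protocol
        self.server = server
        self.mounting_point = mounting_point

# Hypothetical local namespace with a mount point at /remote/vu.
LOCAL = {"remote": {"vu": MountPoint("nfs", "filts.cs.vu.nl", "/home/steen")}}

def labels(path):
    return [part for part in path.split("/") if part]

def walk(node, parts):
    for part in parts:
        node = node[part]
    return node

def fetch_foreign(mount, rest):
    """Stand-in for the access protocol: continue at the mounting point."""
    start = walk(FOREIGN, labels(mount.mounting_point))
    return walk(start, rest)

def resolve(path):
    node = LOCAL
    parts = labels(path)
    for i, part in enumerate(parts):
        node = node[part]
        if isinstance(node, MountPoint):
            # Hand the unresolved remainder to the foreign namespace.
            return fetch_foreign(node, parts[i + 1:])
    return node

print(resolve("/remote/vu/mbox"))
```

The caller never sees the switch between namespaces, which is the transparency that mounting is meant to provide.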

Namespace implementation

  • In a large-scale system, the implementation of a namespace has to be distributed over multiple name servers.
    • Both the name resolution process and namespace management are spread across multiple machines by distributing the nodes of the naming graph.
  • With DNS, name resolution is distributed by definition, because DNS runs on a large number of machines all over the world.
  • The question is how to distribute the nodes of the naming graph over such a large-scale system.
  • DNS structures the directory nodes of the tree into three levels:
    • Global level: a small number of top-level nodes, managed by different organisations, e.g. nl, uk, org, com.
    • Administrational level: mid-level directory nodes, grouped so that each group can be assigned to a separate administration.
      • E.g. at VU University, the cs department is part of the administrational layer and is managed by its own group of people.
    • Managerial level: a lot happens at this level. It consists of low-level directory nodes and is managed by its own administration. The main issue is mapping directory nodes to local name servers effectively, from the global level down to the managerial level.

Namespace implementation

| Item | Global | Administrational | Managerial |
|---|---|---|---|
| Geographical scale of network | Worldwide | Organization | Department |
| Total number of nodes | Few | Many | Vast numbers |
| Responsiveness to lookups | Seconds | Milliseconds | Immediate |
| Update propagation | Lazy | Immediate | Immediate |
| Number of replicas | Many | None or few | None |
| Is client-side caching applied? | Yes | Yes | Sometimes |

  • Replicas: a replica should be close to the people who do the DNS lookups. For clients in the UK, you'd hope there is a DNS server nearby.

## Name resolution (navigation) methods

Iterative resolution

  • The root server resolves the path name as far as it can and returns the result to the client.
  • The client then passes the remaining path name to the next name server.
    1. The client contacts the root name server and passes the entire name <nl,vu,cs,ftp>. The root server resolves <nl> and returns the address of the <nl> server, leaving <vu,cs,ftp> unresolved.
    2. The client contacts the <nl> server and passes the remaining name <vu,cs,ftp>. It resolves <vu> and returns the address of the <vu> server, leaving <cs,ftp>.
    3. The client contacts the <vu> server and passes <cs,ftp>. It resolves <cs> but not <ftp>, so the address of the <cs> server is sent back to the client.
    4. The client contacts the <cs> server, which resolves <ftp> and returns its address.
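
The client-driven loop can be simulated with a small table of name servers. Everything here is a stand-in: the server names and the final "address-of-ftp" value are placeholders, and `lookup` plays the role of one request over the network.

```python
# Each name server knows only its own labels. A value is either the next
# server to ask or, at the last step, the final address (a placeholder).
SERVERS = {
    "root": {"nl": "nl-server"},
    "nl-server": {"vu": "vu-server"},
    "vu-server": {"cs": "cs-server"},
    "cs-server": {"ftp": "address-of-ftp"},
}

def lookup(server, labels):
    """One request: the server resolves the first label and returns the rest unresolved."""
    return SERVERS[server][labels[0]], labels[1:]

def resolve_iteratively(labels):
    server, contacted = "root", []
    while labels:
        contacted.append(server)          # the client itself contacts every server
        server, labels = lookup(server, labels)
    return server, contacted

address, contacted = resolve_iteratively(["nl", "vu", "cs", "ftp"])
print(address, contacted)
```

The `contacted` list makes the cost visible: the client talks to four servers in turn, and each server only ever handles one label.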

Recursive resolution

  1. The client passes the entire name <nl,vu,cs,ftp> to the root name server.
  2. The root server contacts the <nl> server and asks it to resolve <vu,cs,ftp>.
  3. The <nl> server contacts the <vu> server and asks it to resolve <cs,ftp>.
  4. The <vu> server contacts the <cs> server and asks it to resolve <ftp>.
  5. Because the process is recursive, the results travel back up the chain, and the root name server passes the complete result to the client.
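
In a simulation, recursive resolution becomes a function in which each server forwards the remainder of the name itself. As before, the server names and "address-of-ftp" are placeholders invented for this sketch, and a function call stands in for a request between servers.

```python
SERVERS = {
    "root": {"nl": "nl-server"},
    "nl-server": {"vu": "vu-server"},
    "vu-server": {"cs": "cs-server"},
    "cs-server": {"ftp": "address-of-ftp"},
}

def resolve_recursively(server, labels, trace):
    """`server` resolves the first label, then asks the next server for the rest."""
    trace.append(server)
    next_hop = SERVERS[server][labels[0]]
    if len(labels) == 1:
        return next_hop                    # last label: this is the final address
    # The server, not the client, forwards the remaining labels.
    return resolve_recursively(next_hop, labels[1:], trace)

trace = []
address = resolve_recursively("root", ["nl", "vu", "cs", "ftp"], trace)
print(address, trace)
```

The client makes a single call, to the root; the return values then pass back up through every server on the path, which is where each of them gets the chance to cache what it learned.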

Which is better?

  • Recursive: puts a high performance demand on every name server along the path.
    • So servers that need high throughput may support only iterative resolution: they resolve their own part of the name and leave the rest to the client.
  • Caching
    • Recursive: caching results is more effective, because each name server gradually learns the addresses of the name servers responsible for the lower-level nodes. → Caching can enhance performance.
      • Hence, lookup operations can be handled efficiently.
    • Iterative: caching is limited to the client's name resolver.
    • Compromise: many organisations run a local name server shared by all clients, which handles all naming requests and caches the results.
  • Communication
    • Recursive: the client only has to contact nl once; the rest of the communication happens within nl, between the name servers.
    • Iterative: the client's host has to communicate separately with the nl, vu, and cs servers, so the total cost may be about three times that of recursive resolution.
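
The compromise of a shared local name server can be sketched as a thin caching layer in front of whatever does the real resolution. `LocalNameServer` and `fake_upstream` are hypothetical names for this example, and the cache never expires entries, which a real server would have to handle.

```python
class LocalNameServer:
    """Shared by all clients of an organisation; caches results of earlier lookups."""

    def __init__(self, resolve_upstream):
        self.resolve_upstream = resolve_upstream
        self.cache = {}
        self.upstream_calls = 0

    def resolve(self, name):
        if name not in self.cache:
            # Cache miss: do the real (expensive) resolution once.
            self.upstream_calls += 1
            self.cache[name] = self.resolve_upstream(name)
        return self.cache[name]

def fake_upstream(name):
    # Placeholder for iterative or recursive resolution over the network.
    return f"address-of-{name}"

server = LocalNameServer(fake_upstream)
first = server.resolve("ftp.cs.vu.nl")    # first client: goes upstream
second = server.resolve("ftp.cs.vu.nl")   # second client: served from the cache
print(first, server.upstream_calls)
```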

Scalability Issues: communication costs

  • Name servers are usually reached over a WAN, with low bandwidth and high latency.
  • So we have to compare how recursive and iterative resolution use that link.
  • Recursive: the client only connects once (R1); resolution then jumps to vu and cs without involving the client.
  • Iterative: the client's host has to communicate separately with the nl, vu, and cs servers, so the total cost may be about three times that of recursive resolution.
  • Think about the communication cost and whether it will scale in a large distributed system.
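
The "three times" estimate can be checked with a back-of-the-envelope model. The latencies below are made-up numbers, chosen only to show the shape of the comparison: a slow long-distance round trip between the client and the servers, and a fast local round trip between servers that sit close together.

```python
def iterative_cost(n_servers, long_distance_ms):
    # The client makes one long-distance round trip per server.
    return n_servers * long_distance_ms

def recursive_cost(n_servers, long_distance_ms, local_ms):
    # One long-distance round trip, then server-to-server hops.
    return long_distance_ms + (n_servers - 1) * local_ms

# Assumed values: 3 servers (nl, vu, cs), 100 ms WAN, 1 ms between servers.
it = iterative_cost(3, 100)
rec = recursive_cost(3, 100, 1)
print(it, rec, round(it / rec, 2))
```

The ratio approaches the number of servers as the local latency becomes negligible compared with the long-distance latency, and shrinks towards 1 when the servers are as far from each other as they are from the client.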
