Naming II – Structured naming
Flat vs Structured Naming
- Flat names: good for machines, but not very convenient for humans to use.
- Alternative: with support for structured names, you end up with names that are human-readable.
- A familiar example is file naming in an operating system: the same pathname approach handles both local and remote files.
- Next: explore how structured names are handled and how they are resolved to addresses.
Namespaces
- A namespace is the collection of all valid names. Anything you access is a resource, and it should stick to a
- When we handle a name service, the server will require you to use a namespace.
- A namespace for structured names can be represented as a labelled directed graph with two types of nodes:
- Leaf nodes have no outgoing edges and represent a named entity.
- Leaf nodes store entity attributes (like its address) or the state of the entity itself.
- Directory nodes have labelled outgoing edges, each pointing to another node.
- Each node in the graph is itself an entity.
- A directory node stores a directory table in which each outgoing edge is recorded as a (node identifier, edge label) pair.
- Each path in the naming graph is identified by the sequence of labels on the edges of that path, written N:<L1, L2, ..., Ln>, where N is the first node of the path.
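A minimal Python sketch of the graph described above, assuming a directory table can be modelled as a plain dict from edge label to node. The `Node` class, the `resolve` function and the node identifiers are made up for illustration; they are not from the lecture.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """A node in the naming graph; every node is itself an entity."""
    ident: str                      # node identifier (hypothetical values below)
    content: Optional[str] = None   # leaf payload: an address or the entity state
    table: dict = field(default_factory=dict)  # directory table: edge label -> Node

    def is_leaf(self) -> bool:
        # In this sketch a leaf is simply a node with no outgoing edges.
        return not self.table


def resolve(start: Node, labels: list) -> Node:
    """Follow the path start:<labels...>, one edge label at a time."""
    node = start
    for label in labels:
        if label not in node.table:
            raise KeyError(f"{node.ident} has no outgoing edge labelled {label!r}")
        node = node.table[label]
    return node


# Hypothetical graph: root -> home -> steen -> mbox
mbox = Node("n6", content="mailbox data")
steen = Node("n4", table={"mbox": mbox})
home = Node("n1", table={"steen": steen})
root = Node("n0", table={"home": home})

found = resolve(root, ["home", "steen", "mbox"])
print(found.ident, found.is_leaf())  # n6 True
```

Each lookup step only consults the directory table of the current node, which is why a directory node must know the labels of all its outgoing edges.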
Example: A File System
- In the figure, red marks a directory node and green marks a leaf node. The difference is whether the node has outgoing edges.
- Resolution follows the edges until it reaches a leaf node.
- One important aspect: you have to differentiate between absolute pathnames and relative pathnames.
- Absolute pathname (e.g. d1-d2-d3-l4): the first node in the pathname is the root of the graph.
- Relative pathname (e.g. d3-l3): it starts at some other node, so it cannot be resolved on its own. You need a mechanism to reach d3 first, and here d3 is reached through d2.
- At the end of the day, a directory node contains (node identifier, edge label) pairs.
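The absolute/relative distinction can be sketched in a few lines of Python. This is only an illustration under simple assumptions: directory nodes are nested dicts, leaf nodes are strings, and the "mechanism to reach the start node" is a current-directory reference passed in by the caller. The names `graph`, `cwd` and `resolve_path` are hypothetical.

```python
# Directory nodes are dicts (edge label -> node); leaf nodes are plain strings.
graph = {"home": {"steen": {"mbox": "mailbox data", "keys": "key data"}}}


def resolve_path(path: str, root: dict, cwd: dict):
    """Absolute paths (leading '/') start at the root; relative ones at cwd."""
    node = root if path.startswith("/") else cwd
    for label in path.strip("/").split("/"):
        if not label:
            continue  # handles "/" and doubled slashes
        if not isinstance(node, dict):
            raise NotADirectoryError(f"cannot look up {label!r} in a leaf node")
        node = node[label]
    return node


cwd = graph["home"]["steen"]                              # the implicit starting context
absolute = resolve_path("/home/steen/mbox", graph, cwd)   # starts at the root
relative = resolve_path("mbox", graph, cwd)               # starts at cwd
```

Both calls end at the same leaf; only the starting node differs.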
Naming Graph with a single root node
- The graph has a root node, n0, which has only outgoing edges and no incoming edges.
- n0 points to directory node n1, whose directory table has three entries: two leaf nodes (n2, n3) and a pointer to directory node n4.
- Look at the edges and the labels on them: the labels are what we use to look up a name.
- /home/steen/mbox is a path: it is the sequence of labels on the edges.
- The directory table matters here: n1 knows the labels of all its outgoing edges (n1 to n2, n1 to n3, n1 to n4).
- What's also interesting is that such a naming graph tells you the type of entity you are holding. These are usually files with a pathname, and the node identifier can be used to access the entity itself.
- My mailbox is stored in a directory called ‘home’, in a subdirectory called ‘steen’.
- /home/steen/keys (n0-n1-n4-n5) leads to n5, which is also directly reachable from n0 (n0-n5). Two absolute pathnames for the same node make this a hard link. A symbolic link would instead store the pathname of the target in a leaf node.
- Q: what’s the Unix command to create a symbolic link?
- A: ln -s creates a symbolic link (plain ln creates a hard link)
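The difference between `ln` and `ln -s` can be checked from Python with the standard-library calls `os.link` and `os.symlink`. The sketch below assumes a POSIX system where a temporary directory supports both kinds of link; the file names are made up.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "keys")
    with open(target, "w") as f:
        f.write("secret")

    hard = os.path.join(d, "keys_hard")
    soft = os.path.join(d, "keys_soft")
    os.link(target, hard)      # like `ln keys keys_hard`: second name, same node
    os.symlink(target, soft)   # like `ln -s keys keys_soft`: new node storing a path

    same_inode = os.stat(target).st_ino == os.stat(hard).st_ino
    link_count = os.stat(target).st_nlink   # number of hard links to the node
    is_symlink = os.path.islink(soft)
    stored_path = os.readlink(soft)         # the pathname kept inside the symlink
```

The hard link shares the inode of the original file, while the symbolic link is a separate node whose content is the pathname of the target.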
Name Resolution
- Name resolution: the process of looking up a name.
- Problem: to resolve a name we need a directory node to start from. So how do you find the initial node?
- Closure mechanism: in practice, we need a mechanism that selects an implicit context, i.e. a starting point. Once you have a pointer to an initial node, name resolution can carry on from there.
- E.g.
- www.distirbuted.net: your computer passes the name to a DNS server, which returns the IP address of that server. Knowing which DNS server to ask is our closure mechanism.
- DNS is the first point of entry, and a crucial one!
- /home/maaten/mbox: start at the local NFS file server.
- 0031 20 etc: a phone number. 31 is the country code for the Netherlands, then we look up the rest of the number.
- 77.167.55.6: an IP address. Figure out what 77 means, then look up the rest.
- → Hence the need for a closure mechanism: a way of knowing where to start.
Closure mechanism
- Closure mechanism: name resolution is possible only if we know how and where to start. It deals with selecting the initial node.
- In UNIX, to find a file on a disk:
- The first point of entry, the initial node, is the inode (index node) of the root directory.
- The boot block sits at a known, absolute address.
- Inodes give us the relative addresses of the disk blocks that belong to a file.
- How it all works together: inodes give us the relative address of the first block of each individual file. This assumes we know the boot block, which is how we get to the root node.
- All these addresses are relative to the start of the boot block, which is the absolute address.
- To find /home/steen/mbox, we first find the boot block, then an inode that tells us the relative address.
- → Always think about something absolute first, then relative addresses.
Linking
- We can create aliases for a named entity, corresponding to links in the naming graph
- An alias can be implemented in 2 ways:
- Allow multiple absolute pathnames to refer to the same node → hard links
- n5 can be referred to by two different pathnames: /keys and /home/steen/keys
- Let a leaf node store the absolute pathname of the aliased entity → symbolic link
- Q: How do we resolve a name when part of it lies in a different namespace?
Mounting
- Mounting merges different namespaces in a transparent way.
- You have a first namespace and a different, second one, and you want to combine them so that the second can be accessed from the first, transparently.
- Terminology
- Foreign namespace: the namespace that needs to be accessed (the RHS in the diagram).
- Mount point: the node in the current namespace that contains the node identifier of the foreign namespace. That’s the one you use to refer to the foreign namespace.
- Mounting point: the node in the foreign namespace where name resolution continues.
- Mounting across a network requires:
- The name of an access protocol - NFS
- Name of the server – filts.cs.vu.nl
- Name of the mounting point in the foreign name space - /home/steen/
- To do this for a file system, you need a protocol that lets your mount point access the mounting point of the foreign namespace, bringing the 2 namespaces together.
- Resolution starts on the first machine. When it reaches the mount point, it continues in the foreign namespace from the mounting point, and the result then goes back to the original machine.
- → Resolve the name in the current namespace first, then in the foreign namespace.
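A rough Python sketch of resolution across a mount point, assuming the mount point stores the three pieces of information listed above (protocol, server, mounting point). No real network access happens here: the `SERVERS` dict stands in for the remote file server, and the server name `fs.example.org` and the local path `/remote/vu` are made up.

```python
from dataclasses import dataclass


@dataclass
class MountPoint:
    protocol: str        # name of the access protocol, e.g. "nfs"
    server: str          # name of the server holding the foreign namespace
    mounting_point: str  # path in the foreign namespace where resolution continues


# Hypothetical foreign namespaces, keyed by server name (stand-in for the network).
SERVERS = {
    "fs.example.org": {"home": {"steen": {"mbox": "mailbox data"}}},
}

# Local namespace: /remote/vu is a mount point into the foreign namespace.
LOCAL = {"remote": {"vu": MountPoint("nfs", "fs.example.org", "/home/steen")}}


def split(path: str) -> list:
    return [label for label in path.split("/") if label]


def walk(node, labels: list):
    """Resolve labels from node, switching namespace at a mount point."""
    for i, label in enumerate(labels):
        node = node[label]
        if isinstance(node, MountPoint):
            foreign_root = SERVERS[node.server]
            start = walk(foreign_root, split(node.mounting_point))
            return walk(start, labels[i + 1:])   # continue with the rest of the name
    return node


result = walk(LOCAL, split("/remote/vu/mbox"))
```

From the caller's side the name /remote/vu/mbox looks like any local path; the switch to the foreign namespace happens inside `walk`, which is the transparency the notes ask for.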
Namespace implementation
- The implementation of a namespace needs to be distributed over multiple name servers.
- Both the name resolution process and namespace management are distributed across multiple machines, by distributing the nodes of the naming graph.
- With DNS, name resolution is distributed by definition, because DNS runs on a large number of machines all over the world.
- The question is how to distribute the nodes of the naming graph over a large-scale system.
- DNS: how the directory nodes in the tree are structured into levels
- Global level: there are only a few of these nodes, and they are managed by different organisations: nl, uk, org, com, etc.
- Administrational level: mid-level directory nodes, grouped in such a way that each group can be assigned to a separate administration.
- E.g. at VU University, the cs department is part of the administrational layer and is managed by its own group of people.
- Managerial level: a lot happens here. It consists of low-level directory nodes and is managed by its own administration. The main issue is effectively mapping directory nodes to local name servers, from the global level down to the managerial level.
Namespace implementation
Item | Global | Administrational | Managerial |
---|---|---|---|
Geographical scale of network | Worldwide | Organization | Department |
Total number of nodes | Few | Many | Vast numbers |
Responsiveness to lookups | Seconds | Milliseconds | Immediate |
Update propagation | Lazy | Immediate | Immediate |
Number of replicas | Many | None or few | None |
Is client-side caching applied? | Yes | Yes | Sometimes |
- Replicas: they should be close to the people who want to do a DNS lookup. You’d hope there’s a DNS server near the UK.
## Name resolution (navigation) methods
Iterative resolution
- The root server resolves the pathname as far as it can and returns the result to the client.
- The client then passes the remaining pathname to the next name server.
- The client takes the full name <nl,vu,cs,ftp> and passes it to the root name server. The root resolves <nl> but not <vu,cs,ftp>, and returns the address of the <nl> server.
- The client contacts <nl> and passes the remaining name <vu,cs,ftp>. <nl> resolves <vu> and returns the address of the <vu> server, leaving <cs,ftp>.
- The client contacts <vu> and passes <cs,ftp>. <vu> resolves <cs> but not <ftp>, so the address of the <cs> server is sent back to the client.
- The client contacts <cs>, which resolves <ftp> and returns its address.
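The steps above can be simulated with a toy table of name servers. This is a sketch only: the server "addresses" are placeholder strings, each hypothetical server resolves exactly one label, and `ask` stands in for one request/response over the network.

```python
# Each hypothetical name server knows one label and the address it maps to.
SERVERS = {
    "root": {"nl": "addr-of-nl-server"},
    "addr-of-nl-server": {"vu": "addr-of-vu-server"},
    "addr-of-vu-server": {"cs": "addr-of-cs-server"},
    "addr-of-cs-server": {"ftp": "addr-of-ftp-host"},
}


def ask(server: str, labels: list):
    """One round trip: the server resolves the first label only and
    returns the resulting address plus the part it could not resolve."""
    head, rest = labels[0], labels[1:]
    return SERVERS[server][head], rest


def resolve_iteratively(labels: list):
    """The client drives the loop and contacts every server itself."""
    server, round_trips = "root", 0
    while labels:
        server, labels = ask(server, labels)
        round_trips += 1
    return server, round_trips


address, round_trips = resolve_iteratively(["nl", "vu", "cs", "ftp"])
print(address, round_trips)  # addr-of-ftp-host 4
```

Note that the loop lives in the client: every intermediate address comes back to it before the next server is contacted.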
Recursive resolution
- Step 1: the client passes the entire name <nl,vu,cs,ftp> to the root name server.
- The root resolves <nl>, contacts the <nl> server and asks it to resolve <vu,cs,ftp>.
- The <nl> server resolves <vu>, contacts the <vu> server and asks it to resolve <cs,ftp>.
- The <vu> server resolves <cs>, contacts the <cs> server and asks it to resolve <ftp>.
- Because it is recursive, the result travels back up the chain, and the root name server passes the final result to the client.
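A matching sketch for the recursive case, again with placeholder addresses and hypothetical servers that each resolve one label. Here the forwarding happens inside the servers, modelled as a recursive function call, and the client makes a single request.

```python
# Each hypothetical name server knows one label and the address it maps to.
SERVERS = {
    "root": {"nl": "addr-of-nl-server"},
    "addr-of-nl-server": {"vu": "addr-of-vu-server"},
    "addr-of-vu-server": {"cs": "addr-of-cs-server"},
    "addr-of-cs-server": {"ftp": "addr-of-ftp-host"},
}


def resolve_recursively(server: str, labels: list, log: list) -> str:
    """The server resolves the first label, then forwards the rest to the
    next server itself; the answer is returned back up the call chain."""
    head, rest = labels[0], labels[1:]
    next_address = SERVERS[server][head]
    log.append(f"{server} resolves {head}")
    if not rest:
        return next_address
    return resolve_recursively(next_address, rest, log)


log = []
# The client's only request: hand the whole name to the root server.
address = resolve_recursively("root", ["nl", "vu", "cs", "ftp"], log)
```

The return path of the recursion mirrors the way the result goes back up from <cs> to <vu> to <nl> to the root before reaching the client.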
Which is better?
- Recursive: puts a greater demand on each server in the path.
- So servers requiring high throughput might only support iterative resolution: they resolve their own part and don’t deal with the rest.
- Caching
- Recursive: caching results is more effective. Each name server gradually learns the address of each name server responsible for implementing lower-level nodes. → caching can enhance performance
- Hence, lookup operations can be handled efficiently.
- Iterative: caching is limited to the client’s name resolver.
- Compromise: many organisations have a local name server shared by all clients, which handles all naming requests and caches the results.
- Communication
- Recursive: the client only has to contact the nl server once; the rest of the communication happens between the name servers within nl.
- Iterative: the client’s host has to communicate separately with the nl, vu and cs servers, so the total cost may be about three times that of recursive resolution.
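The compromise of a shared local name server with a cache can be sketched as follows. Everything here is illustrative: `CachingResolver` and `slow_upstream_lookup` are made-up names, the table holds a placeholder address, and a real resolver would also expire cached entries.

```python
# Hypothetical upstream data; the address is a placeholder string.
UPSTREAM_TABLE = {"ftp.cs.vu.nl": "addr-of-ftp-host"}


def slow_upstream_lookup(name: str) -> str:
    """Stands in for a full (iterative or recursive) resolution."""
    return UPSTREAM_TABLE[name]


class CachingResolver:
    """A local name server shared by all clients of an organisation."""

    def __init__(self, upstream):
        self.upstream = upstream
        self.cache = {}
        self.upstream_calls = 0

    def lookup(self, name: str) -> str:
        if name not in self.cache:          # cache miss: do the expensive lookup once
            self.upstream_calls += 1
            self.cache[name] = self.upstream(name)
        return self.cache[name]


local_server = CachingResolver(slow_upstream_lookup)
first = local_server.lookup("ftp.cs.vu.nl")    # client A: goes upstream
second = local_server.lookup("ftp.cs.vu.nl")   # client B: answered from the cache
```

Because all clients go through the same local server, a result fetched for one client benefits every later client asking for the same name.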
Scalability Issues: communication costs
- Name servers are usually reached over a WAN: low bandwidth, high latency.
- So we have to compare how recursive and iterative resolution behave.
- Recursive: the client only makes one long-distance request (R1); the request then hops from nl to vu and cs.
- Iterative: the client’s host has to communicate separately with the nl, vu and cs servers, so the total cost may be about three times that of recursive resolution.
- Think about the communication cost and whether it will scale in a large distributed system.
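A back-of-the-envelope comparison of the two costs, as a sketch. The latency figures (100 ms for a long-distance hop, 5 ms for a hop between nearby servers) are invented for illustration, and the model simply counts a request plus a reply per hop.

```python
def iterative_cost(n_servers: int, long_haul_ms: float) -> float:
    """The client talks to every server over the long-distance link."""
    return n_servers * 2 * long_haul_ms          # request + reply per server


def recursive_cost(n_servers: int, long_haul_ms: float, local_ms: float) -> float:
    """One long-distance round trip, then short hops between the servers."""
    return 2 * long_haul_ms + (n_servers - 1) * 2 * local_ms


# Three servers (nl, vu, cs), with assumed latencies.
iterative = iterative_cost(3, long_haul_ms=100)              # 600 ms
recursive = recursive_cost(3, long_haul_ms=100, local_ms=5)  # 220 ms
```

With these assumed numbers the iterative lookup costs a bit under three times the recursive one, in line with the "about three times" estimate, and the gap widens as the long-distance latency grows relative to the local hops.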