Solr: What is schema.xml?

April 28, 2012

One of the configuration files that describes every Solr deployment is schema.xml. It defines one of the most important aspects of the deployment: the structure of the index. The information in this file controls how Solr behaves when indexing data and when handling queries. schema.xml is more than the bare structure of the index; it also holds detailed information about data types, which strongly influence Solr's behavior and are often neglected. This entry tries to give some insight into schema.xml.

The schema.xml file consists of several parts:

  • version,
  • type definitions,
  • field definitions,
  • copyField section,
  • additional definitions.

Version

The first thing we come across in the schema.xml file is the version. It tells Solr how to interpret some of the attributes in the schema.xml file. The definition looks as follows:

<schema name="example" version="1.3">

Please note that this is not the version of the schema from the perspective of your project. At this point Solr supports four versions of the schema.xml file:

  • 1.0 – the multiValued attribute does not exist; all fields are multivalued by default.
  • 1.1 – introduced the multiValued attribute; its default value is false.
  • 1.2 – introduced the omitTermFreqAndPositions attribute; its default value is true for all fields except text fields.
  • 1.3 – removed the option of compressing fields.
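
As a rough illustration of why the version matters, the fragment below assumes version 1.3, where a field is single-valued unless multiValued is set explicitly. The field names title and tags are made up for this sketch, and the string type is assumed to be defined elsewhere in the file.

```xml
<schema name="example" version="1.3">
  <fields>
    <!-- version 1.1 and later: single-valued unless stated otherwise -->
    <field name="title" type="string" indexed="true" stored="true"/>
    <!-- multiple values have to be allowed explicitly -->
    <field name="tags" type="string" indexed="true" stored="true" multiValued="true"/>
  </fields>
</schema>
```

Under version 1.0 both fields would have been treated as multivalued, which is why the version should not be changed casually on an existing index.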

Type definitions

Type definitions can be logically divided into two groups: simple types and complex types. Simple types, unlike complex types, do not define a tokenizer or filters.

Simple types

The first thing we see in the schema.xml file after the version is the type definitions. Each type is described by a number of attributes that define its behavior. First, the attributes that every type must have:

  • name – name of the type (required attribute).
  • class – the class that implements the type. Please note that classes shipped with the standard Solr distribution have names with the ‘solr’ prefix.

Besides the two mentioned above, types can have the following optional attributes:

  • sortMissingLast – specifies how documents with no value in a field of this type are treated when sorting. When set to true, such documents always appear at the end of the results list, regardless of sort order. The default value is false. The attribute can be used only for types that Lucene treats as a string.
  • sortMissingFirst – specifies how documents with no value in a field of this type are treated when sorting. When set to true, such documents always appear at the beginning of the results list, regardless of sort order. The default value is false. The attribute can be used only for types that Lucene treats as a string.
  • omitNorms – specifies whether norms (used for field length normalization) should be omitted.
  • omitTermFreqAndPositions – specifies whether term frequencies and term positions should be omitted from the index.
  • indexed – specifies whether fields based on this type are indexed, so that they can be searched and sorted on.
  • stored – specifies whether fields based on this type keep their original values.
  • positionIncrementGap – specifies how many positions Lucene should skip between successive values of a multivalued field.

It is worth remembering that with the default settings (both sortMissingLast and sortMissingFirst set to false), Lucene places documents with empty field values at the beginning of the results for an ascending sort and at the end for a descending sort.

One more option applies to simple types, but only to those based on Trie*Field classes:

  • precisionStep – specifies the precision step in bits. The smaller the step, the more precision levels are indexed for each value, which makes queries on numerical ranges faster. This, however, also increases the size of the index, because more terms are indexed. Set the attribute to 0 to disable indexing at various precisions.
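
A minimal sketch of two Trie-based types, one with indexing at various precisions disabled and one with it enabled. The type names int and tint are only labels assumed for this example; any names would work.

```xml
<!-- precisionStep="0": only the full-precision value is indexed -->
<fieldType name="int" class="solr.TrieIntField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>

<!-- precisionStep="8": extra lower-precision terms are indexed to speed up range queries -->
<fieldType name="tint" class="solr.TrieIntField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
```

A field that is mostly sorted or matched exactly could use the first type, while a field that is often queried by range would probably benefit from the second.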

An example of a simple type definition:

<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>

Complex types

In addition to simple types, the schema.xml file may include types that consist of a tokenizer and filters. The tokenizer is responsible for dividing the contents of the field into tokens, while the filters are responsible for further token analysis. For example, a type that handles text in Polish could consist of a tokenizer that splits words on whitespace, commas and periods. Filters for that type could lowercase the generated tokens, split them further (for example on dashes), and then reduce the tokens to their base form.

Complex types, like simple types, have a name (name attribute) and a class that implements them (class attribute). They can also carry the other attributes described for simple types, on the same basis. In addition, complex types can define a tokenizer and filters to be used at indexing time and at query time. For a given phase (indexing or query) there can be many filters but only one tokenizer. For example, this is what the text type definition looks like in the example schema provided with Solr:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
   <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
      <filter class="solr.PorterStemFilterFactory"/>
   </analyzer>
   <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
      <filter class="solr.PorterStemFilterFactory"/>
   </analyzer>
</fieldType>

It is worth noting that there is an additional attribute for the text field type:

  • autoGeneratePhraseQueries

This attribute tells the query parser how to behave when a single piece of query text is divided into several tokens. Some filters (such as WordDelimiterFilter) can divide a token into a set of tokens. Setting the attribute to true (the default value) automatically generates phrase queries in that case. For example, WordDelimiterFilter will divide the word “wi-fi” into two tokens, “wi” and “fi”. With autoGeneratePhraseQueries set to true, the query sent to Lucene will look like field:"wi fi", while with the attribute set to false it will look like field:wi OR field:fi. Please note, however, that this attribute only behaves well with whitespace-based tokenizers.

Let us return to the type definition. As you can see, the example above has two main sections:

<analyzer type="index">

and

<analyzer type="query">

The first section defines the analysis used when indexing documents; the second defines the analysis used for queries against fields based on this type. Note that if you want to use the same analysis for both the indexing and the query phase, you can replace the two sections with a single analyzer element that has no type attribute. Our definition will then look like this:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
   <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
      <filter class="solr.PorterStemFilterFactory"/>
   </analyzer>
</fieldType>

As mentioned, the definition of each complex type contains a tokenizer and, optionally, a series of filters. I will not describe every filter and tokenizer available in Solr. This information is available at the following address: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters.

Finally, one important addition. Starting with Solr 1.4, the tokenizer does not need to be the first component that processes the field. Solr 1.4 introduced a new kind of filter, the CharFilter, which operates on the field content before the tokenizer and passes the result on to it. It is worth knowing about, because it may come in useful.
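
A minimal sketch of a type that uses a CharFilter, assuming the goal is to strip HTML markup before tokenization. The type name text_html is hypothetical; the factory classes are ones shipped with Solr.

```xml
<fieldType name="text_html" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- char filters run first, on the raw character stream -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <!-- the tokenizer then splits the cleaned text into tokens -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- token filters run last, on the individual tokens -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The order of the elements mirrors the order of processing: char filters, then the single tokenizer, then the token filters.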

Multi-dimensional types

I have left one small addition for the end, a novelty in Solr 1.4: multi-dimensional fields, that is, fields consisting of a number of other fields. The idea behind this type of field is simple: to store pairs, triples or larger groups of related values in Solr, such as the coordinates of a geographical point. In practice this is implemented by means of dynamic fields, but let me not get into the implementation details. A sample type definition that consists of two fields:

<fieldType name="location" class="solr.PointType" dimension="2" subFieldSuffix="_d"/>

In addition to the standard name and class attributes, there are two others:

  • dimension – the number of dimensions (used by the solr.PointType class).
  • subFieldSuffix – the suffix that will be added to the dynamic fields created by that type. It is important to remember that a field based on the presented type will create three fields in the index: the actual field (for example named mylocation) and two additional dynamic fields.
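
For the sub-fields to be created, the schema presumably also needs a dynamic field whose pattern matches the configured suffix. The sketch below assumes a double type based on solr.TrieDoubleField and reuses the mylocation field name from the text; both are illustrative choices.

```xml
<fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
<fieldType name="location" class="solr.PointType" dimension="2" subFieldSuffix="_d"/>

<!-- the field that documents actually use -->
<field name="mylocation" type="location" indexed="true" stored="true"/>

<!-- the two sub-fields generated for mylocation must match this pattern -->
<dynamicField name="*_d" type="double" indexed="true" stored="false"/>
```

The sub-fields only need to be indexed, because the original value is kept by the mylocation field itself.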

Field Definitions

Field definitions are another section of the schema.xml file, the one that in theory should interest us the most when designing a Solr index. As a rule, we find two kinds of field definitions here:

  1. Static Fields
  2. Dynamic Fields

Solr treats these two kinds of fields differently. Static fields are available under exactly one name. Dynamic fields are available under many names: their name is a simple wildcard pattern (a name starting or ending with a ‘*’ sign). Please note that Solr checks static fields first and only then dynamic fields. In addition, if a field name matches more than one dynamic field definition, Solr selects the one with the longest name pattern.
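
The selection rules can be sketched with three definitions. The field names and the string, int and long types are assumptions made for this example.

```xml
<!-- static field: an exact name match is checked first -->
<field name="rating_i" type="string" indexed="true" stored="true"/>

<!-- dynamic fields: the name is a pattern with a leading or trailing '*' -->
<dynamicField name="*_i" type="int" indexed="true" stored="true"/>
<dynamicField name="*_count_i" type="long" indexed="true" stored="true"/>
```

With these definitions, a document field called rating_i would use the static definition, views_count_i matches both patterns and would take the longer one (*_count_i), and price_i would fall back to *_i.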

Returning to the definition of the fields (both static and dynamic), they consist of the following attributes:

  • name – the name of the field (required attribute).
  • type – type of field, which is one of the pre-defined types (required attribute).
  • indexed – whether the field is to be indexed (set to true if you want to search or sort on this field).
  • stored – whether to store the original value (set to true if you want to retrieve the original value of the field).
  • omitNorms – whether norms should be omitted (set to true for fields that do not need length normalization, which usually means fields that are not used for full-text search).
  • termVectors – set to true to keep so-called term vectors. The default value is false. Some features require setting this parameter to true (e.g. MoreLikeThis or FastVectorHighlighting).
  • termPositions – set to true to keep term positions with the term vector. Setting it to true increases the size of the index.
  • termOffsets – set to true to keep term offsets with the term vector. Setting it to true increases the size of the index.
  • default – the default value given to the field when the document provides no value for it.

Some example field definitions:

<field name="id" type="string" indexed="true" stored="true" required="true" />
<field name="includes" type="text" indexed="true" stored="true" termVectors="true" termPositions="true" termOffsets="true" />
<field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/>
<dynamicField name="*_i" type="int" indexed="true" stored="true"/>

And finally, one more thing to remember. In addition to the attributes listed above, a field definition can override attributes that were defined for its type (e.g. whether the field is multiValued, as in the timestamp field in the example above). This can be useful when you need a specific field whose behavior differs slightly from its type (in the example, only in the multiValued attribute). Of course, keep in mind the limitations imposed on the individual attributes associated with types.

CopyField section

In short, this section is responsible for copying the contents of fields to other fields. We define the field whose value should be copied and the destination field. Please note that copying takes place before the field value is analyzed. An example copyField definition:

<copyField source="category" dest="text"/>

For completeness, the attributes mean:

  • source – the source field,
  • dest – the destination field.
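
A common use of copyField is a catch-all search field. The sketch below assumes a text type like the one defined earlier and hypothetical source fields name and *_t; the destination is declared multiValued because several sources feed it.

```xml
<!-- catch-all field: searched, but not returned in results -->
<field name="text" type="text" indexed="true" stored="false" multiValued="true"/>

<copyField source="category" dest="text"/>
<copyField source="name" dest="text"/>
<!-- a wildcard source copies every matching field; maxChars limits how much is copied -->
<copyField source="*_t" dest="text" maxChars="3000"/>
```

Because copying happens before analysis, the text field applies its own analyzer to the copied values, independently of how the source fields are analyzed.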

Additional definitions

1. Unique key definition

The unique key definition makes it possible to unambiguously identify a document. Defining a unique key is not required, but it is recommended. Sample definition:

<uniqueKey>id</uniqueKey>

2. Default search field definition

This section defines the default search field, which Solr uses when the query does not specify any field. Sample definition:

<defaultSearchField>content</defaultSearchField>

3. Default logical operator definition

This section defines the default logical operator that will be used. A sample definition looks as follows:

<solrQueryParser defaultOperator="OR" />

Possible values are: OR and AND.

4. Defining similarity

Finally, we define the similarity that will be used. It is really a topic for another post, but you should know that, if necessary, you can change the default similarity (Solr trunk currently already has two similarity classes). The sample definition is as follows:

<similarity class="pl.solr.similarity.CustomSimilarity" />

A few words at the end

The information presented above should give some insight into what the schema.xml file is and what the different sections in this file are for. Soon I will try to write about what you should avoid when designing the index.

Announcing the Windows 8 Editions

April 19, 2012

Today I would like to share information with you on the editions that will be available for “Windows 8” when it is released to market. We have talked about Windows 8 as Windows reimagined, from the chipset to the user experience. This also applies to the editions available – we have worked to make it easier for customers to know what edition will work best for them when they purchase a new Windows 8 PC or upgrade their existing PC.

Windows 8 has the flexibility you need – whether you’re on an x86/64 or a WOA PC. You can use a touch screen or a keyboard and mouse – and switch anytime. Its beautiful, fast, and fluid design is perfect for a wide range of hardware. And you’ll love browsing through the Windows Store and downloading all the apps you want. And those apps can work together too so you can share photos, maps, contacts, links and whatever else you want faster and easier. All editions of Windows 8 offer a no-compromise experience.

First, Windows 8 is the official product name for the next x86/64 editions of Windows.

For PCs and tablets powered by x86 processors (both 32 and 64 bit), we will have two editions: Windows 8 and Windows 8 Pro. For many consumers, Windows 8 will be the right choice. It will include all the features above plus an updated Windows Explorer, Task Manager, better multi-monitor support and the ability to switch languages on the fly (more details on this feature can be found in this blog post), which was previously only available in Enterprise/Ultimate editions of Windows. For China and a small set of select emerging markets, we will offer a local language-only edition of Windows 8.

Windows 8 Pro is designed to help tech enthusiasts and business/technical professionals obtain a broader set of Windows 8 technologies. It includes all the features in Windows 8 plus features for encryption, virtualization, PC management and domain connectivity. Windows Media Center will be available as an economical “media pack” add-on to Windows 8 Pro. If you are an enthusiast or you want to use your PC in a business environment, you will want Windows 8 Pro.

Windows RT is the newest member of the Windows family – also known as Windows on ARM or WOA, as we’ve referred to it previously. This single edition will only be available pre-installed on PCs and tablets powered by ARM processors and will help enable new thin and lightweight form factors with impressive battery life. Windows RT will include touch-optimized desktop versions of the new Microsoft Word, Excel, PowerPoint, and OneNote. For new apps, the focus for Windows RT is development on the new Windows runtime, or WinRT, which we unveiled in September and forms the foundation of a new generation of cloud-enabled, touch-enabled, web-connected apps of all kinds. For more details on WOA, we suggest reading this blog post which shares more detail on how we have been building Windows 8 to run on the ARM architecture.

The below chart breaks down key features by edition (this list should not be considered an exhaustive list of features):

| Feature name | Windows 8 | Windows 8 Pro | Windows RT |
|---|---|---|---|
| Upgrades from Windows 7 Starter, Home Basic, Home Premium | x | x | |
| Upgrades from Windows 7 Professional, Ultimate | | x | |
| Start screen, Semantic Zoom, Live Tiles | x | x | x |
| Windows Store | x | x | x |
| Apps (Mail, Calendar, People, Messaging, Photos, SkyDrive, Reader, Music, Video) | x | x | x |
| Microsoft Office (Word, Excel, PowerPoint, OneNote) | | | x |
| Internet Explorer 10 | x | x | x |
| Device encryption | | | x |
| Connected standby | x | x | x |
| Microsoft account | x | x | x |
| Desktop | x | x | x |
| Installation of x86/64 and desktop software | x | x | |
| Updated Windows Explorer | x | x | x |
| Windows Defender | x | x | x |
| SmartScreen | x | x | x |
| Windows Update | x | x | x |
| Enhanced Task Manager | x | x | x |
| Switch languages on the fly (Language Packs) | x | x | x |
| Better multiple monitor support | x | x | x |
| Storage Spaces | x | x | |
| Windows Media Player | x | x | |
| Exchange ActiveSync | x | x | x |
| File history | x | x | x |
| ISO / VHD mount | x | x | x |
| Mobile broadband features | x | x | x |
| Picture password | x | x | x |
| Play To | x | x | x |
| Remote Desktop (client) | x | x | x |
| Reset and refresh your PC | x | x | x |
| Snap | x | x | x |
| Touch and Thumb keyboard | x | x | x |
| Trusted boot | x | x | x |
| VPN client | x | x | x |
| BitLocker and BitLocker To Go | | x | |
| Boot from VHD | | x | |
| Client Hyper-V | | x | |
| Domain Join | | x | |
| Encrypting File System | | x | |
| Group Policy | | x | |
| Remote Desktop (host) | | x | |

In the coming months, we plan to share much more information about Windows 8, including details on pricing and limited-time programs and promotions that we will make available to customers. Today, you can check out a preview of Windows 8 for yourself (if you haven’t already done so!).

NOTE: As with previous versions of Windows, we will also have an edition of Windows 8 specifically for those enterprise customers with Software Assurance agreements. Windows 8 Enterprise includes all the features of Windows 8 Pro plus features for IT organization that enable PC management and deployment, advanced security, virtualization, new mobility scenarios, and much more. 

(Source: Windows Blog)

XML serialization using Generics

April 9, 2012

Serialize/deserialize your objects using generics. Customize settings like indentation, encoding, namespaces and others.

XML Serialization Overview

XML serialization is the process of converting an object into an XML string in order to persist it to memory, a database, or a file. Its main purpose is to save the state of an object so that it can be recreated when needed. The reverse process is called deserialization.

Some good uses for XML serialization/deserialization are:

  • Storing user preferences in an object
  • Maintaining security information across pages and applications
  • Modification of XML documents without using the DOM
  • Passing an object from one application to another
  • Passing an object from one domain to another
  • Passing an object through a firewall as an XML string

A generic serializer class – XmlSerializer<T>

I’ve created a generic class to serialize/deserialize XML:

XmlSerializer

This class allows us to:

  • Serialize an object to an XML string
  • Deserialize an object from an XML string
  • Serialize an object to an XML file
  • Deserialize an object from an XML file

It’s also possible to customize some settings like indentation, encoding, namespaces and others (see examples below).

The scenario – User Shopping Cart

(Disclaimer: this is not intended to reflect a real-world shopping cart model.)

[Figure: XML Serialization - Shopping Cart Model]

Model source code:

using System;
using System.Collections.Generic;
using System.Xml.Serialization;

namespace Serialization.Model
{
    [Serializable]
    [XmlRoot("shopping-cart")]
    public class ShoppingCart
    {
        [XmlElement("purchase-date", IsNullable=true)]
        public DateTime? PurchaseDate { get; set; }

        // {property}Specified
        public bool PurchaseDateSpecified
        {
            get { return PurchaseDate.HasValue; }
        }

        // for each subclass of ShoppingItem you need
        // to specify the correspondent XML element to generate
        [XmlArrayItem("cd", typeof(CD))]
        [XmlArrayItem("book", typeof(Book))]
        [XmlArrayItem("dvd", typeof(Dvd))]
        [XmlArray("items")]
        public List<ShoppingItem> Items { get; set; }

        [XmlIgnore]
        public double TotalPrice
        {
            get
            {
                double total = 0;

                foreach (ShoppingItem i in Items)
                    total += i.Price;

                return total;
            }
        }

        public ShoppingCart()
        {
            Items = new List<ShoppingItem>();
        }
    }

    [Serializable]
    [XmlRoot("item")]
    public class ShoppingItem
    {
        [XmlAttribute("reference")]
        public string Reference { get; set; }

        [XmlAttribute("price")]
        public double Price { get; set; }

        public ShoppingItem()
        {
        }
    }

    [Serializable]
    [XmlRoot("book")]
    public class Book : ShoppingItem
    {
        [XmlElement("name")]
        public string Name { get; set; }

        [XmlElement("author")]
        public string Author { get; set; }

        [XmlElement("description")]
        public string Description { get; set; }

        public Book()
        {
        }
    }

    [Serializable]
    [XmlRoot("cd")]
    public class CD : ShoppingItem
    {
        [XmlElement("artist")]
        public string Artist  { get; set; }

        [XmlElement("name")]
        public string Name { get; set; }

        [XmlElement("genre")]
        public string Genre { get; set; }

        public CD()
        {
        }
    }

    [Serializable]
    [XmlRoot("dvd")]
    public class Dvd : ShoppingItem
    {
        [XmlElement("name")]
        public string Name { get; set; }

        [XmlElement("genre")]
        public string Genre { get; set; }

        public Dvd()
        {
        }
    }

    [Serializable]
    [XmlRoot("user")]
    public class User
    {
        [XmlAttribute("id")]
        public int Id { get; set; }

        [XmlAttribute("user-type")]
        public UserType Type { get; set; }

        [XmlElement("first-name")]
        public string FirstName { get; set; }

        [XmlElement("last-name")]
        public string LastName { get; set; }

        [XmlIgnore]
        public string FullName
        {
            get
            {
                if (string.IsNullOrEmpty(FirstName))
                    return LastName;

                if (string.IsNullOrEmpty(LastName))
                    return FirstName;

                return string.Format("{0} {1}", FirstName, LastName);
            }
        }

        [XmlElement("age")]
        public int? Age { get; set; }

        [XmlElement("email")]
        public string Email { get; set; }

        public bool AgeSpecified
        {
            get { return Age.HasValue; }
        }

        [XmlElement("address")]
        public Address UserAddress { get; set; }

        [XmlElement("delivery-address")]
        public Address DeliveryAddress { get; set; }

        [XmlElement("cart")]
        public ShoppingCart ShoppingCart { get; set; }

        public User()
        {
        }
    }

    [Serializable]
    [XmlRoot("address")]
    public class Address
    {
        [XmlElement("street")]
        public string Street { get; set; }

        [XmlElement("postal-code")]
        public string PostalCode { get; set; }

        [XmlElement("city")]
        public string City { get; set; }

        [XmlElement("country")]
        public string Country { get; set; }

        [XmlIgnore]
        public string FullAddress
        {
            get
            {
                return string.Format("{0}{1}{2} {3}{1}{4}",
                     Street, System.Environment.NewLine, PostalCode, City, Country);
            }
        }

        public Address()
        {
        }
    }

    [Flags]
    public enum UserType
    {
        [XmlEnum("0")]
        Basic = 0,

        [XmlEnum("1")]
        Premium = 1,

        [XmlEnum("2")]
        Platinum = 2
    }
}

Using the code

Create a user, serialize it to a string, and recreate the object from the string:

static void Main(string[] args)
{
    User user = CreateUser();

    // default serialization settings
    string xml = XmlSerializer<User>.Serialize(user);

    // get user from XML
    User user2 = XmlSerializer<User>.Deserialize(xml);
}

private static User CreateUser()
{
    User user = new User();
    user.Id = 1;
    user.Age = 30;
    user.Type = UserType.Platinum;
    user.FirstName = "Rui";
    user.LastName = "Jarimba";
    user.Email = "email@somewhere.com";

    user.UserAddress = new Address();
    user.UserAddress.Street = "my street 1";
    user.UserAddress.PostalCode = "1000-001";
    user.UserAddress.City = "Lisbon";
    user.UserAddress.Country = "Portugal";

    user.DeliveryAddress = new Address();
    user.DeliveryAddress.Street = "another street";
    user.DeliveryAddress.PostalCode = "1000-002";
    user.DeliveryAddress.City = "Lisbon";
    user.DeliveryAddress.Country = "Portugal";

    //
    // Shopping cart
    //
    user.ShoppingCart = new ShoppingCart();
    // user.ShoppingCart.PurchaseDate = DateTime.Now;

    Book book1 = new Book();
    book1.Name = "Jamie's Italy";
    book1.Price = 34.95;
    book1.Reference = "978-1401301958";
    book1.Author = "Jamie Oliver";
    book1.Description = "Italian food made by Jamie Oliver!";
    user.ShoppingCart.Items.Add(book1);

    Book book2 = new Book();
    book2.Name = "Ensaio Sobre a Cegueira";
    book2.Price = 59.95;
    book2.Reference = "B0042TL15I";
    book2.Author = "José Saramago";
    user.ShoppingCart.Items.Add(book2);

    CD cd = new CD();
    cd.Name = "The Blackening";
    cd.Artist = "Machine Head";
    cd.Genre = "Trash Metal";
    cd.Price = 15.0;
    cd.Reference = "B000N3ST9I";
    user.ShoppingCart.Items.Add(cd);

    Dvd dvd = new Dvd();
    dvd.Name = "The Lord of the Rings: The Return of the King";
    dvd.Price = 14.99;
    dvd.Reference = "B00005JKZY";
    dvd.Genre = "Action, Adventure, Drama ";
    user.ShoppingCart.Items.Add(dvd);

    return user;
}

Generated XML:

<?xml version="1.0" encoding="utf-8"?>
<user xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
       xmlns:xsd="http://www.w3.org/2001/XMLSchema" id="1" user-type="2">
    <first-name>Rui</first-name>
    <last-name>Jarimba</last-name>
    <age>30</age>
    <email>email@somewhere.com</email>
    <address>
        <street>my street 1</street>
        <postal-code>1000-001</postal-code>
        <city>Lisbon</city>
        <country>Portugal</country>
    </address>
    <delivery-address>
        <street>another street</street>
        <postal-code>1000-002</postal-code>
        <city>Lisbon</city>
        <country>Portugal</country>
    </delivery-address>
    <cart>
        <items>
            <book reference="978-1401301958" price="34.95">
                <name>Jamie's Italy</name>
                <author>Jamie Oliver</author>
                <description>Italian food made by Jamie Oliver!</description>
            </book>
            <book reference="B0042TL15I" price="59.95">
                <name>Ensaio Sobre a Cegueira</name>
                <author>José Saramago</author>
            </book>
            <cd reference="B000N3ST9I" price="15">
                <artist>Machine Head</artist>
                <name>The Blackening</name>
                <genre>Trash Metal</genre>
            </cd>
            <dvd reference="B00005JKZY" price="14.99">
                <name>The Lord of the Rings: The Return of the King</name>
                <genre>Action, Adventure, Drama </genre>
            </dvd>
        </items>
    </cart>
</user>

Using the XmlSerializer settings

You can control other settings, such as indentation, namespaces and encoding, using the classes XmlWriterSettings, XmlReaderSettings, and XmlSerializerNamespaces.

Example 1: Remove XML indentation

User user = CreateUser();

XmlWriterSettings settings = new XmlWriterSettings();
settings.Indent = false;

string xml = XmlSerializer<User>.Serialize(user, settings);

// get user from XML
User user2 = XmlSerializer<User>.Deserialize(xml);

Output XML:

<?xml version="1.0" encoding="utf-8"?><user 
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
  xmlns:xsd="http://www.w3.org/2001/XMLSchema" id="1" 
  user-type="2"><first-name>Rui</first-name>
<last-name>Jarimba</last-name><age>30</age><email>email@somewhere.com</email>
<address><street>my street 1</street><postal-code>1000-001</postal-code>
<city>Lisbon</city><country>Portugal</country></address>
<delivery-address><street>another street</street><postal-code>1000-002</postal-code>
<city>Lisbon</city><country>Portugal</country></delivery-address><cart>
<items><book reference="978-1401301958" price="34.95"><name>Jamie's Italy</name>
<author>Jamie Oliver</author><description>Italian food made by Jamie Oliver!</description>
</book><book reference="B0042TL15I" price="59.95"><name>Ensaio Sobre a Cegueira</name>
<author>José Saramago</author></book><cd reference="B000N3ST9I" price="15">
<artist>Machine Head</artist><name>The Blackening</name><genre>Trash Metal</genre>
</cd><dvd reference="B00005JKZY" price="14.99"><name>The Lord 
of the Rings: The Return of the King</name><genre>Action, Adventure, Drama </genre>
</dvd></items></cart></user>

Example 2: Remove namespaces

User user = CreateUser();

XmlSerializerNamespaces namespaces = new XmlSerializerNamespaces();
namespaces.Add("", "");

string xml = XmlSerializer<User>.Serialize(user, namespaces);

// get user from XML
User user2 = XmlSerializer<User>.Deserialize(xml);

Output XML:

<?xml version="1.0" encoding="utf-8"?>
<user id="1" user-type="2">
    <!-- Code omitted for brevity -->
</user>

Example 3: Add custom namespaces

User user = CreateUser();

XmlSerializerNamespaces namespaces = new XmlSerializerNamespaces();
namespaces.Add("n1", "http://mynamespace1.com");
namespaces.Add("n2", "http://mynamespace2.com");

string xml = XmlSerializer<User>.Serialize(user, namespaces);

// get user from XML
User user2 = XmlSerializer<User>.Deserialize(xml);

Output XML:

<?xml version="1.0" encoding="utf-8"?>
<user xmlns:n1="http://mynamespace1.com" 
          xmlns:n2="http://mynamespace2.com" id="1" user-type="2">
    <!-- Code omitted for brevity -->
</user>

Example 4: Specify encoding

User user = CreateUser();

XmlWriterSettings writterSettings = new XmlWriterSettings();
writterSettings.Encoding = Encoding.UTF32;

string xml = XmlSerializer<User>.Serialize(user, writterSettings);

// get user from XML
User user2 = XmlSerializer<User>.Deserialize(xml, Encoding.UTF32);

Example 5: Remove XML declaration

User user = CreateUser();

XmlWriterSettings writterSettings = new XmlWriterSettings();
writterSettings.OmitXmlDeclaration = true;

string xml = XmlSerializer<User>.Serialize(user, writterSettings);

XmlReaderSettings readerSettings = new XmlReaderSettings();
readerSettings.ConformanceLevel = ConformanceLevel.Fragment;
User user2 = XmlSerializer<User>.Deserialize(xml, readerSettings);
User user3 = XmlSerializer<User>.Deserialize(xml); // this works too

Output XML:

<user xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
        xmlns:xsd="http://www.w3.org/2001/XMLSchema" id="1" user-type="2">
    <!-- Code omitted for brevity -->
</user>

Example 6: File serialization

User user = CreateUser();
string filename = @"c:\dump\user.xml";

// default file serialization
XmlSerializer<User>.SerializeToFile(user, filename);

// try to get the object from the created file
User u3 = XmlSerializer<User>.DeserializeFromFile(filename);

//
// define some settings
//
XmlSerializerNamespaces namespaces = new XmlSerializerNamespaces();
namespaces.Add("", "");

XmlWriterSettings settings = new XmlWriterSettings();
settings.Encoding = Encoding.UTF8;
settings.Indent = true;
settings.IndentChars = "\t";

XmlSerializer<User>.SerializeToFile(user, filename, namespaces, settings);
u3 = XmlSerializer<User>.DeserializeFromFile(filename);

References

Source: CodeProject.com

Where Am I?

You are currently viewing the archives for April, 2012 at Naik Vinay.