Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reducing load of generating sitemap #258

Open
Jogai opened this issue Dec 18, 2013 · 17 comments
Open

Reducing load of generating sitemap #258

Jogai opened this issue Dec 18, 2013 · 17 comments

Comments

@Jogai
Copy link

Jogai commented Dec 18, 2013

We generate a sitemap with 15000 urls. Most of them have localized urls. It takes a while to enumerate them all from the database and add the corresponding localizations. Both our webservers run 4 instances of the site. When the sitemap must be generated the servers completely lock up. We use the breadcrumb feature, so I think the servers try to generate the sitemap in order to serve the page including the breadcrumbs.
For our regular caching we use appfabric. I didnt implement a DI to configure the sitemap cache to use appfabric yet. I think my problem is more related to the generation of the sitemap.
Is there a way around this performance problem for me?

@NightOwl888
Copy link
Collaborator

If I understand you correctly, you are referring to the /sitemap.xml endpoint that generates an XML structure for search engines, correct?

Currently, there are no built-in options, but there are plans to make some. The code that generates the XML is not that complex, and if you are interested in either modifying it for yourself and/or to make a contribution it could be done with some effort.

The primary bottleneck is that it uses an in memory collection to generate the XML no matter how large it is. What we really need is an option to specify an external folder where the results of a FileStream can be stored and then read back to the response stream. This would prevent the need to utilize the same amount of memory as the size of the data structure itself since it would never be loaded into memory all at once.

Of course, this means that we need to have 3 things:

  1. A location to store the file.
  2. Assurance that permissions are available to write to that location.
  3. A way to plugin alternate streams for various cloud-based, file, and memory locations.

While we could just use a hard-coded location, I think it would be better to specify a folder via configuration and just default to using an in-memory scheme if it is not provided.

Ideally, we would also be able to specify that the file is not regenerated for some period (maybe defaulting to 12 hours). This is because if the result exceeds 35,000 nodes, it will need to be generated once for the index and again for each XML page as it is accessed because it is broken up into multiple parts for the search engine. It makes little sense to generate the structure more than once each time the search engine robot requests the set of XML files, and I think 12 hours should be enough time for the search engine to read all of them.

Furthermore, it would make sense to make the file writing interface pluggable (via DI) so it can be written to blob storage and other locations, but default to writing to memory and include an implementation that writes to the Windows file system (and possibly other implementations as well).

My suggestion is to break the streaming into two stages - the first one would serialize and stream the entire structure to a single location (through an injected stream service) and the second one would use the output stream of that service (whether it is in memory or on disk) to generate each one of the XML structures and compress them accordingly. Ideally, the primary stream would contain the raw data elements (in this case, just the information that is necessary to generate each of the url XML elements) and the output stream would be generated on the fly based on this data. The visibility/accessibility of nodes should be processed before doing this storage.

The key is to make a binary formatter and serialize all of the objects as they are created so you free up the memory immediately instead of holding onto it until the entire operation is complete.

System.Runtime.Serialization.Formatters.Binary.BinaryFormatter formatter = new System.Runtime.Serialization.Formatters.Binary.BinaryFormatter();

// Use an injected streamFactory to create a stream for the memory or disk storage system. 
// The streamFactory will get configuration (such as folder location) through its constructor.
using (Stream stream = new streamFactory.Create())
{
    foreach (var item in flattenedHierarchy)
    {
        // Convert the data into a structure that requires the bare minimum of space
        ISiteMapXmlItem newItem = GetRequriedDataOnly(item);

        formatter.Serialize(stream, newItem);
    }
}

The same goes for when reading the data back from the stream - you must ensure you only read one element at a time and write the result to the output stream immediately, allowing the memory it occupies to be freed. There must not be any IEnumerable<T> lying around in memory or it won't scale.

using (var inputStream = streamService.GetStoredStream())
{
    using (var outputStream = RetrieveOutputStream(context);
    {
            // Write the opening XML elements

        do 
        {
            var item = (ISiteMapXmlItem)formatter.Deserialize(inputStream);
            outputStream.Write(item.GetUrlXmlElement());
        } while (stream.Position < stream.Length - 1);

            // Write the closing XML elements

        outputStream.Flush();
    }
}

If you decide to make a contribution, please base your changes on the "dev" branch and submit the pull request back to the "dev" branch. Also, feel free to post any follow up questions here.

@Jogai
Copy link
Author

Jogai commented Dec 19, 2013

Thanks for your extensive answer.

If I understand you correctly, you are referring to the /sitemap.xml endpoint that generates an XML structure for search engines, correct?

A bit of both I think. I was under the impression that the first request after iisreset triggers the generation of the whole sitemap structure in order to show the proper breadcrumbs.

Furthermore, it would make sense to make the file writing interface pluggable (via DI) so it can be written to blob storage and other locations, but default to writing to memory and include an implementation that writes to the Windows file system (and possibly other implementations as well).

Yes, and that would make it possible to store it all in appfabric cache right? But why via DI? The Output Cache is easier to extend, you just have to create a derived class and configure it in the web.config. This is probably another question, but why cant MvcSiteMapProvider make use of the same mechanism? I guess under the hood there is some built in DI in the works anyway, but it makes it really easy to extend it.

If you decide to make a contribution..

I'd like to but I don't know if I have the skill and the time..

@NightOwl888
Copy link
Collaborator

A bit of both I think. I was under the impression that the first request after iisreset triggers the generation of the whole sitemap structure in order to show the proper breadcrumbs.

Actually, both the SiteMap structure and /sitemap.xml endpoint currently have some scalability issues/limits because of the way they use memory. You are correct that the SiteMap is cached at application startup and shared between users and the default is to expire the cache every 5 minutes. This is the unfortunate result of basing its original design on Microsoft's ASP.NET sitemap functionality which also stored the entire cache in memory at once.

15,000 nodes shouldn't be too much of an issue for the SiteMap unless your server is starved for memory. And if it is, you can implement System.Runtime.Caching.ObjectCache and provide a file caching mechanism (or there is an open source file cache). See this article for info about extending the cache.

If you do that and it is enough to fix the SiteMap performance problem, then the only part you will need to worry about is the /sitemap.xml endpoint (the XML for search engines) because of the issues I mentioned before.

Note that if you do not need to index some of your pages in search engines, you can work around this problem by using preservedRouteParameters to force multiple URLs to match a single node and then using a custom visibility provider and [SiteMapTitleAttribute] to fix the display of your menu and breadcrumbs. See How to Make MvcSiteMapProvider Remember a User's Position for working examples and documentation about how to do that.

Yes, and that would make it possible to store it all in appfabric cache right?

Correct.

But why via DI? The Output Cache is easier to extend, you just have to create a derived class and configure it in the web.config.

You can think of DI like a super-web.config file. When following a few principles of pattern design and constructor injection, making an application using DI means you don't have to carefully plan out every single extension point and painstakingly (and unwittingly) tie it to the web.config file. When making an application using DI, virtually any interface or abstract class turns into an extension point that can be replaced with another implementation. It puts the application in charge of what implementation (or implementations) of ISomeService to plug in where, which is much more powerful and less effort than what you can do using web.config. It also makes writing automated tests for your code easier.

Several problems appear when you tie it to the web.config file. One of them is that the implementation becomes more complex - a lot of code needs to be written to read the values from the config file. Also, the author of the code might have made the multiplicity wrong for your particular scenario - what if you have a multi-tenant application and each one needs a separate implementation? Suddenly that configurable web.config seems quite limiting. Finally, if you have to put settings in a web.config for your test you have gone from a unit test to a more complicated integration test because you cannot isolate that piece to test independenly.

To get up to speed with DI, I recommend reading Dependency Injection in .NET. There aren't many books I have read that I can honestly say changed the way I program - this is one of them.

The Output Cache is easier to extend, you just have to create a derived class and configure it in the web.config. This is probably another question, but why cant MvcSiteMapProvider make use of the same mechanism?

By default MvcSiteMapProvider uses the System.Runtime.Caching.MemoryCache, a subclass of System.Runtime.Caching.ObjectCache. This is the new low-level caching extension point available since .NET 4.0 which is not tied in any way to ASP.NET. As previously mentioned through the use of DI, this automatically becomes an extension point is where you can plug another implementation of ObjectCache.

There are also several other uses of caching throughout MvcSiteMapProvider, but currently page output caching is not one of them and until now hasn't been something I have considered. I did some checking and it doesn't seem like there is a way to use it on a templated HTML helper, though - it looks like it is only an option for controller actions (which MvcSiteMapProvider only uses on the /sitemap.xml endpoint, but nowhere else).

I think for /sitemap.xml it would make more sense to use a low-level cache and store only the count of the number of visible nodes in it. From that number, the rest of the logic can be driven to determine how many pages are required to be generated and whether or not an index page applies, and also get a 12 hour timeout. You really don't gain much from changing the caching around because the main problem is that it loads the entire data structure into RAM before it starts to build the output stream every time it is built.

I'd like to but I don't know if I have the skill and the time..

No problem. But unfortunately, that means you will either have to wait until I have some time to work on it or else build your own implementation that generates the sitemaps XML in a way that scales to 15,000 nodes.

@NightOwl888 NightOwl888 mentioned this issue Dec 20, 2013
Closed
@NightOwl888
Copy link
Collaborator

FYI - I am working on another project that also has its own sitemap and I did a test with output caching. As it turns out, when writing to a stream the output caching functionality doesn't work. When I tried enabling it, it hit the database on every page request. Of course, my test was with ASP.NET, not MVC, but I suspect the result will be the same because the output caching is based on buffered rather than streamed output.

So, the caching needs to be done at a lower level. And to both make it scale and adjust to using a sitemap index in the case where the node count exceeds 35,000, it will need to be cached using a stream (or probably a set of streams - one for each 35,000 nodes) that writes to disk.

@NightOwl888
Copy link
Collaborator

I did some experimenting and was able to get MvcSiteMapProvider to work with adequate performance (aside from a delayed initial load) with 15,000 nodes. I am using a 5 year old laptop with 8GB of RAM.

However, I was able to reproduce the serious performance drain when doubling that to 30,000 nodes. The bottleneck is reading the SiteMap, not actually creating the nodes. Basically after reaching a certain point the OS starts using disk swap space instead of RAM and things really slow down.

Due to a recent contribution though, I discovered a bug that was causing each node to take up about double the memory that it should have. This has now been fixed in v4.4.7. After the fix, I was also able to get adequate performance with 30,000 nodes, although building the /sitemap.xml took some CPU for several seconds to make happen.

If you upgrade and you find that you are still having problems, there are some things you can try to reduce the load.

  1. Be absolutely sure that you need 15,000 nodes. I doubt that you have that many nodes in your menu so this really only comes down to a question of what appears in the SiteMapPath vs what you need in the /sitemaps.xml endpoint. Any page that you don't need indexed by search engines is probably something you can use preservedRouteParameters to force a single node to match multiple pages, which can significantly reduce the number of nodes. Administration pages are a good example of these. See How to Make MvcSiteMapProvider Remember a User's Position to see how that can be done.
  2. Add more RAM to the server. This is definitely the simplest option and with enough RAM you can certainly handle 15,000 nodes.
  3. If you are putting strings in the Description, ImageUrl, or in custom attributes, consider localizing them (even if you don't need localization). If the strings that you are configuring are much longer than the explicit localization strings that are required to call them up, you will see a drop in memory because the localized strings are not cached in the SiteMap. If you are only using Title, this will probably make you consume more memory though - you must be aware of the length of the strings vs the length of the explicit localization strings.
  4. Make sure you have a reasonably long time-out so your SiteMap isn't built during peak periods.
  5. Consider combining the first approach I mentioned (to reduce nodes to the minimum by using preservedRouteParameters) with a home brew SiteMaps XML implementation that doesn't interact with the SiteMap object. This will allow you to use very little memory by not loading so many nodes at one time (you would potentially be able to remove any nodes that do not appear in the Menu), but some engineering must be done to generate the SiteMaps XML.
  6. Extend the cache to use file caching rather than memory caching. I haven't tested this option yet, so there may or may not be a scalability benefit.

@Jogai
Copy link
Author

Jogai commented Jan 7, 2014

Thank you for your effort and following up.

  1. Well, our test set seems to be a bit bigger than production. We've 2500 products live. We serve every page in 4 languages, each with their own url (/EN/) so currently we have a sitemap with around 11000 urls. The pages that are protected by login arent included (yet). We arent using the menu helper much, we've a couple menus consisting of 3 to 6 items.
  2. There is 10GB of ram, but the site runs on 6 instances. We do see one worker process consume about 50% of the ram after a reset of IIS.
  3. Description and title consists of localized strings.
  4. MvcSiteMapProvider_CacheDuration is set to 120 in the configuration.
  5. I want to use this project because of the ease of use...
  6. I'd rather extend it to use Appfabric caching since we're using that in our project anyways.

I need to make time to learn and contribute something...

If anyone's interested, I have a few providers, and they look like this:

using System.Collections.Generic;
using Client.Project.Facade;
using Client.Project.Models;
using Client.Project.Site.Controllers;
using MvcSiteMapProvider;
using Category = Client.Project.Models.View.Category;

namespace Client.Project.Site.Extensions
{
    public class CategoryDynamicNodeProvider : DynamicNodeProviderBase
    {
        public override IEnumerable<DynamicNode> GetDynamicNodeCollection(ISiteMapNode node)
        {
            List<DynamicNode> dynamicNodes = new List<DynamicNode>();

            //Get the currently active cultures
            List<CultureInfo> languages = new Localization().GetCultures();

            foreach (CultureInfo cultureInfo in languages)
            {
                //Switch to culture based on display name, eg: EN
                Language.Switch(cultureInfo.DisplayName);

                //Initialize a new node
                DynamicNode categoriesNode = new DynamicNode
                    {
                        Key = "Categories_" + cultureInfo.DisplayName,
                        ParentKey = "language" + cultureInfo.DisplayName, //Same as the key in the node provider for the language
                        Title = new BaseController().GetString("home_categories"), //Get the translated title
                        Controller = "Categories",
                        Action = "Index"
                    };
                categoriesNode.RouteValues.Add("language", cultureInfo.DisplayName);

                dynamicNodes.Add(categoriesNode);

                List<Category> categories = new Categories().GetCategoryTree();

                foreach (Category category in categories)
                {
                    DynamicNode categoryNode = new DynamicNode
                        {
                            Key = "Category_" + category.Id + "_" + cultureInfo.DisplayName,
                            ParentKey = categoriesNode.Key,
                            Title = category.Name, //Translated name of category
                            Controller = "Categories",
                            Action = "Category"
                        };
                    categoryNode.RouteValues.Add("id", category.Id);
                    categoryNode.RouteValues.Add("description", category.Name.ToSeoEncoded()); //Encode the name 
                    categoryNode.RouteValues.Add("language", cultureInfo.DisplayName);

                    dynamicNodes.Add(categoryNode);

                    if (category.Children != null)
                    {
                        foreach (Category subcategory in category.Children)
                        {
                            DynamicNode subcategoryNode = new DynamicNode
                            {
                                Key = "SubCategory_" + subcategory.Id + "_" + cultureInfo.DisplayName,
                                ParentKey = categoryNode.Key,
                                Title = subcategory.Name,
                                Description = subcategory.Name,
                                Controller = "Categories",
                                Action = "SubCategory"
                            };
                            subcategoryNode.RouteValues.Add("language", cultureInfo.DisplayName);
                            subcategoryNode.RouteValues.Add("catid", category.Id);
                            subcategoryNode.RouteValues.Add("name", category.Name.ToSeoEncoded());
                            subcategoryNode.RouteValues.Add("subcatid", subcategory.Id);
                            subcategoryNode.RouteValues.Add("subName", subcategory.Name.ToSeoEncoded());

                            dynamicNodes.Add(subcategoryNode);
                        }
                    }
                }
            }
            return dynamicNodes;
        }
    }
}

@NightOwl888
Copy link
Collaborator

Another thing I just thought of to help you reduce the memory footprint (assuming you are already using the latest version of MvcSiteMapProvider already that was released yesterday) is to disable caching of the resolved URLs.

<appSettings>
    <add key="MvcSiteMapProvider_EnableResolvedUrlCaching" value="false"/>
</appSettings>

This setting causes the SiteMap to resolve the URLs at the point the SiteMap is built if true and store them in the shared cache. If false, the URLs are resolved on demand for each user, so they are not stored in the cache.

Doing the math, if your URLs are an average of 30 bytes long and you have 15,000 of them, disabling this setting should let go of about 440 MB of RAM. It should also decrease the load time of the SiteMap, but it remains to be seen by how much.

On the other hand, it will probably slow down the loading of the pages slightly because each SiteMap node that is visible on the page will have to resolve the URL on each request.

You can also disable MvcSiteMapProvider localization, since you are not using it. This may also help to speed things up a little.

<appSettings>
    <add key="MvcSiteMapProvider_EnableLocalization" value="false"/>
</appSettings>

Question: Are you using the exact same node structure for each locale, or do they differ?

@RadoslavGatev
Copy link

Is there any progress on this problem? I have a website with 600k pages which need to be indexed.. What to do?

@NightOwl888
Copy link
Collaborator

Unfortunately, not. There is a limitation of around 10,000-20,000 nodes in the sitemap with the current design because it loads all nodes into memory and keeps them there.

My suggestion is to use the preservedRouteParameters to force a match on the breadcrumb trail and add just enough nodes to create a high-level menu. Using this technique, you will get a breadcrumb trail for every page. So that takes care of the Menu and SiteMapPath at least. Depending on your requirements, this might also take care of a user sitemap page using the SiteMap HTML helper.

Then you can roll your own SiteMaps XML endpoint that uses paging from your data source so you only pull up the data that you need into memory. The code that generates the endpoint is not very complex and would not be that difficult to duplicate. The SiteMaps XML protocol dictates that no XML file should contain more than 50,000 entries and be no larger than 10 MB. To be on the safe side, we use 35,000 entries as the maximum page size. The tricky part will be to implement the paging mechanism if your pages are not all stored in the same table. You might think about using a Union query to stack the tables first before implementing the paging, but you need to be careful not to load all of that data into memory at once.

As for the URLs, depending on how you store the information in your data source (if you store controller and action name) you might be able to take what you need from the SiteMapNodeUrlResolver to create them or implement ISiteMapNode and use the class directly to generate them. Basically, the SiteMapNodeUrlResover is a thin wrapper around the MVC UrlHelper class. But we use a fake HTTP context object to eliminate the possibility that values from the current context be inserted into the URL (see #213). On the other hand, if you store the URL slugs in your data source you probably don't need to do this - some table joining and string concatenation might be all you need to generate the URLs.

This post might be helpful with generating your own sitemaps XML, but you will definitely need to take into consideration the fact that you don't want to load all of your nodes at once into memory.

I suggest that once you have figured out what is required that you create a blog post of your own to demonstrate to others how this can be achieved.

@NightOwl888
Copy link
Collaborator

FYI - I have created a new issue #345 to discuss requirements to address adding an extension point that can be used to provide URLs that are not registered in the SiteMap object in the sitemaps XML. Your feedback would be appreciated.

@RadoslavGatev
Copy link

Thanks for the help. I read #345 and i look forward to see the new features.

I have started working on your advice from the previous comment. I force matched my breadcrumbs and populated the information up to the root node. This works perfect.
Now i'm trying to implement my own XmlSiteMapResult but i don't know how to inject it because it isn't deriving from any specific interface. Could you please give me any advice?

@NightOwl888
Copy link
Collaborator

The XmlSiteMapResult is injected into a plain old MVC controller called XmlSiteMapController. My suggestion is to completely replace this controller with your own (including your own route registration) and make a clean break from this design, utilizing only the bits you need - that would be the easiest approach (and what I am planning to do in #345). Ideally the ActionResult derived class will only be a shell to make it interact with MVC, delegating its work out to other services.

Another option is that you could implement IXmlSiteMapResultFactory, but unfortunately it has a lot of Create overloads to make. You could copy much of the logic from the existing XmlSiteMapControllerFactory to satisfy these overloads and then inject your own services into that factory (which you can pass into your custom XmlSiteMapResult class constructor in the main Create method(s)). The factory can then be injected into XmlSiteMapController directly, providing it an instance of your custom implementation.

You will either need to implement ISiteMapNode for your custom URL entries so you can utilize the built-in SiteMapNodeUrlResolver, or build your own simplified URL resolver and interface to replace it. Most of what is going on in that class is to prevent MVC from inadvertently adding "ambient values" (query string parameters and route values from the current request) into the URL. Since the request for your sitemaps page is likely to be simple (/sitemaps.xml), you probably don't need all of that extra code; just utilizing the services IUrlHelper and possibly IUrlPath will suffice.

IMHO, following the SRP applies - make a separate class (service) for each responsibility (paging, XML format, XML index creation, URL entry providers, compression, URL resolving, etc). For optimal performance, you should pass a System.IO.Stream (abstract class) as the parameter to populate from service to service, not an interface or collection. If necessary, serialize and stream individual objects (not just XML text) from one service to the next. If pulling the URL entry data from a database, you should also ensure to use streamed and paged data access to ensure only 1 record is in memory at a time and a you have a small memory footprint. Use an abstract factory to create and properly dispose of the streams as shown in DI Friendly Framework.

@RadoslavGatev
Copy link

Thanks for your advices. I've implemented it in that way and it works perfectly. But I have a little problem - on some certain sitemap pages (sitemap-X) i get the following Exception :

System.NullReferenceException: Object reference not set to an instance of an object.
at ASP._Page_Views_Shared__Layout_cshtml.Execute() in e:\inetpub\vhosts\site.com\httpdocs\Views\Shared_Layout.cshtml:line 136
at System.Web.WebPages.WebPageBase.ExecutePageHierarchy()
at System.Web.Mvc.WebViewPage.ExecutePageHierarchy()
at System.Web.WebPages.WebPageBase.ExecutePageHierarchy(WebPageContext pageContext, TextWriter writer, WebPageRenderingBase startPage)
at System.Web.WebPages.WebPageBase.<>c__DisplayClass7.
b__6(TextWriter writer)
at System.Web.WebPages.HelperResult.WriteTo(TextWriter writer)
at System.Web.WebPages.WebPageBase.Write(HelperResult result)
at System.Web.WebPages.WebPageBase.RenderSurrounding(String partialViewName, Action1 body) at System.Web.WebPages.WebPageBase.PopContext() at System.Web.WebPages.WebPageBase.ExecutePageHierarchy(WebPageContext pageContext, TextWriter writer, WebPageRenderingBase startPage) at System.Web.Mvc.RazorView.RenderView(ViewContext viewContext, TextWriter writer, Object instance) at System.Web.Mvc.BuildManagerCompiledView.Render(ViewContext viewContext, TextWriter writer) at System.Web.Mvc.ViewResultBase.ExecuteResult(ControllerContext context) at Site.Filters.CustomHandleErrorAttribute.OnException(ExceptionContext filterContext) at System.Web.Mvc.ControllerActionInvoker.InvokeExceptionFilters(ControllerContext controllerContext, IList1 filters, Exception exception)
at System.Web.Mvc.Async.AsyncControllerActionInvoker.<>c__DisplayClass25.
b__22(IAsyncResult asyncResult)
at System.Web.Mvc.Async.AsyncResultWrapper.WrappedAsyncResult1.End() at System.Web.Mvc.Async.AsyncControllerActionInvoker.EndInvokeAction(IAsyncResult asyncResult) at System.Web.Mvc.Controller.<>c__DisplayClass1d.<beginexecutecore> b__18(IAsyncResult asyncResult) at System.Web.Mvc.Async.AsyncResultWrapper.<>c__DisplayClass4.<makevoiddelegate> b__3(IAsyncResult ar) at System.Web.Mvc.Async.AsyncResultWrapper.WrappedAsyncResult1.End()
at System.Web.Mvc.Controller.EndExecuteCore(IAsyncResult asyncResult)
at System.Web.Mvc.Async.AsyncResultWrapper.<>c__DisplayClass4.
b__3(IAsyncResult ar)
at System.Web.Mvc.Async.AsyncResultWrapper.WrappedAsyncResult1.End() at System.Web.Mvc.Controller.EndExecute(IAsyncResult asyncResult) at System.Web.Mvc.Controller.System.Web.Mvc.Async.IAsyncController.EndExecute(IAsyncResult asyncResult) at System.Web.Mvc.MvcHandler.<>c__DisplayClass8.<beginprocessrequest> b__3(IAsyncResult asyncResult) at System.Web.Mvc.Async.AsyncResultWrapper.<>c__DisplayClass4.<makevoiddelegate> b__3(IAsyncResult ar) at System.Web.Mvc.Async.AsyncResultWrapper.WrappedAsyncResult1.End()
at System.Web.Mvc.MvcHandler.EndProcessRequest(IAsyncResult asyncResult)
at System.Web.Mvc.MvcHandler.System.Web.IHttpAsyncHandler.EndProcessRequest(IAsyncResult result)
at System.Web.HttpApplication.CallHandlerExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()
at System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean& completedSynchronously)

Do you have any suppositions?

@NightOwl888
Copy link
Collaborator

Without seeing your code, I can only guess. But it looks like you are getting the results from a View instead of the results from an XmlWriter injected into your OutputStream. The error message is a result of the configured layout page trying to inject itself into that view.

In order to prevent this from happening, make sure you are returning an ActionResult derived object that only writes XML, never return a View from your action method. There is a simple example here, although I would recommend using an XmlWriter to write the output instead of a serializer as was done in that post.

I now have a working prototype that allows providers to supply URLs 1 at a time as well as record counts and will automatically page across the providers regardless of how many URLs each one uses. The implementation is incomplete, but there is enough there to see it working. Most of the new code is in the Xml/Sitemaps directory and subdirectories.

Here is some sample code on how to wire it up. Note that the API is guaranteed to change between now and the final release, but you can borrow pieces of this to get your implementation running if you need to.

The Controller

public class SitemapController : Controller
{
    public ActionResult Index(int page = 0)
    {
        return new SitemapXml(page);
    }
}

The ActionResult

This wires up 2 URL entry providers to demonstrate how the automatic paging will work. We override the default page size of 35,000 and make it return just 4 records per page.

using System;
using System.Web;
using System.Web.Mvc;
using MvcSiteMapProvider;
using MvcSiteMapProvider.IO;
using MvcSiteMapProvider.Web.Mvc;
using MvcSiteMapProvider.Xml.Sitemaps;
using MvcSiteMapProvider.Xml.Sitemaps.Paging;
using MvcSiteMapProvider.Xml.Sitemaps.Specialized;

public class SitemapXml
        : ActionResult
{
    public SitemapXml(int page)
    {
        this.page = page;
    }
    private readonly int page;

    public override void ExecuteResult(ControllerContext context)
    {
        // Output content type
        context.HttpContext.Response.ContentType = "text/xml";

        // Wire up our services

        var urlEntryProviders = new IUrlEntryProvider[] {
                new UrlEntryProvider1(),
                new UrlEntryProvider2()
            };

        var mvcContextFactory = new MvcContextFactory();

        var httpStreamFactory = new CompressibleHttpResponseStreamFactory(mvcContextFactory);
        var urlEntryProviderPagingInstructionFactory = new UrlEntryProviderPagingInstructionFactory();
        var sitemapsPagingStrategy = new AutomaticSitemapsPagingStrategy(urlEntryProviders, urlEntryProviderPagingInstructionFactory);

        sitemapsPagingStrategy.MaximumPageSize = 4;

        var urlEntryHelperFactory = new UrlEntryHelperFactory();

        var specializedContentXmlWriterFactoryStrategy = new SpecializedContentXmlWriterFactoryStrategy(new ISpecializedContentXmlWriterFactory[] {
                new ImageContentXmlWriterFactory(),
                new VideoContentXmlWriterFactory()
            });

        var sitemapsXmlWriterFactory = new SitemapsXmlWriterFactory(specializedContentXmlWriterFactoryStrategy);
        var sitemapsIndexXmlWriterFactory = new SitemapsIndexXmlWriterFactory();

        var sitemapsService = new SitemapsService(sitemapsPagingStrategy, httpStreamFactory, urlEntryHelperFactory, sitemapsXmlWriterFactory, sitemapsIndexXmlWriterFactory);

        sitemapsService.Execute(this.page);

    }
}

The Providers

For a demo we are just wiring up 10 hard coded entries in memory with no specialized content. The idea here is that we will typically create a LINQ query in both the GetTotalRecordCount() and GetEntries() methods that pulls the information from the database. The query sent to the database is using paging to get exactly the right records for the current view, and the records are then fed one at a time to the SitemapsService which writes them immediately to the output stream.

using MvcSiteMapProvider;
using MvcSiteMapProvider.Xml.Sitemaps;
using MvcSiteMapProvider.Xml.Sitemaps.Specialized;

public class UrlEntryProvider1
    : IUrlEntryProvider
{
    public int GetTotalRecordCount()
    {
        return 10;
    }

    public void GetEntries(IUrlEntryHelper helper)
    {
        var specializedContent = new List<ISpecializedContent>();
        var entries = new List<IUrlEntry>
        {
            new UrlEntry("http://www.somewhere.com/", specializedContent),
            new UrlEntry("http://www.somewhere.com/someotherpage.html", specializedContent),
            new UrlEntry("http://www.somewhere.com/oops.html", specializedContent),
            new UrlEntry("http://www.somewhere.com/hello.html", specializedContent),
            new UrlEntry("http://www.somewhere.com/goodbye.html", specializedContent),
            new UrlEntry("http://www.somewhere.com/contact.html", specializedContent),
            new UrlEntry("http://www.somewhere.com/about.html", specializedContent),
            new UrlEntry("http://www.somewhere.com/search.html", specializedContent),
            new UrlEntry("http://www.somewhere.com/testing.html", specializedContent),
            new UrlEntry("http://www.somewhere.com/lastpage.html", specializedContent),
        };

        var results = entries.Skip(helper.Skip).Take(helper.Take);

        foreach (var url in results)
        {
            helper.AddUrlEntry(url);
        }

    }
}

public class UrlEntryProvider2
    : IUrlEntryProvider
{
    public int GetTotalRecordCount()
    {
        return 10;
    }

    public void GetEntries(IUrlEntryHelper helper)
    {
        var specializedContent = new List<ISpecializedContent>();
        var entries = new List<IUrlEntry>
        {
            new UrlEntry("http://www.nowhere.com/", specializedContent),
            new UrlEntry("http://www.nowhere.com/someotherpage.html", specializedContent),
            new UrlEntry("http://www.nowhere.com/oops.html", specializedContent),
            new UrlEntry("http://www.nowhere.com/hello.html", specializedContent),
            new UrlEntry("http://www.nowhere.com/goodbye.html", specializedContent),
            new UrlEntry("http://www.nowhere.com/contact.html", specializedContent),
            new UrlEntry("http://www.nowhere.com/about.html", specializedContent),
            new UrlEntry("http://www.nowhere.com/search.html", specializedContent),
            new UrlEntry("http://www.nowhere.com/testing.html", specializedContent),
            new UrlEntry("http://www.nowhere.com/lastpage.html", specializedContent),
        };

        var results = entries.Skip(helper.Skip).Take(helper.Take);

        foreach (var url in results)
        {
            helper.AddUrlEntry(url);
        }

    }
}

@RadoslavGatev
Copy link

Actually I'm not using Views anywhere in the sitemap xml generation. I just got your old XmlSiteMapResult class and extended it to use database. I made some views in the database that are returning the url properies of all pages and i'm using XmlWriter.I load only one url at a time.

It is throwing this exception only on a particular sitemap pages - sitemap-1.xml, sitemap-7.xml, sitemap-17.xml. The occurence of the problem is fixed on this pages. The other pages are working well. Here is the code that is almost the same as yours.

XmlSiteMapController.cs

public class XmlSiteMapController : Controller
{
    public XmlSiteMapController(IUowData data,
        ISiteMapLoader siteMapLoader,
        IUrlPath urlPath,
        ICultureContextFactory cultureContextFactory)
    {
        if (siteMapLoader == null)
            throw new ArgumentNullException("siteMapLoader");
        if (urlPath == null)
            throw new ArgumentNullException("urlPath");
        if (cultureContextFactory == null)
            throw new ArgumentNullException("cultureContextFactory");

        this.siteMapLoader = siteMapLoader;
        this.urlPath = urlPath;
        this.cultureContextFactory = cultureContextFactory;
        this.data = data;
    }

    protected readonly ISiteMapLoader siteMapLoader;
    protected readonly IUrlPath urlPath;
    protected readonly ICultureContextFactory cultureContextFactory;
    protected readonly IUowData data;

    public ActionResult Index(int page = 0)
    {
        return new XmlSiteMapResult(
          page,
          this.DefaultRootNode,
          this.DefaultSiteMapCacheKeys,
          this.DefaultBaseUrl,
          this.DefaultSiteMapUrlTemplate,
          this.siteMapLoader,
          this.urlPath,
          this.cultureContextFactory,
          this.data);
    }

    //... the same properties as in your XmlSitemapController
}

XmlSiteMapResults.cs

public class XmlSiteMapResult
    : ActionResult
{
    public XmlSiteMapResult(
        int page,
        ISiteMapNode rootNode,
        IEnumerable<string> siteMapCacheKeys,
        string baseUrl,
        string siteMapUrlTemplate,
        ISiteMapLoader siteMapLoader,
        IUrlPath urlPath,
        ICultureContextFactory cultureContextFactory,
        IUowData data)
    {
        if (siteMapLoader == null)
            throw new ArgumentNullException("siteMapLoader");
        if (urlPath == null)
            throw new ArgumentNullException("urlPath");
        if (cultureContextFactory == null)
            throw new ArgumentNullException("cultureContextFactory");

        this.Ns = "http://www.sitemaps.org/schemas/sitemap/0.9";
        this.Page = page;
        this.RootNode = rootNode;
        this.SiteMapCacheKeys = siteMapCacheKeys;
        this.BaseUrl = baseUrl;
        this.SiteMapUrlTemplate = siteMapUrlTemplate;
        this.siteMapLoader = siteMapLoader;
        this.urlPath = urlPath;
        this.cultureContextFactory = cultureContextFactory;
        this.db = data;
    }

    protected readonly ISiteMapLoader siteMapLoader;
    protected readonly IUrlPath urlPath;
    protected readonly ICultureContextFactory cultureContextFactory;
    protected readonly HashSet<string> duplicateUrlCheck = new HashSet<string>();
    protected readonly IUowData db = null;

    //.. some other properties

    protected virtual void ExecuteSitemapIndexResult(ControllerContext context)
    {
        // Count the number of pages
        double numPages = Math.Ceiling((double)this.allUrlsCount / MaxNumberOfLinksPerFile);

        // Output content type
        context.HttpContext.Response.ContentType = "text/xml";

        // Generate sitemap index
        var sitemapIndex = new XElement(Ns + "sitemapindex");
        sitemapIndex.Add(GenerateSiteMapIndexElements(Convert.ToInt32(numPages), this.BaseUrl, SiteMapUrlTemplate).ToArray());

        // Generate sitemap
        var xmlSiteMap = new XDocument(
            new XDeclaration("1.0", "utf-8", "true"),
            sitemapIndex);

        // Write XML
        using (Stream outputStream = RetrieveOutputStream(context))
        {
            using (var writer = XmlWriter.Create(outputStream))
            {
                xmlSiteMap.WriteTo(writer);
            }
            outputStream.Flush();
        }
    }

    protected virtual void ExecuteSitemapResult(ControllerContext context, IEnumerable<ISiteMapNode> flattenedHierarchy, int page)
    {
        // Output content type
        context.HttpContext.Response.ContentType = "text/xml";

        // Write XML
        using (Stream outputStream = RetrieveOutputStream(context))
        {
            XmlWriterSettings settings = new XmlWriterSettings();
            settings.Encoding = Encoding.UTF8;
            settings.Indent = true;
            using (var writer = XmlWriter.Create(outputStream, settings))
            {
                writer.WriteStartDocument(true);
                writer.WriteStartElement("urlset", Ns.NamespaceName);

                if (page == 1)
                {
                    var localUrls = GenerateUrlElements(context, flattenedHierarchy, BaseUrl);
                    foreach (var url in localUrls)
                    {
                        url.WriteTo(writer);
                    }
                }

                var currentUrlsCommand = this.db.SitemapUrlList.BuildGetRecordsCommand((page - 1) * MaxNumberOfLinksPerFile, MaxNumberOfLinksPerFile);

                try
                {
                    using (SqlDataReader dr = currentUrlsCommand.ExecuteReader())
                    {
                        while (dr.Read())
                        {
                            var siteMapNode = buildSiteMapNode(context,
                                dr["Type"].ToString(), long.Parse(dr["Id"].ToString()),
                                dr["MusicRecordUri"].ToString(), dr["ArtistUri"].ToString());
                            var xmlUrlElement = GenerateUrlElement(siteMapNode);
                            xmlUrlElement.WriteTo(writer);
                        }
                    }
                }
                finally
                {
                    this.db.SitemapUrlList.ReleaseCommand(currentUrlsCommand);
                }

                writer.WriteEndElement();
                writer.WriteEndDocument();
            }
            outputStream.Flush();

        }
    }

    public override void ExecuteResult(ControllerContext context)
    {
        var flattenedHierarchy = new HashSet<ISiteMapNode>();

        // Flatten link hierarchy
        if (SiteMapCacheKeys.Any())
        {
            foreach (var key in SiteMapCacheKeys)
            {
                var siteMap = siteMapLoader.GetSiteMap(key);
                if (siteMap == null)
                {
                    // throw new UnknownSiteMapException(Resources.Messages.UnknownSiteMap);
                }
                this.RootNode = siteMap.RootNode;
                foreach (var item in FlattenHierarchy(this.RootNode, context, siteMap.VisibilityAffectsDescendants))
                {
                    flattenedHierarchy.Add(item);
                }
            }
        }
        else
        {
            foreach (var item in FlattenHierarchy(this.RootNode, context, this.RootNode.SiteMap.VisibilityAffectsDescendants))
            {
                flattenedHierarchy.Add(item);
            }
        }

        this.flattenedHierarchyCount = flattenedHierarchy.LongCount();
        this.urlsFromDatabaseCount = this.db.SitemapUrlList.GetRecordsCount();


        // Determine type of sitemap to generate: sitemap index file or sitemap file
        if (this.allUrlsCount > MaxNumberOfLinksPerFile && Page == 0)
        {
            // Sitemap index file
            ExecuteSitemapIndexResult(context);
        }
        else if (this.allUrlsCount > MaxNumberOfLinksPerFile && Page > 0)
        {
            // Sitemap file for links of page X
            ExecuteSitemapResult(context, flattenedHierarchy, Page);
        }
        else
        {
            // Sitemap file for all links
            ExecuteSitemapResult(context, flattenedHierarchy, 1);
        }
    }

    //.. the same old methods..
}

@NightOwl888
Copy link
Collaborator

I haven't spotted the problem in your code, but the error message clearly indicates it is trying to return a View and attach the Layout.cshtml page to the View.

I suspect this has to do with the way your routes are registered. If these particular URLs are matching a different route in your application than the intended one, they may be routed to a different part of your application than this XmlSiteMapController.

As an experiment, remove all of the registration of routes that deal with the XmlSiteMapController and move them into your RouteConfig.cs file. Make sure that the one in the MvcSiteMapProviderConfig.cs file is removed as shown below.

internal class MvcSiteMapProviderConfig
{
    public static void Register(IDependencyInjectionContainer container)
    {
        // Setup global sitemap loader
        MvcSiteMapProvider.SiteMaps.Loader = container.GetInstance<ISiteMapLoader>();

        // Check all configured .sitemap files to ensure they follow the XSD for MvcSiteMapProvider
        var validator = container.GetInstance<ISiteMapXmlValidator>();
        validator.ValidateXml(HostingEnvironment.MapPath("~/Mvc.sitemap"));

        // Register the Sitemaps routes for search engines
        // XmlSiteMapController.RegisterRoutes(RouteTable.Routes); <-- Remove this line
    }
}

Also remove any registration code that you have added to the startup code for your new controller. Then add the route registration directly to your RouteConfig.cs (or where ever you have your routes registered for your application).

    public class RouteConfig
    {
        public static void RegisterRoutes(RouteCollection routes)
        {
            routes.IgnoreRoute("{resource}.axd/{*pathInfo}");

            routes.MapRoute(
                name: "SitemapDefault",
                url: "sitemap.xml",
                defaults: new { controller = "XmlSiteMap", action = "Index", page = 0 }
            );

            routes.MapRoute(
                name: "SitemapPage",
                url: "sitemap-{page}.xml",
                defaults: new { controller = "XmlSiteMap", action = "Index", page = 0 }
            );

            // Your application's other routes should be here

            routes.MapRoute(
                name: "Default",
                url: "{controller}/{action}/{id}",
                defaults: new { controller = "Home", action = "Index", id = UrlParameter.Optional }
            );
        }
    }

As long as you add these 2 routes before any other routes in your application, that should rule out whether your existing routes are interfering with these specific URLs. The XmlSiteMapController's route registration tries to add the routes just above the catch-all route, which may not be the right place for your application. You may have to order them manually in your configuration.

You might also be getting a naming conflict with the existing XmlSiteMapController. If changing the routes doesn't work, you might try changing the name to something else (both in the class and the routes).

@RadoslavGatev
Copy link

Yes, I've already done all the things you said. I've already deleted RegisterRoutes method of the controller and moved the sitemap routes registration to RoutesConfig.cs. I'm sure that my controller is having been called - I tried to return a string with the Content() method of the controller and it's working as expected. I also renamed the controller name especially when i posted the code to be more clear.
It's strange. I'll go further to investigate on this problem. There is the sitemap if you want to see it - Improsong

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants