Thread: Offline Wiki Dumps [474-742]

  1. #1 Offline Wiki Dumps [474-742] 
    Greg


    There's no longer any need to scrape the live wiki sites or use web.archive.org.

    I contacted Fandom and they fixed the [Only registered and activated users can see links. ] which had been down for the past few years.
    Once extracted, the data totals almost 490 GB (221,342 wiki pages, 0 images), so I wrote a script to filter page revisions by date and namespace and ran it for some of the most common revisions.
    Spoiler for code:

    Code:
    
    import java.io.File
    import java.io.FileNotFoundException
    import java.io.FileReader
    import java.io.FileWriter
    import java.time.LocalDate
    import java.time.format.DateTimeFormatter
    import java.util.*
    import java.util.concurrent.TimeUnit
    import javax.xml.stream.*
    import javax.xml.stream.events.XMLEvent
    
    /**
     * Takes full pages_full.xml containing all pages and history from Special:Statistics
     * Filters pages by namespace and revisions by date
     * Outputs xml
     * When output is zipped with bzip2 can be read with [WikiTaxi](https://www.yunqa.de/delphi/products/wikitaxi/index)
     */
    object RunescapeWikiPagesFullFilter {
    
        private class Parser(val date: LocalDate, dir: String, val namespacesToDump: Set<String>, val message: Boolean) {
    
            val file = File(dir, "runescapewiki-latest-pages-articles-${date.toString().replace(" ", "-")}.xml")
    
            init {
                if (!file.exists()) {
                    file.createNewFile()
                }
            }
    
            val pageEvents = mutableListOf<XMLEvent>()
            val revisionEvents = mutableListOf<XMLEvent>()
            val mostRecentRevisionEvents = mutableListOf<XMLEvent>()
            val eventWriter = XMLOutputFactory.newInstance().createXMLEventWriter(FileWriter(file))!!
            val eventFactory = XMLEventFactory.newInstance()!!
            val redirectPattern = "#(?:REDIRECT|redirect) \\[\\[(.*)]]".toRegex()
    
            var validRevision = false
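            // Revision state: 0 = outside a revision, 1 = inside one, 2 = just closed (the next event, normally whitespace, is dropped)
            var revision = 0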
            var type: String? = null
            var pages = 0
            var skipPage = false
            var title: String = ""
    
            val namespaceIndex = mutableMapOf<String, Int>()
            val namespaces = mutableSetOf<Int>()
            var namespaceId: String? = null
    
            fun reset() {
                revision = 0
                type = null
                title = ""
                validRevision = false
                pageEvents.clear()
                revisionEvents.clear()
                mostRecentRevisionEvents.clear()
            }
    
            fun parse(event: XMLEvent) {
                if (skipPage) {
                    if (event.eventType == XMLStreamConstants.END_ELEMENT && event.asEndElement().name.localPart == "page") {
                        skipPage = false
                    }
                    return
                }
    
                when (event.eventType) {
                    XMLStreamConstants.START_ELEMENT -> {
                        type = event.asStartElement().name.localPart
                        when (type) {
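                            // Look the id up by attribute name; iteration order of attributes is not guaranteed
                            "namespace" -> namespaceId = event.asStartElement().getAttributeByName(javax.xml.namespace.QName("key"))?.value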
                            "revision" -> revision = 1
                        }
                    }
                    XMLStreamConstants.CHARACTERS -> {
                        when (type) {
                            "namespace" -> {
                                val id = namespaceId?.toInt()
                                if (id != null) {
                                    namespaceIndex[event.asCharacters().toString().trim()] = id
                                    namespaces.add(id)
                                    namespaceId = null
                                }
                            }
                            "timestamp" -> {
                                val timestamp = LocalDate.parse(event.asCharacters().toString(), inputFormatter)
                                validRevision = timestamp.isBefore(date)
                                type = null
                            }
                            "ns" -> {
                                val id = event.asCharacters().toString().toIntOrNull()
                                if (id != null) {
                                    if (namespaces.contains(id)) {
                                        skipPage = true
                                        reset()
                                        return
                                    } else if (message) {
                                        println(title)
                                    }
                                }
                            }
                            "title" -> {
                                val chars = event.asCharacters()
                                if (!chars.isWhiteSpace) {
                                    title = chars.toString()
                                    if (title.startsWith("Template:Signatures/") || title.startsWith("Template:Signature/") || title.startsWith("Template:Userbox/")) {
                                        skipPage = true
                                        reset()
                                        return
                                    }
                                }
                            }
                        }
                    }
                }
    
                when (revision) {
                    2 -> revision = 0
                    1 -> revisionEvents.add(event)
                    else -> pageEvents.add(event)
                }
    
                when (event.eventType) {
                    XMLStreamConstants.END_ELEMENT -> {
                        when (event.asEndElement().name.localPart) {
                            "namespaces" -> {
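                                // Whatever remains in `namespaces` is skipped; names not found in the dump are ignored
                                namespaces.removeAll(namespacesToDump.mapNotNull { namespaceIndex[it] }.toSet())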
                            }
                            "revision" -> {
                                if (validRevision) {
                                    mostRecentRevisionEvents.clear()
                                    mostRecentRevisionEvents.addAll(revisionEvents)
                                }
                                revisionEvents.clear()
                                revision = 2
                            }
                            "siteinfo" -> {
                                pageEvents.forEach {
                                    eventWriter.add(it)
                                }
                            }
                            "page" -> {
                                if (!ignoreEmptyPages || mostRecentRevisionEvents.isNotEmpty()) {
                                    val pageClose = pageEvents.removeAt(pageEvents.lastIndex)
                                    var priorRedirect: String? = null
                                    pageEvents.forEach { event ->
                                        eventWriter.add(event)
                                        if (event.eventType == XMLStreamConstants.START_ELEMENT && event.asStartElement().name.localPart == "redirect") {
                                            priorRedirect = event.asStartElement().attributes.next().value
                                        }
                                    }
    
                                    var priorEventType: String? = null
                                    mostRecentRevisionEvents.forEach {
                                        var event = it
                                        // Replace redirect revision text with title from page redirect tag (fixes redirects for rs3 renames)
                                        if (priorRedirect != null && event.eventType == XMLStreamConstants.CHARACTERS && priorEventType == "text") {
                                            val text = event.asCharacters().toString()
                                            if (text.contains("#redirect", true)) {
                                                val result = redirectPattern.matchEntire(text)?.groupValues?.last()
                                                if (result != null && result != priorRedirect && !result.startsWith("$priorRedirect#")) {
                                                    event = eventFactory.createCharacters(text.replace(result, priorRedirect!!))
                                                }
                                            }
                                        }
                                        priorEventType = if (event.eventType == XMLStreamConstants.START_ELEMENT) event.asStartElement().name.localPart else null
    
                                        eventWriter.add(event)
                                    }
                                    eventWriter.add(pageClose)
                                    eventWriter.flush()
                                    pages++
                                }
                                reset()
                            }
                            "mediawiki" -> eventWriter.add(event)
                        }
                    }
                }
            }
    
            fun finish() {
                println("Total pages: $pages for $date")
                eventWriter.close()
            }
    
            companion object {
                private val inputFormatter = DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss'Z'", Locale.ENGLISH)!!
            }
        }
    
        private const val ignoreEmptyPages = true
    
        @JvmStatic
        fun main(args: Array<String>) {
            try {
                System.setProperty("entityExpansionLimit", "0")
                System.setProperty("totalEntitySizeLimit", "0")
                System.setProperty("jdk.xml.totalEntitySizeLimit", "0")
                val namespaces = setOf(
                    "",
                    "Template",
                    "Category",
                    "Update",
                    "Exchange",
                    "Charm",
                    "Calculator",
                    "Map",
                    "Transcript"
                )
                val directory = "${System.getProperty("user.home")}\\Downloads\\runescape_pages_full\\"
                val dates = mapOf(
                    474 to LocalDate.of(2007, 11, 12),
                    530 to LocalDate.of(2009, 2, 9),
                    550 to LocalDate.of(2009, 7, 7),
                    562 to LocalDate.of(2009, 9, 18),
                    592 to LocalDate.of(2010, 3, 2),
                    614 to LocalDate.of(2010, 8, 24),
                    634 to LocalDate.of(2011, 1, 31),
                    667 to LocalDate.of(2011, 10, 16),
                    718 to LocalDate.of(2012, 6, 13),
                    742 to LocalDate.of(2012, 11, 19)
                )
                val parsers = dates.values.map { date -> Parser(date, directory, namespaces, false) }
                val factory = XMLInputFactory.newInstance()
                val eventReader = factory.createXMLEventReader(FileReader("${directory}runescape_pages_full.xml"))
                val start = System.currentTimeMillis()
                while (eventReader.hasNext()) {
                    val event = eventReader.nextEvent()
                    parsers.forEach { it.parse(event) }
                }
                parsers.forEach {
                    it.finish()
                }
                println("Took ${TimeUnit.MILLISECONDS.toMinutes(System.currentTimeMillis() - start)} mins")
            } catch (e: FileNotFoundException) {
                e.printStackTrace()
            } catch (e: XMLStreamException) {
                e.printStackTrace()
            }
        }
    }


    How to use

    The dumps are ready to be viewed offline using [Only registered and activated users can see links. ], or you can extract them and parse out the data you want using the standard Java XML DOM parser.
    Note that pages still use the current RS3 names (e.g. Dragon claw rather than Dragon claws); however, redirects will get you to the right page.
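
    For anyone going the DOM route, here's a minimal sketch of reading an extracted dump and following a redirect to the renamed page. It's only an illustration: `sampleDump` is a made-up two-page stand-in for a real file, and `WikiPage`, `child`, `loadPages` and `resolve` are hypothetical names, not part of the filter script. It assumes the usual MediaWiki export layout (`page`, `title`, `redirect`, `revision`, `text`).

```kotlin
import java.io.ByteArrayInputStream
import java.io.InputStream
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Element

// Made-up miniature export standing in for an extracted dump file.
val sampleDump = """
    <mediawiki>
      <page>
        <title>Dragon claws</title>
        <ns>0</ns>
        <redirect title="Dragon claw"/>
        <revision><text>#REDIRECT [[Dragon claw]]</text></revision>
      </page>
      <page>
        <title>Dragon claw</title>
        <ns>0</ns>
        <revision><text>{{Infobox Item|name = Dragon claws}}</text></revision>
      </page>
    </mediawiki>
""".trimIndent()

data class WikiPage(val title: String, val redirect: String?, val text: String)

// First descendant element with the given tag name, or null if there is none.
fun Element.child(tag: String): Element? = getElementsByTagName(tag).item(0) as? Element

// Loads every <page> into a map keyed by title, keeping the last <revision> text.
fun loadPages(xml: InputStream): Map<String, WikiPage> {
    val document = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xml)
    val nodes = document.getElementsByTagName("page")
    val pages = mutableMapOf<String, WikiPage>()
    for (i in 0 until nodes.length) {
        val page = nodes.item(i) as Element
        val title = page.child("title")?.textContent ?: continue
        val redirect = page.child("redirect")?.getAttribute("title")
        val revisions = page.getElementsByTagName("revision")
        val last = if (revisions.length > 0) revisions.item(revisions.length - 1) as Element else null
        pages[title] = WikiPage(title, redirect, last?.child("text")?.textContent ?: "")
    }
    return pages
}

// Follows <redirect> targets for at most maxHops steps so redirect loops cannot hang.
fun resolve(pages: Map<String, WikiPage>, title: String, maxHops: Int = 5): WikiPage? {
    var page = pages[title]
    repeat(maxHops) {
        val target = page?.redirect ?: return page
        page = pages[target]
    }
    return page
}

fun main() {
    val pages = loadPages(ByteArrayInputStream(sampleDump.toByteArray()))
    val page = resolve(pages, "Dragon claws")
    println("${page?.title}: ${page?.text}")
}
```

    To try it on a real dump, pass the extracted file's input stream to `loadPages` instead. DOM keeps the whole document in memory, so for the bigger dumps a streaming reader like the one in the filter script is probably the safer choice.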

    Examples of the data you could scrape from these:
    • Item/NPC/Object details
    • Combat bonuses
    • Drop tables
    • Charm drop rates
    • Grand Exchange prices
    • Calculations
    • Skill data


    And so much more...

    Downloads


    [Only registered and activated users can see links. ]
    [Only registered and activated users can see links. ]
    [Only registered and activated users can see links. ]
    [Only registered and activated users can see links. ]
    [Only registered and activated users can see links. ]
    [Only registered and activated users can see links. ]
    [Only registered and activated users can see links. ]
    [Only registered and activated users can see links. ]
    [Only registered and activated users can see links. ]
    [Only registered and activated users can see links. ]

    Each dump's date is the day before the update that followed the wanted revision.
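
    As a rough illustration of how a cutoff date picks a revision, here's a small sketch that uses the same comparison as the filter script (`timestamp.isBefore(date)`). The timestamps are invented for the example and `latestBefore` is a hypothetical helper. The script itself keeps the last matching revision in file order rather than sorting, so treat this as an approximation of the idea.

```kotlin
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import java.util.Locale

// Same timestamp pattern the filter script uses for <timestamp> values.
val inputFormatter: DateTimeFormatter =
    DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss'Z'", Locale.ENGLISH)

// Hypothetical helper: newest timestamp dated strictly before the cutoff, or null if none qualify.
// Timestamps in this fixed format sort chronologically as plain strings.
fun latestBefore(timestamps: List<String>, cutoff: LocalDate): String? =
    timestamps
        .filter { LocalDate.parse(it, inputFormatter).isBefore(cutoff) }
        .maxOrNull()

fun main() {
    val cutoff = LocalDate.of(2007, 11, 12) // the script's date for revision 474
    val timestamps = listOf(
        "2007-10-01T09:30:00Z",
        "2007-11-11T23:59:59Z",
        "2007-11-12T00:00:01Z" // dated on the cutoff itself, so it is excluded
    )
    println(latestBefore(timestamps, cutoff)) // prints 2007-11-11T23:59:59Z
}
```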

    [Only registered and activated users can see links. ]
    Last edited by Greg; 11-01-2020 at 09:17 PM.


  2. #2  
    Classic
    Great, thank you!

  3. #3  
    JayArrowz
    Good release thanks!

  4. #4  
    Luke132
    Great work cheers chief.


  5. #5  
    Scythe
    Nice contribution


  6. #6  
    Tyluur
    Another one? amazin

  7. #7  
    Battle614
    Thanks for this


  8. #8  
    Ezvy
    Should I use winRar or winzip

    I downloaded Java/JDK.. now I need server and client files???


  9. #9  
    Corey
    Quote Originally Posted by Ezvy View Post
    Should I use winRar or winzip

    I downloaded Java/JDK.. now I need server and client files???
    this has nothing to do with running a server and client

  10. #10  
    Registered Member
    Hey,

    Thanks for the release, I was looking for something like this. How would the community feel about having an online web application to view the data, as opposed to having to download it? If people deem it beneficial or novel, would anyone be interested in me pursuing it? I'm guessing it would not be a very hard job, although I have never heard of WikiTaxi before and don't know how the data is structured. I can always write a program to manipulate it however the community or I desire. I would prefer to have the data in something like JSON format, with a nice UI to display and search through it.
