
Forum Upgraded


rich

At first glance, this looks to be an issue with character encoding. The old table uses Latin encoding and the new uses UTF-8, and I noticed some dodgy characters in some of the new posts that may have been line breaks in the old...
I'll have a play later and see if I can get anything useful to happen.


On 16/02/2016 at 11:30 PM, rich said:

Feel free to see what you can do with this - I will need a SQL script (or php script is ok) back from anyone who wants to take this on, as obviously sending me the SQL back is useless.

Because the upgrade process was destructive, I've written a script that will just replace <pre> blocks in the new version with <pre> blocks from the old version. Please excuse the PHP; I haven't written that language in 15+ years.

https://gist.github.com/staff0rd/56ae30ae26888377ef4e

Notes:

  • The first SQL statement is longer than it needs to be because originally I was counting the occurrences of <pre> blocks per post at the SQL level.
  • The script will exit the moment it sees any post where the number of <pre> blocks in the new version does not exactly match the old version. The first such post in the dataset is this post, because the author edited it after the original backup was made. This implies that the script should not be run against any post edited after the backup was produced, as it does not check the contents of <pre> blocks and could revert edits. As such, the initial SQL SELECT should be updated to include a WHERE clause that limits the output to posts created/edited prior to the original backup.
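
For readers who would rather not read the PHP, the idea described above can be sketched roughly as follows in Python. This is not the gist's code: the function name `restore_pre_blocks` is made up for illustration, and it assumes that only the text inside each <pre> block is copied back, so the new table's opening tag (and its class) is kept.

```python
import re

# Captures the opening tag, the body, and the closing tag of each <pre> block.
# DOTALL lets "." match the newlines that the old posts still contain.
PRE_BLOCK = re.compile(r"(<pre\b[^>]*>)(.*?)(</pre>)", re.DOTALL | re.IGNORECASE)


def restore_pre_blocks(old_html: str, new_html: str) -> str:
    """Return new_html with each <pre> body replaced by the old body at the same position.

    Raises ValueError when the two versions do not contain the same number of
    <pre> blocks, because the blocks can then no longer be paired up safely.
    """
    old_bodies = [match.group(2) for match in PRE_BLOCK.finditer(old_html)]
    new_count = len(PRE_BLOCK.findall(new_html))
    if len(old_bodies) != new_count:
        raise ValueError(
            f"<pre> count mismatch: old has {len(old_bodies)}, new has {new_count}"
        )
    bodies = iter(old_bodies)
    # A function replacement is inserted literally, so backslashes in the code survive.
    return PRE_BLOCK.sub(
        lambda match: match.group(1) + next(bodies) + match.group(3), new_html
    )
```

Like the script described above, this pairs blocks purely by position and never compares their contents, so it carries the same risk of reverting edits made after the backup.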

So I've had a dig into the data and it seems to me that there simply aren't any line breaks in the code blocks - in either the old or the new versions of the data.
There may be a caveat here: the dump contains both the new and the old tables in a single file. That file is necessarily saved as UTF-8 to accommodate the new table; however, the old table is set to latin1_swedish_ci. It's not infeasible that this messes with the characters in the old table - however, I think that's a red herring.
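
As a rough illustration of what a Latin-1/UTF-8 mix-up typically does, here is a small Python sketch. The helper names are invented for this example, and Python's "latin-1" codec is only an approximation of MySQL's latin1 character set. It shows accented characters turning into "dodgy" two-character sequences, while a line feed comes through unchanged because it is the same single byte in both encodings.

```python
def misread_utf8_as_latin1(text: str) -> str:
    """Simulate storing UTF-8 bytes and reading them back as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def repair_misread_text(text: str) -> str:
    """Reverse the mix-up above by re-encoding as Latin-1 and decoding as UTF-8."""
    return text.encode("latin-1").decode("utf-8")


garbled = misread_utf8_as_latin1("caf\u00e9")   # the e-acute becomes two characters
restored = repair_misread_text(garbled)         # round-trips back to the original
unchanged = misread_utf8_as_latin1("a\nb")      # the line feed is not affected
```

That would fit the red-herring suspicion: under these assumptions, an encoding mismatch on its own mangles non-ASCII characters but would not be expected to remove line breaks.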

In the old table, code blocks are marked up with a class of '_prettyXprint', vs the standard 'prettyprint' in the new table. I scoured the internetz for references to that class, but came up blank.
I'm wondering if the old class was used by the previous version of the forum software, perhaps to run the code through a modified version of PrettyPrint that somehow included 'beautification' - though again, that's a long shot!

In any case, the situation as it stands is that I can't find any line breaks to work with... However, I've tried running a few sample posts through a JavaScript beautifier and it seems capable of re-inserting line breaks at sensible points.
I may be able to write a script that would loop through all the posts, pick out code blocks and run them through a server-side JS beautifier, saving the result back into the table... I could then either provide a dump of the data for reimport, a script containing a butt-load of 'UPDATE' statements, or I could give access to my own database server and you could write a script to join the tables and update that way. Would any of this be acceptable @rich?
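
A minimal sketch of that loop, in Python, might look like the following. Everything here is an assumption made for illustration: posts are taken to be a dict of post id to HTML, `beautify_posts` and `naive_beautify` are invented names, and `naive_beautify` is a deliberately crude stand-in (it just breaks after `;`, `{` and `}`) that a real beautifier would replace.

```python
import html
import re
from typing import Callable, Dict

PRE_BLOCK = re.compile(r"(<pre\b[^>]*>)(.*?)(</pre>)", re.DOTALL | re.IGNORECASE)


def naive_beautify(code: str) -> str:
    """Stand-in beautifier: start a new line after every ';', '{' or '}'."""
    return re.sub(r"([;{}])[ \t]*", r"\1\n", code).strip()


def beautify_posts(
    posts: Dict[int, str], beautify: Callable[[str], str] = naive_beautify
) -> Dict[int, str]:
    """Return only the posts whose code blocks changed, keyed by post id."""

    def fix_block(match: "re.Match[str]") -> str:
        body = match.group(2)
        if "\n" in body:
            return match.group(0)  # line breaks survived, leave the block alone
        code = html.unescape(body)  # beautify real code, not HTML entities
        return match.group(1) + html.escape(beautify(code), quote=False) + match.group(3)

    changed = {}
    for pid, content in posts.items():
        updated = PRE_BLOCK.sub(fix_block, content)
        if updated != content:
            changed[pid] = updated
    return changed
```

The returned dict could then be turned into a dump or a set of UPDATE statements. Passing the beautifier in as a parameter is only a convenience so that the crude stand-in can be swapped for a proper one without touching the loop.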

It's worth noting, I'm a ColdFusion developer... all the server side beautifiers seem to be written in NodeJS or Python. I'm sure I'll be able to get something working, but someone with some NodeFu or PythonPower may be able to get this sorted with considerably less pissing about.

@GaryS how are you checking for new lines?  There are definitely new lines in the old tables that have been stripped from the new tables...  

Check pid 91887.

If you are able to run the php script I posted with that ID, it will print to the console the old post with linebreaks, the new post without, and the fix.


Hmm, yes... you're quite right! It seems that there are new lines in some of the posts, but not in others...
To be clear, there are line breaks in all the posts, but not all the code blocks. Take a look at pid 96 for instance.

I'd still advocate running each post through a server-side beautifier - firstly it'll put line breaks in where there aren't any, and secondly it should have a good go at indenting.

There's still the problem of comments... where line breaks have not survived, the beautifiers have no way of knowing where a comment ends and code begins.
It may be worth running through the script a few times with some string matching to catch obvious cases - such as comments containing the characters: 'function(', etc.
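
The string-matching pass could be sketched like this in Python. The marker list and the name `suspicious_comments` are guesses for illustration, not a tested rule set; the function only flags `//` comments that appear to have swallowed code, so that a person (or a later pass) can decide where the line break belongs.

```python
import re
from typing import List, Sequence

# Substrings that rarely belong in a comment but are common in code (assumed list).
CODE_MARKERS = ("function(", "function (", "var ")

# A '//' that is not part of a URL scheme such as 'http://'.
LINE_COMMENT = re.compile(r"(?<!:)//")


def suspicious_comments(code: str, markers: Sequence[str] = CODE_MARKERS) -> List[str]:
    """Return every '//' comment that contains something that looks like code."""
    flagged = []
    for line in code.splitlines():
        match = LINE_COMMENT.search(line)
        if match is None:
            continue
        comment = line[match.start():]  # a line comment runs to the end of the line
        if any(marker in comment for marker in markers):
            flagged.append(comment)
    return flagged
```

For a block whose line breaks were stripped, such as `// set up the player var player = 1;`, the whole line is returned as suspicious, whereas an ordinary trailing comment like `var a = 1; // counter` is not flagged.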


I see. pid 1016 in that same thread also supports your findings. Perhaps @rich updated multiple times during the forum's lifetime and earlier posts were stripped prior to the backup received. My script does not fix these earlier ones (it has no effect on them), only the later ones where the backup retains the new lines.


Are code blocks supposed to look so weird? This is what every code block that I've encountered on this site looks like, and it's really hard to read. I'm on Chrome Version 48.0.2564.116 (64-bit) on OSX El Capitan.

code-block.png

Source

Edited by moue
Added browser version.

On 3/4/2016 at 8:51 PM, moue said:

Are code blocks supposed to look so weird? This is what every code block that I've encountered on this site looks like, and it's really hard to read. I'm on Chrome Version 48.0.2564.116 (64-bit) on OSX El Capitan.

code-block.png

Source

I'm on the same platform, and most code blocks look fine. There are a few like that, but I just assumed that was user error: copying and pasting from some system that mucked with line-endings and/or tabs/spaces, yada yada yada.


  • 5 weeks later...
  • 1 month later...

I don't want to sound annoying, but is the issue of newlines being stripped getting fixed? It seems like every thread I'm trying to read has this issue, making all the code examples so hard to read. This site seems like it'd be such a great resource if the code examples weren't like that.


  • 2 weeks later...

When you make a new post and insert a code snippet, the default language is set to <not set>.

Is it possible to set the default selection to "JavaScript"?

This forum is mostly aimed at HTML5 development, and Phaser in particular, so that would make more sense.

